(obsolete)
Let's break even more habits in processor/ISA design!
It started with a discussion on the Minimalist Computing Facebook group. Have a read. Duane Sand, as always, is very helpful, and others have contributed interesting data points. That group is gold!
In the end, it appears that a sweet spot exists for the displacement (instruction address relative to the current PC) at around 14 bits:
- 12 direct/immediate address bits for the LSBs,
- the remaining 2 MSBs, sign-extended to provide actual relative pointers.
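The field layout above can be sketched as follows. This is a hypothetical model, not a defined F-CPU or Y32 encoding: it assumes a 14-bit field (12 direct bits, 2 signed MSBs) and a PC counted in 32-bit instructions, which is how the 32-12=20 adder width in this log comes out. `branch_target` and `sign_extend` are illustrative names.

```python
# Hypothetical 14-bit displacement field (illustration only, not a defined encoding):
#   bits [11:0]  -> direct bits, copied verbatim into the target's LSBs
#   bits [13:12] -> signed adjustment (-2..+1) added to the PC's upper bits
# Assumption: the PC is a 32-bit instruction (word) index.
DIRECT_BITS = 12
MSB_BITS = 2
PC_BITS = 32


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def branch_target(pc: int, field: int) -> int:
    direct = field & ((1 << DIRECT_BITS) - 1)
    adj = sign_extend(field >> DIRECT_BITS, MSB_BITS)
    # Only the upper PC_BITS - DIRECT_BITS = 20 bits go through an adder.
    upper_mask = (1 << (PC_BITS - DIRECT_BITS)) - 1
    upper = ((pc >> DIRECT_BITS) + adj) & upper_mask
    # The direct bits are simply concatenated, no addition needed.
    return (upper << DIRECT_BITS) | direct


print(hex(branch_target(0x12345, (0b11 << 12) | 0xABC)))  # 0x11abc
```

Here the MSBs `0b11` mean -1, so the target lands in the 4K block just below the current one, at offset 0xABC.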
With an 11-bit direct address field, it is possible to address an 8KB instruction cache; 12 bits make it 16KB. More bits might not be much more useful.
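The cache sizes above follow from a simple product, assuming fixed 32-bit (4-byte) instructions and one instruction per direct-address value; `icache_bytes` is an illustrative helper.

```python
# Assumption: fixed 32-bit (4-byte) instructions, one per direct-address value.
INSN_BYTES = 4


def icache_bytes(direct_bits: int) -> int:
    """Instruction cache size covered by the direct address field alone."""
    return (1 << direct_bits) * INSN_BYTES


for bits in (11, 12):
    print(bits, "bits ->", icache_bytes(bits) // 1024, "KB")
```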
The 2 extra bits are needed so that the upper part of the PC can be incremented or decremented when the target lies in a neighbouring block. This could possibly be 3 bits... But the aim is to have as many "direct" bits as possible, copied verbatim to the Icache address decoder, without bloating the instruction too much.
It becomes really interesting when considering a branch that is not in the Branch Target Buffer (BTB):
- Cycle 1: the instruction register's direct field is sent to the Icache line decoder and a read cycle is initiated. In parallel, the MSBs are sign-extended and sent to a simplified, shorter adder.
- Cycle 2: the (partially) computed PC MSBs are sent to, and compared with, the tags that were read during cycle 1. A cache miss can already be detected at this point.
The nice thing is that there is no need to wait for the complete address to be computed before reading the cache, which only needs about 8 address bits (plus 3 more bits for word selection).
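The two-cycle sequence above can be modelled in software as a toy direct-mapped cache that is indexed only by the direct bits, so that its tag is exactly the upper part of the address. This is a sketch under assumed parameters (12 direct bits, 8 instructions per line, hence 9 line-index bits and 3 word-select bits); the log does not specify associativity or line size, and `ICache`, `fill` and `branch_fetch` are hypothetical names. In hardware the array read and the 20-bit addition run in parallel; the model merely computes them one after the other.

```python
# Toy direct-mapped Icache indexed only by the direct bits (hypothetical parameters).
DIRECT_BITS = 12
WORD_BITS = 3                        # 8 instructions per line
LINE_BITS = DIRECT_BITS - WORD_BITS  # 9 line-index bits
PC_BITS = 32


class ICache:
    def __init__(self):
        self.tags = [None] * (1 << LINE_BITS)
        self.lines = [[0] * (1 << WORD_BITS) for _ in range(1 << LINE_BITS)]

    def fill(self, addr: int, words):
        """Load one line; addr is assumed line-aligned, words holds 8 entries."""
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        self.tags[line] = addr >> DIRECT_BITS  # tag = upper 20 bits
        self.lines[line] = list(words)

    def branch_fetch(self, pc: int, field: int):
        direct = field & ((1 << DIRECT_BITS) - 1)
        # Cycle 1: the direct bits start the array read right away...
        line = direct >> WORD_BITS
        word = direct & ((1 << WORD_BITS) - 1)
        stored_tag = self.tags[line]
        data = self.lines[line][word]
        # ...while the sign-extended 2-bit MSBs go through the 20-bit adder.
        msb = (field >> DIRECT_BITS) & 0b11
        adj = msb - 4 if msb & 0b10 else msb
        upper = ((pc >> DIRECT_BITS) + adj) & ((1 << (PC_BITS - DIRECT_BITS)) - 1)
        # Cycle 2: compare the computed upper bits with the tag read in cycle 1.
        hit = stored_tag == upper
        return hit, (data if hit else None)


cache = ICache()
cache.fill(0x11AB8, range(100, 108))  # line holding instructions 0x11AB8..0x11ABF
print(cache.branch_fetch(0x12345, (0b11 << 12) | 0xABC))  # (True, 104)
print(cache.branch_fetch(0x22345, (0b11 << 12) | 0xABC))  # (False, None)
```

Because the index never depends on the adder, the adder only has to finish before the tag comparison, not before the array access.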
The other nice thing is that the adder would be only 20 bits wide (on a 32-bit machine with 12 bits of direct address: 32-12=20), so it should be faster and smaller than the full-width adder of a typical CPU. This counts because, as I said in the comments on Facebook, "At 5GHz, everything is slow".
The range is a bit odd, but it's a good compromise anyway. It can be made more symmetric by increasing the MSB portion to 3 bits.
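To see why the range is "a bit odd", the following sketch computes how far a branch can reach backward and forward, in instructions, under the same hypothetical layout (12 direct bits, PC counted in instructions, address-space wraparound ignored). `reach` is an illustrative helper.

```python
# Assumption: PC counted in instructions, 12 direct bits, no wraparound.
DIRECT_BITS = 12


def reach(pc: int, msb_bits: int):
    """(backward, forward) distance in instructions reachable from `pc`
    with `msb_bits` sign-extended upper bits."""
    block = 1 << DIRECT_BITS
    upper = pc >> DIRECT_BITS
    half = 1 << (msb_bits - 1)            # e.g. -2..+1 for 2 bits
    lowest = (upper - half) * block       # first instruction of lowest block
    highest = (upper + half) * block - 1  # last instruction of highest block
    return pc - lowest, highest - pc


for offset in (0, 2048, 4095):
    pc = 0x10000 + offset
    print(offset, reach(pc, 2), reach(pc, 3))
```

The reach depends on where the branch sits in its 4K block: at offset 4095, the 2-bit version reaches 12287 instructions back but only 4096 forward. With 3 MSBs the absolute imbalance is unchanged (still up to one block), but relative to the total reach it is much smaller (20479 back vs 12288 forward).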
Another point raised in the comments is that this almost defeats the "position independence" of code. Well, the size of the direct field imposes a granularity for relocation: at 12 bits, code can only be moved in steps of 4K instructions. But PIC (position-independent code) is not seen as a critical feature in F-CPU and Y32 because other features provide equivalent functionality.
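The 4K-instruction relocation step can be checked with a small sketch: code containing an unmodified branch is moved by `delta` instructions, and we test whether the branch still lands on the moved target. It uses the same hypothetical layout as the rest of this log's arithmetic (12 direct bits, 2 signed MSBs, PC counted in 32-bit instructions), redefined here so the snippet stands alone; `survives_relocation` is an illustrative name.

```python
# Assumption: hypothetical 14-bit field, 12 direct bits + 2 signed MSBs,
# PC counted in 32-bit instructions.
DIRECT_BITS = 12
PC_MASK = (1 << 32) - 1


def branch_target(pc: int, field: int) -> int:
    direct = field & ((1 << DIRECT_BITS) - 1)
    msb = (field >> DIRECT_BITS) & 0b11
    adj = msb - 4 if msb & 0b10 else msb  # sign-extend the 2 MSBs
    upper = ((pc >> DIRECT_BITS) + adj) & (PC_MASK >> DIRECT_BITS)
    return (upper << DIRECT_BITS) | direct


def survives_relocation(pc: int, field: int, delta: int) -> bool:
    """True if the unmodified branch still points at its (moved) target
    after the code is moved by `delta` instructions."""
    before = branch_target(pc, field)
    after = branch_target((pc + delta) & PC_MASK, field)
    return after == (before + delta) & PC_MASK


field = (0b01 << DIRECT_BITS) | 0x010
print(survives_relocation(0x12345, field, 0x3000))  # True: multiple of 4096
print(survives_relocation(0x12345, field, 0x0100))  # False: direct bits no longer match
```

Since the direct bits are absolute within a 4K block, only deltas that are multiples of 4096 keep them valid; any other move requires patching the branch.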