I've decided to switch to a hardware-sequenced microarchitecture for the Kestrel-3 CPU. It will be built just like the 6502 was: it will have a two-stage pipeline (fetch and execute), both stages operated by a single micro-sequencer, and the microarchitecture will be very tightly coupled to the bus interface.
The CPU's clock will be 25MHz. The bus clock will be 12.5MHz. Since each instruction is 32 bits wide, and the bus is only 16 bits wide, it takes two bus cycles (at a minimum) to execute an instruction (thus, 4 clocks). These four 25MHz cycles conveniently map to two register-fetch cycles (allowing me a single-ported register file), an execute cycle, and a write-back cycle. Thus, most (if not all) OP-IMM or OP-REG instructions ought to run in two bus cycles (four 25MHz cycles). 25MHz / 4 = 6.25 MIPS, so best-case, we're preserving my intended level of performance.
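As a quick sanity check on that arithmetic, here is a minimal Python sketch. The constant names and the `PHASES` list are hypothetical, chosen only for illustration; in particular, the order in which the two source registers are read is an assumption, not something fixed by the design above.

```python
# Clock and width figures taken from the text above.
CPU_CLOCK_HZ = 25_000_000   # CPU clock
BUS_CLOCK_HZ = 12_500_000   # bus clock
INSN_BITS = 32              # instruction width
BUS_BITS = 16               # bus width

# Two CPU clocks elapse per bus cycle, and a 32-bit instruction
# needs two 16-bit bus cycles to fetch.
clocks_per_bus_cycle = CPU_CLOCK_HZ // BUS_CLOCK_HZ
fetch_bus_cycles = INSN_BITS // BUS_BITS
clocks_per_insn = clocks_per_bus_cycle * fetch_bus_cycles

# One micro-sequencer step per CPU clock (read order is an assumption).
PHASES = ["read rs1", "read rs2", "execute", "write back"]
assert len(PHASES) == clocks_per_insn

# Best-case throughput for OP-IMM / OP-REG instructions.
peak_mips = CPU_CLOCK_HZ / clocks_per_insn / 1e6

if __name__ == "__main__":
    for clock, phase in enumerate(PHASES):
        print(f"clock {clock}: {phase}")
    print(f"peak: {peak_mips} MIPS")
```

Because the register file is read once per clock, the two read phases are what let a single-ported file serve both source operands.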
Loads and stores will necessarily incur a performance hit, though, since they must compete for access to the bus. Bytes and half-words incur an additional 2 cycles, words an additional 4 cycles, and double-words an additional 8 cycles.
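To put rough numbers on that penalty, the sketch below tabulates per-instruction cost. It assumes the "additional cycles" are 25MHz CPU clocks added on top of the 4-clock base cost of fetching and executing the instruction; the names `BASE_CLOCKS`, `EXTRA_CLOCKS`, `clocks_for`, and `mips` are hypothetical helpers for illustration, not part of the design.

```python
CPU_CLOCK_HZ = 25_000_000

# Base cost of any instruction: two 16-bit bus cycles = four CPU clocks.
BASE_CLOCKS = 4

# Extra clocks for the data access, by operand width (figures from the text;
# interpreting them as 25MHz clocks is an assumption).
EXTRA_CLOCKS = {
    "byte": 2,
    "half-word": 2,
    "word": 4,
    "double-word": 8,
}


def clocks_for(width=None):
    """Total CPU clocks for one instruction; width=None means no memory access."""
    extra = EXTRA_CLOCKS[width] if width is not None else 0
    return BASE_CLOCKS + extra


def mips(clocks):
    """Throughput in MIPS if every instruction took this many clocks."""
    return CPU_CLOCK_HZ / clocks / 1e6


if __name__ == "__main__":
    print(f"{'register op':12s} {clocks_for():2d} clocks")
    for width in EXTRA_CLOCKS:
        n = clocks_for(width)
        print(f"{width:12s} {n:2d} clocks, {mips(n):.3f} MIPS if sustained")
```

Under these assumptions a double-word access costs three times as many clocks as a register-to-register operation, so real-world throughput depends heavily on how load/store-heavy the code is.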
Performance of other CPU features is up in the air, but I think this covers 99% of what anyone would run on the CPU anyway.
ALSO, the new supervisor specification is out, so I think now would be a good time to update the emulator to match the new RISC-V supervisor requirements. I'll resume work on the CPU once that's done.
Discussions
Sounds good to me! Make it work first, then make it work fast. There are any number of microarchitectural improvements you might make, but best to walk before you run.
A pipeline approach is (once you figure out how to build it in the correct order) arguably the simpler architecture. There's no state-machine to worry about, and many timing issues become automatic.
The microarchitecture now proposed is actually more complex, and will require more intricate tests to be written. And, it'll be so tightly tied to the bus that widening it later (say, if I upgrade to an 8K device) will require a total redesign of the microarchitecture.
This is why the 65816 is still using an 8-bit bus.
So, after a fair bit of work, it looks like I'll have approximately 270 minterms in the "PLA", if my spreadsheet is any indication. I most likely won't implement the decode logic as an actual PLA; I don't think I have enough logic cells for that. I'll need to write equations unique to each output to keep LC costs low.
This doesn't include support for external interrupts, but I don't expect it'll take more than about 10 more minterms to support them. It will catch illegal instructions, so no "JAM" instructions exist. I also still need to add minterms for catching misaligned memory accesses, and support for CSRs (Control and Status Registers; think MSRs on Intel CPUs).