I have two updates in one here. The first deals with the relationship of my S64X7 CPU to the iCE40-HX8K and -HX4K parts. The second is a small side project that I think will help hasten the pace of development overall (regardless of the hardware platform I use).
S64X7 Progress Update
My MISC CPU is essentially done. A few tweaks still need to be applied, but it is Turing complete, and synthesis shows it's fully operational as a Forth CPU, sans interrupts, right now. I'm debating whether or not I even need interrupt support at this level, since its intended task is to emulate the RV64I instruction set.
The bad thing is, it's still disturbingly large for an iCE40 device, coming in at about 6400 LUTs (rounded) best-case after hand-tweaking. It fits in an 8K device, but only just barely. It definitely won't fit on a 4K device (it is more than 3000 LUTs too big), per Lattice's iCE40 family documentation. Some people have said that 4Ks are really rebranded 8Ks, but honestly, a semiconductor vendor doesn't bin parts like that unless there's something wrong internally on the chip. I'll trust Lattice's specifications for now.
The size of the CPU is still a concern on 8K devices, though: with it taking up 92% of the fabric's resources, there's not a whole lot of room for a Wishbone 64-bit-to-16-bit bus bridge. I don't think I can pull it off. Not with my skills, at least.
I think I'm going to give up on the use of iCE40 FPGAs for now. The Kestrel Project seems like it's just too ambitious for such a small FPGA. It would certainly work out if I had implemented a 16-bit CPU (perhaps a clone of the 65816), but 64 bits is just taxing these parts too much.
Side Project: SMG
One of the reasons why I had to start the RISC-V implementation over many times is that I'd get to a certain point and then lose intellectual control over the project. I'd reach a dead end after getting instruction fetching working and getting the execution timing right for, say, ADDI X0, X0, 0, because I'd then lose track of how everything else fits together. I'd also run into problems programming the state machines by hand in Verilog.
So, I decided to write myself a small tool which I think will help me in this endeavor. I called it SMG, for State Machine Generator. The name isn't indicative of what it actually does, but there's no question that it contributes to that task, and so it's the best name I could think of.
Not only will it help with designing whatever CPU I end up using, but it's applicable to other parts of the Kestrel Project as well, since it covers a large number of applications of FPGA-based digital logic design.
It takes input in a more tabular format, such as this 3-bit Gray counter example:
(do
  (put next bus-spec "[2:0]")
  [[on [[ctr 3'b000] ~reset] [next 3'b001] [odd 0]]
   [on [[ctr 3'b001] ~reset] [next 3'b011] [odd 1]]
   [on [[ctr 3'b011] ~reset] [next 3'b010] [odd 0]]
   [on [[ctr 3'b010] ~reset] [next 3'b110] [odd 1]]
   [on [[ctr 3'b110] ~reset] [next 3'b111] [odd 0]]
   [on [[ctr 3'b111] ~reset] [next 3'b101] [odd 1]]
   [on [[ctr 3'b101] ~reset] [next 3'b100] [odd 0]]
   [on [[ctr 3'b100] ~reset] [next 3'b000] [odd 1]]
   [on [reset] [next 3'b000]]]
)
The result is something like this:
wire R1287 = ( ctr == 3'b000 ) & ~(|reset) ;
wire R1288 = ( ctr == 3'b001 ) & ~(|reset) ;
wire R1289 = ( ctr == 3'b011 ) & ~(|reset) ;
wire R1290 = ( ctr == 3'b010 ) & ~(|reset) ;
wire R1291 = ( ctr == 3'b110 ) & ~(|reset) ;
wire R1292 = ( ctr == 3'b111 ) & ~(|reset) ;
wire R1293 = ( ctr == 3'b101 ) & ~(|reset) ;
wire R1294 = ( ctr == 3'b100 ) & ~(|reset) ;
wire R1295 = (|reset) ;
wire [2:0] out1296 = R1287 ? 3'b001 : 0 ;
wire out1297 = R1287 ? 0 : 0 ;
wire [2:0] out1298 = R1288 ? 3'b011 : 0 ;
wire out1299 = R1288 ? 1 : 0 ;
wire [2:0] out1300 = R1289 ? 3'b010 : 0 ;
wire out1301 = R1289 ? 0 : 0 ;
wire [2:0] out1302 = R1290 ? 3'b110 : 0 ;
wire out1303 = R1290 ? 1 : 0 ;
wire [2:0] out1304 = R1291 ? 3'b111 : 0 ;
wire out1305 = R1291 ? 0 : 0 ;
wire [2:0] out1306 = R1292 ? 3'b101 : 0 ;
wire out1307 = R1292 ? 1 : 0 ;
wire [2:0] out1308 = R1293 ? 3'b100 : 0 ;
wire out1309 = R1293 ? 0 : 0 ;
wire [2:0] out1310 = R1294 ? 3'b000 : 0 ;
wire out1311 = R1294 ? 1 : 0 ;
wire [2:0] out1312 = R1295 ? 3'b000 : 0 ;
assign odd = out1311|out1309|out1307|out1305|out1303|out1301|out1299|out1297;
assign next = out1312|out1310|out1308|out1306|out1304|out1302|out1300|out1298|out1296;
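To make the table-to-logic translation concrete, here is a minimal Python sketch of the same idea. It is not SMG's actual implementation or input syntax; the `generate` function, its `rows` and `widths` parameters, and the starting wire number are all assumptions chosen only so the result resembles the listing above. Each row becomes one AND term, each output of a row becomes a wire gated by that term, and all gated wires for a signal are OR-ed together.

```python
from itertools import count

def generate(rows, widths, start=1287):
    """Emit AND-OR Verilog for a list of (conditions, outputs) rows.

    rows   : list of (conds, outs); conds is a list of Verilog boolean
             expressions, outs is a list of (signal, value) pairs.
    widths : dict mapping an output signal to its bus spec, e.g. "[2:0]".
    start  : first number used when naming generated wires (hypothetical).
    """
    ids = count(start)
    lines = []
    terms = []                      # (term wire name, outs) for each row
    for conds, outs in rows:
        name = f"R{next(ids)}"
        lines.append(f"wire {name} = {' & '.join(conds)};")
        terms.append((name, outs))

    drivers = {}                    # signal -> gated wires to OR together
    for name, outs in terms:
        for signal, value in outs:
            wire = f"out{next(ids)}"
            spec = widths.get(signal, "")
            decl = f"wire {spec} {wire}" if spec else f"wire {wire}"
            lines.append(f"{decl} = {name} ? {value} : 0;")
            drivers.setdefault(signal, []).append(wire)

    for signal, wires in drivers.items():
        lines.append(f"assign {signal} = {' | '.join(wires)};")
    return "\n".join(lines)

# The 3-bit Gray counter table, rebuilt as plain Python data.
GRAY = ["000", "001", "011", "010", "110", "111", "101", "100"]
rows = [
    ([f"(ctr == 3'b{cur})", "~(|reset)"],
     [("next", f"3'b{nxt}"), ("odd", str(i % 2))])
    for i, (cur, nxt) in enumerate(zip(GRAY, GRAY[1:] + GRAY[:1]))
]
rows.append((["(|reset)"], [("next", "3'b000")]))

verilog = generate(rows, {"next": "[2:0]"})
print(verilog)
```

The spacing and the order of the OR terms differ from the real output, but the structure is the same: nine match terms, one gated wire per row output, and one `assign` per signal.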
You'll notice that I deliberately avoid using case or casez statements. These statements have two problems which impede my use of Verilog for building CPUs and other components:
First, they build their decoders using priority encoder logic. This uses more resources than I intend. A simple AND-based pattern-matcher is what I would strongly prefer, like what you'd find in a PLA circuit.
Second, support for multi-hot firings is severely limited and non-portable. Verilog really, really wants single-hot handling of cases. This, again, is not conducive to reusing already written minterms. (Clearly, the simplistic example I gave above does not illustrate this well at all.) One can see the value of this by studying how the 6502 manages to overlap instruction fetch with instruction execution, and how this interacts with its address mode handling. The CPU has only 3510 transistors, so they must be doing something right.
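The difference between the two matching styles can be modeled in a few lines of Python. This is only a behavioral sketch: the opcode patterns and the signal names (`fetch_next`, `alu_add`, `alu_sub`) are hypothetical and are not taken from the S64X7 or the 6502. `first_match` mimics the ordered, first-item-wins semantics of a casez statement, while `and_or` mimics a PLA, where every matching minterm fires and the outputs are OR-ed.

```python
def matches(pattern, value):
    """True if a bit string matches a pattern; '?' is a don't-care bit."""
    return all(p in ("?", v) for p, v in zip(pattern, value))

def first_match(rows, value):
    """Priority semantics, like casez: only the first matching row fires."""
    for pattern, signals in rows:
        if matches(pattern, value):
            return set(signals)
    return set()

def and_or(rows, value):
    """PLA semantics: every matching row fires and outputs are OR-ed."""
    fired = set()
    for pattern, signals in rows:
        if matches(pattern, value):
            fired |= set(signals)
    return fired

# Hypothetical 4-bit opcodes: the first row overlaps with the other two.
ROWS = [
    ("1???", ["fetch_next"]),   # start the next fetch during execution
    ("??01", ["alu_add"]),
    ("??10", ["alu_sub"]),
]

print(first_match(ROWS, "1001"))  # only the first matching row's signals
print(and_or(ROWS, "1001"))       # both overlapping rows contribute
```

With priority semantics, the overlapping `fetch_next` row hides `alu_add` for opcode `1001`, so the add behavior would have to be duplicated into a more specific row. With AND-OR semantics the two minterms are written once and simply fire together.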
My hope is that SMG will let me develop Verilog modules which are measurably (if not significantly) smaller.
SMG is still not quite ready for prime-time use, but it's coming along quite nicely. I plan on reworking the instruction decode logic for the S64X7 CPU as a first real-world test. If it passes the current bench tests, I'll be ecstatic and quite hopeful for future core development.