Project | Kestrel Computer Project


MISC Core Already Sports 8 Instructions
08/12/2016 at 14:49 • 1 comment
NOP, LIT8, LIT16, LIT32, SBM, SHM, SWM, and SDM are implemented.
NOP is the no-operation opcode, of course.
LITx loads an x-bit literal onto the data stack.
SxM stores a byte, half-word, word, or double-word value to memory.
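To make the semantics above concrete, here is a minimal Python sketch of those eight instructions. Nearly everything about the encoding is an assumption for illustration only: the opcode numbers, the little-endian byte order, the RISC-V-style widths (word = 32 bits, double-word = 64 bits), and the convention that SxM pops the address first and then the value are all hypothetical, not taken from the actual core. The three-register stack depth is not modelled.

```python
# Toy model of the eight instructions listed above.
# ASSUMPTIONS (hypothetical, not from the real core): opcode numbering,
# little-endian byte order, and that a store pops the address, then the value.

NOP, LIT8, LIT16, LIT32, SBM, SHM, SWM, SDM = range(8)

LIT_BYTES = {LIT8: 1, LIT16: 2, LIT32: 4}        # literal size in bytes
STORE_BYTES = {SBM: 1, SHM: 2, SWM: 4, SDM: 8}   # store size in bytes


def run(program: bytes, memory: bytearray) -> list:
    """Execute `program` against `memory`; return the final data stack."""
    stack = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == NOP:
            continue
        if op in LIT_BYTES:
            n = LIT_BYTES[op]
            # The literal follows the opcode inline in the instruction stream.
            stack.append(int.from_bytes(program[pc:pc + n], "little"))
            pc += n
        elif op in STORE_BYTES:
            n = STORE_BYTES[op]
            addr = stack.pop()
            value = stack.pop() & ((1 << (8 * n)) - 1)  # truncate to width
            memory[addr:addr + n] = value.to_bytes(n, "little")
        else:
            raise ValueError(f"unknown opcode {op:#x} at offset {pc - 1}")
    return stack


# Push the value 0x1234, push the address 4, store a half-word.
mem = bytearray(16)
final_stack = run(bytes([LIT16, 0x34, 0x12, LIT8, 4, SHM]), mem)
```

After this runs, `mem[4:6]` holds the bytes 0x34 and 0x12 and the stack is empty, under the assumptions stated above.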
In a mere five hours of hacking, I successfully got further with the MISC core than with several months of attempted hacking on the RISC-V core. Note that my relative time commitment for both projects is the same.
Once the MISC core is complete, I have one of two options:
- I can write a software RISC-V emulator on top of the MISC core (as I was threatening to do last log entry), or,
- I can implement a RISC-V instruction predecode stage, as I was threatening to do back in this log entry.
Which route I take remains unclear at the moment, but either way, we will see RISC-V compatibility. A lot of factors will enter into the decision: how easily a predecoder can be written, how much effort it takes to write a new MISC assembler, etc.
Right now, the MISC core is architecturally quite similar to the S16X4 family of processors used in the Kestrel-2: three data stack registers, no return stack. However, this will eventually change to 8-deep data and return stacks, with their corresponding stack permutation instructions, thus making it a true Forth-capable CPU. I'll implement those features only when I have a demonstrated need, though. Right now, my focus is making it even minimally Turing complete.
Polaris CPU is Dead. Long Live MISC.
08/12/2016 at 02:11 • 1 comment

After several attempts at writing my own RISC-V CPU implementation in Verilog, I'm officially giving up. I've utterly had it.
Instead of showing up to the RISC-V workshop completely empty-handed, though, I've decided instead to implement a 64-bit MISC-architecture (e.g., Forth-language) CPU; this is something I've succeeded in building in the past (albeit only 16 bits wide). On top of this core will sit a 64-bit RISC-V emulator.
The performance hit will be absolutely immense. I estimate, instead of 6 MIPS, I'll get at best 1 MIPS. The computer will, in all likelihood, be unusable for practical hacking purposes at least until I can implement a proper standards-compliant, 64-bit RISC-V core for it.
To have to admit defeat like this makes me feel physically ill. I know that, only 10 years ago, I probably would have/could have succeeded. Today, it seems my mental faculties just cannot handle the complexity behind even what should be a simple, trivial even, RISC CPU.
Kestrel Project Continues as Planned.
07/26/2016 at 14:14 • 0 comments
After having a bit of an existential crisis following the announcement of SiFive's own offerings of computers, several influential followers of my work on other social media outlets reminded me of several important facts:
- There's still no 64-bit home-brew computer in the $100-$200 price range. As mentioned in my previous article, you'll need their $3500 dev board to play with their 64-bit offering.
- There's still no readily-available source code for a 64-bit RISC-V processor. You have Rocket and BOOM offerings, but these are written using Chisel, and therefore depend on extremely fickle tooling and are written in an unfamiliar language. (To be fair, I've witnessed Chisel in action and it's pretty nice; but, point taken.) Further, they're ASIC-optimized, and not designed for even reasonably-sized FPGAs.
- There's still no source code to the integrated computer design as a whole, at least none which my followers have found.
- SiFive's platform guides are glossy, but don't say much. Kestrel-3's User Guide is incomplete, but word for word is comparatively more informative.
- SiFive's financial viability remains unclear, as their business model similarly remains unproven. In fact, they subject SiFive to the same level of concern that RISC-V uses to justify the existence of the RISC-V instruction set to begin with.
- My latest efforts are aiming to be small enough and simple enough to fit on the iCE40HX-series of FPGAs, which are the hottest FPGAs in the open source community today thanks to Project Icestorm. SiFive's offerings almost certainly won't fit on anything smaller than a Xilinx Artix.
- All of my work standardizes on the Wishbone Bus, established by a disinterested third party that, strictly speaking, doesn't exist anymore. This is a standard which is now maintained by the public for the public's benefit, at no cost to anyone. SiFive's work standardizes on interconnects set in stone by a deeply interested competitive party which is known throughout the industry for their proclivities towards legal enforcement.
It is with this recognition and support that I've decided to continue with the Kestrel Computer Project as originally outlined on my website, including development of hardware, software, and documentation. My work still contributes to the community as a whole, and fills a niche nobody else does.
SiFive's Computers
07/26/2016 at 00:45 • 0 comments
I'm going to be blunt. SiFive released a bunch of computer designs around their U500 and E300 RISC-V cores. Should I even bother continuing with the Kestrel project? Does anyone even care?
So, SiFive, Inc. released a handful of development platforms, both of which have FPGA bitstreams available which turn them into full-blown computers. It is said that the platform specifications for these are available to the public under open license terms (presumably also BSD licensed, as the RISC-V ISA itself is). The Freedom platform can be had for a paltry $3500 (after rounding up to two significant digits); however, the lower-level Everywhere platform can be had for $130-ish. Before taxes and shipping, of course.
On the one hand, I'm quite happy that these alternatives exist. RISC-V needed a standard development platform for some time now. However, I also feel kind of betrayed a little bit. I mean, I've been working on Kestrel for years now. My project was not a secret; and, nobody thought to contribute.
Nobody.
Not one person from anywhere else in the RISC-V community.
Now that two options exist for RISC-V computing which out-class the Kestrel in every benchmark you can think of, I just want to gauge who is really interested in the Kestrel now.
And, I'm not talking about the kind of interest that involves lurkers who watch over my shoulder just to monitor progress. Not that there's anything wrong with such folks; it's how some people learn, and let's face it, we've all been there at one point in our lives. I've answered many questions from such folks, and I enjoy the technical discussions that this brings up.
I'm talking about folks who are interested in the Kestrel for its own sake: those who are interested enough to want to own one for themselves, who motivated me to bring my project to this very site even. I'm talking about the folks who expressed to me directly that they wanted to purchase a supported FPGA dev board and program it with a bitstream from me, or who wanted a kit made available on Tindie or something. Remember what you would get though:
- Initial CPU performance levels at 6.25 MIPS.
- Fairly small amount of RAM; between 8MB and 16MB most likely.
- Bespoke and hackable video, GPIO, and PS/2 keyboard/mouse interfaces. Which, if I do say so myself, feel substantially more convenient to program and use than anything "COTS" I've used from the PC world (which SiFive's computers almost certainly will use), since I integrate my peripherals with Kestrel coding in mind.
At no time did I give any illusions about what the Kestrel-3 platform would entail coming out of the gate. I clearly described its CPU as a kind of 64-bit 6502-like CPU, while I described the computer as a whole as roughly matching an Amiga 500 or Atari ST 520 in execution performance. Future revisions of the platform were to bring enhanced performance or capability; e.g., faster CPU to deliver closer to 80 MIPS performance until dedicated silicon came out, RapidIO-inspired (if not -compatible) interconnects, etc. The goal was to incrementally evolve the design to something that would be truly useable on the desktop.
If nobody has any interest in it, then I will just cancel the Backbone development project and not bother putting up a Kestrel kit on Tindie or wherever; I'll just stick with my Digilent Nexys 2, maybe acquire another FPGA board in the future and make it run there as well (since the Nexys 2 isn't sold anymore). I'll just focus on developing the Kestrel for myself by myself (which is how the project started to begin with, and continues to operate), which implies my documentation efforts (the Kestrel-3 User's Guide, for instance) will basically come to a stand-still. I'll forego any ambition to ever run BSD or Linux on it, I'll just keep it running Forth, Oberon (eventually), and/or perhaps a port of Shen OS (where OS == Open Source, in this case. Shen is not an operating system). The CPU will likely remain at 6 MIPS performance until I grow tired of it, but I seriously doubt it'll exceed 20 MIPS long term, as it will probably remain a soft-core CPU for a long, long time. The Kestrel-3 platform is definitely incompatible with the Sifive platform specification as both stand, so don't expect binary interop with that computer. We're talking about computers that are as different from one another as an S-100 bus based computer is from a TRS-80 model I.
If people do express an interest, though, things will change slightly. First and foremost, I'll design the video and other cores with upward mobility features; by which, I mean, future support for eventual design goals like the use of SDRAM instead of asynchronous RAM. (You'll be amazed at how much this changes the ideal register set.) I will research the use of RapidIO or comparable interconnect technologies when the time is right. After the initial CPU is released, I have plans for adding user- and supervisor-modes to it, so that memory protection features can be incorporated into the design. This would allow it to run Linux, if someone (not me) decided to port it. Plan 9 from Bell Labs was on the table as of last year, for instance.
So, let me know what you think I should do. This may well be the last time I ask this question.
Nexys-2 Back in Business!
07/16/2016 at 14:12 • 0 comments
I realized not long ago that the next RISC-V workshop is in November, which by my estimates, leaves me only three months to get a working prototype ready and running. That'd then give me maybe a month or so to get *software* running, or at least partially running enough to show development progress.
With my recent spate of issues, I lamented this was not possible to accomplish with my current capacity. This, of course, assumed I couldn't get WebPack ISE working so as to just use my Nexys-2 in the interim.
Well, late last night, while searching for something else on the Internet (as it happens), I found a site which described the very problem I was having. While his work-around ultimately differed from mine, we had one thing in common: a missing libQt_Network.so file. So after restoring this file and playing around with the command-line tools for a bit, I finally managed to get ISE's license management working, and now it looks like I'm back in business with the Nexys-2!!
Maybe there's still hope to see a Kestrel-3 live in the flesh yet. Here's hoping.
Steps I had to take:
1. Grab a copy of libQt_Network.zip from Xilinx's knowledge base and install the appropriate libraries contained therein in the appropriate places in /opt/Xilinx.
2. ISE at least now prints errors when trying to launch Firefox for license management. It still could not launch Firefox on its own because it was not able to find libstdc++ 3.4.9 and 3.4.20 (why it would need both is beyond me). So, I manually created and downloaded a license.
3. I installed the license in $HOME/.Xilinx/Xilinx.lic . At this point, most tools worked. Well, almost. One last step I needed to do.
4. I found out that I also needed to source /opt/xilinx/14.7/ISE_DS/settings64.sh before everything started working together seamlessly.
Still haven't tested downloading a bitstream to the Nexys-2 just yet. I hope to be able to do that sometime soon.
E and the eForth images updated.
07/13/2016 at 20:48 • 0 comments

After completing my morning work-related tasks, I decided to grab some lunch and hack on the RISC-V emulator and the software that runs on top of it to conform with the Privileged ISA Specifications, V1.9. I'm happy to report that the new code is up and merged into master.
This patch also updates the assembler to support the new opcode mappings.
The (still woefully incomplete) Oberon 2016 port that I was working on in another repository has not yet been updated to conform, so it's highly probable that the binaries generated through that compiler will fail to run on the latest emulator. (Sorry; lunch time is over.)
Time flies when you accomplish nothing of value.
07/13/2016 at 16:00 • 0 comments
Well, it's July, and the 4th RISC-V workshop is well underway. I could not attend as it conflicted with my planned vacation, but it is a constant reminder that I really, really wanted to have working hardware by at least November. I can honestly say, that's not going to happen. Not even close.
Thanks to a total disaster of a chain of unfortunate events relating to bringing up the Kestrel-3 in hardware, I'm basically left in a position where I was when I first started hacking the Kestrel-2 years ago: zilch. I'm literally starting the project, essentially, over from scratch. In the process, I'm building everything up from scratch.
Here's a recap of my misfortune:
1. Xilinx Webpack ISE refuses to license on my platform. Thus, I cannot use my fully functional Digilent Nexys-2 board as a development platform. Ironic, really; the last thing I flashed it with was the bitstream for the Kestrel-2, so in essence, it's now stuck as a Kestrel-2. (And, no, I cannot run Vivado; this product does not support the XC3S1000E).
2. I couldn't even get to the point where I could download Linux-compatible Altera IDE software from their website, so that rules out using Altera-based products. Plus, depending on the website you visit, the programmers for these devices cost at least as much as the dev boards themselves. No thanks.
3. So that leaves only iCE40 devices and the Yosys toolchain (which works spectacularly, I might add). No devices using this FPGA even compare, flatly, with the most minimal FPGA-based-computer dev boards on the market (even the Oberon Station!), so I was left trying to build my own. Inspired by the RC2014 computer, I started designing a custom backplane and set of CPU and I/O boards that meets my needs. I placed an order for the backplane. When I got it, that order was severely botched, and I need to basically redesign the backside of the board, due to Gerber-rendering bugs on the part of their fabricator. Still a bit miffed that *I* have to compensate for this, but whatever. I've been meaning to do this for a week or two now, but goddammit, life just won't let me! (Sorry, OSH! It's coming, I promise!)
4. To serve as a stop-gap interim solution, I've placed an order for an icoBoard 1.x with 1MB SRAM installed, but I haven't heard back from my contact there yet. Even if this board arrives, however, it will still lack all the components I need to make a functional Kestrel-3 computer. A single iCE40HX-8K will not fit everything I need to make a Kestrel-3 -- that much is certain. It also won't have any flash ROM to run eForth out of, and the RAM is 1/16th that in the emulator, so I'll need to create custom images just to run on this restricted board. At an estimated cost of 120 EUR, it's quite expensive for me. Recall, I wanted the total cost of Backbone, plus three FPGA plug-in cards for it, to not exceed US$200. So, yes, getting this board is an act of desperation for me.
5. The CPU microarchitecture is, I'm confident, now sized appropriately for the iCE40HX-4K FPGAs, although at the expense of it never being portable to wider bus widths. However, I've been pretty confident about things before, so we'll see where this new, hard-wired microsequencer takes us. With the register file taken out of logic cells, this leaves the ALU and supporting circuitry to contend with. I predict I'll need to dedicate one entire FPGA just to the CPU. This is fine; it's why I created the Backbone project, to serve as a backplane for connecting several different FPGA circuits together which cooperate and comprise the total Kestrel-3 computer.
The CGIA is currently on-hold, as I work through the CPU. I thought at one point that the CGIA was the highest risk device in the system; only when I discovered how badly I botched the CPU sizing estimate did I realize how wrong I was. Yes, the CGIA will also be a big circuit to fit into an FPGA; I'll probably need to move the line buffers into block RAM as well. However, looking at MGIA components leads me to believe that CGIA won't be nearly as much of a monster as the CPU, so I'm confident working on that after I get some more CPU work invested. I want evidence that the CPU design is realizable and practical before going back to the CGIA.
If worst comes to worst, I can just synthesize an MGIA in the meantime. The emulator already provides MGIA support. This is perhaps the best decision I've made so far.
The KIA and GPIA will be used as-is. Since the Kestrel-3 will run on a 16-bit wide data path, these do not need modification. I do need to make changes to the emulator and system software though: I got register address offsets wrong. E.g., the KIA addresses appear at $0100000000000000 and $0100000000000001, when the true addresses are $0100000000000000 and $0100000000000002. The e emulator provides a 64-bit GPIA variant, while the real computer will continue to use the 16-bit variant. Whether or not I can play address decoding tricks with the GPIA to make it "look" like a 64-bit equivalent part remains to be seen. I suspect it's possible (unlike with the KIA).
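The address fix described above can be sketched in a few lines of Python. This assumes (it is not stated outright in the log) that the corrected offsets follow from spacing consecutive registers one 16-bit bus word apart; the function name and the `bus_bytes` parameter are made up for illustration.

```python
KIA_BASE = 0x0100000000000000  # KIA base address quoted above


def register_address(base: int, index: int, bus_bytes: int = 2) -> int:
    """Address of register `index`, with registers one bus word apart.

    ASSUMPTION: a 16-bit (2-byte) data path puts consecutive registers
    at a stride of 2 bytes rather than 1.
    """
    return base + index * bus_bytes


for i in range(2):
    print(f"KIA register {i}: ${register_address(KIA_BASE, i):016X}")
```

With the default stride of 2 this yields the two "true" addresses given above; passing `bus_bytes=1` reproduces the incorrect layout the emulator currently uses.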
So, yes, I'm in a bit of panic right now, as it looks like I'm going to blow my deadline like an overfilled balloon. :/
CPU PLA terms almost done.
07/12/2016 at 17:17 • 0 comments

Over the weekend, I spent time in LibreCalc to create a set of instruction decoder "PLA" minterms and the desired output enables they imply. I'm only missing CSRR* instructions at this point, but hopefully I can get those implemented soon. These are still in "human" form; I still need to spend the time translating them into equivalent Verilog code.
I haven't quite figured out how to get CSRs to work in the new microarchitecture yet. I think I can make all CSR instructions run in 4 cycles, same as any other ALU-based instruction. However, CSR instructions are more "complicated"; not nearly as RISCy as I'd like them to be. Every one involves either an exchange or some bit-level mutilation of various fields. For example, CSRRC is equivalent to DEST = CSR[n]; CSR[n] = CSR[n] AND NOT SRC. It's rather annoying, frankly, and it will involve creating a whole new set of buses and consuming resources. Not only that, but unrecognized CSRs need to cause illegal instruction traps, which means I have no choice but to recognize this condition in the first cycle. So, CSR decoding must be fast, and that means it will not generally be optimized for small size.
On the emulator front, I've started to update the "e" Kestrel emulator to use the new Privileged ISA 1.9 specification. I'm rather surprised that eForth still boots without any changes so far. I'm sure I'll need to fix something once everything is settled in though, but as of right now, the new machine-mode semantics seem to be behaving exactly as one would expect.
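For readers unfamiliar with the CSR instructions, the read-then-modify behaviour described above can be modelled roughly as follows. This is a simplified Python sketch, not the emulator's or the CPU's actual logic: the function name, the dictionary used as a CSR file, and the exception class are illustrative, and the special case where the source register is x0 (which suppresses the write for CSRRS/CSRRC) is left out.

```python
MASK64 = (1 << 64) - 1


class IllegalInstruction(Exception):
    """Stands in for the illegal-instruction trap."""


def csr_op(csrs: dict, op: str, n: int, src: int) -> int:
    """Apply one CSR instruction to CSR number `n`; return the DEST value."""
    if n not in csrs:
        # Unrecognized CSRs must trap rather than silently read as zero.
        raise IllegalInstruction(f"unrecognized CSR {n:#x}")
    old = csrs[n]
    if op == "CSRRW":      # exchange: DEST = CSR[n]; CSR[n] = SRC
        csrs[n] = src & MASK64
    elif op == "CSRRS":    # set bits: CSR[n] = CSR[n] OR SRC
        csrs[n] = old | (src & MASK64)
    elif op == "CSRRC":    # clear bits: CSR[n] = CSR[n] AND NOT SRC
        csrs[n] = old & ~src & MASK64
    else:
        raise ValueError(f"not a CSR instruction: {op}")
    return old


csr_file = {0x340: 0b1111}                      # 0x340 is just an example number
dest = csr_op(csr_file, "CSRRC", 0x340, 0b0101)
```

Here `dest` receives the old value (0b1111) and the CSR is left holding 0b1010, matching the DEST/CSR[n] description above.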
One change which I know will cause breakage is the old ERET instruction. The bit pattern for ERET matches the new SRET instruction (return from supervisor mode, not the V1.7 "system return" instruction). Since I will not be implementing S-mode, SRET will have to cause an illegal instruction trap (since SPP, SPIE, and SIE bits in MSTATUS are not implemented). Therefore, any software which uses SRET will need to be recompiled/reassembled to use MRET instead. This is a relatively simple matter: a global search and replace ought to cover it. I just need to remember this detail.
Which means I need to update my assembler to include the new mnemonics too. Bleh. Forgot about that. Good thing for Github issues.
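The global search and replace mentioned above might look something like this in Python. It is only a sketch: it assumes the old mnemonic appears in the assembly sources as a standalone word, and it will also rewrite occurrences inside comments. The function name is made up for illustration.

```python
import re

# Match ERET only as a whole word, in any letter case.
_ERET = re.compile(r"\beret\b", re.IGNORECASE)


def eret_to_mret(source: str) -> str:
    """Replace whole-word ERET mnemonics with MRET.

    All-upper-case matches become "MRET"; anything else becomes "mret".
    """
    def swap(match):
        return "MRET" if match.group(0).isupper() else "mret"

    return _ERET.sub(swap, source)


sample = "  eret\nERET ; return from trap\nferet\n"
converted = eret_to_mret(sample)
```

The word-boundary anchors keep identifiers that merely contain the letters (such as `feret` in the sample) untouched.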
Meet the New Boss: Same as the Old Boss
07/08/2016 at 18:46 • 3 comments

I've decided to switch to a hardware sequenced microarchitecture for the Kestrel-3 CPU. It will be built just like the 6502 was: it will have a two-stage pipeline (fetch and execute), both operated under a single micro-sequencer, and the microarchitecture will be very tightly coupled to the bus interface.
The CPU's clock will be 25MHz. The bus clock will be 12.5MHz. Since each instruction is 32-bits wide, and the bus is only 16-bits wide, it takes two bus cycles (at a minimum) to execute an instruction (thus, 4 clocks). These four 25MHz cycles conveniently map to two register-fetch cycles (allowing me a single-ported register file), an execute cycle, and a write-back cycle. Thus, most (if not all) OP-IMM or OP-REG instructions ought to run in two bus cycles (four 25MHz cycles). 25MHz / 4 = 6.25 MIPS, so best-case, we're preserving my intended level of performance.
Loads and stores will necessarily incur a performance hit, though, since they must compete for access to the bus. Bytes and half-words incur an additional 2 cycles, words an additional 4 cycles, and double-words an additional 8 cycles.
Performance of other CPU features is up in the air, but I think this covers 99% of what anyone would run on the CPU anyway.
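The arithmetic above is easy to play with in a few lines of Python. This sketch assumes the "additional cycles" quoted for loads and stores are 25MHz CPU cycles (the text does not say whether they are CPU or bus cycles); the names are illustrative.

```python
CPU_HZ = 25_000_000
BASE_CYCLES = 4  # two 16-bit bus cycles = four 25MHz cycles per instruction

# Extra cycles for loads and stores, by access width.
# ASSUMPTION: these are counted in 25MHz CPU cycles.
EXTRA_CYCLES = {"byte": 2, "half-word": 2, "word": 4, "double-word": 8}


def mips(cycles_per_instruction: int, hz: int = CPU_HZ) -> float:
    """Millions of instructions per second at a fixed cycle count."""
    return hz / cycles_per_instruction / 1e6


print("ALU op:", mips(BASE_CYCLES))  # 6.25
for width, extra in EXTRA_CYCLES.items():
    print(f"{width} load/store:", round(mips(BASE_CYCLES + extra), 2))
```

Under that assumption, a stream made up entirely of word-sized loads or stores would run at half the best-case rate, since each takes 8 cycles instead of 4.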
ALSO, the new supervisor specification is out, so I think now would be a good time to update the emulator to match the new RISC-V supervisor requirements. I'll resume work on the CPU once that's done.
Polaris CPU Needs Serious Redesign
07/08/2016 at 06:49 • 4 comments

Tonight, I completed the memory stage of the 5-stage pipeline for the Redtail microarchitecture used by Polaris CPU. I then decided, let me run it through Yosys to get a ballpark idea of how many logic cells this thing will use. Pleasantly, I found it needed only 165. Not bad!
Then I figured, "Let's look at the register file."
What I found was that the 32-deep, 64-bit wide, 2-read-1-write register file that you would typically find in just about any RISC consumes 5666 logic cells in an iCE40-class device. This immediately rules out the 4K FPGAs that I was planning on using for the Backbone implementation of the Kestrel-3. By my math, I was banking on it needing around 2500 cells. 5666 is just too much. I'm guessing it needs this many logic cells because the chip doesn't have sufficient routing resources, and Yosys had to resort to using LUTs as wires. Not only that, but for a chip capable of handling 500MHz clocks off its PLLs, the top speed that the register file could work at is only 42MHz (according to the estimate provided by the icetime command). This, alone, means that the Backbone bus clock will need to drop to 25MHz.
Ouch!!

I next figured, let's look at the ALU. This required a gob-smacking 1726 logic cells. Again, I didn't expect it to be ultra-tiny, but this is just too big. That's more than half of the logic cells found in an iCE40HX-4K. And this is just the ALU; this doesn't even consider the magnitude comparators needed in the execute stage for conditional branching support.
Thankfully, instruction fetch takes only 65 logic cells, and decode needs only 212. At least I can be proud of something.
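Tallying the Yosys figures quoted so far makes the problem plain. The short Python sketch below only adds up the numbers from this entry and compares the register file against the roughly 2500 cells I had budgeted for it; the variable names are made up, and the execute-stage comparators are not counted because I have no figure for them yet.

```python
# Logic-cell counts reported above, per block.
CELLS = {
    "memory stage": 165,
    "register file": 5666,
    "ALU": 1726,
    "instruction fetch": 65,
    "decode": 212,
}

REGFILE_BUDGET = 2500  # what the register file was expected to need


def total_cells(cells: dict) -> int:
    """Sum the logic cells of every block measured so far."""
    return sum(cells.values())


total = total_cells(CELLS)
regfile_overrun = CELLS["register file"] - REGFILE_BUDGET

print("total so far:", total)                       # 7834
print("register file over budget by:", regfile_overrun)  # 3166
```

The register file alone accounts for well over two thirds of the 7834 cells measured so far.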
So, the writing is on the wall; if I want a CPU that fits in a 4K device, and I do, then my previous estimates about its performance are right out the window. Gone. Kaput.
I simply must architect it like the 6502, using hardwired state-machines or vertical microcode in place of a 5-stage pipeline. The register file must be single-ported to eliminate as many wires as I can (at the expense of two extra cycles per instruction: one to fetch an operand, and one to write the result back). The ALU and magnitude comparators have to be written at the gate and/or LUT4 level; I cannot trust the Verilog compiler or Yosys to produce the most efficient logic possible. Finally, the entire microarchitecture has to be tailored just for the 16-bit Backbone bus it'll connect to. This may mean I have to shrink the ALU to just 16 bits. Yuck.
I was really, really, really hoping I didn't have to go this route. With a CPU that delivers around 1.5 to 3 MIPS performance (thanks to the lack of a pipeline), the computer will feel about one quarter to one half as fast as a Commodore 64, which I think is unacceptable for a general purpose hacking computer. But, I don't know what else I can do. :( :( :( :( :(
