-
Multi-Purpose Experimental Serial Transmitter
03/27/2017 at 05:50 • 0 comments

Taking a break from job-hunting and my resume editor project, I wondered if I could do better than Spacewire/IEEE-1355 when making a serial transmitter. To find out, I created the Experimental Serial Transmitter repository.
This code is not production-grade. It's pretty amateurish, actually. It's probably buggy in certain edge-cases as well.
This transmitter should support transfers of between 1 and 64 bits. I know it works from 1 to 63 bits; 64 bits is as yet unproven and probably buggy. But that's OK for now; this is just a prototype. Think "hack-day" project.
To use it as an EIA-232/422/423/485 transmitter (which shifts data LSB first), you load the TXREG register with a bit pattern like the following:
63 : 10 | 9 | 8 : 1 | 0
11111....1111 | 1 | Data | 0

Bit 0 is the start bit, and must be 0 (since TXD idles high). Bits 8:1 comprise the 8-bit word you wish to send. Finally, bit 9 is the stop bit, and must be set to 1. Bits 63:10 don't need to be set to anything per se, but it's good practice to set them to 1 just in case.
If you want to add parity, then you'll just stuff the parity bit in bit position 9, and the stop bit in bit 10. Simple.
The BITS parameter tells the engine how many bits to shift out (for 8N1 transmissions, you'll set this to 10; for 8E1 or 8O1, 11; add one more for each additional stop bit).
TXBAUD tells it how fast (how many system clocks per bit cell). The TXC output is automatically generated, and the circuit tries hard to maintain a 50% duty cycle (regrettably, it cannot do this when TXBAUD is odd, but it comes as close as it can).
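To make the register layout concrete, here is a minimal Verilog sketch of composing an 8N1 frame and the matching BITS and TXBAUD values, following the description above. The 25MHz system clock and 115200bps rate are purely illustrative assumptions; substitute whatever your system actually uses.

```verilog
// Minimal sketch only: composing an 8N1 frame for TXREG.
// Assumes the bit layout described above; clock/baud figures are examples.
module xst_frame_example;
    localparam [7:0]  PAYLOAD = 8'h41;      // byte to transmit ('A')
    localparam [63:0] FRAME_8N1 =
        { {54{1'b1}},   // bits 63:10 -- padded with 1s (TXD idle level)
          1'b1,         // bit 9      -- stop bit
          PAYLOAD,      // bits 8:1   -- data byte, LSB nearest the start bit
          1'b0 };       // bit 0      -- start bit, shifted out first

    localparam integer BITS_8N1 = 10;                   // start + 8 data + stop
    localparam integer TXBAUD   = 25_000_000 / 115_200; // ~217 clocks per bit cell

    initial $display("TXREG=%h BITS=%0d TXBAUD=%0d", FRAME_8N1, BITS_8N1, TXBAUD);
endmodule
```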
As data is shifted out, the value of the RXD input is shifted in at bit 63. For EIA-232 uses, this is almost certainly not useful. It's best to treat this as garbage. However, if you loop TXD back to RXD, you could perhaps use this circuit as a crude 1-bit DAC as well.
To use this circuit as an SPI controller (which typically shifts data MSB first), you use the TXREGR register instead. This register is exactly like TXREG, except the bits are reversed:
0 : 7 | 8 : 63
Data | 00000.....000000

Note that the data you want to send now occupies the highest bits of the register, rather than the lowest. Be sure BITS is set to 8, or whichever is appropriate for the slave device. Note that you'll need a general purpose output to serve as slave-select. XST does not provide this signal on its own.
XST only supports SPI CPHA=1, CPOL=0 (mode 1). I'll play around with the circuit to see if I can also support the other three modes. CPHA=1/CPOL=1 (mode 3) should be trivially easy to support. CPHA=0, however, will require a bit of thought. The Verilog implementation, I think, is a bit too simplistic to support it without larger adjustments to the code.
Tip: If you want to cheaply bit-swap a word, write the value into TXREG, and read back via TXREGR (or vice versa). You'll need a way to disable the transmitter shift register engine, though.
Since RXD (which doubles as MISO) is always shifted into the register at bit 63, after an SPI word is sent, the received data will appear in the lower bits of the TXREGR register.
Credit where it's due: the primary inspiration is the Commodore-Amiga's PAULA chip's UART design.
-
Mothballing Kestrel Computer Project.
03/07/2017 at 21:13 • 4 comments

Abstract
I’ve been unemployed since November 2016, and Kestrel-3 progress has slowed to a crawl despite all my efforts being devoted exclusively towards it. Without small wins, I lose hope, and that manifests when I attempt to look for a job. Mothballing this project in favor of other projects is the only way forward. I’ll be resurrecting my old attempt at self-employment, RezuRezu, in the hopes that it either helps me land another job soon-ish, or I actually succeed in running my own company.
http://kestrelcomputer.github.io/kestrel/2017/03/07/kestrel-winter
-
Kestrel-1/3?
03/05/2017 at 06:12 • 0 comments

Before I talk about what I'm doing now, let me talk about what I've done since my last update.
The Remex RX pipeline hasn't changed; it still receives characters and places them into a queue. I still have not designed a Wishbone interface for this queue. It's coming, though.
The TX pipeline remains incomplete. I have the transmitting PHY/serializer, I have parity tracking, and I have the ability to transmit any N-Char or L-Char provided something spoon-feeds it. But, at the moment, I do not yet have a "what do I transmit next?" circuit that functions autonomously. It's designed, and I've written some Verilog for it, but it remains untested. I'm blocking on this, in part, because I'm not sure this is the direction I want to go. There's something nibbling at my gut that says the circuit I've designed is too complex and can be greatly simplified somehow. So, I'm meditating on it before I proceed further. If worst comes to worst, I can hook what I have up to a Wishbone interface, and let the CPU decide what to transmit and when. This will completely break compatibility with Spacewire and IEEE-1355, basically turning the interface into an RS-232 interface with data-strobe signalling. Not what I'd like to do if I can avoid it.
Per my previous post, since the RX and TX pipelines are cross-coupled with each other, and interactions exist both locally and remotely, you can imagine that testing this arrangement is on the more difficult side. Part of me is thinking that this is why IEEE-1355 interfaces have failed in commercial industry. EIA-232, T1, E1, SONET, and several ATM-based interfaces are based on a strictly unidirectional, synchronous or plesiochronous relationship between the bits sent by a transmitter and the bits received by another receiver. No feedback loops exist (at least at the physical and data-link layers), and these interfaces are therefore much simpler to test and to build predictable hardware for. Because they were designed for time-division multiplexing, frame rates are (more or less) isochronous, and so buffer management is (ostensibly) simpler, since the need for deep buffers doesn't exist as long as you can service the bit-stream fast enough. This is now more appealing to me; however, the only thing stopping me from dropping IEEE-1355 and going back to telco-style, TDM-based protocols is, frankly, not knowing how to solve the auto-baud problem. So, IEEE-1355 it is for now.
So, what am I up to now? Honestly, trying for a small victory. My goal is, in essence, to reproduce the Kestrel-1 on the icoBoard that I've received. My plan is to embed a KCP53000 CPU and all the necessary bridging to a 16-bit Wishbone bus, couple it to the on-board SRAM chip, a 256-byte bootstrap ROM, and one GPIA core to provide general-purpose I/O. The goal is to blink some LEDs under CPU control. That's it.
Unfortunately, I have no idea what the CPU core's timing is like, since the icetime utility reports a timing loop somewhere. Since this isn't necessarily a problem in practice, I'm planning on starting the CPU off at 1MHz, and ramping the clock up from there using a binary search to quickly determine, empirically, its maximum clock speed. I figure, at 1MHz, it will run at around 100,000 instructions per second, and should be plenty slow enough for the core to boot up. I doubt I'll be able to get the core running at 25MHz like on the Xilinx FPGA, but we'll see how well it fares. If it fares at all.
I'm hoping this works, for if I can't get something this simple working in a reasonable amount of time with a reasonable amount of effort, I see no further reason to continue to work on this project.
-
On IEEE-1355 vs UARTs
03/01/2017 at 18:13 • 0 comments

I think I know why IEEE-1355 didn't take off. While cores for this interconnect are quite small (truly, about on par with EIA-232 UARTs with similarly sized FIFOs), they're not necessarily as easy to test as EIA-232. EIA-232 links are just about as simple as SPI, when push comes to shove: you have a dumb transmitter that isochronously sends out bits. It doesn't care what those bits are. You have a dumb receiver that plesiochronously attempts to sample bits. As long as the transmitting and receiving clocks are relatively synchronous with each other (the error is small enough), everything works and you get a reliable serial communications stream. The receiver's higher layers ultimately are responsible for packet framing. These two components, the receiver and transmitter, are otherwise 100% isolated from each other. That makes them easier to both validate and verify.
IEEE-1355 has separate TX and RX pipelines just like EIA-232, but they're cross-coupled, and that means they interact. A feedback loop implicitly exists, which makes validation and verification a much more complicated affair. Transmitter A has a credit counter which is replenished by receiver B, while transmitter B's credit counter is replenished by receiver A. Each side does this through (preferably) hardware-scheduled transmission of FCT tokens.
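For illustration, here is what the credit bookkeeping looks like in isolation. This is my own sketch, not code from the Remex repository: in SpaceWire, each FCT a transmitter receives grants it credit for 8 more N-Chars, each N-Char actually sent consumes one credit, and outstanding credit never exceeds 56 on a conforming link.

```verilog
// Sketch only: transmit-side credit counter for an FCT-based link.
module tx_credit_counter (
    input  wire       clk,
    input  wire       reset,
    input  wire       fct_received,   // pulses when the far-end receiver sends an FCT
    input  wire       nchar_sent,     // pulses when we transmit an N-Char
    output reg  [6:0] credits         // 0..56 on a conforming link
);
    always @(posedge clk) begin
        if (reset)
            credits <= 7'd0;
        else
            case ({fct_received, nchar_sent})
                2'b10:   credits <= credits + 7'd8;   // FCT grants 8 N-Chars
                2'b01:   credits <= credits - 7'd1;   // one N-Char consumed
                2'b11:   credits <= credits + 7'd7;   // both in the same cycle
                default: credits <= credits;
            endcase
    end
endmodule
```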
Part of me wonders if I should have just stuck with an E1, ATM, or SONET-inspired frame structure, relying on scrambling to help ensure synchronization between TX and RX components. It sure seems like it would produce simpler hardware, be easier to test, and be easier to document as well. The problem remains of how to maintain synchrony between the transmitter and receiver after negotiating a higher throughput. Even at relatively modest speeds, the FT-232 chip on my Arduino Uno loses framing with my (then) host PC's serial port, apparently due to differing baud rate base frequencies.
-
Remex RX Pipeline Update
02/16/2017 at 16:42 • 0 comments

RX Pipeline
I managed to implement a Remex receive pipeline which I'm happy with. It can safely support arbitrary bit rates up to RxClk(Hz)/4 bits per second, although you can probably push it to RxClk/3 bits per second. It deposits all data characters (all N-Chars, plus the EOP and EEP characters) into an 8-deep, 9-bit FIFO. The FIFO has a (very!) degenerate Wishbone B4 interface on it, so it should be quite easy to couple to a Wishbone B3 or B4 interface later on.
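A 9-bit entry is presumably eight data bits plus a flag distinguishing ordinary data from the EOP/EEP markers. For illustration, a generic 8-deep, 9-bit synchronous FIFO of that shape might look like the sketch below; this is not the actual Remex FIFO or its Wishbone hookup, just the general idea.

```verilog
// Sketch only: an 8-deep, 9-bit synchronous FIFO.
module fifo_8x9 (
    input  wire       clk,
    input  wire       reset,
    input  wire       wr_en,
    input  wire [8:0] wr_data,   // 8 data bits + 1 EOP/EEP flag (assumed encoding)
    input  wire       rd_en,
    output wire [8:0] rd_data,
    output wire       empty,
    output wire       full
);
    reg [8:0] mem [0:7];
    reg [3:0] wr_ptr, rd_ptr;    // extra MSB distinguishes full from empty

    assign empty   = (wr_ptr == rd_ptr);
    assign full    = (wr_ptr[2:0] == rd_ptr[2:0]) && (wr_ptr[3] != rd_ptr[3]);
    assign rd_data = mem[rd_ptr[2:0]];

    always @(posedge clk) begin
        if (reset) begin
            wr_ptr <= 4'd0;
            rd_ptr <= 4'd0;
        end else begin
            if (wr_en && !full) begin
                mem[wr_ptr[2:0]] <= wr_data;
                wr_ptr <= wr_ptr + 4'd1;
            end
            if (rd_en && !empty)
                rd_ptr <= rd_ptr + 4'd1;
        end
    end
endmodule
```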
Because of the high peak throughputs on the Remex interconnect combined with a very shallow FIFO, traffic over the interconnect will "stutter" quite frequently, consisting of bursts of activity separated by idle intervals. I expect real-world throughput not to be that fast until I deepen the FIFO and/or attach a DMA interface to the pipeline. Both are planned, but I need to make sure I have enough room for them first!
TX Pipeline
My next set of tasks includes getting the transmit pipeline working. TxClk will be derived from RxClk using a programmable down-counter. This gives the host about as much control over the transmit data rate as you'd typically find in a UART. I'm still trying to figure out the overall architecture of the TX pipeline.
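As a rough illustration (my own sketch, not the eventual Remex TX code; the 16-bit divisor width and the port names are assumptions), the programmable down-counter could be as simple as:

```verilog
// Sketch only: programmable down-counter producing a TX bit-cell enable.
module tx_rate_divider (
    input  wire        rxclk,      // reference clock (RxClk)
    input  wire        reset,
    input  wire [15:0] divisor,    // RxClk cycles per TX bit cell, host-writable
    output reg         tx_ce       // one-cycle pulse at the TX bit rate
);
    reg [15:0] count;

    always @(posedge rxclk) begin
        if (reset || count == 16'd0) begin
            count <= divisor;      // reload on wrap (or while in reset)
            tx_ce <= ~reset;       // pulse on wrap, suppressed during reset
        end else begin
            count <= count - 16'd1;
            tx_ce <= 1'b0;
        end
    end
endmodule
```

The TX shifter would then advance one bit per tx_ce pulse rather than running a separate clock domain.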
Miscellaneous
I should note that the RX pipeline, having only an 8-deep queue, consumes around 300 logic cells in the iCE40 parts. I'm guessing that the TX pipeline will take up about as much space, but I won't know until it's done. I have no estimate for the Wishbone bus interface yet. This already means I cannot implement a lot of independent channels, so I'll probably restrict myself to just 3 or 4. It could be as few as 2.
(EDIT: Through a conversation I had on IRC shortly after posting this article, I was referred to this paper which suggests a reasonable implementation size for a complete SpaceWire interface comes to around 460 LUTs. I think it's reasonable, then, to speculate my implementation will weigh in around 600 LUTs, accounting for my relative lack of experience with FPGA design engineering. Further, the same paper suggests a maximum RX throughput of RxClk*2/3, rather than RxClk/3. Exciting!)
Pragmatically, it's not as bad as it sounds; yes, it cramps my style, but we must remember that IEEE-1355 is designed to be a packet-switched protocol. This means all packets have a (possibly source-routed) destination address field as the first n bytes of a frame. Thus, we can still support a large number of expansions by making use of switches. I was hoping to avoid having to do things this way, especially at first, but having a smaller number of channels than planned is not a deal-breaker for me. Even one channel is, while inconvenient, still viable.
-
Remex I/O Channels
02/05/2017 at 08:02 • 9 comments

IBM mainframes have some pretty nice names for their channel architectures. The original, of course, is simply known as "channels." But, when they needed higher performance, IBM released something called ESCON. Later, when that wasn't enough, they released a fiber-optic and substantially faster version called FICON.
As you might guess, I'm not particularly interested in being sued by IBM for infringing on their trademarks, so KESCON or some similar portmanteau or initialism is simply out of the question. Thankfully, it's not a big problem to come up with a decent name of my own: Remex channels.
I selected the name remex because a remex is one of the flight feathers of a bird; in a way, the flight feathers are one of the "primary interfaces" between a bird and its environment.
Kestrel-3's I/O channels are based on 1x6 Pmod connectors, 3.3V logic, IEEE-1355 DS-SE-02 signalling, and a modified Spacewire-like protocol for communications between the computer and peripherals. The result is not compatible with Spacewire or even stock IEEE-1355, due to my insistence on supporting bit-banged peripherals on Arduino-class microcontrollers, which, depending upon how they're programmed, can operate at best in the kilobits-per-second range. However, if the device relies on an FPGA or a GA144-type microcontroller, performance can easily reach many tens of megabits per second.
As I type this, I have completed a preliminary data-strobe decoder and character decoder for the receive pipeline, which is arguably the most performance-critical part of a Remex link. (See the GitHub repo.) Right now, icetime reports that the top clock rate for the receiver is 157 MHz, which means you could theoretically feed it a 51 Mbps input data rate. (Unlike IEEE-1355 links made professionally, I'm not using self-clocked receiver logic, due to the innate difficulty of getting such a thing working on a single development tool-chain, much less across a plurality of different FPGA development systems!) The icoBoard Gamma has a 100MHz oscillator standard, so I expect to drive it at 100MHz to achieve a top throughput of 33 Mbps. That's not a fantastically high data rate (a smidge over 2.5 MB/s peak data rate; real-world performance remains to be measured); but, for an amateur production like mine, it should be plenty powerful enough for a long time to come.
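For readers unfamiliar with data-strobe signalling: exactly one of the Data and Strobe lines toggles for each bit, so a change in D XOR S marks a new bit cell, and D itself is the bit's value. A minimal oversampled decoder along those lines might look like the sketch below; this is my own illustration, not the actual Remex PHY, and it omits the input synchronizers and error handling a real design needs.

```verilog
// Sketch only: oversampled data-strobe bit recovery.
module ds_decoder (
    input  wire rxclk,      // local sampling clock (e.g. 100 MHz)
    input  wire d,          // Data line
    input  wire s,          // Strobe line
    output reg  bit_valid,  // pulses when a new bit has been captured
    output reg  bit_value
);
    reg d_q, s_q, ds_q;

    always @(posedge rxclk) begin
        d_q  <= d;                        // (a real design adds synchronizer stages)
        s_q  <= s;
        ds_q <= d_q ^ s_q;                // remember the previous D^S value

        bit_valid <= ((d_q ^ s_q) != ds_q);  // D^S changed => new bit cell
        bit_value <= d_q;                    // D is the bit value
    end
endmodule
```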
Besides, if we really need 200Mbps throughput, someone can release an FPGA-/toolchain-optimized revision to the core which enables the receiver to be truly self-clocked. One thing is for sure: 2.5 MB/s isn't fast enough to support even monochrome 640x480 bitmapped displays at 60fps. However, it is capable of 30fps (needs only 1.6 MB/s), so basic animations should still be doable.
I'm still playing around with the circuit details as I develop it, since this is the very first time I've ever made any IEEE-1355-compliant link. It's also why I'm not writing any unit tests at this time; things are prone to change quite drastically as I learn more about the requirements of the circuit. For now, all the test-benches just generate waveforms for viewing in GTKWave or a similar tool.
-
Moving Kestrel-3 to icoBoard Gamma with New Architecture
01/31/2017 at 20:24 • 0 comments

From the abstract:
Problems bringing the Kestrel-3 up on the Nexys-2 board force me to try bringing it up on a new FPGA board instead, the icoBoard Gamma, based around the iCE40HX8K FPGA. However, the limitations of this FPGA seriously constrain the design of the computer, as the CPU just barely fits as it is. I’ve decided to brutally murder my darlings, shed all unnecessary I/O features that basically defined the Kestrel-3 as a home computer, and focus instead on pure compute and aggregate I/O capability. Off-loading non-essential I/O to intelligent peripherals, via I/O channels, brings the design of the Kestrel-3 closer to that of an older IBM mainframe, a la System/360 or System/370.
Check out the full article here.
-
More Thoughts on Kestrel-3 Evolution over the Years
01/30/2017 at 06:40 • 0 comments

What follows was originally formatted as a tweet-storm, but I felt it important enough to be preserved here.
All of the problems I'm running into with the Kestrel-3 have led me to "down-spec" several times. I've felt (and still feel) really horrible about down-spec'ing the computer. It makes me feel like a failure. I hate promising A, but delivering B, and even then, only after a year-long fight. But, what if fate is causing me to evolve the design to something more open, or more flexible?
It's occurred to me that the constraints imposed by the icoBoard, etc., force me towards something closer to a mainframe architecture (especially IBM System/360) than to an Atari ST. Maybe this isn't the architecture I wanted, but the architecture the open-source hardware community will accept.
Not that I'm bashing mainframes, mind you. I love learning about them, and always wanted to play on one. It's just that I expected a desktop to be easier to build. Maybe I was wrong?
I figure a "desktop" Kestrel will just consist of the mainframe with an embedded terminal hard-wired to the system console. This might not be a bad way to go: it isolates the terminal's capabilities from the core of the computer, freeing the Kestrel from any possibility of firmware coming to depend upon specific hardware attributes. This allows the terminal to evolve independently of the core computer: the Nexys-2 has a 256-color display, the DE-1 a 4096-color display, but the myStorm and icoBoard both have nothing (at least without an external attachment). Maybe someone would want to use a Gameduino-1 or -2; all it'd take is a microcontroller to sit between the user's monitor and the Kestrel-3.
My problems could very well be a blessing in disguise.
-
icoBoard Received; Kestrel-3 Thoughts.
01/28/2017 at 17:07 • 2 comments

A few days ago, I received an icoBoard Gamma with 1MB of SRAM on board. I'm thinking I should port the Kestrel-3 directly to this platform, and just ditch the Nexys-2 (at least as the primary development platform). The latter is costing me much time and frustration, and honestly, I'm growing impatient. It *sucks* that I won't have 16MB of RAM to play with, though.
The problem with the icoBoard is the same problem I have with my homebrew, BackBone-based design for the Kestrel: the FPGA is an iCE40HX8K device, and that means only 7000-ish 4-input LUTs at my disposal. Kestrel-3, as it's currently defined, will push this chip to its absolute limits. My full vision for the Kestrel-3, where it's powerful enough to run Plan 9 for its OS on a color display with decent resolution and audio capabilities, won't fit on this at all. Guaranteed.
According to Xilinx's tools, I'm already using somewhere around 5500 LUTs. If I were to add a page-table walker, even without changing anything else, the resulting design is not likely to fit on the icoBoard without off-loading at least the I/O onto a separate module.
An additional challenge is the potential lack of ROM resources. I don't believe I can place code in ROM like I can with the Nexys-2-based design, for that'll cost yet more LUTs. I'll try, but I won't expect much success here. I fully anticipate I'll need to add hardware to IPL the computer off of a serial interface or something.
Finally, there's the small issue of general purpose computing I/O: keyboard, mouse, video display, and SD card access for mass storage. Keyboard and mouse are easily achievable with the addition of a pair of $15 boards from Xess. Video display is also achievable with a similarly priced board that takes up two PMODs to give me only 1024 colors on the screen (due to what I feel is a design error). Alas, the icoBoard only has four PMODs, so that exhausts its easily accessible I/O capacity. To go beyond this, you'll need to solder something directly to its 4*17 interface, or use flat cables (which provides vastly more I/O capacity, but isn't cheap and is fragile).
So, I've been thinking, maybe I should just ditch the idea of a home computer design altogether, and just go with a mainframe-inspired design. E.g., the core of the computer is a processor, some tightly bound memory, and a number of DMA-driven I/O channels that communicate with the outside world somehow. Period. That's all you get.
The idea being, some PMODs can be configured as dumb, RS-232, XON/XOFF-flow-controlled channels, while other PMODs can be built out for beefier I/O requirements. For example:
- PMOD 0 can be wired as two, 3-wire RS-232 ports talking to dumb terminals.
- PMODs 1, 2, and/or 3 can be wired as SPI ports (supporting, e.g., Digilent's SD card PMod adapter).
- Extra credit: PMODs 1, 2, and/or 3 can potentially be (software-)configured to serve as IEEE-1355/SpaceWire links, providing access to six channels of switched (and, thus, infinite) I/O expansion. Mapping RapidIO here could be simpler than a native RapidIO serial link.
This would mean that you now need a separate and dedicated terminal to talk to the Kestrel-3's compute core. It would also destroy any hope for 60fps video refreshes except when using external GPU devices. OTOH, it does allow the use of external GPU devices, such as Gameduino and Gameduino-2 boards, provided suitable drivers are written to support them.
I hate having to make these compromises. I know my original vision is possible. It's just frustrating that I seem inept or incapable of achieving my goals.
-
MyLA: Debugging Tool for PSRAM (I Hope).
01/12/2017 at 08:03 • 0 comments

I've adapted the KIA core for a new role as a logic analyzer data acquisition core, now called MyLA. I just pushed the core up to GitHub. It's not part of the Kestrel suite of cores, but maybe it should be. We'll see. For now, I'm hosting it under my personal account.
The plan is this:
- Take PSRAM off the primary CPU bus, and leave only block RAM. Relocate block RAM to $0000000000000000-$000000000000BFFF. This gives 48KB of system RAM to play with.
- Introduce a new core to expose PSRAM to the processor via Memory-Mapped I/O (MMIO). Once again, we can try async to start with, and maybe switch to sync later on. If going synchronous, then we need to ensure the clock is slow enough to afford MyLA some reasonable ability to sample the transaction. Maybe driving the PSRAM chip at 6.25MHz?
- Install the MyLA as another MMIO device, perhaps with some additional help from the GPIA as well. Configure MyLA to monitor the PSRAM controller's RAM-side interface. (A rough sketch of this kind of capture core appears after this list.)
- Implement enough firmware to interactively trigger PSRAM operations of various types (write BCR, read data from an address, write data to an address), as well as to arm/disarm the MyLA core.
- Implement enough firmware to visualize captured MyLA traces and to explore them.
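As referenced above, here is the general shape of a trigger-armed capture core like MyLA. This is my own illustrative sketch, not the actual MyLA code; the port names, probe width, and buffer depth are all assumptions.

```verilog
// Sketch only: arm-and-capture logic analyzer acquisition core.
module capture_core #(
    parameter WIDTH      = 16,
    parameter DEPTH_LOG2 = 9              // 512 samples deep (assumed)
)(
    input  wire                  clk,
    input  wire                  reset,
    input  wire                  arm,      // pulse to start a capture
    input  wire [WIDTH-1:0]      probe,    // signals being monitored
    input  wire [DEPTH_LOG2-1:0] rd_addr,  // firmware-side readback address
    output wire [WIDTH-1:0]      rd_data,
    output reg                   done      // capture buffer is full
);
    reg [WIDTH-1:0]      mem [0:(1<<DEPTH_LOG2)-1];
    reg [DEPTH_LOG2-1:0] wr_addr;
    reg                  running;

    assign rd_data = mem[rd_addr];

    always @(posedge clk) begin
        if (reset) begin
            running <= 1'b0;
            done    <= 1'b0;
        end else if (arm) begin
            running <= 1'b1;
            done    <= 1'b0;
            wr_addr <= {DEPTH_LOG2{1'b0}};
        end else if (running) begin
            mem[wr_addr] <= probe;                    // record one sample per clock
            wr_addr      <= wr_addr + 1'b1;
            if (wr_addr == {DEPTH_LOG2{1'b1}}) begin  // buffer full
                running <= 1'b0;
                done    <= 1'b1;
            end
        end
    end
endmodule
```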
This is going to take some time to complete. This weekend will be bad for me as I'll be at a fursuiting convention. Come Monday, I'll be back at it. Here's hoping I make good progress.