A 50% size replica of my favourite 8-bit computer
CAS files are a sort of container format for storing data as it was saved on cassette tapes. They were used primarily by BASIC, and there are several variants. The native BASIC used two formats, one for CSAVE (tokenized BASIC program) and one for BSAVE (raw data image). The modulation was PSK, the same as the ROM loader uses. BASIC-Korvet used the more sophisticated MSX CAS file structure, although probably only a subset of it. These files are FSK-modulated, which is not typical for the Vector-06c.
I had 12K allocated for instruction ROM in my NEO430-based subsystem. Even with the incredibly compact msp430 code, the new features grew in size and I reached the 12K limit. It turned out, however, that due to the BRAM memory structure, allocating 16K (which is 8K 16-bit words) takes the same number of BRAM blocks, so I was able to expand the available ROM for even more features.
Supporting CAS files means generating sound from the CAS file content on the fly. Loading happens as if it were a sound recording.
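To illustrate what generating the sound on the fly involves for the FSK-modulated files, here is a minimal sketch in C. It assumes the MSX cassette convention as it is commonly documented: 1200 baud, a 0 bit is one cycle of 1200 Hz, a 1 bit is two cycles of 2400 Hz, and each byte is framed by a 0 start bit, eight data bits LSB first and two 1 stop bits. The 48 kHz sample rate, the half-period output format and the names `fsk_encode_byte`/`emit_bit` are hypothetical and not taken from vector06cc. The PSK format of the native BASIC is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of MSX-style FSK at 1200 baud.
   Output is a list of half-period lengths in audio samples;
   a player would toggle the output level after each one. */
#define SAMPLE_RATE 48000u
#define HALF_1200 (SAMPLE_RATE / 1200u / 2u)  /* 20 samples */
#define HALF_2400 (SAMPLE_RATE / 2400u / 2u)  /* 10 samples */

/* Append the half-periods for one bit and return the new count.
   '0' = one 1200 Hz cycle, '1' = two 2400 Hz cycles; both last 1/1200 s. */
static size_t emit_bit(int bit, uint16_t *out, size_t n)
{
    if (bit) {
        for (int i = 0; i < 4; i++) out[n++] = HALF_2400;
    } else {
        for (int i = 0; i < 2; i++) out[n++] = HALF_1200;
    }
    return n;
}

/* Encode one byte: start bit 0, 8 data bits LSB first, two stop bits 1.
   `out` must have room for at least 11 * 4 = 44 entries. */
size_t fsk_encode_byte(uint8_t byte, uint16_t *out)
{
    size_t n = 0;
    n = emit_bit(0, out, n);                    /* start bit */
    for (int i = 0; i < 8; i++)
        n = emit_bit((byte >> i) & 1, out, n);  /* data bits, LSB first */
    n = emit_bit(1, out, n);                    /* stop bit 1 */
    n = emit_bit(1, out, n);                    /* stop bit 2 */
    return n;
}
```

Whatever the byte value, one framed byte always lasts 11 bit times (440 samples at 48 kHz), which makes it easy to stream the tape image at a fixed rate.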
So far I have implemented loading BASIC 2.5 CAS files. BASIC-Korvet is a bit more involved, but it will be added soon.
Here's a little video update which I hope you'll find enjoyable.
I have added a couple of features that I should have added a decade ago, namely ROM file load and WAV file load. They are far from perfect, but it's a start. For example, ROM load requires the machine to be reset into the tape loading state first (F1+F12), and there's no good visual indication of progress. For some reason I've yet to discover, not all WAV files load well: I can't load turbo-loader WAVs, nor can I load a program saved from BASIC-Korvet. I will demo all that in due time. Meanwhile I've been stuck on something completely ridiculous...
I've been playing with sound, and even though I had made a nice improvement to my PWM output, it's still pretty noisy. So I wanted to compare the noise when powered by a PC vs. a power bank. For this to work I needed to program the flash so that the board cold boots. I had done that many times already.
So I flash it, but suddenly I get an error and a Status Code: 0x00035421. Awesome.
GowinSemi tech support responded promptly. It turns out that if I updated the Programmer, I could also click on the Status Code line in the log and see it decoded. It decoded to CRC ERROR. So I updated the Programmer and tried again. This time it decoded to BAD CMD.
I immediately guessed that the flash must be dead. But I decided to try flashing another project, an LCD example. It worked just fine (not without some cable juggling, and every version of the Programmer has its own ritual, but eventually it did).
So I thought maybe my project was too big and it was trying to write over some bad sector in the flash. I disabled floppy support. Flashed the project, it works.
Oh well, I think. Let's enable floppy support again and try once more. Flashed the project, it works.
¯\_(ツ)_/¯
Meanwhile support responds with something about the flash needing a 1MHz clock at most, and that being the problem with this cable. The cable has a frequency setting in the Programmer, but they say it's a different setting. They give no clues about what I should do to make it work. Oh well... Moving on, I guess.
P.S. Noise (after normalisation) left: PC, right: power bank.
We do not recommend programming the eFlash of 9K in Tang Nano board due to the USB-Jtag chip.
If you want program the eFlash, we recommend either make your own board and choose GWU2X or FTDI chip or remove the USB-JTAG chip and put a 4 pin Jtag connector and using our download cable to program in the Tang Nano Board.
I guess this is the correct answer, even if this isn't the answer I want to hear. I will continue using the tools at hand and hope for the best.
In my previous update I already had a PWM sound output which was okay for testing, but it was noisy. Regardless of the state of the output, there was a constant whine and grumble. Not too loud, but at a level impossible to ignore. I connect vector06cc to the same pair of speakers that I use with my main PC, which made it very annoying.
Of course I tried filtering the power, bypassing the virtual ground and all that stuff... But nothing really helped. The noise was seeping right through the signal lines. I searched the net for a solution. Maybe my search terms weren't right, but it seemed that there's no solution for this. You could find a flat-out dismissal ("PWM is digital, your argument is invalid") or a non-solution ("you can't make PWM clean, use a DAC"). I didn't want to put up with that, at least not so easily.
PWM works by slicing the voltage on the power rail. If the power rail is dirty, the PWM carries all this dirt further down the pipeline: the dirt travels on top of the square wave. And because the dirt can have audible frequencies, you can't really filter it out. The solution is actually fairly simple: make a clean power source and use it to re-cut the dirty PWM signal into a clean PWM signal.
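A back-of-the-envelope model shows why filtering alone doesn't help. After an ideal low-pass filter, a PWM output settles at duty × V_rail, so any ripple on the rail reaches the output scaled by the duty cycle and at its original, possibly audible, frequency. The C sketch below averages a PWM waveform over each PWM period, assuming a 3.3 V rail carrying a square-wave ripple. The numbers and function names are illustrative assumptions, not measurements from this board.

```c
#include <stddef.h>

/* Illustrative model, not measured data: the PWM output equals the
   instantaneous rail voltage for `duty_counts` out of `period` counts and
   0 V otherwise. Averaging over one PWM period approximates what remains
   after an ideal low-pass filter. */
double pwm_period_average(const double *rail, size_t start,
                          unsigned period, unsigned duty_counts)
{
    double sum = 0.0;
    for (unsigned i = 0; i < period; i++) {
        if (i < duty_counts)
            sum += rail[start + i];   /* high: follows the (dirty) rail */
    }
    return sum / period;
}

/* Fill `rail` with 3.3 V plus a square-wave ripple of +/- `ripple` volts
   that toggles every `ripple_half` samples, standing in for slow,
   audible-frequency noise on the supply. */
void make_rail(double *rail, size_t n, double ripple, size_t ripple_half)
{
    for (size_t i = 0; i < n; i++)
        rail[i] = 3.3 + (((i / ripple_half) % 2) ? -ripple : ripple);
}
```

With `ripple = 0.1` and a 50% duty cycle, the per-period average alternates between about 1.70 V and 1.60 V: the rail noise is now part of the audio signal. Re-slicing the PWM with a clean supply amounts to replacing `rail` with a constant.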
Here's the schematic that implements this idea. It's very simple, but it's the fruit of a lot of frustrating experiments.
The crucial part here is the humble 1117-3.3 LDO. I used a drop-in 78xx replacement module which already has ceramic capacitors on it. Its output is much cleaner than anything you get on the power rails near the FPGA. It powers a 74HC14 (which I picked simply because it was already on the desk). You can actually skip the middle part, but then it will probably radiate something unwanted and be quieter than ideal. After that there's a very simple filter and a buffer amplifier. I didn't want to bother with virtual grounds, so I used the most basic single-supply DC amplifier configuration. It limits the range somewhat, especially considering we only have 3V3 power. But simplicity wins, especially for stereo, where you have to build everything twice.
I can hear some high-pitched whine when I max out the volume on my speakers. It's not ideal, but it's perfectly acceptable for what it is. I think this whine could actually be PWM-related and could be helped by a better filter circuit, but that's for another day.
Another important update is a working keyboard. After some deliberation I decided in favour of a Pi Pico as the USB host. All it does for now is convert pressed keys from the HID report into the full keyboard matrix (which is just 9 bytes) and send updates to the core via a serial line, which is just a single wire. I experimented with other devices, some of which I documented in previous updates. This one is by far the most accessible and universal, and it offloads a lot of complex logic from the FPGA. It also opens up possibilities to use other kinds of USB devices, for example a mouse or a USB mass storage device.
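The HID-to-matrix conversion can be sketched roughly as follows. It assumes a standard USB HID boot-protocol keyboard report (modifier byte, reserved byte, up to six key codes) and a hypothetical lookup table `keymap` that maps HID usage codes to a byte/bit position in the 9-byte matrix. The positions, the use of byte 8 for modifiers and the "bit set = pressed" polarity are all assumptions; the real Vector-06c layout and the actual Pico firmware are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define MATRIX_BYTES 9

/* Hypothetical entry: which matrix byte and bit a HID key sets.
   row == 0xFF means "not mapped". */
typedef struct { uint8_t row; uint8_t bit; } key_pos_t;

/* Hypothetical, deliberately tiny keymap indexed by HID usage code. */
static key_pos_t keymap[256];

void keymap_init(void)
{
    for (int i = 0; i < 256; i++) keymap[i].row = 0xFF;
    keymap[0x04] = (key_pos_t){ 4, 1 };  /* HID 'A'   -> assumed position */
    keymap[0x2C] = (key_pos_t){ 7, 7 };  /* HID space -> assumed position */
}

/* Convert one 8-byte boot-protocol report into a fresh matrix.
   Byte 0 = modifiers, byte 1 = reserved, bytes 2..7 = key codes. */
void hid_to_matrix(const uint8_t report[8], uint8_t matrix[MATRIX_BYTES])
{
    memset(matrix, 0, MATRIX_BYTES);
    for (int i = 2; i < 8; i++) {
        uint8_t code = report[i];
        if (code == 0 || keymap[code].row == 0xFF)
            continue;                                  /* empty or unmapped */
        matrix[keymap[code].row] |= (uint8_t)(1u << keymap[code].bit);
    }
    /* Hypothetical: HID left Shift (bit 1) and left Ctrl (bit 0) go to byte 8. */
    if (report[0] & 0x02) matrix[8] |= 0x01;
    if (report[0] & 0x01) matrix[8] |= 0x02;
}
```

Since the matrix is rebuilt from scratch for every report, the Pico only needs to send the 9 bytes when `memcmp` shows they differ from the previous matrix.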
I recorded a video about recent developments as of mid-August 2024.
After some superficial study of the WM8978 I have ruled against it. It is very complex and setting it up is a huge pain in the arse. Worst of all, it requires too many pins to communicate. Pins, especially 3.3V-compatible pins, are a precious commodity on the Tang Nano 9K. This neat little codec seems to create more problems than it solves, so it's off to the parts bin.
I might want a little built-in speaker some time later, but for now I just wanted a nice buffered output where I could plug in my active speakers without fearing sparks travelling up the delicate 1.8V I/O lines. So I came up with this fairly basic circuit.
The input is 1.8V GPIO (pins 80, 79 on Tang Nano 9K).
The op-amp is an LM358. A passive filter keeps high frequencies out of the amplifier. The gain was set experimentally, and I'm still undecided on the value of the feedback resistor. Currently it's 220K, and it seems to produce levels comparable to what my laptop outputs, albeit a bit quieter. On the other hand, the output swings wider than the "line out" standard dictates. And with 1 + 220/47 ≈ 5.7, the gain here seems a bit high.
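The gain figure follows from the standard non-inverting amplifier formula, G = 1 + Rf/Rg. A small helper makes it easy to try other feedback values. The 47K and 220K values come from the text; the target gain of 3 in the note below is an arbitrary example, not a recommendation from the original design.

```c
/* Non-inverting op-amp stage: G = 1 + Rf / Rg. */
double noninv_gain(double rf_ohms, double rg_ohms)
{
    return 1.0 + rf_ohms / rg_ohms;
}

/* Feedback resistor needed for a target gain: Rf = (G - 1) * Rg. */
double rf_for_gain(double gain, double rg_ohms)
{
    return (gain - 1.0) * rg_ohms;
}
```

`noninv_gain(220e3, 47e3)` gives about 5.68. Aiming for a gain of 3 with the same 47K would call for `rf_for_gain(3.0, 47e3)`, i.e. 94K, so a standard 100K (gain ≈ 3.1) would be a natural next value to try.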
My USB hub, which I tend to use for debugging, seems to add a horrible amount of extra noise to the power lines, so I have a basic 20 ohm / 100uF filter on the 5V line. It's not the ideal way of dealing with this, but it removes a lot of annoying hum.
The signal part of the circuit is repeated twice for stereo output. The AY mix used in vector06cc is L=C+B, R=A+B. The 8253 and beeper outputs are always mono. Stereo AY sounds neat, as it turns out.
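For reference, the corner frequency of a first-order RC low-pass is f = 1/(2πRC). The helper below only evaluates that formula for the 20 Ω / 100 µF values from the text; the function name is illustrative.

```c
/* Corner frequency of a first-order RC low-pass: f = 1 / (2 * pi * R * C). */
#define PI_APPROX 3.14159265358979323846

double rc_corner_hz(double r_ohms, double c_farads)
{
    return 1.0 / (2.0 * PI_APPROX * r_ohms * c_farads);
}
```

`rc_corner_hz(20.0, 100e-6)` is about 79.6 Hz, so noise well above that is attenuated at roughly 20 dB per decade. Keep in mind that the 20 Ω resistor also drops I × 20 Ω, so the 5V line sags with the load current.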
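The channel routing described above can be written down as a tiny mixer. Only the routing (L = C + B, R = A + B, with the 8253 and beeper on both sides) comes from the text; the unsigned 8-bit input levels, the 16-bit sums and the absence of scaling are assumptions for illustration, and `mix_vector06c` is a hypothetical name.

```c
#include <stdint.h>

typedef struct { uint16_t left, right; } stereo_t;

/* Hypothetical sketch: each input is an unsigned 8-bit level.
   The largest possible sum (4 * 255) fits in 16 bits, so no clipping here. */
stereo_t mix_vector06c(uint8_t ay_a, uint8_t ay_b, uint8_t ay_c,
                       uint8_t timer8253, uint8_t beeper)
{
    uint16_t mono = (uint16_t)(timer8253 + beeper);  /* always mono */
    stereo_t out;
    out.left  = (uint16_t)(ay_c + ay_b + mono);      /* L = C + B */
    out.right = (uint16_t)(ay_a + ay_b + mono);      /* R = A + B */
    return out;
}
```

Channel B sits in the centre, while A and C are panned hard right and left.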
I still have many questions and I may need to revisit this circuit later.
Two useful parts have arrived. One is a USB keyboard-to-serial adapter. The keyboard (and mouse) host looks like this:
It uses a CH9350DS to do the dirty job of being a USB host to the mouse and keyboard and converts their output into scancode data on a serial port. Rather nice, if true. The documentation is hard to decipher. Here's a link to a copy of the PDF: CH9350DS-Qinheng_KL-024-0000864.pdf There are DIP switches to select the baud rate, and the function of two of them is to "set the state", whatever that means. There are only 32 combinations, so it should be fine.
If the docs are to be believed, the serial output levels are 3V3, so that's nice.
The other part is a WM8978 audio codec. It's a neat little board (or an MP3 learning pole, apparently) that looks like this:
Unfortunately the input and output jacks are not colour-coded. There's also a little microphone soldered on the opposite side for some reason. I don't think it would be very useful in a Vector-06c, but we'll see; maybe we can load tapes over the air.
The codec chip is a bit more advanced than I need for this project. I could easily get away with a 1-bit sigma-delta DAC, but this board also provides a nice pair of 3.5mm jack connectors, so it seemed like a nice thing to have. The datasheet is rather intimidating, though. I've found a couple of projects with initial setup code that I hope will serve as a reference.
The datasheet link: https://www.mouser.com/datasheet/2/76/WM8978_v4.5-1141768.pdf
The pinout is labelled on the reverse side, but you can't see it fully because of the microphone, so here's the pinout picture:
It's also a rare case when a schematic is available, so here it is:
I have not yet tried any of these boards.
vector06cc relied on existing CPU cores. For the main computer it was the good old T8080 by MikeJ, which I fixed in a few places to work as a proper 8080 back in the day. For the floppy emulation I used the 65c02 by Peter Wendrich. Both are excellent cores and they worked well. However, I started noticing instabilities when building my project for the Tang Nano 9K. The problem was that the chip was over capacity, and the Gowin tools start to behave unpredictably when this happens. For example, a perfectly working floppy subsystem stops reading the SD card when I add a sound chip. Or the main computer can't exit the reset state when I add a floppy. And it's different every time. It's impossible to make progress when things turn out like that. I needed to free up some floor space.
Disclaimer: I don't know how to properly count the size of synthesized cores, but the Gowin Floor Planner shows a number next to each module which seems to represent occupied resources well. I'll simply call them "units".
As good as these cores are, they seemed pretty big on the floor: T8080 around 1800 units and 65c02 around 1700. Back in the day (around 2008) there weren't many alternatives. Today we have more choices.
As for the main CPU, there is Vslav's brilliant kr580vm80a, reverse engineered from a die shot (https://github.com/1801BM1/vm80a). It has been proven accurate many times, so it was time for me to adopt it for vector06cc. vm80a is an interesting model because it follows the original design closely, so it needs two-phase clocks like the original. The changeover was relatively painless and the new core worked almost right away. The difference on the floor plan was amazing: 1800 down to 860 units. This created some elbow room for the fitter to work, but I needed more.
vector06cc uses a second CPU to read floppy image files from an SD card and feed the data to a WD1793 floppy controller emulator. It also provides an OSD interface to select disk images and control the system.
The floppy CPU was originally a 65c02, which synthesizes to 1700 units, too fat for a helper CPU. First, I replaced it with Arlet's 6502 Verilog model, as I had already done before for vector06c-lesshadoks (https://github.com/svofski/vector06c-lesshadoks). The replacement wasn't all roses, but eventually I managed. This core compiles to around 1100 units, which is much better than 1700. Because this service CPU can be anything, I looked around for other options. These days everyone is supposed to be a fan of RISC-V, but I found it difficult on this occasion. RISC-V cores are perhaps compact compared to other modern 32-bit cores, but they tend to be overspecified for my task. SERV is a really interesting option, but I think it wouldn't perform well at my comfortable 24MHz clock rate.
One of the considerations in choosing the floppy CPU was also code size, because BRAM is at a premium. The 6502 may be an amazing CPU, but compiled C code for it is not the best. I quickly built my emulator code for the potentially available CPUs:
| CPU (compiler) | Code size | Core | Notes |
|---|---|---|---|
| 6502 (cc65) | 14891 | Arlet's 6502 | baseline |
| 8080 (z88dk) | 16449 | light8080 | |
| msp430 (gcc) | 9184 | neo430 | a clear winrar |
| avr (gcc) | 10688 | avr_core_v14 | can't really remember which AVR core I tested |
| zpu (gcc) | 14445 | zpu-avalanche | couldn't link but added up object code sizes |
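To put the table in relative terms, a few lines of C compute each size against the 6502 baseline. The sizes are copied from the table above; the percentages are plain arithmetic, not new measurements, and the names are illustrative.

```c
#include <stddef.h>

typedef struct { const char *cpu; unsigned size; } build_t;

/* Code sizes copied from the table (the zpu figure is an object-code sum). */
static const build_t builds[] = {
    { "6502 (cc65)",  14891 },
    { "8080 (z88dk)", 16449 },
    { "msp430 (gcc)",  9184 },
    { "avr (gcc)",    10688 },
    { "zpu (gcc)",    14445 },
};

/* Size of build i relative to the 6502 baseline (entry 0), in percent. */
double percent_of_baseline(size_t i)
{
    return 100.0 * builds[i].size / builds[0].size;
}
```

This gives roughly 61.7% for msp430, 71.8% for avr, 97.0% for zpu and 110.5% for the 8080.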
light8080 is very compact, but it's no better than the 6502 at running C code. AVR is good, but the AVR cores that I found were not smaller than the 6502, and they looked intimidating. I had some hopes for ZPU, but the model that I found wasn't easy to use either, and the code size didn't impress me at all. The compiler is also a bit difficult to set up.
The msp430 code size impressed me. But most of all I was impressed by the size of the neo430 CPU core: just around 600 units! (632 in the current build.) The only inconvenience is that the author decided to package it as a complete, enclosed SoC, as if it were a microcontroller in an FPGA. This is probably great for some...
This was a big update which the bloody Hackaday engine screwed up and failed to save. I understand it's a rebel site and it can't all be rosy-pop bubblegum happiness, and I appreciate the service anyway, but nnnnggggggghhh!....
Anyway, I implemented some bearable vertical 3:5 scaling so that the entire visible height of the v06c screen area fits into the 480 lines available on my LCD. Input data: a TV-like picture, 312 lines. For VGA we scan-double and hope that the TV/monitor somehow stuffs it into a kind of PAL 752x568 mode. Here I don't have that, just raw LCD glass, so I need to scale it myself. Visible line count = 16 + 256 + 16 = 288, and 288 * 5 / 3 = 480. This is how the numbers came to be.
Left: simple 1-2-2 repeated lines; right: linear interpolation, slightly broken due to physical limitations. The imperfection is easy to spot with test patterns, but otherwise it's quite good. By the way, it's difficult to capture this difference with a phone camera, which I believe applies its own anti-moiré filtering that makes everything look more or less okay. The real-life difference between the left and right pictures is more obvious.
One trick that I'm using: for every 3 source lines (TV-speed lines) I need to produce 5 LCD lines, so it's not a simple scan doubler. I found that I can simply skip the 6th line on the LCD. It works. But it breaks something in its glassy brain, so you need to feed it more HSYNC pulses later. If you don't, it will skip VSYNC and you'll only get even frames, the liquid crystals will twist too far, and there will be image retention artifacts.
My theory is that in proper scan mode, it twists/untwists them in opposite directions on alternate frames. If you somehow miss a frame, it keeps twisting them in the same direction. It would be interesting to find out more about this. I had the same kind of artifact on the ESP32-S3, but there I had little control over the scanning process and I think it was always overtwisting. At least it was very prone to image retention.
Proper interpolation would require some real maths. I simplified things a little. When computing 5 vertical pixels, every pixel is a sum of 4 terms, so by choosing which source pixels go in I can adjust their relative weights. I distributed the 3 input pixels evenly, i.e. 7+7+6 terms, and that's it. The actual code looks like this:
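The "1-2-2 repeated lines" variant can be described as a simple index mapping: within each group of 5 LCD lines, the first source line of the group is shown once and the other two are shown twice each. The sketch assumes LCD line 0 corresponds to the first visible source line; `src_line_for_lcd_line` is a hypothetical name, and this is only the nearest-line variant, not the interpolating one.

```c
/* Visible source lines and LCD lines, from the text: 288 * 5 / 3 = 480. */
#define SRC_LINES 288
#define LCD_LINES (SRC_LINES * 5 / 3)

/* Nearest-line 3:5 upscale with a 1-2-2 repeat pattern.
   Returns the source line shown on LCD line y (0 .. LCD_LINES - 1). */
int src_line_for_lcd_line(int y)
{
    static const int pattern[5] = { 0, 1, 1, 2, 2 };  /* shown 1x, 2x, 2x */
    int group = y / 5;                 /* each group: 3 source -> 5 LCD lines */
    return group * 3 + pattern[y % 5];
}
```

The last LCD line, 479, maps to source line 287, so the whole visible area is covered exactly.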
pipmix4 ma1(clk24, rc_a[0], rc_a[0], rc_a[0], rc_a[0], bmix[0]);
pipmix4 ma2(clk24, rc_a[0], rc_a[0], rc_a[0], rc_a[1], bmix[1]);
pipmix4 ma3(clk24, rc_a[1], rc_a[1], rc_a[1], rc_a[1], bmix[2]);
pipmix4 ma4(clk24, rc_a[1], rc_a[1], rc_a[2], rc_a[2], bmix[3]);
pipmix4 ma5(clk24, rc_a[2], rc_a[2], rc_a[2], rc_a[2], bmix[4]);
...
// pipelined mix = a + b + c + d in 3 stages
// input components are bgr233, output mix is bgr555
// s1 = a + b
// s2 = s1 + c
// s3 = s2 + d + 1
module pipmix4(input clk,
               input [7:0] a, input [7:0] b, input [7:0] c, input [7:0] d,
               output [14:0] mix);
reg [4:0] rp [2:0];
reg [4:0] gp [2:0];
reg [4:0] bp [2:0];
reg [7:0] aq [1:0];
reg [7:0] bq [1:0];
reg [7:0] cq [1:0];
reg [7:0] dq [1:0];
always @(posedge clk)
begin
    aq[1] <= aq[0]; aq[0] <= a;
    bq[1] <= bq[0]; bq[0] <= b;
    cq[1] <= cq[0]; cq[0] <= c;
    dq[1] <= dq[0]; dq[0] <= d;
    rp[0] <= a[2:0] + b[2:0];             // stage 0
    rp[1] <= rp[0] + cq[0][2:0];          // stage 1
    rp[2] <= rp[1] + dq[1][2:0] + 1'b1;   // stage 2
    gp[0] <= a[5:3] + b[5:3];
    gp[1] <= gp[0] + cq[0][5:3];
    gp[2] <= gp[1] + dq[1][5:3] + 1'b1;
    bp[0] <= a[7:6] + b[7:6];
    bp[1] <= bp[0] + cq[0][7:6];
    bp[2] <= bp[1] + dq[1][7:6] + 1'b1;
end
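A software reference model helps when checking the hardware against expected values. The C sketch below mirrors the unpipelined arithmetic of `pipmix4` (bgr233 inputs, per-channel 5-bit sums with the +1 rounding term) and the five tap patterns used by the `ma1`..`ma5` instances. How the three sums are packed into the 15-bit `mix` output is not shown in the excerpt, so the model returns the channels separately. The names `mix4` and `scale_3_to_5` are hypothetical.

```c
#include <stdint.h>

/* Per-channel sums; each fits in 5 bits like rp/gp/bp in the Verilog. */
typedef struct { uint8_t r, g, b; } mix_t;

/* Unpipelined equivalent of pipmix4: sum four bgr233 pixels per channel
   and add 1, as stage 2 of the hardware version does. */
mix_t mix4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    const uint8_t px[4] = { a, b, c, d };
    mix_t m = { 1, 1, 1 };                  /* the "+ 1'b1" term */
    for (int i = 0; i < 4; i++) {
        m.r += px[i] & 0x07;                /* bits [2:0] */
        m.g += (px[i] >> 3) & 0x07;         /* bits [5:3] */
        m.b += (px[i] >> 6) & 0x03;         /* bits [7:6] */
    }
    return m;
}

/* Source-pixel taps for the five output lines (ma1..ma5). Counting
   occurrences per source pixel gives 7 + 7 + 6 = 20 terms. */
static const uint8_t taps[5][4] = {
    { 0, 0, 0, 0 }, { 0, 0, 0, 1 }, { 1, 1, 1, 1 },
    { 1, 1, 2, 2 }, { 2, 2, 2, 2 },
};

/* Compute the five interpolated outputs from three source pixels. */
void scale_3_to_5(const uint8_t src[3], mix_t out[5])
{
    for (int k = 0; k < 5; k++)
        out[k] = mix4(src[taps[k][0]], src[taps[k][1]],
                      src[taps[k][2]], src[taps[k][3]]);
}
```

Feeding a white pixel (0xFF) into all four inputs gives r = g = 29 and b = 13, which shows that the 2-bit blue channel reaches a smaller maximum than red and green unless it is rescaled when the result is packed.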
Resolution:
One expansion that's a must for every Vector-06c is a 256K RAM disk, colloquially known as kvaz (quasi-disk). It's in fact a RAM expansion which presents itself as four 64K pages of stack-addressable RAM. Part of each 64K page can also be mapped as a 16K window in the screen area. A common improvement to this disk is the so-called Barkar extension, which allows opening an extra 2x 8K windows into the 64K address space, thus making 128K of 256K directly addressable.
A typical v06c would have one such attachment. However, with some craftsmanship, two kvazas can be used with one machine. They map to the same address space and are configured via separate I/O ports. There even exists a test program that can simultaneously probe 8x kvaz, making up a total of 2MB of RAM. Unfortunately I don't know of any software that would make use of such a vast amount of memory, except for the test itself. Note that even just zeroing 2MB is a pretty serious undertaking for an 8080-based computer.
With 2x32Mbit PSRAM available on the FPGA chip, it would be a crime not to support this feature (with a lot of room to spare), so here we go.
Although the last update already showed a lot of promise, the machine could not really get much past the initial boot screen. After a couple of days of juggling various priority combinations and delays, trying to squeeze in memory accesses with PSRAM clocked at 48MHz, I decided to finally give 72MHz a try. The first attempt worked but seemed flaky. After some trial and error I found a delay that seems to work well, at least mostly.
It's not ideal yet. Sometimes video RAM accesses don't make it in time, so this is subject to deeper scrutiny later. I have already found one program that fails terribly. And of course there's the eternal bane of all things Vector-06c: the palette register write delay. When it's off, it looks like this:
Gowin's flavour of Verilog is a bit weird, but after spending some very frustrating days hunting down the gotchas, I have the first signs of life. Behold, vector06cc on a Tang Nano 9K with an 800x480 RGB panel. The chip is GW1NR-LV9QN88PC6/I5.
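Since 8 × 256K = 2MB, the whole multi-kvaz configuration fits comfortably in the 8MB of PSRAM. One hypothetical way to lay it out is to concatenate the kvaz index, the 64K page and the CPU address into a single linear address above the main 64K RAM. The base offset, bit order and function name below are assumptions for illustration, not vector06cc's actual memory map.

```c
#include <stdint.h>

#define KVAZ_COUNT  8u                    /* up to 8 ramdisks, 256K each */
#define KVAZ_PAGES  4u                    /* 4 pages of 64K per ramdisk */
#define KVAZ_BASE   0x010000u             /* hypothetical: right after main RAM */
#define PSRAM_BYTES (8u * 1024u * 1024u)  /* 2 x 32 Mbit dies */

/* Linear PSRAM byte address for (kvaz, page, 16-bit CPU address).
   Hypothetical layout: KVAZ_BASE + {kvaz[2:0], page[1:0], addr[15:0]}. */
uint32_t kvaz_psram_addr(unsigned kvaz, unsigned page, uint16_t addr)
{
    return KVAZ_BASE
         + (((uint32_t)(kvaz & 7u) << 18)   /* 256K per ramdisk */
         |  ((uint32_t)(page & 3u) << 16)   /* 64K per page */
         |  addr);
}
```

The highest byte of the eighth kvaz lands at 0x20FFFF, so main RAM plus 2MB of ramdisks use a bit over a quarter of the PSRAM.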
Quirk 1: `default_nettype none
It's been customary for me to use this to avoid implicit inference of forgotten or mistyped nets. If the default net type is none, a signal without an explicit declaration is an error and you know instantly. In Gowin's flavour of Verilog, if you set the default nettype to none, you become obliged to also declare each input/output of a module explicitly as wire. This looks exceedingly ugly and makes zero sense. But the worst part is that it makes your project incompatible with GAO. So I had to say goodbye to this and be extra alert to the warnings.
Quirk 2: discarded assignments after implicit net instantiation
For example, if you have code like this:
mymodule inst(.in(important_wire), .out(something));
wire important_wire = a & b;
From Quartus's point of view, it's perfectly fine. But in Gowin this results in important_wire flapping in the breeze without a driver. Everything compiles but doesn't work, and you're frustrated for many hours.
Apparently what happens is this: the compiler encounters important_wire and implicitly instantiates the net as a wire. Then, when it finds the actual declaration with assignment, it treats it as a duplicate and discards not just the declaration but the assignment as well. The solution is to put the declaration with assignment before its use, like this:
wire important_wire = a & b;
mymodule inst(.in(important_wire), .out(something));
It also seems that a separate assign is safer than an assignment in the declaration, because it does not get discarded in a similar situation.
PSRAM/HyperRAM
There's some confusion about what kind of memory the different Tang Nano boards have. Sipeed's own documents don't do a good job of explaining the differences.
TL;DR Tang Nano 9K == PSRAM, Tang Nano 20K == SDRAM.
To go into a bit more detail, you need to look at the chip family specs, "GW1NR series of FPGA Products Data Sheet", DS117-3.0E, 9/25/20. Table 1-1 "Product Resources" may make you believe that the GW1NR-9 has both SDRAM and PSRAM. However, Table 1-2 "Package-Memory Combinations" on the next page shows that PSRAM and SDRAM are mutually exclusive. The QN88P package used in the Tang Nano 9K has 2x32Mbit PSRAM dies. According to obscure sources, the die seems to be a copy/paste of the W955D8MBYA.
I have the 9K, PSRAM.
A huge shout out to [Feng Zhou] for his PSRAM controller: https://github.com/zf3/psram-tang-nano-9k It was easy to tailor it to my needs with 32-bit reads.
I'm clocking PSRAM at 48MHz. A 16-bit read completes in 10 cycles. For the vector06cc video controller to make it in time, it needs to read 32-bit words. A 32-bit read is done in 11 cycles.
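The arithmetic behind the switch to 32-bit reads is straightforward: at 48MHz one cycle is about 20.8ns, so a 16-bit read takes about 208ns and a 32-bit read about 229ns, nearly doubling the bytes delivered per access for one extra cycle. The helpers below only reproduce that calculation from the cycle counts in the text; they assume back-to-back accesses and ignore refresh and arbitration overhead.

```c
/* Time for one PSRAM access, in nanoseconds. */
double access_ns(double clock_mhz, unsigned cycles)
{
    return cycles * 1000.0 / clock_mhz;
}

/* Throughput in MB/s (10^6 bytes per second) if accesses run back to back:
   (clock_mhz / cycles) accesses per microsecond, `bytes` bytes each. */
double throughput_mb_s(double clock_mhz, unsigned cycles, unsigned bytes)
{
    return bytes * clock_mhz / cycles;
}
```

At 48MHz this works out to about 9.6 MB/s for 16-bit reads versus about 17.5 MB/s for 32-bit reads.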