Preamble
vector06cc relied on existing CPU cores. For the main computer it was the good old T8080 by MikeJ, which I fixed in a few places back in the day to make it work as a proper 8080. For the floppy emulation I used 65c02 by Peter Wendrich. Both are excellent cores and they worked well. However, I started noticing instabilities when building my project for Tang Nano 9K. The problem is that the chip was over capacity, and Gowin tools start to behave unpredictably when this happens. For example, a perfectly working floppy subsystem stops reading the SD card when I add a sound chip. Or the main computer can't exit the reset state when I add a floppy. And it's different every time. It's impossible to make progress when things turn out like that. I needed to free up some floor space.
Disclaimer: I don't know how to properly count the size of synthesized cores, but Gowin Floor Planner shows a number next to each module which seems to represent occupied resources well. I'll call them simply "units".
As good as these cores are, they seemed pretty big on the floor: T8080 takes around 1800 units and 65c02 around 1700. Back in the day (around 2008) there weren't many alternatives. Today we have more choices.
Main CPU
As for the main CPU, there is a brilliant model of the kr580vm80a by Vslav, reverse engineered from die shots: https://github.com/1801BM1/vm80a. It has been proven accurate many times, so it was time for me to adopt it for vector06cc. vm80a is an interesting model because it follows the original design closely, so it needs a two-phase clock like the original. The changeover was relatively painless and the new core worked almost right away. The difference on the floor plan was amazing: 1800 units down to 860. This created some elbow room for the fitter to work, but I needed more.
Floppy CPU
vector06cc uses a second CPU to read floppy image files from an SD card and feed the data to a WD1793 floppy controller emulator. It also provides an OSD interface to select disk images and offers some control of the system.
The floppy CPU was originally the 65c02, which synthesizes to 1700 units. That is too fat for a helper CPU. First, I replaced it with Arlet's 6502 Verilog model. I had already done that before for https://github.com/svofski/vector06c-lesshadoks. The replacement wasn't all roses, but eventually I managed. This core compiles to around 1100 units, which is much better than 1700. Because this service CPU can be changed to anything, I looked around for other options. These days everyone is supposed to be a fan of RISC-V, but I found it difficult to justify on this occasion. RISC-V cores are perhaps compact compared to other modern 32-bit cores, but they tend to be overspecified for my task. SERV is a really interesting option, but I think it wouldn't perform great at my comfortable 24 MHz clock rate.
Code size was another consideration in choosing the floppy CPU, because BRAM is at a premium. 6502 is an amazing CPU maybe, but compiled C code for it is not the best. I quickly built my emulator code for each candidate CPU:
CPU (compiler) | Code size | Core | Notes |
6502 (cc65) | 14891 | Arlet's 6502 | baseline |
8080 (z88dk) | 16449 | light8080 | |
msp430 (gcc) | 9184 | neo430 | a clear winrar |
avr (gcc) | 10688 | avr_core_v14 | can't really remember which AVR core I tested |
zpu (gcc) | 14445 | zpu-avalanche | couldn't link but added up object code sizes |
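To make the comparison easier to read, here is a minimal sketch that normalizes the sizes from the table to the 6502 baseline. The numbers are copied from the table; the variable names are only for illustration.

```shell
# Code sizes copied from the table above, as cpu:size pairs.
sizes="6502:14891 8080:16449 msp430:9184 avr:10688 zpu:14445"
# The 6502 (cc65) build is the baseline.
base=14891

ratios=$(for entry in $sizes; do
  cpu=${entry%%:*}    # part before the colon
  size=${entry##*:}   # part after the colon
  # Print the CPU name, its code size and the size relative to the baseline.
  awk -v c="$cpu" -v s="$size" -v b="$base" \
    'BEGIN { printf "%-7s %5d  %.2f\n", c, s, s / b }'
done)

echo "$ratios"
```

A ratio below 1.00 means the code is smaller than the 6502 build. The msp430 row comes out at about 0.62, so the msp430 code is roughly 1.6 times denser than the 6502 code.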
light8080 is very compact, but it's not better than 6502 at running C code. avr is good, but the avr cores that I found were not smaller than 6502 and they looked intimidating. I had some hopes for ZPU, but the model that I found wasn't easy to use either and the code size didn't impress me at all. The compiler is also a bit difficult to set up.
msp430 code size impressed me. But most of all I was impressed by the size of the neo430 CPU core: it was just around 600 units! (* 632 in the current build) The only inconvenience is that the author decided to arrange it as a complete enclosed SOC, as if it were a microcontroller in an FPGA. This is probably great for some projects, but I needed a general purpose CPU. So I had to rip it out and reintegrate it into my own fabric. It wasn't as straightforward as it seemed at first. I do not really do VHDL except at gunpoint, and I don't understand some of the concepts used in this alien language. Fast forward a few days, and it turned out that I needed to write a new top-level entity for neo430 with signals that are somehow "resolved". Without that everything would be just swept away during synthesis.
So I managed to synthesize it, but it was still a long way to a working system. I had to replace basically everything in the floppy emulator in order to switch to the new 16-bit CPU, and I wanted to debug that in simulation first. The trouble is that I can't simulate Verilog and VHDL together. Fortunately, the great guys who write GHDL implemented a fantastic feature that lets you synthesize VHDL source into Verilog. The output is not exceptionally readable, but you can feed it to Icarus Verilog and it just works. Hoorj.
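#!/bin/sh
# Convert the neo430 CPU from VHDL to Verilog using GHDL synthesis.
set -x
PATH=/opt/ghdl/bin:$PATH
# -f: don't complain when the files don't exist yet (first run)
rm -f neo430-obj93.cf neo430_cpu.v
NEOSRC="core/neo430_package.vhd core/neo430_addr_gen.vhd core/neo430_alu.vhd core/neo430_control.vhd core/neo430_cpu.vhd core/neo430_reg_file.vhd "
# Analyze the original neo430 sources into the neo430 library
for f in $NEOSRC ; do
ghdl -a --work=neo430 --std=08 "$f"
done
# Analyze the custom top-level entity, then synthesize it to Verilog
ghdl -a --work=neo430 --std=08 neo430_cpu_std_logic.vhd
ghdl synth --std=08 --work=neo430 --out=verilog neo430_cpu_std_logic > neo430_cpu.v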
Note that Ubuntu seems to have in its repositories some prehistoric version of GHDL which can't do any of this.
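For completeness, here is a rough sketch of how the generated neo430_cpu.v could then be run under Icarus Verilog. The testbench name tb_floppy.v and the output name floppy_sim.vvp are hypothetical placeholders, not files from the project; only iverilog and vvp themselves are real tools.

```shell
# Hypothetical file names: tb_floppy.v stands in for whatever testbench
# instantiates the CPU; it is not part of the original project.
GENERATED=neo430_cpu.v
TESTBENCH=tb_floppy.v
SIM_OUT=floppy_sim.vvp

run_sim() {
  # Bail out early if Icarus Verilog is not installed.
  command -v iverilog >/dev/null 2>&1 || { echo "iverilog not found"; return 127; }
  command -v vvp >/dev/null 2>&1 || { echo "vvp not found"; return 127; }
  # Both inputs must exist before compiling.
  for f in "$GENERATED" "$TESTBENCH"; do
    [ -f "$f" ] || { echo "missing $f"; return 1; }
  done
  # Compile the GHDL output together with the testbench, then run it.
  iverilog -o "$SIM_OUT" "$GENERATED" "$TESTBENCH" && vvp "$SIM_OUT"
}

run_sim || echo "simulation skipped or failed"
```

The checks up front are there only so that the sketch fails with a readable message instead of a compiler error when a tool or a file is missing.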
Fortunately, this worked. And I was able to debug the basic system to move on to FPGA.
Unfortunately, Gowin's idea of Verilog sucks (it's not the first time, I had problems with Arlet's 6502 too) and it sees assignment loops where there are none. So this Verilog model can't work in Gowin.
Fortunately, Gowin can still mix VHDL with Verilog. So after simulating some, I could use the original neo430 (with my custom top-level entity) in the project.
I had to change the RAMs and the OSD RAM; basically everything in the floppy subsystem and in the OSD was updated to match this fancy new CPU. And I'm really happy with the result, because the shiny new floppy subsystem takes only 1489 units, of which the CPU is just 632! The amount of used BRAM can also be reduced now because the code is 1.6x more dense than the original 6502 code. Crazy.
To sum it up, I shaved down around 2100 units by using more compact cores.
The real result
The really useful result is that the entire project now synthesizes reliably (cautiously speaking), and the floppy emulator and sound subsystems work together. I no longer see any unexplained behaviour.
What's next
My project is still completely mute. My options are to make a sigma-delta DAC output, or to buy a little board with an I2S sound codec, which would also work as an input.