RetroChallenge 2017/04 project to create a TMS9900-compatible CPU core. Again in a month... Failure could be an option...
If you are interested, I have created a Verilog version of the TI-99/4A. It can be synthesised with the open source IceStorm toolchain and supports the cool ULX3S board.
Very much work in progress, but due to some interest I decided to release the code in its current version. Please see:
Happy New Year!
I've been preoccupied with other stuff, but found a little time to work on my TI-99/4A clone. One of the things I've wanted to do for a long time is to understand how cache memories work, so I created one for this system. My update at GitHub explains the details, but in short there is now a 1K-byte combined code and data cache, and system performance is up by 22%.
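For readers who want to see the principle in code, here is a minimal Python model of a 1K-byte cache. The organisation is an assumption chosen for illustration, not necessarily the real design described in the GitHub update: direct-mapped, 16-byte lines, write-through, with no allocation on write misses. The loop at the end mimics repeated accesses to a 32-byte workspace, for example one placed in the TI-99/4A scratchpad at >8300.

```python
# Illustrative direct-mapped cache model. The organisation is an
# assumption: 1K bytes total, 16-byte lines, write-through, and no
# allocation on write misses.
CACHE_BYTES = 1024
LINE_BYTES = 16
NUM_LINES = CACHE_BYTES // LINE_BYTES  # 64 lines


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory                   # backing store (bytearray)
        self.tags = [None] * NUM_LINES         # None marks an invalid line
        self.lines = [bytearray(LINE_BYTES) for _ in range(NUM_LINES)]
        self.hits = 0
        self.misses = 0

    @staticmethod
    def split(addr):
        """Split a byte address into (tag, line index, offset in line)."""
        offset = addr % LINE_BYTES
        index = (addr // LINE_BYTES) % NUM_LINES
        tag = addr // CACHE_BYTES
        return tag, index, offset

    def read_byte(self, addr):
        tag, index, offset = self.split(addr)
        if self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fill the whole line from main memory.
            self.misses += 1
            base = addr - offset
            self.lines[index][:] = self.memory[base:base + LINE_BYTES]
            self.tags[index] = tag
        return self.lines[index][offset]

    def write_byte(self, addr, value):
        # Write-through: main memory is always updated; the cached copy
        # is updated only if the line is already present.
        self.memory[addr] = value
        tag, index, offset = self.split(addr)
        if self.tags[index] == tag:
            self.lines[index][offset] = value


memory = bytearray(range(256)) * 256          # 64K of test data
cache = DirectMappedCache(memory)
for _ in range(3):                            # three passes over 32 bytes
    for addr in range(0x8300, 0x8320):
        cache.read_byte(addr)
print(cache.hits, cache.misses)               # 94 hits, 2 misses
```

Only the first touch of each of the two 16-byte lines goes to main memory. The other 94 reads are served from the cache.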
Adding one cache is cool, but having more than one is better, so more are coming - stay tuned. The cache is really the key enabler for increasing concurrency within the CPU, since caches allow multiple memory accesses to be processed simultaneously. The TMS9900 is a very memory-intensive processor, as its architectural registers (the so-called workspace) are actually located in main memory. Thus it benefits from cache memories perhaps even more than many other systems.
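To make the workspace point concrete: the workspace pointer (WP) holds the address of sixteen 16-bit registers in memory. Register Rn therefore lives at address WP + 2n, stored big-endian. Even a register-to-register instruction such as A R1,R2 reads R1 and R2 from memory and writes R2 back, on top of the instruction fetch. Below is a small Python sketch of the mapping. The helper names are made up for illustration.

```python
def register_address(wp, n):
    """Memory address of workspace register Rn on the TMS9900."""
    if not 0 <= n <= 15:
        raise ValueError("the TMS9900 has registers R0..R15")
    return (wp + 2 * n) & 0xFFFF


def read_register(memory, wp, n):
    """Read Rn as a big-endian 16-bit word from byte-addressed memory."""
    addr = register_address(wp, n)
    return (memory[addr] << 8) | memory[addr + 1]


def write_register(memory, wp, n, value):
    """Write Rn as a big-endian 16-bit word."""
    addr = register_address(wp, n)
    memory[addr] = (value >> 8) & 0xFF
    memory[addr + 1] = value & 0xFF


memory = bytearray(0x10000)
wp = 0x8300                          # example workspace location
write_register(memory, wp, 2, 0x1234)
print(hex(register_address(wp, 2)), hex(read_register(memory, wp, 2)))  # 0x8304 0x1234
```

A side effect of this design is that changing WP, for example with BLWP, switches to a complete new register set without copying any data.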
Even with only the simple system-level cache that I created, there are now many more opportunities to optimise the CPU's execution, as memory reads no longer dominate execution time as much as they did in the past.
Already the second project update of the day! I added way more test cases to cover more instructions.
Now testing includes instructions ANDI, CB, SB, AB, XOR, INC, DEC, SLA, SRA, SRC, MOV, MOVB, SOCB, SZCB and X instruction comparisons in addition to the earlier tests for A, S, SOC, SZC, DIV, MPY, C, NEG and SRL instructions.
These were good additions, as I found two more bugs with CPU flags: the CB instruction did not set parity correctly, and the ABS instruction did not set the overflow flag at all. I fixed both. Interestingly, the CB instruction sets parity according to the source data byte instead of the ALU subtract output, so that needed special casing. I suspect the original TMS9900 has only one parity generation circuit and samples it at a different time; I simply added a second parity calculation.
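Here is a small Python reference model of the two fixes described above. The function names are hypothetical. ST5 (odd parity) is set when a byte contains an odd number of 1 bits, and for CB it is computed from the source byte. For ABS, the only 16-bit value whose absolute value does not fit is >8000 (-32768), so that is the case where ST4 should be set.

```python
def odd_parity(byte):
    """ST5 (OP): 1 if the byte has an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") & 1


def cb_parity(src_byte, dst_byte):
    """CB: parity follows the source byte; dst_byte does not affect it."""
    return odd_parity(src_byte)


def cb_parity_from_alu(src_byte, dst_byte):
    """The mistaken variant: parity of the internal subtraction result."""
    return odd_parity((src_byte - dst_byte) & 0xFF)


def abs_overflow(src_word):
    """ABS: ST4 is set only for >8000, whose negation does not fit."""
    return (src_word & 0xFFFF) == 0x8000


# Source byte >07 has three 1 bits, while >07 - >01 = >06 has two,
# so the two parity calculations disagree for these operands.
print(cb_parity(0x07, 0x01), cb_parity_from_alu(0x07, 0x01))  # 1 0
```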
After fixing these two bugs, the problem I had has disappeared: PRINT 1*-1 now returns -1. I suspect it was the ABS fix that helped.
I guess these fixes mean I need more test cases, since I am sure there are more bugs.
After remembering where I was in the project, I started to look for bugs in my CPU. I know it does not fully work; for example, in TI BASIC I get:
PRINT 1*-1
1
So something is not working in the CPU. In order to work on this, I took advantage of my previous design, which combines a real TMS99105 CPU with my FPGA implementation of the TI-99/4A. Running the above on it yields the correct result, -1.
So I wrote a piece of test code and ran it both on the FPGA CPU and on the TMS99105 while capturing the results (by dumping a section of memory on both systems to a file). Below is the comparison:
The left-hand side is the TMS99105 and the right-hand side is my soft CPU, i.e. the FPGA TMS9900 core. Each instruction is tested 8 times with different data. The source code of the test follows the explanation below.
Each test record (one instruction run with one data set) takes 8 bytes, or 4 words: R1, R2, R3 and the flags. The last word holds the flags (only the top 6 bits are preserved). The instruction is always executed as SUB R1,R2, so R2 holds the result and R1 shows the source operand. R3 is cleared before each run, so it only changes for instructions that write it, such as DIV and MPY.
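The comparison can also be done offline with a few lines of Python. This sketch assumes each dump is a raw binary file holding the result table as big-endian 16-bit words, four per record (R1, R2, R3, flags). The file format and helper names are assumptions. The flag names follow the TMS9900 status register, ST0 to ST5 from the most significant bit: L>, A>, EQ, C, OV, OP.

```python
import struct

FLAG_NAMES = ["L>", "A>", "EQ", "C", "OV", "OP"]   # ST0..ST5, MSB first
RECORD_BYTES = 8                                    # R1, R2, R3, flags


def records(dump):
    """Yield (offset, (r1, r2, r3, flags)) for each complete 8-byte record."""
    for off in range(0, len(dump) - RECORD_BYTES + 1, RECORD_BYTES):
        yield off, struct.unpack(">4H", dump[off:off + RECORD_BYTES])


def flag_names(st):
    """Names of the status bits that are set in a saved status word."""
    return [name for i, name in enumerate(FLAG_NAMES) if st & (0x8000 >> i)]


def compare(real, soft):
    """List (offset, real_record, soft_record) for every differing record."""
    return [(off, a, b)
            for (off, a), (_, b) in zip(records(real), records(soft))
            if a != b]


# Typical use, with hypothetical file names:
#   real = open("tms99105.bin", "rb").read()
#   soft = open("softcpu.bin", "rb").read()
#   for off, a, b in compare(real, soft):
#       print(f">{off:04X}", a, b, flag_names(a[3]), flag_names(b[3]))
print(flag_names(0x2000), flag_names(0x2800))   # ['EQ'] ['EQ', 'OV']
```

Decoding the two status words from the SUB >7FFF,1 case shows that the only differing bit is ST4 (OV).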
Given this record layout, we can see that there is a difference in the second instruction under test, which is the subtract instruction.
The first subtraction works fine (SUB 1,2, i.e. 2-1), but the second (SUB >7FFF,1) has a difference in the flags: the soft CPU has >2800 while the real CPU has >2000.
The flag that differs is ST4, i.e. the overflow flag. In the other SUB cases the overflow flag is also sometimes bogus. Here is a table of the eight cases:
SUB >1,>2 OK (note that with the TMS9900 this actually is the 2-1 operation)
SUB >7FFF,1 Bug: soft-cpu asserts overflow
SUB >8000,>7FFF Bug: soft-cpu does not assert overflow
SUB >7FFF,>8000 Bug: soft-cpu does not assert overflow
SUB >FFFF,>8000 OK
SUB >8000,>FFFF Bug: soft-cpu asserts overflow
SUB >8000,>8000 Bug: soft-cpu asserts overflow
SUB >0,>8000 OK
Looking at the data sheet carefully, there is a difference in how ST4 (overflow) is asserted for adds and subtracts: the first condition is inverted. I bet I don't do this.
Adds: If MSB(SA) == MSB(DA) and MSB of result != MSB(DA)
Subs: If MSB(SA) != MSB(DA) and MSB of result != MSB(DA)
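Here is a quick Python check of that hypothesis. It is a behavioural model, not the VHDL itself. It applies the correct subtract rule and, as a stand-in for the suspected bug, the add rule to the eight pairs from the table above. Each pair is (source, destination) as loaded into R1 and R2, and the instruction computes destination minus source.

```python
# The eight (source, destination) pairs from the table above, i.e. the
# values loaded into R1 and R2; the instruction computes destination - source.
CASES = [(0x0001, 0x0002), (0x7FFF, 0x0001), (0x8000, 0x7FFF), (0x7FFF, 0x8000),
         (0xFFFF, 0x8000), (0x8000, 0xFFFF), (0x8000, 0x8000), (0x0000, 0x8000)]


def msb(word):
    return (word >> 15) & 1


def sub_overflow(sa, da):
    """ST4 for subtract: MSB(SA) != MSB(DA) and MSB(result) != MSB(DA)."""
    result = (da - sa) & 0xFFFF
    return msb(sa) != msb(da) and msb(result) != msb(da)


def add_rule_on_sub(sa, da):
    """Suspected bug: the add rule (MSB(SA) == MSB(DA)) used for a subtract."""
    result = (da - sa) & 0xFFFF
    return msb(sa) == msb(da) and msb(result) != msb(da)


for sa, da in CASES:
    real, soft = sub_overflow(sa, da), add_rule_on_sub(sa, da)
    verdict = "OK" if real == soft else "mismatch"
    print(f"SUB >{sa:04X},>{da:04X}  real OV={int(real)}  soft OV={int(soft)}  {verdict}")
```

Only the five cases marked as bugs come out as mismatches, and in the same direction. The add rule asserts overflow for the second, sixth and seventh pairs and misses it for the third and fourth. That supports the suspicion that the soft CPU uses the add rule for subtracts.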
The only other difference with this data is at >0120, and here the result is wrong (but flags ok). Since R3 has changed, it must be a DIV or MPY instruction.
The first instruction's test output is at >00..>3F, the 2nd at >40..>7F, the 3rd at >80..>BF, the 4th at >C0..>FF and the 5th at >100..>13F (offsets from the start of the result table). So it is the fifth instruction.
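The same arithmetic fits in a tiny Python helper, shown here for illustration (the helper is hypothetical). Each instruction test occupies >40 bytes (8 data sets of 8 bytes), so an offset into the result table splits into a test number and a data set number.

```python
TEST_BYTES = 0x40     # 8 data sets x 8 bytes per instruction test
RECORD_BYTES = 8      # R1, R2, R3 and flags, one word each


def locate(offset):
    """Map an offset into the result table to (test, data set), both 1-based."""
    test = offset // TEST_BYTES + 1
    data_set = (offset % TEST_BYTES) // RECORD_BYTES + 1
    return test, data_set


print(locate(0x120))   # (5, 5): fifth instruction, fifth data set
```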
And indeed it is the DIV instruction, as PNR reported. At least one of my test cases now catches the problem: it is the fifth data set, which from above is
DIV >FFFF,>8000, i.e. the 32-bit value >80000000 (R2:R3, with R3 cleared by the test code) divided by >FFFF. The result should be >8000 as the quotient and >8000 as the remainder.
But my code gives >FFFE as quotient and >FFFE as remainder too.
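For reference, here is a Python model of DIV based on the data sheet behaviour. The destination register and the one after it form a 32-bit unsigned dividend, which is divided by the 16-bit unsigned source. If the source is less than or equal to the high dividend word, the quotient cannot fit in 16 bits. In that case ST4 (overflow) is set and the registers are left unchanged, which also covers division by zero. The function name and interface are hypothetical.

```python
def tms9900_div(divisor, high, low):
    """Model of DIV S,D: (D:D+1) / S, all operands unsigned 16-bit.

    Returns (new_D, new_D_plus_1, overflow).
    """
    divisor &= 0xFFFF
    high &= 0xFFFF
    low &= 0xFFFF
    if divisor <= high:
        # Quotient would not fit in 16 bits (also catches divisor == 0):
        # set ST4 and leave the destination registers unchanged.
        return high, low, True
    quotient, remainder = divmod((high << 16) | low, divisor)
    return quotient, remainder, False


# The failing test: R1 = >FFFF (source), R2 = >8000, R3 = 0 (cleared by CLR R3)
q, r, ov = tms9900_div(0xFFFF, 0x8000, 0x0000)
print(f"quotient >{q:04X}, remainder >{r:04X}, overflow {ov}")
```

This confirms the expected values. Since >FFFF x >8000 = >7FFF8000, the remainder is >80000000 - >7FFF8000 = >8000.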
Here is the TMS9900 assembler test code (story continues after the listing):
; EP 2018-09-23 - run through a sequence of instructions with data and write
; results to RAM. This is to enable comparing the FPGA CPU and TMS9900.
LI R5,>2000 ; point to result table
LI R7,TEST_ROUTINES ; point to test routines
RUN_TEST
MOV *R7+,R8 ; address of routine to test
LI R6,TEST_DATA_SEQ
!
MOV *R6+,R1 ; fetch test parameters
MOV *R6+,R2
CLR R3
; perform operation under test
BL *R8
; save results
MOV R1,*R5+
MOV R2,*R5+
MOV R3,*R5+
STST R3
ANDI R3,>FC00 ; only keep meaningful flags
MOV R3,*R5+
CI R6,TEST_DEND
JNE -!
CI R7,TEST_ROUT_END
JNE RUN_TEST
; write end marker to memory
LI R3,>1234
MOV R3,*R5+
MOV R3,*R5+
MOV R3,*R5+
MOV R3,*R5+
And here is the data:
TEST_DATA_SEQ ; Parameters to pass to various instructions
DATA 1,2 ; First data set
DATA >7FFF,1 ; 2nd
DATA >8000,>7FFF
DATA >7FFF,>8000
DATA >FFFF,>8000 ; 5th
DATA >8000,>FFFF
DATA >8000,>8000...
I wrote the following as comments for the GitHub commit I just made (formatted better here). I should also mention that there are four branches at GitHub; the master and soft-cpu-tms9902 branches are currently the ones I checked and/or worked with today.
Commit 2018-09-22:
Good to be back with the project!
A quick addition of the day - this one was really easy to do, as interfacing a normal TI keyboard to the FPGA is way easier than communicating with the PC's keyboard through USB and the server process.
The implementation quite literally only involved bringing out the keyboard row/column wires from my TMS9901 interface chip implementation inside the FPGA. Thanks to the FPGA's internal pull-ups, no external active or passive components are needed beyond the keyboard switches themselves.
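A behavioural sketch of a scanned key matrix shows why no extra parts are needed. It assumes the usual active-low arrangement. The scanning logic drives the selected column low and every row input has a pull-up, so a row reads 0 only when a pressed key connects it to the selected column. The matrix size and function names are illustrative assumptions, not details of the TI keyboard or the TMS9901 implementation.

```python
ROWS, COLS = 8, 6      # illustrative matrix size, not taken from the design


def scan_column(pressed, column):
    """Read the row inputs while `column` is driven low.

    pressed is a set of (row, column) pairs for keys held down. Pull-ups
    keep every row input at 1 unless a closed switch connects it to the
    selected (low) column. Returns the row inputs as a bit mask.
    """
    value = (1 << ROWS) - 1                  # all rows pulled high
    for row in range(ROWS):
        if (row, column) in pressed:
            value &= ~(1 << row)             # closed switch pulls the row low
    return value


def scan_keyboard(pressed):
    """Scan all columns in turn and return the keys that read as pressed.

    Ghosting, which can appear with three or more keys held down in a
    matrix without diodes, is not modelled.
    """
    found = []
    for column in range(COLS):
        rows = scan_column(pressed, column)
        found += [(row, column) for row in range(ROWS) if not rows & (1 << row)]
    return found


print(hex(scan_column({(2, 1)}, 1)), scan_keyboard({(2, 1), (7, 5)}))
# 0xfb [(2, 1), (7, 5)]
```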
An update at long last!
The next step for the design is to make the FPGA system stand-alone, i.e. able to boot and operate without a host PC. A USB connection will still be needed, but only to provide power. Today I implemented a new feature: after reset, the FPGA logic loads 256K of data from the SPI flash ROM into the system's SRAM. This lets the system get the TI-99/4A system ROMs and GROMs into the static RAM at the appropriate places. After the download, one of the DIP switches controls the CPU's automatic boot: if switch zero is set, the CPU in the FPGA automatically boots and starts executing the code that was transferred to SRAM.
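Below is a behavioural sketch of the reset-time boot sequence, written in Python rather than VHDL. Several details are assumptions, not taken from the actual design:

- the flash is read with the common SPI NOR READ command (0x03) followed by a 24-bit address;
- the whole 256K image is read in one continuous burst;
- the function and parameter names are hypothetical.

```python
IMAGE_BYTES = 256 * 1024    # size of the image copied at reset
SPI_READ = 0x03             # common SPI NOR "read data" opcode (assumed)


def read_command(address):
    """Opcode plus 24-bit big-endian address that starts a continuous read."""
    return bytes([SPI_READ, (address >> 16) & 0xFF,
                  (address >> 8) & 0xFF, address & 0xFF])


def boot_after_reset(flash, sram, dip_switches, flash_base=0, sram_base=0):
    """Behavioural model of the reset-time loader.

    flash and sram stand in for the two memories (sram must be mutable).
    In hardware the loader would shift out read_command(flash_base) and
    then clock in IMAGE_BYTES bytes; here the transfer is a slice copy.
    Returns True if the CPU should start running, i.e. DIP switch 0 is set.
    """
    sram[sram_base:sram_base + IMAGE_BYTES] = flash[flash_base:flash_base + IMAGE_BYTES]
    return bool(dip_switches & 0x01)


flash = bytes(range(256)) * 4096            # 1M of example flash contents
sram = bytearray(1024 * 1024)               # 1M of SRAM, as on the Pepino
print(read_command(0).hex(), boot_after_reset(flash, sram, dip_switches=0b0001))
# 03000000 True
```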
The 256K of data is divided into three regions:
The Pepino board has 1M of static RAM overall. I had forgotten that the board actually has 16 megabytes of SPI flash storage, so there is plenty of potential here.
The design of the SPI flash interface is from Magnus Karlsson, the designer of the Pepino FPGA board. I used the code from his Mac Plus example and modified it for my purposes. His code is written in Verilog while mine is in VHDL, so I wrote a standard VHDL component declaration that lets me instantiate the Verilog code from VHDL.
I pushed an update to my TMS9918 VHDL core to GitHub, adding support for the undocumented but fairly widely known and used graphics mode 2 masking features. The lack of this feature was the reason the megademo (see my previous update) systematically failed to work properly on quite a few screens.
With these fixes the megademo works much better, but there are still some problems. For example, the demo gets stuck at a certain point after running successfully through quite a few demo phases: the CPU core continues to run, but it appears to be caught in some kind of loop that it cannot escape. So as always, fixing some bugs means it's time to fix the next bugs...
The character masking feature appears in two places in the VHDL code, which use the low bits of registers 4 and 3 as character cell masks. The example below illustrates the use of register 4 during the character cell address calculation in graphics mode 2:
-- Graphics mode 2. 768 unique characters are possible.
-- Implement UNDOCUMENTED FEATURE: bits 1 and 0 of reg4 act as bit masks for the two
-- MSBs of the 10 bit char code. This allows character set to be limited even in this mode.
vram_out_addr <= reg4(2)                                    -- MSB of the address
    & (char_addr(9 downto 8) and reg4(1 downto 0))          -- Character code with masks for bits 9 and 8
    & char_code & ypos(2 downto 0);                         -- 8 bit code and line in character
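The same address calculation can be written as a Python sketch, which makes the effect of the masks easy to try out. It mirrors the VHDL above bit for bit. The function name is hypothetical. char_addr is assumed to be the 10-bit character position, whose bits 9 and 8 select one of the three screen thirds.

```python
def pattern_address(reg4, char_addr, char_code, ypos):
    """14-bit VRAM address of one pattern line in graphics mode 2.

    reg4      VDP register 4: bit 2 is the address MSB, bits 1..0 mask
              bits 9..8 of char_addr (the undocumented feature)
    char_addr 10-bit character position (bits 9..8 = screen third, assumed)
    char_code 8-bit name table entry
    ypos      pixel row; bits 2..0 give the line within the character
    """
    msb = (reg4 >> 2) & 0b1
    third = ((char_addr >> 8) & 0b11) & (reg4 & 0b11)
    return (msb << 13) | (third << 11) | ((char_code & 0xFF) << 3) | (ypos & 0b111)


# Character >41 in the bottom third (position 512), pattern line 5:
print(hex(pattern_address(0b111, 512, 0x41, 5)))   # masks open: 0x320d
print(hex(pattern_address(0b100, 512, 0x41, 5)))   # masks clear: 0x220d
```

With both mask bits clear, the bottom third fetches its patterns from the same place as the top third. That is how software can limit the character set even in this mode.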
This project has the potential for all of us TI 99/4A types. I am not technically anywhere near your league, but find this project very interesting. I saw your posts in Atari Age. I'm x24b over there, and here. Let me introduce myself, I'm Robert Webb.
The potential for a follow-on machine to the 4A using this base is important, if that is a goal in the back of your mind. The now-funded Kickstarter "ZX Spectrum NEXT" is a fascinating project breathing life back into a 30-year-old product line. Maybe we have our own version, right here.
Thanks for your interest and motivation in one of my favorite machines. If nothing else, I'll follow your work here and smile every time I check in.
Best of Luck.