-
Looking at porting the 4 Bit CPU to an FPGA
02/11/2023 at 12:08
A Virtual PCB
At least that is how I explained it to my partner!
It seems as if FPGA boards are pretty rare. Almost all the main suppliers are out of stock, but I did find a Tang Nano 9k on eBay in Australia, so that is what I will use.
Gowin (the FPGA manufacturer) has a free, licence-free educational version of the IDE for the Tang Nano (only 382 MB), which works except for the bit stream uploader. Checking the Internet, for Linux there is no solution except for the third party programmer: openFPGALoader. I found instructions to compile openFPGALoader and it works fine. Of note, openFPGALoader's command line arguments are human readable.
So next is to learn Verilog, but I will cheat and download a 74xxx library. One thing I noted is that the list of implemented logic chips avoids those with tri-state outputs.
Checking the Internet, it is strongly recommended not to use tri-state outputs as FPGA capacity to model them is quite limited. So okay I will redesign the 4 Bit CPU to use multiplexers:
I changed the opcodes to suit the multiplexer, here is the control unit:
I will have to swap the 74173 with a 74377 reduced to 4 bits.
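As a rough illustration of the multiplexer approach, here is a minimal C sketch. The four source names and their selector values are my own assumptions for the example, not the actual encoding used in the design. The point is only that a selector picks exactly one source, so nothing ever has to be switched to high impedance.

```c
#include <stdint.h>

/* Hypothetical selector values; the real design's encoding may differ. */
enum bus_source { SRC_ALU = 0, SRC_RAM = 1, SRC_ROM = 2, SRC_INPUT = 3 };

/* A 4-to-1 multiplexer for 4-bit values: exactly one source reaches the
   output, so no tri-state buffers are needed to share the bus. */
uint8_t mux4(enum bus_source sel, uint8_t alu, uint8_t ram,
             uint8_t rom, uint8_t input)
{
    switch (sel) {
    case SRC_ALU: return alu & 0x0F;
    case SRC_RAM: return ram & 0x0F;
    case SRC_ROM: return rom & 0x0F;
    default:      return input & 0x0F;
    }
}
```

For example, mux4(SRC_RAM, 1, 2, 3, 4) passes the RAM value 2 through and ignores the other three inputs.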
I will have to study the Gowin IP with regard to BROM and BRAM.
74xxx Verilog Code
Although I am not familiar with the syntax of Verilog, the 74xxx code is very easy to understand.
There appear to be options with regard to ROM/RAM: you can roll your own or use the specialised on-chip RAM. I have to review:
- Shadow SRAM (17280 bits)
- Block SRAM (468k bits in 26 blocks)
- PSRAM (64M bits)
- Flip-Flops (6480, roll your own RAM)
- and set up ROM
TBC ...
AlanX
-
Waiting on a Part
01/28/2023 at 01:10
PCB Assembly
I have started assembly of the PCBs but I am waiting on a part.
Instruction Set
I have had time to think about improving the instruction set.
The idea came from the number of steps required to swap out the accumulator to memory. Better if I have a register and a swap opcode.
The idea actually frees up the opcode space as I can use the opcode "data" to specify the register and the registers can serve other purposes (e.g. the page register).
I need to look at how the memory data flow works as well.
Rebuilding the Micro-Architecture
I took the well published academic RISC micro-architecture:
And derived the CHUMP micro-architecture:
But along the way I saw an alternative memory arrangement:
This configuration has three benefits:
- The SRAM write logic is simplified (perhaps I over designed the write logic in the first place).
- The opcode logic is unchanged.
- The accumulator data can be written to SRAM in the same instruction cycle (CHUMP does it on the next instruction cycle).
I have tested the new micro-architecture in LogiSim and it works fine.
Adding Registers to the CPU
This allows the inclusion of register read/write logic:
In the above drawing I will probably keep the old PC and JNC logic.
Now I can replace the Page opcode with a register(s) read/write opcode.
Working LogiSim Version 7
I have been working on Version 7; it now has register read/write opcodes. Added four registers of the eight available slots:
- Write Page/(no page read)
- Write Output/Read Input
- Write Reg A/Read Reg A
- Write Reg B/Read Reg B
Swapped the Page opcodes (Ex/Fx) with JNC opcodes (8x/9x).
Replaced the new 8x/9x opcodes with 8r/9r, where r is a register constant or memory reference:
- Page: W/X = 0/4
- I/O: W/R = 1/5
- Reg A: W/R = 2/6
- Reg B: W/R = 3/7
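The write/read pairs listed above (0/4, 1/5, 2/6, 3/7) suggest that one bit of r selects read versus write and the other two select the slot. Here is a minimal C sketch of that decoding. The struct, the function name and the bit assignment are my assumptions drawn from the pairs in the list, not taken from the actual control logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot order follows the register list: Page, I/O, Reg A, Reg B. */
enum reg_slot { SLOT_PAGE = 0, SLOT_IO = 1, SLOT_REG_A = 2, SLOT_REG_B = 3 };

struct reg_op {
    bool is_read;        /* true for codes 4..7, false for codes 0..3 */
    enum reg_slot slot;  /* which of the four registers is addressed */
};

/* Decode the 3-bit register field r of an 8r/9r opcode (assumed layout). */
struct reg_op decode_reg_field(uint8_t r)
{
    struct reg_op op;
    op.is_read = (r & 0x4) != 0;
    op.slot = (enum reg_slot)(r & 0x3);
    return op;
}
```

With this layout, r = 6 decodes as a read of Reg A and r = 1 as a write to the I/O slot.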
Here is the Top Level:
The Control Unit:
The ALU:
And the PC:
Overall a pretty significant improvement on CHUMP V5 and the 4 Bit CPU V6.
Here is the test code (the multiply algorithm):
This algorithm uses the new JNC and REGs opcodes, and multiplies F x E (15 x 14); the result is D2 (210).
Parts have Arrived
The parts have arrived. Finished off one of the Diode ROM boards, next is the CPU board:
TBC ...
AlanX
-
Version 6
12/26/2022 at 10:28
Version 6 is like version 5 except the expanded ALU is not used. Here are the op codes:
I wrote up an 8 bit multiplication routine, first in C:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

unsigned short mul(unsigned char A, unsigned char B)
{
    // Returns: res = A * B
    unsigned short res = 0;
    unsigned char i = 8;            // 8 bit
LOOP:
    res = res + res;                // Shift the result left
    if (A >= 0x80) {                // MSB of A set?
        res = res + B;
    }
    A = A + A;                      // Shift A left
    i = i - 1;
    if (i > 0) goto LOOP;
    return res;
}

int main(void)
{
    unsigned char A, B;
    unsigned short M;
    int i, j;
    for (i = 0; i <= 255; i++) {
        for (j = 0; j <= 255; j++) {
            A = (unsigned char)i;
            B = (unsigned char)j;
            M = mul(A, B);
            // Print only the mismatches
            if (i * j != M) printf("%6d %6d\n", i * j, M);
        }
    }
    return 0;
}
I tested all cases so I know it works. I then dumbed it down to 4 bits:
#include <stdio.h>

unsigned short mult(unsigned char A, unsigned char B)
{
    // Returns: C = A * B
    unsigned short C = 0;
    unsigned char D = 4;            // 4 bit
LOOP:
    C = (C + C) & 0x0F;             // 4 bit adjustment
    if (A >= 8) {                   // MSB of 4 bit
        C = (C + B) & 0x0F;         // 4 bit adjustment
    }
    A = (A + A) & 0x0F;             // 4 bit adjustment
    D = (D - 1) & 0x0F;             // 4 bit adjustment
    if (D > 0) goto LOOP;
    return C;
}

int main(void)
{
    unsigned char A, B;
    unsigned short C;
    A = (unsigned char)5;
    B = (unsigned char)3;
    C = mult(A, B);
    printf("C=5*3 %d\n", C);
    return 0;
}
Although the result variable (C) is wide enough to hold overflow into its high order bits, this 4 bit implementation masks it off, so I have not considered overflow of C. Therefore 3 x 5 = 15 is as big a result as the algorithm can handle:
Here is the code running:
In the RAM window: A, B, C & D are updated as the program runs. At the end, the output port displays the answer.
The run starts with:
- A=5; Operand
- B=3; Operand
- C=0; Result
- D=4; Bit count
At the end:
- A=0
- B=3
- C=15 ; Correct!
- D=0
- Output=15
Here is a version that handles larger numbers:
If you used A=13 (D) and B=15 (F), then the result would be C=3 and D=12 (C) or 195.
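To check those numbers, here is a minimal C sketch of a 4 bit by 4 bit multiply that keeps the result as two nibbles. It is my own reconstruction of the shift-and-add idea from the C routines above, not the listing that runs in the simulation. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* An 8-bit result held as two 4-bit halves, as on the 4 bit CPU. */
struct nibble_pair { uint8_t low; uint8_t high; };

/* Shift-and-add multiply, MSB of a first, with carries between nibbles. */
struct nibble_pair mul4x4(uint8_t a, uint8_t b)
{
    struct nibble_pair r = {0, 0};
    unsigned sum;
    int i;

    a &= 0x0F;
    b &= 0x0F;
    for (i = 0; i < 4; i++) {
        /* result = result + result, carrying from low into high */
        sum = (unsigned)r.low + r.low;
        r.high = (uint8_t)((r.high + r.high + (sum >> 4)) & 0x0F);
        r.low = (uint8_t)(sum & 0x0F);
        if (a & 0x08) {                 /* MSB of the 4-bit operand */
            sum = (unsigned)r.low + b;
            r.high = (uint8_t)((r.high + (sum >> 4)) & 0x0F);
            r.low = (uint8_t)(sum & 0x0F);
        }
        a = (uint8_t)((a + a) & 0x0F);  /* shift a left, keep 4 bits */
    }
    return r;
}
```

Calling mul4x4(13, 15) gives low = 3 and high = 12, which is C3 hex or 195, matching the values quoted above.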
Refer to the Simulation below:
Unfortunately both of these programs are too big for my 32 byte PROM design.
I will check the schematic tomorrow for any missed errors.
Yeah, found two errors, fixed and forwarded for manufacture.
AlanX
-
Schematic and PCB
12/22/2022 at 12:40
Started the schematic design, the ALU is pretty well all new, so it will take time.
The layout will be important as the auto-router will struggle with this many chips.
---
The 16 byte diode PROM boards arrived today. Two boards will have nearly 400 components, so they will take a while to solder.
---
Some updates to the simulation model.
---
Some progress on the schematic, trying to group the chips:
Instruction Set Again
The minimum instruction set is:
- LOAD
- ADD
- NAND
- ?
- JNC
- STORE
- READ
- PAGE
Missing are instructions like LEA, CALL and RTN, etc, but these require structural changes.
Subtraction is pretty easy to do, I will use reference variables here.
A = A SUB B:
- READ A
- LOAD M
- NAND F
- READ B
- ADD M
- NAND F
- STORE C
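The sequence works because NAND with F is a 4 bit NOT, and NOT X equals 15 - X, so NOT(NOT A + B) = A - B modulo 16. Here is a minimal C sketch of that identity. The function names are mine, and the comments map each step to the op codes above on the assumption that M refers to the operand selected by the preceding READ.

```c
#include <stdint.h>

/* 4-bit NAND; with b = 0xF it acts as a 4-bit NOT. */
uint8_t nand4(uint8_t a, uint8_t b)
{
    return (uint8_t)(0x0F ^ (a & b & 0x0F));
}

/* Subtraction built from NOT, ADD, NOT, all truncated to 4 bits. */
uint8_t sub4(uint8_t a, uint8_t b)
{
    uint8_t acc = a & 0x0F;                       /* READ A, LOAD M */
    acc = nand4(acc, 0x0F);                       /* NAND F         */
    acc = (uint8_t)((acc + (b & 0x0F)) & 0x0F);   /* READ B, ADD M  */
    acc = nand4(acc, 0x0F);                       /* NAND F         */
    return acc;
}
```

For example sub4(9, 3) returns 6, and sub4(3, 9) wraps around to 10.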
Test if equal:
- READ A
- LOAD M
- NAND F
- READ B
- ADD M
- NAND F
- ADD F
- JNC [A != B]
- ... [A == B]
Test if bits are HIGH:
- READ A
- LOAD M
- NAND F
- NAND MASK
- ADD 1
- JNC [false]
- ... [true]
Test if bits are LOW:
- READ A
- LOAD M
- NAND MASK
- ADD 1
- JNC [false]
- ... [true]
Other logic gates can be derived from the NAND gate, but may require memory to store intermediate results. Although NAND can replace XOR in many cases, XOR is "necessary" for efficient toggling of bits.
An alternate instruction set is:
- LOAD
- ADD
- AND
- XOR
- JNC
- STORE
- READ
- PAGE
Subtraction using XOR:
A = A SUB B:
- READ A
- LOAD M
- XOR F
- READ B
- ADD M
- XOR F
- STORE C
Test if equal:
- READ A
- LOAD M
- XOR F
- READ B
- ADD M
- XOR F
- ADD F
- JNC [A != B]
- ... [A == B]
Test if bits are HIGH:
- READ A
- LOAD M
- AND MASK
- ADD F
- JNC [false]
- ... [true]
Test if bits are LOW:
- READ A
- LOAD M
- XOR F
- AND MASK
- ADD F
- JNC [false]
- ... [true]
OR is awkward, using memory reference here again:
- READ A ; Set memory reference A
- LOAD M ; Using memory reference
- XOR F
- STORE C ; Save intermediate result
- READ B ; Set memory reference B
- LOAD M ; Using memory reference
- XOR F
- READ C ; Set memory reference C
- AND M ; Uses address from previous store
- XOR F
- STORE C ; Save result to C
Compared to AND:
- READ A ; Set memory reference
- LOAD M ; Using memory reference
- READ B ; Set memory reference
- AND M ; Uses address from previous store
- STORE C ; Save result to C
Compared to XOR using NANDs:
- READ A ; Operand A
- LOAD M
- READ B ; Operand B
- NAND M
- STORE C ; Intermediate result
- READ B
- NAND M
- STORE D ; Intermediate result
- READ C
- LOAD M
- READ A
- NAND M
- READ D
- NAND M
- STORE C ; Save result in C
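The OR and XOR derivations above can be checked in C. This is a minimal sketch with function names of my own choosing. Each line is commented with the step it stands for, and the intermediate variables c and d play the role of the memory locations C and D.

```c
#include <stdint.h>

/* 4-bit NAND. */
uint8_t nand4(uint8_t a, uint8_t b)
{
    return (uint8_t)(0x0F ^ (a & b & 0x0F));
}

/* OR from XOR F (as NOT) and AND, using De Morgan: A | B = ~(~A & ~B). */
uint8_t or4_from_xor_and(uint8_t a, uint8_t b)
{
    uint8_t c = (uint8_t)((a ^ 0x0F) & 0x0F);    /* LOAD A, XOR F, STORE C */
    uint8_t acc = (uint8_t)((b ^ 0x0F) & 0x0F);  /* LOAD B, XOR F          */
    acc &= c;                                    /* READ C, AND M          */
    return (uint8_t)((acc ^ 0x0F) & 0x0F);       /* XOR F                  */
}

/* XOR from four NANDs, in the same order as the listing. */
uint8_t xor4_from_nand(uint8_t a, uint8_t b)
{
    uint8_t c = nand4(a, b);      /* C = A NAND B        */
    uint8_t d = nand4(c, b);      /* D = C NAND B        */
    uint8_t acc = nand4(c, a);    /* ACC = C NAND A      */
    return nand4(acc, d);         /* result = ACC NAND D */
}
```

With A = C hex and B = A hex, the OR routine returns E and the XOR routine returns 6.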
AND versus NAND
AND has the advantage that the logic in the general use case is slightly simpler.
However, if I need to free up an op code slot (i.e. the XOR slot), the NAND op code is the way to go.
Second Thoughts
Not a lot to gain from the second op code page. I think I should have spent my time looking at structural changes.
There are efficient algorithms for multiplication and division that only use ADD and NAND, so SAR and SAL are not required op codes.
Where to Next?
Maybe a stack to push/pop return addresses and intermediate results?
Eventually I want to look at a single cycle Von Neumann architecture.
AlanX
-
Instruction Set Shuffle
12/21/2022 at 13:21
Having ADD and SUB in different op code pages seems wrong, as SUB (ACC = Value - 1) can be coded as:
- LOAD Value
- XOR F
- ADD 1
- XOR F
Yes, the carry flag works.
The current op code set would be:
- LOAD Value
- PAGE 8
- SUB 1
- PAGE 0
No saving!
To test for a value you could use:
- LOAD Value
- XOR F
- ADD Test
- XOR F
- ADD F
- JNC [Value == Test]
- [Value != Test]
Or:
- LOAD Value
- SUB Test
- ADD F
- JNC [Value == Test]
- [Value != Test]
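The ADD F step is what turns the difference into a flag: adding 15 to a 4 bit value produces a carry for every value except zero. Here is a minimal C sketch of the second sequence, assuming JNC looks only at the carry out of the final ADD F. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry out of a 4-bit addition. */
bool carry4(uint8_t a, uint8_t b)
{
    return ((unsigned)(a & 0x0F) + (unsigned)(b & 0x0F)) > 0x0Fu;
}

/* True when the JNC after "SUB Test, ADD F" would be taken. */
bool jnc_taken_when_equal(uint8_t value, uint8_t test)
{
    /* SUB Test, truncated to 4 bits */
    uint8_t diff = (uint8_t)(((unsigned)value - (unsigned)test) & 0x0Fu);
    /* ADD F carries unless diff is zero, so no carry means equal */
    return !carry4(diff, 0x0F);
}
```

So jnc_taken_when_equal(7, 7) is true, while any pair of different values sets the carry and falls through.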
A better set of op codes would be:
- 0-1 LOAD/LOAD
- 2-3 ADD/SHR using Carry
- 4-5 SUB/XOR
- 6-7 AND/OR
- 8-9 JNC
- A-B STORE
- C-D READ
- E-F PAGE
AND has been promoted over NAND as it can test for bit states:
- LOAD Value
- AND Bit Mask
- ADD F
- JNC [False == 0]
- [True != 0]
Also:
- ADD 0 clears the Carry flag
- SUB 0 sets the Carry flag.
This will be version 5.
Here is the simulation of up counting followed by down counting, then repeat:
The code for the animation is:
        E0  PAGE 0       ; Select Op Code Set 0
        20  ADD 0        ; Clear Carry
Repeat:
        00  LOAD 0       ; Clear ACC
Loop1:
        AF  SAVE F       ; Output ACC
        A0  SAVE MEM[0]  ; Save to RAM
        21  ADD 1        ; Increment
        83  JNC 3        ; Loop1
        20  ADD 0        ; Clear Carry
        0F  LOAD F       ; Set F
Loop2:
        AF  SAVE F       ; Output ACC
        A0  SAVE MEM[0]  ; Save to RAM
        41  SUB 1        ; Decrement
        89  JNC 9        ; Loop2
        20  ADD 0        ; Clear Carry
        82  JNC 2        ; Repeat
The top level:
The ALU:
And control:
AlanX
-
Subtraction
12/21/2022 at 00:52
I have had a bit of a problem getting my head around subtraction and the borrow/carry flag. With some CPUs (such as the 6502) the carry flag is set before a subtraction and is cleared when underflow occurs. This works of course, but makes JNC not so useful.
The 8086 works the other way: the carry flag is cleared, and when underflow occurs the carry flag is set. Since the 8085 and 8086 were my first microprocessors, I will go with this system, and JNC works better here.
For the time being I am going to "jumper out" ADC and SBB, for ADD and SUB, as these instructions are more trouble than they are worth, at the moment.
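The two conventions can be put side by side in C. This is a minimal sketch for 4 bit operands, assuming a plain subtract with no incoming borrow (on the 6502 that means the carry was set beforehand). The function names are mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8086 style: carry acts as a borrow flag, set when a - b underflows. */
bool carry_after_sub_8086(uint8_t a, uint8_t b)
{
    return (a & 0x0F) < (b & 0x0F);
}

/* 6502 style: carry is the inverse of borrow, cleared on underflow. */
bool carry_after_sub_6502(uint8_t a, uint8_t b)
{
    return (a & 0x0F) >= (b & 0x0F);
}
```

With the 8086 style, a count-down loop that ends in JNC keeps looping while no underflow occurs and drops out on the step from 0 to F, which mirrors how the count-up loop behaves with ADD.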
I have used the most significant bit of the page register as an OPCODE flag.
Here is the top level view:
Here is the PC:
And the ALU:
The PAGE op code E0 sets ADD and op code E8 sets SUB.
Note E0 00 clears carry while E8 00 sets carry.
It is clear we can add other instructions to the ALU if desired.
Here is some code to count up and then count down. JNC is tripped on overflow/underflow:
E0  Set ADD
00  Clear Carry
20  Clear ACC
AF  Output ACC
A0  Store to Mem[0]
01  ADD 1
83  JNC 3
00  Clear Carry
E8  Set SUB
2F  Set ACC to F
AF  Output ACC
A0  Store to Mem[0]
01  SUB 1
8A  JNC A
E0  Set ADD
00  Clear Carry
82  JMP 2
AlanX
-
Jump Logic
12/18/2022 at 00:26
The current jump logic uses JNC (Jump on Not Carry). It is a complete system but not that intuitive. For multi-nibble/byte arithmetic it is straightforward: if no carry adjustment is required (i.e. not carry), skip the add carry code.
A JGE (i.e. Jump on Greater or Equal) can be constructed by complementing one of the operands, and adding the other operand. Following is an example of testing the input for a program number 1 to 6:
Address OpCode Const Comment CODE
# READ PORT
0   C  F  READ F
1   3  F  LOAD M (PORT)
2   0  0  CLEAR CARRY
# TEST PROGRAM 6
3   0  A  ADD 10
# NOT 5
4   8  7  JNC 7
# JGE 5
5   E  7  SET PAGE
# JUMP PROGRAM 6
6   8  0  JMP ADDR
# TEST PROGRAM 5
7   0  1  ADD 1
# NOT 4
8   8  B  JNC B
# JGE 4
9   E  6  SET PAGE
# JUMP PROGRAM 5
A   8  0  JMP ADDR
# TEST PROGRAM 4
B   0  1  ADD 1
# NOT 3
C   8  F  JNC F
# JGE 3
D   E  5  SET PAGE
# JUMP PROGRAM 4
E   8  0  JMP ADDR
# SET NEW PAGE 1
F   E  1  PAGE 1
# TEST PROGRAM 3
10  0  1  ADD 1
# NOT 2
11  8  4  JNC 4
# JGE 2
12  E  4  SET PAGE
# JUMP PROGRAM 3
13  8  0  JMP ADDR
# TEST PROGRAM 2
14  0  1  ADD 1
# NOT 1
15  8  8  JNC 8
# JGE 1
16  E  3  SET PAGE
# JUMP PROGRAM 2
17  8  0  JMP ADDR
# TEST PROGRAM 1
18  0  1  ADD 1
# NOT 0
19  8  C  JNC C
# JGE 0
1A  E  2  PAGE 2
# JUMP PROGRAM 1
1B  8  0  ADDR 0
# RETURN - NO PROGRAM SELECTED
1C  E  0  SET PAGE 0
1D  8  0  JMP ADDR 0
1E  0  0  NOP
1F  0  0  NOP
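The comparison trick in the table rests on one piece of arithmetic: the 4 bit complement of a limit is 15 minus the limit, so adding the value to it overflows only when the value is larger than the limit. For the first test, ADD 10 is the complement of 5. Here is a minimal C sketch, assuming JNC is taken whenever that addition does not carry. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when JNC would be taken after computing (NOT limit) + value,
   which happens exactly when limit >= value. */
bool jnc_taken_after_complement_add(uint8_t limit, uint8_t value)
{
    unsigned not_limit = 0x0Fu - (limit & 0x0Fu);   /* 4-bit complement */
    unsigned sum = not_limit + (value & 0x0Fu);
    return sum <= 0x0Fu;                            /* no carry out */
}
```

With a limit of 5, inputs 0 to 5 take the jump and input 6 carries and falls through. Each later ADD 1 in the table raises the running total by one, which lowers the effective limit by one.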
One option is to add a comparator to the ALU and to use one of the flags (i.e. A<B, A=B and A>B) or its complement to trigger the jump. It would have the advantage of not altering the accumulator.
---
I have been thinking more about this. It would be efficient and useful to have a "TEST" register, but it requires freeing up an op code, XOR being the one to use:
But rather than doing that I am thinking of an op code to select an alternate op code set.
PAGE and READ are similar so they could share the same op code, and the PAGE slot used to set alternate op codes:
- ADC/SBB
- LOAD/LOAD
- NAND/NOR
- XOR/XOR?
- JNC/JGE?
- STORE/STORE?
- READ/PAGE
- OPCODE?/OPCODE?
Another option is to use the most significant bit of the PAGE register.
For ADC/SBB, the adder requires an inverter on the input and on the output to perform SBB (have to check this). Does JNC become JNB, which is JGE? No need for a dedicated comparator?
AlanX
-
Add with Carry
12/17/2022 at 00:18
To date I have used plain and simple ADD with my DIY CPUs. ADC (i.e. add with carry) is useful for multi byte (nibble) addition as the carry is automatic. The downside is that the humble counter does not work, it skips 0. This can be fixed of course.
Here is the old counter using ADD:
20  LOAD 0       ; CLEAR ACC
A0  STORE MEM[0] ; SAVE TO MEM[0]
AF  STORE MEM[F] ; OUTPUT
01  ADD 1        ; INCREMENT ACC
82  JNC 2        ; JUMP ON NOT CARRY
00  ADD 0        ; CLEAR CARRY
82  JNC 2        ; JUMP UNCONDITIONALLY 2
On overflow you need to clear the carry so that the next JNC is an unconditional jump.
Here is the new counter using ADC:
00  ADC 0        ; CLEAR CARRY
20  LOAD 0       ; CLEAR ACC
A0  STORE MEM[0] ; SAVE TO MEM[0]
AF  STORE MEM[F] ; OUTPUT
01  ADC 1        ; INCREMENT ACC
82  JNC 3        ; REPEAT
00  ADC 0        ; CLEAR CARRY
82  JNC 2        ; JUMP UNCONDITIONAL 2
So on overflow you need to both clear the carry and clear the accumulator.
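To see why the counter skips 0 with ADC, it helps to model the carry as a latch that feeds the next addition. This is a minimal C sketch of that behaviour, not a model of the actual control unit. The struct and function names are mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accumulator plus the latched carry flag. */
struct cpu4 { uint8_t acc; bool carry; };

/* ADC: acc = acc + operand + carry, and the carry out is latched. */
void adc4(struct cpu4 *cpu, uint8_t operand)
{
    unsigned sum = (cpu->acc & 0x0Fu) + (operand & 0x0Fu)
                 + (cpu->carry ? 1u : 0u);
    cpu->acc = (uint8_t)(sum & 0x0Fu);
    cpu->carry = sum > 0x0Fu;
}
```

Starting from acc = F with the carry clear, ADC 1 wraps the accumulator to 0 and sets the carry. The ADC 0 that follows clears the carry, but it also adds the latched carry and leaves the accumulator at 1, so the 0 is never output unless the accumulator is cleared as well.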
Converting the ALU from ADD to ADC involves linking the Carry signal from the Control unit to the Carry In on the Adder in the ALU unit:
For completeness here is the Control unit:
The carry logic holds the carry (CY) until the next ADC instruction.
Multi-Nibble Arithmetic
Before moving on I tested an 8 bit counter using ADC:
            00  ADC 0    ; CLEAR CARRY
            20  LOAD 0   ; CLEAR ACC
            A0  STORE 0  ; CLEAR MEM[0] // LOW NIBBLE
            A1  STORE 1  ; CLEAR MEM[1] // HIGH NIBBLE
            AF  STORE F  ; CLEAR OUTPUT // LOW NIBBLE
LOOP:
            C0  READ 0   ; PRESET ADDR TO MEM[0]
            30  LOAD 0   ; LOAD LOW NIBBLE
            01  ADC 1    ; INCR LOW NIBBLE
            A0  STORE 0  ; SAVE LOW NIBBLE
            AF  STORE F  ; OUTPUT LOW NIBBLE
            C1  READ 1   ; PRESET ADDR TO MEM[1]
            31  LOAD 1   ; LOAD HIGH NIBBLE
            00  ADC 0    ; ADD CARRY TO HIGH NIBBLE
            A1  STORE 1  ; SAVE HIGH NIBBLE
JUMP LOOP:
            00  ADC 0    ; CLEAR CARRY
            85  JNC 5    ; UNCONDITIONAL JUMP
The ADC does not specifically need the JNC instruction.
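The same two-step increment can be written in C as a sketch. The carry from the low nibble is simply added into the high nibble, with no branch in between. The function name is hypothetical, and the comments map to the ADC steps in the listing above.

```c
#include <stdint.h>

/* Increment an 8-bit count stored as two 4-bit nibbles. */
void increment_nibble_pair(uint8_t *low, uint8_t *high)
{
    unsigned sum = (*low & 0x0Fu) + 1u;     /* ADC 1 on the low nibble   */
    unsigned carry = sum >> 4;              /* carry out of the low half */
    *low = (uint8_t)(sum & 0x0Fu);
    sum = (*high & 0x0Fu) + carry;          /* ADC 0 on the high nibble  */
    *high = (uint8_t)(sum & 0x0Fu);
}
```

Starting from low = F and high = 2, one call leaves low = 0 and high = 3. From FF it wraps around to 00.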
The above code is 16 bytes long. A pretty strong justification for a 32 (or more) byte PROM system.
AlanX
-
Reverting Back to H/W CPU V1
12/15/2022 at 11:22
H/W CPU V1 has the paged Program Counter (PC) and most of the control hardware of H/W CPU V2/V3, so I just have to delete the ALU tristate buffer and edit the memory model.
I will call this version V4:
If you want to see meaningless "computer lights" flashing, here is the LogiSim animation:
The code being run is slightly more complicated than the minimum, just to check some other instructions:
20  LOAD 0          ; Clear ACC
AF  Store MEM[F]    ; Output ACC
01  ADD 1           ; ACC<=ACC+1
A0  STORE MEM[0]    ; MEM[0]<=ACC
C0  READ 0          ; ADDR<=0, preset memory fetch address
3X  LOAD MEM[ADDR]  ; ACC<=MEM[ADDR]
AF  STORE MEM[F]    ; Output ACC
82  JNC X2          ; Jump if no carry
00  ADD 0           ; Clear carry
81  JNC X1          ; Unconditional jump
The op codes are:
Missing Op Codes?
You may think I am missing op codes such as SHR, etc. But no, ADD and NAND are all you need; the rest can be synthesized from them. XOR is included because synthesizing it from NAND needs 14 instructions to read two operands from memory and write the result back to memory, whereas the XOR op code needs 5 instructions to do the same task.
Still, XOR could be swapped out for another instruction.
AlanX
-
32 Byte Diode PROM
12/15/2022 at 01:50
I felt that a 16 byte diode PROM was too small. You cannot fit a program with a subroutine in 16 bytes. So I designed a 32 byte diode PROM, but unfortunately I could not get the auto-router to work. So the answer is two 16 byte daughter boards:
Here is the schematic:
One nice thing is that the boards are stackable.
I will have to put the page circuitry back on the mother board.
---
I have sent the PCB off for fabrication. Usually takes 5 or 6 days.
---
I have made a mistake with regard to stacking the boards. You cannot get 20 pin long headers, typically they are 10 pin headers, and you need a gap between them. Oh well!
I designed a PCB for the 10 long pin headers:
But I am not getting them made until I use up the previous design.
AlanX