-
Run-time visualization of memory and I/O space
05/15/2023 at 03:34
Operation
Visibility into what exactly is going on inside the target machine is crucial for debugging and development. There are a few ways to accomplish this:
- give the host real-time access to the target system (e.g. dual-port memory)
- let the host access the target system to take a "snapshot" as needed (e.g. DMA access)
- let the host infer the state of the target system by "spying" on each memory or I/O access
If the tracer is configured to trace each memory and/or I/O access, and the trace goes to the host, the host can store the address and value of each access and, as execution progresses, build an accurate map of the memory.
In the trace above it is easy to see that the content of memory location 0x0503 is 0xCD (an instruction), while 0x0501 is data memory with content 0x37. Port 0x10 has the value 0x02, etc.
Mapped onto the memory grid that has been added to the tracer:
And I/O:
In this sample session, it is easy to observe how and where a Tiny Basic program is stored in memory, and how it gets there from the input buffer when the input line is terminated (Enter key).
Implementation
There are 3 components that make the run-time visualization work:
"debugtracer" component embedded in the target device - as described in the other project log, the logic in this component compares the state of the control signals (M1, /IORD, /IOWR, /MEMRD, /MEMWR) at each CPU clock with the state of the 5-bit internal register reg_sel. If any of them match, it pulls the READY signal low, "freezing" the CPU in the current cycle. (Actually, due to a bug in my design this is done by stopping the clock signal, which would not be possible in a real device, because 8080-family processors were not static designs. Few were at the time, the CDP 1802 being unique in that regard too.)
reg_sel is loaded from 5 on/off switches on the FPGA baseboard, but only when the "load" signal (connected to a push button) is asserted. This way one trace configuration can be running while the next one is being prepared. With reg_sel cleared, the CPU runs at full speed without sending out traces.
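To make the matching rule concrete, here is a minimal Python model of it. The assignment of reg_sel bits to cycle types (TRACE_BITS) and the class and method names are hypothetical, chosen for illustration; the real logic lives in the VHDL debugtracer component.

```python
# Hypothetical bit assignment of the 5-bit reg_sel register (not taken from the VHDL).
TRACE_BITS = {"M1": 0, "IORD": 1, "IOWR": 2, "MEMRD": 3, "MEMWR": 4}


def cycle_matches(reg_sel, asserted):
    """True if any cycle type enabled in reg_sel is asserted in this CPU clock.

    reg_sel  -- 5-bit value currently loaded into the tracer
    asserted -- names of the control signals active in this clock
                (active-low pins such as /MEMRD are modelled by plain names)
    """
    return any((reg_sel >> bit) & 1 and name in asserted
               for name, bit in TRACE_BITS.items())


class DebugTracerModel:
    """Switches are staged; reg_sel only changes when 'load' is pressed."""

    def __init__(self):
        self.switches = 0   # configuration being prepared
        self.reg_sel = 0    # configuration in effect; 0 = full speed, no traces

    def press_load(self):
        self.reg_sel = self.switches & 0x1F

    def ready(self, asserted):
        # READY goes low (CPU frozen) when the current cycle matches
        return not cycle_matches(self.reg_sel, asserted)
```

Staging the switches separately from reg_sel is what lets one configuration run while the next is prepared.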
UART sender - a simple 16-to-1 MUX sends out the ASCII characters assembled from the data on the bus and a template that depends on the actual CPU cycle. The operation is as follows:
- a match is detected, the CPU is stopped, and the bus is stable and not changing
- a 12-character record is output to the UART, paced by a simple 8-bit counter clocked by txd_clk (e.g. 38400 or 57600 Hz). The lower 4 bits drive the UART 16-to-1 MUX and the upper 4 bits select the character:
signal counter: std_logic_vector(7 downto 0);
alias chrSel: std_logic_vector(3 downto 0) is counter(7 downto 4);
alias bitSel: std_logic_vector(3 downto 0) is counter(3 downto 0);
- the 13th character (chrSel == 0xC) is "null" ASCII (0x00): this resets the sender circuit and also the READY signal. The debugtracer is now ready to "catch" the next bus cycle match.
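The counter-driven sender can be sketched in Python as follows. The record format matches the example record shown further below; the layout of the 16 bit slots (start bit in slot 0, data LSB first in slots 1..8, stop/idle in slots 9..15) is an assumption, since the log does not spell out the MUX inputs.

```python
def make_record(kind, address, data):
    """Format a trace record like 'MR,0761 ED<cr><lf>' (12 characters)."""
    record = f"{kind},{address:04X} {data:02X}\r\n"
    if len(record) != 12:
        raise ValueError("record must be exactly 12 characters")
    return record


def tx_level(record, counter):
    """Serial line level for one txd_clk tick of the 8-bit counter.

    chrSel = counter[7:4] picks the character, bitSel = counter[3:0] picks one
    of 16 bit slots. Assumed slot layout (hypothetical): 0 = start bit,
    1..8 = data bits LSB first, 9..15 = stop bit and idle (line high).
    """
    chr_sel = (counter >> 4) & 0xF
    bit_sel = counter & 0xF
    if chr_sel >= len(record):
        return 1                      # chrSel == 0xC: sender resets, line idles
    code = ord(record[chr_sel])
    if bit_sel == 0:
        return 0                      # start bit
    if bit_sel <= 8:
        return (code >> (bit_sel - 1)) & 1
    return 1                          # stop bit / idle
```

Each character therefore takes 16 txd_clk ticks, of which only 10 carry the UART frame; the rest simply idle the line.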
Note that this UART is one-way only, from the target device to the host, and is a completely separate serial channel from the bidirectional UART that allows interaction with Basic on the single-board microcomputer.
Tracer utility running on the host - the code for this C# utility is here. Let's say that the received record is:
MR,0761 ED<cr><lf>
Once the record arriving on the serial port has been assembled, its descriptor (MR == memory read) is detected:
case "MR": // read memory (except M1)
    if (CheckRecipientAndRecord(memoryMap, recordValue.Split(' '), out address, out data))
    {
        CheckLimit(memoryMap.UpdateRead(address, data, ref pause), traceRecord);
    }
    Console.ForegroundColor = ConsoleColor.Blue; // BLUE for not implemented trace record type
    Console.WriteLine(traceRecord);
    break;
If the address and data look valid, the memoryMap is updated to hold the record of the read operation. UpdateRead() returns false if the address is outside the displayable range (64k for memory, 256 for I/O, although restricted in this case to 4k because the SBC has 2k of ROM and 2k of RAM), and CheckLimit() acts on that result:
public bool UpdateRead(int address, byte data, ref bool pause)
{
    if (address < this.Size)
    {
        if (fetchDictionary.ContainsKey(address))
        {
            ReportMemoryIssue(false, $"Reading (not executing) code at {address:X4}");
            fetchDictionary.Remove(address);
        }
        if (writeDictionary.ContainsKey(address))
        {
            if (data != writeDictionary[address])
            {
                ReportMemoryIssue(false, $"Reading {data:X2} from {address:X4}, expected {writeDictionary[address]:X2}");
                pause = true;
            }
            writeDictionary.Remove(address);
        }
        AddOrUpdateEntry(readDictionary, address, data, 'R');
        return true;
    }
    return false;
}
Interesting system memory consistency checks can now be done:
- if an instruction fetch previously came from the same location, execution has probably run away into non-program memory, and terrible things will happen soon
- if a previous memory write to the same location wrote a different byte value, the memory is possibly corrupted, or the hardware is malfunctioning
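The same bookkeeping can be sketched in Python. This is a simplified, hypothetical model of the C# StoreMap: update_read() mirrors UpdateRead() above, while parse_record(), update_fetch() and update_write() are illustrative counterparts that the log does not show.

```python
def parse_record(line):
    """Split a record such as 'MR,0761 ED' into (kind, address, data)."""
    kind, rest = line.strip().split(",", 1)
    address_text, data_text = rest.split(" ")
    return kind, int(address_text, 16), int(data_text, 16)


class StoreMapSketch:
    """Simplified, hypothetical model of the host-side memory map."""

    def __init__(self, size):
        self.size = size
        self.fetched = {}   # address -> opcode seen in an M1 cycle
        self.written = {}   # address -> last byte written, not yet read back
        self.read = {}      # address -> last byte read
        self.issues = []

    def update_fetch(self, address, data):
        self.fetched[address] = data

    def update_write(self, address, data):
        self.written[address] = data

    def update_read(self, address, data):
        """Mirror of UpdateRead(): returns (in_range, pause)."""
        if address >= self.size:
            return False, False
        pause = False
        if address in self.fetched:
            # data read from a location that was previously executed
            self.issues.append(f"Reading (not executing) code at {address:04X}")
            del self.fetched[address]
        if address in self.written:
            expected = self.written.pop(address)
            if data != expected:
                self.issues.append(
                    f"Reading {data:02X} from {address:04X}, expected {expected:02X}")
                pause = True
        self.read[address] = data
        return True, pause
```

Removing an entry once it has been checked means each write is compared against at most one subsequent read.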
Receiving characters from the target device is done in an event handler that fires for every character arriving on the COM5 port. Meanwhile, the main loop of the program checks the keyboard:
while (!exit)
{
    key = Console.ReadKey();
    switch (key.KeyChar)
    {
        // TODO: clear instruction counter on some key
        case ' ':
            comPort.RtsEnable = !comPort.RtsEnable;
            break;
        case 'c':
        case 'C':
        case 'm':
        case 'M':
        case 'i':
        case 'I':
            if (inspector == null)
            {
                inspector = new InspectorForm(sourceFileName, $"Tracer inspector window for {comInfo}", memoryMap, ioMap);
                System.Threading.Thread formShower = new System.Threading.Thread(ShowForm);
                formShower.Start(inspector);
            }
            else
            {
                inspector.BringToFront();
            }
            inspector.SelectTab(key.KeyChar);
            break;
        case 'x':
        case 'X':
            // leave it in enabled state
            exit = true;
            comPort.RtsEnable = true;
            GenerateProfilerReport();
            break;
        default:
            break;
    }
}
Pressing c (code), i (I/O) or m (memory) brings up the InspectorForm and switches to the desired tab.
InspectorForm is a Windows Forms window with 3 tabs: the 1st contains a read-only text box displaying the loaded .lst file, the 2nd a DataGridView displaying memory, and the 3rd a DataGridView for I/O:
internal InspectorForm(string codeFile, string caption, StoreMap<StoreMapRow> memoryMap, StoreMap<StoreMapRow> ioMap)
{
    InitializeComponent();
    this.Load += new EventHandler(InspectorForm_Load);
    this.Text = caption;
    this.tabPageMem.Controls.Add(dataGridView1);
    this.tabPageIO.Controls.Add(dataGridView2);
    this.codeFile = codeFile;
    this.memoryMap = memoryMap;
    this.ioMap = ioMap;
}
While the 1st tab with the .lst file text box can be initialized right away and never changes again, the DataGridViews need to respond to any changes in the underlying StoreMap objects, so they subscribe to events generated by the memory / I/O stores to refresh the display.
private void InspectorForm_Load(object sender, EventArgs e)
{
    // 1st tab contains code text
    if (!string.IsNullOrEmpty(codeFile))
    {
        string fileNameAndExtension = codeFile.Substring(codeFile.LastIndexOf("\\") + 1);
        textBox1.Text = File.ReadAllText(codeFile);
        textBox1.Font = new Font(FontFamily.GenericMonospace, 12.0f, FontStyle.Regular);
        tabControl1.TabPages["tabPageCode"].Text = $"Code ({fileNameAndExtension})";
    }
    // 2nd tab contains Memory data grid
    InitGridView(this.dataGridView1, memoryMap);
    // Connect the virtual-mode events to event handlers.
    this.dataGridView1.CellValueNeeded += new DataGridViewCellValueEventHandler(dataGridView1_CellValueNeeded);
    this.dataGridView1.NewRowNeeded += new DataGridViewRowEventHandler(dataGridView1_NewRowNeeded);
    this.dataGridView1.RowDirtyStateNeeded += new QuestionEventHandler(dataGridView1_RowDirtyStateNeeded);
    // subscribe to store map changes!
    this.memoryMap.StoreUpdatedEvent += MemoryMap_StoreUpdatedEvent;
    // 3rd tab contains IO data grid
    InitGridView(this.dataGridView2, ioMap);
    // Connect the virtual-mode events to event handlers.
    this.dataGridView2.CellValueNeeded += new DataGridViewCellValueEventHandler(dataGridView2_CellValueNeeded);
    this.dataGridView2.NewRowNeeded += new DataGridViewRowEventHandler(dataGridView2_NewRowNeeded);
    this.dataGridView2.RowDirtyStateNeeded += new QuestionEventHandler(dataGridView2_RowDirtyStateNeeded);
    // subscribe to store map changes!
    this.ioMap.StoreUpdatedEvent += IoMap_StoreUpdatedEvent;
}
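The subscription pattern used by the grids (a store raising StoreUpdatedEvent, views reacting to it) can be sketched language-neutrally in Python. ObservableStore and its methods are hypothetical names; in the actual WinForms code a handler fired from the serial-port thread would typically need to marshal to the UI thread (e.g. with Control.BeginInvoke) before touching a DataGridView.

```python
class ObservableStore:
    """Sketch of a store that notifies subscribers, like StoreUpdatedEvent."""

    def __init__(self, size):
        self.cells = [None] * size
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def unsubscribe(self, handler):
        self._handlers.remove(handler)

    def update(self, address, value):
        self.cells[address] = value
        # iterate over a copy so a handler may unsubscribe while being called
        for handler in list(self._handlers):
            handler(address, value)   # e.g. ask a grid view to repaint this row
```

The store knows nothing about the views, so the tracer core keeps working whether or not the inspector window is open.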
-
CPU - the execution unit
04/07/2023 at 02:18
Context
Given that this CPU implementation is an almost canonical example of microcoded design as envisioned by AMD - and a showcase of their Am29XX and Am25XX ICs - it is very helpful to go over at least chapters I and II of the "Bit-slice microprocessor design" book for better understanding. After that, the application note provides a great explanation of this specific CPU implementation. All source files to implement the CPU are under this folder.
(for the control unit, which is the other major part of the CPU, see this log)
Execution unit
This part of the CPU is where the registers, both program-accessible (AF, BC, DE, HL, PC, SP) and temporary/internal, are held and modified by passing through the ALU and other data paths.
The central component of the execution unit is a set of 4 Am2901 bit-slices. This fascinating chip was the de facto standard during the heyday of bit-slice design (the 1970s), although Intel, MMI, and Texas Instruments had bit-slices too.
The most important question when designing with bit-slices is how to map the design registers (program-accessible and internal-only) to the available set of slice registers. The Intel 8080 (and 8085) has six 16-bit program-accessible registers, so they can be mapped in different ways, for example:
| Mapping | Registers 0..15 | Number of slices | Pros | Cons |
| By 8-bit register | B, C, D, E, H, L, M, A, SP.H, SP.L, PC.H, PC.L, ?, ?, ?, ? | 2 | Cost savings! (only 2 slices), fast for 8-bit operations, max register utilization | Slow for 16-bit operations, additional external 16-bit register needed |
| By 16-bit register pair | BC, DE, HL, MA, SP, PC, ??, ??, ??, ??, ??, ??, ??, ??, ??, ?? | 4 | Fast for 16-bit operations, simpler design | 4 slices needed, additional external MUXs and other logic for 8-bit operations, slower for 8-bit operations, many unused registers (could be viable for Z80) |
| Mixed | BC, CB, DE, ED, HL, LH, ?A, A?, ??, SP, ??, ??, 0038, 3800, ??, PC | 4 | Overall good speed for both 8 and 16-bit operations. Note: this approach was adopted by AMD engineers for this design | 4 slices needed, additional external MUXs and other logic |
(For comparison, see the deep dive into the real implementation of registers in the Intel 8085, which was an improved version of the 8080.)
To see how register mapping works in hardware and microcode, here are 2 examples:
8-bit operation, MOV B, E
Op-code format is 01 ddd sss (B = 000, E = 011) => 01000011 => 043H
Looking up 043H in the mapper ROM, we find the start address of the microcode routine implementing the operation, which is 014H (1 cycle, after which there is a jump to label HLDF):
;0014 MOVRR: ALU,,,FTOB.F & ALUC & BASW SW,SW & OR & ZA & IOC & /IF R.F, INV,HOLD & NUM, HLDF & NOC
0014 1100000000101111 1010101111110000 0111010101010100 11011100
The action part is the 9-bit Am2901 instruction (bits 8..0 of the microword):
DST = 011 = RAMF
OPR = 011 = OR
SRC = 100 = ZA
This means that the register addressed by the 4-bit address A (am2901_a) will be OR'd with 0 (so no change) and deposited into the register addressed by the 4-bit address B (am2901_b). As the upper 8 bits of registers 0 to 3 hold B, C, D, E, it is clear that the right 8080 register transfer will occur (these bytes are in the HOP = high-order part, the upper 2 slices):
-- HOP slices ---
u33: Am2901c port map (
    clk => CLK, a => am2901_a, b => am2901_b, d => am2901_data(11 downto 8),
    i(8 downto 6) => pl_alu_destination,
    i(5 downto 3) => pl_alu_function,
    i(2 downto 0) => pl_alu_source,
    c_n => u64_pin4, oe => '0',
    ram0 => signal_b, ram3 => am2901_ram11,
    qs0 => signal_a, qs3 => am2901_q11,
    y => am2901_y(11 downto 8),
    g_bar => open, p_bar => open, ovr => open,
    c_n4 => am2901_c11, f_0 => u33pin11, f3 => open,
    -- DEBUG PORT --
    debug_regsel => am2901_dbg_sel,
    debug_regval => am2901_dbg_val(11 downto 8)
);
But looking at the lower 8 bits, they are reversed (C, B, E, D). That's why, in the actual wiring of the processor, the upper 2 Am2901 slices get the A and B addresses directly from the microcode or instruction, while the lower 2 get bit 0 of each address through a multiplexer (u63) whose selection depends on the 8/16-bit mode (these bytes are in the LOP = low-order part, the lower 2 slices):
u63: Am25LS153 port map (
    sel(1) => pl_bswitch, sel(0) => pl_not8or16,
    n1G => '0', n2G => '0',
    in1(3) => am2901_a(0), in1(2) => u62_pin2, in1(1) => am2901_a(0), in1(0) => u62_pin2,
    in2(3) => '0', in2(2) => u62_pin4, in2(1) => am2901_b(0), in2(0) => u62_pin4,
    out1 => u63_pin7, -- am2901_a(0) for LOP slices
    out2 => u63_pin9  -- am2901_b(0) for LOP slices
);
-- LOP slices ---
u43: Am2901c port map (
    clk => CLK,
    a(3 downto 1) => am2901_a(3 downto 1), a(0) => u63_pin7,
    b(3 downto 1) => am2901_b(3 downto 1), b(0) => u63_pin9,
    d => am2901_data(3 downto 0),
    i(8 downto 6) => pl_alu_destination,
    i(5 downto 3) => pl_alu_function,
    i(2 downto 0) => pl_alu_source,
    c_n => pl_carryin, oe => '0',
    ram0 => am2901_ram0, ram3 => am2901_ram3,
    qs0 => am2901_q0, qs3 => am2901_q3,
    y => am2901_y(3 downto 0),
    g_bar => open, p_bar => open, ovr => open,
    c_n4 => am2901_c3, f_0 => u43pin11, f3 => open,
    -- DEBUG PORT --
    debug_regsel => am2901_dbg_sel,
    debug_regval => am2901_dbg_val(3 downto 0)
);
With this clever register mapping, all 8-bit operations can be done in 1 cycle (much faster than the original processor). Note that otherwise, for an instruction such as ADC L (source in a low byte, destination in a high byte), some sort of swapping would be required, adding 1 more microcode clock cycle to the execution.
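The effect of the "mixed" mapping can be modelled in a few lines of Python. The sketch assumes that in 8-bit mode the LOP slices see address bit 0 inverted (u62 is not described in the log, so this is an inference from the C, B, E, D ordering). Under that assumption the high byte of register r and the low byte of register r XOR 1 always hold the same 8080 register, and a MOV updates both copies in one cycle. MixedRegisterFile and its methods are hypothetical names.

```python
REG_CODE = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 7}  # 8080 ddd/sss codes


def hi(word):
    return (word >> 8) & 0xFF


def lo(word):
    return word & 0xFF


class MixedRegisterFile:
    """Sixteen 16-bit Am2901 registers: BC, CB, DE, ED, HL, LH, ?A, A?, ..."""

    def __init__(self):
        self.r = [0] * 16

    def set8(self, name, value):
        """Test helper: write both copies of an 8-bit register."""
        code = REG_CODE[name]
        self.r[code] = (value << 8) | lo(self.r[code])
        self.r[code ^ 1] = (hi(self.r[code ^ 1]) << 8) | value

    def get8(self, name):
        return hi(self.r[REG_CODE[name]])

    def mov8(self, dst, src):
        """MOV dst,src in a single cycle: HOP and LOP slices each do one transfer."""
        d, s = REG_CODE[dst], REG_CODE[src]
        hop = hi(self.r[s])        # HOP slices use the A/B addresses as given
        lop = lo(self.r[s ^ 1])    # LOP slices see address bit 0 inverted (assumed)
        self.r[d] = (hop << 8) | lo(self.r[d])
        self.r[d ^ 1] = (hi(self.r[d ^ 1]) << 8) | lop

    def consistent(self):
        return all(hi(self.r[c]) == lo(self.r[c ^ 1]) for c in REG_CODE.values())
```

Both source bytes are read before either destination is written, matching the way the four slices operate in parallel within one clock.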
16-bit operation INX B
Op-code format is 00rp0011 (BC = 00) => 00000011 => 03H
Looking up location 3 in the mapper:
; http://www.pastraiser.com/cpu/i8080/i8080_opcodes.html
;PC   MICROWORD IN HEX   SOURCE CODE
;
0000 086; NOP:   FF H#086
0001 022; LXIB:  FF H#022
0002 0DF; STAXB: FF H#0DF
0003 06D; INXB:  FF H#06D
0004 0AB; INRB:  FF H#0AB
0005 0AA; DCRB:  FF H#0AA
0006 01B; MVIB:  FF H#01B
0007 05A; RLC:   FF H#05A
0008 000;        FF 12X
0009 071; DADB:  FF H#071
000A 0DC; LDAXB: FF H#0DC
000B 06F; DCXB:  FF H#06F
000C 0AB; INRC:  FF H#0AB
000D 0AA; DCRC:  FF H#0AA
000E 01B; MVIC:  FF H#01B
000F 05D; RRC:   FF H#05D
This means the microcode start address is 06DH:
;006D INXB: ALU DOUBLE,B,,FTOB.F & PLUS & ZB & ALUC & BASW & IOC & HLD
006D 1100000000101001 0110100111110000 0011011101000000 11000011
;
;006E ALU DOUBLE,B,C,FTOB.A & OR & DZ & ALUC,SWAP & BASW & /IOC & NOC & IF R.F.INV,HOLD & NUM,HLDF
006E 1100000000101111 1010100111110000 0011111000000010 10011111
We find two microinstructions. Note that the jump to label HLDF is part of the execution of the 2nd microinstruction; this is an important feature of microcode, whereas assembly-level CPU instruction sets usually execute an operation and then jump in two separate steps.
DST = 011 = RAMF
OPR = 000 = ADD
SRC = 011 = ZB
This computes R[B] = R[B] + 0 + carry_in, where carry_in is driven to 1 and the B address comes directly from the microcode as 0 (register pair BC), resulting in a 16-bit increment executed in 1 microinstruction.
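The DST/OPR/SRC decodings quoted in this log can be checked mechanically against the microwords. The sketch below uses the instruction-field mnemonics from the Am2901 data sheet (the same names that appear in the microcode listing: RAMF, RAMA, ZA, ZB, DZ, ...); decode_am2901() is an illustrative helper, not part of the project.

```python
SOURCES = ["AQ", "AB", "ZQ", "ZB", "ZA", "DA", "DQ", "DZ"]                       # I2..I0
FUNCTIONS = ["ADD", "SUBR", "SUBS", "OR", "AND", "NOTRS", "EXOR", "EXNOR"]      # I5..I3
DESTINATIONS = ["QREG", "NOP", "RAMA", "RAMF", "RAMQD", "RAMD", "RAMQU", "RAMU"]  # I8..I6


def decode_am2901(microword):
    """Return (DST, OPR, SRC) mnemonics from bits 8..0 of a 56-bit microword."""
    bits = microword.replace(" ", "")
    if len(bits) != 56 or set(bits) - {"0", "1"}:
        raise ValueError("expected 56 binary digits")
    i_field = int(bits, 2) & 0x1FF
    return (DESTINATIONS[(i_field >> 6) & 7],
            FUNCTIONS[(i_field >> 3) & 7],
            SOURCES[i_field & 7])
```

For example, the MOVRR word at 0014 decodes to ("RAMF", "OR", "ZA"), and the first INX B word at 006D to ("RAMF", "ADD", "ZB").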
But now the flipped pair CB is out of sync, so it needs to be updated by a copy operation:
DST = 010 = RAMA
OPR = 011 = OR
SRC = 111 = DZ
DST code 010 is interesting because it lets R[A] appear on the Y outputs while R[B] is being loaded. Register pair BC (located at 0000) is made to appear on the Am2901 Y outputs, and register pair CB (located at 0001) is loaded from the 16-bit D inputs. It is clear that in this case D must be driven by the swapped bytes of Y:
-- 2901 data mux ---
u53: Am25LS157 port map (a => bl(3 downto 0), b => am2901_y(11 downto 8), s => signal_swap, nG => '0', y => am2901_data(3 downto 0));
u54: Am25LS157 port map (a => bl(7 downto 4), b => am2901_y(15 downto 12), s => signal_swap, nG => '0', y => am2901_data(7 downto 4));
u55: Am25LS157 port map (a => bl(3 downto 0), b => am2901_y(3 downto 0), s => signal_swap, nG => '0', y => am2901_data(11 downto 8));
u56: Am25LS157 port map (a => bl(7 downto 4), b => am2901_y(7 downto 4), s => signal_swap, nG => '0', y => am2901_data(15 downto 12));
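The two INX B microinstructions can be summarized as a small Python model of the register file. inx_b(), swap_bytes() and inx_opcode() are illustrative names; the model only tracks registers 0 (BC) and 1 (CB) and ignores flags and timing.

```python
def inx_opcode(rp):
    """8080 INX opcode for register pair rp (0=BC, 1=DE, 2=HL, 3=SP): 00rp0011."""
    return 0x03 | ((rp & 3) << 4)


def swap_bytes(word):
    return ((word & 0xFF) << 8) | ((word >> 8) & 0xFF)


def inx_b(regs):
    """INX B on a 16-entry list of 16-bit registers (index 0 = BC, 1 = CB)."""
    # 006D: RAMF, ADD, ZB with carry-in 1 -> R[0] = 0 + R[0] + 1
    regs[0] = (regs[0] + 1) & 0xFFFF   # the 4 cascaded slices wrap at 16 bits
    # 006E: RAMA, OR, DZ with SWAP
    y = regs[0]                         # RAMA: Y outputs show R[A] = BC
    d = swap_bytes(y)                   # Am25LS157 muxes feed D with swapped Y
    regs[1] = d                         # F = D OR 0 is written to R[B] = CB
```

After the second microinstruction, CB is again the byte-swapped copy of BC, so later 8-bit operations on B or C stay single-cycle.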
-
CPU - the control unit
04/04/2023 at 05:22
Context
Given that this CPU implementation is an almost canonical example of microcoded design as envisioned by AMD - and a showcase of their Am29XX and Am25XX ICs - it is very helpful to go over at least chapters I and II of the "Bit-slice microprocessor design" book for better understanding. After that, the application note provides a great explanation of this specific CPU implementation. All source files to implement the CPU are under this folder.
(for the execution unit, which is the other major part of the CPU, see this log)
Control unit
(for a good discussion of these, refer to "Bit-Slice Design: Controllers and ALUs" by D. E. White)
The key to micro-coded CPUs/controllers is their control unit. Typically, this control unit recognizes a very limited set of instructions:
- it has a program counter ("micro-program counter" == uPC)
- the uPC can be modified in very simple ways: increment, hold, jump, or jump to an externally provided address
- sometimes subroutine calls and returns are supported (to a limited depth, 2-4 levels)
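These few operations are enough for a working sequencer, as the toy Python model below shows. It is a generic illustration, not a cycle-accurate Am2909 model: the operation names are made up, and raising on a full stack is a simplification (the real Am2909 file simply wraps around).

```python
class MicroSequencer:
    """Toy microprogram sequencer with a small subroutine stack."""

    def __init__(self, stack_depth=4):
        self.upc = 0
        self.stack = []
        self.depth = stack_depth

    def step(self, op, target=None):
        """Set and return the address of the next microinstruction.

        op: 'CONT' (increment), 'HOLD' (same address), 'JUMP' (to target),
            'MAP' (to an externally supplied target, e.g. mapping PROM output),
            'CALL' (push return address, then jump), 'RET' (pop return address).
        """
        if op == "CONT":
            nxt = self.upc + 1
        elif op == "HOLD":
            nxt = self.upc
        elif op in ("JUMP", "MAP"):
            nxt = target
        elif op == "CALL":
            if len(self.stack) == self.depth:
                raise OverflowError("micro-stack full")
            self.stack.append(self.upc + 1)
            nxt = target
        elif op == "RET":
            nxt = self.stack.pop()
        else:
            raise ValueError(f"unknown op {op!r}")
        self.upc = nxt
        return nxt
```

The 'MAP' case is where the mapping PROM comes in: it turns an opcode into the start address of its microcode routine.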
This is what the control unit of the Am9080 looks like:
Let's identify and describe each element as defined in the code:
- Instruction register
An 8-bit register with its input connected to the D-bus (instructions always come from there, either through a memory read or presented by external hardware in response to INTA, the interrupt acknowledge cycle)
-- instruction register ---
u1516: am25ls377 port map (
    clk => CLK,
    nE => pl_instregenable,
    d => DBUS,
    q => current_instruction
);
current_instruction is the 8-bit opcode of the currently executing machine instruction. Where does the nE (load enable) signal come from? From the microinstruction executed during instruction fetch:
;0004 FETCH: ALU DOUBLE,PC,PC,FTOB.F & OR & ZA & ALUC & BASW & /IOC IN,,TO.A & MEMR & IF ,INV,READY & NUM, $
0004 0100000000010000 0010010111010001 0011011111111110 11011100;
;
;0005 INCPC & IF D.R. ,HOLD & NUM,HLDD & NOC
0005 1100000000110000 1110100111110001 0011011111111110 11000100;
In other words, the signal pl_instregenable is microinstruction bit 55 (the MSB), and this is the only microinstruction in which it is low.
- mapping PROM
As explained here, this memory is a "many to one" lookup table.
-- u11, u12, u13 ----------
mapper_rom: rom256x12 Port map (
    address => current_instruction,
    data => instruction_startaddress(11 downto 0)
);
For each op-code presented as an 8-bit address (current_instruction), 12-bit data appears on the output (only 9 bits are of interest, as the microprogram memory is 512 words deep). The instruction_startaddress is then presented to one input of the Am2909-12 device:
- Sequencer
In the original schematic these are three 4-bit Am2909 devices, merged here into one:
-- to save some FPGA area, 3 * 2909 = 1 * 2909-12
u21u22u23: am2909x12 port map (
    S => sequence(1 downto 0),
    R => u_immediate,
    D => instruction_startaddress,
    ORi => interrupt_or_mask,
    nFE => sequence(3),
    PUP => sequence(2),
    nRE => '0',
    nZERO => nRESET,
    nOE => '0',
    CN => '1',
    CLK => CLK,
    -- Output ports
    Y => ma,
    C4 => open
);
Looking inside the Am2909-12, we see that the 12-bit uPC inside it will be loaded from the D input when sequence(3 downto 0) == "1011".
The sequence is coming from another simple lookup table:
--- sequencer rom ----
u14: rom32x8 port map (
    -- TODO: it is actually 16*5 only
    nCS => '0',
    address(3 downto 1) => pl_nextinstrselect,
    address(0) => u8474_u8475_pin15,
    data(4 downto 0) => sequence
);
This sequence is marked "D", and the address input must be 0010 (2) to return it:
type rom is array(0 to 15) of std_logic_vector(4 downto 0);
constant lookup: rom := (
    "01000", -- C
    "01001", -- R
    "01011", -- D
    "01001", -- R
    "01000", -- C
    "00101", -- SBR
    "01001", -- R
    "00010", -- RTN
    "11010", -- F
    "00101", -- SBR
    "00000", -- POP
    "00001", -- PR
    "01001", -- R
    "00100", -- PUSH
    "01001", -- R
    "11010"  -- F
);
However, the microinstruction word only defines the upper 3 bits of this lookup address; the lowest address bit is actually the result of the condition check (u8474_u8475_pin15 is the output of a 16-to-1 MUX). This means that the sequencer instructions are "paired": XXX0 defines what the Am2909-12 should execute when the condition fails, and XXX1 when the condition passes. For example:
if (condition) then R else C // load from R reg, else continue
if (condition) then R else D // load from R reg, else load from D
if (condition) then SBR else C // jump to subroutine, else continue
etc.
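The lookup can be expressed in Python to show how the 3-bit microword field and the condition MUX output combine into an Am2909 command. SEQ_ROM copies the table above; next_action() is a hypothetical helper, and the bit split of each entry follows the port map (S = sequence(1 downto 0), PUP = sequence(2), nFE = sequence(3)), with the S encodings of the Am2909 data sheet.

```python
# Sequencer ROM contents as listed above: (name, 5-bit value)
SEQ_ROM = [
    ("C", 0b01000), ("R", 0b01001), ("D", 0b01011), ("R", 0b01001),
    ("C", 0b01000), ("SBR", 0b00101), ("R", 0b01001), ("RTN", 0b00010),
    ("F", 0b11010), ("SBR", 0b00101), ("POP", 0b00000), ("PR", 0b00001),
    ("R", 0b01001), ("PUSH", 0b00100), ("R", 0b01001), ("F", 0b11010),
]
SOURCE = {0: "uPC", 1: "R (register)", 2: "F (stack)", 3: "D (direct)"}


def next_action(next_instr_select, cond_bit):
    """Decode what the Am2909-12 does for a microword field and condition MUX output."""
    name, seq = SEQ_ROM[((next_instr_select & 7) << 1) | (cond_bit & 1)]
    s = seq & 0b11              # S1..S0: next-address source select
    pup = (seq >> 2) & 1        # push (1) or pop (0), effective only if the file is enabled
    n_fe = (seq >> 3) & 1       # file (stack) enable, active low
    stack = "none" if n_fe else ("push" if pup else "pop")
    return name, SOURCE[s], stack
```

For instance, next_action(1, 0) returns ("D", "D (direct)", "none"), which is the entry used right after an instruction fetch.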
Searching for the D/R combination in the microcode, we find it only in microinstructions 0005 and 000C:
;0004 FETCH: ALU DOUBLE,PC,PC,FTOB.F & OR & ZA & ALUC & BASW & /IOC IN,,TO.A & MEMR & IF ,INV,READY & NUM, $
0004 0100000000010000 0010010111010001 0011011111111110 11011100;
;
;0005 INCPC & IF D.R. ,HOLD & NUM,HLDD & NOC
0005 1100000000110000 1110100111110001 0011011111111110 11000100;
;
;0000
;0000 ;HOLD AND MEMORY REFERENCE SUBROUTINES AND HANDLERS:
;0000 ;
;0000 ORG 10
;
;000A HLDSB: NALU & IOC & HLDA & IF R.RTN, INV, HOLD & NUM, $
000A 1100000000101001 1010100111111000 0011010101010100 01XXXXXX
;
;000B HLDF: NALU & IOC & HLDA & IF R.F, INV,HOLD & NUM, $
000B 1100000000101111 1010100111111000 0011010101010100 01XXXXXX
;
;000C HLDD: NALU & IOC & HLDA & IF D.R,,HOLD & NUM, $
000C 1100000000110000 1110100111111000 0011010101010100 01XXXXXX
The relevant bits are 41, 40 and 39, because this 3-bit field is pl_nextinstrselect:
-- 39-41 3 Next Instruction Select
alias pl_nextinstrselect: std_logic_vector(2 downto 0) is pl(41 downto 39);
As expected, the start address of the instruction is loaded into the uPC right after the fetch, in the same cycle in which PC is incremented.
- microprogram memory
With a size of 512*56, it is clear that the address will be the lower 9 bits of the Am2909-12 sequencer output, and the data will be the 56 bits that drive every other signal in the design:
--- microcode rom ---
microcode_rom: rom512x56 Port map (
    address => ma(8 downto 0),
    data => u
);
The 56-bit word is split into "fields" of various sizes. Unlike some more complex microcode architectures which have variable meaning fields of differing sizes, here every microinstruction has the same fields:
signal u: std_logic_vector(55 downto 0);  -- microcode output
signal pl: std_logic_vector(55 downto 0); -- microcode register
---------------------------------------
-- Bits   Length  Description
---------------------------------------
-- 0-2    3   ALU Source (I0-I2 of the Am2901A's)
alias pl_alu_source: std_logic_vector(2 downto 0) is pl(2 downto 0);
-- 3-5    3   ALU Function (I3-I5 of the Am2901A's)
alias pl_alu_function: std_logic_vector(2 downto 0) is pl(5 downto 3);
-- 6-8    3   ALU Destination (I6-I8 of the Am2901A's)
alias pl_alu_destination: std_logic_vector(2 downto 0) is pl(8 downto 6);
-- 9-12   4   ALU "B" Address
alias pl_alu_b: std_logic_vector(3 downto 0) is pl(12 downto 9);
-- 13-16  4   ALU "A" Address
alias pl_alu_a: std_logic_vector(3 downto 0) is pl(16 downto 13);
-- 17     1   Single/Double Byte
alias pl_not8or16: std_logic is pl(17);
-- 18     1   Cn for least significant Am2901A slice
alias pl_carryin: std_logic is pl(18);
-- 19     1   Rotate and Swap Control (formatted)
alias pl_rotateorswap: std_logic is pl(19);
-- 20-21  2   Update/keep flags
alias pl_updateorkeepflags: std_logic_vector(1 downto 0) is pl(21 downto 20);
-- 22     1   "A" Address Switch
alias pl_aswitch: std_logic is pl(22);
-- 23-24  2   Am2901A Output Steering Control
alias pl_outputsteer: std_logic_vector(1 downto 0) is pl(24 downto 23);
-- 25-26  2   Data Bus Enable Control
alias pl_databusenable: std_logic_vector(1 downto 0) is pl(26 downto 25);
-- 27-32  6   HLDA, MEMW, MEMR, I/OW, I/OR, INTA (Am9080A System Control Outputs)
alias pl_syscontrol: std_logic_vector(5 downto 0) is pl(32 downto 27);
-- 33     1   "B" Address Switch
alias pl_bswitch: std_logic is pl(33);
-- 34-37  4   Condition Code Select
alias u_condcode: std_logic_vector(3 downto 0) is u(37 downto 34);
-- 38     1   Condition Code Polarity Control
alias u_condpolarity: std_logic is u(38);
-- 39-41  3   Next Instruction Select
alias pl_nextinstrselect: std_logic_vector(2 downto 0) is pl(41 downto 39);
-- 42-53  12  Numerical Field
alias u_immediate: std_logic_vector(11 downto 0) is u(53 downto 42);
-- 54     1   Numerical Field to Data Bus Control
alias pl_immediatedatabus: std_logic is pl(54);
-- 55     1   Instruction Register Clock Enable
alias pl_instregenable: std_logic is pl(55);
-----------------------------------------
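Because every microinstruction has the same fixed fields, a microword can be split with a simple table of (name, low bit, width) entries taken from the aliases above. split_fields() is an illustrative Python helper, not part of the VHDL; applied to word 0005 it yields pl_nextinstrselect = 1 (the D/R pair) and an immediate of 00CH (HLDD).

```python
# (name, low bit, width), taken from the field table above
FIELDS = [
    ("alu_source", 0, 3), ("alu_function", 3, 3), ("alu_destination", 6, 3),
    ("alu_b", 9, 4), ("alu_a", 13, 4), ("not8or16", 17, 1), ("carryin", 18, 1),
    ("rotateorswap", 19, 1), ("updateorkeepflags", 20, 2), ("aswitch", 22, 1),
    ("outputsteer", 23, 2), ("databusenable", 25, 2), ("syscontrol", 27, 6),
    ("bswitch", 33, 1), ("condcode", 34, 4), ("condpolarity", 38, 1),
    ("nextinstrselect", 39, 3), ("immediate", 42, 12), ("immediatedatabus", 54, 1),
    ("instregenable", 55, 1),
]


def split_fields(microword):
    """Split a 56-bit microword (binary digits, spaces allowed) into named fields."""
    bits = microword.replace(" ", "").rstrip(";")
    if len(bits) != 56 or set(bits) - {"0", "1"}:
        raise ValueError("expected 56 binary digits")
    value = int(bits, 2)
    return {name: (value >> low) & ((1 << width) - 1) for name, low, width in FIELDS}
```

The same table could drive the reverse look-up used when dumping the microcode in human-readable form.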
To note:
- The control unit is driven by fields in the microinstruction too: bits 34 to 53 are "looped back" to control execution, essentially defining "if (condition) then / else" statements. For this to work, the u_condcode, u_condpolarity, pl_nextinstrselect and u_immediate fields are needed. u_immediate is a 12-bit data field that allows "GOTO xxx" or "GOSUB xxx" by providing the target microinstruction address xxx.
- Control signals that come out from microprogram memory are consumed:
- directly, as "u_xxx" signals - this means that they are available in the same cycle (with delay) as the microinstruction that is executing
- indirectly, through the "pipeline" register (pl_xxx signals) - this register is 44 bits long and holds the control signals stable while the controller decides which microinstruction to execute next.
Some pipeline pl_xxx signals are directly connected to CPU control pins:
u7172: Am25LS374 port map (
    clk => CLK,
    nOE => pl(27),
    d(0) => '0', -- ignored
    d(1) => '0', -- ignored
    d(2) => u83_pin6,
    d(3) => u(32),
    d(4) => u(31),
    d(5) => u(30),
    d(6) => u(29),
    d(7) => u(28),
    y(0) => open,
    y(1) => open,
    y(2) => WAITOUT,
    y(3) => pl(32), --nINTA,
    y(4) => pl(31), --nIOR,
    y(5) => pl(30), --nIOW,
    y(6) => pl(29), --nMEMR,
    y(7) => pl(28)  --nMEMW
);
HLDA <= not pl(27);
nINTA <= pl(32);
nIOR <= pl(31);
nIOW <= pl(30);
nMEMR <= pl(29);
nMEMW <= pl(28);
-
Initializing ROM during build-time from files
03/29/2023 at 06:42
(for a related subject, run-time initialization of ROMs/RAMs, see here)
FPGA-based designs often have various read-only stores. Their content can be defined in different ways:
- Inline in the source code (appropriate for smaller ROMs)
- Using vendor-specific file formats which are integrated in the toolchain
- Custom parsing and loading during build time
The last option was used in this project. Three ROM stores needed initializing, each in a different format:
Tiny Basic ROM (2k*8)
Assembling the Tiny Basic source with the zmac assembler produces multiple output files, one of which is in Intel hex format. Given that Xilinx was (and, after the AMD takeover, remains) a competitor of Intel/Altera, its freeware ISE 14.7 does not offer direct support for .hex files. However, writing a parser for the format is relatively straightforward, especially if some fancier features (not used in the file that needs to be ingested) are left out. The "magic" is visible in the ROM source code file:
architecture Behavioral of rom1k is
    -- function defined in the package pulls in the content of the
    -- hex file in generic parameter
    constant rom: filemem(0 to (2 ** address_size) - 1) := init_filememory(filename, 2 ** address_size, default_value);
    --attribute rom_style : string;
    --attribute rom_style of rom : constant is "block";
begin
    D <= rom(to_integer(unsigned(A))) when (nOE = '0') else "ZZZZZZZZ";
end Behavioral;
The usual inline ROM initialization, similar to the following (note that the type is an array fixed in both depth and width):
type mem16x16 is array(0 to 15) of std_logic_vector(15 downto 0);
constant decode4to16: mem16x16 := (
    "1111111111111110", "1111111111111101", "1111111111111011", "1111111111110111",
    "1111111111101111", "1111111111011111", "1111111110111111", "1111111101111111",
    "1111111011111111", "1111110111111111", "1111101111111111", "1111011111111111",
    "1110111111111111", "1101111111111111", "1011111111111111", "0111111111111111"
);
is replaced by a function call (note that the array type has variable depth but a fixed 1-byte width):
type filemem is array(natural range <>) of std_logic_vector(7 downto 0);
impure function init_filememory(file_name : in string; depth: in integer; default_value: std_logic_vector(7 downto 0)) return filemem;
This function is invoked at build time, with file_name, 2^address_size and the default byte value as parameters. The init_filememory() function can be found in the package source file, which is included in each project source as needed.
The gist of the function is a line-by-line read of the file referenced by the parameter. The beginning of each record (<colon><bytecount><address><recordtype>) always has the same layout; the record type is then inspected and must be either 00 (data) or 01 (end of file), as other types are not supported. Data bytes are then parsed from the rest of the line (the number of expected hex digits is known from the byte count) and written to a temporary variable. When the whole file is parsed, the input file is closed and the temporary variable is returned as the result of the function call, creating a data structure compatible with the constant ROM definition.
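The same parsing logic, sketched in Python for readers who want to try it outside the VHDL toolchain. parse_intel_hex() is a hypothetical name; unlike the description above, it also verifies each record's checksum (all bytes of a record, checksum included, must sum to 0 modulo 256), which is an addition.

```python
def parse_intel_hex(lines, depth, default_value=0x00):
    """Fill a byte list of size depth from Intel HEX records (types 00 and 01 only)."""
    mem = [default_value] * depth
    for line_no, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {line_no}: record must start with ':'")
        # count, address high, address low, type, data..., checksum
        raw = bytes.fromhex(line[1:])
        count = raw[0]
        if len(raw) != count + 5:
            raise ValueError(f"line {line_no}: length does not match byte count")
        if sum(raw) & 0xFF:
            raise ValueError(f"line {line_no}: checksum error")
        address, rtype = (raw[1] << 8) | raw[2], raw[3]
        if rtype == 0x01:                      # end of file
            break
        if rtype != 0x00:
            raise ValueError(f"line {line_no}: record type {rtype:02X} not supported")
        for offset, byte in enumerate(raw[4:4 + count]):
            if address + offset >= depth:
                raise ValueError(f"line {line_no}: address beyond ROM depth")
            mem[address + offset] = byte
    return mem
```

Unused locations keep default_value, which plays the same role as the default_value generic in the VHDL version.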
This build-time initialization works for RAMs too. In that case "constant rom" should be replaced by "signal ram", and code added to write the content when write and select are asserted.
The other 2 ROMs that needed initialization are in the CPU itself.
Mapping PROM (256*12)
The contents of this memory are taken directly from the article, and the format is:
<AAAA> <DDD>[;comments]
AAAA and DDD are hex digits defining the address (00H-FFH; the upper 2 digits are not used) and the data (000H-FFFH). This simple format is easy to parse, and the code is in the load_mem() function in the VHDL component file.
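A Python sketch of the same "<AAAA> <DDD>[;comments]" parsing, handy for checking a .mif file before a build. load_mapper() is a hypothetical name; the default fill value here is 0, whereas the VHDL passes uPrgAddress_nop for unused entries.

```python
def load_mapper(lines, depth=256, default=0x000):
    """Parse '<AAAA> <DDD>[;comments]' lines into a list of 12-bit words."""
    mem = [default] * depth
    for line in lines:
        text = line.split(";", 1)[0].strip()   # drop comments and comment-only lines
        if not text:
            continue
        address_text, data_text = text.split()
        address, data = int(address_text, 16), int(data_text, 16)
        if address >= depth or data > 0xFFF:
            raise ValueError(f"bad entry: {line!r}")
        mem[address] = data
    return mem
```

Comment-only lines, like the URL header of the mapper listing, are skipped because nothing remains before the first ';'.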
load_mem() and dump_mem() are wrapped into common init_wordmemory() function:
constant data_from_file: t_mem256x12 := init_wordmemory("../am9080/prom/mapper.mif", "../am9080/prom/mapper.hex", 256, uPrgAddress_nop);
As a side effect, loading the memory at build time also generates a "check contents" file that shows whether all data has been properly ingested - a big sanity saver!
: 10 0000 00 086 022 0DF 06D 0AB 0AA 01B 05A 000 071 0DC 06F 0AB 0AA 01B 05D B9
: 10 0010 00 000 0E5 0DF 0F1 0AB 0AA 01B 05F 000 073 156 0F6 0AB 0AA 01B 060 ED
: 10 0020 00 000 0E9 0D3 0F3 0AB 0AA 01B 13F 000 074 0C9 0F8 0AB 0AA 01B 0C8 35
: 10 0030 00 000 0ED 02D 0F5 0A6 0A2 01E 09C 000 075 026 0FA 0AB 0AA 01B 09D 4D
: 10 0040 00 014 014 014 014 014 014 018 014 014 014 014 014 014 014 018 014 B8
: 10 0050 00 014 014 014 014 014 014 018 014 014 014 014 014 014 014 018 014 B8
: 10 0060 00 014 014 014 014 014 014 018 014 014 014 014 014 014 014 018 014 B8
: 10 0070 00 015 015 015 015 015 015 082 015 014 014 014 014 014 014 018 014 47
: 10 0080 00 034 034 034 034 034 034 035 034 03A 03A 03A 03A 03A 03A 08C 03A 3D
: 10 0090 00 0AC 0AC 0AC 0AC 0AC 0AC 0AD 0AC 0B2 0B2 0B2 0B2 0B2 0B2 0B4 0B2 0D
: 10 00A0 00 09F 09F 09F 09F 09F 09F 0B9 09F 0A0 0A0 0A0 0A0 0A0 0A0 0BC 0A0 D2
: 10 00B0 00 0A1 0A1 0A1 0A1 0A1 0A1 0BF 0A1 076 076 076 076 086 076 079 076 17
: 10 00C0 00 10D 07C 107 042 10A 064 038 055 114 050 10E 000 111 047 03F 055 D5
: 10 00D0 00 11B 0FB 115 08B 118 064 0B0 055 122 000 11C 087 11F 000 0B7 055 D9
: 10 00E0 00 129 101 123 092 126 064 0C2 055 130 09B 12A 159 12D 000 0C4 055 EC
: 10 00F0 00 137 150 131 090 134 068 0C6 055 13E 091 138 08F 13B 000 077 055 64
: 00 0000 01 FF
Looking at the contents of the mapper.hex file above, it becomes obvious which 8080 instructions are implemented at which microcode locations. Many instructions map to the same microcode entry points, because they differ only by some parameter (e.g. a register number) in the instruction itself, for example:
014H - MOV r, r
015H - MOV M, r
018H - MOV r, M
082H - HLT (by register encoding this would be MOV M, M, but such an instruction makes no sense except as a long delay, so the opcode has been repurposed as HLT)
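The many-to-one grouping of the MOV block (opcodes 40H-7FH, format 01dddsss, where code 110 means M) can be reproduced with a short Python classifier. mov_entry() is an illustrative helper; the entry points it returns are the ones listed above and can be compared against rows 0040-0070 of the mapper dump.

```python
M = 0b110  # register code used by the 8080 for the memory operand M


def mov_entry(opcode):
    """Classify a 01dddsss opcode and return (form, microcode entry point)."""
    if opcode & 0xC0 != 0x40:
        raise ValueError("not in the MOV group (01dddsss)")
    ddd, sss = (opcode >> 3) & 7, opcode & 7
    if ddd == M and sss == M:
        return "HLT", 0x082
    if ddd == M:
        return "MOV M, r", 0x015
    if sss == M:
        return "MOV r, M", 0x018
    return "MOV r, r", 0x014
```

Only the destination and source fields decide the entry point; the actual register numbers are picked up later from the instruction register by the microcode.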
Microprogram memory (512*56)
This memory is the brains of any micro-coded design. Its 56-bit width immediately indicates a "horizontal" microcode store/design. For comparison, the much more complex 8086 processor has a 512*21-bit microcode store, which gives it away as "vertical". The contents of this data file were created by the AMDASM assembler, which I don't have. But the original article has both the listing and the resulting output, so I commented out the source lines and ingested only the data part, in the following format:
<AAAA> <b16> <b16> <b16> <b8>[; comments]
AAAA - address in hex (000H-1FFH valid)
b16 - 16-bit binary (X "don't care" bits are interpreted as 0)
b8 - 8-bit binary (X "don't care" bits are interpreted as 0)
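A minimal Python sketch of reading one line in this format into a 56-bit microinstruction word, assuming exactly the field layout listed above and assuming the first b16 field holds the most significant bits. `parse_microcode_line` is a hypothetical name; the project itself does this in VHDL at build time.

```python
FIELD_WIDTHS = (16, 16, 16, 8)  # three b16 fields and one b8 field = 56 bits


def parse_microcode_line(line):
    """Parse '<AAAA> <b16> <b16> <b16> <b8>[; comments]' into (address, 56-bit word).

    Returns None for blank or comment-only lines; X (don't care) bits become 0.
    """
    text = line.split(";", 1)[0].strip()
    if not text:
        return None
    fields = text.split()
    if len(fields) != 1 + len(FIELD_WIDTHS):
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    address = int(fields[0], 16)
    if not 0 <= address <= 0x1FF:
        raise ValueError(f"address {address:03X}H outside 000H-1FFH")
    word = 0
    for field, width in zip(fields[1:], FIELD_WIDTHS):
        bits = field.upper()
        if len(bits) != width or set(bits) - set("01X"):
            raise ValueError(f"bad {width}-bit field {field!r}")
        # assumption: fields are concatenated left to right, first field most significant
        word = (word << width) | int(bits.replace("X", "0"), 2)
    return address, word


print(parse_microcode_line(
    "0014 0000000000000000 0000000000000000 000000000000000X 00000001 ; example"))  # (20, 1)
```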
The VHDL source compiled at build time again contains a helpful "dump data" function; in this case it reverse-looks-up the values of the microcode fields and prints them in human-readable form. This is a simple but very helpful microcode debugging and documentation technique.
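```vhdl
-- the whole 512*56 microcode ROM is a constant, initialized from file at build time
constant data_from_file: t_uinstruction512 := init_wordmemory("../am9080/prom/microcode.mif", "../am9080/prom/microcode.lst", 0, 512, uCode_default);
-- synthesis hint: implement the ROM in block RAM
attribute rom_style : string;
attribute rom_style of data_from_file : constant is "block";

begin
    -- the microprogram address a8 selects the microinstruction
    data <= data_from_file(to_integer(unsigned(a8)));
end Behavioral;
```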
Below is a snippet from the generated microcode.lst file covering the implementation of the MOV r,r, MOV M,r and MOV r,M instructions mentioned in the mapping description above. In the listing file format, they start to make sense.
```text
-----------------------------------------------------------------------------------------
I D DIRECT-VALUE NXT P COND B SYSCTL OE OS A UK S C W AADR BADR DST FCT SRC
-----------------------------------------------------------------------------------------
0000 - m 038 C/R - TRUE 0 NOC -- INTE 0 11 0 1 16 R_PC R_PC RAMF AND ZA
0001 - - 000 C/R - TRUE 0 NOC -- ---- 0 11 0 1 8 RAS1 R38Z RAMF OR DZ
0002 - - 000 C/R - TRUE 0 NOC -- ---- 0 11 0 1 8 RAS1 RZ38 RAMF AND ZA
0003 - - 000 R/PUSH ! TRUE 0 NOC -- ADDR 0 11 0 1 8 RAS1 RAS1 RAMF AND ZA
0004 i - =*= C/R - RDY 0 MEMR -- ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
0005 - - 00C D/R ! HOLD 0 NOC -- ADDR 0 11 0 1 16 R_PC R_PC RAMF ADD ZA
0006 (uninitialized)
0007 (uninitialized)
0008 (uninitialized)
0009 (uninitialized)
000A - - =*= R/RTN - HOLD 0 HLDA -- ---- 0 11 0 1 8 RAS1 RAS1 NOP EXNOR DZ
000B - - =*= R/F - HOLD 0 HLDA -- ---- 0 11 0 1 8 RAS1 RAS1 NOP EXNOR DZ
000C - - =*= D/R ! HOLD 0 HLDA -- ---- 0 11 0 1 8 RAS1 RAS1 NOP EXNOR DZ
000D - - =*= R/RTN ! RDY 0 MEMR -- ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
000E - - =*= R/RTN ! RDY 0 MEMW YH ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
000F - - =*= R/F ! RDY 0 MEMW YH ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
0010 - - =*= R/F ! RDY 0 MEMR -- ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
0011 - - =*= R/RTN ! RDY 0 MEMR -- ADDR 0 11 0 1 16 R_SP R_SP RAMF OR ZA
0012 - - =*= R/RTN ! RDY 0 MEMW YH ADDR 0 11 0 1 16 R_SP R_SP RAMF OR ZA
0013 - - =*= R/RTN ! RDY 0 MEMW YL ADDR 0 11 0 1 16 R_SP R_SP RAMF OR ZA
0014 - - 00B R/F - HOLD 1 NOC -- ---- 1 11 0 1 8 RAS1 RAS1 RAMF OR ZA
0015 - - 00A C/SBR ! HOLD 0 NOC -- ADDR 0 11 0 1 16 R_HL RAS1 NOP OR ZA
0016 - - 00A C/SBR ! HOLD 0 NOC -- DATA 1 11 0 1 8 RAS1 RAS1 NOP OR ZA
0017 - - =*= R/F ! RDY 0 MEMW YH ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
0018 - - 00A C/SBR ! HOLD 0 NOC -- ADDR 0 11 0 1 16 R_HL RAS1 NOP OR ZA
0019 - - 00D C/SBR - RDY 0 MEMR -- ADDR 0 11 0 1 16 R_PC R_PC RAMF OR ZA
001A - - 00B R/F - HOLD 1 NOC -- ---- 0 11 0 1 8 RAS1 RAS1 RAMF OR DZ
001B - - 00D C/SBR - RDY 0 MEMR -- ADDR 0 11 0 1 16 R_PC R_PC RAMF ADD ZA
001C - - 00B R/F - HOLD 1 NOC -- ---- 0 11 0 1 8 RAS1 RAS1 RAMF OR DZ
001D - - FFF R/F ! TRUE 0 HLDA FL INTE 0 11 0 1 8 RAS1 RAS1 NOP EXNOR DZ
001E - - 00D C/SBR - RDY 0 MEMR -- ---- 0 11 0 1 8 RAS1 RAS1 NOP EXNOR DZ
001F - - 00A C/SBR ! HOLD 0 NOC -- DATA 0 11 0 1 8 RAS1 RAS1 RAMF OR DZ
```
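The DST, FCT and SRC columns in the listing are the standard Am2901 instruction mnemonics, so the reverse lookup for that part of the word can be sketched in Python as below. Where those 9 bits sit inside the 56-bit microinstruction is not shown here, so the sketch takes the already extracted I8..I0 value; `decode_am2901` and `encode_am2901` are hypothetical helpers.

```python
# Standard Am2901 mnemonics, indexed by the 3-bit value of each instruction field.
SRC = ("AQ", "AB", "ZQ", "ZB", "ZA", "DA", "DQ", "DZ")                   # I2..I0
FCT = ("ADD", "SUBR", "SUBS", "OR", "AND", "NOTRS", "EXOR", "EXNOR")     # I5..I3
DST = ("QREG", "NOP", "RAMA", "RAMF", "RAMQD", "RAMD", "RAMQU", "RAMU")  # I8..I6


def decode_am2901(i):
    """Turn a 9-bit Am2901 instruction word (I8..I0) into (DST, FCT, SRC) mnemonics."""
    if not 0 <= i <= 0x1FF:
        raise ValueError("Am2901 instruction is 9 bits wide")
    return DST[(i >> 6) & 7], FCT[(i >> 3) & 7], SRC[i & 7]


def encode_am2901(dst, fct, src):
    """The forward direction: mnemonics to the 9-bit field value."""
    return (DST.index(dst) << 6) | (FCT.index(fct) << 3) | SRC.index(src)


print(decode_am2901(encode_am2901("RAMF", "AND", "ZA")))  # ('RAMF', 'AND', 'ZA')
```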
-
Symbolic tracing
03/26/2023 at 05:14
Much of the time spent on any computer-related project goes into debugging. Hobby FPGA projects are no exception; if anything, debugging takes even longer, because the computer itself is defined in software.
In the initial stages, simple LEDs are very useful for basic debugging (e.g. "does the PC even increment?", "which microinstruction address is it executing?"), and switches and buttons allow doing this cycle by cycle or at any speed. But at some point, more is needed. Luckily, highly custom debugging circuits can be written alongside the actual device and tailored exactly to its function.
In this case, a "debugtracer" component was added to the system. Its function is pretty simple:
(1) Upon reset (or a special "load" signal), load the internal 5-bit reg_match register
(2) Each bit in reg_match enables matching of one of the M1, MEMRD, MEMWR, IORD, IOWR signals (in other words, any combination of I/O or memory cycles can be selected)
(3) If a match between reg_match and the control signals on the CPU control bus is detected, lower the CPU's READY signal and "freeze" the cycle
(4) With the CPU cycle frozen, start a sequence (driven by the baud rate clock, 38400 in this case) that samples the state of the CPU bus (16-bit address, 8-bit data, control) and outputs it on the provided serial TXD output
(5) Finish the record by sending a CR LF sequence (0DH 0AH) for convenient text tracing
(6) Check the continue signal: if it is low, repeat (6), otherwise go to (7)
(7) Raise READY (so the CPU can continue with the next cycle) and go to step (3)
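The matching and record formatting in the steps above can be modelled in a few lines of Python. This is a behavioural sketch only: the bit order in `MATCH_BITS` is an assumption (the real assignment lives in debugtracer.vhd), the record text follows the `<type>,AAAA DD<cr><lf>` format given further below, and the READY handshake is reduced to comments. `trace` is a hypothetical helper.

```python
# Hypothetical reg_match bit assignment; the real order is defined in debugtracer.vhd.
MATCH_BITS = {"M1": 0b10000, "MR": 0b01000, "MW": 0b00100, "IR": 0b00010, "IW": 0b00001}


def trace(cycles, reg_match):
    """Return the text the tracer would send for a sequence of (type, address, data) bus cycles."""
    records = []
    for kind, address, data in cycles:
        if reg_match & MATCH_BITS[kind]:
            # step (3): READY goes low; steps (4)-(5): the record is shifted out on TXD
            records.append(f"{kind},{address:04X} {data:02X}\r\n")
            # steps (6)-(7): wait for "continue", then raise READY again
    return "".join(records)


cycles = [("M1", 0x0503, 0xCD), ("MR", 0x0501, 0x37), ("MW", 0x0501, 0x38)]
print(repr(trace(cycles, 0b10000)))  # 'M1,0503 CD\r\n'
```

With reg_match cleared, nothing matches and no records are produced, which corresponds to the CPU running at full speed.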
Since the debugtracer already listens to (almost) the whole CPU bus, it is easy to also make it respond to special instructions that turn tracing off and on. This is useful in polling loops, for example when waiting for a character from the ACIA in the modified Tiny Basic source:
```asm
GETLN:  RST  2                  ;*** GETLN ***
        LXI  D,BUFFER           ;PROMPT AND INIT.
GL1:    CALL CHKIO              ;CHECK KEYBOARD
;------------------------------------------------------
        OUT  00H                ; TRACE OFF (OUT does not change the flags set by CHKIO)
;------------------------------------------------------
        JZ   GL1                ;NO INPUT, WAIT
;------------------------------------------------------
        OUT  01H                ; TRACE ON
;------------------------------------------------------
        CPI  7FH                ;DELETE LAST CHARACTER?
        JZ   GL3                ;YES
        RST  2                  ;INPUT, ECHO BACK
        CPI  0AH                ;IGNORE LF
        JZ   GL1
        ORA  A                  ;IGNORE NULL
        JZ   GL1
        CPI  7DH                ;DELETE THE WHOLE LINE?
        JZ   GL4                ;YES
        STAX D                  ;ELSE SAVE INPUT
        INX  D                  ;AND BUMP POINTER
        CPI  0DH                ;WAS IT CR?
        RZ                      ;YES, END OF LINE
        MOV  A,E                ;ELSE MORE FREE ROOM?
        CPI  BUFEND AND 0FFH
        JNZ  GL1                ;YES, GET NEXT INPUT
```
Here is a sample trace matching all signals (every CPU bus cycle). The format is:
<type>,AAAA DD<cr><lf>
Tracing with symbols
With well-structured text trace records, it becomes possible to intercept them, match them against the assembly listing file, and display them in rich symbolic form. This is done by a simple utility running on the host, which:
- Loads the selected listing file from disk (the path can be specified on the command line or chosen using the standard Windows file selection dialog)
- Parses the file into a dictionary keyed by AAAA DD (address and data)
- Opens a COM port to listen for incoming tracer messages
- When a message comes in, parses it, extracts M1 records and looks them up in the in-memory dictionary
- If there is a match, displays the full line from the listing file, otherwise just the raw trace record
- Allows flipping the COM port RTS signal to start/stop tracing
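A minimal Python sketch of the dictionary-building and lookup steps, assuming listing lines that start with a 4-digit hex address followed by the first opcode byte. The real zmac listing layout may differ, so the regular expression is an assumption, and `build_dictionary`/`resolve` are hypothetical names rather than functions of the actual tracer utility (which is written in C#).

```python
import re

# Assumed listing layout: code lines begin with "AAAA DD ...", e.g. "0503 CD 10 02   CALL 0210H".
LISTING_LINE = re.compile(r"^([0-9A-Fa-f]{4})\s+([0-9A-Fa-f]{2})\b")


def build_dictionary(listing_lines):
    """Map 'AAAA DD' keys to the full listing line they came from."""
    table = {}
    for line in listing_lines:
        match = LISTING_LINE.match(line)
        if match:
            key = f"{match.group(1).upper()} {match.group(2).upper()}"
            table.setdefault(key, line.rstrip())
    return table


def resolve(record, table):
    """Show the listing line for a matched M1 record, otherwise the raw record."""
    record = record.rstrip("\r\n")
    kind, _, value = record.partition(",")
    if kind.upper() == "M1" and value.upper() in table:
        return table[value.upper()]
    return record


listing = ["0503 CD 10 02       CALL 0210H", "; just a comment", "0506 C9             RET"]
table = build_dictionary(listing)
print(resolve("M1,0503 CD\r\n", table))  # 0503 CD 10 02       CALL 0210H
```

Keying on address plus opcode byte (rather than address alone) means a fetch only resolves if the byte read from memory still matches the listing, which also flags stale or overwritten code.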
Here is the event handler that fires whenever data arrives on the COM port (note that LF, presumed to end a trace record, triggers the action):
```csharp
// comPort, sbTraceRecord, traceDictionary, profilerDictionary and LF are static members defined elsewhere
static void Port_DataReceived(object sender, System.IO.Ports.SerialDataReceivedEventArgs e)
{
    string received = comPort.ReadExisting();
    foreach (char c in received)
    {
        if (c == LF)
        {
            // drop the preceding CR, if there is one (also safe for an empty record)
            string traceRecord = sbTraceRecord.ToString().TrimEnd('\r');
            string[] traceValuePair = traceRecord.Split(',');
            string recordType = traceValuePair[0].ToUpperInvariant();
            if (traceValuePair.Length < 2)
            {
                recordType = string.Empty; // no ",AAAA DD" part: treat as unrecognized
            }
            switch (recordType)
            {
                // see https://github.com/zpekic/sys9080/blob/master/debugtracer.vhd
                case "M1": // instruction fetch
                    if (traceDictionary.ContainsKey(traceValuePair[1]))
                    {
                        Console.WriteLine(traceDictionary[traceValuePair[1]]);
                    }
                    else
                    {
                        Console.ForegroundColor = ConsoleColor.Yellow; // YELLOW for unmatched record
                        Console.WriteLine(traceRecord);
                    }
                    if (profilerDictionary.ContainsKey(traceValuePair[1]))
                    {
                        // increment hit count
                        profilerDictionary[traceValuePair[1]]++;
                    }
                    break;
                case "MR": // read memory (except M1)
                case "MW": // write memory
                case "IR": // read port
                case "IW": // write port
                    Console.ForegroundColor = ConsoleColor.Blue; // BLUE for not implemented trace record type
                    Console.WriteLine(traceRecord);
                    break;
                default:
                    Console.ForegroundColor = ConsoleColor.Red; // RED for unrecognized trace record type
                    Console.WriteLine(traceRecord);
                    break;
            }
            Console.ResetColor();
            sbTraceRecord.Clear();
        }
        else
        {
            sbTraceRecord.Append(c);
        }
    }
}
```
Note that the event handler above really only cares about M1 cycles. However, it could be extended to handle all the others and display a memory and I/O map in real time. Even if other devices in the system can modify memory or I/O, this simple approach is still 100% accurate at read time, because by its nature it always shows the state exactly as the CPU reads it.
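Extending the host side to keep a memory and I/O map could look like the following Python sketch. It assumes the record types used in the event handler (M1, MR, MW, IR, IW) and keeps port reads and writes apart, because on the ACIA the same port number reaches different registers when read and written (status vs. control). The 8080 places the port number on A7..A0 (mirrored on A15..A8), so only the low byte is used. `apply_record` is a hypothetical helper.

```python
def apply_record(record, memory, io_in, io_out):
    """Update the host-side maps from one '<type>,AAAA DD' trace record.

    Returns False (and changes nothing) for malformed or unknown records.
    """
    kind, _, value = record.strip().partition(",")
    try:
        address_text, data_text = value.split()
        address, data = int(address_text, 16), int(data_text, 16)
    except ValueError:
        return False
    kind = kind.upper()
    if kind in ("M1", "MR", "MW"):
        memory[address] = data           # last value seen on the bus at this address
    elif kind == "IR":
        io_in[address & 0xFF] = data     # port number is on A7..A0
    elif kind == "IW":
        io_out[address & 0xFF] = data
    else:
        return False
    return True


memory, io_in, io_out = {}, {}, {}
for r in ["M1,0503 CD\r\n", "MR,0501 37", "IW,1010 03", "IR,1010 02", "XX,0000 00", "garbage"]:
    apply_record(r, memory, io_in, io_out)
```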
For all of this to work well together, all the files need to be synchronized during build time, as depicted in this toolchain flow:
-
Adapting and running Tiny Basic (benchmarks!)
03/26/2023 at 05:12
The history of the Tiny Basic programming language goes back to the era when the first home/hobby microcomputers powered by 8-bit microprocessors started to appear.
Given the ROM/RAM size limits (2kB each) and the CPU used (8080), the choice fell on this version of Tiny Basic:
```asm
;*************************************************************
;
; TINY BASIC FOR INTEL 8080
; VERSION 2.0
; BY LI-CHEN WANG
; MODIFIED AND TRANSLATED
; TO INTEL MNEMONICS
; BY ROGER RAUSKOLB
; 10 OCTOBER,1976
; @COPYLEFT
; ALL WRONGS RESERVED
;
;*************************************************************
```
I took the Tiny Basic source code from the CPUville site (by Donn Stewart, which also has many other great retro-computing resources) and made two modifications:
(1) Changed from the Intel 8251 UART to the Motorola 6850 ACIA (I/O port locations and control/status register bits):
```asm
;--- definitions for Intel 8251 UART ------
;UART_DATA      EQU 2H
;UART_CTRL      EQU 3H
;UART_STATUS    EQU 3H
;UART_TX_EMPTY  EQU 1H
;UART_RX_FULL   EQU 2H
;UART_INIT1     EQU 4EH  ;1 STOP, NO PARITY, 8 DATA BITS, 16x CLOCK
;UART_INIT2     EQU 37H  ;EH IR RTS ER SBRK RxE DTR TxE (RTS, ERROR RESET, ENABLE RX, DTR, ENABLE TX)
;--- definitions for Motorola 6850 ACIA ---
UART_DATA       EQU 11H
UART_CTRL       EQU 10H
UART_STATUS     EQU 10H
UART_TX_EMPTY   EQU 2H
UART_RX_FULL    EQU 1H
UART_INIT1      EQU 03H  ; reset
UART_INIT2      EQU 10H  ; 8N1, divide clock by 1
;
```
(2) Fixed the "overflow on change sign" bug
When changing the sign of a 16-bit two's complement integer, there are two cases in which the MSB stays the same: 8000H (-32768) becomes 8000H (an overflow error), and 0000H becomes 0000H (not an error). The original code did not handle the second case, which caused a HOW? error message when I first ran the benchmark program.
```asm
; original CHGSGN
CHGSGN: MOV  A,H                ;*** CHGSGN ***
        PUSH PSW
        CMA                     ;CHANGE SIGN OF HL
        MOV  H,A
        MOV  A,L
        CMA
        MOV  L,A
        INX  H
        POP  PSW
        XRA  H
        JP   QHOW
        MOV  A,B                ;AND ALSO FLIP B
        XRI  80H
        MOV  B,A
        RET
```
```asm
; fixed CHGSGN
CHGSGN: MOV  A,H                ;*** CHGSGN ***
        PUSH PSW
        CMA                     ;CHANGE SIGN OF HL
        MOV  H,A
        MOV  A,L
        CMA
        MOV  L,A
        INX  H
        POP  PSW
        XRA  H
        JM   FLIPB              ;OK, OLD AND NEW SIGNS ARE DIFFERENT
        MOV  A,H
        ORA  L
        JNZ  QHOW               ;ERROR IF -(-32768)
FLIPB:  MOV  A,B                ;AND ALSO FLIP B
        XRI  80H
        MOV  B,A
        RET
```
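The edge case is easy to check exhaustively. The Python sketch below mirrors the fixed CHGSGN logic (complement and increment, then compare the old and new sign bits) and confirms that 0000H and 8000H are the only 16-bit values whose sign bit survives negation; `chgsgn` and `original_overflow` are hypothetical names.

```python
def chgsgn(hl):
    """Negate a 16-bit two's complement value like the fixed CHGSGN; return (result, overflow)."""
    result = (~hl + 1) & 0xFFFF            # CMA on H and L, then INX H
    signs_differ = (hl ^ result) & 0x8000  # POP PSW restores old H, XRA H compares sign bits
    if signs_differ:
        return result, False               # JM FLIPB: the normal case
    if result == 0:
        return result, False               # 0000H -> 0000H is not an error
    return result, True                    # 8000H -> 8000H: -(-32768) overflows (QHOW)


def original_overflow(hl):
    """The original routine reported an error whenever the sign bit did not change."""
    result = (~hl + 1) & 0xFFFF
    return not ((hl ^ result) & 0x8000)


unchanged = [v for v in range(0x10000) if original_overflow(v)]
print([f"{v:04X}H" for v in unchanged])  # ['0000H', '8000H']
```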
To assemble the source I used the zmac cross-assembler with the -8 command-line flag, which selects 8080-style mnemonics (instead of the default Z80 style). Note that the ISE 14.7 VHDL project, which produces the .bin file downloaded to the FPGA, uses zmac's .hex output, not its .bin output (the .bin would be used when programming EPROMs, for example). This process is described in a separate project log.
Running and benchmarking
To test Tiny Basic, I used the benchmark program proposed in a June 1980 Interface Age magazine article: a very simple algorithm to find the prime numbers below 1000. Only two minor modifications were made: INT() was dropped, because Tiny Basic has no INT() (it supports only integers, no floating point numbers), and CHR$(7) was commented out (on many terminals it produces a "bell" beep).
```basic
100 REM -------------------------------------
101 REM Simple benchmark - find primes < 1000
103 REM -------------------------------------
104 REM https://archive.org/details/InterfaceAge198006/page/n131/mode/2up
110 REM -------------------------------------
130 PRINT "Starting."
140 FOR N = 1 TO 1000
150 FOR K = 2 TO 500
160 LET L = N/K
170 LET M = N-K*L
180 IF L = 0 GOTO 230
190 IF L = 1 GOTO 220
200 IF M > L GOTO 220
210 IF M = L GOTO 240
220 NEXT K
230 PRINT N;
240 NEXT N
250 REM PRINT CHR$(7)
260 PRINT "Finished."
270 STOP
```
Results running at different CPU clock frequencies:
```basic
104 REM https://archive.org/details/InterfaceAge198006/page/n131/mode/2up
105 REM SW210  CPU (MHz)  Result  (s)
106 REM 100    01.5625    52m23   3143
107 REM 101    03.1250    26m12   1572
108 REM 110    06.2500    13m06    786
109 REM 111    25.0000     3m17    197
110 REM -------------------------------------
```
Execution time scales almost exactly inversely with the CPU clock frequency. This is expected, because serial I/O is used sparingly and the memory interface never adds wait cycles, always keeping up with the CPU (which was not the case in many real computers of that era). The 8080 running at a 25MHz "warp speed" is comparable to another exotic bit-slice processor of the era running at 6MHz (the HEX-29, AMD's "showcase design" introduced in the classic bit-slice design cookbook), while at the more realistic 3.125MHz it is comparable to a 3MHz 8085-based computer of the era (the Intel 8080A was rated for a maximum clock frequency of 2MHz).
(image from LALU (lookup ALU CPU))
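The linear-scaling claim can be checked with a little arithmetic: if run time is proportional to 1/f, the product f*t (the number of clock cycles spent) should be the same at every setting. A short Python check using the measured values from the table above shows roughly 4.9 billion cycles in all four cases, within about 0.3% of each other; `seconds` is a hypothetical helper.

```python
# Clock settings and run times taken from the benchmark results above.
RESULTS = [(1.5625, "52m23"), (3.125, "26m12"), (6.25, "13m06"), (25.0, "3m17")]


def seconds(text):
    """Convert '52m23' style durations to seconds."""
    minutes, secs = text.split("m")
    return int(minutes) * 60 + int(secs)


cycles = []
for mhz, elapsed in RESULTS:
    t = seconds(elapsed)
    # if run time scales as 1/f, then f * t (millions of clock cycles) stays constant
    cycles.append(mhz * t)
    print(f"{mhz:8.4f} MHz  {t:5d} s  {mhz * t / 1000:.2f} billion cycles")
```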
-
Single board computer on a FPGA
03/26/2023 at 05:11
The main point of this project was to recreate a working CPU by following a technical document from 1978, to illustrate microprogramming in a practical way, and to recreate ICs from that era (especially the Am29XX family) in an FPGA.
Perhaps not the best way to verify that the CPU works (that would be running dedicated test programs), but certainly the most fun, is to build a small working system around it that runs Basic.
The small system is similar to the SBCs (single board computers) available for many 8-bit CPUs; the closest is probably the 8085 MiniMax.
The project's top-level source file (sys9080.vhd) describes the SBC mostly using a structural VHDL design approach, and can be roughly represented as:
(note a bug in the sketch: the MUX at center right has its arrow pointing the wrong way; it can only drive DBUS, taking input from either the switches or the buttons, so the CPU can read these as ports 00H and 01H)
Key components (as they are named in the top-level source file):
cpu
Details are described here. While it is Intel 8080 compatible, there are some differences:
- Bus control signals are directly available, as if an Intel 8228 were also in the system
- Like everything else inside the FPGA, it is static, working at clock frequencies from 0 to 25MHz (not tested above that)
- There is a debug port that allows reading any register pair, or the microinstruction program counter / instruction register. However, it is commented out to save FPGA real estate.
- DMA and interrupts are not used/hooked up (but they have been tested and work; only RST x instructions are supported as the interrupt response)
ram (2k*8)
Source code is here. It is very similar to the static RAM ICs of the era, such as the 6116. It is implemented using a Xilinx-specific component so that it fits into the design (ISE 14.7 was not able to map/pack the generic VHDL design into the modest Spartan XC3S200A FPGA once the design grew too big). It repeats 31 times in the memory address space (everywhere outside the space taken by the ROM, so 0800H-FFFFH).
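The mirroring follows directly from incomplete address decoding. Assuming, as described, that the ROM answers for 0000H-07FFH and the RAM sees only address lines A10..A0 everywhere else, a Python sketch of the decoder (the `decode` function is hypothetical) shows why the RAM appears 31 times:

```python
ROM_TOP = 0x07FF  # tinyrom occupies 0000H-07FFH


def decode(address):
    """Return which device answers a 16-bit address and the offset inside it."""
    address &= 0xFFFF
    if address <= ROM_TOP:
        return "ROM", address
    # only A10..A0 reach the 2k RAM, so it repeats in every other 2k window
    return "RAM", address & 0x07FF


ram_windows = {a >> 11 for a in range(0x10000) if decode(a)[0] == "RAM"}
print(len(ram_windows))  # 31
```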
tinyrom (2k*8)
Similar to the EPROMs of the era, such as the 2716. It contains Tiny Basic, but could hold any other 2k system program (e.g. a monitor) in the 0000H-07FFH address space (8080-family CPUs require readable, defined RST n target addresses in the range 0000H-0038H). Its size and contents are defined parametrically:
```vhdl
-- ROM 2k at 0000H to 07FFH
-- See http://cpuville.com/Code/tiny_basic_instructions.pdf
tinyrom: entity work.rom1k
    generic map(
        address_size => 11,
        filename => "..\prog\zout\tinybasic2dms.hex",
        default_value => X"76" -- HLT
    )
    port map(
        D => data_bus,
        A => address_bus(10 downto 0),
        nOE => nTinyRomEnable
    );
```
This project log describes how a file name parameter leads to build-time initialization of the ROM from a hex file.
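For readers unfamiliar with the format, here is a minimal Python sketch of what build-time ROM initialization from an Intel HEX file involves: validate each record's length and checksum, copy data records into the image, and leave every unwritten byte at the default value (76H, HLT, as in the generic map above). It handles only record types 00 (data) and 01 (end of file), an assumption that suffices for a 2k image; the project itself does this in VHDL, and `load_intel_hex` is a hypothetical name.

```python
def load_intel_hex(lines, size=2048, default=0x76):
    """Build a ROM image from Intel HEX records; unused bytes keep `default` (76H = HLT)."""
    image = bytearray([default]) * size
    for line_no, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {line_no}: record must start with ':'")
        raw = bytes.fromhex(line[1:])
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"line {line_no}: length mismatch")
        if sum(raw) & 0xFF:  # all bytes including the checksum must sum to 0 mod 256
            raise ValueError(f"line {line_no}: checksum error")
        count, address, record_type = raw[0], (raw[1] << 8) | raw[2], raw[3]
        if record_type == 0x01:  # end of file
            break
        if record_type == 0x00:  # data
            if address + count > size:
                raise ValueError(f"line {line_no}: data beyond {size} bytes")
            image[address:address + count] = raw[4:4 + count]
        # other record types (extended addressing) are not needed for a 2k ROM
    return image


rom = load_intel_hex([":03000000C300003A", ":00000001FF"])  # JMP 0000H, then end of file
```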
acia0
This device mimics the popular MC6850 ACIA of the era. It only mimics it, because it supports just those control and status register features that Tiny Basic uses; for example, interrupts and modem control pins are not supported. It is an aggregation of:
- Serial-to-parallel converter - works on the delay line principle (not a state machine!) and is driven by a clock at 4x the RXD baud rate. It can detect any parity and 1-2 stop bits, but the data bit count is always 8.
- Parallel-to-serial converter - works as a counter-driven MUX, not as a shift register. It can generate any parity and 1-2 stop bits, but the data field is always 8 bits.
- Glue logic that implements the 8-bit data, status and control registers, with key bits compatible with the MC6850, such as:
  - TDRE (transmit data register empty) - the code waits for this bit to go high before sending a character
  - RDRF (receive data register full) - the code polls this bit to check whether a new character has arrived
```vhdl
D <= d_out when (int_read = '1') else "ZZZZZZZZ";
d_out <= rdr when (RS = '1') else status;

status(7) <= '0';         -- no interrupt
status(6) <= err_parity;  -- parity error
status(5) <= err_overrun; -- receiver overrun
status(4) <= err_frame;   -- framing error
status(3) <= '0';         -- clear to send
status(2) <= '0';         -- data carrier detected
status(1) <= tdre;        -- transmit register empty
status(0) <= rdrf;        -- receive data register full
```
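The parallel-to-serial converter described above, a counter-driven MUX rather than a shift register, can be modelled in Python: the frame bits are the MUX inputs and the bit counter is the select. This sketch assumes the usual asynchronous frame (start bit, 8 data bits LSB first, optional even/odd parity, 1-2 stop bits, idle high); `tx_frame` and `txd` are hypothetical names, not signals from the VHDL.

```python
def tx_frame(data, parity=None, stop_bits=1):
    """Line levels of one frame: start bit, 8 data bits LSB first, optional parity, stop bits."""
    bits = [0] + [(data >> i) & 1 for i in range(8)]
    if parity is not None:
        ones = sum(bits[1:])
        # even parity makes the total number of ones even, odd parity makes it odd
        bits.append(ones & 1 if parity == "even" else (ones & 1) ^ 1)
    return bits + [1] * stop_bits


def txd(counter, data, parity=None, stop_bits=1):
    """Counter-driven MUX: the bit counter selects which frame bit drives TXD (idle = 1)."""
    frame = tx_frame(data, parity, stop_bits)
    return frame[counter] if 0 <= counter < len(frame) else 1


print([txd(c, 0x41) for c in range(11)])  # 'A' in 8N1, followed by one idle bit
```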
tracer
Perhaps the most interesting component in the design: it is simple, yet it allows any bus cycle to be observed remotely, and instruction fetches to be traced in assembly source form. It silently listens to bus activity (the levels of the IORD, IOWR, M1, MEMRD and MEMWR signals) and, if the right pattern is detected, stalls the cycle using the READY signal until the record describing the cycle has been output on the serial UART pin.
clocks
The system uses a variety of clock signals, and their generation is centralized in this component. The source is the 50MHz main FPGA clock, from which the others are derived using frequency dividers. The main outputs are:
- TX and RX clocks for the ACIA: 38400 Hz and 153600 Hz (4x the baud rate for the receiver), for a 38400 baud send/receive rate
- CPU clock
```vhdl
-- connect to outputs
with cpuclk_sel select cpu_clk <=
    ss           when "000",  -- single step
    freq_2048(9) when "001",  -- 4Hz
    freq_2048(7) when "010",  -- 16Hz
    freq_2048(5) when "011",  -- 64Hz
    freq_25M(4)  when "100",  -- 1.5625MHz
    freq_25M(3)  when "101",  -- 3.125MHz
    freq_25M(2)  when "110",  -- 6.25MHz
    freq_25M(0)  when others; -- 25.0MHz
```
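The freq_25M taps follow a power-of-two pattern. Assuming freq_25M is a free-running binary counter clocked by the 50MHz board oscillator (consistent with bit 0 being 25MHz), bit n toggles at 50MHz / 2^(n+1). The Python sketch below simulates such a counter to confirm the four CPU frequencies; `divided_frequency` is a hypothetical helper.

```python
def divided_frequency(f_in, bit, cycles=1 << 16):
    """Clock a free-running 16-bit binary counter `cycles` times and measure how often `bit` rises."""
    counter, rises = 0, 0
    for _ in range(cycles):
        before = (counter >> bit) & 1
        counter = (counter + 1) & 0xFFFF
        if before == 0 and (counter >> bit) & 1:
            rises += 1
    return f_in * rises / cycles


# Assumption: freq_25M is such a counter clocked by the 50MHz board oscillator.
for bit in (0, 2, 3, 4):
    print(f"freq_25M({bit}) = {divided_frequency(50e6, bit) / 1e6} MHz")
```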
"cpuclk_sel" is a 3-bit value that comes from Mercury baseboard switches 0-2, allowing these CPU speeds to be selected. When set to 000, pressing baseboard button 0 allows stepping machine cycle by machine cycle.
led4x7
This is a standard 7-segment LED driver. It takes 16 bits (4 hex digits) and displays them on the four 7-segment LEDs of the Mercury baseboard. It has some "fancy" features such as individual blanking of digits, blanking of the whole display (used for a flashing effect when READY is low, meaning the tracer component is active), and driving the 4 dots individually.
The 16-bit input comes from a 20-bit wide "led_bus". Its upper 4 bits are meant to drive the 4 LEDs on the Mercury board (but are currently driven from PMOD pins). The led_bus can display either a subset of bus signals (A7:0, D7:0, control), or debug output from the CPU.
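A minimal Python sketch of the hex-to-segment translation with per-digit blanking and dots, as described for led4x7. The segment patterns are the conventional active-high encodings (bit 0 = segment a ... bit 6 = segment g, bit 7 = dot); the Mercury baseboard's actual polarity and pin order are not given here, so treat the bit layout as an assumption. Blanking all four digits (`blank_mask=0xF`) corresponds to the "off" phase of the flashing effect; `led4x7` here is a hypothetical Python function, not the VHDL entity.

```python
# Conventional hex digit patterns, bit 0 = segment a ... bit 6 = segment g, active high.
SEGMENTS = (0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,
            0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71)


def led4x7(value, blank_mask=0, dots=0):
    """Return four segment bytes (leftmost digit first) for a 16-bit value.

    blank_mask bit n blanks digit n (0 = rightmost); dots bit n lights that digit's dot (bit 7).
    """
    patterns = []
    for digit in range(3, -1, -1):
        nibble = (value >> (4 * digit)) & 0xF
        seg = 0 if (blank_mask >> digit) & 1 else SEGMENTS[nibble]
        if (dots >> digit) & 1:
            seg |= 0x80
        patterns.append(seg)
    return patterns


print([f"{p:02X}" for p in led4x7(0x0503)])  # ['3F', '6D', '3F', '4F']
```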
Other components include:
- Reset circuit which ensures a reset signal at least 4 clock periods long when the button on the Mercury board is pressed
- "inport" MUX which allows reading the 8 switches and 4 buttons as I/O ports 00H and 01H respectively
- "on_rts1_pulse" - a simple flip-flop that allows starting/stopping the tracer by pressing the space bar on the host (if running the tracer.exe utility)
```vhdl
-- Tracer works best when the output is intercepted on the host and resolved using symbolic .lst file
-- In addition, host is able to flip RTS pin to start/stop tracing
-- See https://github.com/zpekic/sys9080/blob/master/Tracer/Tracer/Program.cs
-- any change of the RTS level produces a short pulse
rts1_pulse <= PMOD_RTS1 xor rts1_delay;

-- btn_clk is read in the clear condition, so it belongs in the sensitivity list too
on_rts1_pulse: process(reset, btn_clk, rts1_pulse)
begin
    if ((reset or btn_clk) = '1') then
        continue <= '1';
    else
        if (rising_edge(rts1_pulse)) then
            continue <= not continue;
        end if;
    end if;
end process;
```
```csharp
ConsoleKeyInfo key;
bool exit = false;
while (!exit)
{
    key = Console.ReadKey();
    switch (key.KeyChar)
    {
        // TODO: clear instruction counter on some key
        case ' ':
            comPort.RtsEnable = !comPort.RtsEnable;
            break;
        case 'x':
        case 'X':
            // leave it in enabled state
            exit = true;
            comPort.RtsEnable = true;
            GenerateProfilerReport();
            break;
        default:
            break;
    }
}
comPort.Close();
```
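The XOR in `rts1_pulse <= PMOD_RTS1 xor rts1_delay` turns every RTS level change into a short pulse, and each rising edge of that pulse toggles `continue`. A Python sketch of that behaviour, assuming rts1_delay is RTS delayed by one sample (its definition is not shown here) and ignoring reset and btn_clk; `continue_levels` is a hypothetical helper.

```python
def continue_levels(rts_samples, start=1):
    """Model the RTS edge detector and the toggling 'continue' flip-flop.

    rts_samples are RTS levels sampled on successive clocks; the delayed copy lags by one sample.
    """
    cont = start
    delayed = rts_samples[0] if rts_samples else 0
    prev_pulse = 0
    history = []
    for rts in rts_samples:
        pulse = rts ^ delayed          # high for one sample after each RTS change
        if pulse and not prev_pulse:   # rising_edge(rts1_pulse)
            cont ^= 1
        prev_pulse, delayed = pulse, rts
        history.append(cont)
    return history


# The host flips RTS twice (space bar pressed twice): tracing pauses, then resumes.
print(continue_levels([0, 0, 1, 1, 1, 0, 0]))  # [1, 1, 0, 0, 0, 1, 1]
```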
-
Building the CPU
03/26/2023 at 05:10
Context
Given that this CPU implementation is an almost canonical example of microcoded design as envisioned by AMD - and a showcase of their Am29XX and Am25XX ICs - it is very helpful to go over at least chapters I and II of the "Bit-slice microprocessor design" book for better understanding. After that, the application note provides a great explanation of this specific CPU implementation. All source files to implement the CPU are under this folder.
General notes
In this CPU re-creation I attempted to follow the original application note as closely as possible. The description there is very detailed, but also clear enough to allow understanding not just this implementation but also how it could be adapted to other similar processors. The CPU schematic is in figures 3-5 of the application note, and the top-level VHDL file is organized to follow those figures and their IC names/IDs, so that the code can easily be mapped to the schematic and vice versa.
VHDL can describe circuits both structurally and behaviorally, and the two styles can be freely mixed in any source file. Here, the top level is mostly structural (building blocks), for example:
```vhdl
-- data bus register (FLAGS)
-- 7 6 5 4 3 2 1 0
--------
-- S Z 0 AC 0 P 1 C
--------
u102: Am2918 port map (
    clk => CLK,
    nOE => db(3),
    d(3) => '1',
    d(2) => '0',
    d(1) => '0',
    d(0) => u92_pin7,
    o(3) => open,
    o(2) => open,
    o(1) => open,
    o(0) => flag_cy,
    y(3) => DBUS(1),
    y(2) => DBUS(3),
    y(1) => DBUS(5),
    y(0) => DBUS(0)
);

u101: Am2918 port map (
    clk => CLK,
    nOE => db(3),
    d(3) => u91_pin4,
    d(2) => u91_pin7,
    d(1) => u91_pin9,
    d(0) => u91_pin12,
    o(3) => flag_z,
    o(2) => flag_p,
    o(1) => flag_s,
    o(0) => flag_ac,
    y(3) => DBUS(6),
    y(2) => DBUS(2),
    y(1) => DBUS(7),
    y(0) => DBUS(4)
);
```
The individual components, on the other hand, are implemented mostly behaviorally (describing their functionality):
```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity Am2918 is
    Port ( clk : in STD_LOGIC;
           nOE : in STD_LOGIC;
           d   : in STD_LOGIC_VECTOR (3 downto 0);
           o   : buffer STD_LOGIC_VECTOR (3 downto 0);
           y   : out STD_LOGIC_VECTOR (3 downto 0));
end Am2918;

architecture Behavioral of Am2918 is
begin
    -- three-state outputs follow the register when enabled (nOE low)
    y <= o when (nOE = '0') else "ZZZZ";

    -- quad D register, loaded on the rising clock edge
    load_q: process(clk, d)
    begin
        if (rising_edge(clk)) then
            o <= d;
        end if;
    end process;
end Behavioral;
```
Components
The parts list in the original design (59 ICs) closely matches the component list from the FPGA project:
However, there are some differences for the sake of simplification:
- Simple logic gates (74LSXX) are described using VHDL's built-in logical operators, in a structural way
- Three Am2909 4-bit microprogram sequencers are merged into a single fictitious 12-bit Am2909-12 component to save on interconnections. This is possible because none of the signals between them are used. For the Am2901 this is not possible, because the intermediate carry (H flag) is needed for the DAA instruction
- The microcode (Am29773) and instruction mapper (Am29761) ROMs are each defined as a single monolithic ROM (512*56 and 256*12 respectively), initialized from memory description text files, as described in a separate project log
- The contents of the sequencer PROM (Am29751) are simply defined inline rather than loaded from a file (and it has its "real" size of 16*5 bits, not 32*8)
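To illustrate what the merged 12-bit sequencer does, here is a simplified Python model of the Am2909 next-address logic: a source select between the microprogram counter, the address register, the top of the 4-word stack and the direct input, plus push/pop and increment. It is not cycle-accurate (the ZERO, OR and output-enable inputs are omitted, and loading the address register is not modelled), and `Sequencer12` is a hypothetical name.

```python
PC, REG, STACK, DIRECT = 0, 1, 2, 3  # source select (S1 S0): uPC, register, stack, direct input


class Sequencer12:
    """Simplified model of three Am2909 slices acting as one 12-bit microprogram sequencer."""

    MASK = 0xFFF
    DEPTH = 4  # the Am2909 has a 4-word stack file

    def __init__(self):
        self.upc = 0    # microprogram counter register
        self.ar = 0     # address register (loading it is not modelled here)
        self.stack = []

    def step(self, select, d=0, push=False, pop=False, carry_in=1):
        """Produce the next microinstruction address Y and update uPC and the stack."""
        if select == PC:
            y = self.upc
        elif select == REG:
            y = self.ar
        elif select == STACK:
            y = self.stack[-1] if self.stack else 0
        else:
            y = d & self.MASK
        if push:
            if len(self.stack) == self.DEPTH:
                raise OverflowError("microprogram stack full")
            self.stack.append(self.upc)  # return address = uPC before it is updated
        elif pop and self.stack:
            self.stack.pop()
        self.upc = (y + carry_in) & self.MASK
        return y


seq = Sequencer12()
seq.step(PC)                          # 000
seq.step(PC)                          # 001
seq.step(DIRECT, d=0x100, push=True)  # "call" a microsubroutine at 100H
seq.step(PC)                          # 101
print(hex(seq.step(STACK, pop=True)))  # 0x2, the return address
```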