I have sketched out a few useful programs to run, and used them to decide on a basic instruction set architecture. I'm still experimenting and optimising, but my basic plan at the moment is:
Register set
Current plan is for each independent thread to have the following set of registers:
- A and B - 8-bit mostly-general-purpose registers (a handful of instructions use these for specific purposes, but most operations can work on either)
- P0 - P3 - four 12-bit registers used for memory address pointers and counters. Current planned uses only require 3 such registers, so the eventual implementation may only supply 3, but the ISA allows for a 2-bit field to select them, so 4 are included in the design of the instruction set.
- Base address registers for memory addresses and direct memory access. These are not ISA-visible, but are set in channel configuration.
- PC - program counter
- CSB - Channel Status Byte - bitfield of:
  - 0x01 - channel input program enabled
  - 0x02 - channel processing program enabled
  - 0x04 - channel output program enabled
  - 0x08 - signal interrupt on input fetch while no enqueued value
  - 0x10 - signal interrupt immediately (bit gets cleared automatically when interrupt is acknowledged)
  - 0x20 - prefetch hint (may cause DMA load requests to be extended beyond requested addresses)
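As a quick illustration of the CSB layout, here is a minimal Python sketch of the bitfield. The bit values come from the list above; the flag names and the `acknowledge_interrupt` helper are made up for this example and are not part of the design.

```python
from enum import IntFlag


class CSB(IntFlag):
    """Channel Status Byte bits. Values are from the list above; names are hypothetical."""
    INPUT_ENABLED = 0x01       # channel input program enabled
    PROCESSING_ENABLED = 0x02  # channel processing program enabled
    OUTPUT_ENABLED = 0x04      # channel output program enabled
    IRQ_ON_EMPTY_FETCH = 0x08  # interrupt on input fetch while no enqueued value
    IRQ_NOW = 0x10             # signal interrupt immediately
    PREFETCH_HINT = 0x20       # DMA loads may be extended beyond requested addresses


def acknowledge_interrupt(csb: CSB) -> CSB:
    """Model the automatic clearing of the 0x10 bit when the interrupt is acknowledged."""
    return csb & ~CSB.IRQ_NOW
```

For example, a channel running only its input and output programs would have a CSB value of `0x05`.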
Instruction packing
Instructions are loaded from an 8-bit wide SRAM. There are three instruction formats:
- 4-bit with implicit operands
- 8-bit with internal operand fields
- 8-bit with an additional operand byte
Two 4-bit instructions may be packed into a single byte. An 8-bit instruction may either be aligned to the start of a byte, or it can be packed into the spare location after a 4-bit instruction in which case all bits of its second nibble are assumed to be zero. Jump destinations must be byte aligned. Packing is big-endian (i.e. the first instruction executed is in the most significant nibble).
This means that the valid instructions for the second slot in a byte are the usual 4-bit instructions, plus the following 8-bit opcodes with their extension bits read as zero: NOP (1000), MOV P0L, A (1001), EXT R,1 (1010), JMP PC-1 (1011), ADD P0, A (1100), DLDB [P0] (1101), PUT #n, A (1110) and ADD rr, i6 (1111).
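The packing rules can be sketched as a small linear decoder in Python. This is only an illustration: the split into opcode classes (0xxx = 4-bit, 1000-1101 = 8-bit, 1110/1111 = 8-bit plus operand byte) is read off the opcode table in this log, the function name `decode_stream` is made up, and I am assuming that an operand byte for a second-slot instruction is simply the next byte in memory. It ignores jumps and walks the bytes in order.

```python
def decode_stream(code: bytes) -> list:
    """Split packed code into (opcode, ext_bits, operand_byte) tuples.

    ext_bits is None for 4-bit instructions; operand_byte is None unless
    the opcode is 1110 or 1111.
    """
    out = []
    pc = 0

    def take_operand(opcode):
        # Opcodes 1110 and 1111 carry an additional operand byte.
        nonlocal pc
        if opcode < 0xE:
            return None
        operand = code[pc]  # raises IndexError if the stream is truncated
        pc += 1
        return operand

    while pc < len(code):
        byte = code[pc]
        pc += 1
        hi, lo = byte >> 4, byte & 0xF  # big-endian: high nibble executes first
        if hi >= 8:
            # Byte-aligned 8-bit instruction: low nibble holds the extension bits.
            out.append((hi, lo, take_operand(hi)))
            continue
        out.append((hi, None, None))  # 4-bit instruction in the first slot
        if lo < 8:
            out.append((lo, None, None))  # second 4-bit instruction
        else:
            # 8-bit opcode packed into the second slot: extension bits read as zero.
            out.append((lo, 0, take_operand(lo)))
    return out
```

For instance, the byte `0x08` decodes as PULL A followed by NOP, which is how a lone 4-bit instruction would be padded before a byte-aligned jump destination.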
Some instructions (with mnemonics IFxxx) are effectively prefixes that control conditional execution of the following instruction. An advanced implementation could fuse these to operate in a single cycle, but I'm not going to do that for now (I may support it in a CPLD/FPGA version later, but it would be too complex for the low integration logic I'm planning to use here).
Allowing 4-bit instructions makes the decoder design slightly harder, but the hope is that packing more instructions per byte should make it possible to approach 1 instruction per clock cycle. This matters because decode and execution will have to contend for access to the SRAM in many cases, and 2-byte instructions will of course always need 2 cycles to fetch. A small queue (probably 4 instructions) will be used to prefetch instructions to even out delays.
Instruction format
Bitfield identifiers:
- rr,qq - selects one of the P0-P3 registers
- s - 0 = A, 1 = B
- b - for moves between an 8-bit and a 12-bit register, selects either the low or high 8 bits of the 12-bit register
- i or j - bit is used as part of an immediate operand
- d - shift direction, 0 = R, 1 = L
- oo - shortened ALU operation code: 00 = ADD, 01 = SUB, 10 = AND, 11 = OR
- pppp - 74181 opcode (see mnemonics below)
- ww - an operation width: B=00, W=01, D=10, Q=11
- n - 1 to negate (adds N to mnemonic)
- c - channel condition (0 = CI - input thread running and waiting for data; 1 = CR - output ready)
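The field letters above appear in the opcode table as bit patterns such as `rrii iiii` or `siii jjjj`. A minimal Python sketch of pulling fields out of a nibble or byte using such a pattern is shown here. The helper name `extract_fields` is made up, and it assumes each field is written most significant bit first, which the log does not state explicitly.

```python
def extract_fields(pattern: str, value: int) -> dict[str, int]:
    """Extract named bitfields from value using a pattern like 'rrii iiii'.

    Each letter names a field; repeated letters widen that field.
    Literal '0'/'1' characters and spaces in the pattern are skipped.
    Assumption: fields are MSB-first.
    """
    bits = pattern.replace(" ", "")
    fields: dict[str, int] = {}
    for pos, letter in enumerate(bits):
        if letter in "01":
            continue  # fixed opcode bit, not a field
        bit = (value >> (len(bits) - 1 - pos)) & 1
        fields[letter] = (fields.get(letter, 0) << 1) | bit
    return fields
```

With this, the operand byte `0b10000101` matched against `rrii iiii` gives register P2 and an immediate of 5.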
Mnemonic notes:
- [x] is an indirect reference to the contents of memory pointed to by x
- for two operand instructions, left operand is destination
- r+ increments register 'r' after accessing it
- LDB, STB operate on bytes in SRAM
- DLDw, DSTw operate on main memory; direct loads cause the value to be pushed into the input FIFO, from which it is then fetched using PULL. Multi-word values are pushed in memory address order. All DMA addresses are relative to a base address register which is set as part of the channel configuration, and defines a 4KiB page within which transfers must operate.
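The DMA addressing rule can be sketched in Python as below. Several things here are assumptions rather than part of the design: the base register is taken to hold a page-aligned address, the widths B/W/D/Q are taken to mean 1, 2, 4 and 8 bytes, and a transfer that would leave the page is treated as an error (the real hardware might wrap or truncate instead). The name `dma_address` is made up.

```python
PAGE_SIZE = 4096  # 4KiB page selected by the base address register

# Assumed byte counts for the ww width codes (B, W, D, Q).
WIDTH_BYTES = {"B": 1, "W": 2, "D": 4, "Q": 8}


def dma_address(base: int, offset: int, width: str = "B") -> int:
    """Return the main-memory address for a DLDw/DSTw access.

    base   - channel base address (assumed page-aligned)
    offset - 12-bit value from a P register
    width  - one of 'B', 'W', 'D', 'Q'
    """
    size = WIDTH_BYTES[width]
    if base % PAGE_SIZE != 0:
        raise ValueError("base address is assumed to be 4KiB-aligned")
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset must fit in 12 bits")
    if offset + size > PAGE_SIZE:
        raise ValueError("transfer would cross the 4KiB page boundary")
    return base + offset
```

Since the P registers are 12 bits wide, they cover exactly one such page.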
Opcode | Ext bits | Op byte | Mnemonic |
Pull instructions - suspend thread until a byte is passed to the channel through the input FIFO | | | |
0000 | - | - | PULL A |
0001 | - | - | PULL B |
0010 | - | - | XCHG A,B |
0011 | - | - | DLDB [P1+] |
Yield instructions - pass a byte to the current thread's default destination (may suspend the thread until the destination is available) | | | |
0100 | - | - | YIELD A |
0101 | - | - | YIELD B |
0110 | - | - | LDB A, [P0+] |
0111 | - | - | STB [P0+], A |
1000 | pppp | - | xxx A, B |
1001 | rrbs | - | MOV rrb, s |
Extract operation: shift B register by n (= i+1) bits, and set the A register to the bits that were shifted out (aligned to least significant bit) | | | |
1010 | diii | - | EXT d,n |
1011 | iiii | - | JMP PC-(i4+1) |
1100 | 00rr | - | ADD rr, A |
1100 | 01rr | - | SUB rr, A |
1100 | 10rr | - | IFNZ rr <INSN> |
1100 | 110s | - | IFNZ s |
1100 | 111s | - | IFZ s |
1101 | rrww | - | DLDw [rr] |
1110 | 000s | iiii iiii | PUT #n, s (to do: explanation) |
1110 | 0010 | iiii iiii | JMP i8<<4 |
1110 | 0011 | ? | no operation assigned |
1110 | 010s | iiii iiii | LDB s, [i8] |
1110 | 011s | iiii iiii | STB [i8], s |
1110 | 100s | iiii iiii | XLAT s, [i8<<4] |
1110 | 101s | iiii iiii | MOV s, i8 |
1110 | 110s | rrii iiii | LDB s, [rr + i6] |
1110 | 111s | rrii iiii | STB [rr + i6], s |
1111 | 00oo | rrii iiii | op rr, i6 |
1111 | 01oo | siii iiii | op s, i7 |
1111 | 100d | siii jjjj | SdA s, i, j [s = (s shd i) + j] |
1111 | 1010 | pppp rrqq | xxx rr, qq |
1111 | 1011 | rrsi iiii | DSTB [rr+i5], s |
1111 | 1100 | iiii iiii | SCSB i8 |
1111 | 1101 | rrii iiii | START #i6, [rr] |
1111 | 1110 | ncii iiii | IFxxx #i6 |
1111 | 1111 | ? | no operation assigned |
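To make the EXT row in the table above concrete, here is a minimal Python model of the extract operation on 8-bit registers. It assumes that zeros are shifted into B (a plain logical shift), which the description does not state; the function name `ext` and its return convention are made up for this sketch.

```python
def ext(b: int, d: int, i: int) -> tuple[int, int]:
    """Model EXT d,n with n = i + 1. Returns (new_a, new_b).

    d = 0 shifts B right, d = 1 shifts B left.
    A receives the bits shifted out, aligned to the least significant bit.
    Assumption: zeros are shifted into B.
    """
    n = i + 1  # the 3-bit field encodes shifts of 1..8 bits
    b &= 0xFF
    if d == 0:
        a = b & ((1 << n) - 1)  # low n bits fall off the right-hand end
        b >>= n
    else:
        a = b >> (8 - n)        # high n bits fall off the left-hand end
        b = (b << n) & 0xFF
    return a, b
```

For example, with B = 0b10110011, EXT R,3 leaves A = 0b011 and B = 0b00010110, which is handy for peeling fixed-width fields off a byte.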
Note that there are no CALL or RET instructions - brief analysis of use cases has suggested that they are not likely to be required. Most operations are small and simple, and will usually require speed, so inlining subroutines would probably be a good idea in any case. With any luck, the 4KiB memory available to code for a channel should provide enough space for any reasonable operation.
There are two unassigned opcodes still available, but no applications are currently foreseen for them. 1111_1111 seems a good candidate for extension to 3-byte opcodes, as it would be easy to detect in the prefetch circuit, so it will be reserved for this purpose.
74181 mnemonics
We use an abbreviated selection of 16 useful operations out of the 32 available from the 74181.
Opcode | 74181 opcode & description | Mnemonic |
0000 | 00000 Q=A | NOP |
0001 | 01001 Q=A+B | ADD |
0010 | 00110 (carry in high) Q=A-B | SUB |
0011 | 01000 Q=A+(A&B) | ADA |
0100 | 01111 Q=A-1 | DEC |
0101 | 11001 Q=~(A^B) | NXOR |
0110 | 10011 Q=0 | ZERO |
0111 | 11100 Q=-1 | MNSO |
1000 | 10000 Q=~A | NOT |
1001 | 10001 Q=~(A|B) | NOR |
1010 | 10100 Q=~(A&B) | NAN |
1011 | 10101 Q=~B | NOTB |
1100 | 10110 Q=A^B | XOR |
1101 | 11010 Q=B | CPB |
1110 | 11011 Q=A&B | AND |
1111 | 11110 Q=A|B | OR |
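The table above can be turned into a small Python reference model, which is useful for checking hand-assembled code. This is only a sketch: it implements the Q= expressions from the table, truncates results to the register width, and does not model carry or any flags. The names `ALU_OPS` and `alu` are made up.

```python
# pppp code -> (mnemonic, function of the two ALU inputs), following the table above.
ALU_OPS = {
    0b0000: ("NOP",  lambda a, b: a),
    0b0001: ("ADD",  lambda a, b: a + b),
    0b0010: ("SUB",  lambda a, b: a - b),
    0b0011: ("ADA",  lambda a, b: a + (a & b)),
    0b0100: ("DEC",  lambda a, b: a - 1),
    0b0101: ("NXOR", lambda a, b: ~(a ^ b)),
    0b0110: ("ZERO", lambda a, b: 0),
    0b0111: ("MNSO", lambda a, b: -1),
    0b1000: ("NOT",  lambda a, b: ~a),
    0b1001: ("NOR",  lambda a, b: ~(a | b)),
    0b1010: ("NAN",  lambda a, b: ~(a & b)),
    0b1011: ("NOTB", lambda a, b: ~b),
    0b1100: ("XOR",  lambda a, b: a ^ b),
    0b1101: ("CPB",  lambda a, b: b),
    0b1110: ("AND",  lambda a, b: a & b),
    0b1111: ("OR",   lambda a, b: a | b),
}


def alu(pppp: int, a: int, b: int, bits: int = 8) -> int:
    """Return Q for the given pppp code, truncated to `bits` bits.

    Use bits=8 for 'xxx A, B' and bits=12 for 'xxx rr, qq'.
    Carry and flags are not modelled.
    """
    _name, fn = ALU_OPS[pppp]
    return fn(a, b) & ((1 << bits) - 1)
```

So `alu(0b0001, 200, 100)` wraps around to 44 in the 8-bit case, and MNSO always yields all ones for the chosen width.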