Decoded instruction format and special instructions

Instructions are, as previously discussed, stored in a packed format. The instruction fetcher unit unpacks them and stores them in a FIFO for the execution unit to use. A fetch operation may produce up to 2 instructions per fetch cycle, but it may also take multiple fetch cycles to produce a single instruction. A fetch cycle takes two master clock cycles, as the instruction memory is slower than the master clock. There are 2 instruction fetch units that operate on different instruction memory units and fill separate FIFOs, which allows a total of up to 2 instructions to be fetched per master clock cycle. Hopefully, this will even out to 1 instruction per cycle over time.
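The fetch-rate arithmetic can be checked with a few lines of Python. This is only a sketch: the constants come from the paragraph above, while the function name and the idea of feeding it an average per-unit rate are assumptions made for illustration.

```python
from fractions import Fraction

MASTER_CLOCKS_PER_FETCH_CYCLE = 2  # from the text: instruction memory is slower than the master clock
FETCH_UNITS = 2                    # from the text: two fetchers, separate memories and FIFOs
PEAK_PER_FETCH_CYCLE = 2           # from the text: best case for one unit in one fetch cycle


def instructions_per_master_clock(per_unit_per_fetch_cycle):
    """Aggregate fetch rate over all units, in instructions per master clock.

    per_unit_per_fetch_cycle is the (hypothetical) average number of
    instructions one fetch unit produces per fetch cycle.
    """
    per_unit = Fraction(per_unit_per_fetch_cycle) / MASTER_CLOCKS_PER_FETCH_CYCLE
    return per_unit * FETCH_UNITS


peak = instructions_per_master_clock(PEAK_PER_FETCH_CYCLE)
average_case = instructions_per_master_clock(1)
```

With these numbers the peak is 2 instructions per master clock, and an average of 1 instruction per fetch cycle per unit works out to 1 instruction per master clock overall.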

To keep things simple, the instruction fetcher is not directly connected to the register file (otherwise, we'd need some kind of arbitration over which unit gets to use the register file in which circumstances, which could easily get messy). We therefore need to arrange for methods to get the program counter into and out of the register memory indirectly via the execution unit (which is the only direct connection). The program counter realistically only needs to be stored in the register memory when the thread is not actually executing; while it is executing it can be cached in the decoder unit. The decoder needs to be able to inform the execution unit of the address of any instruction where it may stop, and the execution unit must be able to pass a new address when necessary. We therefore arrange for a bus between the two units to allow for this.

We add two instructions that are not visible in the ISA but which are interpreted by the execution unit; the instruction decoder can use these to cause program counter transfers:

RESTOREPC causes the execution unit to load the current PC from register memory and execute a jump to it; note that this is the same behaviour as a typical JMP instruction, except that a different operand mode is required
SUSPEND causes the execution unit to capture the current PC from the fetch unit and store it in register memory; it also signals to the instruction fetcher that it is ready to start handling instructions for a new thread
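As a rough behavioural model of the two hidden instructions, here is a minimal Python sketch. The class names, the `saved_pc` dictionary standing in for register memory, and the `may_start_new_thread` flag are all hypothetical; the real design is hardware and the bus signalling is not specified here.

```python
class Fetcher:
    """Hypothetical stand-in for the fetch/decode side of the PC bus."""

    def __init__(self):
        self.pc = 0                        # PC cached while a thread is executing
        self.may_start_new_thread = False  # set when the execution unit has saved the PC


class ExecutionUnit:
    """The only unit with a direct connection to register memory."""

    def __init__(self):
        # Assumed layout: one saved PC per thread, keyed by a thread id.
        self.saved_pc = {}

    def suspend(self, thread, fetcher):
        # SUSPEND: capture the current PC from the fetcher, store it in
        # register memory, and signal that a new thread may be started.
        self.saved_pc[thread] = fetcher.pc
        fetcher.may_start_new_thread = True

    def restore_pc(self, thread, fetcher):
        # RESTOREPC: load the saved PC from register memory and jump to it.
        fetcher.pc = self.saved_pc[thread]
        fetcher.may_start_new_thread = False


fetcher = Fetcher()
unit = ExecutionUnit()
fetcher.pc = 0x40
unit.suspend(thread=1, fetcher=fetcher)     # thread 1 stops at 0x40
fetcher.pc = 0x80                           # some other thread runs
unit.restore_pc(thread=1, fetcher=fetcher)  # thread 1 resumes
```

After the last call the fetcher's cached PC is back at the address that was saved when thread 1 was suspended.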

There are also no provisions for removing instructions from the queue if they turn out to be unnecessary. We therefore arrange for the fetcher to stop fetching if it finds an instruction that could cause a jump or suspension of a thread. It only resumes after the execution unit tells it what happened.
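The stall rule can be sketched as a loop that keeps filling the FIFO until it has queued an instruction that might redirect or suspend the thread. The set of stalling mnemonics below is an assumption for illustration (the text only says "could cause a jump or suspension"), and the program is modelled as a plain list of mnemonics rather than packed instruction words.

```python
from collections import deque

# Assumption: which mnemonics count as "could cause a jump or suspension".
STALLING = {"JMP", "SUSPEND", "PUT"}


def fetch_until_stall(program, pc, fifo):
    """Queue instructions starting at pc; stop after the first stalling one.

    Returns the address of the first instruction that was NOT fetched.
    The fetcher would wait there until the execution unit reports back.
    """
    while pc < len(program):
        mnemonic = program[pc]
        fifo.append((pc, mnemonic))
        pc += 1
        if mnemonic in STALLING:
            break
    return pc


program = ["MOV8", "ALU", "JMP", "STB"]
fifo = deque()
next_pc = fetch_until_stall(program, 0, fifo)
```

Here the fetcher queues MOV8, ALU and JMP and then waits; STB is never queued speculatively, so nothing ever has to be removed from the FIFO.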

The PULL instruction requires special handling:

The first time it is executed in any specific task invocation, it will immediately return the operand that was placed into the task FIFO.
On subsequent executions, it will either return an additional operand (if the next entry in the task FIFO is also for the same task) or suspend the task.

The instruction fetcher therefore supplies a flag to the decoder unit that allows it to substitute a SUSPEND instruction for a PULL instruction in the latter case. YIELD and PUT instructions may also suspend the thread, but the decision to do so is deferred until execution, so they are not replaced with SUSPEND instructions. YIELD is simply a special case of PUT, so it doesn't need an instruction of its own.
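The substitution rule for PULL can be written as a small decision function. This is a sketch only: the two boolean parameters are hypothetical names for "this is the first PULL in the current task invocation" and "the next task FIFO entry is for the same task", and how the hardware derives the flag is not covered here.

```python
def decode_pull(first_pull_in_invocation, next_entry_same_task):
    """Return the mnemonic the decoder passes on when it sees a PULL."""
    if first_pull_in_invocation:
        # The operand that started this invocation is always available.
        return "PULL"
    if next_entry_same_task:
        # Another operand for the same task is already queued.
        return "PULL"
    # Nothing more to pull for this task: suspend instead.
    return "SUSPEND"


first = decode_pull(first_pull_in_invocation=True, next_entry_same_task=False)
more = decode_pull(first_pull_in_invocation=False, next_entry_same_task=True)
done = decode_pull(first_pull_in_invocation=False, next_entry_same_task=False)
```

Only the last case produces SUSPEND; the other two pass the PULL through unchanged.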

This means that the instructions that are required to be supported by the execution unit are as follows:

Hex	Mnemonic	Brief explanation
00	SUSPEND	Store current PC in register memory
01	PULL	Retrieve task operand and store in destination register
02	JMP	Pass new PC to instruction fetcher
03	PUT	Send a value to a given destination
10	XCHG	Fetch 16 bits of register memory, byte swap, and save back to original source
11	MOV8	8 bit move from immediate operand (includes SCSB instruction)
12	MOV12	8-to-12-bit move
13	ALU	ALU operation
14	LDB	Load byte from memory (includes XLAT)
15	LDBI	Load byte from memory and postincrement address
16	STB	Store byte in memory
17	STBI	Store byte in memory and postincrement address
18	DLD	DMA load
19	DLDI	DMA load and postincrement address
1A	DST	DMA store
1B	EXT	Shift and extract bits
1C	IFREG	Conditionally execute next instruction based on tests against registers
1D	IFSTAT	Conditionally execute next instruction based on channel status
1E	SXA	Shift and add
1F	START	Set up new channel and begin execution

FIXME - optimize numeric allocations to minimize required logic
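For simulation or tooling, the table above can be transcribed directly into an enumeration. The values are copied from the table; the class name and the use of Python's `IntEnum` are choices made for this sketch. RESTOREPC has no entry of its own because, as described earlier, it behaves like JMP with a different operand mode.

```python
from enum import IntEnum


class Opcode(IntEnum):
    """Execution-unit opcodes, transcribed from the table (5-bit values)."""
    SUSPEND = 0x00
    PULL = 0x01
    JMP = 0x02
    PUT = 0x03
    XCHG = 0x10
    MOV8 = 0x11
    MOV12 = 0x12
    ALU = 0x13
    LDB = 0x14
    LDBI = 0x15
    STB = 0x16
    STBI = 0x17
    DLD = 0x18
    DLDI = 0x19
    DST = 0x1A
    EXT = 0x1B
    IFREG = 0x1C
    IFSTAT = 0x1D
    SXA = 0x1E
    START = 0x1F


# Every opcode must fit in the 5-bit opcode field.
all_fit_in_5_bits = all(0 <= op < 32 for op in Opcode)
```

In the current allocation the top bit of the opcode separates the four control instructions (00 to 03) from the sixteen data instructions (10 to 1F).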

Operands are encoded using a mode and then several bits of data. 16-bit register access (used for simultaneous access to A and B registers) and 12-bit (for pointer registers) use the same encoding. ALU operations are 4 bits, and the same field may alternatively be used as a shift direction indicator/counter, or a condition code for IFxxx instructions. Not all fields are used by all instructions. Registers in fields labelled as "source" are preloaded before the execution pipeline stage (only one field per instruction is automatically preloaded; additional loads must be requested in the instruction microcode, which will make it take more than the standard 1 execute cycle).

Code	Description
0	Single source/target register (8 bit); ALU op; immediate 8
1	Single source/target register (16 bit); ALU op; immediate 8
2	Target register (16 bit); Source register (16 bit); ALU op
3	8 bit register; 16 bit source register; immediate 6
4	8 bit source register; 16 bit register; ALU op
5	Single source/target register (8 bit); immediate 8
6	Single source/target register (16 bit); immediate 8
7	PC flag; shift; immediate 8

So the opcode requires 5 bits, the operand type tag requires 3 bits, and the operands themselves require at most 16 bits (code 0). The instruction FIFO is therefore 24 bits wide.
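The 5 + 3 + 16 = 24 bit budget can be illustrated by packing and unpacking a FIFO word. The field widths come from the text; the field order (opcode in the top bits, then the mode, then the operands) and the function names are assumptions for this sketch, since the actual bit layout is not given.

```python
OPCODE_BITS = 5
MODE_BITS = 3
OPERAND_BITS = 16
WORD_BITS = OPCODE_BITS + MODE_BITS + OPERAND_BITS  # 24


def pack(opcode, mode, operands):
    """Pack one decoded instruction into a FIFO word (assumed field order)."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 5 bits")
    if not 0 <= mode < (1 << MODE_BITS):
        raise ValueError("mode does not fit in 3 bits")
    if not 0 <= operands < (1 << OPERAND_BITS):
        raise ValueError("operands do not fit in 16 bits")
    return (opcode << (MODE_BITS + OPERAND_BITS)) | (mode << OPERAND_BITS) | operands


def unpack(word):
    """Split a FIFO word back into (opcode, mode, operands)."""
    opcode = (word >> (MODE_BITS + OPERAND_BITS)) & ((1 << OPCODE_BITS) - 1)
    mode = (word >> OPERAND_BITS) & ((1 << MODE_BITS) - 1)
    operands = word & ((1 << OPERAND_BITS) - 1)
    return opcode, mode, operands


word = pack(0x13, 0, 0xABCD)  # ALU opcode, operand code 0, arbitrary operand bits
fields = unpack(word)
```

Packing the largest value of every field gives 0xFFFFFF, which confirms that the three fields exactly fill a 24-bit word.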

