The Y32 architecture is mainly divided into four essential blocks, each with its own dedicated memory buffer:
- Glob1 (with dCache1)
- Glob2 (with dCache2)
- eBTB (with iCache)
- control stack (with stack cache)
A 5th block (with no memory access or array) processes data for operations that are multi-cycle and/or need more than 2R1W (Mul/Div/Barrel shifter/...).
Yet the YGREC32 is a 3-issue superscalar shallow-pipeline processor. This directly dictates the format of the instructions:
- MSB=0 for "operations", that is: opcodes delivered to the globules to process data. Note: opcode 0x00000000 is NOP. The next bit selects the globule, which leaves 30 bits for the operation itself.
- MSB=1 is the prefix of the "control" opcodes, such as pf/jmp/call/ret/IPC/spill/aspill/... These affect the control stack as well as the eBTB (sometimes simultaneously). Note: 0xFFFFFFFF is INV.
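The split described above can be sketched as a small classifier. This is only an illustration: the function name `classify`, the returned tuple layout, and the mapping of the globule bit to an index (0 or 1) are assumptions, since the log does not specify the field layout beyond the two top bits.

```python
NOP = 0x00000000  # all-zero word is NOP (from the text)
INV = 0xFFFFFFFF  # all-ones word is INV (from the text)

def classify(word):
    """Classify a 32-bit instruction word by its top bits.

    Returns (kind, glob, payload). The tuple layout is hypothetical;
    only the MSB / globule-bit split comes from the description.
    """
    word &= 0xFFFFFFFF
    if word == NOP:
        return ("nop", None, 0)
    if word == INV:
        return ("inv", None, 0)
    if word >> 31:
        # MSB=1: control opcode, 31 bits left for its fields
        return ("control", None, word & 0x7FFFFFFF)
    # MSB=0: operation; bit 30 selects the globule (index is an assumption)
    glob = (word >> 30) & 1
    return ("operation", glob, word & 0x3FFFFFFF)
```

For example, `classify(0x40000001)` yields an operation for the globule selected by bit 30 = 1, with a 30-bit payload of 1.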
.
The globules are very similar to the YGREC8: a simple ALU (full-featured, with boolean operations and even umin/umax/smin/smax), 16 registers of which 8 are for memory access, and a 32-bit byte shuffle unit (aka Insert/Extract unit, for alignment and byteswap: see the #YASEP Yet Another Small Embedded Processor's architecture).
Each globule must be small and fast: add/sub, boolean and I/E operations must take only one cycle.
Complex operations (multiply, divide, barrel shift/rotate...) are performed by external units in the 5th, optional/auxiliary block. These units get read and write port access to both register sets simultaneously, so they can perform 2R2W operations without bloating the main datapath. The pair of opcodes gets "fused" and can become atomic, but could also be executed separately.
Fused operations require an appropriate signaling in the opcode encoding.
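As an illustration of why a 2R2W path is useful, here is a minimal sketch of a 32x32 multiply that produces two result words, one per destination. Splitting the product into a low and a high half is an assumption made for the example; the log does not say which results a fused multiply writes, and `fused_mul` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def fused_mul(a, b):
    """Unsigned 32x32 -> 64-bit multiply with two 32-bit results.

    Models a 2R2W operation: two source reads, two destination writes.
    The (low, high) split is an assumption, not the author's spec.
    """
    product = (a & MASK32) * (b & MASK32)
    low = product & MASK32    # would go to the primary destination
    high = product >> 32      # would go to the secondary destination
    return low, high
```

With only 2R1W, the same result would need two separate multiply opcodes (one per half), which is what fusing the pair avoids.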
For context save/restore, it would be good to have another, complementary behaviour, where one opcode is sent to both globs (broadcast).
It is good that control operations can be developed independently from the processing operations, though sharing the fields (operands and destinations) is still desired.
.
Both globs are identical and symmetrical, so there is only one to design, then mirror and stitch together.
Each glob communicates with the other through the extra read port (the register sets are actually 3R1W), and is fed instructions from the eBTB. The only connection with the control stack is for SPILL and UNSPILL, using the existing read & write ports. Each glob provides a set of status flags, which are copies of the MSB and LSBs of the registers (plus a zero flag). Hence some of the available conditions are:
- LSB0
- LSB1
- LSB2
- MSB
- Zero
The multiple LSB conditions help with dynamic languages that encode type and other ancillary values in the pointers (including the #Aligned Strings format). MSB can be a proxy for carry.
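The flags listed above are cheap to derive, since most of them are plain copies of register bits. A minimal sketch, assuming a 32-bit register value and a hypothetical helper `status_flags`:

```python
def status_flags(reg):
    """Derive the condition flags of one 32-bit register.

    LSB0..LSB2 and MSB are direct bit copies; Zero is the only flag
    that needs logic across all bits. The dict layout is illustrative.
    """
    reg &= 0xFFFFFFFF
    return {
        "LSB0": bool(reg & 1),
        "LSB1": bool(reg & 2),
        "LSB2": bool(reg & 4),
        "MSB": bool(reg >> 31),
        "Zero": reg == 0,
    }
```

A tagged pointer that keeps a type code in its three low bits could then be dispatched on LSB0/LSB1/LSB2 directly, without a separate mask-and-compare step.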
Oh and yes, add with carry is a "fused"/broadcast operation that writes the carry in the secondary destination.
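A minimal sketch of that behaviour, assuming the sum goes to the primary destination and the carry-out to the secondary one. The optional carry-in and the name `add_with_carry` are assumptions for the example, not part of the description.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in=0):
    """32-bit add returning (sum, carry_out).

    The sum models the primary destination; the carry models the value
    written to the secondary destination. carry_in is an assumption.
    """
    total = (a & MASK32) + (b & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32

def add64(a_lo, a_hi, b_lo, b_hi):
    """Chain two 32-bit adds into a 64-bit add, as a usage example."""
    lo, carry = add_with_carry(a_lo, b_lo)
    hi, _ = add_with_carry(a_hi, b_hi, carry)
    return lo, hi
```

Because the carry lands in an ordinary register, the second step of a wide addition can consume it like any other operand.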