Globule symmetry

It seems I'll have to adapt my model of the globules because of the memory access patterns.

The "symmetrical globule" idea is great because it's simple and efficient. The functions of the two globules can also be merged during one cycle by issuing an instruction pair that performs operations more complex than a single instruction could do alone: 2R2W or 3R1W instructions (2 reads and 2 writes, or 3 reads and 1 write) such as

addition with carries,
multiplication,
division,
barrel shift/rot,
bit insert/extract,
MAC maybe...

Each globule is dumb and designed for speed, which limits its functionality a lot, but new features naturally appear when two globules are coupled. This breaks the symmetry, in particular for scheduling and for the instruction decoder/dispatcher.

The original idea was very simple: instructions are grouped in 3 consecutive words or fewer, following the pattern G1 G2 C or any subsequence of it, so overall we can have the following sequences:

G1
G2
G1 G2
G1 G2 C
G1 C
G2 C
C

(that's a total of 2³-1=7 combinations, and not 8 because the empty set does not exist).
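
The count can be checked with a short enumeration. This is only a sketch: the function name `legal_groups` is made up for the illustration, and the only rule it encodes is the one stated above (keep the slot order G1 G2 C, drop any slots, never all of them).

```python
from itertools import combinations

SLOTS = ("G1", "G2", "C")  # fixed slot order inside a group

def legal_groups(slots=SLOTS):
    """Return every non-empty, order-preserving selection of slots."""
    groups = []
    for size in range(1, len(slots) + 1):
        # combinations() keeps the input order, so "G2 G1" is never produced
        groups.extend(combinations(slots, size))
    return groups

for group in legal_groups():
    print(" ".join(group))
```

The loop prints 7 lines, one per legal group.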

But the pairing breaks this nice little clean system. A pair would be a special case of G1 G2 where the first opcode signals a pair, which constrains the second opcode in the pair. Each globule provides its operands and then both can receive a result, often after another cycle (or more, depending on the operation).

But due to requirements in orthogonality, you may want your operands and results to go almost anywhere, right? So you may end up with these sequences:
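
To make the "two operands in, two results out" idea concrete, here is a rough behavioural sketch of two 2R2W operations. The 32-bit width and the names `paired_add` and `paired_mul` are assumptions for the illustration; nothing here describes the actual opcodes or timing.

```python
WIDTH = 32                 # assumed register width, for illustration only
MASK = (1 << WIDTH) - 1

def paired_add(a, b):
    """2R2W addition: one result is the sum, the other is the carry out."""
    total = (a & MASK) + (b & MASK)
    return total & MASK, total >> WIDTH

def paired_mul(a, b):
    """2R2W multiplication: the double-width product is split into low and high halves."""
    product = (a & MASK) * (b & MASK)
    return product & MASK, product >> WIDTH

low, carry = paired_add(0xFFFFFFFF, 1)
print(hex(low), carry)
```

Each function returns two values, one per globule, which is what a single 2R1W instruction cannot deliver on its own.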

P1 Q2
P1 Q2 C
P2 Q1
P2 Q1 C

where P is the first opcode of a pair, and Q its "qompanion". And you see that the order of the globules can now be swapped! Certain operands and results are still required to be in separate globules, but forbidding P2 Q1 would put too much stress on the register allocator, I think.

This also increases the number of opcode types: we now have G, C, P and Q. So the opcode type requires 2 bits.

Also, the operands of the Q opcodes are implicitly complementary to those of the leading P opcode, so there is no need to distinguish Q1 from Q2. It's possible to save a couple of bits but I doubt it's worth the effort. Let's keep both for now.
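
As a sanity check of the grouping rules so far, here is a small validator sketch. The numeric values given to the four types are hypothetical (no encoding has been chosen), and `is_legal_group` is a name invented for the example. It accepts the 7 original groups plus the 4 paired ones, 11 in total.

```python
from enum import IntEnum

class OpType(IntEnum):
    # Hypothetical 2-bit encoding; the actual values are not decided here.
    G = 0  # ordinary globule opcode
    C = 1  # opcode for the C slot
    P = 2  # first opcode of a pair
    Q = 3  # companion opcode of a pair

def is_legal_group(group):
    """group is a list of (OpType, globule) tuples; globule is 1, 2, or None for C."""
    ops = list(group)
    had_c = bool(ops) and ops[-1] == (OpType.C, None)
    if had_c:
        ops.pop()                      # C may only appear last
    if not ops:
        return had_c                   # "C" alone is legal, the empty group is not
    if len(ops) == 1:
        return ops[0][0] == OpType.G and ops[0][1] in (1, 2)
    if len(ops) == 2:
        (t1, g1), (t2, g2) = ops
        if t1 == OpType.G and t2 == OpType.G:
            return (g1, g2) == (1, 2)  # plain opcodes keep the G1 G2 order
        if t1 == OpType.P and t2 == OpType.Q:
            return {g1, g2} == {1, 2}  # a pair may use either globule order
    return False

print(is_legal_group([(OpType.P, 2), (OpType.Q, 1), (OpType.C, None)]))
```

The only asymmetry is in the two-opcode case: G G must be in globule order, while P Q only needs the two opcodes to target different globules.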

....

The system starts to break down when confronted with the constraints of accessing memory through the A/D register pairs. It's not a new system, since the CDC6600 (and family) uses dedicated address and data registers as well; however, 2 address registers (A6 and A7) are used for writing and 5 (A1 to A5) for reading, so the semantics are clear:

Write an address to a write register and a memory write cycle stores the current content of the corresponding data register.
Write to a read register and the read cycle is immediately triggered.

On the YASEP/Y8/Y32, the A/D pairs are not committed to reading or writing, which allows read-modify-write sequences. That's all fine with the YASEP (16-32) and Y8 because the memory is tightly coupled and usually has a single level. Triggering a read cycle upon a register write is not an issue. However, Y32 has more latency (in fact it's meant to deal with several cycles), so if you only want to write to memory, updating the A register could trigger a cache miss and a long, painful fetch sequence...
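
The problem shows up even in a toy model. The sketch below assumes the uncommitted behaviour described above: writing A immediately fetches into D, and writing D stores to the address held in A. The class `ADPair` and its counters are invented for the illustration and ignore latency and caches entirely.

```python
class ADPair:
    """Toy model of one uncommitted address/data register pair."""

    def __init__(self, memory):
        self.memory = memory
        self.a = 0
        self.d = 0
        self.reads = 0    # fetches triggered by writing A
        self.writes = 0   # stores triggered by writing D

    def write_a(self, address):
        self.a = address
        self.d = self.memory[address]   # the read cycle starts right away
        self.reads += 1

    def write_d(self, value):
        self.d = value
        self.memory[self.a] = value     # the store goes to the address in A
        self.writes += 1

memory = [0] * 16
pair = ADPair(memory)

# Write-only use: the program never looks at the fetched value...
pair.write_a(3)
pair.write_d(42)
print(pair.reads, pair.writes)          # ...but a read was still issued

# Read-modify-write use: here the fetch is actually wanted.
pair.write_a(3)
pair.write_d(pair.d + 1)
```

In the write-only case the fetch is pure overhead; with a single-level, tightly coupled memory it costs nothing, but on Y32 that same fetch could be the cache miss mentioned above.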

... tbc ...
