The perfect ISA doesn't exist.

As noted in the previous log, an ISA is always a compromise and can only be judged for a given task or set of tasks. But since everybody has their own "requirements" and they change all the time, in the end, there is no way to get an absolute judgement. Synthetic benchmarks (SPEC etc.) help but they evaluate a complete system in a specific configuration, not the ISA.

So when you must design an ISA from scratch, you're left with habits, conventions, heuristics, cool ideas you picked somewhere, sprinkled with bikeshedding and countless "what if I tried this or that ?" but you have to set the bar somewhere.

If you want to do too many things, the architecture becomes good at nothing, or the price to pay (in complexity, price, consumption...) will be very high (x86...).

If you want to do one thing very well, you need a specialised processor, such as a DSP, GPU or even a special coprocessor (such as a compression or crypto accelerator).

In-between, you wish to have a reasonable, simple and flexible ISA, but you'll always find cases where you missed a "magic instruction" that boosts your performance. For this discussion, I'll refer you to the arguments for RISC vs CISC in the famous Patterson & Hennessy books, where they crunch the numbers and prove that any added feature must considerably speed up the whole system to be worth it. Which is seldom.

My personal approach is to let the HW do what it does well and let SW access it as easily as possible, which implies a lot of orthogonality. I start from what logic does easily (move bits around, change them), design the few vital units (boolean unit, add/sub, shuffler) then connect them efficiently to the register set. Extra instructions come from available features or options that require very little effort to implement.

For example : the boolean unit shares the operands with the adder, which requires one input to be inverted. This is an opportunity to get the ANDN opcode for free, where X = A and not(b). It just occupies one more slot in the opcode map, and a few gates of decoding logic, but no modification to the main datapath. Clock speed and silicon surface are not affected but some software that processes bits and bit vectors (blitting, masking, ...) can save some time in a critical loop.

Discussions

Dylan Brophy wrote 04/08/2019 at 04:40

Thanks Yann - I enjoyed this read!

Are you sure? yes | no

Yann Guidon / YGDES wrote 04/08/2019 at 04:42

You're welcome !
and each experienced opinion is welcome :-)

If you have questions, don't hesitate.

Are you sure? yes | no

Be ready to change your opcode map a lot, until you can't

Learn and know when to stop

Discussions

Become a Hackaday.io Member