If there is anything that fueled the RISC vs CISC debates over the decades, it's the status flags, and among them, the carry flag is an even hotter topic.
This flag has been a common feature of most architectures, and MIPS dropped it. Considering that only assembly language could use it, that made sense in a way, because implementing it would not significantly improve overall performance. After all, it's a 32-bit architecture, so multi-precision is not required, right?
Looking at the shenanigans required to implement multi-precision additions on YGREC8 (5 instructions and 2 branches) and the end-around carry in C for PEAC (or: "wrestling with GCC's intrinsics and compiler-specific macros"), the carry flag debate is far from settled. The POWER architecture implements a sophisticated system to get around hardware limitations, but nobody seems to have caught on to it.
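To give an idea of what "end-around carry" means when the language has no carry operator, here is a minimal sketch in portable C. It is the generic ones' complement style addition on 16-bit words, not PEAC's actual code, and the function name `add_eac16` is made up for the illustration. It sidesteps intrinsics by computing in a wider type, which only works as long as a wider type exists.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit addition with end-around carry.
 * The carry out of bit 15 is added back into bit 0. */
uint16_t add_eac16(uint16_t a, uint16_t b)
{
    uint32_t t = (uint32_t)a + b;      /* up to 17 significant bits */
    t = (t & 0xFFFFu) + (t >> 16);     /* fold the carry back in */
    return (uint16_t)t;                /* the fold cannot carry again */
}
```

With a hardware carry flag this would be an add followed by an add-with-carry of zero; in C the widening trick stops working at the machine's largest word size.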
There is another big problem: the high-level languages. This can't be easily fixed, because that side is so deeply entrenched that no solution is in sight. We end up with hand-coded libraries for multi-precision numbers (GMP?), non-portable macros, weird intrinsics, platform-specific code...
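For reference, this is roughly what a multi-precision addition looks like when neither the language nor the target exposes a carry: the carry is rebuilt from unsigned comparisons. It is only a sketch, assuming 32-bit limbs stored least-significant first; `mp_add` is a name chosen for this example and is not taken from GMP or any other library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b over n little-endian 32-bit limbs.
 * Returns the carry out of the most significant limb (0 or 1). */
uint32_t mp_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s  = a[i] + b[i];   /* wraps modulo 2^32 */
        uint32_t c1 = s < a[i];      /* carry out of a + b */
        uint32_t t  = s + carry;
        uint32_t c2 = t < s;         /* carry out of adding the carry-in */
        r[i] = t;
        carry = c1 | c2;             /* c1 and c2 are never both set */
    }
    return carry;
}
```

Two additions and two comparisons per limb replace what a single add-with-carry instruction does, which is the cost the paragraph above is complaining about.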
Platform-wise, there is no easy solution but it is not hopeless.
Not all architectures need to avoid the carry flag, in particular microcontrollers or cores that will never grow to the point of using out-of-order execution.
For Y8, the opcode space is so small that I had to ditch the add-with-carry opcode. It is replaced by a 5-opcode macro (10 bytes instead of 2) whenever needed. We'll have to deal with it.
The YASEP has a larger opcode space and includes add-with-carry and sub-with-borrow. Nothing unusual here, this is the classic approach.
The F-CPU, however, is a totally different beast. Not only can't it have a single carry flag (due to the split pipelines), but because it is an inherently SIMD architecture, several carries would be required. There, the idea is to use a normal register to hold the carry bit(s), using either a 2R1W form or (ideally) a 2R2W form.
- 2R2W is conceptually best, except for the need to write 2 values to the register set at the same time. Each partial carry gets its own place in the register in SIMD mode, and there is no bottleneck for a single carry flag at the decoding stage. The single opcode works in a single cycle. FC1 could write the extra data to another cluster. Alternatively, the opcode could be a two-cycle instruction, delivering the sum first and the carry on the second cycle.
- 2R1W splits the results across 2 separate opcodes, even though only one operation is performed. It is slower because 2 successive opcodes are required, but it is an easy addition to any existing ISA.
On the programming language side, an extra "carry" operator (probably @+ and @- ?) can do the trick.
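The 2R1W idea can be modelled in plain C to show what the two opcodes would compute. This is a sketch under loud assumptions: four 8-bit lanes packed in a 32-bit word stand in for a SIMD register, and the names `add_lanes8` and `carry_lanes8` are invented here; none of this is the F-CPU's actual encoding or semantics. The second function plays the role that a "carry" operator such as @+ would have in a language.

```c
#include <stdint.h>

#define LANE_MSB 0x80808080u   /* top bit of each 8-bit lane */
#define LANE_LOW 0x7F7F7F7Fu   /* low 7 bits of each 8-bit lane */

/* First opcode (hypothetical): per-lane sum, carries do not cross lanes. */
uint32_t add_lanes8(uint32_t a, uint32_t b)
{
    uint32_t low = (a & LANE_LOW) + (b & LANE_LOW); /* bit 7 = carry into MSB */
    return low ^ ((a ^ b) & LANE_MSB);              /* finish the MSB add */
}

/* Second opcode (hypothetical): per-lane carry out, placed in bit 0
 * of each lane so it can feed the next, more significant addition. */
uint32_t carry_lanes8(uint32_t a, uint32_t b)
{
    uint32_t low = (a & LANE_LOW) + (b & LANE_LOW);
    /* carry out of bit 7 = majority(a7, b7, carry into bit 7) */
    uint32_t msb = (a & b) | ((a | b) & low);
    return (msb & LANE_MSB) >> 7;
}
```

Both functions redo the same addition internally, which mirrors the remark that 2R1W performs one operation but needs two opcodes to retrieve its results. A 2R2W version would return the sum and the carry word together from a single call.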
There is no perfect solution. Each type of core requires a complex balance so as usual, you're the one who decides (and is to blame).