For such a delicate mechanism, error detection is certainly necessary. It's the difference between a machine that (mostly) works and endless head-scratching.
But things get pretty complicated...
First, how many parity bits?
Parity is computed, stored and checked for each DRAM and register word, since that is where the data is vulnerable.
I thought about one parity bit, which is easy and simple in theory. In practice, it breaks the whole symmetry of the bitplane design: 17 is prime, so partitioning is impossible.
The next number is 18: it's even and has 3×3 as a factor. That's better, and I came up with some partitions to help reduce the strain on the address lines.
And 2 parity bits are better than one. For example, one parity bit per byte, or a partial SECDED, will help discriminate where the error(s) occur(s).
But now I realise that going to 18 bitplanes breaks something else! The CCPBRL system uses pairs of registers in strings, and 16 bitplanes work very well because 16=2×2×2×2, so we can "fan out" one signal to all 16 boards with 2 strings of 8 relays, controlled in the middle.
With 18 bitplanes, though, the numbers don't fall naturally: 18/2=9, and a string of 9 can't be tapped in the middle. Oh, with CCPBRL it's possible to have 4:5 strings, but there is another problem: a string of 9 coils requires 15V, which is "yet another voltage" to generate!
After thinking about it for a while, I realised that breaking the 18 in half isn't the only solution. It's possible to partition it into 3 strings of 6: 6 is even, so each string can be controlled in the middle, and we get the necessary 9V between the 3V rail and the 12V rail.
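To make the partition argument concrete, here is a minimal Python sketch that lists the ways a given number of bitplanes can be split into equal strings of even length (even, so that each string can be tapped in the middle). The function name `center_tapped_partitions` is made up for this illustration, and the sketch only looks at the arithmetic, not at voltages or the 4:5 uneven strings mentioned above.

```python
def center_tapped_partitions(bitplanes):
    """Return (number_of_strings, coils_per_string) pairs where all strings
    have the same, even length, so each one can be driven from its middle."""
    return [
        (bitplanes // length, length)
        for length in range(2, bitplanes + 1, 2)  # even string lengths only
        if bitplanes % length == 0                # strings must be equal
    ]

for n in (16, 17, 18):
    print(n, center_tapped_partitions(n))
```

For 16 bitplanes the list includes 2 strings of 8, for 17 it is empty, and for 18 it includes 3 strings of 6 but no split into two halves.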
The fanout-18 signals apply to all the circuits related to the parity-protected storage: memory and registers. There are not many of them (mostly write enable, read enable, etc.). More high-fanout signals are required by the ALU part, which is 16 bits wide and uses a pair of 8-coil strings.
2 bits of parity give more information about the location of the fault.
Usually, the expected type of fault is a bad joint, a bad connector or a faulty part, like a leaky diode or capacitor. There could also be power-related issues, like voltage spikes that might trip one of the thousands of CCPBRL relays, and these issues are more diffuse. Having a consistent location to examine will help with the machine's maintenance.
2 bits of parity can be used to check individual bytes: 2 independent groups of 9 bitplanes (8 data bits plus 1 parity bit) will protect one byte each. SECDED codes can't do better, and can't locate 2 simultaneous errors.
So the parity circuit is as simple as you'd imagine : XOR all the bits of a byte and compare the result with the parity bit...
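As a software model of that circuit, here is a minimal Python sketch of per-byte parity on an 18-bit word. The bit layout (parity bits stored in bits 16 and 17), the choice of even parity, and the names `byte_parity`, `protect` and `check` are all assumptions made for this example, not the actual wiring of the machine.

```python
def byte_parity(byte):
    """XOR of the 8 bits of a byte: 1 if the number of set bits is odd."""
    p = 0
    for i in range(8):
        p ^= (byte >> i) & 1
    return p

def protect(word16):
    """Append one parity bit per byte (assumed layout: bit 16 covers the
    low byte, bit 17 covers the high byte)."""
    lo = word16 & 0xFF
    hi = (word16 >> 8) & 0xFF
    return (word16 & 0xFFFF) | (byte_parity(lo) << 16) | (byte_parity(hi) << 17)

def check(word18):
    """Return (low_group_ok, high_group_ok) for an 18-bit protected word."""
    lo = word18 & 0xFF
    hi = (word18 >> 8) & 0xFF
    lo_ok = byte_parity(lo) == ((word18 >> 16) & 1)
    hi_ok = byte_parity(hi) == ((word18 >> 17) & 1)
    return lo_ok, hi_ok

w = protect(0x1234)
print(check(w))           # both groups are consistent
print(check(w ^ 0x0100))  # one flipped bit in the high byte
```

A single flipped bit points to the group of 9 bitplanes that needs examination. Two flips inside the same group cancel out and go unnoticed, which is the usual limit of simple parity.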