Here is the new version of the algorithm:
Voilà !
There are many refinements since the last versions, helping to solve the issues found in the last logs, while still maintaining simplicity and efficiency, both with hardware and software implementations.
- The same algorithm generates both short and full length results. The 16-bit result simply truncates the 32-bit result. The short result misses the D input but if you really cared, you wouldn't use 16 bits only.
- A second carry bit, for the mixing lane, preserves as much entropy as possible. Now both lanes use EAC mod 65535.
- All adders have 2 inputs and a carry-in. This simplifies hardware design, as well as the use of "Add with Carry" opcodes (though x86 only has 1 carry flag so it will probably use LEA a lot).
- The init values now integrate the size of the buffer, which is not limited to 64Ki words.
- The magic init value 0xABCD4567 should boost early avalanche, even with very low values.
- The 0x7 is mixed with the first datum, it is odd and high density, increasing carry propagation,
- The other numbers have medium or more bit density
- 0x4567 + 0xABCD = 0xF134 so the first round should not cancel both addends (and the many next, to be confirmed).
Keep in mind that this diagram shows the "principles", the algorithm could be implemented "as is" (like the reference below) but several optimisations are possible in software for example.
#define PISANO_WIDTH (16)
#define PISANO_SIZE (1<<PISANO_WIDTH)
#define PISANO_MASK (PISANO_SIZE-1)
#define MAGIC_INIT ( 0xABCD4567 )
uint32_t Pisano_Checksum32(
uint16_t *buffer, // points to the data to scan
uint32_t words // number of 16-bit WORDS to scan
) { uint32_t
Y = words + MAGIC_INIT,
X = Y >> 16, // extract the MSB
C = 0, // The "magic number" takes care of proper
D = 0; // initialisation so C & D can be left cleared
Y &= 0xFFFF;
while (words) {
// C, D : range 0-1
// X, Y : range 0-FFFFh
C += X + Y; // result range : 0-1FFFFh
X = Y + *buffer + D; // result range : 0-1FFFFh
buffer++;
Y = C & PISANO_MASK; // result range : 0-FFFFh
C = C >> PISANO_WIDTH; // result range : 0-1
D = X >> PISANO_WIDTH; // result range : 0-1
X = X & PISANO_MASK; // result range : 0-FFFFh
words--;
}
C += X + Y; // result range : 0-1FFFFh
return C+((X+D) << PISANO_WIDTH); // range : 0-FFFFFFFFh
}
uint16_t Pisano_Checksum16(
uint16_t *buffer, // points to the data to scan
uint32_t words // number of 16-bit WORDS to scan
) {
return (uint16_t)(Pisano_Checksum32(buffer, words));
}
Now, there is a lot of work to simplify the loop body while keeping the exact same external behaviour.
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.