As we have recently discussed, non-power-of-two PEAC is a generalisation of the original binary PEAC system, and the two look increasingly complementary. So let's call the new generalised algorithm gPEAC :-)
- PEAC is fast, with a very short critical datapath, and well suited to hardware implementation, but it works only at certain widths and its mixing is quite weak. It is best for medium to large blocks, from tens to thousands of bytes.
- gPEAC has better mixing and a much wider set of working parameters. It uses one register fewer and the code looks simpler. However its critical datapath has not one but three operations: an addition, then a comparison and a subtraction that can run in parallel, then a final selection step (see the sketch below). It should work well for small buffers, such as hash tables for URI/URL parsers...
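To make the comparison concrete, here is a minimal C sketch of one gPEAC step as I currently picture it. The names, the reduction details and the 64-bit intermediate are mine, not a frozen reference, and the exact reduction discipline is part of the maths I still have to pin down:

```c
#include <stdint.h>

/* One gPEAC step, sketched from the description above. m is the
   modulus. The 64-bit intermediate keeps this sketch unambiguous;
   the whole point of keeping m below half the register range (see
   further down) is that a careful version stays in 32 bits.
   In hardware, the comparison and the subtraction are independent
   and can run in parallel; the selection is the third stage. */
static inline void gpeac_step(uint32_t *x, uint32_t *y,
                              uint32_t data, uint32_t m)
{
    uint64_t s = (uint64_t)*x + *y + data;  /* add                    */
    while (s >= m)       /* compare + subtract + select; at most two  */
        s -= m;          /* folds when x, y and data are all below m  */
    *y = *x;             /* Fibonacci shuffle: old x becomes new y    */
    *x = (uint32_t)s;    /* the reduced sum becomes the new x         */
}
```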
Mixing is not a critical criterion for a checksum/CRC, and PEAC's speed reflects this. Hash tables, however, have much more stringent requirements. I will not repeat Bob Jenkins' work here, but only mention that the hash of one key should be very different from the hash of another, very similar key. This implies an "avalanche" effect, created by operations that promote mixing and tumbling of the individual bits of the numbers.
Both PEAC and gPEAC must be initialised with a pair of values: one "arbitrary constant", and the size of the buffer to hash.
Then the buffer is processed, as usual.
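In code, the setup and scanning phases could look like the sketch below, reusing gpeac_step() from above. I assume 8-bit input words here, and the constant 0x2545 is only a placeholder, not a vetted value:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the init + scan phases, assuming 8-bit input words and
   reusing gpeac_step() from the sketch above. The seeding with an
   arbitrary constant and the buffer size follows the description;
   0x2545 is a placeholder constant, to be selected later. */
void gpeac_scan(uint32_t *x, uint32_t *y,
                const uint8_t *buf, size_t len, uint32_t m)
{
    *x = 0x2545u % m;         /* the "arbitrary constant"       */
    *y = (uint32_t)len % m;   /* the size of the buffer to hash */
    for (size_t i = 0; i < len; i++)
        gpeac_step(x, y, buf[i], m);
}
```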
At the end, a hash needs extra mixing: this is borrowed from Bob's final() method (introduced in lookup3.c), where the state of the registers is further mixed and tumbled to increase avalanche and smooth the statistics for small variations of the input values. This compromise keeps the preceding scanning part simple and relegates the thorough avalanching to the last moment. This stage could consist of about 3 to 8 additional rounds of gPEAC, and the modulus could even change at each round.
The result of the gPEAC hash is given by X+(Y×modulus); the multiplication is necessary at this stage because the modulus is not a power of 2, so the combination cannot be a mere shift. The range is 2×modulus², because the folding/modulo is performed before the addition and one variable ends up with double the range. I'll get the maths straight soon.
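Here is how I picture the final phase in code, again as a sketch: 5 extra rounds (within the 3 to 8 range above), a constant modulus for simplicity even though it could change at each round, and the X+(Y×modulus) combination:

```c
#include <stdint.h>

/* Sketch of the final mixing and result extraction, reusing
   gpeac_step() from the first sketch. The round count (5, within
   the 3-8 range mentioned above) and the constant modulus are my
   simplifications; the modulus could change at every round. */
uint64_t gpeac_final(uint32_t x, uint32_t y, uint32_t m)
{
    for (int r = 0; r < 5; r++)
        gpeac_step(&x, &y, 0, m);   /* extra data-less rounds */
    /* Combine the two registers. The multiply cannot be replaced by
       a shift because m is not a power of 2. The range estimate of
       about 2 * m * m is still provisional, as noted above. */
    return (uint64_t)x + (uint64_t)y * m;
}
```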
The choice of the modulus depends on:
- the range of hash values that are required (which also depends on the collision probability, see the birthday paradox)
- the size of the host's registers
- the size of the hashed words (8-bit bytes? 16 bits?)
At this early stage, we can only estimate the range and properties of suitable moduli and infer where they apply.
- The modulus must be less than one half of the range of the host registers: below 2³¹ for 32-bit systems, and only 32767 for 16-bit computers. Otherwise the Fibonacci accumulation overflows; with the modulus below half the register range, the sum of two reduced values stays within the register's capacity. This is in fact a nice feature of gPEAC, because it removes the need for an explicit carry value, which simplifies implementation on the widest possible range of computers and languages.
- The square of the modulus must be as close to a power of two as possible, because each individual bit, or group of bits, of the result should be set with a probability as close to 50% as possible. Hash values are often truncated and should not exhibit a noticeable bias.
- The modulus should be just a bit larger than the size of the hashed words, to increase avalanche during the scanning itself and to reduce the number of rounds of final mixing. So if 16-bit words are mixed, the ideal modulus is about √2×2¹⁶ ≈ 92681, though this wastes much of the range provided by a 32-bit register.
- However, with a thorough final mixing, the scanning part could have a larger modulus to increase the entropy (before it is truncated).
- Knowing φ≈1.6, we can estimate how many rounds of final mixing are required to produce at least one wrap-around in the worst case. The rule of thumb is to add an extra 50% to the log2 of the expected value. Example: reaching approx. 16M from the value 1 takes 24 multiplications by 2, but the value 14930352 (almost 16M) is reached only after 24×1.5=36 multiplications by φ (checked by the little program after this list). This means that quite a lot of rounds are required for one isolated LSB to cascade; however, the registers are initialised with a set of values that should promote avalanche in fewer steps. (tbd)
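The 36-step figure in the last bullet is easy to check with a few lines of C. This little program is only a sanity check of the rule of thumb, not part of the algorithm:

```c
#include <stdio.h>

/* Sanity check of the 1.5x rule of thumb: grow a Fibonacci pair
   (each step multiplies the value by about phi) and count the steps
   needed to reach 14930352, which is almost 16M. */
int main(void)
{
    unsigned long x = 0, y = 1;   /* starts the sequence from 1 */
    int steps = 0;
    while (x < 14930352UL) {
        unsigned long t = x + y;  /* one multiplication by ~phi */
        y = x;
        x = t;
        steps++;
    }
    /* prints: 36 phi-steps vs 24 doublings -> ratio 1.50 */
    printf("%d phi-steps vs 24 doublings -> ratio %.2f\n",
           steps, steps / 24.0);
    return 0;
}
```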
The avalanche could also be increased by XORing with a dense bit pattern (to increase carry propagation during the additions) and by rotations, as sketched below.
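For example, a hypothetical scrambling helper could look like this; the dense constant and the rotation amount are made up for illustration and would have to be selected carefully:

```c
#include <stdint.h>

/* Hypothetical extra scrambling between final rounds: the XOR with a
   dense constant feeds more carry propagation into the next addition,
   and the rotation moves the well-mixed low bits upward. Both the
   constant and the rotation count are placeholders, not vetted values. */
static inline uint32_t scramble(uint32_t v)
{
    v ^= 0x5A5A5A5Au;                   /* dense pattern: many 1 bits  */
    return (v << 7) | (v >> (32 - 7));  /* rotate left by 7            */
}
```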
The goal is not to compete with existing high-performance hashes, but to find a sweet spot for small, embedded systems with a few registers holding 16 or 32 bits. The applications would be small lookups, such as variable names in interpreters or URIs in embedded servers, not large databases. The existing, well-known "simple hashes" have a lot of flaws and drawbacks...
Meanwhile, my computer is still crunching numbers in search of interesting moduli, a very slow process when we don't know exactly what to look for. The scan has reached 128829 for now, almost a power of 2. Finding much larger moduli is computationally hard with a naive method, just as for the binary PEAC, so I don't know yet how to reach better, higher values.
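For the curious, the basic measurement behind such a search is conceptually simple. Assuming the criterion resembles binary PEAC's (a maximal-length orbit of the data-less iteration), the inner check amounts to measuring what number theorists call the Pisano period of each candidate, as sketched below; the hard part, and the open question, is knowing which periods and extra properties make a modulus "interesting":

```c
#include <stdio.h>
#include <stdint.h>

/* Brute-force orbit measurement for a candidate modulus m: follow the
   data-less iteration (x, y) -> ((x + y) mod m, x) from the seed (1, 0)
   and count the steps until the seed reappears. The map is invertible
   (its inverse is (x, y) -> (y, (x - y) mod m)), so every state lies on
   a cycle and the loop always terminates. The count is the Pisano
   period of m; which periods make a modulus "interesting" is exactly
   what the ongoing search has to establish. */
uint64_t orbit_length(uint32_t m)
{
    uint32_t x = 1, y = 0;
    uint64_t steps = 0;
    do {
        uint32_t s = x + y;
        if (s >= m) s -= m;        /* single conditional fold suffices */
        y = x;
        x = s;
        steps++;
    } while (!(x == 1 && y == 0)); /* back at the seed: cycle closed */
    return steps;
}

int main(void)
{
    for (uint32_t m = 3; m <= 1000; m++)   /* small demo range */
        printf("m=%u orbit=%llu\n",
               m, (unsigned long long)orbit_length(m));
    return 0;
}
```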