Close
0%
0%

PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Similar projects worth following
A purely additive, non-cryptographic, fast and minimalistic checksum/PRNG/scrambler.

The sweet spot is a 32-bit checksum with a pair of 16-bit registers,
- very small, using only ADD,
- without the flaws of Adler32 and Fletcher, yet the same size and complexity,
- as good as CRC32, which it may replace when the memory footprint is critical
- can't be 0-crashed
- very easy to implement in digital circuits as well as microcontrollers,
- suitable for the detection of trivial corruption in network packets or files.
- not patented
- "as is", inappropriate for cryptography (though could inspire someone)

Lately, gPEAC with m=741454 (39 bits of entropy) has been found, a bit slower but with better avalanche than the PEAC with w16, more suitable for hash tables.

French article in GNU/Linux Magazine France

If you wonder why, just click and read the log.

For the what, check Definitions and Draft as well as PEAC vs LFSR for a comparison.

The original algo/project name was "Fibonacci checksum" but it later appeared that it was not the most accurate choice because Leonardo Pisano (the real name of Fibonacci) is the name associated to the periodicity of the sequence under modulo. All I did was add the tiny detail of "end-around carry", or "1-complement addition", which changed everything.

 
-o-O-0-O-o-
 

As for how I came to this system...

In 2012, Piano Magic released their album "Life Has Not Finished with Me Yet". One song contains a weird repeating pattern...

Glen Johnson's lyrics are often cryptic and more evocative than objective, but any geek's mind would cling on this mantra at the end:

"Add X to Y and Y to X"

This is really weird but... Why ? What's the point in this obscure "song" with no precise theme or clear subject ? And what does it do ? This last question is the most easily answered : just follow the damned algorithm.

C'est parti...

X=1, Y=0
Y+=X => 0+1=1
X+=Y => 1+1=2
Y+=X => 1+2=3
X+=Y => 2+3=5
Y+=X => 3+5=8
X+=Y => 5+8=13
X+=Y => 8+13=21
Y+=X => 13+21=34
X+=Y => 21+34=55
...

No need to go further, most of you should have recognised http://oeis.org/A000045, the famous Fibonacci sequence.

This gave me a compelling idea to modify the old Fletcher & Adler algorithms, keeping their very low footprint and minimalistic computational complexity. Both of these well known algos use a pair of values and have a similar structure. The trick is that rearranging the data dependency graph provides the equivalent of a minimalistic polynomial checksum, because the result is fed back on itself, in a more robust way than Fletcher's algo.

At first glance, this new checksum loop's body becomes something like :

Y += ( X + datum[i  ] );
X += ( Y + datum[i+1] );
i+=2;

This loop body is totally trivial to unroll. As trivial is the design of the corresponding digital circuit. This early version seemed to contain the whole checksum entropy in the last computed value but now X and Y are part of the signature. And the really important enhancement is the use of the carry!

For superscalar 32 bits CPUs, the following code seems to work well though the missing carry hardware (and/or lack of language support) requires more instructions to emulate:

t = X + Y + C;    Y = X + data;
C = t >> 16;      X = t & 0xFFFF;

A more efficient variation exists which does not use a temporary variable:

C += X + Y;
Y = X + data;
X = C & MASK;
C = C >> WIDTH;

In this worst case, without support of a carry flag, that's 5 basic operations (not counting memory loads) that fit in 4 registers and 3 cycles, to process 2 bytes. Not too bad. I'll let you deal with alignment. But is it really safe or safer?

The following logs will show how the idea evolves and the performance increases, through discussions about carry wrap-around, register widths, scheduling, avalanche, parallelism, orbits, structure, and escaping black holes...

 
-o-O-0-O-o-
 

Logs:
1. The variable
2. Datapath
3. Adler32 weakness
4. Easy circuit
5. Periodicity
6. A promising 32-bit checksum
7. Progress
8. Why
9. Orbital mechanics
10. Orbits, trajectories and that damned carry bit
11. Two hairy orbits
12. How to deal with black holes
13. Anteriority
14. Moonlighting as a PRNG
15. A new name
16. More orbits !
17. First image
18. Structure and extrapolations
19. Even more orbits !
20. Start and stop
21. Some theory
22. Some more theory
23. Even more theory.
24. A little enhancement
25. sdrawkcab gnioG
26. Further theorising
27. What is it ?
28. A bigger enhancement
29. Going both ways
30. Can you add states ?
31. Please, GCC !
32. _|_||_||_|______|______|____|_|___|_________|||_|__
33. Carry on with GCC
34. A good compromise
35. The new non-masking algorithm
36. Add or XOR ?
37. A 32/64-bit version
38. Closing the trajectories
39....

Read more »

Checksum_PEAC64_asm.c

64-bit checksum, 7 instructions and 128 bits per iteration, in asm for x86-64

x-csrc - 2.73 kB - 11/04/2024 at 06:34

Download

PEACLS_20241013-2.tgz

PEACLS : scrambler and descrambler finished, detect errors

x-compressed-tar - 20.95 kB - 10/13/2024 at 20:11

Download

PEACLS_20241013.tgz

PEAC Line Scrambler, first VHDL version (the prefix check is still missing)

x-compressed-tar - 20.48 kB - 10/13/2024 at 06:08

Download

orbit_resume_64.c

sequential scanner adapted to 64-bit values. Can disable MSB masking.

x-csrc - 2.13 kB - 10/01/2024 at 05:22

Download

gPEAC_1M.tbz

Sieve generator, sieve, and scanner for the gPEAC orbits, "1 million" edition.

x-bzip-compressed-tar - 81.95 kB - 05/17/2023 at 14:51

Download

View all 70 files

  • PEAC modulo

    Yann Guidon / YGDES11/05/2024 at 13:43 0 comments

    The log 122. Funky modulo has some funny/funky/hackmemesque observations for lower-level implementations, but what about the higher level ?

    It only occured to me today, thinking about gPEAC, that the wraparound of the sum S goes like

    S' = (S / m) + (S % m)

    Not sure how it's going to help with analysing the large-scale behaviour though, it's above my pay grade.

  • Hashing with PEAC39/2

    Yann Guidon / YGDES11/04/2024 at 06:02 0 comments

    Look at the logs
    155. More candidates and 741454
    156. The good numbers
    163. Hash enhancements

    The scan has harvested a very interesting number:

    • m= 2^(39/2)-1 = 741454 = 2×7×211×251 = 10110101000001001110b is known perfect, using 20-bit registers.
    • orbit = 549.754.775.568 = 111111111111111111100000010100000010000 in binary, takes about 8 minutes to scan.
    • offset: o = 0111.11010101.11010101 = 7D5D5h = 513493 = 107×4799

    With these parameters, we get a very effective hash algorithm for text strings, 2 bytes at a time (among others). Aside from the PEAC structure itself and its Fibonacci/Pisano scrambling with the addition,

    • The modulus is conditional and increases the scrambling by seeding avalanche with 8 more bits,
    • The offset is unconditional and increases the avalanche in normal ASCII characters, while also triggering "End Around Carry" much more often than normally with empty input.

    TODO

    • Check common divisors of 741454 | 513493 : none !
    • Check the orbit length of PRNG with offset 513493

    For this, the "scanner" must be modified to add the offset, which creates all kinds of ... hmmm algorithmic questions. Fortunately there are two ways to design a checksum with PEAC : single-carry or dual-carry. Single-carry is less parallel but uses a bit fewer operations. It's easy to adapt with the binary PEAC but I realise it's the first time I do this operation on a gPEAC: it's my first gPEAC checksum.

    .

    .

  • PEAC w64 checksum : x86-64 edition

    Yann Guidon / YGDES10/18/2024 at 01:25 1 comment

    I did it !

    The fastest checksum (128 bits per loop) that is also solid (unlike Fletcher/Adler etc.)

    Here is the C source code:

    #include <stdint.h>
    
    #define PEAC64_INIT1 (0x7777777777777777ULL)
    #define PEAC64_INIT2 (0xDDDDDDDDDDDDDDDDULL)
    
    void w64_checksum_asm(
        uint64_t *buffer,
        uint64_t *r,
        uint64_t *s,
        uint64_t WordPairs) {
    
      uint64_t
        X=WordPairs,
        Y=PEAC64_INIT2,
        I=PEAC64_INIT1;
    
      asm volatile("\n\t"
    "  addq %[I], %[X]                \n\t" //  X += I;
    "  clc                            \n\t" //  C = 0;
    "                                 \n\t"
    "  test %[WordPairs],%[WordPairs] \n\t" // WordPairs=0?
    "  jz empty_buffer                \n\t"
    "                                 \n\t"
    "  .p2align 4                     \n\t"
    "innerloop:                       \n\t"
    "    adcq %[X] , %[Y]             \n\t" // Y += X
    "    adcq (%[buffer]), %[X]       \n\t" // X += *buffer
    "                                 \n\t"
    "    adcq %[Y], %[X]              \n\t" // X += Y
    "    adcq 8(%[buffer]), %[Y]      \n\t" // Y += buffer[1]
    "                                 \n\t"
    "    lea 16(%[buffer]), %[buffer] \n\t" // buffer+=2
    "    decq %[WordPairs]            \n\t" // WordPairs--
    "    jnz innerloop                \n\t" // while (WordPairs);
    "                                 \n\t"
    "empty_buffer:                    \n\t"
    "  adcq %[X], %[Y]                \n\t" // Y += X
    "  adcq %[I], %[X]                \n\t" // X += PEAC64_INIT1
    : // outputs
      [X] "+r" (X), [Y] "+r" (Y),
      [WordPairs] "+r" (WordPairs),
      [buffer]  "+r" (buffer)
    : // inputs
      [I] "r" (I)
    : // clobber
      "cc" );
    
      *s = X;
      *r = Y;
    } 

    And gcc -O3 -S compiles it down to:

    w64_checksum_asm:
      movabsq $-2459565876494606883, %rax
      movq    %rcx, %r9
      movabsq $8608480567731124087, %r8
      addq %r8, %rcx
      clc
      test %r9,%r9
      jz empty_buffer
      .p2align 4
    innerloop:
        adcq %rcx , %rax
        adcq (%rdi), %rcx
        adcq %rax, %rcx
        adcq 8(%rdi), %rax
        lea 16(%rdi), %rdi
        decq %r9
        jnz innerloop
    empty_buffer:
      adcq %rcx, %rax
      adcq %r8, %rcx
      movq    %rcx, (%rdx)
      movq    %rax, (%rsi)
      ret

    That's it !

    With this, you get a 128-bit signature of a memory block, which can be filled with read(2) for example. It is hard to do significantly better now, because the memory becomes the bottleneck.

    One last trick is the use of ADCX and ADOX opcodes, to implement a separate carry structure and break the 4-instruction carry chain into two halves. But it won't work because decq affects the overflow flag.

    Now I have to check the primary orbit of w64...

  • TODO: scans

    Yann Guidon / YGDES10/17/2024 at 16:48 0 comments

    Things have progressed again and a new branch of the PEAC family has appeared, tuned for SW implementations where the masking operation is simply skipped. This completely changes the mathematics of the system.

    Below are 2 examples with w26 and w31 which are "extended" (hence the "+") by not masking the sum after the addition.

    w26 is known to be "perfect" but w31 does not pass the sieve.

    However messing with the MSB(s) changes the rules and everything has to be reconsidered now. The difficulty is that the 32-bit words require immense amounts of computations and the others (based on 64-bit registers) are even worse.

    So a thorough computation exploration is required for

    • w26+
    • w31 (not holding my breath)
    • w31+
    • w63 (passes sieve)
    • w63+
    • w64

    At least a first estimate of computability of the primary orbit.

    Furthermore, another class of gPEAC needs to be evaluated: the 1.5× types, called "pseudo-binary", for use in Line Scramblers (see the previous logs such as 167. Line encoding with PEAC: it's alive) : the 2 MSB of the modulo are 1 and the rest is 0. These are powers of 2 multiplied by 3, so the subtraction only requires processing a few MSB and not the whole word. At least a few of these numbers should be in the list of the moduli I had scanned last year. I have found 3 only...

    a=3 ; for i in $(seq 40); do a=$(($a*2)); echo $a; done
    6 : perfect
    12 : 10P
    24 : maximal
    48 : maximal
    96 : maximal
    192 : 191M 186P
    384 : 380M, 378P, 374P
    768 : 763M, 761M, 754P, 750P, 746P (w8)
    1536 : 1528M, 1526P, 1516M, 1511M, 1504M, 1501M, 1498P
    3072 : 3060M, 3038P, 3036M, 3018P
    6144 : 6128M, 6118P, 6106P
    12288 : 12280M, 12254P
    24576 : 24563M, 24551M, 24549M, 24546P
    49152 : 49050, 49084, 49094, 49128
    98304  : 98300, 98298, 98285 98264 98254
    196608 : 196605M 196598P 196594P          (w16)
    393216 : 393208M 393195M 393186P 
    786432 : 786418P  786403M 786388M 786390P 786391M
    1572864
    3145728
    6291456
    12582912
    25165824
    50331648
    100663296
    201326592
    402653184
    805306368
    1610612736
    3221225472
    6442450944
    12884901888
    25769803776
    51539607552
    103079215104
    206158430208
    412316860416
    824633720832
    1649267441664
    3298534883328
    ...
    

    ...

    Stay tuned.

  • PEACLS error detection (and correction?)

    Yann Guidon / YGDES10/13/2024 at 17:49 2 comments

    I added error detection, it was in fact very simple and it seems to work very well.

    I started playing with it, for example adding one bit of noise on the channel (in bold, scroll the listing).

    Init=1 In=0000000000000000 Scr:10-0000000000000000 Sc2:10-0000000000000000  Out=0000000000000000 OK=U  
    Init=0 In=0000000000000000 Scr:00-0000000000000001 Sc2:00-0000001000000001  Out=0000001000000000 OK=1  
    Init=0 In=0000000000000000 Scr:00-0000000000000001 Sc2:00-0000000000000001  Out=0000000000000000 OK=1  
    Init=0 In=0000000000000000 Scr:00-0000000000000010 Sc2:00-0000000000000010  Out=1111111000000000 OK=0
    Init=0 In=0000000000000000 Scr:00-0000000000000011 Sc2:00-0000000000000011  Out=1111110111111111 OK=0
    Init=0 In=0000000000000000 Scr:00-0000000000000101 Sc2:00-0000000000000101  Out=1111110111111111 OK=0
    Init=0 In=0000000000000000 Scr:00-0000000000001000 Sc2:00-0000000000001000  Out=1111110111111111 OK=0  
    Init=0 In=0000000000000000 Scr:00-0000000000001101 Sc2:00-0000000000001101  Out=1111110111111111 OK=0
    Init=0 In=0000000000000000 Scr:00-0000000000010101 Sc2:00-0000000000010101  Out=1111110111111111 OK=0
    Init=0 In=0000000000000000 Scr:00-0000000000100010 Sc2:00-0000000000100010  Out=1111110111111111 OK=0

    The error is not detected immediately but the OK flag is consistently stuck to 0 afterwards.

    What this means is : the epilogue of a packet should have 2 padding words to ensure the last data word has been correctly transmitted.

    Another interesting aspect is the efficiency of the 2 prefix bits : even though they are not flipped during transmission, the reconstructed bits mismatch anyway ! So it does not matter so much that the prefix does not use the whole coding range, error detection occurs anyway and the carries help prevent all-0 words.

    Even more interesting is the bit pattern of the altered reconstructed word: after 3 words, the output contains the inverse of the alteration ! Is this an opportunity for error correction ?

    Further wiggling shows that it's a corner case, but finishing the packet with 2 blank words certainly detects multiple errors. Get the source code here.

  • Line encoding with PEAC: it's alive.

    Yann Guidon / YGDES10/13/2024 at 03:21 0 comments

    I just had a first successful test run in VHDL using w16. It's not finished but the result is almost as good as I expected.

    [yg@Rizen5 PEACLS]$ ./runme.sh 
      In=0000000000000000 X:010000000000000000  Out=0000000000000000  
      In=0000000000000000 X:011101110101101101  Out=0000000000000000  
      In=0000100000000000 X:010111010111011011  Out=0000100000000000  
      In=0000000000000000 X:000100101101001001  Out=0000000000000000  
      In=0000000000000000 X:011100000100100101  Out=0000000000000000  
      In=0000000000000000 X:000000110001101110  Out=0000000000000000  
      In=0000000000000000 X:001100110110010100  Out=0000000000000000  
      In=0000000000000000 X:011101101000000010  Out=0000000000000000  
      In=0000000000000000 X:011010011110010110  Out=0000000000000000  
      In=0000000000000000 X:011000000110011001  Out=0000000000000000  
      In=0000000000000000 X:000010100100110000  Out=0000000000000000  
      In=0000000000000000 X:001010101011001010  Out=0000000000000000  
      In=0000000000000000 X:011101001111111010  Out=0000000000000000  
      In=0000000000000000 X:010111111011000100  Out=0000000000000000  
      In=0000000000010000 X:000101001011001111  Out=0000000000010000  
      In=0000000000000000 X:011101000110000100  Out=0000000000000000  
      In=0000000000000000 X:000010010001010011  Out=0000000000000000  
      In=0000000000000000 X:011111010111011000  Out=0000000000000000  
      In=0000000000000000 X:010001101000101011  Out=0000000000000000  
      In=0000000000000000 X:000001000000000100  Out=0000000000000000  
      In=0000000000000000 X:000010101000110000  Out=0000000000000000  
      In=0000000000000000 X:000011101000110100  Out=0000000000000000  
      In=0000000000000000 X:000110010001100100  Out=0000000000000000  
      In=0000000000000000 X:011001111010011000  Out=0000000000000000  
      In=0000000000000000 X:000000001011111100  Out=0000000000000000  
      In=0000000000000000 X:001010000110010101  Out=0000000000000000  
      In=0000000000000000 X:011010010010010001  Out=0000000000000000  
      In=0000000000000000 X:000100011000100110  Out=0000000000000000  
      In=0000000000000000 X:011110101010111000  Out=0000000000000000  
      In=0000000000000000 X:010011000011011110  Out=0000000000000000  
      In=0000000000000000 X:000001101110010111  Out=0000000000000000  
      In=0000000000000000 X:000100110001110110  Out=0000000000000000  
    

    I send 2 individual bits and the data on the line (X) is a nice mess. The output is exactly as expected however!

    The "schematics" are almost straight out of 79. Taxonomy:

    Encoder:

    Decoder:

    There is a slight delay due to pipelining/buffering so the output log above is "compressed".

    I can play with the INIT_X and INIT_Y values, for now I use

    INIT_X <= "0110110111011011";
    INIT_Y <= "1101110101101101";
    

    One thing I notice though is that the D bit is very dependent on the input message. When there is no input entropy, the prefix/HA output is either 00 or 01, barely ever 10. C seems to have a good behaviour but I need a way to toggle D more. Or even remove D...

    .

    .

    .

    But otherwise, so far it's a success, almost like expected.

    .

    _____________________________________________

    .

    I updated the "circuit" because the D should subtracted from the message, instead of added.

    So this solves a "little problem" with the output that gets off-by-one in certain situations.

    Get PEACLS_20241013.tgz !

    The carries will give a lot of problems, for sure. The 2-bit prefix depends a LOT on the message's MSB, for example. But the XOR part is quite good.

  • Line encoding with PEAC : OK

    Yann Guidon / YGDES10/09/2024 at 23:23 0 comments

    The idea started more than a year ago: 161. PEAC for line encoding

    So PEAC has many advantages and I wanted to turn it into a "line coder" for differential twisted pair transmission, for example. It didn't work back then but I finally cracked it !

    As you can see, this is the same scrambler as in 79. Taxonomy and I added a pair of gates (AND+XOR) to form a half-adder. The result is concatenated in front of the first 2 bits of a scrambled word, and that's it.

    With only this circuit, you get:

    • enhanced parity,
    • spectrum spreading / scrambling
    • clock resynchronization,
    • DC suppression
    • strict run-length limitation (up to w+1 bits)

    Packet-start signaling is added by ORing a 11 pattern to the prefix, and replacing the payload with a "start" symbol. This forced prefix can't map to any valid transmitted data, and can be used for signaling other conditions as well.

    This system works for any PEAC width, be it 4, 10, 16, 26 bits, depending on the BER, the medium's bandwidth, the timing constraints and the maximum length of a sequence of identical bits (000000... or 111111....).

    I worked all the theory and it only begs to be implemented, maybe in a future #Tiny Tapeout project ?

    .

    .

    .

    dropping the mic.

  • w32 wasn't such a dud

    Yann Guidon / YGDES10/01/2024 at 05:24 0 comments

    Somewhere I write that the primary orbit of w32's length is 13.603.546.275 iterations.

    Well, I don't know what happened. I'm testing again with another program right now and this number is far exceeded:

    Step 3423.088.934.912 : X=889508404, Y=3377359528, C=1
    

    I'm also testing w32+ simultaneously: it uses 64-bit registers and the carry is simply the MSB shifted back to the LSB... it's pretty wild!

    I adapted the scanning program to handle 64-bit data. When working with w63, -DNOMASK saves 25% of the computation time and the result looks as good.

    20241002:

    Computing further (about 27h) gives these results :

    • w32 masked:
    Step 127242701111296 : X=486058137, Y=1452857267, C=1
    Step 127246996078592 : X=57168942, Y=2026275220, C=1
    Step 127251291045888 : X=2130391839, Y=3854448558, C=1
    Step 127255586013184 : X=3974460549, Y=1367440861, C=0
    Step 127259880980480 : X=4171287861, Y=349814740, C=0
    Step 127264175947776 : X=1548631083, Y=2991865890, C=1
    Step 127268470915072 : X=635132602, Y=3437845154, C=1
    Step 127272765882368 : X=2468530850, Y=365303063, C=0
    ^C
    99801.70user 7.42system 27:43:43elapsed 99%CPU
    • w32 nomask:
    Step 127783866990592 : X=1857706233519333428, Y=6363298141668173779, C=432530938
    Step 127788161957888 : X=10278906262506799708, Y=11191247221860114708, C=2393244361
    Step 127792456925184 : X=17161298025364128903, Y=3297960277882192750, C=3995676065
    Step 127796751892480 : X=10933138637106185040, Y=14836593303031674861, C=2545569706
    Step 127801046859776 : X=1746085887977577584, Y=4191515253591217683, C=406542301
    Step 127805341827072 : X=9757511789714895600, Y=5605426172831249234, C=2271847750
    Step 127809636794368 : X=18277385174601529875, Y=5102702571660683175, C=4255535354
    Step 127813931761664 : X=11054218837186404983, Y=17005938597805104479, C=2573760886
    Step 127818226728960 : X=8927644014085209424, Y=8319196262501969693, C=2078629102
    Step 127822521696256 : X=7085261420112952104, Y=6817447390030532634, C=1649665976
    Step 127826816663552 : X=6780497954159397175, Y=14411798433420106112, C=1578707702
    ^C
    100167.06user 5.81system 27:49:45elapsed 99%CPU
    • w63 masked (classic):
    Step 98187247353856 : X=7789805987906940537, Y=8997492253897967197, C=1
    Step 98191542321152 : X=6219391496852744986, Y=2932353114636017441, C=0
    Step 98195837288448 : X=3513660373162068975, Y=5583561552522910818, C=1
    Step 98200132255744 : X=5464065101111839226, Y=6559200458595146488, C=1
    Step 98204427223040 : X=1704903053962015866, Y=689819550266971419, C=0
    Step 98208722190336 : X=5773997793067942377, Y=1739895575921382638, C=0
    Step 98213017157632 : X=8265381472875106818, Y=112466157831344603, C=0
    Step 98217312124928 : X=8866559370984867717, Y=1274529812375955528, C=0
    Step 98221607092224 : X=2616865902061394463, Y=2296577278457820939, C=0
    Step 98225902059520 : X=4885176899669488212, Y=5742267305191855321, C=1
    Step 98230197026816 : X=5923363790209385457, Y=6818607223108001127, C=1
    ^C
    101740.10user 5.92system 28:15:59elapsed 99%CPU
    
    • w63 nomask:
    Step 129712307306496 : X=13898979293651408146, Y=10471030960484516614, C=1
    Step 129716602273792 : X=15651397243760147896, Y=6773005075823389117, C=1
    Step 129720897241088 : X=17954653194078123381, Y=16120148392752158639, C=1
    Step 129725192208384 : X=198529025183803037, Y=13573698087745720944, C=0
    Step 129729487175680 : X=11965111940566923107, Y=14889642454769736172, C=1
    Step 129733782142976 : X=6123098105022721077, Y=15600159587929872239, C=0
    Step 129738077110272 : X=9623131821115452626, Y=7793938344049970617, C=1
    Step 129742372077568 : X=5460162401730428220, Y=481501917958797404, C=0
    Step 129746667044864 : X=364843809283491869, Y=16020168061567238065, C=0
    Step 129750962012160 : X=18379832403448125132, Y=9352390193144058009, C=1
    Step 129755256979456 : X=3070579473553777869, Y=11817703077574477677, C=0
    Step 129759551946752 : X=4276236740962303982, Y=2211040804269366145, C=0
    Step 129763846914048 : X=15164859771112835864, Y=11754352737906713717, C=1
    ^C
    101738.11user 5.51system 28:15:56elapsed...
    Read more »

  • The dawn of PEAC63 ?

    Yann Guidon / YGDES09/28/2024 at 01:44 0 comments

    We already know that 2147483648 % 7559=1984 so w31 is out. However PEAC63 still passes the sieve so it could be a good candidate. The question is : how good ?

    PEAC63 is equivalent to 126 bits of signature, the MSB work as carry bits so we get 128 bits just after the additions. There is quite little mixing so hashing is out of question but it's still very good for rough checksumming of files on a 64-bit platform, using 32-bit data at every cycle. Furthermore, it's not very different at all from the existing code. Getting bit 63 maybe done by sign extension or simply a shift, or even testing the sign bit of an unsigned number.

    Another funky thing to test is whether clearing the MSB/sign bit after it's copied back to the LSB, would affect the orbit's size. NOT clearing the MSB could save one operation and at this stage, it's pretty critical. Though with modern CPUs with OOO, it could not matter so much. So it must be tested.

    But exhaustive scans are out of question. I should run a sequential scan on the primary orbit, for the sake of it, but 2^126 states is not realistic, not even parallelizeable because even an arc is larger than the 2^52 states of PEAC26. That's why the sieve was so critical...

    The code would go like this:

    #define WIDTH (63)
    #define MASK (0x7FFFFFFFFFFFFFFFULL)
    
    uint64_t C, X, Y;
    uint32_t data;
    
    C += X + Y;
    Y = X + (uint64_t)data;
    X = C & MASK;  <= can we remove this ?
    C = C >> WIDTH;

    I'm not sure of the numerical properties but that code looks damn fast, with few instructions and able to gobble 32 bits at once per cycle. Fingers crossed.

  • w26's second life

    Yann Guidon / YGDES09/25/2024 at 23:37 0 comments

    (Damnit, the last log was exactly one year ago :-O )

    .

    (Warning ! wrong assumption ! w32's primary orbit is way larger than 13G ! See w32 wasn't such a dud)

    .

    Ideally, w32 would be perfect or maximal. It's neither. So it's not possible to design a mathematically sound CRC/checksum with 32-bit wide registers. In fact, w32's primary orbit is much smaller than expected...

    So what's perfect ? w26.

    It's suitable to checksum 16-bit wide words only, so it's slower than w32, which is used in some database code. But compare w32's primary orbit containing only 13.603.546.275 iterations, vs w26's 4.503.599.694.479.358 ! So w32 could be "good enough" for some cases but it doesn't benefit from the whole mathematical potential of the PEAC system.

    I have not found any suitable candidate that is larger than 32. The sieve says the following widths should be tested: 51 52 54 55 59 60 and 63 (this last one is alluring right ?)

    However this is now a 64-bit number and it's less suitable for embedded computing (often limited to 32 bits). It would work on a PC though.

    The other solution is to take w16 and parallelize it, SIMD-style. Embedded CPUs seldom have this feature.

    But I have found a trick.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    So we're stuck with w26 and 32-bit registers. That's still a 64-bit checksum in the end. Now let's do the math.

    32-26=6 bits are unused in the register and do not meaningfully contribute to the result. They are affected by the data but the carry output is lost, never recovered nor re-injected so it's like a dead-end.

    26-16=10 bits are not affected by the data that is injected for checksumming. Normally the input data is added in the LSB for convenience.

    I guess you'll agree that 10 > 6 : This means that the data that "escapes" the register can be fully re-injected in the lower bits, above or along the input data. In fact, the PEAC checksums a part of itself, while still preserving the w26 properties. There are now 32+32=64 bits worth of entropy, not just 52: that's 4K more codes that are now unlocked.

    Of course, at this point, it's important to check the actual orbit's topology of this new generator. Is it still "perfect" ? This extended system still accepts 16 bits per cycle only so the advantage over PEAC16 is only when needing a larger signature. But using the extended w26 provides twice as many bits and works for 32-bit platforms.

    ...

    Now, it's time to code.

View all 173 project logs

Enjoy this project?

Share

Discussions

Yann Guidon / YGDES wrote 09/21/2022 at 17:14 point

https://www.youtube.com/watch?v=EWgts6EiJA0 has some relevant ideas here and there :-)

  Are you sure? yes | no

Yann Guidon / YGDES wrote 07/28/2022 at 16:03 point

TODO : check the distribution and periodicity of each rank of bit to show they don't exhibit the flaws of https://en.wikipedia.org/wiki/Linear_congruential_generator#m_a_power_of_2,_c_=_0

  Are you sure? yes | no

Yann Guidon / YGDES wrote 04/08/2022 at 11:41 point

https://en.wikipedia.org/wiki/Cyclic_code

It seems that PEAC is a cyclic code (in a loose way because it's not using Galois fields but still interesting)

  Are you sure? yes | no

Ezra Wolf wrote 01/21/2022 at 23:18 point

Wow !

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/26/2022 at 05:27 point

Hi Ezra, feel free to elaborate :-)

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/17/2022 at 08:49 point

Correction to some of my statements here and there (and my videos) : φ is not transcendental but an irrational number that is a solution to the quadratic equation x² − x − 1 = 0. Mea culpa, but at least I learned something from one of the Bogdanoff brothers' countless errors and confusions :-D

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/14/2022 at 16:55 point

Oh my !

https://en.wikichip.org/wiki/x86/adx

Now, THAT's a hack :-)

And I can see how it could increase the speed of some algorithms, even this one...

  Are you sure? yes | no

Tim wrote 01/14/2022 at 19:44 point

On ever other architecture, yes. On x86 is does not add notably to the entropy :)

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/15/2022 at 12:04 point

  Are you sure? yes | no

Yann Guidon / YGDES wrote 10/23/2021 at 21:33 point

Interesting !

https://research.nccgroup.com/2021/10/15/cracking-random-number-generators-using-machine-learning-part-1-xorshift128/

https://research.nccgroup.com/2021/10/18/cracking-random-number-generators-using-machine-learning-part-2-mersenne-twister/

  Are you sure? yes | no

[deleted]

[this comment has been deleted]

Yann Guidon / YGDES wrote 10/13/2021 at 14:39 point

so far, half a year and 50£.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/14/2021 at 04:16 point

George Marsaglia and Arif Zaman got quite a lot figured out in 1991:
https://projecteuclid.org/download/pdf_1/euclid.aoap/1177005878

Why didn't this catch up, but they focus on extremely large sequences with tens of registers of lag, and didn't look enough at just a pair of registers. They also didn't look at the link with checksums.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/08/2021 at 07:28 point

Funny !

https://en.wikipedia.org/wiki/Lagged_Fibonacci_generator

Toward a universal random number generator, G.Marsaglia, A.Zaman

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 23:24 point

https://www.researchgate.net/profile/Michael_Greenwald/publication/3334567_Performance_of_checksums_and_CRCs_over_real_data/links/53f4b15f0cf2888a749113c5/Performance-of-checksums-and-CRCs-over-real-data.pdf

"In practice, TCP trailer sums outperform even Fletcher header sums."

Ouch.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/08/2021 at 00:51 point

"Ones complement (with EAC) Fletcher, however, has a weakness that sometimes offsets its probabilistic advantage: since bytes containing either 255 or 0 are considered identical by the checksum, certain common pathological cases cause total failure of Fletcher-255."

Re-ouch.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 23:09 point

http://users-tima.imag.fr/cdsi/guyot/Cours/Oparithm/english/Modulo.htm and others say that special adders exist to compute mod 2^n-1.

But this is not necessary because the carry is explicitly integrated in the next cycle.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 22:20 point

https://www.cs.auckland.ac.nz/compsci314s1t/resources/Checksums.pdf  is a nice beginner-level overview of error detection basics. Polynomial checks are mentioned, as well as 1s complement / end-around-carry and Fletcher is well explained but no sight of Fibonacci.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 05/24/2021 at 03:52 point

Oh, nice !

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates