First log, second stage.

I just started the project and I'm already questioning a few early design choices.

The first stage mixes D with F & G to provide the two addresses, and R is a direct result of the table lookup. There is not much to be done there, and the input entropy is meant to be already "minimal".

The second stage does the dual update (regardless of A and B being identical). That's what I feel needs an enhancement.

First, the source of the twiddling is F and G, which is a dubious choice because they are also used for mixing in the first stage. I would have preferred another source but it's the best I have so far, and using E and H would leak information. I don't want a counter either, because it could somehow leave a linear trace at the output, which is one weakness of RC4.
Then the compaction function is just an XOR of 2×5 bits. It's not bad but not the best, and a different reduction function would be required to do better. I have considered PopCount but it has a pretty strong bias toward 16, which would make it more predictable.
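To put a number on that bias, here is a minimal Python sketch. It assumes the "2×5 bits" are two independent, uniformly distributed 5-bit fields XORed together, and that PopCount would be applied to a uniformly random 32-bit word. The function names are mine, not from the project.

```python
from itertools import product
from math import comb


def popcount_pmf(width=32):
    """Exact distribution of the number of set bits in a uniform word."""
    total = 1 << width
    return [comb(width, k) / total for k in range(width + 1)]


def xor_fold_pmf(bits=5):
    """Exact distribution of a ^ b for two uniform, independent fields."""
    counts = [0] * (1 << bits)
    for a, b in product(range(1 << bits), repeat=2):
        counts[a ^ b] += 1
    total = 1 << (2 * bits)
    return [c / total for c in counts]


if __name__ == "__main__":
    pc = popcount_pmf()
    xf = xor_fold_pmf()
    print("PopCount: P(16) =", round(pc[16], 4))
    print("PopCount: P(13..19) =", round(sum(pc[13:20]), 4))
    print("XOR fold: min/max probability =", min(xf), max(xf))
```

Under these assumptions the XOR fold is perfectly flat over its 32 possible values, while PopCount follows a binomial distribution centred on 16 and hardly ever reaches the extremes.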
And the barrel shifter (which performs the rotation) is a simple operation but it takes quite some area (5×32 MUX2). It's not the fastest, but I don't yet know a better method to shuffle bits. The number of set bits MUST be preserved to prevent bias at the output, so multiplications or divisions (integer or Galois) are not possible either.
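As an illustration, here is a small Python model of such a shifter: five stages, each standing in for a row of 32 MUX2 that either passes the word through or rotates it by 1, 2, 4, 8 or 16 positions. The rotation direction (left) and the function names are assumptions of mine, and this is a software sketch rather than the actual HDL.

```python
MASK32 = 0xFFFFFFFF


def rotl32_barrel(x, amount):
    """Rotate a 32-bit word left, one power-of-two stage per amount bit."""
    x &= MASK32
    for stage in range(5):
        if (amount >> stage) & 1:  # select input of this row of MUX2
            s = 1 << stage
            x = ((x << s) | (x >> (32 - s))) & MASK32
    return x


def popcount(x):
    """Number of set bits in x."""
    return bin(x).count("1")


if __name__ == "__main__":
    word = 0x80000001
    # Every rotation amount keeps the same number of set bits.
    print(all(popcount(rotl32_barrel(word, n)) == popcount(word)
              for n in range(32)))
    # A multiplication generally does not: compare the two counts.
    print(popcount(3), popcount((3 * 5) & MASK32))
```

A rotation only permutes bit positions, so the count of set bits cannot change, whereas a multiplication creates or destroys set bits.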

I have thought of extending the width of the LUT, but 32 bits would only make the problem worse and the issue with the source of S=E^H would remain unsolved.

For now, it works "well enough" to complement the basic rand(), since the LUT adds a pretty strong non-linearity to the output so it masks short-term correlations and greatly extends the period.
