miniMAC - Not an Ethernet Transceiver

custom(izable) circuit for sending some megabytes over differential pairs.

This is not Ethernet, though (initially) quite inspired by Fast Ethernet (100Base-TX) and using the same medium (RJ45/CAT5 or better) and magnetics. Or just plain matched impedance diffpair, I will not judge.

It should provide mostly equivalent performance but does not require all the IEEE 802.3xyz hoop-jumping: it could be implemented in a cheap FPGA (Ice45 ? A3P250?), TinyTapeout or even a Pi pico or something.

As it progresses, it looks more like a poor man's Fibre Channel over Cat5 but with strong error detection and low latency.

Applications are wherever you need to place devices remotely, dozens of meters away, with standard cabling (or circuit traces): sensors, sound, pictures... at a few megabytes per second (when RS485 won't cut it and you require electrical isolation).

The project evolves towards a MAC+PHY+AFE triad, no autoMDI, with minimal analog magic and some digital trickery. The actual trilevel line coding is not yet definitive, so the nibble size will likely change.

20251113: Architecture update

20250525: spinning #miniPHY off.

.

miniMAC is an extension and application of the #PEAC (Pisano with End-Around Carry) algorithm, because PEAC replaces the checksum, the scrambler and the 4b/5b table usually employed by 100Base-TX. See:

166. Line encoding with PEAC : OK
167. Line encoding with PEAC: it's alive
168. PEACLS error detection (and correction?)
169. TODO: scan

Application is for embedded/custom data transfers over RJ45/Cat5 UTP/STP where the whole TCP/IP stack is not required and a simple FPGA or microcontroller is more than enough.

.

A PAM3-based + gPEAC18 version is currently in development. The GrayPar layer is replaced by an NRZI+Hammer convolutional error amplifier that detects more errors, faster, which eases the descrambler's work and reduces the FIFOs' depths. There is no error correction, but detection is solid and fast, allowing almost immediate retransmission of the most recent data and thus better actual bandwidth in difficult environments (industrial, automotive...).

 
-o-O-0-O-o-
 

Logs:
1. Let's start.
2. Tinkering with CircuitJS
3. AGC
4. Serial vs Parallel
5. Sender-side droop/wander prevention with MLT-3
6. Double parity
7. Flipping
8. a Quasi-Popcount
9. gPEAC again
10. Popcount
11. the bi-flipper topology
12. Ternary encoding
13. The whole flip+parity extension stage
14. Should 4 be flipped...
15. Popcount (better)
16. Run Length Limitation, reduced
17. Making it work
18. Protect the flip bits
19. Error detection
20. Modulation, simplified for now - NRZi
21. Architecture
22. Ternarity and more
23. SU(3)
24. mod3
25. mod3bis
26. Bidir PEAC+ParPop
27. Two lanes
28. Bidir ParPop : OK
29. Protocol
30. DPLL-1
31. I need a name.
32. Reversible PEAC scrambler
33. TMDS
34. PEAC treillis
35. PEAC Reversibility achieved
36. Bidirectional pipeline
37. Dual-lane version: easy
38. DPLL-2
39. Line compensations
40. Maximum avalanche time
41. Transition minimisation
42. The "same" symbol
43. PAM3 and the bi-Trits
44. The ParRot
45. Constellation 2
46. The spreader
47. Gray parity
48. One more bit...
49. Larger words
50. The new parity circuit
51. Permutations
52. Permutations 2
53. The last parity
54. Control Word Sequence
55. Rebuild
56. Detection latency and buffer depth
57. Burst errors
58. Protocol revision
59. MAC & PHYs
60. The error model of PEAC scramblers
61. Shared PEAC
62. Sub-protocol: QSDE
63. miniPHY
64. New pipeline
65. GrayPar17
66. ADD3-EAC
67. Move the NOTs
68. Fewer burst errors
69. spurious errors
70. Stats with GrayPar17 and PEAC16x2
71. Not XOR, not ADD, then what ?
72. Add, Sub and errors
73. Multi-bit errors
74. Galois fields ?
75. Better markers
76. Better stats
77. buffer prefix
78. gPEAC18
79. A matter of channels...
80. Architecture (summer edition)
81. Unit swap
82. Unit swap (2)
83. Extended control words
84. GrayPar18
85. Article
86. A variable-strength adaptive parity
87. New pipeline
88. New pipeline (2)
89. GrayPar18: 5+5+5+3
90. ParGray: it's reversible!
91. New modulo.
92. Nested Gray Parity Loops
93. New new modulo.
94. Error detection: how it fits in the protocol
95. Meanwhile: Koopman
96. Modular adder
97. Coding space
98. Architecture update
99. Parallel bus
100. C/D in the middle
101. gPEAC: the circuit
102. gPEAC circuit correction
103. New new new modulo.
104. Orbit length invariance
105. gPEAC: the (other) circuit
106. gPEAC18 descrambler (la suite)
107. Bus Inversion
108. Error correction
109. Scrambler (clearer)
110. Scrambler and descrambler: almost ready
111. Scrambler and descrambler: VHDL
112. gPEAC stress test
113. More gPEAC18 results
114. Even more gPEAC18 results
115. PEAC flaw
116. NRZ FTW
117. Checking more assumptions
118. Max Hamming
119. A reversible mixing primitive
120. Hamming distance maximiser
121. MaxHam gen2
122. Brute-forcing the permutations
123. Brute-forcing in C
124. Proof, pudding.
125. Lemon and lemonade
126. Hammer = Hamming Maximiser
127. Looping the Hammer
128. Success
129. No more gPEAC ?

For the older obsolete "NRZi" version, as of 20250325 we had:...

Files:

miniMAC_2026_20251230.tbz: gPEAC + Hammer18 working together (x-bzip-compressed-tar, 193.25 kB, 12/30/2025 at 17:08)
Hammer18.tbz: implements the error maximiser (x-bzip-compressed-tar, 69.42 kB, 12/29/2025 at 00:32)
PermParam_20251227.tgz: better brute force, better program, better results (x-compressed-tar, 7.54 kB, 12/27/2025 at 05:20)
PermParam_20251226.tgz: brute-force search for permutations for the Hamming maximiser (x-compressed-tar, 3.71 kB, 12/26/2025 at 20:08)
gPEAC18_VHDL.20251220.tbz: NRZI stage added (x-bzip-compressed-tar, 156.35 kB, 12/20/2025 at 03:47)

(5 of 31 files listed)

  • No more gPEAC ?

    Yann Guidon / YGDES, 01/04/2026 at 11:48

    The log 181. PEAC w18 is a mixed bag: there are good things but, overall, the weaker aspects dominate.

    Given the great performance bump introduced by the Hammer circuit, I wonder why I still keep the gPEAC layer. There are two reasons: it's the best scrambler, and though the very long periods are great, more importantly it can't be "crashed" (which is a flaw of LFSRs).

    At a higher level, the system is stronger because it associates two circuits of different nature.

    But what if?

    .

    Removing gPEAC removes the scrambler. Is it required ? Even though the miniPHY handles baseline wander (somehow, at least that's the expectation), and even if it uses a sort of convolutional error detection system, the spectrum still needs to be spread. Scrambling also helps a bit to increase error detection.

    LFSRs don't work well: they suffer from easy cancellation. Using the Hammer on the send side would be much better (and it's very tempting) but cancellation remains, even though a wider Hammer could provide hidden states. So it wouldn't work, and it probably wouldn't improve error detection, which is already maximised.

  • Success

    Yann Guidon / YGDES, 12/29/2025 at 17:03

    The Hammer18 circuit fits well inside the NRZI unit and instantly delivers fantastic results. Just as expected. That will be my Christmas then!

    Here are the results after 10 million injected errors (number of words before detection : count):

     1 : 2241925 - ****************************************************
     2 : 5543183 - ********************************************************************************************************************************
     3 : 1691752 - ****************************************
     4 :  369784 - *********
     5 :  112181 - ***
     6 :   32917 - *
     7 :    6360 - *
     8 :    1401 - *
     9 :     377 - *
    10 :      84 - *
    11 :      21 - *
    12 :      12 - *
    13 :       2 - *
    14 :       0 -
    15 :       0 -
    16 :       0 -
    17 :       1 - *

    The little 1 at the end is an initialisation bug in the program.

    Otherwise, the 4x slope is very apparent: the system has achieved true 2-bit-per-word performance!

    There is a little "bump" at the start, 1/4 of the errors are caught immediately, but the next cycle catches 1/2! Then every number is divided by 4 as expected.

    • CD0:115 : 115 errors were not caught and passed as the first 0-filled word of a control sequence.
    • CD1:6443188 : 2/3 of the detected errors triggered the C/D bit while the rest of the word was not 0. That's 56027× the number of errors that slipped through as a 0-word.
    • Err:3556696 : the rest (1/3) was caught as number errors: either the number was out of range or the MSB was 1.

    I'm still unable to explain why the CD bit catches 2× more errors than the other methods, though I'm not sure it matters. However, we have a way to extrapolate the error handling capability.

    10 million injected errors (almost 24 bits) leave 2 errors at 13 words; 3 more words (4^3=2^6=64) will leave about one error in 640 million (close to 1 billion).

    Notes:

    • the error model that was tested here is just one bit. Results will vary a bit depending on the error model. More bits and at different positions will affect the curve a little, but not radically.
    • Adding another 0-word during C/D transitions will get us in the 5 billion ballpark for rejection. This is actually a requirement since the gPEAC has a one-word latency (hence the bump at the 2nd word) and an error could come at the last data word and go unnoticed, so a second 0-word acts as a checksum check.
    • Since the NRZI+Hamming circuit does a LOT of crazy avalanche, now comes the time to check if a more basic binary 18-bit PEAC could work too. I'm looking back at old logs, to find some already-calculated data, and there is
      • 19. Even more orbits ! : primary orbit of 18 : 172.662.654  (instead of 34.359.738.368 to pass, or 0.5%)
      • 44. Test jobs :     18: Total of all reachable arcs: 68719736689
      • 90. Post-processing : Width=18 Total= 34359868344 vs 34359869438 (missing 1094)

    In fact I now realise that I have very little clue about the topology of w18. I'm taking care of this at 181. PEAC w18.

    And I still need to fix this tiny little bug in the program, that leaves one uncaught error. I didn't notice it before because I always got many leftovers but that bug still appears with no NRZ or Hamming avalanche, even after thousands of cycles : my test code must have a problem somewhere.

    ....

    And it's a weird issue: something does not clear a register somewhere. It's taken care of by double-resetting the circuit (2 clocks seem to solve it) but what and where... ?

    But at least I can get clean outputs:

    100 errors:

     1 :     23 - ***********************
     2 :     61 - *************************************************************
     3 :     13 - *************
     4 :      0 -
     5 :      3 - ***

    1000 errors:

     1 :    229 - ********************************
     2 :    582 - ********************************************************************************
     3 :    154 - **********************
     4 :     23 - ****
     5 :      8 - **
     6 :      3 - *
     7 :      1 - *

    10K errors

     1 :   2236 - ********************************
     2 :   5625 - ********************************************************************************
     3 :   1635 - ************************
     4 :    351 - *****
     5 :    108 - **
     6 :     31 - *
     7...

  • Looping the Hammer

    Yann Guidon / YGDES, 12/29/2025 at 00:29

    I tried to feed the circuit from itself and see if loops appear, and how long they would be. I start with one bit set:

    • Start= 0 or 11  => cycle in 1777 cycles
    • 1 : 3556 cycles
    • 2 leads to 5 or 16 : 5334
    • 3 : not part of an orbit, leads to a 10668-loop
    • 4 : leads to 1
    • 6 or 8 : 10667
    • 7 : 2666
    • 9 : not part of a cycle, leads to a 889-loop
    • 10 leads to 6/8
    • 12 : not part of a cycle, leads to 10668-loop
    • 13 : leads to 5 or 16 : 5334
    • 14 : leads to 5 or 16 : 5334
    • 15 : loop in 2667
    • 17 : not part of a cycle, leads to a 762-loop
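
    A scan like this one (a lead-in of some length, then a loop) can be sketched with Brent's cycle detection. This is only an illustration under assumptions: the Hammer18 function itself is not reproduced here, so `demo_full_period` and `demo_with_tail` are hypothetical stand-ins, and `find_loop` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_MASK 0x3FFFFu   /* 18-bit words */

typedef uint32_t (*step_fn)(uint32_t);

/* Hypothetical stand-ins, NOT the Hammer18 circuit:
   a full-period LCG over 18 bits, and a shift that falls into 0. */
static uint32_t demo_full_period(uint32_t x) { return (x * 5u + 1u) & WORD_MASK; }
static uint32_t demo_with_tail(uint32_t x)   { return x >> 1; }

/* Brent's algorithm: *lambda receives the loop length,
   *mu the number of steps taken before entering the loop. */
static void find_loop(step_fn f, uint32_t start, uint32_t *mu, uint32_t *lambda) {
    assert(f && mu && lambda);
    uint32_t power = 1, lam = 1;
    uint32_t tortoise = start, hare = f(start);
    while (tortoise != hare) {
        if (power == lam) {          /* start a new power-of-two window */
            tortoise = hare;
            power *= 2;
            lam = 0;
        }
        hare = f(hare);
        lam++;
    }
    /* Second pass: two pointers kept lam steps apart meet at the loop entry. */
    tortoise = hare = start;
    for (uint32_t i = 0; i < lam; i++)
        hare = f(hare);
    uint32_t steps = 0;
    while (tortoise != hare) {
        tortoise = f(tortoise);
        hare = f(hare);
        steps++;
    }
    *mu = steps;
    *lambda = lam;
}
```

    A start value that is "not part of an orbit" shows up as a non-zero `mu`; a start value on the loop gives `mu == 0`.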

    Actually the lengths of the loops do not matter a lot (unless they are ridiculously short) since this would assume a stream of data=0 which can't happen due to gPEAC.

    The fact that the values change so drastically is a big improvement over the previous simple NRZ scheme, since this totally locks the error, while the NRZ could have its effect cancelled as soon as the next cycle if two bits are flipped at the same location on consecutive cycles.
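
    The cancellation can be illustrated with a small model. It rests on assumptions that are not confirmed by this log: the receiver is modelled as an XOR accumulator, and the mixer `mix` is a hypothetical stand-in (one Gray/inversion layer as in `Encode_layer` from log 123, followed by a 5-bit rotation), not the actual Hammer18.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_MASK 0x3FFFFu   /* 18-bit words */

/* Assumed plain scheme: the state is the XOR of every received word. */
static uint32_t plain_step(uint32_t state, uint32_t rx) {
    return (state ^ rx) & WORD_MASK;
}

/* Hypothetical mixer standing in for the Hammer. */
static uint32_t mix(uint32_t x) {
    x ^= ((x & 0x02010u) * 30u) ^ ((x & 0x03C1Eu) >> 1);
    return ((x << 5) | (x >> 13)) & WORD_MASK;   /* rotate left by 5 on 18 bits */
}

/* Assumed looped scheme: the mixer sits inside the feedback loop. */
static uint32_t mixed_step(uint32_t state, uint32_t rx) {
    return mix((state ^ rx) & WORD_MASK);
}

/* Difference between the clean and the corrupted receiver state after the
   same bit was flipped in two consecutive received words. */
static uint32_t residual_after_double_flip(int use_mixer, unsigned bit) {
    assert(bit < 18);
    const uint32_t rx[2] = { 0x12345u, 0x2ABCDu };   /* arbitrary payload */
    const uint32_t e = 1u << bit;
    uint32_t clean = 0, dirty = 0;
    for (int t = 0; t < 2; t++) {
        if (use_mixer) {
            clean = mixed_step(clean, rx[t]);
            dirty = mixed_step(dirty, rx[t] ^ e);
        } else {
            clean = plain_step(clean, rx[t]);
            dirty = plain_step(dirty, rx[t] ^ e);
        }
    }
    return clean ^ dirty;   /* 0 means the error has vanished */
}
```

    In this model the plain accumulator returns 0 for every bit position (the second flip undoes the first), while the looped version keeps a non-zero difference.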

    Since the expected buffer size (16 or 32 words max) is way shorter than the observed loop length, there is no need to optimise further, as it could only impede (a bit) directed attacks, not improve error detection in common cases.

    And I expect a big jump in error detection efficiency: this additional convolutional layer adds one word of latency but is the key to achieving true 2-bits-per-word error detection. 15 words will lead to 1 chance in a billion of leaking an error, and 32 words (64 bits) will make it virtually impossible to pass in real-life scenarios.
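
    The arithmetic behind these figures, as a tiny sketch. It assumes that every additional checked word independently lets through 1 error out of 4 (2 bits per word); `leak_odds` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns N such that an error has about 1 chance in N of surviving
   `words` checked words, assuming a 4x reduction per word. */
static uint64_t leak_odds(unsigned words) {
    assert(words < 32);   /* 4^32 = 2^64 does not fit in a uint64_t */
    return UINT64_C(1) << (2 * words);
}
```

    `leak_odds(15)` gives 1073741824 (2^30, about one billion), and 32 words correspond to 2^64, which no longer fits in a 64-bit counter.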

    This also means that a 32-word buffer is all that's needed. In high/medium error rates, there is no need to transmit "empty commands" anymore, saving 2 or 4 intermediate checksums, or about 1/16th of the bandwidth! So this new unit is very important for overall efficiency. It couldn't be enough all by itself, though: its properties are complementary to those of the gPEAC layer. It's the pair that works together to reach the theoretical limit.

    That new unit also over-scrambles the transmitted data stream. This is not the intended function but it does it (somehow) anyway. So the assumed properties of the data stream must be re-evaluated, or at least discarded. This implies that the #miniPHY should expect absolutely random data, no special case... This removes one of the (initially supposed) advantages of gPEAC but it's for the overall best.

  • Hammer = Hamming Maximiser

    Yann Guidon / YGDES, 12/28/2025 at 04:52

    So we have a unit that takes 18 bits and outputs 18 bits, where each output bit depends on as many input bits as only 64 XOR gates permit. Let's call it the H unit, because it tries to maximise the Hamming distance of the input word.

    This has been designed already, and I picked the encoder of a previous log:

    As noted before, it has some of the best possible avalanches I could find: 7 7 8 8 8 8 8 9 9 10 11 11 12 14 14 14 14 15 16 is not perfect but as good as it can be, within the constraints of parsimony of the project.

    We also know some of its flaws, by reading the avalanches of the reverse transform : a few combinations need only a few bits to flip only one bit, as indicated by the start of the avalanche sequence : 2 2 3 4 4...

    So, since the unit would be looped on itself, another permutation is required to amplify this even more, making sure these few weak bits are fed back to the strongest ones.

    But first, let's untangle this mess because there are a looot of crossings that would make P&R uselessly harder.

    ...........

    The untangled circuit has had a weird episode.

    The score has changed ! 7 8 8 8 8 9 9 9 9 11 12 13 14 15 15 15 15 15

    The 16-bit avalanche has disappeared, as well as one of the 7-bits.
    However now we have five 15 and the sum of avalanches has climbed to 200 (vs 188 for the original version, max score is 18*18=324). I don't know how or why but that's good !
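
    For reference, an avalanche profile and its sum can be measured as in the sketch below. It is only an illustration: the untangled netlist is not reproduced here, so the `Encode_layer` function from the log "123. Brute-forcing in C" stands in for it, and the names `avalanche_profile` and `popcount18` are hypothetical. Since such a transform is a pure XOR network, flipping one input bit flips the same output bits whatever the rest of the word is, so the all-zero word is used as the base.

```c
#include <assert.h>
#include <stdint.h>

#define BITS 18

typedef uint32_t (*word_fn)(uint32_t);

/* Portable population count (GCC/Clang offer __builtin_popcount). */
static unsigned popcount18(uint32_t x) {
    unsigned n = 0;
    for (; x != 0; x &= x - 1)   /* clear the lowest set bit */
        n++;
    return n;
}

/* Stand-in transform: one Gray/inversion layer on two 9-bit tiles,
   same expression as Encode_layer in log 123. */
static uint32_t encode_layer(uint32_t x) {
    return x ^ ((x & 0x02010u) * 30u) ^ ((x & 0x03C1Eu) >> 1);
}

/* profile[i] = number of output bits that flip when input bit i flips.
   Returns the sum of the profile (maximum 18*18 = 324). */
static unsigned avalanche_profile(word_fn f, unsigned profile[BITS]) {
    assert(f && profile);
    const uint32_t base = f(0);
    unsigned sum = 0;
    for (unsigned i = 0; i < BITS; i++) {
        profile[i] = popcount18(f(1u << i) ^ base);
        sum += profile[i];
    }
    return sum;
}
```

    For this single layer the profile is 1 2 2 2 6 1 1 1 1 for each 9-bit tile, a sum of only 34 out of 324, which illustrates why several layers and permutations are stacked.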

    .....
    .....

    Version with even fewer crossings :

    I checked : same behaviours, despite changing the place of the end gates.

    ....

    And now, the VHDL, and...

     0 : 13   110111011111011001
     1 : 14   110111011111011011
     2 : 14   110111101111011011
     3 : 13   110111101110011011
     4 : 10   010111100100110101
     5 :  8   110001101001001010
     6 :  8   000000011111111000
     7 :  9   101110111001000100
     8 :  8   111111000010100000
     9 : 15   111111011111111001
    10 : 16   111110111111111101
    11 : 14   111110111111110001
    12 : 11   001101111111010001
    13 :  9   000111100010110101
    14 :  7   110111000000100010
    15 :  9   111111000000101100
    16 :  8   100001001111110000
    17 :  9   000000011111111010
      total:195

    What the !!!!

    7 8 8 8 8 9 9 9 9 10 11 13 13 14 14 14 15 16

    5 fewer in total and 16 has appeared again, yet there is still only one 7.

    I give up.

    .

    After all, the minimum requirements are met, and too much optimisation toward high numbers makes some combinations behave inappropriately.

  • Lemon and lemonade

    Yann Guidon / YGDES, 12/28/2025 at 03:39

    The last log has shown that the decoder has low performance. The encoder though is quite rad. I could swap them to get the desired result but that would still be insufficient. However it would be good to apply the encoder iteratively to get even better error-spreading.

    This means that the Hamming Maximisation circuit is not just in series with 116. NRZ FTW in the pipeline but integrated inside it. In fact the bit-flipping should be inside the feedback loop for greatest efficiency.

    And the beauty here is that the H transform (?) does not need to be a bijection, or provided in reverse form, so the unit is unique, implemented in one instance in the circuit, possibly shared by the send and receive circuits. It could increase in complexity later...

    However the study of the reverse transform informs us about some key characteristics. Those are not stellar but the iteration brings the required efficiency. It can still be increased by an output permutation, such that cyclic sequences are not too short.

    Another transformation to consider is swapping gates in the maximiser, to reduce wire crossings. I know this will be taken care of by the P&R program but there is no harm in helping it.

  • Proof, pudding.

    Yann Guidon / YGDES, 12/27/2025 at 09:04

    After some computations, I'm sticking to the balanced configuration with 7-14 avalanche for both decoder and encoder, and I have chosen the first one specified below:

    14  188 168  14  7   14  7 - 1965 7515 4021
    
    Perm1965 =
     forward(  3  5  9 17 16 10 15 12  1  2  0 14  6  7 13  8 11  4 )
     reverse( 10  8  9  0 17  1 12 13 15  2  5 16  7 14 11  6  4  3 )
    Perm7515 =
     forward( 17  2 11  0  6 16  8  9 10 14  1  7 13 15  5 12  4  3 )
     reverse(  3 10  1 17 16 14  4 11  6  7  8  2 15 12  9 13  5  0 )
    Perm4021 =
     forward(  4 17  6  5  1 15  7 14 16 13  0  9 10  8 12  2  3 11 )
     reverse( 10  4 15 16  0  3  2  6 13 11 12 17 14  9  7  5  8  1 )
    

    There are 4 variants but they are just exposing the symmetries of the tiles.
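
    The reverse tables are simply the inverses of the forward ones, which can be checked with a few lines. This is a sketch with hypothetical helper names (`invert_perm`, `permute_bits`); the bit-ordering convention (output bit i takes input bit perm[i]) is an assumption, the log does not state which way the tables are read.

```c
#include <assert.h>
#include <stdint.h>

#define BITS 18

/* Perm1965, copied from the listing above. */
static const unsigned char perm1965_fwd[BITS] =
    {  3,  5,  9, 17, 16, 10, 15, 12,  1,  2,  0, 14,  6,  7, 13,  8, 11,  4 };
static const unsigned char perm1965_rev[BITS] =
    { 10,  8,  9,  0, 17,  1, 12, 13, 15,  2,  5, 16,  7, 14, 11,  6,  4,  3 };

/* Builds the inverse table: inv[p[i]] = i. */
static void invert_perm(const unsigned char *p, unsigned char *inv) {
    for (unsigned i = 0; i < BITS; i++) {
        assert(p[i] < BITS);
        inv[p[i]] = (unsigned char)i;
    }
}

/* Assumed convention: output bit i takes the value of input bit perm[i]. */
static uint32_t permute_bits(uint32_t x, const unsigned char *perm) {
    uint32_t y = 0;
    for (unsigned i = 0; i < BITS; i++)
        y |= ((x >> perm[i]) & 1u) << i;
    return y;
}
```

    Applying the forward table and then the reverse table returns the original word, whichever of the two reading conventions is chosen.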

    With these sequences of numbers, it's now time to implement the circuit, starting from the last layer (4021).

    A first obvious feature is that the end of each cascade loops back directly to a quad inverter, so it looks good.

    CircuitJS.

    Level 2, the middle permutation layer, implements Perm7515.

    Things start to look great!

    And finally the outer permutation Perm1965. But the circuit has exceeded the size of links so I have split the decoder from the encoder.

    Encoder:

    Decoder:

    The manual checks have found that the advertised numbers don't work as expected, and the results look like the two manual versions I had created before. So a 5th layer and a 4th permutation would be required for full coverage but... Let's stick to the current config for now, because that's already a lot of gates.

    • Encoder: 7 7 8 8 8 8 8 9 9 10 11 11 12 14 14 14 14 15 16 is very good !
    • Decoder: 2 2 3 4 4 5 5 6 6 6 7 7 7 7 7 8 8 9 is lousy, worse than 121. MaxHam gen2 !

    The discrepancy has no apparent reason for now. But there is a simple workaround: swap the decoder and the encoder, so avalanche is better on the receiving side.

    Another "solution" would be to loop the circuits back for another round.

    3rd solution : use the highly efficient encoder in the loop of the NRZI converter, to form a combined unit.

  • Brute-forcing in C

    Yann Guidon / YGDES, 12/26/2025 at 07:33

    As I try to find the best permutations in C, the tiles need to be emulated as well, and I have come up with this C code, optimised for speed on recent OOO CPUs:

    static inline unsigned int Encode_layer(unsigned int X) {
      // inversions
      unsigned int Y = (X & 0x02010) * 30;
    
      // Gray encoding
      unsigned int G = (X & 0x03C1E) >> 1;
    
      return X ^ Y ^ G ;
    }
    
    static inline unsigned int Decode_layer(unsigned int X) {
      unsigned int
        Y =  X & 0x02010,
        A = (Y          ) >> 4,
        B = (X & 0x03018) >> 3;
      Y = Y*30;
        A ^= (X & 0x0381C) >> 2;
        B ^= (X & 0x03C1E) >> 1;
      return X ^ Y ^ A ^ B;
    }
    

    I process both 9-bit "tiles" at once, in the same word (SWAR approach).

    For each 9-bit tile, bits 5-8 are inverted by bit 4 (the multiplication by 30 copies bit 4 to the 4 positions above it), and bits 0-4 are a Gray cascade.
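
    A quick sanity check that the two functions are inverses of each other can be sketched as follows. The two layer functions are copied from the listing above; `roundtrip_ok` is a hypothetical helper added for the check.

```c
#include <assert.h>

static inline unsigned int Encode_layer(unsigned int X) {
  unsigned int Y = (X & 0x02010) * 30;   /* inversions */
  unsigned int G = (X & 0x03C1E) >> 1;   /* Gray encoding */
  return X ^ Y ^ G;
}

static inline unsigned int Decode_layer(unsigned int X) {
  unsigned int
    Y =  X & 0x02010,
    A = (Y          ) >> 4,
    B = (X & 0x03018) >> 3;
  Y = Y*30;
  A ^= (X & 0x0381C) >> 2;
  B ^= (X & 0x03C1E) >> 1;
  return X ^ Y ^ A ^ B;
}

/* Returns 1 when both compositions give back the 18-bit word x. */
static int roundtrip_ok(unsigned int x) {
  assert(x <= 0x3FFFF);
  return Decode_layer(Encode_layer(x)) == x
      && Encode_layer(Decode_layer(x)) == x;
}
```

    Looping `roundtrip_ok` over all 262144 possible words takes a negligible time and covers the whole input space of one layer.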

    Optimising the permutation was a different story and I opted to allocate 3MB of L3 cache...

    ..

    ..

    ..  (typingtypingtypingtypingtypingtypingtypingtyping)

    ..

    And I was dubious but I got some pretty impressive results. The top ones so far look like this, after the 3rd layer :

    min    Encoder      Decoder
    sum  sum max min  sum max min  permutation #s
     5    99  15  2   165  18  3    1663:1314   
     5    99  15  2   165  18  3    1672:1476   
     5    99  15  2   165  18  3    1825:1323  
     5    99  15  2   165  18  3    1834:1485  
     5   100  12  2   118  11  3    1087:1380  
     5   100  12  2   118  11  3    1096:1542  
     5   100  12  2   118  11  3    1249:1371  
     5   100  12  2   118  11  3    1258:1533  
     5   101  14  1   145  14  4     201:1353  
     5   101  14  1   145  14  4     210:1515  
     5   101  14  1   145  14  4      39:1362  
     5   101  14  1   145  14  4      48:1524  
     5   101  14  1   159  12  4     201:1362  
     5   101  14  1   159  12  4     210:1524  
     5   101  14  1   159  12  4      39:1353  
     5   101  14  1   159  12  4      48:1515  
     5   102  11  2   120  10  3     476: 108  
     5   102  11  2   120  10  3     485: 270  
     5   102  11  2   120  10  3     638: 117  
     5   102  11  2   120  10  3     647: 279  
    
    • All of these have a sum of minimum of the avalanche = 5 : either 2 and 3 or 1 and 4
    • Both encoder and decoder have a sum of avalanches of about 100 or more (99 to 102 for the encoder, up to 165 for the decoder)
    • The maximum of an individual avalanche is in the 10-18 range

    So far I have a good set of criteria to filter further brute-forcing for the 4th layer (3rd permutation). And I have optimised the heck out of it so it produces these results in under 1 second, so the further tests will take only like 20 minutes...

    And if I'm patient enough I can relax the thresholds to find more interesting and better behaving configurations.

    And it seems that 4 layers could be enough after all :-)

    .....................

    Adding the 3rd permutation, I get these results:

     Min      Encoder       Decoder
     Sum    Sum Max Min   Sum Max Min    Perm. numbers
      12    127  13  3    200  14   9    1663:1482:1458
      12    127  13  3    200  14   9    1672:1320:1467
      12    127  13  3    200  14   9    1825:1491:1296
      12    127  13  3    200  14   9    1834:1329:1305
      12    130  12  3    196  17   9     669:1154: 386
      12    130  12  3    196  17   9     678: 992: 395
      12    130  12  3    196  17   9     831:1163: 548
      12    130  12  3    196  17   9     840:1001: 557
      12     99  10  2    203  15  10     218:1241:1478
      12     99  10  2    203  15  10     227:1079:1487
      12     99  10  2    203  15  10      56:1232:1316
      12     99  10  2    203  15  10      65:1070:1325
    

    The decoder gets impressive avalanche but the encoder doesn't take off :-/ the minimum avalanche doesn't get above 3...

    Which is explained by a bug in the code ...

    .......................

    .......................

    And with the stupid bug found, here it goes:

     13   148 12  5   172 13  8   - 1646:1465:1157
     13   148 12  5   172 13  8   - 1655:1303:1166
     13   148 12  5   172 13  8   - 1808:1474: 995
     13   148 12  5   172 13  8   - 1817:1312:1004
     13   148 12  5   184 13  8   - 1646:1465:1166
     13   148 12  5   184 13  8   - 1655:1303:1157
     13   148 12  5   184 13  8   - 1808:1474:1004
     13   148 12  5   184 13  8   - 1817:1312: 995
     13   177 13  8   127 10  5   - 1425:1705: 762
     13   177 13  8   127 10  5   - 1434:1867: 771
     13   177 13  8   127 10  5   - 1587:1696: 924
     13   177 13  8   127 10  5   - 1596:1858: 933
     13   178 12  8   149 13  5   - 1730:  30:1314
     13   178 12  8   149 13  5   - 1739: 192:1323
     13   178 12  8   149 13  5   - 1892:  21:1476
     13   178 12  8   149 13  5   - 1901: 183:1485
     13   195 14  7   150 11  6   - 1731: 854:1016
     13   195 14  7   150 11  6   - 1740: 692:1025
     13   195 14  7   150 11  6   - 1893: 863:1178
     13   195 14  7   150 11  6   - 1902: 701:1187
     13 196 15...

  • Brute-forcing the permutations

    Yann Guidon / YGDES, 12/25/2025 at 08:33

    The manual design is getting tedious, I have to automate it.

    The final form (VHDL source code) will be something like, for the encoder :

    Avalanche_D
    Permutation_P1
    Avalanche_E
    Permutation_P2
    Avalanche_D
    Permutation_P3
    Avalanche_E

    For now, P1=P3 for the ease of design but it doesn't have to be so.

    The permutation can be a cyclical interleaving but the parameters must be determined and it might be too "regular".

    Then a Galois sequence is a natural choice because it is not too regular and can be reversed.

    By chance, we have 18 bits and 18+1=19 is a prime number ! so the generators are 2;3;10;13;14;15
    So for each of the 3 layers, we have 6 sequences to test, but each sequence must also be "rotated" for the 18 bits, 6*18=108 combinations. This makes 108×108×108=1,259,712 combinations to test!
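
    The list of generators can be double-checked by brute force: a generator is a number whose successive powers modulo 19 visit all 18 non-zero residues. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>

#define MODULUS 19

/* Multiplicative order of g modulo MODULUS, for g in 1..MODULUS-1. */
static unsigned mult_order(unsigned g) {
    assert(g >= 1 && g < MODULUS);
    unsigned x = g, order = 1;
    while (x != 1) {
        x = (x * g) % MODULUS;
        order++;
    }
    return order;
}

/* Stores every generator (order MODULUS-1) in roots[], returns their count. */
static unsigned find_generators(unsigned roots[MODULUS]) {
    assert(roots);
    unsigned n = 0;
    for (unsigned g = 2; g < MODULUS; g++)
        if (mult_order(g) == MODULUS - 1)
            roots[n++] = g;
    return n;
}
```

    The loop in `mult_order` terminates because 19 is prime, so every g in range eventually reaches 1.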

    It is possible but not practical with GHDL. I must code it in C, along with the figures of merit. I can then use __builtin_popcount(x) and other things directly and scan the whole range in a few minutes.

    ....

    OK there is one more parameter to add to the generators : offset and index are different so each layer has 6×18×18=1944 combinations (should be all different). The 3 layers amount to 7,346,640,384 tests... so yeah I'll run that on 6 cores or something, and I'll strip all the computations down to the absolute minimum.

    ...

    But Galois sequences have some lousy patterns:

    Generator[0] =  2 :  1,  2,  4,  8, 16, 13,  7, 14,  9, 18, 17, 15, 11,  3,  6, 12,  5, 10
    Generator[1] =  3 :  1,  3,  9,  8,  5, 15,  7,  2,  6, 18, 16, 10, 11, 14,  4, 12, 17, 13
    Generator[2] = 10 :  1, 10,  5, 12,  6,  3, 11, 15, 17, 18,  9, 14,  7, 13, 16,  8,  4,  2
    Generator[3] = 13 :  1, 13, 17, 12,  4, 14, 11, 10, 16, 18,  6,  2,  7, 15,  5,  8,  9,  3
    Generator[4] = 14 :  1, 14,  6,  8, 17, 10,  7,  3,  4, 18,  5, 13, 11,  2,  9, 12, 16, 15
    Generator[5] = 15 :  1, 15, 16, 12,  9,  2, 11, 13,  5, 18,  4,  3,  7, 10, 17,  8,  6, 14

    The 18s are aligned, right in the middle, reminding me that the sequence is symmetrical :-/
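
    The alignment is not an accident: for any generator g of the group modulo 19, g^9 = 18 = -1 (mod 19), so the second half of each sequence is the first half subtracted from 19. A small sketch that checks this on a generated sequence (function names are hypothetical):

```c
#include <assert.h>

#define BITS     18
#define MODULUS  19

/* Fills seq[] with 1, g, g^2, ... modulo MODULUS. */
static void galois_sequence(unsigned g, unsigned char seq[BITS]) {
    assert(g >= 2 && g < MODULUS);
    unsigned num = 1;
    for (unsigned i = 0; i < BITS; i++) {
        seq[i] = (unsigned char)num;
        num = (num * g) % MODULUS;
    }
}

/* Returns 1 if seq[k] + seq[k+9] == 19 for every k in the first half. */
static int halves_are_mirrored(const unsigned char seq[BITS]) {
    for (unsigned k = 0; k < BITS / 2; k++)
        if (seq[k] + seq[k + BITS / 2] != MODULUS)
            return 0;
    return 1;
}
```

    This means each sequence really carries only 9 independent values, which is one reason why the raw Galois orderings look so regular.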

    OTOH we have to start somewhere since 18! = 6.4E15

    My Galois generator can already generate 1944 sequences but that's less than 1 billionth of the sequence space, and I can't scan it exhaustively. It's possible to do 1944^2=3779136 by permuting two of these sequences. Then, it's possible to compute the pipeline up to some point, for the 18 interesting values, and be more thorough at the last stage.

    Some permutation table would help too, though I'm not sure if it would slow things down instead, because the LUT would occupy about 1MiB. Building the array would be worth it only if it is used more than a million times, which is unlikely. However, a LUT for the 9-bit tiles fits in 2K bytes and it's perfect.

    ....

    C-ing...

    ...

    And I can generate 3.7 million permutations in about 0.24 second, using 3 stages of increasing buffers.

    #include <stdlib.h>
    #include <stdio.h>
    
    #define GENCOUNT (6)
    #define BITS     (18)
    #define MODULUS  (19)
    #define PERMS    (GENCOUNT*BITS*BITS) // 1944
    
    unsigned char generators[GENCOUNT]={ 2, 3, 10, 13, 14, 15 };
    unsigned char sequences[GENCOUNT][BITS];
    unsigned char perms[PERMS][BITS];
    
    unsigned char T[BITS];
    
    static inline void combine_perm(
        unsigned char* perm_val,
        unsigned char* perm_index,
        unsigned char* perm_dst
    ) {
      for (unsigned int i=0; i<BITS; i++)
        perm_dst[i] = perm_val[perm_index[i] & 31];
    }
    
    int main(int argc, char **argv) {
    
      for (unsigned int gen = 0; gen < GENCOUNT; gen++) {
        unsigned int generator=generators[gen];
        unsigned int num=1;
        for (unsigned int index=0; index<BITS; index++) {
          sequences[gen][index] = num;
          num = (num * generator) % MODULUS;
        }
      }
    
      unsigned int perm=0;
      // Scan 18*18*6=1944 sequences
      for (    unsigned int gen = 0;    gen < GENCOUNT; gen++) {
        for (  unsigned int index = 0;  index < BITS;   index++) {
          for (unsigned int offset = 0; offset < BITS;  offset++) {
            unsigned int i=index;
            unsigned int k=0;
            do {
              // convert Galois domain (which excludes 0) to bit index domain
              unsigned int j=sequences[gen][i]-1;
     j +=...

  • MaxHam gen2

    Yann Guidon / YGDES, 12/25/2025 at 03:28

    The previous attempt was very nice and taught me things, which I'll try to distill again, as I start coding it. Several insights affect this reboot:

    • I wanted to avalanche way too much early on
    • I should mix more at the first/outer layer
    • avoid "cancellation" by detecting "loops" of XOR that break the avalanches
    • alternate layer types so mixing is spreadier on the emitter.
    • 4 or 5 layers should be OK to reach H9 on both sides

    So we'll start with the "outer" layer:

    (please excuse the flaw at the lower-left quadrant)

    The Encoder layers stack becomes EDEDE, and decoder DEDED.

    For DE:DE, we have this sandwich:

    Avalanche for D-E: 1, 1, 1, 1, 2, 2, 2, 2, 6, 6, 6, 7, 8, 8, 9, 10, 13, 13

    Avalanche E-D : 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 4, 4, 5, 5, 6, 6, 12, 12

    Now what can we do ? Copy-paste of course ! With a little permutation that will associate the low avalanches with the high ones, and vice versa.

    The Permutation P is reversed by S, which are both used 2×, and the intermediary Q is reversed by T. So overall the complexity is kept low through the duplications.
    Encoder: DPE-Q-DPE
    Decoder: DSE-T-DSE

    Once you have the 9-bit tiles, they are duplicated and linked by permutations...

    The result is still 64 XOR gates and a quite powerful avalanche, both forward and backward. However, one of the inputs avalanches to only 2 bits on the transmission side...

    Sender: 2, 5, 7, 7, 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13, 13, 13

    Receiver: 6, 6, 6, 6, 6, 7, 7, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12

    The sender has a bad bit with only 2 avalanches, which could be tied to the C/D bit. But that's a sub-optimal approach...

    There are two ways to go forward:

    1. add another layer. This increases latency, cost, complexity...
    2. explore the design space with an exhaustive method: let a computer generate many permutations and find the best parameters. This looks like the way to go...

  • Hamming distance maximiser

    Yann Guidon / YGDES, 12/23/2025 at 00:11

    A better 3-layer error propagator:

    The avalanche from a single-bit error can reach 14, and only one input affects as few as 8 bits. That's a success.

    However a whole bunch of inputs (7) still flip only 2 transmitted bits. In order to boost the avalanches there, the encoder should have a cascade too...

    So here is the new encoder, with its two cascades to mitigate the poor effect on the line.

    Two inputs only affect 4 bits. These can be allocated to the CD and MSB bits.

    Notice that from the data available, ALL errors with 1, 2, 3, 5, 12, 13, 14, 15, 16, 17 and 18 bits will flip at least 2 decoded bits.
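A claim of this kind can be checked by enumeration, since the decoder is linear: push every error pattern of a given weight through the XOR matrix and record the smallest number of flipped outputs. The sketch below uses a made-up 4-bit matrix `M` as a stand-in (the real 18-bit decoder matrix is not reproduced here), and the function names are hypothetical.

```python
from itertools import combinations

def propagate(matrix, error):
    """Output error mask for an input error mask through a XOR matrix."""
    out = 0
    for o, row in enumerate(matrix):
        if bin(row & error).count("1") & 1:  # odd overlap flips this output
            out |= 1 << o
    return out

def min_flips_by_weight(matrix, n_inputs):
    """Map each input error weight to the minimum number of flipped outputs."""
    result = {}
    for weight in range(1, n_inputs + 1):
        result[weight] = min(
            bin(propagate(matrix, sum(1 << b for b in bits))).count("1")
            for bits in combinations(range(n_inputs), weight)
        )
    return result

# Toy matrix (assumption): out0=in0, out1=in1, out2=in0^in2, out3=in1^in3
M = [0b0001, 0b0010, 0b0101, 0b1010]
print(min_flips_by_weight(M, 4))
```

For 18 bits the full enumeration covers 2^18 - 1 = 262143 patterns, which is small enough to run exhaustively.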

    The whole pipeline is getting insane:

    Here is the link, I removed segments to keep the URL working.

    Total gate count for the encoder and the decoder: 64 XOR each. That's not insignificant, but it's still way "cheaper" than the tens of words of FIFO that it saves. Even though the immediate error latency increases by one or two cycles, the worst cases are considerably reduced, without having to increase the avalanche in the PEAC scrambler.

    • Avalanche input-to-output: 4,4,6,6,6,6,6,6,7,8,8,8,8,8,9,9,10,11
    • Avalanche input-to-output: 5,5,6,7,7,8,9,9,9,9,9,9,9,9,9,10,11,11

    The worst-case error avalanche has decreased... but it's still manageable, and it's a compromise for the size and the input avalanche. I'm sure there are better circuits, but it's the best I can do in a day, and it's a considerable improvement over the existing system.

    And now, VHDL is calling.



Discussions

Yann Guidon / YGDES wrote 12/25/2025 at 02:11 point

https://arstechnica.com/gadgets/2023/06/speed-matters-how-ethernet-went-from-3mbps-to-100gbps-and-beyond


Yann Guidon / YGDES wrote 11/23/2025 at 08:29 point

Toshiba was on the ParPop thing since at least 1983:

https://patents.google.com/patent/US4587445A/en


Yann Guidon / YGDES wrote 10/26/2025 at 02:56 point

https://connect.ed-diamond.com/gnu-linux-magazine/glmf-277/erreurs-en-rafales-multiparites-et-codes-gray-entrelaces


Yann Guidon / YGDES wrote 04/16/2025 at 14:03 point

"Not everybody can take that much fun. :-) "


Yann Guidon / YGDES wrote 04/16/2025 at 13:36 point

My approximate design parameters :

- Fclk = 100 to 130MHz (coming from "somewhere")
- 1 gate = 3 inputs max, approx 1ns (about 30% variation across voltage, process and temp variation)
- Input pins : some can do differential and dual-edge clocking, can be ganged to act as high speed comparators and create crude ADC. But their count must be kept low: 2 is nice, 3 is still OK, 4 has to be worth it.
- Output pins : can do differential and dual-edge clocking too
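These parameters imply a rough budget for the number of gate levels between two registers. The sketch below is a back-of-the-envelope estimate only: it assumes the 30% variation is applied as a worst-case slowdown of the 1 ns gate delay, and it ignores routing, setup and clock-to-output times, so the real budget is smaller. `logic_levels` is a hypothetical helper name.

```python
from math import floor

def logic_levels(fclk_hz, gate_ns=1.0, variation=0.30):
    """Upper bound on cascaded gates that fit in one clock period."""
    period_ns = 1e9 / fclk_hz
    worst_gate_ns = gate_ns * (1 + variation)  # slowest PVT corner (assumption)
    return floor(period_ns / worst_gate_ns)

for fclk in (100e6, 130e6):
    print(f"{fclk / 1e6:.0f} MHz: at most {logic_levels(fclk)} gate levels")
```

Under these assumptions, a pipeline stage holds about 7 gate levels at 100 MHz and about 5 at 130 MHz.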


Yann Guidon / YGDES wrote 04/16/2025 at 05:08 point

https://www.academia.edu/5243141/A_CMOS_Transceiver_for_10_Mb_s_and_100_Mb_s_Ethernet


Yann Guidon / YGDES wrote 04/15/2025 at 11:07 point

Clock recovery can easily get incredibly complex...

https://en.wikibooks.org/wiki/Clock_and_Data_Recovery/Structures_and_types_of_CDRs/The_CDR_Phase_and_Frequency_Detector_PFD

But I can't use analog circuits here


Yann Guidon / YGDES wrote 03/26/2025 at 17:39 point

Some links from Tim for comparison :

https://ams-osram.com/de/innovation/technology/open-system-protocol

https://www.melexis.com/en/news/tech-talks/melibu

https://www.nxp.com/products/BMX6X02

https://www.analog.com/en/products/adbms6821.html


Yann Guidon / YGDES wrote 03/23/2025 at 19:30 point

20250323 !

So far we got :

- a pretty good 16-bit scrambler with a very long period, no risk of crash and parallel implementation : easier operation, lower power and

- the scrambled word has 2 additional marker bits for flagging the data vs control words, also providing a "sticky" checksum flag

- the parity/flip stage performs pop count on the 2 halves, extracting a parity flag that alters the data markers, and flips whole bytes when the number of set bits is lower than 4.
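The flip rule of the last item can be sketched for one byte as follows. This is only an illustration under assumptions: the way the flip and parity flags are actually packed into the marker bits is not described here, so the function simply returns them, and the names `parpop_byte` and `unflip` are hypothetical.

```python
def popcount(value):
    return bin(value).count("1")

def parpop_byte(byte):
    """Return (line_byte, flipped, parity) for one 8-bit half of the word."""
    pop = popcount(byte)
    flipped = pop < 4                      # too few set bits: invert the byte
    line_byte = byte ^ 0xFF if flipped else byte
    return line_byte, flipped, pop & 1     # parity of the original byte

def unflip(line_byte, flipped):
    """Receiver side: undo the inversion when the flip flag is set."""
    return line_byte ^ 0xFF if flipped else line_byte

line, flipped, parity = parpop_byte(0x01)
print(hex(line), flipped, parity)
```

After this step every transmitted byte carries at least 4 set bits, which is what bounds the runs of zeros and feeds the droop accounting.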

Result :

- 16 data bits are expanded to 20, which is the same +25% overhead as the 4b/5b coding of 100Base-TX

- Far better and faster error detection, both from the scrambler and the parity/popcount levels.

- The maximum length of consecutive 0s is 10 by construction. This effectively bounds the bandwidth in a strict F-F/10 range, which is an important design parameter for the coupling transformers and the working frequency. This figure changes depending on the modulation scheme, as MLT3 divides the main frequency by 4.

- Droop management will use the popcounts from the parpop level, and insert "correction" packets when the drift exceeds a (configurable) value.


Yann Guidon / YGDES wrote 03/10/2025 at 00:35 point

https://www.iol.unh.edu/sites/default/files/testsuites/ethernet/CL25_PMD/PMD_Test_Suite_v3.5.pdf

"if more than 7 errors are observed in 3x10^11 bits (about 19,770,000 1,518-byte packets), it can be concluded that the error rate is greater than 10^-11 with less than a 5% chance of error. Note that if no errors are observed, it can be concluded that the BER is no more than 10^-11 with less than a 5% chance of error."
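The two 5% figures in this quote can be checked with a Poisson model, assuming independent bit errors: at a BER of exactly 10^-11, 3x10^11 bits yield 3 expected errors. The sketch below is a sanity check of the arithmetic, not part of the test suite itself, and `poisson_cdf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

bits = 3e11
ber = 1e-11
expected_errors = bits * ber               # about 3

p_zero = poisson_cdf(0, expected_errors)   # chance of seeing no error at all
p_more_than_7 = 1 - poisson_cdf(7, expected_errors)

print(f"P(0 errors)  = {p_zero:.4f}")
print(f"P(>7 errors) = {p_more_than_7:.4f}")
```

Both probabilities come out below 0.05, which matches the "less than a 5% chance of error" wording for both conclusions.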


Yann Guidon / YGDES wrote 10/29/2024 at 02:10 point

The seminal 64b/66b paper : http://www.omnisterra.com/walker/pdfs.talks/dallas.pdf


Yann Guidon / YGDES wrote 10/23/2024 at 18:03 point

at least it's something.
https://www.falstad.com/circuit/circuitjs.html?ctz=CQAgjCBMB0CsCmBaMAGELqUig7LsAHGLAGwol4AsJUAzCLCJfQsmAFADuIJkl4ZHpTRhBKdgDcGg0Wl79Z6KEwp4U65Wi3QOAc2kjBsSgrHp2+kgQJRSPa1AdaLIAJwnbNd-0hPz3VBQbSDtUPk9-EAISH3DohVdIczAcCCtguJiocLRXaFdYdSLiorAkV3ANWiTxbnTs-hi0LH5xACchZocmhqVA9g6exR6WpUKXEe7hEFoUVsj4iIJXGhCaWqiViO8IjeWaMES3X3AjvazDpP3TmoHNg6Prtb71dgAlcDBaA48wL4OwDUlNNaNB6Fp0HAXH9vjM5p9YdUIe0EatQv84fNAsU7jCfgoMZcXiVcYSjnjdpUcQEUOEiYEbESNn94fSybcaQ5nhTnsz2eBsCIzlwBRpuRjeewACaVHBJSByyrhBXy8AAOVolGlSoUOAOuCSKVW6s1IsSjKOOyZ2tQiuwjNpPiCyjAGq1NLpep1N0iYQShsdUGdzIN4C9LN16xFEbDANZws5wVCQWCwejoftlS5aZlfqg01t8umhrd6btitQXMVIarAa5dhDdoLqB80z58L4IhbPpDyoLgclHp8yeV1fYogqVvJ6gtQIwRVojAgiHnKEBsGsxEorkStFoBWdJB0riIZGwOH3Cov5R4x9PtNwlBwwmsN4wO-YAGNRULDYLKTAkBIH8kKkCs3wkGA24kAUBAXpCODsMIECApQjLhtU6EHFEAA6ADOxD4cgsD4diRSkfkmpEAQswbo6KCuLQFEpOQerCN8zAXiEpElOg+HYDxJTFOA+HiMhApoVAFaYVJhq4QRJF4cRgnFMxeAUE0HG0FxilgGCJCwIQE5BJhWAqRoKD8ZZBFCcJYCidGfCMjITluFGAT-vSrnWqIhrfCIGEnEayhSvAABmACGACuAA2AAuiAxfAuZKNoa7jrw4D+bJWUnCqIXhdF8WJcl4CpZC6UdP8AU-M6wUiEUuLZflUHOvlDWvAEL45a1WHmB8vWxuA3V-HOTBoKC4KpVCHmSdy-6DlSs5LW5vozqtqE2FszLeS5c1OOwAD2URlWgwg7uAmCMPO65KFcMxHcNp3jRdekhBVa6kO9zRPbQj1LsCDEVDA13QMUrh6l913KIwf3HVgz3ncDcAfeoEMGbw0PykkcN0IjQMMKjDGQ5jd147jdKAxdINE+jUNk+EuOUADZ0E29oPgyT33KMzD20DYOAgAAYhAaCsBUsBg7ZQllIggvICAAAi8AxRFACe7D8yAgsi2V70siACsAJIALYAA7JRFAB2n7wJrjLCxA9BkWK0uGxAQttPAACOUXwDbGtax7EDQ4E7tK1FcVq-hADCaufkl9tlbrSSINgQbh4rMf4QA8mFYV4fAcVJ8H5WMArACyEUAB74QAaod8URbodtAA


Ken Yap wrote 10/21/2024 at 22:04 point

Ethernet does not imply TCP/IP. IP is just one of the protocols that can be transported on Ethernet. Historically there were other protocol families such as Novell's IPX which could be carried simultaneously with IP. However IP is now the standard. They all use the now standardised Ethernet frame which is the data link layer.

Your project establishes a different data link layer so cannot be used with standard Ethernet hardware except the cables and the sockets. But you knew that of course.

I've toyed with the idea of using Ethernet cables and sockets for carrying power and signals over up to a few metres purely as a local hack because I have lots of cables. Mainly to counter the proliferation of wall wart power supplies. I would have to ensure those sockets are never used with normal Ethernet connections.  PoE would be a standard way to get what I want with more flexibility but more hardware complexity. But truth be told, wall warts and WiFi or Bluetooth are probably preferable to Ethernet cables all over the place.


Yann Guidon / YGDES wrote 10/21/2024 at 22:23 point

Hi @Ken Yap !

I remember IPX, programming it in ASM around 1997... and the ISA Ethernet cards using coax, the T and terminators :-D

> But you knew that of course.

I do.

> purely as a local hack

Well, in some cases, I need to go beyond the local hack.

So far, I have implemented TCP/IP boards such as #WizYasep but they are too often overkill for the clients' needs. Hence this project, where I also explore alternative novel data processing techniques.

And in many of my applications, radio links can't be selected.


Ken Yap wrote 10/21/2024 at 22:54 point

Fortunately I have only myself to entertain these days. 🙂


Yann Guidon / YGDES wrote 10/22/2024 at 12:59 point

@Ken Yap 

and Hackaday helps you :-)


Ken Yap wrote 10/22/2024 at 13:13 point

There are other entertainments awaiting when I have exhausted/tire of this one. 😊

PS: Tagging the recipient is redundant when the reply is below theirs and just generates annoying email traffic.


Yann Guidon / YGDES wrote 10/22/2024 at 13:16 point

I didn't get a notification..

Let's see if it works this time.


Ken Yap wrote 10/22/2024 at 13:22 point

You certainly generated two emails to me with your first reply. 👿


Yann Guidon / YGDES wrote 10/21/2024 at 20:44 point

https://www.ti.com/lit/an/snla266a/snla266a.pdf

https://ww1.microchip.com/downloads/en/AppNotes/AN2686-Ethernet-Compliance-Test-10BASET-100BASETX-1000BASET.pdf

https://download.tek.com/document/61W_17381_3.pdf

https://cdn.teledynelecroy.com/files/appnotes/100base-t1-ethernet.pdf

https://pdfserv.maximintegrated.com/en/an/AN1738.pdf

https://bitsavers.org/components/national/_dataBooks/1995_National_Ethernet_Databook.pdf

https://ibis.org/summits/feb19/dmitriev-zdorov.pdf

 ...


Yann Guidon / YGDES wrote 10/21/2024 at 07:15 point

https://www.falstad.com/circuit/circuitjs.html?ctz=CQAgjCBMB0CsCmBaMAGc0yRQDgCwGZZZJdsUUA2ATn3AHYRYQDGkwwAoAdxApPApoKuNGEEgUHAG7MqFAWljixaNJHD4UVcjvTzVEuBwDms+SuZkFEiRzFVwYfPMhLwKOutf7DOwuF8dXGo8EkhBYUg6WBRcEEQUaB1KFDBtKlg6XG1hXDA3BKSdQTStTOzKXDz-RKoqE0dnKE9Gl2xsG0lTCnaoNx6OyF7VBtQW73dx4dsediaJuZcWyVmncyp1djxwDZmQOq9+A531STA6CAGoI93IfjQqaAzkl50wJAdUNHxT7l5eu5xYRqe4cABOvBEUF6wOucVE5HBkNE4lhgJsdwaaJhUM08L2xwWKH4ExW7hJbjGXjcZNQ-DSmw8m12ZOODPJcXZkghdM5u15fR8XyRWz5mzA21JzDoOj+i0FjklNOkjGUqPp4jU7leomgQrgMroch+FCNpDk+iMAHsbBA0CI6oxfEwtd8ODb2AF7VoHDAXUlXBjHCB8O76F7mD6oHBAuQqHQKEpA1rziGwx07ZHHX7Y1oE0mXVAQB1QzaGJmHQ5EskE5AaFRcLB8Jh-cl44m+IX1AxQ-gM2gAGKZhCIBywIo6l7vRAMZAgAAi8AANgBDACeHD77hAQ4CgfYaDnAEkALYAB3gABMVwA7ADG8E3GZ3EFoX2SUEn8QgA7B8AAjgArvA94bludwvkW3zfguADCAA6ADOADyABmqGIfAAAuHBAA


Yann Guidon / YGDES wrote 10/21/2024 at 05:35 point

https://www.falstad.com/circuit/circuitjs.html?ctz=CQAgjCBMB0CsCmBaMAGK0wA4DMrZgBYsA2SFTA48AdhFhAOzqTDACgB3OggqSX2MTSR+IFGwBuDTMNGwefXsPApVa1eGhUwWsWLhsA5tNm9KvEUrFswxAJzhbmKIMcoLrtCmhrs9CN7qdtTBYNTu4dTYKIQQiIExuHaYItRgkILEBJBRIIiYGLj4pHbYWek4SGT6dnZGjsTOOZANTZjOXvWNTa7dUO164sas7i7aqBYDnVysjWOtUNQt4jNO4HYt6bbry5wgtS2WKuSK1mEQfUcHp2h20Haw6k9qYEgOqGjYu1zXGdqqPSoKxAl1EWVM1gATiCCMIBuCbsc2NDBBCEUc0GBYF1Yf1nAjolZgR9ASpXM1rDMUOSlm4PEC9iTTngqGQOozqayTiMLKJxNCWVBuZzFsskQLINswBtwJLWZ4GOFVIy1n9Zds1eIAPZ6AIMFC1Oj6PVi7BsHXpcB6WGGmD0QL2qCykBmi3OPU2hx2-TUvSbZyukDu60Ghz2nyqaWwXAZP1Bl3mkC0D2hn1qaglbB2AjR9LhoIZwSx5S0M3Yd1oABieoQiDDEeez1eiFoyBAABF4AAbACGAE82OWVCBq1bYyM8hAAJIAWwADvAACY9gB2AGN4IPg6OmB91MdnpOR5D4ABHACu8HXA6AA only one transfo, then.

