Project | Not an Ethernet Transceiver


gPEAC again
2 days ago • 0 comments
The currently used PEAC16x2 scrambler is nice but has two weaknesses :
1. sub-par mixing
2. the "mark" bits are too dependent on the input data
This was already considered back in "170. TODO: scan" and elaborated in "1. Let's start.".

The solution : a gPEAC chosen for its MSB and LSB.

Ideally, a "perfect" modulus should be chosen but the two LSBs are always 10. This was considered a problem, since I wanted to reduce the size of the extra constant adder, so a string of 0s would have been great (or 1s...)

The new strategy is to reuse the existing adder and include the constant as a simplified MUX at the operands. The corresponding register gets updated only if a carry is generated.

The modulus could be "pipelined" with the data input but this would work only for one half of the circuit.

The adder is one thing, but the additional carry adds even more complexity. Actually, it's a delay for the C flag... so OK.

Let's return to the scrambler:

This version uses the simpler, binary PEAC with a typical power-of-two addition.

The non-power-of-two modulo requires a subtraction. Well, fortunately it's a constant value that can be easily encoded with OR and ANDN gates.

Now let's have two phases : one is "normal operation" (as described above), the second is "modulo".
- on the first phase, SumX goes to a register : there is a path to Y.
- on the 2nd phase, Y gets adjusted (on the SumY adder ?) but there is no direct way to loop it back without overwriting X.
So a new, more flexible, datapath is required with more MUX.

Design rule : avoid MUX (or other wide fanout signals) right at the output of the DFF because the fanout increases the latency.

However the "phase" signal could be relatively easily pre-amplified.

So let's start with the X and Y registers and let's go backwards.

Feeding X and Y are the MUXes, each with one constant and the output of one adder. Now this needs to be extended to the 2 adders. The MUXes now serve as a "swap" gate. Each register has an individual "write enable" signal, which could be implemented as another MUX but it's usually implemented within the DFF gate so let's just ignore this for the moment.

Extending from there, the datapath looks like that now:

So each adder can write to either register as needed :
- Phase 0 : SumX writes to Y and SumY writes to X (actual computation)
- Phase 1 : SumX writes to X and SumY writes to Y (modulo adjustment)
The diagram is incomplete (no C & D registers)
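The swap behaviour of the two MUXes can be written down as a tiny behavioural model. This is only a sketch of the routing listed above: the function name `route` is made up for illustration, and the write enables, the constants and the C & D registers are left out.

```python
def route(phase, sum_x, sum_y):
    """Return the (next_x, next_y) register inputs for the given phase."""
    if phase == 0:
        # Phase 0, actual computation: the adder outputs are swapped.
        return sum_y, sum_x
    # Phase 1, modulo adjustment: each adder writes back to its own register.
    return sum_x, sum_y
```

Both MUXes share the same "phase" select, which is why that signal is worth pre-amplifying.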

One trick affects the order and assignment of the registers: the conditional write to the registers during the modulo operation. This makes the scrambled output valid only during Phase 1... But the word can be taken as a whole : the 18 bits are a valid scrambled code with marker MSB (no carry combination).
Next challenge : define the new constants.
a Quasi-Popcount
3 days ago • 0 comments

Since the last log mentioned that the exact number of set bits is not critical, the popcount circuit can be simplified: only three layers of adders are required!

The trick: input bits d14 and d15 are ORed to provide the carry-in of the last layer, which only provides 4 bits.

It does not really matter if the number of set bits is 15 or 16, but 16 requires 5 bits, which would add an extra layer of adders for nothing.
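A functional model makes the off-by-one behaviour easy to check. It is a sketch only: it does not reproduce the three adder layers, just the arithmetic of ORing d14 and d15 into a carry-in, and the function names are mine.

```python
def quasi_popcount(word):
    """4-bit approximate popcount of a 16-bit word (functional model only)."""
    low14 = word & 0x3FFF                          # d0..d13, at most 14 set bits
    carry_in = ((word >> 14) | (word >> 15)) & 1   # d14 OR d15
    return bin(low14).count("1") + carry_in        # 0..15, fits in 4 bits


def exact_popcount(word):
    """Reference popcount of the 16-bit word."""
    return bin(word & 0xFFFF).count("1")
```

In this model the result is one short of the exact count only when d14 and d15 are both set, and it never exceeds 15.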

Intermediate results c0, c1 and c2 are discarded. c1 could be used to estimate the "phase" (state of the MLT3 at the end of the frame) but it's incomplete. And since c0/c1/c2 are not used, the FA block can be replaced by a "MAJ3" gate.

Not represented here : the parity circuit for marker bits m0 and m1, here is a more complete version:

Not trivial but still reasonable. That's fewer Full Adders than usual, see for example http://www.righto.com/2016/01/counting-bits-in-hardware-reverse.html.

The whole pipeline looks good too:

Note how the first two stages work at quite low speed, then speed up at the serialiser.

PEAC16×2 could work at 2× speed and implement byte-wide sub-computations but the gains (speed, surface) are not compelling, as complexity increases. And slower circuits consume less.

The pipeline could be reused, without the MLT3, for inter-chip or inter-board communications, over a balanced pair maybe. Resynchronisation is a whole different subject though.
Flipping
3 days ago • 0 comments
Two bits is all I can afford as overhead.

Two parity bits is cute but can't go far... and the extra degree of freedom actually increases the work on the encoder side, for a meager benefit.

March 9, 2025, 2AM : a different approach is possible !

The 16 data bits are POPCOUNTed.
- LSB is parity.
- MSB (meaning: count < 8) is the "flip" bit that will negate all 16 data bits (before going to MLT3)
This flip bit must also flip the parity of course.

This guarantees that there will be a minimal activity on the line, reducing wander: 8 transitions per 16 bits minimum. The parity enables fast error detection (pins the location of a possible error) as well as keeps the behaviour of "Return to Neutral".

Now, the parity can be the last bit. It can be implemented as a sort of "reset of the MLT3 state" instead of just some more data/payload. It also clearly delimits the length of a +1 or -1 run, which is now 8 bits at worst.

The prefix is not "flipped" because an "out of band" packet (start, stop, negotiation...) must start with 11 in all cases.
If some wander can be predicted, the flip bit could be conditioned (reminiscent of some composite RLL codes...). The actual value of F is not checked, such that possible tweaking/optimisations remain open.

Decoding is very simple: use the F bit to XOR the output. And empty packets are easy to spot : if the payload is 00000... then the F bit must be set, as well as parity probably. And if payload is 11111... then F=0
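Here is a minimal sketch of the flip encoder and decoder. It uses the popcount < 8 threshold, and it assumes that the parity bit makes the number of 1s in F + payload + parity even (the log does not spell out the exact rule). The names `encode` and `decode` are illustrative only.

```python
MASK16 = 0xFFFF


def popcount(value):
    return bin(value).count("1")


def encode(data):
    """Return (flip, payload, parity) for a 16-bit data word."""
    flip = 1 if popcount(data) < 8 else 0
    payload = data ^ MASK16 if flip else data
    # Assumed parity rule: the 1s in flip + payload + parity add up to an even number.
    parity = (popcount(payload) + flip) & 1
    return flip, payload, parity


def decode(flip, payload, parity):
    """Return the data word, or None if the parity check fails."""
    if (popcount(payload) + flip + parity) & 1:
        return None
    return payload ^ MASK16 if flip else payload
```

With these assumptions the word sent on the line always has at least 8 bits set, an all-0 data word gives F=1 and parity=1, and an all-1 data word gives F=0.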

Now, I have to design a POPCOUNT with 16 inputs.
Simplified versions, with occasional off-by-one results, are possible because the F bit is not strictly enforced on the receiving end. F must be set if popcount<8, but some cases of 7 or 9 are tolerable (though they relax the timing and droop requirements).
TODO: schematic of the "pipeline"
Double parity
4 days ago • 0 comments
The format is now set to 20-bit frames that contain 16 data bits. The overhead is 25% like 100Base-TX but with increased error detection and wander reduction.

The 16 bits are processed with PEAC16, as per the diagram below

A different PEAC flavour is under consideration but the principle remains the same : the 16 bits (d0..d15) are extended to 18, with the extra 2 bits as markers (m0, m1).

2 more bits (p0, p1) are added as even parities : that's a total of 20 bits.

The key property of PEAC is that the whole message can never be all-0 or all-1.

In any case, m+d always have at least 1 bit set and 1 bit cleared.

The parity bits ensure that the whole frame has a "Return To Neutral" behaviour, though it does not remove all wander.

There are 2 parity bits, so 2 choices depending on the m+d parity:
- 00 or 11 if the packet is even
- 10 or 01 if the packet is odd.
One approach to reduce wander is to "break" long strings of identical values when they are +1 or -1. The parities are placed at 1/4th and 3/4th of the frame for maximum reach, as well as overlap with the previous and next words. It is important to provide a strong, strict bound on the maximum run length of 0s.

The parity bits are checked by the receiver but the actual parity pattern is a degree of freedom chosen by the encoder to reduce wander. It is there to provide a guarantee that at least 2 transitions will occur in each frame.
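The two choices per parity class can be listed with a few lines of code. This sketch only enumerates the allowed (p0, p1) pairs for an 18-bit m+d value; how the encoder picks one of the two to reduce wander is not modelled, and the function name is hypothetical.

```python
def parity_choices(md_bits):
    """Return the allowed (p0, p1) pairs for an 18-bit m+d value."""
    odd = bin(md_bits & 0x3FFFF).count("1") & 1
    if odd:
        return [(1, 0), (0, 1)]   # one more 1 makes the frame even
    return [(0, 0), (1, 1)]       # zero or two more 1s keep it even
```

Whatever pair is chosen, the whole 20-bit frame ends up with an even number of 1s, which is what the receiver checks.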
Sender-side droop/wander prevention with MLT-3
10/29/2024 at 17:14 • 0 comments
The MLT-3 encoding has some interesting properties. In particular, it "works" in modulo 4: if the word to be sent has a number of 1s that is 0 mod 4 then we know that the output has gone through a multiple of a full cycle.

Similarly, if you take the parity of a given word, when it is even, then the output has done a full cycle or a half cycle. This has 2 combined effects :
1. The parity bit is the least efficient error detection measure, but it is still good to have one.
2. This parity can be chosen (even) to help reduce droop by forcing 1/4th turn on MLT3 for the next sub-word. The next bits will then be "dephased" by 0 or 1/2 turn.
Add to this the PEAC scrambler (with carry out), which ensures there is always at least one bit set (and one bit not set) and the parity becomes a way to force the "output phase" (state of MLT3) to either be
- inverted (+1 becomes -1 and vice versa)
- remain neutral (0 -> 0)
So it can be seen as a way to reduce wander. A bit. But it looks like a very interesting dual-purpose system that can both add protection (error detection) and increase signal integrity.
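The modulo-4 behaviour is easy to reproduce with a small MLT-3 model. This is a sketch, assuming the usual convention where a 1 advances the 0, +1, 0, -1 cycle by one step and a 0 holds the level; the names are illustrative.

```python
MLT3_CYCLE = (0, +1, 0, -1)   # the four phases of one full MLT-3 turn


def mlt3(bits, phase=0):
    """Encode a sequence of bits; return (line levels, final phase)."""
    levels = []
    for bit in bits:
        if bit:                           # a 1 advances the phase by a quarter turn
            phase = (phase + 1) % 4
        levels.append(MLT3_CYCLE[phase])  # a 0 holds the current level
    return levels, phase
```

Starting from phase 0, a word with four 1s ends back at phase 0 (full cycle) and a word with two 1s ends at phase 2 (half cycle), so any even-parity word finishes on one of the two neutral phases.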

100Base-TX uses 4b/5b recoding to enhance integrity (by forcing transitions) with a 25% overhead, though unfortunately it gets wiped by the LFSR scrambler. In my early system, a 16-bit word gets 2 more bits (for check & framing), so the overhead is 12.5%, but that only guarantees at least 1 transition overall, spread over 18 bits... I'm ready to accept a bit more overhead to prevent baseline wander at the source.

There is already work (see the last log) to make the receiver droop-proof but it easily gets complex... My system does not have a specification for the maximum length or duration of a stable line level so I was preparing for the worst at first. But if I can significantly reduce this risk at the source, even at the cost of some bandwidth, I can simplify the receiver. It's even better if the extra bits serve as checks as well.

-o-0-O-0-o-

Now, how many bits are required ?

The (leading) pair of carry bits guarantees that at least one bit in the whole word is set. For 16 bits of data, total is 18 bits. The even parity bit provides "phase inversion", bringing the total to 19 bits.

If we want to ensure that the whole cycle is completed, we need 2 more bits : we get the "RTN" (Return To Neutral) convention that ensures that each word starts and ends with the neutral level at the start of the cycle. There could be 8 combinations:
- 000
- 001 010 100
- 011 110 101
- 111
The middle ones have more potential for reorganising the phases such that they are more balanced. But I have found three problems with this :
- This brings the total to 21 bits which is more than 25% of overhead
- 21 is a very inconvenient number, though it could be worse if it was prime. 21=3×7 so there is still some wiggle room but it's certainly not binary, not even even.
- It is not certain that returning to a given MLT-3 phase is desirable, because this could introduce some "bunching" in the power spectrum, at the packet level.
OTOH using 3 bits per packet could ease other parts of the circuit's design by processing the 18 bits in smaller subsets : 18=6×3 so there could be 3 identical groups of 6 bits to analyse, and local decisions can be individually taken.

The current choice is to set the total number of bits to 20, 16 data, 2 frame/carry, 2 parities. 20=2×2×5 which is far easier to process, serialise, deserialise and mentally process, and the overhead is again 25% just like 100Base-TX. So now we only have 2 extra bits to balance the wander. Let's make the best of it.

The consequence is that a full cycle can't be forced. However, it is necessary to keep the parity even, because the next word should start from 0 (one of the two zeroes). Otherwise it's impossible to know if the balance is good or bad, as a simple 90° shift can change the computations and we suppose it's not possible to know the state of the MLT3 FSM in advance.

For example, if a pattern 000000010100 starts from 0, it's fine because most of the 0 states will occur in the neutral state. But if it starts at +1, then there is a strong imbalance towards +1 which should be broken up.

Hence : we shouldn't create the full-turn-per-word rule to prevent certain harmonics but we need to keep the 0-or-180° constraint to keep the balancing computations simple enough. It's less damaging to the spectrum but it simplifies the decisions for the padding. So it's another encoding constraint : Return To Neutral (or Zero) at the word level, which is enforced with parity, but there are 2 parity bits now.

As a rule of thumb, both P bits should be 1 unless the data's parity is odd, then the only thing to choose is which bit to clear, which is another story...

-o-0-O-0-o-

This raises yet another question: where do these extra bits go ?

Ideally in the middle of the word, so it can "flip" neighbouring parts and prevent long sequences of identical levels.

Given 20 bits, including 2 parities, we get 18 bits of data which is 6×3:
- 2 bits for frame/crc
- 4 data
- 1 parity
- 6 data
- 1 parity
- 6 data
And here you have your frame.
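The layout above can be sketched as a small packing function. The bit order inside each field (MSB first, first transmitted bit at index 0) is my assumption, as is the name `build_frame`.

```python
def build_frame(marker, data, p0, p1):
    """Return the 20 frame bits as a list, first transmitted bit first."""
    m = [(marker >> i) & 1 for i in (1, 0)]            # 2 frame/carry bits
    d = [(data >> i) & 1 for i in range(15, -1, -1)]   # 16 data bits, MSB first
    # 2 frame + 4 data + parity + 6 data + parity + 6 data = 20 bits
    return m + d[0:4] + [p0] + d[4:10] + [p1] + d[10:16]
```

With this order the two parity bits land at indexes 6 and 13, splitting the 18 other bits into three groups of 6.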

Parity/balancing takes place after the scrambler, in a new pipeline stage.

Parity / popcount is calculated individually on each of the three 6-bit groups, giving three 3-bit numbers that are then compared to elect which parity bit (if any) should be disabled.

Popcounts higher than 3 (4,5,6) wouldn't need to be bothered with because they provide enough swings to rebalance the average, overall. The algorithm should focus on popcounts less than 3 which have a risk of greater imbalance and longer runs of +1s or -1s. But then, it also depends on the phase because the 00000 could occur during a neutral phase. So the popcounts should only count the +1 and -1, not the number of transitions: the input should be filtered by XORs...
playing around with XOR gates...

Or maybe a LUT would be more practical and configurable.
.....
... something something... maybe for the next log.
....

Having only 2 parity bits limits the inter-word wander rebalancing. But each word can be processed in parallel.

The other effect is that there is now a guarantee that each 20-bit word has at least 2 changes. So this means a loss of data (absence of packet) can be detected in less than 20 bauds, and a +1 or -1 state can't last longer than 6 bauds.

.

There is more to come and dig but it looks quite promising. And reducing the number of parities to 2 is also good because it also limits the conditions to consider for flipping either of them.
Serial vs Parallel
10/28/2024 at 01:27 • 0 comments

Historical Ethernet (10Base-T) and Fast Ethernet (100Base-TX) traditionally work with a serial datastream.

In 100Base-TX, the data are brought in 4-bit nibbles (25MHz), transformed to 5-bit groups which are serialised and from there, scrambling, NRZI and MLT-3 are done purely serially. It's simple and easy, first because "it uses few gates" (particularly for a pre-Y2K technology) and there was not much to do anyway. Baseline wander didn't even seem to be a concern after all.

I could implement the PEAC part with a pair of serial adders and shift registers but this would not bring any advantage, particularly when what people desire is speed. There is a thirst for even higher rates and CRCs keep being used, but how do you run one at 25GHz without crazy silicon technologies? Do you really need to use SiGe, BiCMOS, GaAs, InP ?...

The solution of course is to do as much as possible in the parallel, slower domain, and relegate the serializer to the very last step.

One scaled-down example is the old Actel ProASIC3 FPGA that is rated for a maximum clock speed of 350MHz but has pins that can reach 700M baud using DDR/dual edging. So one clock cycle transfers 2 bits. And even then, the FPGA fabric can't work at such a speed: the adder would work at 100MHz at best.
However if the adder provides the 16+2 bits at once, the frequency is reduced to 39MHz. The high-speed design effort is moved to the high-speed parallel-to-serial circuit. This approach is scalable to other FPGA and even ASIC.
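The 39MHz figure is just the line rate divided by the word width, as this short calculation shows (the variable names are mine):

```python
line_rate = 700e6        # bauds, DDR pins of the FPGA quoted above
word_bits = 16 + 2       # one scrambled word: 16 data bits + 2 marker bits

word_clock = line_rate / word_bits
print(f"{word_clock / 1e6:.1f} MHz")   # prints "38.9 MHz"
```

That is well below the 100MHz quoted for the adder, which leaves timing margin in the parallel domain.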
Hopefully it is even possible to preprocess the data to reduce droops.
AGC
10/23/2024 at 20:51 • 0 comments
Trying to adjust the input level... sim here
As is, it doesn't work as expected yet but it's progressing.

It's crude but there are some ideas, such as : grounding the center tap capacitively might not be the best solution.

There are just 2 comparators to detect the +1 and -1 levels, level 0 is in-between.

The diodes serve 2 purposes : on top of performing envelope detection to set the gain (and charge the capacitor), they also define the margin between the top voltage and the detection level. Since the diode drop increases with current, which also increases with the input signal, the margin is reduced a bit when the signal is low. The diodes are 1N34s (germanium) to lower the drop, yet with some impedance.

I want to use the transfo in reverse so the line level is 1V and sensed at 2V, giving enough headroom for detection.

There is a resistive network 1M-100k-1M that "centers" the levels but has enough wiggle room to absorb "droop". The 100k-22pF has an RC time constant for the gain, but the ratio also affects the range of the AGC.
The topology is not quite right but the main ideas are here : separate the gain from the drift, each has their own time constants.
- Drift/Droop/wander has a very short time constant, in the order of the microsecond,
- gain works on a much longer scale, milliseconds or seconds.
So 2 capacitors are required...
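A quick back-of-the-envelope check of the orders of magnitude, using only the 100k and 22pF values mentioned above. It is plain tau = R×C arithmetic and does not model the actual circuit; the 1 ms target is an arbitrary example.

```python
R = 100e3               # ohms, middle resistor of the 1M-100k-1M network
C = 22e-12              # farads

tau = R * C             # about 2.2e-6 s, i.e. in the microsecond range
c_for_1ms = 1e-3 / R    # capacitance giving 1 ms with the same resistor, about 10 nF
```

So the 100k-22pF pair sits in the microsecond range, and a millisecond-scale constant with the same resistor needs a capacitor several hundred times larger, hence two capacitors.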

This is quite different from https://www.ti.com/lit/ds/symlink/dp83847.pdf

The TX' center tap is directly tied to 3.3V, which is extra weird because the transfo will trigger the ESD protection diodes if one side pulls down... Or you need Vcc = (2×3.3)-0.7=6V ?? Or the swing is shorter: high-side switching gives 5V-3.3V=1.7V but this creates a 3.4V peak-to-peak signal on the line, which is out of spec for Ethernet.
... and the RX secondary is somehow floating.
Tinkering with CircuitJS
10/21/2024 at 18:53 • 0 comments

Here is the link. I have learned a few important things so far.

The big breakthrough was understanding that the transfo introduces distortion when current passes through it because it loads the ferrite with a magnetic field. This is why distortion disappeared when I put the terminator at the end of the transmission line (in front of the transfo) and the transformer acts as a signal isolator, which gets distorted as soon as the sink impedance increases > 47K.

So the link above has no transfo at the source (yeah economy!!!). Otherwise, the current required to drive the magnetics would act like a weird low-pass with some occasional resonances depending on the cable length.

Source impedance does not look critical in the simulation, but I put a 100 ohm terminator anyway. This both absorbs possible reflections later in the cable and acts as a signal divider to keep the signal level in spec (around 2 to 2.5V). Correction : line level should be 1V.

The sense transformer at the end can be used in "amplifier mode" to double the amplitude of the detected signal, but when the line is "pristine", the amplitude reaches -5/+5V levels so an AGC is required. Said AGC can also work for baseline wander compensation : adjusting for gain and offset.

I'm not sure the idea of introducing "code violations" would work in practice : in a highly-distorting line (with the transfos messing with the signal, reducing the bandwidth dramatically), the break of the sequence heavily disrupts the signal, creating spikes sometimes and then muffling the next symbols. So I have to find a better system to add one optional bit per transmitted word.

Oh, pictures can be uploaded now !
But this sim was conducted at relatively low frequency, without inter-symbol interferences due to reflections on the twisted pair. And yes I have set the transfos' inductance to 350µH, as prescribed in the datasheets.
Of course all of these are simulations with a poorly characterised set of parts. Nothing like SPICE or better : real experiments. But these simulations have shown me a *lot* of effects, including droop ("baseline wander") and I have a better understanding of them now, so I will be less surprised when I encounter them in real life.

Finally : Falstad's CircuitJS might not be a professional, certified, calibrated, bug-free tool. But it has gotten even better over the years and is incredibly useful to quickly test ideas and check for common effects. The more subtle ones require SPICE but it's much less convenient. Thank you Paul!
Let's start.
10/20/2024 at 19:45 • 0 comments
So it started maybe two weeks ago when I realised that PEAC w16 could be used as more than a scrambler: see "Line encoding with PEAC: it's alive." and "PEACLS error detection (and correction?)".

There are some drawbacks though, but I have found a better scrambler system using a well-selected gPEAC: TODO: scans

So instead of using the binary PEAC w16, I use the closest Perfect to a 3x multiple:
```
196608 : 196605M 196598P 196594P 
```
The ideal modulus is 65536×3=196608, the closest Perfect moduli are 196598 and 196594. You could find your own modulus in gPEAC_scans_1M.tbz if you want to play with a different width.

The idea is that each data word is scrambled by PEAC, which also provides 2 bits of "pseudo-parity" that can only take the values 00, 01 or 10. The mark 11 is used for control/framing : hence the 3x factor !
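The 3x factor can be checked numerically. The sketch below assumes that the mark is simply the two MSBs of an 18-bit value that stays below the modulus; that reading is my assumption, and the real gPEAC datapath is not modelled here.

```python
MODULUS = 196598            # one of the two Perfect moduli quoted above
IDEAL = 3 * 65536           # 196608


def mark(value):
    """Two MSBs of an 18-bit value (assumed to be the 'mark' bits)."""
    return (value >> 16) & 0b11


# Collect every mark that a value in 0..MODULUS-1 can produce.
marks = {mark(v) for v in range(MODULUS)}
```

Under that assumption `marks` only contains 0, 1 and 2, so the pattern 11 never appears in scrambled data and stays free for control/framing.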

Two benefits :
- The parity is much less dependent on the MSB of the scrambled data
- The scrambling is much more thorough
One concern :
- the non-binary modulus takes another cycle to compute.
Another concern : baseline wandering.
- At the receiver end, some analog tricks could help.
- At the emitter end, forward correction could be provided by selective negation of the message, but a 3rd header bit would be required.
MLT-3 seems to be the way to go but I find a couple of issues with the classic method. There seems to be a driving conflict because the transformer is driven from both sides with different values, despite the center tap.

From the "transformer's rule", and as confirmed by simulations, the voltage should be equal on both sides of the tap (it's 1:1) but

* if out+ drives 1, the AND makes out- drive 0, instead of -1. and vice versa.

* if either output drives 1, the opposite -1 value gets absorbed by the ESD/clamp/protection diode !

So in either case, the transformer "shorts" one output in the +1 and -1 cases, meaning that a lot of energy gets channeled to GND.
This datasheet shows that the center tap is capacitively coupled, thus helping a bit but not completely.
clamp diodes not shown...

My idea so far is to use a 2-bit quadrature counter, it's very easy to design and the 2 out-of-phase outputs can drive the transformer directly, without conflict, at the cost maybe of some sort of wiggling offset somewhere but... I'll have to test it.

Oh and since the quadrature code can go in either way, it could also encode the word's polarity/bitflip... I'm trying to explore how it could work on CircuitJS.
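To see why a quadrature counter fits, here is a small behavioural sketch. It assumes that the line level seen through the transformer is the difference of the two outputs, and that the counter steps on each 1 bit. Both are my assumptions and still need to be confirmed in simulation.

```python
GRAY = ((0, 0), (1, 0), (1, 1), (0, 1))   # 2-bit quadrature (Gray) sequence


def quadrature_drive(bits, step=+1, state=0):
    """Return the (out_a, out_b, line_level) triple for each input bit."""
    out = []
    for bit in bits:
        if bit:
            state = (state + step) % 4     # step=-1 runs the counter backwards
        a, b = GRAY[state]
        out.append((a, b, a - b))          # assumed: the line sees a - b
    return out
```

In this model four consecutive 1s give the levels +1, 0, -1, 0, which is the MLT-3 cycle, and running the counter backwards gives -1, 0, +1, 0. The two neutral states differ (both outputs low, then both high), which might be where the wiggling offset comes from.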
