Implementing Backpressure in a No-RAM Pipeline
In the latest version of the project (see attached source), I've introduced a simple backpressure mechanism to ensure that each produced data word is truly consumed before a new one is accepted. Here’s how it works:
The design is 100% synchronous. There is no external RAM, no FIFO. The objectives are twofold:
– Never overwrite an output that hasn’t yet been read.
– Never accept a new input until the previous output is consumed.
For debugging and visualization, the 16 output bits will be connected to a parallel bus, monitored by a logic analyzer.
Validation Protocol
The pipeline uses two standard handshake signals:
out_valid: signals that output data is ready.
out_ready: external pulse signaling the data has been consumed.
Control logic is built around:
wire hold_pipeline = (out_valid && !out_ready);
wire step_en = !hold_pipeline;
The pipeline steps forward unless an output is pending and unread: it advances when out_valid=0 (nothing to read) or when out_ready=1 (the previous output is being read). Otherwise, the entire pipeline is frozen. The step_en signal then drives all internal register enables.
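To make the freeze condition concrete, here is a small Python mirror of the two wires. It is a software sketch, not the Verilog itself, and the function name `control` is made up for illustration. It enumerates all four combinations of out_valid and out_ready:

```python
from __future__ import annotations


def control(out_valid: bool, out_ready: bool) -> tuple[bool, bool]:
    """Mirror of the two Verilog wires: returns (hold_pipeline, step_en)."""
    hold_pipeline = out_valid and not out_ready
    step_en = not hold_pipeline
    return hold_pipeline, step_en


# Enumerate every input combination and print the resulting control signals.
for out_valid in (False, True):
    for out_ready in (False, True):
        hold, step = control(out_valid, out_ready)
        print(f"out_valid={int(out_valid)} out_ready={int(out_ready)} "
              f"-> hold_pipeline={int(hold)} step_en={int(step)}")
```

Only the combination out_valid=1, out_ready=0 yields hold_pipeline=1 and step_en=0; in the three other cases the pipeline steps.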
This logic ensures:
– No new input if previous output isn’t consumed.
– No risk of internal overflow, even without a FIFO.
– The pipeline remains frozen as long as downstream isn’t ready.
That’s exactly what’s needed for streaming scenarios controlled by a slow peripheral or manual inspection.
Signal Legend
| Signal | Direction | Role/Meaning |
|---|---|---|
| in_valid | Input | External → FPGA: “Input data is present on in_byte” |
| in_ready | Output | FPGA → External: “I can accept a new input now” |
| in_byte | Input | External → FPGA: Data to process (8 bits) |
| out_word | Output | FPGA → External: Output word produced (16 bits) |
| out_valid | Output | FPGA → External: “Output data is ready, read me!” |
| out_ready | Input | External → FPGA: “The output data has just been read/consumed” |
Typical Timing
| cycle | in_valid | in_ready | out_word | out_valid | out_ready | Description |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | -- | 0 | 1 | Idle state. The FPGA does nothing: no input pending, no output produced. in_ready=1: ready to accept. out_valid=0: nothing to read. The external side can inject a value at any time. |
| 1 | 1 | 1 | X1 | 1 | 0 | Input received. External presents value on in_byte with in_valid=1 (pipeline ready). The pipeline captures input, computes, produces X1, sets out_valid=1. out_ready=0: output not yet consumed. |
| 2 | 0 | 0 | X1 | 1 | 0 | Waiting for consumption. Input transfer done (in_valid=0), output not yet read (out_ready=0). Pipeline blocks: in_ready=0, out_valid=1, output is held stable. Registers frozen, no updates. |
| 3 | 0 | 0 | X1 | 1 | 1 | Output read. External acknowledges output by asserting out_ready=1 for one cycle. The pipeline sees the "green light": data is consumed. X1 has been read by the ESP32. On the next clock edge, the pipeline advances and becomes ready for new input. |
| 0 | 0 | 1 | X1 | 0 | 1 | Return to idle. After data consumption, pipeline returns to idle: in_ready=1, ready to accept a new word. Cycle repeats as above for the next input. |
| 1 | 1 | 1 | X2 | 1 | 0 | New input accepted. External detects in_ready=1 and injects a new value with in_valid=1. The pipeline captures this new input, computes X2, and outputs it. Sequence is analogous to step 1. |
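The sequence in the table can be replayed with a cycle-level Python model. This is a sketch under explicit assumptions, not the project's RTL: the pipeline is collapsed into a single output register, `compute()` is a placeholder for the real 8-bit to 16-bit computation, in_ready is assumed to equal !out_valid, and out_ready is driven as a one-cycle pulse, as the signal legend describes. `OutputStage`, `run()` and the stimulus values are hypothetical.

```python
from dataclasses import dataclass


def compute(in_byte: int) -> int:
    """Hypothetical 8-bit -> 16-bit computation standing in for the real pipeline."""
    return (in_byte << 8) | in_byte


@dataclass
class OutputStage:
    """Cycle-level model of the output register guarded by hold_pipeline / step_en."""
    out_word: int = 0
    out_valid: bool = False

    def in_ready(self) -> bool:
        # Assumption: a new input is accepted only while no output is pending.
        return not self.out_valid

    def clock(self, in_valid: bool, in_byte: int, out_ready: bool) -> None:
        hold_pipeline = self.out_valid and not out_ready
        if hold_pipeline:  # step_en = 0: every register keeps its value
            return
        in_fire = in_valid and self.in_ready()
        if in_fire:
            self.out_word = compute(in_byte)
        # Flush branch when in_fire = 0: the validity flag clears, the data is simply held.
        self.out_valid = in_fire


def run(stimulus):
    """Apply one (in_valid, in_byte, out_ready) tuple per clock edge and record
    (in_ready, out_word, out_valid) as seen just before each edge."""
    stage = OutputStage()
    trace = []
    for in_valid, in_byte, out_ready in stimulus:
        trace.append((stage.in_ready(), stage.out_word, stage.out_valid))
        stage.clock(in_valid, in_byte, out_ready)
    return trace


stimulus = [
    (False, 0x00, False),  # idle
    (True, 0x12, False),   # input presented: captured on this edge, X1 = 0x1212
    (False, 0x00, False),  # waiting for consumption: frozen
    (True, 0x99, False),   # premature input: ignored because in_ready = 0
    (False, 0x00, True),   # out_ready pulse: X1 consumed, out_valid clears
    (True, 0x34, False),   # next input accepted: X2 = 0x3434
    (False, 0x00, False),  # X2 waiting
]
for cycle, (in_ready, out_word, out_valid) in enumerate(run(stimulus)):
    print(f"{cycle}: in_ready={int(in_ready)} out_word=0x{out_word:04X} out_valid={int(out_valid)}")
```

The trace goes through the same phases as the table: idle, capture, a frozen wait during which a premature in_valid is ignored, consumption on the out_ready pulse, then the next sample.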
Key Principles
- Computation (internal register update) happens only on a clock edge where step_en=1, i.e. step_en = !hold_pipeline = !(out_valid && !out_ready).
- If the output is not consumed (out_ready=0 while out_valid=1), the entire pipeline is frozen: no new input, no state evolution, everything held stable.
- in_ready=1 only when the pipeline can accept a new input (never while the previous output remains unconsumed).
- The external system dictates the pace: as long as out_ready isn't asserted, the pipeline stays frozen, so there is no risk of data loss or overflow.
hold_pipeline and step_en in each state
In the idle state, the pipeline keeps stepping on every clock edge, because step_en = 1 as long as out_valid = 0.
But in_fire = in_valid && in_ready stays at 0, since the MCU has not yet asserted in_valid to signal that an input sample is available.
Result: the pipeline advances, but always falls into the flush branch of the registers (all validity flags reset to 0, no useful data propagates).
In the input-received state, we detect the first clock tick where in_fire = 1, meaning the input sample is actually being consumed.
On that clock edge:
- the pipeline captures the sample,
- computes the response,
- places X1 on out_word,
- forces out_valid = 1 (data ready),
- while out_ready = 0 (the MCU has not read it yet).
The combinational logic then drives hold_pipeline high.
On the next clock edge, since step_en = !hold_pipeline = 0, the entire pipeline is frozen: no internal register advances as long as the output has not been consumed.
The MCU may take as many cycles as needed to read the output and indicate it by asserting out_ready = 1.
At that moment, hold_pipeline drops to 0, step_en returns to 1, and we fall back to the idle state:
- in_ready = 1 (ready for the next sample),
- waiting for the MCU to assert in_valid = 1 to indicate new input data.
Then the cycle repeats, sample after sample.
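From the MCU's point of view, one iteration of this cycle is a write phase followed by a read phase. The sketch below models that master loop in Python against a minimal stand-in for the FPGA. `Device`, `master()` and `read_delay` are hypothetical names; the device is a single register with in_ready assumed equal to !out_valid, and its computation is a placeholder. It is not ESP32 firmware, only an illustration of the handshake order.

```python
from __future__ import annotations


class Device:
    """Hypothetical single-register stand-in for the FPGA (an assumption, not the author's RTL)."""

    def __init__(self) -> None:
        self.out_word = 0
        self.out_valid = False

    def in_ready(self) -> bool:
        # Assumption: in_ready is high only while no output is pending.
        return not self.out_valid

    def clock(self, in_valid: bool, in_byte: int, out_ready: bool) -> None:
        if self.out_valid and not out_ready:  # hold_pipeline = 1, so step_en = 0
            return
        in_fire = in_valid and self.in_ready()
        if in_fire:
            self.out_word = (in_byte << 8) | in_byte  # hypothetical 8 -> 16 bit computation
        self.out_valid = in_fire


def master(device: Device, samples: list[int], read_delay: int) -> list[int]:
    """MCU-side loop: write one sample, read its result, then repeat."""
    results = []
    for sample in samples:
        # Write phase: wait for in_ready, then present the byte with in_valid=1 for one cycle.
        while not device.in_ready():
            device.clock(False, 0, False)
        device.clock(True, sample, False)
        # Read phase: wait for out_valid, take our time, then pulse out_ready for one cycle.
        while not device.out_valid:
            device.clock(False, 0, False)
        for _ in range(read_delay):  # slow reader: the device stays frozen meanwhile
            device.clock(False, 0, False)
        results.append(device.out_word)
        device.clock(False, 0, True)
    return results


print([hex(w) for w in master(Device(), [0x01, 0x7F, 0xFF], read_delay=5)])
```

Even with a slow reader (`read_delay` cycles between seeing out_valid and pulsing out_ready), each sample comes back exactly once and in order, because the device stays frozen while it waits.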
Debug and Visualization
With the 16-bit parallel output, I can easily connect a logic analyzer and verify:
- Each out_valid is matched by a subsequent out_ready
- No double writes
- No missing data
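Once the analyzer capture is exported, these three checks can be automated. The Python sketch below assumes the capture has already been reduced to one `(out_valid, out_ready, out_word)` tuple per clock cycle; that format, the function name `check_trace` and the example samples are assumptions for illustration, not the output of any particular analyzer software. A handshake is counted when out_valid and out_ready are both high on the same cycle.

```python
def check_trace(samples):
    """Return the words consumed in order; raise ValueError on a protocol violation."""
    consumed = []
    pending = None  # word shown with out_valid=1 and not yet acknowledged
    for cycle, (out_valid, out_ready, out_word) in enumerate(samples):
        if pending is not None and not out_valid:
            # out_valid fell without a handshake: that word was never read.
            raise ValueError(f"cycle {cycle}: out_valid dropped before out_ready (missing data)")
        if out_valid:
            if pending is not None and out_word != pending:
                # The output changed under an unread word: a double write.
                raise ValueError(f"cycle {cycle}: out_word changed while unconsumed (double write)")
            if out_ready:
                consumed.append(out_word)  # handshake completes on this cycle
                pending = None
            else:
                pending = out_word
    return consumed


# Hand-written example following the "Typical Timing" table (not a real capture).
capture = [
    (0, 1, 0x0000),  # idle
    (1, 0, 0x1212),  # X1 presented
    (1, 0, 0x1212),  # waiting, output held stable
    (1, 1, 0x1212),  # X1 read
    (0, 1, 0x1212),  # back to idle
    (1, 0, 0x3434),  # X2 presented
    (1, 1, 0x3434),  # X2 read
]
print([hex(w) for w in check_trace(capture)])
```

A word still pending at the very end of the capture is not flagged, since the acquisition may simply have stopped before the read.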
Why This Approach?
This architecture is the simplest way to do it: no RAM, no FIFO. It offers robust, easy-to-follow control, which makes it ideal for early prototyping and external capture. While it's not suitable for high-throughput deployment (where a FIFO would be required), this method guarantees that no word is lost or overwritten in manual or low-rate streaming scenarios. The ESP32 acts as the master: it alternates read and write phases, always transferring a single sample at a time.
Bertrand Selva