
BACKPRESSURE

A project log for Winter, FPGAs, and Forgotten Arithmetic

RNS on FPGA: Revisiting an Unusual Number System for Modern Signal Processing

Bertrand Selva (bertrand-selva), 11/16/2025 at 14:31

Implementing Backpressure in a No-RAM Pipeline

In the latest version of the project (see attached source), I've introduced a simple backpressure mechanism to ensure that each produced data word is truly consumed before a new one is accepted. Here’s how it works:

The design is 100% synchronous. There is no external RAM, no FIFO. The objectives are twofold:
– Never overwrite an output that hasn’t yet been read.
– Never accept a new input until the previous output is consumed.
For debugging and visualization, the 16 output bits will be connected to a parallel bus, monitored by the logic analyzer.

Validation Protocol

The pipeline uses two standard handshake signals:
out_valid: signals that output data is ready.
out_ready: external pulse signaling the data has been consumed.
Control logic is built around:
wire hold_pipeline = (out_valid && !out_ready);
wire step_en = !hold_pipeline;
The pipeline steps forward when no output is pending (out_valid=0) or when the pending output is being read (out_ready=1). Otherwise, every stage of the pipeline is frozen. The step_en signal drives all internal enables.
This logic ensures:
– No new input if previous output isn’t consumed.
– No risk of internal overflow, even with zero FIFO.
– The pipeline remains frozen as long as downstream isn’t ready.
That’s exactly what’s needed for streaming scenarios controlled by a slow peripheral or manual inspection.
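The control logic above can be sketched as a single-stage module. This is a minimal illustration rather than the project's actual source: the module name backpressure_stage, the clk and rst ports, the rule in_ready = !out_valid (inferred from the timing table), and the zero-extension standing in for the real computation are all assumptions.

```verilog
// Hypothetical single-stage sketch of the hold_pipeline / step_en scheme.
// Assumptions: module name, clk/rst ports, in_ready = !out_valid,
// and zero-extension as a placeholder for the real computation.
module backpressure_stage (
    input  wire        clk,
    input  wire        rst,        // synchronous, active-high (assumed)
    input  wire        in_valid,
    output wire        in_ready,
    input  wire [7:0]  in_byte,
    output reg  [15:0] out_word,
    output reg         out_valid,
    input  wire        out_ready
);
    // Freeze while an output is pending and has not been read.
    wire hold_pipeline = (out_valid && !out_ready);
    wire step_en       = !hold_pipeline;

    // Assumed: a new input is accepted only when no output is pending.
    assign in_ready = !out_valid;
    wire in_fire = in_valid && in_ready;

    always @(posedge clk) begin
        if (rst) begin
            out_word  <= 16'd0;
            out_valid <= 1'b0;
        end else if (step_en) begin
            if (in_fire) begin
                // Placeholder for the real processing of in_byte.
                out_word  <= {8'd0, in_byte};
                out_valid <= 1'b1;
            end else begin
                // Flush branch: clear the flag, out_word keeps its last value.
                out_valid <= 1'b0;
            end
        end
        // step_en == 0: no assignment, so both registers hold their values.
    end
endmodule
```

With this sketch, out_word stays stable for as long as out_valid=1 and out_ready=0, and in_ready only returns to 1 on the cycle after the output has been acknowledged.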

Signal Legend

| Signal | Direction | Role/Meaning |
| --- | --- | --- |
| in_valid | Input | External → FPGA: “Input data is present on in_byte” |
| in_ready | Output | FPGA → External: “I can accept a new input now” |
| in_byte | Input | External → FPGA: Data to process (8 bits) |
| out_word | Output | FPGA → External: Output word produced (16 bits) |
| out_valid | Output | FPGA → External: “Output data is ready, read me!” |
| out_ready | Input | External → FPGA: “The output data has just been read/consumed” |

Typical Timing

| cycle | in_valid | in_ready | out_word | out_valid | out_ready | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 1 | -- | 0 | 1 | Idle state. The FPGA does nothing: no input pending, no output produced. in_ready=1: ready to accept. out_valid=0: nothing to read. The outside can inject a value at any time. |
| 1 | 1 | 1 | X1 | 1 | 0 | Input received. External presents a value on in_byte with in_valid=1 (pipeline ready). The pipeline captures the input, computes, produces X1, and sets out_valid=1. out_ready=0: output not yet consumed. |
| 2 | 0 | 0 | X1 | 1 | 0 | Waiting for consumption. Input transfer done (in_valid=0), output not yet read (out_ready=0). The pipeline blocks: in_ready=0, out_valid=1, output is held stable. Registers frozen, no updates. |
| 3 | 0 | 0 | X1 | 1 | 1 | Output read. External acknowledges the output by asserting out_ready=1 for one cycle. The pipeline sees the "green light": the data is consumed. X1 is read by the ESP32. On the next clock, the pipeline will advance and be ready for a new input. |
| 0 | 0 | 1 | X1 | 0 | 1 | Return to idle. After data consumption, the pipeline returns to idle: in_ready=1, ready to accept a new word. The cycle repeats as above for the next input. |
| 1 | 1 | 1 | X2 | 1 | 0 | New input accepted. External detects in_ready=1 and injects a new value with in_valid=1. The pipeline captures this new input, computes X2, and outputs it. The sequence is analogous to step 1. |

Key Principles

hold_pipeline and step_en in each state

In the idle state, the pipeline keeps stepping on every clock edge, because step_en = 1 as long as out_valid = 0.
But in_fire = in_valid && in_ready stays at 0, since the MCU has not yet asserted in_valid to signal that an input sample is available.
Result: the pipeline advances, but always falls into the flush branch of the registers (all validity flags reset to 0, no useful data propagates).
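If the pipeline has several register stages, one common way to get this idle behavior is a chain of validity flags that shifts on every enabled clock edge. The sketch below is an assumption about structure, not the project's code: the module name valid_chain, the STAGES parameter, and the clk and rst ports are hypothetical, and only the flags are modeled (the data registers would use the same step_en enable).

```verilog
// Hypothetical validity-flag chain with a global stall.
// Assumptions: module name, STAGES parameter, clk/rst ports.
module valid_chain #(
    parameter STAGES = 3           // illustrative depth, must be >= 2
) (
    input  wire clk,
    input  wire rst,               // synchronous, active-high (assumed)
    input  wire in_fire,           // in_valid && in_ready, computed outside
    input  wire out_ready,
    output wire out_valid,
    output wire step_en
);
    reg [STAGES-1:0] valid;        // one flag per register stage

    // The last flag marks a pending output word.
    assign out_valid = valid[STAGES-1];

    wire hold_pipeline = (out_valid && !out_ready);
    assign step_en     = !hold_pipeline;

    always @(posedge clk) begin
        if (rst)
            valid <= {STAGES{1'b0}};
        else if (step_en)
            // Idle: in_fire = 0, so zeros shift in and no valid data propagates.
            valid <= {valid[STAGES-2:0], in_fire};
        // step_en == 0: all flags hold, the whole chain is frozen.
    end
endmodule
```

In idle the chain keeps shifting, but it only ever shifts in zeros, which matches the description above: the pipeline advances without propagating any useful data.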

In the input-received state, we detect the first clock tick where in_fire = 1, meaning the input sample is actually being consumed.
On that clock edge:


The combinational logic then drives hold_pipeline high.
On the next clock edge, since step_en = !hold_pipeline = 0, the entire pipeline is frozen: no internal register advances as long as the output has not been consumed.

The MCU may take as many cycles as needed to read the output and indicate it by asserting out_ready = 1.
At that moment, hold_pipeline drops to 0, step_en returns to 1, and we fall back to the idle state:

Then the cycle repeats, sample after sample.

Debug and Visualization

With the 16-bit parallel output, I can easily connect a logic analyzer and verify:

Why This Approach?

This architecture is the simplest way to get there: no RAM, no FIFO. It offers simple, robust control, which makes it well suited to early prototyping and external capture. It is not suitable for high-throughput deployment (where a FIFO would be required), but in manual or low-rate streaming scenarios it guarantees that no output is overwritten before it has been read. The ESP32 acts as the master: it alternates read and write phases, always transferring a single sample at a time.
