Low-pass filter design

Why Compile-Time Fixed Coefficients and "mul-by-const mod m" LUTs?

My objective is to design the FIR without multipliers, using few wires, only shifts + additions/subtractions and ROM. Just as in the first part of the program, with the binary/RNS conversion.

In RNS, the multiplication
(x × h) mod m
with h a fixed coefficient (filter tap) becomes a simple ROM lookup:


LUT_m[r] = (c * r) mod m
with c = h_int mod m,   r in [0, m-1]

The first case investigated was with free coefficients (no fixed coefficient).
With moduli:
{7, 11, 13, 15, 17, 19, 23, 29, 31},
and max size m=31, encoding requires 5 bits.

All possible multiplications require:


Memory = N × Σ m² × 5 bits
      = N × (7² + 11² + 13² + 15² + 17² + 19² + 23² + 29² + 31²) × 5 bits

Numerically:
Σ m² = 3545

N = 128 ⇒ 2,268,800 bits ≈ 246 M9K (Cyclone IV)
N = 256 ⇒ 4,537,600 bits ≈ 493 M9K

So it's way too heavy to be efficient and in any case completely exceeds the resources of a Cyclone IV.
Moreover, having free filter taps would mean a much more complicated design, with a loader to load the coefficients into the filter. I'm not a fan of this idea, especially since you'd have to load the program into the FPGA before every execution.And it's a try, a prototype. Not a real product...

The Alternative: Compile-Time Fixed Filter Coefficients

We store only the actually needed products because the coefficients are fixed. Then:


Memory = N × Σ m × 5 bits

Numerically:
Σ m = 165

N = 128 ⇒ 105,600 bits ≈ 11.5 M9K
N = 256 ⇒ 211,200 bits ≈ 22.9 M9K

And now, this is entirely manageable and consistent with the "multiply without multiplying" approach.
This is the option that was chosen.

Filter Design: Windowed-Sinc FIR, Symmetry, Quantization and Scaling

The FIR filter implemented here is a windowed-sinc lowpass of length N = 128, using a Hamming window for side-lobe suppression. This is the classic textbook approach for FIR design when you want maximum phase linearity (symetry) and predictable cutoff.

We start from the ideal lowpass impulse response (the sinc), center it around n_c = (N-1)/2, and apply a symmetric window.

h_ideal[n] = 2·fc_norm · sinc(2·fc_norm·(n - n_c)) 
h[n] = h_ideal[n] · w[n] 
h[n] = h[n] / sum(h[n]) // DC gain normalization

With N even, symmetry is perfect: h[n] = h[N-1-n]. This ensures a strictly linear phase and a constant group delay of (N-1)/2 samples.

Quantization and scaling: The entire filter is scaled by a quantization factor Q (here, Q = 1024) and all taps are rounded to integer values:

 h_int[n] = round(Q · h[n])

Every result is just "off by a factor Q". This scale is recovered by dividing by Q at the output if needed.

The following figure show the filter elements.

MATLAB/OCTAVE script to generate filter coefficients


clear; clc;

%% ---------- Paramètres FIR ----------
N          = 128;        % nb de taps (pair recommandé pour Hamming)
fc_norm    = 0.10;       % fréquence de coupure normalisée (jusqu'à 10 % de f_e)
window_fun = @hamming;   % @hamming, @hann, ...
Q          = 1024;       % facteur de quantification (puissance de 2)

%% ---------- Moduli ----------
moduli = [7 11 13 15 17 19 23 29 31];

%% ---------- FIR (fenêtre de Hamming + sinc) ----------
n       = 0:N-1;
nc      = (N-1)/2;
x       = n - nc;
h_ideal = zeros(1, N);

for i = 1:N
    h_ideal(i) = (x(i) == 0) * (2 * fc_norm) + ...
                 (x(i) ~= 0) * (sin(2 * pi * fc_norm * x(i)) / (pi * x(i)));
end

w     = window_fun(N).';
h     = h_ideal .* w;
h     = h / sum(h);           % normalisation DC
h_int = round(Q * h);         % quantification entière

fprintf('--- FIR report ---\n');
fprintf('N=%d, fc_norm=%.3f, window=%s, Q=%d\n', ...
        N, fc_norm, func2str(window_fun), Q);
fprintf('sum(h)=%.12f ; sum(h_int)=%d ~ Q=%d\n', ...
        sum(h), sum(h_int), Q);

sum_m2 = sum(moduli .^ 2);
sum_m  = sum(moduli);
fprintf('Σm^2=%d ; Σm=%d\n', sum_m2, sum_m);
fprintf('General: N*Σm^2*5 = %d bits\n', N * sum_m2 * 5);
fprintf('Fixed :  N*Σm *5  = %d bits\n', N * sum_m  * 5);

% Tracé (réel vs quantifié)
figure;
plot(n, h, 'LineWidth', 1.25); hold on;
stairs(n, double(h_int)/Q, 'LineWidth', 1.0);
yline(0, ':');
grid on;
xlabel('Indice n');
ylabel('Amplitude');
title(sprintf('FIR - HAMMING, N=%d, f_c = %.2f f_s', N, fc_norm));
legend({'h (réel)', 'h_{int}/Q (quantifié)', 'zero'}, 'Location', 'best');

MATLAB/OCTAVE script to generate multiplication functions for all filter taps and moduli

This MATLAB/OCTAVE script automatically generates, for each filter tap and each RNS modulus, a Verilog function implementing modular multiplication:
each function multi<modulus>_h<tap> encodes the multiplication table (h_int[tap] * r) % modulus as a LUT (lookup table).
Inputs and outputs are always coded on 5 bits [4:0].
A comment indicates the integer value of the filter coefficient used.

The resulting file, lut_verilog.v, is included with the project files.


close all
clear all
clc

% Paramètres
N      = 128; % nombre de taps
moduli = [7 11 13 15 17 19 23 29 31];

Q          = 1024;      % facteur de quantification
fc_norm    = 0.10;      % fréquence de coupure normalisée
window_fun = @hamming;  % type de fenêtre

% --- FIR (fenêtré Hamming + sinc) ---
n = 0:N-1; nc = (N-1)/2;
x = n - nc;
h_ideal = zeros(1,N);
for i = 1:N
    h_ideal(i) = (x(i)==0) * (2*fc_norm) + (x(i)~=0) * (sin(2*pi*fc_norm*x(i)) / (pi*x(i)));
end
w = window_fun(N).';
h = (h_ideal .* w); h = h / sum(h);
h_int = round(Q * h);

outfile = 'lut_verilog.v';
fid = fopen(outfile, 'w');

for k = 1:N
    for mi = 1:length(moduli)
        m = moduli(mi);
        c = mod(h_int(k), m);
        fname = sprintf('multi%d_h%d', m, k-1); % tap indices start at 0

        % --- commentaire avec valeur du coefficient ---
        fprintf(fid, '/// %s: h_int[%d] = %d\n', fname, k-1, h_int(k));
        fprintf(fid, 'function [4:0] %s;\n', fname);
        fprintf(fid, '  input [4:0] r;\n');
        fprintf(fid, '  begin\n');
        fprintf(fid, '    case (r)\n');
        for r = 0:m-1
            v = mod(c*r, m);
            fprintf(fid, '      5''d%d : %s = 5''d%d;\n', r, fname, v);
        end
        fprintf(fid, '      default: %s = 5''d0;\n', fname);
        fprintf(fid, '    endcase\n');
        fprintf(fid, '  end\n');
        fprintf(fid, 'endfunction\n\n');
    end
end

fclose(fid);
disp(['Fichier généré : ', outfile]);

Some example functions:

/// multi7_h0: h_int[0] = 0
function [4:0] multi7_h0;
input [4:0] r;
begin
case (r)
5'd0 : multi7_h0 = 5'd0;
5'd1 : multi7_h0 = 5'd0;
5'd2 : multi7_h0 = 5'd0;
5'd3 : multi7_h0 = 5'd0;
5'd4 : multi7_h0 = 5'd0;
5'd5 : multi7_h0 = 5'd0;
5'd6 : multi7_h0 = 5'd0;
default: multi7_h0 = 5'd0;
endcase
end
endfunction

/// multi11_h0: h_int[0] = 0
function [4:0] multi11_h0;
input [4:0] r;
begin
case (r)
5'd0 : multi11_h0 = 5'd0;
5'd1 : multi11_h0 = 5'd0;
5'd2 : multi11_h0 = 5'd0;
5'd3 : multi11_h0 = 5'd0;
5'd4 : multi11_h0 = 5'd0;
5'd5 : multi11_h0 = 5'd0;
5'd6 : multi11_h0 = 5'd0;
5'd7 : multi11_h0 = 5'd0;
5'd8 : multi11_h0 = 5'd0;
5'd9 : multi11_h0 = 5'd0;
5'd10 : multi11_h0 = 5'd0;
default: multi11_h0 = 5'd0;
endcase
end
endfunction

The presence of zeros here is normal: the first filter coefficients are zero. All these functions are written in lut_verilog.v

Addition modulo m

For the FIR sum in RNS, we need to add two residues, each in the range [0, m-1].
If the sum exceeds the modulus, it can only be in [m, 2m-2], so a single subtraction by m is always sufficient to bring the result back into the correct range.

The following Verilog function implements this modular addition for a given modulus :

// Modular addition function for modulus m
function [4:0] add_mod_m;
  input [4:0] a, b;
  input [4:0] m; // the modulus
  reg [5:0] sum; // one extra bit to detect overflow
  begin
    sum = a + b;
    if (sum >= m)
      add_mod_m = sum - m;
    else
      add_mod_m = sum;
  end
endfunction

The result is always in [0, m-1].

In practice, I wonder if it’s not preferable to implement a dedicated modular addition function for each modulus, with the modulus value hardwired in the logic.

This avoids having to pass the modulus value as an input and allows the compiler to generate a fixed (hardwired) comparison and subtraction.

I will therefore implement 9 modular addition functions using this principle in the final program (see next log).

Next step

In the upcoming log, I’ll upload the fully functional program.

This intermediate stage was necessary to clearly formalize the generation of modular addition and multiplication functions and give the MATLAB/OCTAVE script in order to generate the functions for random moduli.

In the end, the original requirements are met: everything runs with LUTs, modular addition is always 5 bits max, just bit shifts, no wide adders, and absolutely no multipliers anywhere.

Why Compile-Time Fixed Coefficients and "mul-by-const mod m" LUTs?

The Alternative: Compile-Time Fixed Filter Coefficients

Filter Design: Windowed-Sinc FIR, Symmetry, Quantization and Scaling

MATLAB/OCTAVE script to generate filter coefficients

MATLAB/OCTAVE script to generate multiplication functions for all filter taps and moduli

Addition modulo m

Next step

Received the 16-channel logic analyzer

First Functional RNS FIR

Discussions

Become a Hackaday.io Member