Pushing the RP2350 to the Limit: 5-Voice Polyphony with a Non-linear TPT Ladder Filter

I just hit a performance milestone on Darśana. On a Raspberry Pi Pico 2 (RP2350 @ 204 MHz), using Core 0 only, a 5-voice polyphonic engine now runs with 3 oscillators, a noise source, and a TPT ladder filter per voice. That is 15 oscillators, 5 noise sources, and 5 TPT ladder filters in total, with peak CPU usage under 90% and no dropouts.

I chose TPT (topology-preserving transform) for a simple reason: I wanted a musical, high-quality ladder filter at the heart of the synth. A ladder filter is where a synth’s character often lives; that smooth “grip” when you move the cutoff is a big part of what makes it feel like an instrument. The trade-off is that TPT (especially ZDF-style structures) can be computationally demanding compared to lightweight traditional IIR designs. So I drew on the published literature as much as I could, together with general real-time DSP optimization techniques, to keep the sound quality while fitting within a realistic CPU budget.

TPT + ZDF: aiming for “no one-sample delay” behavior

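The filter uses a ZDF (Zero-Delay Feedback) TPT structure with trapezoidal integration: the feedback loop is solved within the current sample instead of being fed by the previous sample's output. As a result, it stays stable and “analog-like” even under aggressive cutoff modulation, without the thinness you often get from naïve digital structures.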

Key speedups

To reduce real-time cost, I applied these optimizations:

Non-iterative PWL solver (no Newton iterations): Instead of heavy iterative solvers, I use a piecewise-linear (PWL) approach that resolves the saturation characteristic (hard-clipper style) in a single algebraic step.
Division-free inner loop: Per-sample division is eliminated. Reciprocals like inv_denom are computed at the control rate, so the audio loop is dominated by multiplications.
Efficient 2× oversampling with IIR: To mitigate aliasing from nonlinearities, I use 2× oversampling, but with a 4th-order (2-biquad) IIR instead of FIR, achieving strong attenuation with far fewer cycles.
Full unrolling + register-friendly state: The 4-stage ladder is unrolled and internal states are kept as locals to maximize time in registers, minimizing memory traffic and loop overhead.
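
The sketch below shows how three of these ideas can fit together in one routine: a single-step PWL solve, a precomputed reciprocal, and an unrolled 4-stage chain with states held in locals. It rests on assumptions the text above does not confirm: the hard clipper is placed on the feedback sum u = clip(x - k * y4), the clip range is ±1, and every name except inv_denom (ladder_t, ladder_set, ladder_process, and the fields) is hypothetical. Treat it as one possible arrangement, not the actual implementation.

```c
/* 4-stage ZDF ladder with a hard clipper on the feedback sum.
   Hypothetical names and clipper placement; only inv_denom comes
   from the write-up. */
typedef struct {
    float G;               /* per-stage gain g / (1 + g) */
    float k;               /* feedback (resonance) amount, k >= 0 */
    float inv_denom;       /* 1 / (1 + k * G^4), control-rate reciprocal */
    float s1, s2, s3, s4;  /* integrator states */
} ladder_t;

/* Control rate: the only divisions live here. */
static void ladder_set(ladder_t *f, float g, float k)
{
    float G = g / (1.0f + g);
    float G2 = G * G;
    f->G = G;
    f->k = k;
    f->inv_denom = 1.0f / (1.0f + k * G2 * G2);
}

/* Audio rate: multiplies, adds, and two compares. */
static float ladder_process(ladder_t *f, float x)
{
    const float G = f->G;
    const float H = 1.0f - G;
    float s1 = f->s1, s2 = f->s2, s3 = f->s3, s4 = f->s4;

    /* With y_i = G * x_i + (1 - G) * s_i per stage, the last output is
       y4 = G^4 * u + S, where S depends only on the states. */
    float S = H * (s4 + G * (s3 + G * (s2 + G * s1)));

    /* Solve u = clip(x - k * y4) in one step. The linear-region solution
       is (x - k * S) * inv_denom; because the right-hand side never
       increases with u, clamping that value gives the exact solution in
       the saturated regions too, so no iteration is needed. */
    float u = (x - f->k * S) * f->inv_denom;
    if (u > 1.0f) u = 1.0f;
    else if (u < -1.0f) u = -1.0f;

    /* Four one-pole stages, unrolled, states kept in locals. */
    float v, y1, y2, y3, y4;
    v = (u  - s1) * G; y1 = v + s1; s1 = y1 + v;
    v = (y1 - s2) * G; y2 = v + s2; s2 = y2 + v;
    v = (y2 - s3) * G; y3 = v + s3; s3 = y3 + v;
    v = (y3 - s4) * G; y4 = v + s4; s4 = y4 + v;

    f->s1 = s1; f->s2 = s2; f->s3 = s3; f->s4 = s4;
    return y4;
}
```

In this arrangement ladder_set would be called whenever cutoff or resonance changes, and ladder_process once per (oversampled) sample. A smoother saturation curve with more PWL segments would need a segment search instead of a single clamp.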

Hardware-aware optimization: running critical DSP from SRAM to kill jitter

Executing from flash via XIP can introduce unpredictable latency (cache misses → jitter). Audio needs deterministic timing, so:

I mark time-critical routines (e.g., filter_process, the PWL solver) with `__not_in_flash_func` and run them from SRAM.
This improves determinism under load and helps prevent dropouts even when many peripherals are active.
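
For readers who have not used the macro, this is roughly how it is applied with the Pico SDK. The function name filter_process comes from the list above, but its signature and the placeholder one-pole body are assumptions for illustration. The fallback define only exists so the file also compiles on a host machine without the SDK.

```c
#include <stddef.h>

/* Pull in the Pico SDK if it is available; otherwise fall back to a
   no-op macro so the same file builds on a host for testing. */
#if defined(__has_include)
#  if __has_include("pico.h")
#    include "pico.h"
#  endif
#endif
#ifndef __not_in_flash_func
#  define __not_in_flash_func(name) name
#endif

static float lp_state;  /* placeholder filter state, lives in RAM */

/* On the Pico, the macro wraps the function name and places the code
   in a RAM section instead of flash. The body is a stand-in smoother,
   not the real filter. */
void __not_in_flash_func(filter_process)(float *buf, size_t n)
{
    float s = lp_state;
    for (size_t i = 0; i < n; ++i) {
        s += 0.25f * (buf[i] - s);
        buf[i] = s;
    }
    lp_state = s;
}
```

Any helper called from such a routine needs the same treatment (or must be inlined), otherwise the call goes back through XIP and the timing benefit is lost.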

It's an analog blueprint running at digital speed.
