BubbleSynth: Bubbles, Lasers, and a Modular Synth

Origin

Back in 2014, while at MIT Media Lab, I built BubbleSynth, an interactive installation where floating soap bubbles became voices in a generative composition. The original ran on a Mac Mini with openFrameworks doing computer vision and SuperCollider handling synthesis, communicating over OSC. Bubbles emerged from a machine, drifted past stage lights against a black backdrop, and a webcam picked them out as bright "blobs." Each bubble's position, size, and lifespan modulated a harmonic sine wave, and its disappearance triggered samples, including a few from David Lynch's Dune (1984) for the sci-fi atmosphere I was after at the time. "The spice must flow."

The 2014 paper's future-work section ended with a speculation: that one could eventually use a depth camera to eliminate the black-backdrop requirement, but it was unclear whether a Kinect's IR could even see a thin-film bubble. A decade later, the RealSense D415 answers that — its depth stream picks up bubbles cleanly, segmenting them by distance from the camera rather than by visible-light contrast. The new installation runs in ambient light, against any background.

And the thing I'd wanted since I first picked up a galvo laser is now in place: a laser that physically traces each bubble's contour in mid-air, live.

What's New

Four big additions, plus a full software rebuild:

A galvo laser traces each bubble in mid-air, live. As bubbles drift through the tracking area, an RGB galvo projector outlines them in real time — bubbles become luminous, their contours drawn into the same physical space they're moving through.
Depth-based bubble segmentation. The RealSense D415's depth stream isolates bubbles by their Z-distance from the camera, not by brightness contrast. No black backdrop, no controlled lighting, no shinbusters. The system works in ambient room light, against any background, which greatly relaxes the installation requirements compared to 2014.
A modular synth (VCV Rack 2) voices them. Replacing SuperCollider with a node-based modular environment gives me visual patching, hot-swappable timbres, and a much richer voice palette. Two oscillator voices (Mutable's Plaits in 2-operator FM mode) sing through the bubble positions, with fast LFO vibrato giving them a bird-like warble. Underneath, an ambient drone holds the harmonic foundation in C Lydian.
An Arduino controls a relay + servo for autonomous operation. The bubble machine fires on demand from the host machine; a servo-mounted wand rises into a fan's airflow when the system wants to coax out a big bubble.

The underlying perception loop (camera → bubble detection → OSC → synthesis) is conceptually unchanged from 2014, but every component has been rebuilt with current tools, and depth segmentation removes the original's environmental constraints.

Hardware

Computer: Intel NUC11PAHi7 (i7-1165G7, Iris Xe graphics, 16GB RAM, Windows 11)
Camera: Intel RealSense D415, using the depth stream. Depth-based segmentation isolates bubbles by their distance from the camera. The IR projector is on; it casts a texture pattern that helps the stereo depth computation. No backdrop required; works under ambient light.
Laser: RGB galvo projector accepting ILDA-format point streams (~30k pps)
Bubble machine: Off-the-shelf party bubble machine, controlled by an Arduino + 5V relay board
Bubble wand: Servo-mounted wand (TD 8130MG metal-gear, 13 kg-cm torque) that dips into bubble solution and rises into a fan's airflow to produce the occasional very large bubble

What's not needed compared to the original 2014 build: no black backdrop, no upturned stage lights, no controlled lighting environment. The depth-based isolation pipeline tolerates whatever lighting and background the venue offers.

Software

TouchDesigner runs the perception and laser pipeline:
- Depth-threshold isolation — anything within the bubble field's distance range from the camera is treated as foreground
- Cleanup and binarization
- Trace SOP extracts contour polylines for the laser
- Blob Track TOP extracts centroids and bounding boxes for the synth
- OSC output sends bubble parameters to cvOSCcv in VCV Rack
- Laser CHOP drives the galvo with the traced contours
- Arduino control via Serial DAT
VCV Rack 2 runs the audio engine:
- Two Plaits oscillator voices, 2-op FM, with fast vibrato LFO into FM CV
- Pitch driven by horizontal bubble position; amplitude by vertical
- A scale-quantizer on the audio output (rather than the CV) — an accidental discovery that gives scale-flavored wave-shaping rather than true pitch correction. Sounds surprisingly good on FM voices.
- Sample triggers (water bloops, crystal mallet pings, water drum hits) on bubble-disappearance events, quantized to a 16th-note grid for musical timing
- Voxglitch Wav Bank for randomized sample playback
- Valley Plateau reverb for the spatial wash
- Output via VB-CABLE back into TouchDesigner for monitoring
Arduino runs a simple serial protocol: '1'/'0' for relay on/off, '2' for timed blast, '3' for cycle mode, 'U'/'D' for servo wand up/down
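The serial protocol is small enough to wrap in a few lines on the host side. The Python sketch below assumes single-byte ASCII commands exactly as listed above; the action names, the `coax_big_bubble` sequence and its dwell time, and the port settings in the closing comment are hypothetical, not taken from the installation (which sends these bytes from TouchDesigner's Serial DAT).

```python
"""Host-side helper for the Arduino serial protocol (a sketch).

Assumes single-byte ASCII commands as documented. Action names, the wand
sequence, and the port settings below are hypothetical.
"""
import time

# Byte for each documented command.
COMMANDS = {
    "relay_on": b"1",
    "relay_off": b"0",
    "timed_blast": b"2",
    "cycle_mode": b"3",
    "wand_up": b"U",
    "wand_down": b"D",
}


def encode(action: str) -> bytes:
    """Look up the command byte, rejecting unknown action names."""
    try:
        return COMMANDS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None


def send(port, action: str) -> int:
    """Write one command to any object with a pyserial-style write()."""
    return port.write(encode(action))


def coax_big_bubble(port, dwell_s: float = 2.0) -> list:
    """Hypothetical sequence for the servo wand."""
    send(port, "wand_down")  # dip the wand into the solution
    time.sleep(dwell_s)      # hypothetical soak time
    send(port, "wand_up")    # raise it into the fan's airflow
    return [encode("wand_down"), encode("wand_up")]


# Example with pyserial (port name and baud rate are assumptions):
#   import serial
#   with serial.Serial("COM3", 9600, timeout=1) as port:
#       send(port, "timed_blast")
```

Accepting any object with a `write()` method keeps the helper testable without hardware attached.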

How It Works

A bubble enters the tracking area. The RealSense D415 outputs a depth image where every pixel encodes its distance from the camera. Thresholding by distance — keeping only pixels within the bubble field's working range — isolates the bubbles regardless of what's behind them or what color the ambient lighting happens to be. The thresholded silhouette feeds two parallel pipelines:
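As a rough illustration of the isolation step, here is a NumPy sketch of depth thresholding followed by a simple connected-component pass that yields per-bubble centroids and bounding boxes, the information the Blob Track TOP supplies in the real pipeline. It assumes depth values in millimetres with 0 marking pixels that have no depth (the usual RealSense convention at the default depth scale). The working range, the `min_area` speckle filter, and the function names are illustrative choices, and the pure-Python flood fill is far slower than TouchDesigner's operators.

```python
from collections import deque

import numpy as np


def depth_mask(depth_mm, near_mm, far_mm):
    """Foreground = pixels with valid depth inside the bubble field's range."""
    d = np.asarray(depth_mm)
    return (d > 0) & (d >= near_mm) & (d <= far_mm)  # 0 = no depth data


def find_blobs(mask, min_area=20):
    """4-connected components of a binary mask, dropping specks below min_area."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                continue  # cleanup: treat tiny regions as depth noise
            ys, xs = np.array(pixels).T
            blobs.append({
                "centroid": (float(xs.mean()), float(ys.mean())),
                "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
                "area": len(pixels),
            })
    return blobs
```

Because the mask depends only on distance, a wall or audience member behind the field simply falls outside `[near_mm, far_mm]` and never becomes a blob.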

For the laser: a Trace SOP extracts the contour polyline. That polyline goes straight into TouchDesigner's Laser CHOP, which converts it into a stream of X/Y points at the galvo's native sample rate. The laser physically draws the bubble's outline in real time, in air. As the bubble drifts, the outline drifts with it.
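The Laser CHOP handles this conversion in the installation, but the underlying arithmetic is easy to sketch. At roughly 30k points per second, a laser redrawing at an assumed 30 Hz has about 30,000 / 30 = 1,000 points per frame, so each contour has to be resampled to a point budget and mapped into the galvo's coordinate space. The Python below assumes a single closed contour in pixel coordinates and the ILDA convention of signed 16-bit coordinates centred on the origin with +Y up; blanking moves between multiple bubbles are left out.

```python
import numpy as np


def points_per_frame(pps=30_000, refresh_hz=30):
    """Point budget per redraw, e.g. 30,000 pps / 30 Hz = 1,000 points."""
    return pps // refresh_hz


def resample_closed(poly, n):
    """Resample a closed polyline to n points spaced evenly by arc length.

    Assumes no repeated consecutive vertices (zero-length segments).
    """
    p = np.asarray(poly, dtype=float)
    loop = np.vstack([p, p[:1]])  # repeat the first vertex to close the contour
    seg = np.linalg.norm(np.diff(loop, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n, endpoint=False)
    return np.column_stack([np.interp(targets, cum, loop[:, 0]),
                            np.interp(targets, cum, loop[:, 1])])


def to_ilda(points, width, height):
    """Pixels (origin top-left, +y down) -> signed 16-bit coords (centre origin, +y up)."""
    pts = np.asarray(points, dtype=float)
    x = pts[:, 0] / (width - 1) * 2.0 - 1.0
    y = 1.0 - pts[:, 1] / (height - 1) * 2.0
    return np.round(np.column_stack([x, y]) * 32767).astype(np.int16)
```

Spacing points evenly by arc length keeps the beam's speed roughly constant around the outline, so no part of the bubble is drawn noticeably brighter than another.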

For the synth: a Blob Track TOP outputs the bubble's centroid. Those coordinates get packed into OSC messages and sent to cvOSCcv in VCV Rack. Horizontal position drives oscillator pitch; vertical position drives amplitude through a VCA. Two bubbles, two voices. When a bubble pops or floats out of frame, a clocked sample trigger fires (a water bloop, a mallet ping, or a percussion accent), held until the next 16th note so events land on the grid instead of at arbitrary, jittery moments.
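A minimal sketch of the synth-side mapping, assuming centroids normalised to 0..1 and VCV Rack's 1 V/octave pitch convention. The voltage ranges, the choice that higher bubbles play louder, the 120 BPM tempo, and the OSC addresses are all placeholders; match the addresses to whatever cvOSCcv is configured to receive. `next_sixteenth` shows why a live trigger waits for the next grid tick rather than snapping to the nearest one: an earlier tick has already passed.

```python
import math


def x_to_voct(x_norm, low_v=-1.0, octaves=3.0):
    """Horizontal position (0 = left edge, 1 = right) -> continuous 1 V/oct pitch CV."""
    x = min(max(x_norm, 0.0), 1.0)
    return low_v + x * octaves


def y_to_amp(y_norm, max_v=10.0):
    """Vertical position (0 = top, 1 = bottom) -> VCA CV; assumes higher = louder."""
    y = min(max(y_norm, 0.0), 1.0)
    return (1.0 - y) * max_v


def bubble_messages(voice, x_norm, y_norm):
    """(address, value) pairs for one voice; the addresses are placeholders."""
    return [(f"/bubble/{voice}/pitch", x_to_voct(x_norm)),
            (f"/bubble/{voice}/amp", y_to_amp(y_norm))]


def next_sixteenth(t_s, bpm=120.0):
    """Time of the first 16th-note tick at or after t_s (seconds)."""
    step = 60.0 / bpm / 4.0  # one 16th note = a quarter of a beat
    return math.ceil(t_s / step - 1e-9) * step  # epsilon keeps on-grid times on grid


# Sending with python-osc (host and port are assumptions):
#   from pythonosc.udp_client import SimpleUDPClient
#   client = SimpleUDPClient("127.0.0.1", 7001)
#   for address, value in bubble_messages(1, 0.4, 0.3):
#       client.send_message(address, value)
```

At 120 BPM a 16th note lasts 0.125 s, so a pop at 0.30 s would sound at 0.375 s.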

The whole loop closes in around 15 ms end-to-end, well below the perception threshold for cause-and-effect; only the sample triggers wait longer, by design, for the next 16th note.

In aggregate, the bubbles are making the music. The piece is real-time and generative — it could run for hours without ever playing the same composition twice, because no two clouds of bubbles drift the same way. The audience hears a contemplative, slow-drifting score whose every note traces back to a bubble somewhere in the field.

Modes (and a note on this installation's safety posture)

The original 2014 BubbleSynth offered three operating modes: Generative (autonomous, no visitor interaction), Direct Control (visitors physically pop bubbles to play the installation like an instrument), and Group (visitors collaborate to keep a large bubble alive in the field).

For this installation, we're running Generative Mode only. Visitors are kept outside the laser's active scanning volume — the galvo is bright enough to be visible tracing bubbles, which means it's bright enough to be a hazard at close range. Better to let the piece play itself and watch from outside the volume than to invite visitors into the beam path.

Within Generative Mode, the patch has two sonic personalities I switch between:

Whimsy Mode (default for this run): Bird-like FM voices warbling over a C Lydian drone, with water-bloop and crystal-mallet sample triggers on bubble events. Light, contemplative, magical.
Dune Mode (preserved from the original): Hijaz scale, Middle-Eastern percussion, occasional spoken samples from Dune. "The sleeper must awaken."
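The two personalities differ mainly in their pitch sets. Here is a Python sketch of both scales and a generic nearest-pitch quantizer, standing in for whichever quantizer module the patch actually uses. `HIJAZ` is a common 12-TET approximation of the maqam. Fed a CV, the quantizer corrects pitch; fed every audio sample, as `quantize_audio` does, it turns a smooth waveform into a staircase, which is the scale-flavored wave-shaping described in the VCV Rack notes.

```python
import math

# Scale degrees as semitone offsets from the root (12-TET).
C_LYDIAN = (0, 2, 4, 6, 7, 9, 11)  # C D E F# G A B
HIJAZ = (0, 1, 4, 5, 7, 8, 10)     # 12-TET approximation, e.g. D Eb F# G A Bb C


def quantize_voct(v, scale, root_v=0.0):
    """Snap a 1 V/oct voltage to the nearest pitch of `scale` (root at root_v)."""
    semis = (v - root_v) * 12.0
    octave = math.floor(semis / 12.0)
    # Check neighbouring octaves too, so 11.8 semitones rounds up to the next root.
    candidates = [o * 12 + s for o in (octave - 1, octave, octave + 1) for s in scale]
    best = min(candidates, key=lambda c: abs(c - semis))
    return root_v + best / 12.0


def quantize_audio(samples, scale):
    """Quantize every audio sample as if it were a pitch CV: a staircase waveshaper."""
    return [quantize_voct(s, scale) for s in samples]
```

A ±5 V sine spans ten octaves of "pitch", so quantizing it to a seven-note scale produces dozens of uneven steps per cycle, which is where the scale-dependent grit comes from.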

Calibration

The laser galvo and the camera have to see the same world. Physically, they're mounted on a shared bracket as close to coaxial as the hardware allows — minimizing parallax so software correction stays simple. In TouchDesigner, a four-corner pin transform maps camera-image coordinates to laser galvo coordinates. Project a square from the laser, mark its corners in the camera image, drag corner-pin handles until they match. Five minutes of setup once everything's positioned.
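A four-corner pin is a planar homography, so the four marked correspondences determine it exactly. The NumPy sketch below solves the standard eight-equation linear system (a direct linear transform with the bottom-right entry fixed to 1) and maps points through the result. The function names and the corner coordinates in the test are made up. The map is exact only for points on the calibration plane, which is one reason to keep the laser and camera as close to coaxial as possible.

```python
import numpy as np


def corner_pin(src, dst):
    """Homography H (3x3, H[2,2] = 1) mapping four src points onto four dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_pin(H, pts):
    """Map Nx2 points through H, dividing out the projective scale."""
    pts = np.asarray(pts, dtype=float)
    hom = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]
```

With the projected square's corners as marked in the camera image as `src` and the galvo's square corners as `dst`, `H` maps camera pixels into laser space; `np.linalg.inv(H)` maps the other way.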

What I'd Do Next

Velocity to timbre mapping — feed bubble velocity into Plaits' TIMBRE CV so fast-moving bubbles get brighter, still bubbles get duller. Reinforces the bubbles-as-musicians claim.
Inter-blob distance to harmonic interval — when two bubbles drift close, voices play similar pitches; far apart, they spread to wider intervals. The bubbles develop relationships.
Use the depth channel for true Z-position synthesis parameters — currently I'm depth-thresholding but throwing away the actual Z values once I have a 2D mask. Feeding bubble Z into the synth would give a third independent axis of control per bubble.
A second laser for true 3D outline projection — currently the laser traces bubbles in 2D from the camera's viewpoint. With two lasers offset by a known angle, the outline could be projected at the bubble's actual 3D position, visible from a wider range of audience viewing angles.
Bubble path trails — galvo-drawn motion histories trailing behind each bubble, drawing its entire flight as a fading luminous line.
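The first two ideas reduce to small mapping functions. This hypothetical Python sketch assumes normalised 0..1 frame coordinates, a 0 to 5 V TIMBRE range, and an illustrative interval ladder; none of these values come from the current patch.

```python
import math


def velocity_to_timbre(prev_xy, cur_xy, dt_s, v_max=0.5, max_cv=5.0):
    """Centroid speed (frame widths per second) -> TIMBRE CV; still bubbles give 0 V."""
    speed = math.dist(prev_xy, cur_xy) / dt_s
    return max_cv * min(speed / v_max, 1.0)


def distance_to_interval(a_xy, b_xy, ladder=(0, 2, 4, 7, 12, 19)):
    """Inter-bubble distance -> interval in semitones: close = narrow, far = wide."""
    d = min(math.dist(a_xy, b_xy) / math.sqrt(2.0), 1.0)  # 1.0 = full frame diagonal
    return ladder[min(int(d * len(ladder)), len(ladder) - 1)]
```

In practice the velocity estimate would likely need smoothing over a few frames, since frame-to-frame centroid jitter would otherwise show up as timbre flutter.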

Thanks

The original BubbleSynth (NIME-style paper, 2014) was built at MIT Media Lab. This rebuild owes a lot to the TouchDesigner community, the VCV Rack and Bogaudio module developers, the Voxglitch and trowaSoft authors, and everyone who's posted galvo-laser hacks on Hackaday over the years.

Project Details