Build a modern digital walkie-talkie using the popular ESP32! Stray Radio devices automatically discover each other on your local Wi-Fi network, enabling clear voice communication. Broadcast to everyone or select a user from the on-screen list to talk directly (unicast).
The project features an intuitive interface with an LCD screen and rotary encoder control. Perfect for makers, developers, and radio enthusiasts exploring local voice over IP. Dive into open source hardware and software to build and customize your own device!
After covering the hardware in the last post, let's dive into what makes it tick: the software architecture.
When I refactored this project from its old Arduino-based monolith, my main goal was to create a clean, flexible, and scalable system. I built the entire architecture around FreeRTOS and a layered, event-driven model.
Here’s a breakdown of the key layers:
1. The Foundation: BSP & HAL
This is the lowest layer, separating the "application logic" from the physical hardware.
BSP (Board Support Package): This is the core of the hardware abstraction. It’s defined by a single struct (bsp_t) containing function pointers for all hardware operations. A global pointer (g_bsp) is used by all other modules to access hardware.
HAL (Hardware Abstraction Layer): These are the drivers using the BSP. They manage the specific peripherals (hal_display, hal_audio, hal_encoder).
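To make the function-pointer idea concrete, here is a minimal sketch that compiles on a desktop. Only the names bsp_t and g_bsp come from the project; the two operations, the mock board, and the helper functions are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of hardware operations; the real bsp_t has its own fields. */
typedef struct {
    void    (*backlight_set)(uint8_t percent);  /* assumed operation */
    int32_t (*encoder_read)(void);              /* assumed operation */
} bsp_t;

/* Global pointer that every other module goes through. */
const bsp_t *g_bsp = NULL;

/* --- A mock "board" standing in for real drivers --- */
uint8_t mock_backlight = 0;
int32_t mock_encoder_pos = 0;

static void mock_backlight_set(uint8_t percent)
{
    mock_backlight = percent > 100 ? 100 : percent;  /* clamp to 0..100 */
}

static int32_t mock_encoder_read(void)
{
    return mock_encoder_pos;
}

const bsp_t mock_board = {
    .backlight_set = mock_backlight_set,
    .encoder_read  = mock_encoder_read,
};

/* Select the board once at startup. */
void bsp_init(const bsp_t *board)
{
    assert(board != NULL);
    g_bsp = board;
}

/* Example of driver-level code that only talks to the hardware through g_bsp. */
int32_t example_encoder_delta(int32_t *last_pos)
{
    int32_t now = g_bsp->encoder_read();
    int32_t delta = now - *last_pos;
    *last_pos = now;
    return delta;
}
```

Porting to a different board then means filling in another bsp_t instance and passing it to bsp_init(); the code above the BSP does not change.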
2. The Heart: The Main Event Queue
This is the true center of the entire system. Instead of tasks calling each other directly, most modules send messages to a single, central queue: main_queue_event.
This means we have one primary handler in main() that receives all these events and decides what to do next.
3. Synchronization: The System Event Group
If the main_queue is for sending messages, the system_event_group is for signaling states. This is crucial for synchronizing tasks. For example, the lan_task waits for the WIFI_CONNECTED_BIT (set by hal_net) before it starts running.
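The difference between "message" and "state" can be shown with a plain bitmask. This is a desktop stand-in, not the FreeRTOS event group API: on the device a task would block in something like xEventGroupWaitBits, while this sketch only polls. The bit positions are assumptions; only the bit names appear in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumed for illustration. */
#define WIFI_CONNECTED_BIT (1u << 0)
#define PTT_PRESSED_BIT    (1u << 1)

/* Stand-in for an event group: a set of sticky state bits. */
typedef struct { uint32_t bits; } state_group_t;

void state_set(state_group_t *g, uint32_t mask)   { g->bits |= mask; }
void state_clear(state_group_t *g, uint32_t mask) { g->bits &= ~mask; }

/* True only when every bit in mask is set ("wait for all" semantics). */
bool state_all_set(const state_group_t *g, uint32_t mask)
{
    assert(mask != 0);  /* checking an empty mask is almost certainly a bug */
    return (g->bits & mask) == mask;
}

/* One polling step of a hypothetical LAN task: it stays idle until Wi-Fi is up. */
bool lan_task_may_run(const state_group_t *g)
{
    return state_all_set(g, WIFI_CONNECTED_BIT);
}
```

Unlike a queued message, a bit stays set until someone clears it, so a task that starts late still sees the current state.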
4. The "Worker" Modules
These are the other key modules that run as separate tasks and interact with the event system:
Wi-Fi & Provisioning (hal_net): Manages the Wi-Fi connection in APSTA mode, including the web portal for setup.
LAN Task & Peer Manager: Manages our UDP-based discovery protocol and keeps a live list (peer_info_t) of all other active 'Stray' devices.
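A live peer list boils down to "refresh on every discovery packet, expire on silence". The sketch below shows one way to do that. The struct layout, table size, name length, and timeout handling are all assumptions; the project's peer_info_t may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PEERS 8    /* assumed table size */
#define NAME_LEN  16   /* assumed; the real name length is not shown */

/* Hypothetical layout; the project's peer_info_t may differ. */
typedef struct {
    bool     in_use;
    uint64_t sn;             /* sender serial number */
    char     name[NAME_LEN];
    uint32_t last_seen_ms;
} example_peer_t;

typedef struct { example_peer_t slots[MAX_PEERS]; } peer_table_t;

/* Record a discovery packet: refresh a known peer or claim a free slot.
 * Returns false only when the table is full. */
bool peer_seen(peer_table_t *t, uint64_t sn, const char *name, uint32_t now_ms)
{
    assert(t != NULL && name != NULL);
    example_peer_t *slot = NULL;
    for (int i = 0; i < MAX_PEERS; i++) {
        example_peer_t *p = &t->slots[i];
        if (p->in_use && p->sn == sn) { slot = p; break; }  /* known peer wins */
        if (!p->in_use && slot == NULL) slot = p;           /* remember first free slot */
    }
    if (slot == NULL) return false;
    slot->in_use = true;
    slot->sn = sn;
    strncpy(slot->name, name, NAME_LEN - 1);
    slot->name[NAME_LEN - 1] = '\0';
    slot->last_seen_ms = now_ms;
    return true;
}

/* Drop peers not heard from within timeout_ms; returns how many were removed. */
int peers_expire(peer_table_t *t, uint32_t now_ms, uint32_t timeout_ms)
{
    int removed = 0;
    for (int i = 0; i < MAX_PEERS; i++) {
        example_peer_t *p = &t->slots[i];
        /* Unsigned subtraction stays correct across tick-counter wraparound. */
        if (p->in_use && (uint32_t)(now_ms - p->last_seen_ms) > timeout_ms) {
            p->in_use = false;
            removed++;
        }
    }
    return removed;
}

/* Count the peers currently considered online. */
int peers_online(const peer_table_t *t)
{
    int n = 0;
    for (int i = 0; i < MAX_PEERS; i++) n += t->slots[i].in_use ? 1 : 0;
    return n;
}
```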
The audio pipeline is the most complex part of the project. It's a two-way process (Record/Transmit and Receive/Play) managed by FreeRTOS tasks, queues, and a shared memory pool.
Transmit Pipeline (Record & Send)
Trigger: The audio_task waits for the PTT_PRESSED_BIT. When it arrives, it mutes the speaker.
Record Loop: While PTT is held, the audio_task:
Grabs a free buffer from the shared_buffer_pool.
Reads the I2S (RX) data into a stereo buffer.
Converts the stereo sample to mono.
Sends this mono audio_chunk_t to the mic_to_net_queue.
Network Send: A separate lan_tx_task waits for chunks on that queue. It wraps the audio in a UDP packet, adding a header:
typedef struct {
    uint32_t magic_number;    // "secret knock" (e.g. 0xDEADBEEF)
    udp_packet_type_t type;   // Packet type from enum
    uint64_t sender_sn;       // Unique sender serial number
    char sender_name[...
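Two steps of the record loop described above, grabbing a buffer from a pool and downmixing stereo to mono, can be sketched off-target. Everything here is an assumption made for illustration: the chunk size, the pool depth, the chunk layout (the real audio_chunk_t is not shown), and the choice to average the two channels rather than keep just one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BUFFERS  4     /* assumed pool depth */
#define CHUNK_SAMPLES 160   /* assumed mono samples per chunk */

/* Hypothetical chunk layout; the project's audio_chunk_t may differ. */
typedef struct {
    int16_t samples[CHUNK_SAMPLES];
    size_t  count;
} example_chunk_t;

/* Fixed pool: no heap allocation while audio is streaming.
 * Not thread-safe as written; between real tasks it would need a mutex
 * or a queue of free buffers. */
typedef struct {
    example_chunk_t buffers[POOL_BUFFERS];
    uint8_t         in_use[POOL_BUFFERS];
} chunk_pool_t;

/* Take a free buffer, or NULL when the pool is exhausted. */
example_chunk_t *pool_acquire(chunk_pool_t *pool)
{
    for (size_t i = 0; i < POOL_BUFFERS; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            pool->buffers[i].count = 0;
            return &pool->buffers[i];
        }
    }
    return NULL;
}

/* Return a buffer to the pool once the consumer is done with it. */
void pool_release(chunk_pool_t *pool, example_chunk_t *chunk)
{
    size_t idx = (size_t)(chunk - pool->buffers);
    assert(idx < POOL_BUFFERS && pool->in_use[idx]);
    pool->in_use[idx] = 0;
}

/* Downmix interleaved L,R,L,R... frames by averaging each pair. */
void stereo_to_mono(const int16_t *stereo, size_t frames, example_chunk_t *out)
{
    assert(frames <= CHUNK_SAMPLES);
    for (size_t i = 0; i < frames; i++) {
        /* Sum in 32 bits so two full-scale samples cannot overflow. */
        int32_t sum = (int32_t)stereo[2 * i] + (int32_t)stereo[2 * i + 1];
        out->samples[i] = (int16_t)(sum / 2);
    }
    out->count = frames;
}
```

In the flow described above, the transmit side would release each chunk back to the pool after the packet has been sent.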
Hello everyone! Welcome to the first proper hardware log for the `stray` project.
My goal here is to walk you through the v1 prototype hardware: why we chose certain parts, how it's all connected, and (most importantly) all the "features" and quirks you'll find if you dig into the design files.
Design Philosophy
The main driver for this design was ergonomics. We wanted a device that felt comfortable to hold, with all the main functions "at your fingertips." This led to some... interesting design choices.
Core Components
Here’s a quick rundown of what's on the board:
MCU: An ESP32-WROOM module, using an external IPX connector for the antenna.
Encoder: An AS5601 magnetic encoder. It’s super smooth. We're currently using I2C to read its position, but the A/B pins are also routed if you prefer that approach.
Display: A 1.54" 240x240 LCD based on the ST7789 driver. It's connected via SPI, with a PWM pin for backlight control.
Audio Codec: This is the big one. We're using a TLV320AIC3120IRHBR Low-Power Mono Codec. It has an embedded miniDSP and a Class-D speaker amp. It's total overkill, but it was part of the original design.
It's controlled via I2C and streams audio data via I2S.
It's connected to a POM-3046P-R microphone and a small speaker.
Buttons (Simple): There are two dedicated buttons: a side-mounted PTT and a bottom-mounted Power (PWR) button, the last one in the row (bottom left).
Buttons (Complex): The other 7 buttons are all connected to a **74HC164PW 8-bit shift register**. By clocking a single pin, we get a key code back at the input. This scheme is... a bit convoluted. It's not great at detecting multiple key presses, so I simply avoid those scenarios in the software.
Charging: A MAX1811 chip handles the Li-Ion battery charging.
Battery: We managed to fit a 4000mAh battery, which was the limit for the case. We haven't done formal battery life tests, but it doesn't give you that "must always be charging" feeling.
Unused Parts: There's an audio jack wired to the codec, but I never got around to implementing it in the software. There's also a microSD card holder which, I was told, wouldn't work, so I didn't even try.
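For the shift-register keypad in the list above, one plausible way to read it is to walk a single high bit across the register outputs and watch the shared key line. The sketch below does that. The exact wiring is an assumption (I have not checked it against the schematic), and all names are hypothetical. Hardware access is passed in as callbacks so the logic can run against a small simulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_COUNT 7
#define KEY_NONE  (-1)

/* Hardware access is injected so the scan logic can run off-target. */
typedef struct {
    void (*shift_in)(void *ctx, bool data_bit);  /* one clock pulse with the given data bit */
    bool (*read_sense)(void *ctx);               /* shared key line back to the MCU */
    void *ctx;
} keypad_io_t;

/* Walk a single 1 across the register outputs and report the first output
 * that reaches the sense line through a pressed key. */
int keypad_scan(const keypad_io_t *io)
{
    assert(io != NULL && io->shift_in != NULL && io->read_sense != NULL);
    int found = KEY_NONE;
    /* Flush the register with zeros so exactly one output is high later. */
    for (int i = 0; i < 8; i++) io->shift_in(io->ctx, false);
    for (int key = 0; key < KEY_COUNT; key++) {
        io->shift_in(io->ctx, key == 0);  /* inject the 1 once, then shift it along */
        if (found == KEY_NONE && io->read_sense(io->ctx)) found = key;
    }
    return found;
}

/* --- Tiny simulation of the register plus keys, for desktop testing --- */
typedef struct {
    uint8_t outputs;   /* Q0..Q7 of the shift register */
    uint8_t pressed;   /* bit n set = key n held down */
} sim_keypad_t;

void sim_shift_in(void *ctx, bool data_bit)
{
    sim_keypad_t *s = ctx;
    s->outputs = (uint8_t)((s->outputs << 1) | (data_bit ? 1u : 0u));
}

bool sim_read_sense(void *ctx)
{
    const sim_keypad_t *s = ctx;
    return (s->outputs & s->pressed) != 0;  /* a pressed key sits on a high output */
}
```

With two keys held, this scan reports only the lower-numbered one, which matches the multi-press limitation mentioned above.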
---
Here's how it looks inside
The Enclosure (A "Hybrid" Approach)
The case is a 2-part custom assembly:
The back panel is a heavily modified ("filed-down") off-the-shelf plastic enclosure.
The front panel is 3D printed, and we also had a variant with a laser-cut metal plate.
The PTT button-lever and the encoder wheel are also 3D printed, but I'm afraid the STL files for these small parts have been lost to time.
Inside, the two boards are stacked and mounted using standoffs.
---
"Features, Not Bugs" (The Quirks)
This is the fun part. This board has character.
Quirk #1: Flashing is... special. The microUSB port handles charging, and it can also flash firmware, but only if you make a custom cable that connects an external USB-to-UART converter's Tx/Rx lines to the microUSB port's D+/D- pins.
Quirk #2: Boot Mode. To put the ESP32 into bootloader mode, you have to **hold one of the 7 keypad buttons** *while* pressing the Power button. I don't remember the exact schematic magic, but it's tied to the boot pin.
Quirk #3: Battery "Monitoring". It's primitive. We just do a direct ADC read on the battery voltage. The "percentage" you see in the UI is more of a guess than a precise measurement.
Quirk #4: The Dead SD Card Slot. Yes, there's an SD card slot on the board. No, it doesn't work. We messed up the pins. :)
The BIG One (My "Favorite" Bug): The output from the 7-button shift register is wired to IO_0. If you know the ESP32, you might know that the standard I2S driver *also* tries to claim IO_0 for its use...
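Going back to Quirk #3: a "guess" percentage from a raw voltage reading is typically a clamped linear map, something like the sketch below. The voltage limits are assumed typical Li-Ion values, not the project's calibration, and the function names are hypothetical. A real discharge curve is not linear, which is one reason the number is only an estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Li-Ion limits; tune to the actual cell and cutoff voltage. */
#define BATT_EMPTY_MV 3300u
#define BATT_FULL_MV  4200u

/* Linear voltage-to-percent estimate, clamped to 0..100. */
uint8_t battery_percent_from_mv(uint32_t mv)
{
    if (mv <= BATT_EMPTY_MV) return 0;
    if (mv >= BATT_FULL_MV)  return 100;
    return (uint8_t)(((mv - BATT_EMPTY_MV) * 100u) / (BATT_FULL_MV - BATT_EMPTY_MV));
}

/* Average several readings to steady the displayed value. */
uint32_t average_mv(const uint32_t *readings, unsigned count)
{
    assert(readings != 0 && count > 0);
    uint64_t sum = 0;
    for (unsigned i = 0; i < count; i++) sum += readings[i];
    return (uint32_t)(sum / count);
}
```

Averaging a handful of readings before converting helps keep the on-screen value from jumping around when the load changes, for example while transmitting.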
The foundations for this project were laid several years ago. It began as a remote Push-to-Talk (PTT) tangent controller for amateur radios. A colleague of mine designed a few hardware prototypes and handed them over to our team for firmware development.
The first implementation was built on Arduino. It quickly grew in complexity, becoming difficult to maintain and hitting several technical limitations. During this time, our focus also shifted—away from simple radio control and towards direct device-to-device communication over IP.
We managed to build an MVP that worked not only on a local network but also via a custom server on the global internet (WAN). We even had minimal transceiver control working. However, circumstances forced us to shelve the project for a while.
The Rebirth (v2.0)
Even though it was on the shelf, we constantly returned to the project in our thoughts and conversations. So this year, I decided there was no better time than the present to resurrect it.
I started a complete refactoring of the entire codebase. I migrated everything to the ESP-IDF framework and designed a more flexible, modern architecture. My hope is that this new foundation will allow us (and the community) to easily and painlessly adapt the project to newer, simpler, or just different hardware in the future. Most importantly, I decided to take the project fully open-source.
Current Working Components (The MVP)
As of today, these are the key, functional parts of the project:
Network:
A smart Wi-Fi connection manager. On boot, it tries to connect to a network saved in NVS. If no network is found, it launches an HTTP server in AP (Access Point) mode. You can connect to this portal to scan for available networks, enter credentials, and set a custom name for your device.
All audio and discovery packets are sent via UDP using LwIP sockets.
Audio Pipeline: We are using a TLV320AIC3120 codec. This is admittedly overkill for the current task, but it was part of the original hardware design. It's fully configured and handles the analog microphone input and speaker output. The entire audio pipeline is controlled by the PTT button: when pressed, you transmit; when released, you listen.
Display & UI: We're using a 240x240 pixel display with the LVGL library. To simplify my life, the UI was generated in SquareLine Studio under a Personal License, so it's perfect for all hobbyists. You can find the SquareLine project file right in the repo at components/hal_display/ui to modify it yourself.
Controls: Input is handled by a magnetic rotary encoder and a custom-built keyboard. These hardware solutions are a bit specific to our prototype and will probably need their own documentation later on.
What the MVP Does (Summary)
The stray device connects to your Wi-Fi network. It actively pings other units to maintain a live "presence" list of who is online. When you press the PTT button, it streams audio to the selected channel (either Broadcast to all or Unicast to one). When an incoming call is received, the UI automatically focuses on the speaker, making it easy to see who is talking and to reply.
Future Plans & Roadmap
WAN Communication: I want to implement true peer-to-peer (P2P) communication for global use, without relying on a central server. This looks very promising.
LoRa & Codec2: I've already compiled and successfully tested Codec2 for audio compression. The ultimate goal is to move away from Wi-Fi and use LoRa for audio transmission. My initial tests with LoRa were inconclusive, and this simply needs more time.
A Simpler "stray v2": I plan to design a simpler, more accessible hardware version based on the ESP32-S3. This version would have simpler controls and, most importantly, support for a 1W LoRa module (like the E22-400M30S).
Crowd Supply: If there is enough genuine community interest in this project, I would love to explore launching this "v2" hardware on Crowd Supply.