
Solving ROS 2 Cross-Machine Performance Bottlenecks: From 1 Hz to 30 Hz

A project log for GestureBot - Computer Vision Robot

A mobile robot that responds to human gestures, facial expressions using real-time pose estimation and gesture recognition & intuitive HRI

Vipin M, 08/05/2025 at 06:21

System Overview

I built a distributed ROS 2 system for gesture-based robotic control with three main nodes:

The expected data rate was 30 Hz end-to-end. Actual performance was 1 Hz with frequent topic dropouts and unresponsive GUI tools.

Problem 1: Network Saturation from Raw Image Streams

The Pi 5 camera node published /camera/image_raw and /camera/image_raw/compressed. Local testing confirmed 30 Hz performance. On the dev machine, the raw topic dropped frequently while the compressed topic remained stable.
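Rates like these can be checked on each machine with `ros2 topic hz <topic>`. The same measurement can be sketched in plain Python as a sliding-window estimator over message arrival times. The class name `RateEstimator` and the window size are illustrative assumptions, not part of the project's code.

```python
from collections import deque


class RateEstimator:
    """Estimate message rate (Hz) from arrival timestamps in a sliding window."""

    def __init__(self, window=100):
        # Keep only the most recent `window` arrival times (hypothetical default).
        self.stamps = deque(maxlen=window)

    def tick(self, stamp):
        """Record one message arrival time, in seconds."""
        self.stamps.append(stamp)

    def hz(self):
        """Return the mean rate over the window, or 0.0 if it cannot be computed."""
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        # N timestamps bound N - 1 intervals.
        return (len(self.stamps) - 1) / span if span > 0 else 0.0
```

In a subscriber callback, calling `tick(time.monotonic())` on every message and logging `hz()` once a second would show whether a topic holds 30 Hz on the receiving side or collapses toward 1 Hz.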

Root Cause

Bandwidth calculations exposed the bottleneck:

ROS 2 discovery and transport could not sustain raw topic streaming over WiFi. At 27 MB/s, which is more than 200 Mbit/s, the raw stream overwhelmed the link.
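As a rough check on the 27 MB/s figure, the raw bandwidth of an uncompressed stream is width × height × bytes per pixel × frame rate. The sketch below assumes a 640x480 image at 3 bytes per pixel and 30 fps. Those camera settings are not stated in this log; they are chosen because they reproduce the quoted number.

```python
def raw_bandwidth_mb_s(width, height, bytes_per_pixel, fps):
    """Uncompressed image stream bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


# Assumed camera settings (not stated in the log): 640x480, 3 bytes/pixel, 30 fps.
raw = raw_bandwidth_mb_s(640, 480, 3, 30)
print(f"raw: {raw:.1f} MB/s ({raw * 8:.0f} Mbit/s)")
```

Under these assumptions the script prints `raw: 27.6 MB/s (221 Mbit/s)`. By the same arithmetic, a compressed stream of about 3 MB/s at 30 Hz works out to roughly 100 KB per frame.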

Fix: Compressed-Only Streaming and Local Decompression

Step 1: Disable raw topic in launch file

<remap from="/camera/image_raw" to="/dev/null"/>

Step 2: Decompression node on dev machine

# Subscribes to /camera/image_raw/compressed
# Publishes /camera/image_raw

Results

| Configuration    | Bandwidth | Frame Rate | Reliability |
|------------------|-----------|------------|-------------|
| Raw + Compressed | ~30 MB/s  | <1 Hz      | Unstable    |
| Compressed Only  | ~3 MB/s   | 30 Hz      | Stable      |


The decompression node restored full frame rate while maintaining image quality for downstream CV nodes.

Problem 2: GUI Lag from Resource Contention

With image transport fixed, RViz and parameter tuning interfaces remained unusable. RViz ran at 2–3 Hz with second-long UI lag.

Root Cause

The dev machine was overloaded:

Even on a powerful workstation, this load pushed CPU usage above 90%.

Fix: Distribute GUI to a VM

I moved all GUI applications to a VM on my MacBook Pro using UTM with ARM-native Ubuntu 24.04:

Architecture After Split

Results

| Metric                 | Before  | After   |
|------------------------|---------|---------|
| RViz Frame Rate        | 2–3 Hz  | 30 Hz   |
| GUI Response           | 1–2 sec | <100 ms |
| CPU Load (Dev Machine) | >90%    | ~60%    |


Distributing GUI apps restored responsiveness and dramatically improved the development experience.

Implementation Summary

Network Optimization

  1. Modify camera launch file to disable raw stream.

  2. Deploy decompression node on receiving machine.

  3. Monitor bandwidth via iftop or nload.
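Where `iftop` or `nload` is not installed, the same per-interface throughput can be approximated by sampling the kernel's byte counters. The sketch below is a Linux-only illustration that parses `/proc/net/dev`. The function names and the `wlan0` default are assumptions for the example, not part of the project.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mb_s(before, after, seconds):
    """Return (rx, tx) rates in MB/s between two (rx_bytes, tx_bytes) samples."""
    rx = (after[0] - before[0]) / seconds / 1e6
    tx = (after[1] - before[1]) / seconds / 1e6
    return rx, tx


def sample_interface(interface="wlan0", interval=1.0):
    """Measure throughput on one interface (name is an assumption) over `interval` seconds."""
    with open("/proc/net/dev") as f:
        before = parse_net_dev(f.read())[interface]
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        after = parse_net_dev(f.read())[interface]
    return throughput_mb_s(before, after, interval)
```

Calling `sample_interface()` on the receiving machine while the camera is streaming gives a quick reading to compare against the ~3 MB/s expected for the compressed-only configuration.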

GUI Distribution

  1. Create Ubuntu VM with hardware acceleration + bridged networking.

  2. Run GUI-only nodes (RViz, rqt, dynamic reconfigure).

  3. Keep compute-heavy nodes on dedicated hardware.

Performance Summary

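| Resource           | Before  | After  | Improvement        |
|--------------------|---------|--------|--------------------|
| Network Bandwidth  | 30 MB/s | 3 MB/s | 10x                |
| Frame Rate         | <1 Hz   | 30 Hz  | 30x                |
| Dev Machine CPU    | >90%    | ~60%   | Reduced contention |
| GUI Responsiveness | 2–3 Hz  | 30 Hz  | Real-time capable  |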
