MediaPipe Pose Detection: Real-Time Performance Analysis

A project log for GestureBot - Computer Vision Robot

A mobile robot that responds to human gestures and facial expressions, using real-time pose estimation and gesture recognition for intuitive HRI

Vipin M • 08/08/2025 at 17:22 • 0 Comments

Validating Google's MediaPipe framework for robotics applications on ARM64 hardware - from installation challenges to 6 FPS pose tracking

The Raspberry Pi 5 represents a significant leap in ARM-based computing power, but can it handle Google's sophisticated MediaPipe pose detection framework in real-time? While developing a gesture-controlled robot, I needed to validate whether the Pi 5 could serve as both a camera platform and pose detection engine, or if I'd need to offload processing to more powerful hardware.

After extensive testing and optimization, I successfully integrated MediaPipe 0.10.18 with a 30 fps ROS 2 camera pipeline, achieving stable 6.1 FPS pose detection with full 33-landmark tracking. Here's the complete technical analysis of MediaPipe's performance on Pi 5 hardware, including the challenges, solutions, and honest assessment of its capabilities for robotics applications.

The Hardware Foundation

System Specifications:

• Raspberry Pi 5 (ARM64) with active cooling
• Ubuntu 24.04 running ROS 2
• Camera published at 30 Hz via the camera_ros node

Performance Baseline: Before MediaPipe integration, the camera system was already optimized to deliver a stable 30 Hz stream of 640x480 frames on /camera/image_raw.

The question was: could this foundation support real-time pose detection?

The MediaPipe Challenge

MediaPipe is Google's framework for building multimodal applied ML pipelines. The pose detection model uses BlazePose, a lightweight neural network designed for real-time inference. However, "real-time" on mobile devices doesn't necessarily translate to "real-time" on ARM64 single-board computers.

MediaPipe Pose Detection Features:

• 33-landmark full-body tracking via the BlazePose model
• Three selectable model complexity levels (Lite, Full, Heavy)
• Built-in landmark smoothing and per-landmark visibility scores
• Configurable detection and tracking confidence thresholds

The challenge was integrating this sophisticated framework with ROS 2's image transport system while maintaining acceptable performance.

Installation Challenges and Solutions

The Python Environment Problem

Ubuntu 24.04 enforces externally managed Python environments (PEP 668), preventing direct pip installations:

pi@RPi5:~$ pip3 install mediapipe
error: externally-managed-environment
× This environment is externally managed

This protection mechanism prevents conflicts between system packages and user-installed libraries, but it complicates MediaPipe installation.

Virtual Environment Solution

The solution required creating a dedicated virtual environment with proper ROS 2 integration:

# Install virtual environment support
sudo apt update && sudo apt install -y python3.12-venv python3-full

# Create dedicated environment for computer vision
python3 -m venv gesturebot_env

# Activate and install MediaPipe stack
source gesturebot_env/bin/activate
pip install mediapipe numpy opencv-python

# Install ROS 2 Python dependencies
pip install pyyaml setuptools jinja2 typeguard

Key Insight: The virtual environment needed both MediaPipe dependencies and ROS 2 Python packages to bridge the two ecosystems effectively.
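
One quick way to confirm the bridge works is to run the venv's interpreter and import from both ecosystems. Here's a minimal sanity-check sketch of my own; the module list is an assumption based on what this project needs, and it presumes the ROS 2 environment was sourced before activating the venv so the system rclpy and cv_bridge are on the path:

#!/usr/bin/env python3
"""Sanity check: the venv must see both ROS 2 and MediaPipe packages."""
import importlib

for module in ('rclpy', 'cv_bridge', 'mediapipe', 'cv2', 'numpy'):
    try:
        mod = importlib.import_module(module)
        # Not every package exposes __version__; fall back to an empty string.
        print(f'✅ {module} {getattr(mod, "__version__", "")}')
    except ImportError as exc:
        print(f'❌ {module}: {exc}')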

MediaPipe Installation Verification

A simple test confirmed successful installation:

#!/usr/bin/env python3
import mediapipe as mp
import cv2
import numpy as np

# Test MediaPipe initialization
mp_pose = mp.solutions.pose
pose = mp_pose.Pose(
    static_image_mode=False,
    model_complexity=1,
    smooth_landmarks=True,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

print("✅ MediaPipe Pose initialized successfully")

Installation Results:

• MediaPipe 0.10.18 installed cleanly alongside OpenCV and NumPy in the virtual environment
• The verification script above ran without errors and printed the success message

ROS 2 Integration Architecture

The Integration Pipeline

The complete pipeline connects ROS 2's image transport with MediaPipe's pose detection:

Camera Node (30 Hz) → ROS Image → cv_bridge → OpenCV → MediaPipe → Pose Landmarks

MediaPipe ROS 2 Node Implementation

Here's the core integration node that bridges ROS 2 and MediaPipe:

#!/usr/bin/env python3
"""MediaPipe Pose Detection Node for ROS 2"""

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
import cv2
import mediapipe as mp
import numpy as np

class MediaPipeTestNode(Node):
    def __init__(self):
        super().__init__('mediapipe_test_node')
        
        # Initialize MediaPipe
        self.mp_pose = mp.solutions.pose
        self.pose = self.mp_pose.Pose(
            static_image_mode=False,
            model_complexity=1,  # Balance of speed vs accuracy
            smooth_landmarks=True,
            min_detection_confidence=0.5,
            min_tracking_confidence=0.5
        )
        self.mp_drawing = mp.solutions.drawing_utils
        
        # Initialize CV bridge
        self.bridge = CvBridge()
        
        # Subscribe to camera image
        self.image_subscription = self.create_subscription(
            Image,
            '/camera/image_raw',
            self.image_callback,
            10
        )
        
        # Publisher for processed image with pose overlay
        self.processed_image_pub = self.create_publisher(
            Image,
            '/camera/pose_detection',
            10
        )
        
        # Performance tracking
        self.frame_count = 0
        self.start_time = self.get_clock().now()
        
        self.get_logger().info('MediaPipe test node initialized')
    
    def image_callback(self, msg):
        """Process incoming camera image with MediaPipe."""
        try:
            # Convert ROS image to OpenCV format
            cv_image = self.bridge.imgmsg_to_cv2(msg, 'bgr8')
            
            # Convert BGR to RGB for MediaPipe
            rgb_image = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)
            
            # Process with MediaPipe
            results = self.pose.process(rgb_image)
            
            # Draw pose landmarks if detected
            if results.pose_landmarks:
                self.mp_drawing.draw_landmarks(
                    cv_image,
                    results.pose_landmarks,
                    self.mp_pose.POSE_CONNECTIONS
                )
                self.get_logger().info('Pose detected!', throttle_duration_sec=1.0)
            
            # Convert back to ROS message and publish
            processed_msg = self.bridge.cv2_to_imgmsg(cv_image, 'bgr8')
            processed_msg.header = msg.header
            self.processed_image_pub.publish(processed_msg)
            
            # Performance tracking
            self.frame_count += 1
            if self.frame_count % 30 == 0:
                current_time = self.get_clock().now()
                duration = (current_time - self.start_time).nanoseconds / 1e9
                fps = self.frame_count / duration
                self.get_logger().info(f'Processing FPS: {fps:.2f}')
                
        except Exception as e:
            self.get_logger().error(f'Error processing image: {str(e)}')


def main(args=None):
    rclpy.init(args=args)
    node = MediaPipeTestNode()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
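
The overlay is handy for debugging, but gesture logic ultimately needs the numeric landmarks. Here's a minimal sketch of pulling a single landmark out of the results object using the same mp.solutions.pose API as above; the right-wrist choice and the 0.5 visibility threshold are illustrative assumptions, not project settings:

import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_right_wrist(results, image_width, image_height):
    """Return the right wrist position in pixel coordinates, or None."""
    if not results.pose_landmarks:
        return None
    lm = results.pose_landmarks.landmark[mp_pose.PoseLandmark.RIGHT_WRIST]
    # Landmark x/y are normalized to [0, 1] relative to the image;
    # visibility estimates how likely the point is actually in frame.
    if lm.visibility < 0.5:
        return None
    return int(lm.x * image_width), int(lm.y * image_height)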

Key Integration Challenges Solved

1. Image Format Conversion: cv_bridge turns the ROS Image message into an OpenCV BGR array, which is then converted to RGB because MediaPipe expects RGB input.

2. Timestamp Preservation: the published overlay reuses the original message header, so downstream consumers can match pose output to the source camera frame.

3. Performance Monitoring: the node counts frames and logs the effective processing FPS every 30 frames, giving a live throughput measurement.

Performance Analysis and Results

Comprehensive Performance Testing

Testing was conducted under realistic robotics workload conditions:

# Terminal 1: Start optimized camera node
ros2 launch camera_ros camera_high_fps.launch.py

# Terminal 2: Run MediaPipe integration
source gesturebot_env/bin/activate
python3 scripts/test_mediapipe_integration.py

Measured Performance Results

MediaPipe Processing Performance:

[INFO] MediaPipe test node initialized
[INFO] Pose detected!
[INFO] Processing FPS: 6.55
[INFO] Processing FPS: 6.56
[INFO] Processing FPS: 6.52
[INFO] Processing FPS: 6.49
[INFO] Processing FPS: 6.47
[INFO] Processing FPS: 6.43

Sustained Performance Metrics:

• Logged rates started around 6.5 FPS and settled to a stable 6.1 FPS over longer runs
• Full 33-landmark tracking maintained on every processed frame

Performance Breakdown Analysis

Component            Input Rate   Processing Rate   Bottleneck Factor
Camera Capture       30 Hz        30 Hz             None
ROS 2 Transport      30 Hz        30 Hz             None
Image Conversion     30 Hz        30 Hz             Minimal
MediaPipe Inference  30 Hz        6.1 Hz            Primary
Pose Rendering       6.1 Hz       6.1 Hz            None

Key Finding: MediaPipe inference is the primary bottleneck, processing every ~5th frame from the 30 Hz camera stream.

Thermal Performance Under Load

Temperature Monitoring Results:

# During sustained MediaPipe processing
CPU Temperature: 65-72°C
GPU Temperature: 58-63°C
Throttling Events: None observed

The Pi 5 with active cooling maintained safe operating temperatures even under a sustained computer vision workload.
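
For reference, temperature figures like these can be reproduced without extra tooling by polling the kernel's thermal zone. A minimal sketch (thermal_zone0 is usually the SoC sensor on the Pi, but that's worth verifying via its type file):

from pathlib import Path
import time

THERMAL_ZONE = Path('/sys/class/thermal/thermal_zone0/temp')

def cpu_temp_c():
    """Read the SoC temperature (the kernel reports millidegrees Celsius)."""
    return int(THERMAL_ZONE.read_text().strip()) / 1000.0

# Log once per second while the MediaPipe node runs in another terminal.
while True:
    print(f'CPU temperature: {cpu_temp_c():.1f}°C')
    time.sleep(1.0)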

Resource Utilization Analysis

System Resources During Operation:
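
A minimal sketch of how CPU and memory usage can be sampled during a run, using the psutil package (an assumption on my part; it isn't among the project's stated dependencies and needs a pip install psutil in the venv):

import time
import psutil

# Sample system-wide CPU and memory once per second; the interval and
# sample count are arbitrary choices for a quick spot check.
for _ in range(10):
    cpu = psutil.cpu_percent(interval=1.0)  # blocks for the interval
    mem = psutil.virtual_memory()
    print(f'CPU: {cpu:5.1f}%  RAM: {mem.percent:4.1f}% '
          f'({mem.used / 2**30:.2f} GiB used)')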

Model Complexity Trade-offs

MediaPipe offers three model complexity levels. Testing revealed significant performance differences:

Model Complexity Comparison

Complexity   Processing FPS   Accuracy    Use Case
0 (Lite)     ~12 FPS          Good        Fast tracking
1 (Full)     ~6.1 FPS         Excellent   Balanced
2 (Heavy)    ~3.2 FPS         Best        High precision

Recommendation: Model complexity 1 provides the best balance of accuracy and performance for robotics applications.
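
The complexity level is a single constructor argument, so the trade-off is easy to measure yourself. Here's a rough micro-benchmark sketch; the synthetic blank frame keeps it self-contained, and with no person detected the detector re-runs on every frame, so treat the numbers as a lower bound rather than live-camera performance:

import time
import numpy as np
import mediapipe as mp

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # blank 640x480 RGB frame

for complexity in (0, 1, 2):
    with mp.solutions.pose.Pose(static_image_mode=False,
                                model_complexity=complexity) as pose:
        pose.process(frame)  # warm-up inference
        start = time.perf_counter()
        for _ in range(30):
            pose.process(frame)
        elapsed = time.perf_counter() - start
        print(f'complexity={complexity}: {30 / elapsed:.1f} FPS')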

Optimization Attempts and Results

Attempted Optimizations:

  1. Reduced image resolution: 640x480 → 320x240
    • Result: +40% FPS improvement, but a significant accuracy loss
  2. Frame skipping: Process every 2nd frame (see the sketch after this list)
    • Result: Maintained accuracy, reduced temporal smoothness
  3. Model complexity reduction: Level 1 → Level 0
    • Result: 2x FPS improvement, acceptable accuracy loss
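
For completeness, here's a sketch of what the frame-skipping variant (option 2 above) looks like as a drop-in replacement for image_callback in the node shown earlier; it assumes the same class attributes, and PROCESS_EVERY_N is a hypothetical tuning constant:

PROCESS_EVERY_N = 2  # run inference on every 2nd frame

def image_callback(self, msg):
    self.frame_count += 1
    cv_image = self.bridge.imgmsg_to_cv2(msg, 'bgr8')

    # Only pay the MediaPipe inference cost on every Nth frame...
    if self.frame_count % PROCESS_EVERY_N == 0:
        rgb_image = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)
        results = self.pose.process(rgb_image)
        if results.pose_landmarks:
            self.mp_drawing.draw_landmarks(
                cv_image, results.pose_landmarks,
                self.mp_pose.POSE_CONNECTIONS)

    # ...but republish every frame so the output stays at 30 Hz
    # (skipped frames simply carry no pose overlay).
    processed_msg = self.bridge.cv2_to_imgmsg(cv_image, 'bgr8')
    processed_msg.header = msg.header
    self.processed_image_pub.publish(processed_msg)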

Final Configuration: model complexity 1 at the full 640x480 resolution, with the camera left at 30 Hz and MediaPipe processing at its sustainable ~6.1 FPS.
