When hand gestures aren't enough, your whole body becomes the remote control
We've all been there – trying to control a robot with hand gestures while your hands are full, while you're wearing gloves, or when lighting conditions make finger detection unreliable. What if your robot could understand your intentions through simple body poses instead? That's exactly what we've implemented in the latest iteration of GestureBot, a Raspberry Pi 5-powered robot that now responds to four distinct body poses for intuitive navigation control.
Why Body Poses Beat Hand Gestures
While hand gesture recognition is impressive, it has practical limitations. Gestures require clear hand visibility, specific lighting conditions, and can be ambiguous when multiple people are present. Body poses, on the other hand, are larger, more distinctive, and work reliably even when hands are obscured or busy with other tasks.
Consider a warehouse worker guiding a robot while carrying boxes, or a surgeon directing a medical robot while maintaining sterile conditions. Full-body pose detection opens up robotics applications where traditional gesture control falls short.
The Technical Foundation: MediaPipe Pose Detection
At the heart of our system lies Google's MediaPipe Pose Landmarker, which provides real-time detection of 33 body landmarks covering the entire human skeleton – from head to toe. Running on a Raspberry Pi 5 with 8GB RAM and a Pi Camera Module 3, we achieve stable 3-7 FPS pose detection at 640x480 resolution.
The MediaPipe model excels at tracking key body points including shoulders, elbows, wrists, hips, and the torso center. What makes this particularly powerful for robotics is the consistency of landmark detection even with partial occlusion or varying lighting conditions.
# Core MediaPipe configuration optimized for Pi 5
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
VisionRunningMode = mp.tasks.vision.RunningMode

pose_landmarker_options = {
    'base_options': BaseOptions(model_asset_path='pose_landmarker.task'),
    'running_mode': VisionRunningMode.LIVE_STREAM,  # LIVE_STREAM also requires a result_callback when the landmarker is created
    'num_poses': 2,  # Track up to 2 people
    'min_pose_detection_confidence': 0.5,
    'min_pose_presence_confidence': 0.5,
    'min_tracking_confidence': 0.5
}

Simplicity Through Four Poses
After experimenting with complex pose vocabularies, we settled on four reliable poses that provide comprehensive robot control:
🙌 Arms Raised (Forward Motion): Both arms extended upward above shoulder level triggers forward movement at 0.3 m/s. This pose is unmistakable and feels natural for "go forward."
👈 Pointing Left (Turn Left): Left arm extended horizontally while right arm remains down commands a left turn at 0.8 rad/s. The asymmetry makes this pose highly distinctive.
👉 Pointing Right (Turn Right): Mirror of the left turn – right arm extended horizontally triggers rightward rotation.
🤸 T-Pose (Emergency Stop): Both arms extended horizontally creates the universal "stop" signal, immediately halting all robot motion.
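To make the mapping concrete, here is a minimal sketch of how these four pose labels could translate into ROS 2 velocity commands. The 0.3 m/s and 0.8 rad/s values come from the descriptions above; the pointing_left/pointing_right label strings and the command_for_pose helper are illustrative assumptions rather than the project's actual API.

# Sketch: pose label -> velocity command (helper name and pointing labels are assumed)
from geometry_msgs.msg import Twist

def command_for_pose(pose_label: str) -> Twist:
    """Map a classified pose label to a ROS 2 Twist command."""
    cmd = Twist()  # all fields default to 0.0, i.e. a full stop
    if pose_label == "arms_raised":        # both arms up -> drive forward
        cmd.linear.x = 0.3                 # m/s
    elif pose_label == "pointing_left":    # left arm out -> rotate left
        cmd.angular.z = 0.8                # rad/s (positive z = counterclockwise)
    elif pose_label == "pointing_right":   # right arm out -> rotate right
        cmd.angular.z = -0.8
    # "t_pose" (and anything unrecognized) falls through to the zero Twist, which stops the robot
    return cmd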
The pose classification algorithm analyzes shoulder and wrist positions relative to the torso center, using angle calculations and position thresholds to distinguish between poses, as shown in the snippets below.
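One detail the excerpt below leaves out is the calculate_arm_angle helper it calls. A minimal sketch of such a method on the same node class, assuming MediaPipe's normalized landmarks (x/y attributes, with y increasing downward in image coordinates) and measuring how far the shoulder-to-wrist vector points above horizontal, might look like this:

import math

def calculate_arm_angle(self, shoulder, wrist):
    """Angle of the shoulder-to-wrist vector above horizontal, in degrees.

    Assumes MediaPipe normalized landmarks, where x grows to the right and
    y grows downward, so the y difference is flipped to make a raised arm
    come out positive.
    """
    dx = wrist.x - shoulder.x
    dy = shoulder.y - wrist.y   # flip y so "up" is positive
    return math.degrees(math.atan2(dy, abs(dx)))

With an angle convention like that, the classification itself reduces to a few threshold checks: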
def classify_pose(self, landmarks):
    # Extract key landmarks
    left_shoulder = landmarks[11]
    right_shoulder = landmarks[12]
    left_wrist = landmarks[15]
    right_wrist = landmarks[16]

    # Calculate arm angles relative to shoulders
    left_arm_angle = self.calculate_arm_angle(left_shoulder, left_wrist)
    right_arm_angle = self.calculate_arm_angle(right_shoulder, right_wrist)

    # Classify based on arm positions
    if left_arm_angle > 60 and right_arm_angle > 60:
        return "arms_raised"
    elif abs(left_arm_angle) < 30 and abs(right_arm_angle) < 30:
        return "t_pose"
    # ... additional classification logic

ROS 2 Integration: From Pose to Motion
The system architecture follows a clean pipeline: pose detection → classification → navigation commands → smooth motion control. Built on ROS 2 Jazzy, the implementation uses three main components:
Pose Detection Node: Processes camera frames through MediaPipe, publishes 33-point landmark data and classified pose actions to /vision/poses topic.
Pose Navigation Bridge: Subscribes to pose classifications and converts them to velocity commands published on /cmd_vel. This node implements the critical safety and smoothing logic.
Velocity Smoothing System: Perhaps the most important component for real-world deployment, this 25 Hz control loop applies acceleration limiting to prevent jerky robot motion that could cause instability or discomfort.
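As a rough idea of what that acceleration limiting can look like, here is a minimal sketch of a single smoothing step at 25 Hz; the 0.5 m/s² limit and the helper name are illustrative assumptions, not the project's tuned values.

# Acceleration-limited smoothing step (sketch; limit value is an assumption)
def smooth_velocity(current: float, target: float,
                    max_accel: float = 0.5, dt: float = 1.0 / 25.0) -> float:
    """Move `current` toward `target`, changing by at most max_accel * dt."""
    max_step = max_accel * dt
    step = max(-max_step, min(max_step, target - current))
    return current + step

In a ROS 2 node this would typically sit in a 25 Hz timer callback, applied separately to the linear and angular components before the result is published on /cmd_vel.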
# Launch the complete 4-pose navigation system
ros2 launch gesturebot pose_detection.launch.py
ros2 launch gesturebot pose_navigation_bridge.launch.py

# View real-time pose detection with skeleton overlay
ros2 launch gesturebot image_viewer.launch.py \
    image_topics:='["/vision/pose/annotated"]'

The navigation bridge includes multiple safety layers: pose confidence thresholds (0.7 minimum), timeout protection (2-second auto-stop), and velocity limits that prevent dangerous accelerations. If no valid pose is detected for two seconds, the robot automatically stops – a crucial safety feature for real-world deployment.
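As a rough illustration of how those safety layers can fit together, here is a minimal sketch; the PoseSafetyGate class and its method names are assumptions, while the 0.7 confidence threshold and 2-second timeout are the values described above.

import time

class PoseSafetyGate:
    """Sketch of the bridge's safety gating: confidence threshold plus timeout."""

    def __init__(self, confidence_threshold=0.7, timeout_s=2.0):
        self.confidence_threshold = confidence_threshold
        self.timeout_s = timeout_s
        self.last_valid_time = None

    def update(self, pose_label, confidence):
        """Return the pose to act on, or 'stop' once the timeout expires."""
        now = time.monotonic()
        if pose_label is not None and confidence >= self.confidence_threshold:
            self.last_valid_time = now
            return pose_label
        if self.last_valid_time is None or now - self.last_valid_time > self.timeout_s:
            return "stop"   # no confident pose for too long -> auto-stop
        return None         # briefly hold the last command while waiting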
Hardware: Raspberry Pi 5 Proves Its Worth
The Raspberry Pi 5 represents a significant leap in embedded AI capability. With its ARM Cortex-A76 quad-core processor and 8GB RAM, it handles MediaPipe pose detection while simultaneously running ROS 2 navigation, camera processing, and system monitoring. The Pi Camera Module 3's 12MP sensor with autofocus provides the image quality needed for reliable landmark detection.
Power consumption remains reasonable at ~8W total system power, making this suitable for battery-powered mobile robots. We've found that active cooling is beneficial for sustained operation, but not strictly necessary for typical use cases.
Real-World Performance and Applications
In practice, the 4-pose system feels remarkably natural. The poses are intuitive enough that new users can control the robot within minutes without training. Response time from pose detection to robot motion is under 200ms, providing immediate feedback that makes the interaction feel responsive.
The system shines in scenarios where traditional interfaces fail:
- Hands-free operation: Control robots while carrying objects or wearing protective equipment
- Distance control: Operate robots from across a room where gesture details would be invisible
- Multi-user environments: Body poses are less likely to trigger false positives from background activity
- Industrial applications: Robust operation in challenging lighting or environmental conditions
We've tested the system with users of varying heights and body types, finding consistent performance across different demographics. The pose classification algorithms adapt well to individual differences in arm length and posture.
The Code: Open Source and Ready to Deploy
The entire implementation is open source and built with reproducibility in mind. The modular ROS 2 architecture means you can easily integrate pose control into existing robot platforms or extend the system with additional poses.
Key configuration parameters are exposed through launch files, allowing fine-tuning for different robot platforms:
pose_navigation_bridge:
  ros__parameters:
    pose_confidence_threshold: 0.7
    max_linear_velocity: 0.3    # m/s
    max_angular_velocity: 0.8   # rad/s
    pose_timeout: 2.0           # seconds
    motion_smoothing_enabled: true

The GestureBot project continues to evolve, with pose detection joining gesture recognition and autonomous person following as part of a comprehensive vision-based robotics platform. Each modality has its place, and together they're building toward more adaptable and intuitive robot companions.
Vipin M