Project Overview
I worked on debugging a puzzling issue where a fine-tuned NVIDIA GR00T N1.5 model was causing an SO-100 robotic arm to “twitch” instead of performing pick-and-place tasks. The robot would make tiny oscillating movements around the same position, with the gripper staying completely unresponsive.
This project log documents the systematic debugging process that revealed the root cause: an undertrained model that needed significantly more training steps to learn the complete task sequence.
Hardware Setup
- Robot: SO-100 robotic arm (6 DOF: 5 arm joints + 1 gripper)
- Cameras: Dual camera system (scene + wrist) at 640x480, 30fps
- GPU: RTX 4080 Super with 16GB VRAM
- Dataset: 20 episodes of pick-and-place demonstrations
- Task: “Pick up the striped block and put it into the white plate”
The Problem: Robot Twitching
Initial Symptoms
When deploying the trained GR00T model:
- Robot connected successfully
- Model inference server running correctly
- Robot made tiny oscillating movements around the same position
- Robot was not executing the intended pick-and-place task
The model had been trained for 2,000 steps and its training loss appeared to be converging, but the physical deployment was completely unsuccessful.
Debugging Approach
Step 1: Enhanced Logging Implementation
Added comprehensive logging to both the inference server and robot client to understand what data was being exchanged.
Server-Side Logging (service.py):
- Request counter for each inference call
- Input data keys and shapes
- Inference time in milliseconds
- Output action statistics (min/max/mean values)
Client-Side Logging (eval_lerobot.py):
- Step counter and observation keys
- Current robot state (all 6 joints)
- Received action chunks from server
- First action being sent to robot
Example Output:
[Request #1] Endpoint: get_action
  Inference time: 75.23ms
  Response keys: ['action.single_arm', 'action.gripper']
  action.single_arm: shape=(16, 5), min=-45.23, max=67.89, mean=12.34
  action.gripper: shape=(16, 1), min=-0.30, max=0.50, mean=0.15
[CLIENT] First action to send to robot:
  shoulder_pan.pos: -12.34
Step 2: Diagnostic Tools Development
Created several diagnostic scripts to isolate the issue (a rough sketch of the joint tester follows this list):
Joint Testing Tool (test_joint.py):
- Tests individual joint control to verify hardware functionality
- Takes joint number (1-6) and value (-100 to 100) as input
- Helps isolate hardware vs. software issues
Robot State Monitor (monitor_robot_state.py):
- Real-time monitoring of robot joint positions
- Verifies encoder readings match values sent to server
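For illustration, here is a rough outline of what a joint-test tool like test_joint.py can look like. The joint key names and the commented-out robot calls are assumptions on my part (the actual LeRobot connection code is omitted), so treat this as a sketch rather than the real script:

# test_joint_sketch.py -- hypothetical outline of a single-joint test tool.
# The joint key names and the robot interface are assumptions, not the real script.
import argparse

# Assumed joint ordering for the SO-100 (5 arm joints + gripper).
JOINT_KEYS = [
    "shoulder_pan.pos", "shoulder_lift.pos", "elbow_flex.pos",
    "wrist_flex.pos", "wrist_roll.pos", "gripper.pos",
]

def main():
    parser = argparse.ArgumentParser(description="Command a single SO-100 joint to a target value")
    parser.add_argument("joint", type=int, choices=range(1, 7), help="Joint number (1-6)")
    parser.add_argument("value", type=float, help="Target value in the -100..100 range")
    args = parser.parse_args()

    if not -100 <= args.value <= 100:
        raise ValueError("Value must be between -100 and 100")

    # Only the selected joint gets a new target.
    action = {JOINT_KEYS[args.joint - 1]: args.value}
    print(f"Would send action: {action}")

    # robot = ...                # connect to the SO-100 via LeRobot (omitted here)
    # robot.send_action(action)  # assumed call for sending the command

if __name__ == "__main__":
    main()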
Step 3: Dataset Visualization
Uploaded the dataset to Hugging Face Hub and used Rerun visualization to inspect the recorded episodes:
# Upload dataset for analysis
python scripts/so100_groot/upload_to_huggingface.py \
    --local-dir ~/.cache/huggingface/lerobot/rubbotix/striped-block \
    --repo-id sparkmt/so100-striped-block

# Visualize episodes
./scripts/so100_groot/visualize_episodes.sh 0
This revealed the difference between State (robot’s actual position) and Action (commanded target position), which was crucial for diagnosis.
Critical Discovery: The Root Cause
Key Finding from Logs
The logs showed that the model was outputting actions with very small magnitudes: instead of decisive movements, the robot was commanded to make tiny, hesitant position changes, which points to high model uncertainty.
The Root Cause: Undertrained Model
Analysis revealed that the model was severely undertrained at 2000 steps.
Evidence:
- Tiny action magnitudes: Model outputting very small actions due to high uncertainty
- Lack of task structure understanding: Model hadn’t learned the full sequence (approach → grasp → lift → move → release)
- Closed-loop instability: Small errors accumulating, causing the robot to end up in states the model never saw during training
The Solution: Extended Training
Training Requirements Analysis
| Task Complexity | Minimum Steps | Recommended Steps |
|---|---|---|
| Simple reaching | 1,000-2,000 | 5,000 |
| Pick and place | 5,000-10,000 | 10,000-20,000 |
| Complex manipulation | 10,000-20,000 | 20,000-50,000 |
The pick-and-place task required 10,000-20,000 steps, not the 2000 steps initially used.
Training Configuration Update
Updated the training script to resume from checkpoint-2000 and continue to 10,000 steps:
# Resume training configuration
RESUME_TRAINING="true"
MAX_STEPS=10000               # Increased from 2000
BATCH_SIZE=16
GRADIENT_ACCUMULATION_STEPS=8
LORA_RANK=32
LORA_ALPHA=64
Automatic Checkpoint Detection:
if [ "$RESUME_TRAINING" = "true" ]; then if ls "$OUTPUT_DIR"/checkpoint-* 1> /dev/null 2>&1; then LATEST_CHECKPOINT=$(ls -td "$OUTPUT_DIR"/checkpoint-* | head -1) echo "Resuming from latest checkpoint: ${LATEST_CHECKPOINT}" TRAIN_CMD="$TRAIN_CMD --resume" fi
fi
Why 2000 Steps Was Insufficient
1. Model Hadn’t Learned Task Structure
- At 2000 steps: Learning basic correlations between observations and actions
- Missing: Full sequence understanding of the manipulation task
2. Action Magnitude Learning
From the deployment logs at 2000 steps, the model was outputting very small actions (a quick check for this is sketched after the list) because:
- Hadn’t learned correct action scale
- Being overly cautious due to high uncertainty
- Loss function hadn’t fully converged
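One quick way to quantify this symptom is to compare the spread of the model's predicted actions with the spread of the actions recorded in the demonstrations. A minimal sketch, assuming both have been exported as numpy arrays (the .npy file names are placeholders, not files produced by the actual scripts):

# Compare predicted action spread vs. dataset action spread (sketch only;
# the .npy file names are placeholders for however the arrays were exported).
import numpy as np

predicted = np.load("predicted_actions.npy")   # shape (N, 6): logged model outputs
recorded = np.load("dataset_actions.npy")      # shape (M, 6): actions from the demos

for j in range(predicted.shape[1]):
    pred_std = predicted[:, j].std()
    data_std = recorded[:, j].std()
    ratio = pred_std / (data_std + 1e-8)
    # An undertrained policy tends to show ratios well below 1: it moves far
    # less than the demonstrations did for comparable observations.
    print(f"joint {j}: predicted std {pred_std:6.2f}  dataset std {data_std:6.2f}  ratio {ratio:4.2f}")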
3. Closed-Loop Instability
- Small errors accumulate: Undertrained model makes uncertain movements
- Compounding problem: Robot ends up in states model never saw during training
- Result: Model gets “confused” and twitches in place (illustrated by the toy sketch below)
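To make the accumulation argument concrete, here is a toy 1-D sketch (purely illustrative; it is not the robot's dynamics or the GR00T policy). A "policy" that replays the per-step moves it saw in training, plus a small prediction error each step, drifts steadily away from the demonstrated trajectory and ends up in states the demonstrations never covered:

# Toy 1-D illustration of closed-loop drift (not the real robot or policy).
import numpy as np

rng = np.random.default_rng(0)

demo_trajectory = np.linspace(0.0, 1.0, 50)    # demonstrated positions
demo_deltas = np.diff(demo_trajectory)         # per-step moves seen in training

state = demo_trajectory[0]
drift = []
for i, delta in enumerate(demo_deltas):
    predicted = delta + 0.02 * rng.standard_normal()   # small per-step prediction error
    state += predicted                                 # the error is now baked into the state
    drift.append(abs(state - demo_trajectory[i + 1]))

print(f"final drift after {len(demo_deltas)} steps: {drift[-1]:.3f} "
      f"(vs. per-step error of ~0.02)")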
Technical Implementation Details
Enhanced Logging Code
Server-side logging addition:
logger.info(f"[Request #{request_counter}] Endpoint: {endpoint}")
logger.info(f" Data keys: {list(data.keys())}")
logger.info(f" Inference time: {inference_time:.2f}ms")
for key, value in response.items():
    if isinstance(value, np.ndarray):
        logger.info(f" {key}: shape={value.shape}, min={value.min():.2f}, max={value.max():.2f}, mean={value.mean():.2f}")
Client-side logging addition:
logger.info(f"[STEP {step_count}] Getting observation...")
logger.info(f" Current robot state:")
for key, value in current_state.items():
    logger.info(f" {key}: {value:.2f}")
logger.info(f"[CLIENT] First action to send to robot:")
for key, value in first_action.items():
    logger.info(f" {key}: {value:.2f}")
Dataset Upload and Visualization
Created tools for dataset management and analysis:
# Upload script for Hugging Face Hub
import os

from huggingface_hub import HfApi


def upload_dataset(local_dir, repo_id):
    # Validate dataset structure before uploading
    required_files = ['meta/info.json', 'meta/stats.json', 'meta/tasks.parquet']
    for file in required_files:
        if not os.path.exists(os.path.join(local_dir, file)):
            raise FileNotFoundError(f"Required file {file} not found")

    # Create the dataset repository (if needed) and upload the folder
    api = HfApi()
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="dataset")
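As a usage note, the function above can be called directly with the same path and repo ID used in the upload command earlier (expanding the home directory, since os.path.exists does not do that on its own):

# Example invocation with the path and repo ID from the upload command above.
upload_dataset(
    local_dir=os.path.expanduser("~/.cache/huggingface/lerobot/rubbotix/striped-block"),
    repo_id="sparkmt/so100-striped-block",
)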
Results and Validation
Training Progress
After resuming training from 2000 to 10,000 steps:
- Significant MSE improvement: From ~24 at 2,000 steps to ~6.3 at 10,000 steps (the metric itself is sketched after this list)
- Loss continued to decrease: Model learned more complex patterns
- Action magnitudes increased: Actions became more decisive
- Task structure emerged: Model learned the complete manipulation sequence
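For reference, the MSE quoted here is an open-loop metric: predicted actions are compared against the recorded actions frame by frame. A minimal sketch of that idea, with dummy data standing in for real predictions (this illustrates the metric, not the actual GR00T evaluation code):

# Open-loop MSE sketch: mean squared error between predicted and recorded actions.
# The arrays below are dummy data; in practice they would come from running the
# policy over held-out frames of the dataset.
import numpy as np

def open_loop_mse(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    assert predicted.shape == ground_truth.shape
    return float(np.mean((predicted - ground_truth) ** 2))

rng = np.random.default_rng(0)
ground_truth = rng.normal(size=(200, 6))                          # recorded actions
predicted = ground_truth + rng.normal(scale=0.5, size=(200, 6))   # noisy predictions
print(f"MSE: {open_loop_mse(predicted, ground_truth):.2f}")       # ~0.25 for this noise level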
Deployment Results
With the extended training at 10,000 steps:
- Task execution achieved: Robot now performs the complete sequence (approach → open → grasp → lift → move → release)
- Mixed joint performance: Some joints (1, 2, 3, and 5) showed accurate predictions matching ground truth, while others (joints 0 and 4) had less precise control
- Execution challenges: Task completion takes 3+ minutes with multiple retries due to shaky movements
- No more twitching: Robot executes purposeful movements instead of oscillating in place
Performance Assessment
The model demonstrates partial success:
- ✅ Complete task sequence understanding
- ✅ Elimination of twitching behavior
- ⚠️ Uneven accuracy across different joints
- ⚠️ Execution speed and precision need improvement
- ⏳ Further iteration required for reliable performance
Technical Insights
1. Training Duration is Critical
- 2,000 steps = only surface-level correlations between observations and actions (MSE ~24, twitching behavior)
- 10,000 steps = learned task structure (MSE ~6.3, complete sequence execution)
- Manipulation tasks require significantly more training than simple reaching
- Even at 10,000 steps, performance varies across joints, suggesting more training may be beneficial
2. Logging is Essential for Debugging
- Without detailed logs, impossible to diagnose model-robot mismatch
- Action statistics (min/max/mean) reveal model confidence levels
- State vs. action comparison shows tracking performance
3. Visualization Tools are Invaluable
- Dataset visualization revealed data quality and action ranges
- State vs. Action plots diagnosed tracking issues (a minimal plotting sketch follows this list)
- Essential for understanding model behavior
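As an example of what such a plot looks like in code, here is a minimal State vs. Action comparison for a single joint with matplotlib, assuming the two signals have been exported as arrays (the .npy file names are placeholders; in the project the comparison was done through Rerun):

# State vs. Action plot for one joint (sketch). The .npy file names are
# placeholders; the project itself used Rerun for this comparison.
import matplotlib.pyplot as plt
import numpy as np

state = np.load("state_joint1.npy")     # robot's measured position over time
action = np.load("action_joint1.npy")   # commanded target position over time

t = np.arange(len(state))
plt.plot(t, state, label="State (measured)")
plt.plot(t, action, label="Action (commanded)", linestyle="--")
plt.xlabel("Frame")
plt.ylabel("Joint position")
plt.title("State vs. Action, joint 1")
plt.legend()
plt.tight_layout()
plt.show()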
Current Status
- Extended training completed (2000 → 10,000 steps)
- MSE improved from ~24 to ~6.3 (74% improvement)
- Robot deployment shows partial success with complete task sequence execution
- Performance varies across joints with some showing accurate control while others need improvement
- Comprehensive debugging infrastructure in place
- Dataset published to Hugging Face Hub: sparkmt/so100-striped-block
Summary
This debugging session demonstrated that what appeared to be a complex hardware or software integration issue was actually a fundamental training problem. The “twitching” behavior was caused by an undertrained model that hadn’t learned the complete task structure.
The systematic debugging approach using enhanced logging, diagnostic tools, and dataset visualization was crucial for identifying the root cause. The solution required extending training from 2000 to 10,000 steps, resulting in a 74% improvement in MSE (from ~24 to ~6.3) and enabling the robot to execute the complete pick-and-place sequence.
While the model now performs the full task (approach → open → grasp → lift → move → release), execution remains slow and imprecise, with uneven performance across different joints. This suggests that further data collection and training iterations will be needed to achieve reliable, smooth manipulation.
The project demonstrates the iterative nature of robotic AI development and the importance of adequate training duration for manipulation tasks. The debugging infrastructure and systematic approach provide a foundation for continued improvement.
Next: Collecting additional training episodes and exploring Isaac Sim integration for synthetic data generation.
Model: NVIDIA GR00T N1.5 (3B parameters)
Training Method: LoRA fine-tuning (extended to 10,000 steps)
Hardware: SO-100 Robot Arm, RTX 4080 Super
Framework: Isaac-GR00T + LeRobot
Vipin M