Project Overview
I worked on debugging a puzzling issue where a fine-tuned NVIDIA GR00T N1.5 model was causing an SO-100 robotic arm to “twitch” instead of performing pick-and-place tasks. The robot would make tiny oscillating movements around the same position, with the gripper staying completely unresponsive.
This project log documents the systematic debugging process that revealed the root cause: an undertrained model that needed significantly more training steps to learn the complete task sequence.
Hardware Setup
- Robot: SO-100 robotic arm (6 DOF: 5 arm joints + 1 gripper)
- Cameras: Dual camera system (scene + wrist) at 640x480, 30fps
- GPU: RTX 4080 Super with 16GB VRAM
- Dataset: 20 episodes of pick-and-place demonstrations
- Task: “Pick up the striped block and put it into the white plate”
The Problem: Robot Twitching
Initial Symptoms
When deploying the trained GR00T model:
- Robot connected successfully
- Model inference server running correctly
- Robot made tiny oscillating movements around the same position
- Robot was not executing the intended pick-and-place task
The model had been trained for 2,000 steps and its training loss appeared to be converging, but the physical deployment was completely unsuccessful.
Debugging Approach
Step 1: Enhanced Logging Implementation
Added comprehensive logging to both the inference server and robot client to understand what data was being exchanged.
Server-Side Logging (service.py):
- Request counter for each inference call
- Input data keys and shapes
- Inference time in milliseconds
- Output action statistics (min/max/mean values)
Client-Side Logging (eval_lerobot.py):
- Step counter and observation keys
- Current robot state (all 6 joints)
- Received action chunks from server
- First action being sent to robot
Example Output:
[Request #1] Endpoint: get_action
  Inference time: 75.23ms
  Response keys: ['action.single_arm', 'action.gripper']
  action.single_arm: shape=(16, 5), min=-45.23, max=67.89, mean=12.34
  action.gripper: shape=(16, 1), min=-0.30, max=0.50, mean=0.15
[CLIENT] First action to send to robot:
  shoulder_pan.pos: -12.34
Step 2: Diagnostic Tools Development
Created several diagnostic scripts to isolate the issue (a rough sketch of the joint tester follows this list):
Joint Testing Tool (test_joint.py):
- Tests individual joint control to verify hardware functionality
- Takes joint number (1-6) and value (-100 to 100) as input
- Helps isolate hardware vs. software issues
Robot State Monitor (monitor_robot_state.py):
- Real-time monitoring of robot joint positions
- Verifies encoder readings match values sent to server
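For illustration, here is a rough outline of what a joint-test tool like test_joint.py can look like. The joint key names and the commented-out robot calls are assumptions on my part (the actual LeRobot connection code is omitted), so treat this as a sketch rather than the real script:

# test_joint_sketch.py -- hypothetical outline of a single-joint test tool.
# The joint key names and the robot interface are assumptions, not the real script.
import argparse

# Assumed joint ordering for the SO-100 (5 arm joints + gripper).
JOINT_KEYS = [
    "shoulder_pan.pos", "shoulder_lift.pos", "elbow_flex.pos",
    "wrist_flex.pos", "wrist_roll.pos", "gripper.pos",
]

def main():
    parser = argparse.ArgumentParser(description="Command a single SO-100 joint to a target value")
    parser.add_argument("joint", type=int, choices=range(1, 7), help="Joint number (1-6)")
    parser.add_argument("value", type=float, help="Target value in the -100..100 range")
    args = parser.parse_args()

    if not -100 <= args.value <= 100:
        raise ValueError("Value must be between -100 and 100")

    # Only the selected joint gets a new target.
    action = {JOINT_KEYS[args.joint - 1]: args.value}
    print(f"Would send action: {action}")

    # robot = ...                # connect to the SO-100 via LeRobot (omitted here)
    # robot.send_action(action)  # assumed call for sending the command

if __name__ == "__main__":
    main()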
Step 3: Dataset Visualization
Uploaded the dataset to Hugging Face Hub and used Rerun visualization to inspect the recorded episodes:
# Upload dataset for analysis
python scripts/so100_groot/upload_to_huggingface.py \
    --local-dir ~/.cache/huggingface/lerobot/rubbotix/striped-block \
    --repo-id sparkmt/so100-striped-block

# Visualize episodes
./scripts/so100_groot/visualize_episodes.sh 0
This revealed the difference between State (robot’s actual position) and Action (commanded target position), which was crucial for diagnosis.
Critical Discovery: The Root Cause
Key Finding from Logs
The logs showed that the model was outputting actions with very small magnitudes: instead of decisive movements, the robot was commanded to make tiny, hesitant position changes, which points to high model uncertainty.
The Root Cause: Undertrained Model
Analysis revealed that the model was severely undertrained at 2000 steps.
Evidence:
- Tiny action magnitudes: Model outputting very small actions due to high uncertainty
- Lack of task structure understanding: Model hadn’t learned the full sequence (approach → grasp → lift → move → release)
- Closed-loop instability: Small errors accumulating, causing the robot to end up in states the model never saw during training
The Solution: Extended Training
Training Requirements Analysis
| Task Complexity | Minimum Steps | Recommended Steps |
|---|---|---|
| Simple reaching | 1,000-2,000 | 5,000 |
| Pick and place | 5,000-10,000 | 10,000-20,000 |
| Complex manipulation | 10,000-20,000 | 20,000-50,000 |
The pick-and-place task required 10,000-20,000 steps, not the 2000 steps initially used.
Training Configuration Update
Updated the training script to resume from checkpoint-2000 and continue to 10,000 steps:
# Resume training configuration
RESUME_TRAINING="true"
MAX_STEPS=10000               # Increased from 2000
BATCH_SIZE=16
GRADIENT_ACCUMULATION_STEPS=8
LORA_RANK=32
LORA_ALPHA=64
Automatic Checkpoint Detection:
if [ "$RESUME_TRAINING" = "true" ]; then if ls "$OUTPUT_DIR"/checkpoint-* 1> /dev/null 2>&1; then LATEST_CHECKPOINT=$(ls -td "$OUTPUT_DIR"/checkpoint-* | head -1) echo "Resuming from latest checkpoint: ${LATEST_CHECKPOINT}" TRAIN_CMD="$TRAIN_CMD --resume" fi
fi
Why 2000 Steps Was Insufficient
1. Model Hadn’t Learned Task Structure
- At 2000 steps: Learning basic correlations between observations and actions
- Missing: Full sequence understanding of the manipulation task
2. Action Magnitude Learning
From the deployment logs at 2000 steps, the model was outputting very small actions (a quick check for this is sketched after the list) because:
- Hadn’t learned correct action scale
- Being overly cautious due to high uncertainty
- Loss function hadn’t fully converged
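One quick way to quantify this symptom is to compare the spread of the model's predicted actions with the spread of the actions recorded in the demonstrations. A minimal sketch, assuming both have been exported as numpy arrays (the .npy file names are placeholders, not files produced by the actual scripts):

# Compare predicted action spread vs. dataset action spread (sketch only;
# the .npy file names are placeholders for however the arrays were exported).
import numpy as np

predicted = np.load("predicted_actions.npy")   # shape (N, 6): logged model outputs
recorded = np.load("dataset_actions.npy")      # shape (M, 6): actions from the demos

for j in range(predicted.shape[1]):
    pred_std = predicted[:, j].std()
    data_std = recorded[:, j].std()
    ratio = pred_std / (data_std + 1e-8)
    # An undertrained policy tends to show ratios well below 1: it moves far
    # less than the demonstrations did for comparable observations.
    print(f"joint {j}: predicted std {pred_std:6.2f}  dataset std {data_std:6.2f}  ratio {ratio:4.2f}")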
3. Closed-Loop Instability
- Small errors accumulate: Undertrained model makes uncertain movements
- Compounding problem: Robot ends up in states model never saw during training
- Result: Model gets “confused” and twitches in place (illustrated by the toy sketch below)
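To make the accumulation argument concrete, here is a toy 1-D sketch (purely illustrative; it is not the robot's dynamics or the GR00T policy). A "policy" that replays the per-step moves it saw in training, plus a small prediction error each step, drifts steadily away from the demonstrated trajectory and ends up in states the demonstrations never covered:

# Toy 1-D illustration of closed-loop drift (not the real robot or policy).
import numpy as np

rng = np.random.default_rng(0)

demo_trajectory = np.linspace(0.0, 1.0, 50)    # demonstrated positions
demo_deltas = np.diff(demo_trajectory)         # per-step moves seen in training

state = demo_trajectory[0]
drift = []
for i, delta in enumerate(demo_deltas):
    predicted = delta + 0.02 * rng.standard_normal()   # small per-step prediction error
    state += predicted                                 # the error is now baked into the state
    drift.append(abs(state - demo_trajectory[i + 1]))

print(f"final drift after {len(demo_deltas)} steps: {drift[-1]:.3f} "
      f"(vs. per-step error of ~0.02)")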
Technical Implementation Details
Enhanced Logging Code
Server-side logging addition:
logger.info(f"[Request #{request_counter}] Endpoint: {endpoint}")
logger.info(f" Data keys: {list(data.keys())}")
logger.info(f" Inference time: {inference_time:.2f}ms")
for key, value in response.items():
    if isinstance(value, np.ndarray):
        logger.info(f" {key}: shape={value.shape}, min={value.min():.2f}, max={value.max():.2f}, mean={value.mean():.2f}")
Client-side logging addition:
logger.info(f"[STEP {step_count}] Getting observation...")
logger.info(f" Current robot state:")
for key, value in current_state.items():
    logger.info(f" {key}: {value:.2f}")
logger.info(f"[CLIENT] First action to send to robot:")
for key, value in first_action.items():
    logger.info(f" {key}: {value:.2f}")
Dataset Upload and Visualization
Created tools for dataset management and analysis:
# Upload script for Hugging Face Hub
import os

from huggingface_hub import HfApi


def upload_dataset(local_dir, repo_id):
    # Validate dataset structure before uploading
    required_files = ['meta/info.json', 'meta/stats.json', 'meta/tasks.parquet']
    for file in required_files:
        if not os.path.exists(os.path.join(local_dir, file)):
            raise FileNotFoundError(f"Required file {file} not found")

    # Create the dataset repository (if needed) and upload the folder
    api = HfApi()
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="dataset")
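As a usage note, the function above can be called directly with the same path and repo ID used in the upload command earlier (expanding the home directory, since os.path.exists does not do that on its own):

# Example invocation with the path and repo ID from the upload command above.
upload_dataset(
    local_dir=os.path.expanduser("~/.cache/huggingface/lerobot/rubbotix/striped-block"),
    repo_id="sparkmt/so100-striped-block",
)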
Results and Validation
Training Progress
After resuming training from 2000 to 10,000 steps:
- Significant MSE improvement: From ~24 at 2,000 steps to ~6.3 at 10,000 steps (the metric itself is sketched after this list)
- Loss continued to decrease: Model learned more complex patterns
- Action magnitudes increased: Actions became more decisive
- Task structure emerged: Model learned the complete manipulation sequence
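For reference, the MSE quoted here is an open-loop metric: predicted actions are compared against the recorded actions frame by frame. A minimal sketch of that idea, with dummy data standing in for real predictions (this illustrates the metric, not the actual GR00T evaluation code):

# Open-loop MSE sketch: mean squared error between predicted and recorded actions.
# The arrays below are dummy data; in practice they would come from running the
# policy over held-out frames of the dataset.
import numpy as np

def open_loop_mse(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    assert predicted.shape == ground_truth.shape
    return float(np.mean((predicted - ground_truth) ** 2))

rng = np.random.default_rng(0)
ground_truth = rng.normal(size=(200, 6))                          # recorded actions
predicted = ground_truth + rng.normal(scale=0.5, size=(200, 6))   # noisy predictions
print(f"MSE: {open_loop_mse(predicted, ground_truth):.2f}")       # ~0.25 for this noise level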
Deployment Results
With the extended training at 10,000 steps:
- Task execution achieved: Robot now performs the complete sequence (approach → open → grasp → lift → move → release)
- Mixed joint performance: Some joints (1, 2, 3, and 5) showed accurate predictions matching ground truth, while others (joints 0 and 4) had less precise control
- Execution challenges: Task completion takes 3+ minutes with multiple retries due to shaky movements
- No more twitching: Robot executes purposeful movements instead of oscillating in place
Performance Assessment
The model demonstrates partial success:
- ✅ Complete task sequence understanding
- ✅ Elimination of twitching behavior
- ⚠️ Uneven accuracy across different joints
- ⚠️ Execution speed and precision need improvement
- ⏳ Further iteration required for reliable performance
Technical Insights
1. Training Duration is Critical
- 2,000 steps = only surface-level correlations between observations and actions (MSE ~24, twitching behavior)
- 10,000 steps = learned task structure (MSE ~6.3, complete sequence execution)
- Manipulation tasks require significantly more training than simple reaching
- Even at 10,000 steps, performance varies across joints, suggesting more training may be beneficial
2. Logging is Essential for Debugging
- Without detailed logs, impossible to diagnose model-robot mismatch
- Action statistics (min/max/mean) reveal model confidence levels
- State vs. action comparison shows tracking performance
3. Visualization Tools are Invaluable
- Dataset visualization revealed data quality and action ranges
- State vs. Action plots diagnosed tracking issues (a minimal plotting sketch follows this list)
- Essential for understanding model behavior
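As an example of what such a plot looks like in code, here is a minimal State vs. Action comparison for a single joint with matplotlib, assuming the two signals have been exported as arrays (the .npy file names are placeholders; in the project the comparison was done through Rerun):

# State vs. Action plot for one joint (sketch). The .npy file names are
# placeholders; the project itself used Rerun for this comparison.
import matplotlib.pyplot as plt
import numpy as np

state = np.load("state_joint1.npy")     # robot's measured position over time
action = np.load("action_joint1.npy")   # commanded target position over time

t = np.arange(len(state))
plt.plot(t, state, label="State (measured)")
plt.plot(t, action, label="Action (commanded)", linestyle="--")
plt.xlabel("Frame")
plt.ylabel("Joint position")
plt.title("State vs. Action, joint 1")
plt.legend()
plt.tight_layout()
plt.show()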
Current Status
- Extended training completed (2000 → 10,000 steps)
- MSE improved from ~24 to ~6.3 (74% improvement)
- Robot deployment shows partial success with complete task sequence execution
- Performance varies across joints with some showing accurate control while others need improvement
- Comprehensive debugging infrastructure in place
- Dataset published to Hugging Face Hub: sparkmt/so100-striped-block
Summary
This debugging session demonstrated that what appeared to be a complex hardware or software integration issue was actually a fundamental training problem. The “twitching” behavior was caused by an undertrained model that hadn’t learned the complete task structure.
The systematic debugging approach using enhanced logging, diagnostic tools, and dataset visualization was crucial for identifying the root cause. The solution required extending training from 2000 to 10,000 steps, resulting in a 74% improvement in MSE (from ~24 to ~6.3) and enabling the robot to execute the complete pick-and-place sequence.
While the model now performs the full task (approach → open → grasp → lift → move → release), execution remains slow and imprecise, with uneven performance across different joints. This suggests that further data collection and training iterations will be needed to achieve reliable, smooth manipulation.
The project demonstrates the iterative nature of robotic AI development and the importance of adequate training duration for manipulation tasks. The debugging infrastructure and systematic approach provide a foundation for continued improvement.
Next: Collecting additional training episodes and exploring Isaac Sim integration for synthetic data generation.
Model: NVIDIA GR00T N1.5 (3B parameters)
Training Method: LoRA fine-tuning (extended to 10,000 steps)
Hardware: SO-100 Robot Arm, RTX 4080 Super
Framework: Isaac-GR00T + LeRobot
Vipin M