Data engineering copilot for robot imitation learning datasets
ORBIT — Know If Your Robot Data Will Train Before You Burn GPU Hours
You collected 200 demonstrations. You trained for 12 hours. The robot doesn't move. Was it the data? The hyperparameters? A dead servo you didn't notice?
ORBIT tells you in 10 seconds.
pip install orbit-robotics
orbit analyze lerobot/pusht
ORBIT Analysis: lerobot/pusht
206 episodes · DIFFUSION_POLICY
Grade: A (98/100) — Ready to train — expect strong results
Similar datasets trained at: 63%, 84%, 91%, 95%, 100% (5 nearest matches)
1 issue found:
! Jerk varies across episodes (CV=0.57) — some demos are much jerkier
Run with --detail for full diagnostics
No GPU. No API keys. No setup. Just answers.
The Problem
Every robotics lab has hit this: you collect demos, start training, wait hours — and the policy fails. You don't know if you need more data, better data, or different hyperparameters. There's no tool that checks your data quality before training.
ORBIT is that tool. It catches dead joints, contradictory demonstrations, inconsistent episodes, poor coverage, and clipping — the silent killers of robot policy training. Grades are calibrated against 82 real training runs with known success rates, so when ORBIT says "A", it means datasets like yours actually worked.
What You Get
Quality Grade (A through F)
A single, calibrated score. Not a vague "looks okay" — a grade backed by 82 ground-truth training outcomes from published results across ACT, Diffusion Policy, BC, and more.
- A (85+) — Ready to train. Datasets like yours succeed.
- B (72-84) — Good data with minor issues. Should train well.
- C (58-71) — Has problems. Clean your data first or expect poor results.
- D (40-57) — Significant issues. Collect more or better demonstrations.
- F (below 40) — Critical problems. Don't waste compute on this.
12 Diagnostic Checks
Every analysis runs these automatically:
| Check | What it catches |
|---|---|
| Dead joint detection | Servos that never move — they waste model capacity and can mask hardware failures |
| Action divergence | Same state, different actions — directly confuses the policy |
| Joint clipping | Joints hitting mechanical limits in >10% of frames |
| Episode consistency | Wild variation in demo length, speed, or strategy |
| Outlier episodes | Demos that are statistically different from the rest |
| Workspace coverage | Whether demos actually cover the task space |
| Temporal alignment | State-action lag that causes the policy to learn the wrong timing |
| Directional bias | Joints that only move one way (ignoring grippers, where this is normal) |
| Smoothness analysis | Jerk and curvature variation across episodes |
| Policy fit scoring | How well your data matches ACT vs Diffusion Policy vs BC vs SmolVLA |
| Episode count validation | Whether you have enough demos for your chosen policy |
| Scaling advice | How many more episodes you actually need |
Benchmark Comparison
ORBIT knows what worked. It compares your dataset against 82 validated training runs and shows you the nearest matches:
Similar datasets trained at: 63%, 84%, 91%, 95%, 100% (5 nearest matches)
Closest match: Push-T (state) — 206 episodes, 91% success (diffusion_policy)
Ready-to-Run Training Commands
Don't guess hyperparameters. ORBIT picks the best policy for your data and gives you a command you can copy-paste:
orbit suggest lerobot/my-dataset
Recommended: Diffusion Policy (fit: 0.90)
Copy and run:
lerobot-train \
--dataset.repo_id=lerobot/my-dataset \
--policy.type=diffusion_policy \
--batch_size=32 \
--steps=500000 \
...
Training tips:
- Loss should drop below 0.1 by step 100000
- If loss plateaus above 0.2: your demonstrations may be too inconsistent
Automatic Cleaning
Bad episodes drag your whole training down. ORBIT finds them and removes them:
orbit clean lerobot/my-dataset # Remove bad episodes
orbit fix lerobot/my-dataset # Analyze + clean + suggest, one shot
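A typical loop is clean, then re-grade to confirm the score improved. The sketch below assumes the cleaned dataset stays addressable under the same repo id; if your version of `orbit clean` writes the result to a separate copy, point the second command at that output instead. `orbit improve` (see the command reference below) wraps the same clean-and-re-score loop into one command.
# Sketch: clean, then re-grade in quiet mode to verify the improvement.
# Assumes the cleaned data is still reachable as lerobot/my-dataset.
orbit clean lerobot/my-dataset
orbit -q analyze lerobot/my-dataset   # prints just the grade, e.g. "A (98/100)"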
CI/CD Quality Gate
Block bad data from entering your training pipeline:
orbit gate lerobot/my-dataset --policy act --min-grade B
# Exit code 0 = pass, 1 = fail
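In practice, a pre-training step in a CI job or Makefile can rely on that exit code to stop the pipeline before any GPU time is spent. A minimal sketch, assuming a bash CI step; the training command is illustrative, so substitute whatever `orbit suggest` prints for your dataset:
# CI sketch: refuse to train if the dataset grades below B for ACT.
# With set -e, a non-zero exit from orbit gate aborts the job immediately.
set -e
orbit gate lerobot/my-dataset --policy act --min-grade B
# Reached only if the gate passed; the training command below is illustrative.
lerobot-train --dataset.repo_id=lerobot/my-dataset --policy.type=act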
Full Command Reference
Core Workflow
| Command | What it does |
|---|---|
| `orbit analyze <dataset>` | Full quality analysis — grade, diagnostics, recommendations |
| `orbit suggest <dataset>` | Best policy + ready-to-run training command with tuned hyperparameters |
| `orbit clean <dataset>` | Find and remove bad episodes automatically |
| `orbit fix <dataset>` | Analyze, clean, and suggest — one command does it all |
| `orbit compare <a> <b>` | Side-by-side comparison of two datasets |
Discovery & Benchmarking
| Command | What it does |
|---|---|
| `orbit explore` | Browse and discover LeRobot datasets on HuggingFace |
| `orbit benchmark <task>` | Search 82 validated results by task, policy, or hardware |
| `orbit leaderboard` | Quality leaderboard of scored robotics datasets |
Training Pipeline
| Command | What it does |
|---|---|
| `orbit gate <dataset>` | CI/CD quality gate — pass/fail with exit codes |
| `orbit train <dataset>` | Full pipeline: gate, train, evaluate |
| `orbit monitor` | Watch a training run in real time |
| `orbit debug` | Diagnose a failed or underperforming training run |
| `orbit verify` | Compare training outcome against predicted quality |
| `orbit report` | Save training results locally and to the dashboard |
Data Engineering
| Command | What it does |
|---|---|
| `orbit curate <dataset>` | Select the best episodes from a dataset |
| `orbit improve <dataset>` | Clean bad episodes and prove the score went up |
| `orbit convert <dataset>` | Convert from any format to LeRobot v3 |
| `orbit plan` | Plan a data collection strategy |
| `orbit coach` | Real-time guidance during data collection |
| `orbit badge <dataset>` | Generate a shields.io quality badge for your dataset card |
Setup & Config
| Command | What it does |
|---|---|
| `orbit doctor` | Check environment health — Python, deps, AI providers |
| `orbit setup-ai` | Check and configure AI providers (Ollama, Gemini, OpenAI) |
| `orbit quickstart` | Get started in 30 seconds |
| `orbit init` | Scaffold a training project with Makefile and config |
| `orbit assist` | Interactive AI troubleshooter — ask questions about your data |
Analyze Options
# Basics
orbit analyze lerobot/my-dataset # Quick analysis (samples 50 episodes)
orbit analyze lerobot/my-dataset --full # Analyze every episode
orbit analyze lerobot/my-dataset --episodes 100 # Analyze exactly 100 episodes
orbit analyze lerobot/my-dataset --detail # Full diagnostic report with all sections
orbit -q analyze lerobot/my-dataset # Quiet: just "A (98/100)" — for scripts
# Policy-specific
orbit analyze lerobot/my-dataset --policy act # Check fit for ACT
orbit analyze lerobot/my-dataset --policy diffusion_policy
orbit analyze lerobot/my-dataset --policy smolvla
# AI-powered (optional — works without, better with)
orbit analyze lerobot/my-dataset --deep # LLM diagnosis with specific fix instructions
orbit analyze lerobot/my-dataset --ai # AI second opinion on A grades
orbit analyze lerobot/my-dataset --vlm # Vision-language model assessment of frames
orbit analyze lerobot/my-dataset --proxy # Train a quick BC model as ground truth signal
# Output
orbit analyze lerobot/my-dataset --json # Machine-readable JSON for pipelines
orbit analyze lerobot/my-dataset --json | jq '.readiness.grade'
# Local files
orbit analyze ./my-local-data/ # Local LeRobot directory
orbit analyze ./data.hdf5 --format hdf5 # HDF5 file (RoboMimic, robosuite)
orbit analyze ./recording.bag --format rosbag # ROS bag file
orbit analyze ./data/ --format rlds # RLDS TFRecord dataset
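The `--json` and `-q` modes are the ones to reach for in scripts. As a sketch, the loop below grades a few datasets and prints one line per dataset; the dataset ids are placeholders, and the only JSON field assumed is the `.readiness.grade` path shown above:
# Sketch: batch-grade several datasets (ids are placeholders).
# Relies only on the documented --json output and the .readiness.grade field.
for ds in lerobot/pusht lerobot/my-dataset; do
  grade=$(orbit analyze "$ds" --json | jq -r '.readiness.grade')
  echo "$ds: $grade"
done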
Supported Data Formats
| Format | How to use | Common sources |
|---|---|---|
| LeRobot (HuggingFace) | `orbit analyze lerobot/pusht` | Any dataset on HuggingFace Hub |
| LeRobot (local) | `orbit analyze ./my-data/` | Local LeRobot recordings |
| HDF5 | `orbit analyze data.hdf5 --format hdf5` | RoboMimic, robosuite, custom |
| RLDS | `orbit analyze ./data/ --format rlds` | TFRecord datasets (needs `pip install orbit-robotics[rlds]`) |
| ROS bags | `orbit analyze rec.bag --format rosbag` | .bag and .mcap files (needs `pip install orbit-robotics[rosbag]`) |
ORBIT auto-detects the format. Use --format to override if needed.
AI Features (Optional)
The core analysis — grading, diagnostics, all 12 checks — works fully without any AI or API keys. AI adds deeper natural-language diagnosis on top.
Local AI with Ollama (Free, Private)
# One-time setup
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4
# Now these just work — ORBIT auto-detects Ollama
orbit analyze lerobot/my-dataset --deep # "Your action divergence is high because..."
orbit assist # "Why is my policy not learning?" → specific answers
Your data never leaves your machine. No API keys. No cost.
Cloud AI (Gemini / OpenAI)
export GOOGLE_API_KEY=your-key # Gemini (~$0.001 per analysis)
# or
export OPENAI_API_KEY=your-key # OpenAI (~$0.01 per analysis)
ORBIT auto-detects: Ollama (local) > Gemini > OpenAI. Check your setup:
orbit setup-ai
Override:
orbit auth set ai-provider ollama # Force a specific provider
orbit auth set ai-model gemma4 # Force a specific model
How Grading Works
ORBIT doesn't guess. Every grade is calibrated against real outcomes.
We collected 82 training runs from published papers and community results — datasets where we know the actual success rate of the trained policy. ORBIT's grading formula was tuned to match: datasets that trained successfully get A's, datasets that failed get D's and F's.
When ORBIT shows "Similar datasets trained at: 63%, 84%, 91%", those are real results from real papers. When it says "Grade B — should train well", that's based on what actually happened to similar data.
What each grade means in practice
| Grade | What to expect | What to do |
|---|---|---|
| A | Your data is solid. Policies trained on similar datasets succeeded. | Start training. |
| B | Minor issues detected, but datasets like this usually train fine. | Train, but review the flagged issues if results disappoint. |
| C | Real problems found. Training might work but expect lower success rates. | Run orbit clean first. Fix flagged issues. Then re-analyze. |
| D | Significant quality problems. Training is likely to fail or underperform. | Collect more data, fix hardware issues, or change your collection strategy. |
| F | Critical failures — dead joints, extreme divergence, broken data. | Don't train on this. Fix the root cause first. |
Real-World Impact
We analyzed 55 popular datasets on HuggingFace. Findings:
- Only 46% were ready to train (Grade A). 19% had significant problems.
- 75% had action divergence — the #1 silent training killer.
- `stanford_kuka_multimodal` has 3,000 episodes but scores Grade D (43/100) — 3 dead joints and 50% outlier episodes. More data doesn't mean better data.
- 7 out of 15 community datasets failed to even load — 81,000+ downloads on broken data.
A 10-second orbit analyze would have caught every one of these issues.
Installation
pip install orbit-robotics
That's it. Everything works out of the box.
Optional extras
pip install orbit-robotics[vlm] # Vision-language model features
pip install orbit-robotics[rlds] # RLDS/TFRecord format support
pip install orbit-robotics[rosbag] # ROS bag format support
pip install orbit-robotics[all] # Everything
Requirements
- Python 3.10+
- No GPU needed
- No API keys needed (AI features are optional enhancements)
What's New in v0.6.0
- Local AI via Ollama — `--deep`, `--ai`, and `orbit assist` now work locally with Gemma 4. No API keys, no cost, no data leaves your machine.
- Multi-provider AI — auto-detects Ollama > Gemini > OpenAI. Configure with `orbit auth set ai-provider`.
- `orbit setup-ai` — check and configure AI providers in one command.
- Grading accuracy overhaul — task difficulty adjustment from ground truth, policy-specific divergence penalties (BC penalized more than Diffusion Policy), calibrated summaries that tell you when a task is inherently hard vs. when your data is bad.
- Nearest-match display — shows actual success rates from similar datasets instead of abstract confidence intervals.
- Sampling warnings — tells you when you're analyzing too few episodes for a reliable grade.
- Smarter `orbit doctor` — checks Ollama, Gemini, and OpenAI availability alongside all dependencies.
License
MIT