Data engineering copilot for robot imitation learning datasets

ORBIT — Know If Your Robot Data Will Train Before You Burn GPU Hours

You collected 200 demonstrations. You trained for 12 hours. The robot doesn't move. Was it the data? The hyperparameters? A dead servo you didn't notice?

ORBIT tells you in 10 seconds.

pip install orbit-robotics
orbit analyze lerobot/pusht

ORBIT Analysis: lerobot/pusht
206 episodes · DIFFUSION_POLICY

  Grade: A (98/100) — Ready to train — expect strong results
  Similar datasets trained at: 63%, 84%, 91%, 95%, 100% (5 nearest matches)

  1 issue found:
    ! Jerk varies across episodes (CV=0.57) — some demos are much jerkier

  Run with --detail for full diagnostics

No GPU. No API keys. No setup. Just answers.


The Problem

Every robotics lab has hit this: you collect demos, start training, wait hours — and the policy fails. You don't know if you need more data, better data, or different hyperparameters. There's no tool that checks your data quality before training.

ORBIT is that tool. It catches dead joints, contradictory demonstrations, inconsistent episodes, poor coverage, and clipping — the silent killers of robot policy training. Grades are calibrated against 82 real training runs with known success rates, so when ORBIT says "A", it means datasets like yours actually worked.

What You Get

Quality Grade (A through F)

A single, calibrated score. Not a vague "looks okay" — a grade backed by 82 ground-truth training outcomes from published results across ACT, Diffusion Policy, BC, and more.

  • A (85+) — Ready to train. Datasets like yours succeed.
  • B (72-84) — Good data with minor issues. Should train well.
  • C (58-71) — Has problems. Clean your data first or expect poor results.
  • D (40-57) — Significant issues. Collect more or better demonstrations.
  • F (below 40) — Critical problems. Don't waste compute on this.

12 Diagnostic Checks

Every analysis runs these automatically:

Check                      What it catches
Dead joint detection       Servos that never move, wasting model capacity and masking hardware failures
Action divergence          Same state, different actions — directly confuses the policy
Joint clipping             Joints hitting mechanical limits in >10% of frames
Episode consistency        Wild variation in demo length, speed, or strategy
Outlier episodes           Demos that are statistically different from the rest
Workspace coverage         Whether demos actually cover the task space
Temporal alignment         State-action lag that causes the policy to learn the wrong timing
Directional bias           Joints that only move one way (ignoring grippers, where this is normal)
Smoothness analysis        Jerk and curvature variation across episodes
Policy fit scoring         How well your data matches ACT vs Diffusion Policy vs BC vs SmolVLA
Episode count validation   Whether you have enough demos for your chosen policy
Scaling advice             How many more episodes you actually need

Benchmark Comparison

ORBIT knows what worked. It compares your dataset against 82 validated training runs and shows you the nearest matches:

Similar datasets trained at: 63%, 84%, 91%, 95%, 100% (5 nearest matches)
Closest match: Push-T (state) — 206 episodes, 91% success (diffusion_policy)

Ready-to-Run Training Commands

Don't guess hyperparameters. ORBIT picks the best policy for your data and gives you a command you can copy-paste:

orbit suggest lerobot/my-dataset

Recommended: Diffusion Policy (fit: 0.90)

Copy and run:
lerobot-train \
  --dataset.repo_id=lerobot/my-dataset \
  --policy.type=diffusion_policy \
  --batch_size=32 \
  --steps=500000 \
  ...

Training tips:
  - Loss should drop below 0.1 by step 100000
  - If loss plateaus above 0.2: your demonstrations may be too inconsistent

Automatic Cleaning

Bad episodes drag your whole training down. ORBIT finds them and removes them:

orbit clean lerobot/my-dataset              # Remove bad episodes
orbit fix lerobot/my-dataset                # Analyze + clean + suggest, one shot
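
A typical loop with the commands above, sketched here (the dataset id is a placeholder and the grades in the comments are hypothetical): clean, then re-analyze to confirm the score actually went up.

orbit -q analyze lerobot/my-dataset    # e.g. "C (64/100)" before cleaning (hypothetical)
orbit clean lerobot/my-dataset         # drop the episodes dragging the score down
orbit -q analyze lerobot/my-dataset    # re-check; the grade should now be higher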

CI/CD Quality Gate

Block bad data from entering your training pipeline:

orbit gate lerobot/my-dataset --policy act --min-grade B
# Exit code 0 = pass, 1 = fail
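
Because the gate reports through its exit code, it slots directly into a shell CI step. A minimal sketch, reusing the command above:

if orbit gate lerobot/my-dataset --policy act --min-grade B; then
  echo "data gate passed; launching training"
else
  echo "data gate failed; fix the dataset before spending GPU hours" >&2
  exit 1
fi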

Full Command Reference

Core Workflow

Command                   What it does
orbit analyze <dataset>   Full quality analysis — grade, diagnostics, recommendations
orbit suggest <dataset>   Best policy + ready-to-run training command with tuned hyperparameters
orbit clean <dataset>     Find and remove bad episodes automatically
orbit fix <dataset>       Analyze, clean, and suggest — one command does it all
orbit compare <a> <b>     Side-by-side comparison of two datasets

Discovery & Benchmarking

Command                  What it does
orbit explore            Browse and discover LeRobot datasets on HuggingFace
orbit benchmark <task>   Search 82 validated results by task, policy, or hardware
orbit leaderboard        Quality leaderboard of scored robotics datasets

Training Pipeline

Command                 What it does
orbit gate <dataset>    CI/CD quality gate — pass/fail with exit codes
orbit train <dataset>   Full pipeline: gate, train, evaluate
orbit monitor           Watch a training run in real time
orbit debug             Diagnose a failed or underperforming training run
orbit verify            Compare training outcome against predicted quality
orbit report            Save training results locally and to the dashboard

Data Engineering

Command                   What it does
orbit curate <dataset>    Select the best episodes from a dataset
orbit improve <dataset>   Clean bad episodes and prove the score went up
orbit convert <dataset>   Convert from any format to LeRobot v3
orbit plan                Plan a data collection strategy
orbit coach               Real-time guidance during data collection
orbit badge <dataset>     Generate a shields.io quality badge for your dataset card

Setup & Config

Command            What it does
orbit doctor       Check environment health — Python, deps, AI providers
orbit setup-ai     Check and configure AI providers (Ollama, Gemini, OpenAI)
orbit quickstart   Get started in 30 seconds
orbit init         Scaffold a training project with Makefile and config
orbit assist       Interactive AI troubleshooter — ask questions about your data

Analyze Options

# Basics
orbit analyze lerobot/my-dataset                  # Quick analysis (samples 50 episodes)
orbit analyze lerobot/my-dataset --full           # Analyze every episode
orbit analyze lerobot/my-dataset --episodes 100   # Analyze exactly 100 episodes
orbit analyze lerobot/my-dataset --detail         # Full diagnostic report with all sections
orbit -q analyze lerobot/my-dataset               # Quiet: just "A (98/100)" — for scripts

# Policy-specific
orbit analyze lerobot/my-dataset --policy act     # Check fit for ACT
orbit analyze lerobot/my-dataset --policy diffusion_policy
orbit analyze lerobot/my-dataset --policy smolvla

# AI-powered (optional — works without, better with)
orbit analyze lerobot/my-dataset --deep           # LLM diagnosis with specific fix instructions
orbit analyze lerobot/my-dataset --ai             # AI second opinion on A grades
orbit analyze lerobot/my-dataset --vlm            # Vision-language model assessment of frames
orbit analyze lerobot/my-dataset --proxy          # Train a quick BC model as a ground-truth signal

# Output
orbit analyze lerobot/my-dataset --json           # Machine-readable JSON for pipelines
orbit analyze lerobot/my-dataset --json | jq '.readiness.grade'

# Local files
orbit analyze ./my-local-data/                    # Local LeRobot directory
orbit analyze ./data.hdf5 --format hdf5           # HDF5 file (RoboMimic, robosuite)
orbit analyze ./recording.bag --format rosbag     # ROS bag file
orbit analyze ./data/ --format rlds               # RLDS TFRecord dataset
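
For pipelines, the --json output pairs naturally with jq, as in the snippet above. A minimal sketch that stops a script on anything below B, assuming .readiness.grade holds the bare letter grade (the A/B threshold here is just an example):

grade=$(orbit analyze lerobot/my-dataset --json | jq -r '.readiness.grade')
case "$grade" in
  A|B) echo "grade $grade: proceeding to training" ;;
  *)   echo "grade $grade: stopping before training" >&2; exit 1 ;;
esac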

Supported Data Formats

Format                  How to use                              Common sources
LeRobot (HuggingFace)   orbit analyze lerobot/pusht             Any dataset on HuggingFace Hub
LeRobot (local)         orbit analyze ./my-data/                Local LeRobot recordings
HDF5                    orbit analyze data.hdf5 --format hdf5   RoboMimic, robosuite, custom
RLDS                    orbit analyze ./data/ --format rlds     TFRecord datasets (needs pip install orbit-robotics[rlds])
ROS bags                orbit analyze rec.bag --format rosbag   .bag and .mcap files (needs pip install orbit-robotics[rosbag])

ORBIT auto-detects the format. Use --format to override if needed.


AI Features (Optional)

The core analysis — grading, diagnostics, all 12 checks — works fully without any AI or API keys. AI adds deeper natural-language diagnosis on top.

Local AI with Ollama (Free, Private)

# One-time setup
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4

# Now these just work — ORBIT auto-detects Ollama
orbit analyze lerobot/my-dataset --deep     # "Your action divergence is high because..."
orbit assist                                # "Why is my policy not learning?" → specific answers

Your data never leaves your machine. No API keys. No cost.

Cloud AI (Gemini / OpenAI)

export GOOGLE_API_KEY=your-key              # Gemini (~$0.001 per analysis)
# or
export OPENAI_API_KEY=your-key              # OpenAI (~$0.01 per analysis)

ORBIT auto-detects: Ollama (local) > Gemini > OpenAI. Check your setup:

orbit setup-ai

Override:

orbit auth set ai-provider ollama           # Force a specific provider
orbit auth set ai-model gemma4              # Force a specific model

How Grading Works

ORBIT doesn't guess. Every grade is calibrated against real outcomes.

We collected 82 training runs from published papers and community results — datasets where we know the actual success rate of the trained policy. ORBIT's grading formula was tuned to match: datasets that trained successfully get A's, datasets that failed get D's and F's.

When ORBIT shows "Similar datasets trained at: 63%, 84%, 91%", those are real results from real papers. When it says "Grade B — should train well", that's based on what actually happened to similar data.

What each grade means in practice

  • A — Your data is solid. Policies trained on similar datasets succeeded. → Start training.
  • B — Minor issues detected, but datasets like this usually train fine. → Train, but review the flagged issues if results disappoint.
  • C — Real problems found. Training might work, but expect lower success rates. → Run orbit clean first, fix the flagged issues, then re-analyze.
  • D — Significant quality problems. Training is likely to fail or underperform. → Collect more data, fix hardware issues, or change your collection strategy.
  • F — Critical failures: dead joints, extreme divergence, broken data. → Don't train on this. Fix the root cause first.

Real-World Impact

We analyzed 55 popular datasets on HuggingFace. Findings:

  • Only 46% were ready to train (Grade A). 19% had significant problems.
  • 75% had action divergence — the #1 silent training killer.
  • stanford_kuka_multimodal has 3,000 episodes but scores Grade D (43/100) — 3 dead joints and 50% outlier episodes. More data doesn't mean better data.
  • 7 out of 15 community datasets failed to even load — 81,000+ downloads on broken data.

A 10-second orbit analyze would have caught every one of these issues.


Installation

pip install orbit-robotics

That's it. Everything works out of the box.

Optional extras

pip install orbit-robotics[vlm]       # Vision-language model features
pip install orbit-robotics[rlds]      # RLDS/TFRecord format support
pip install orbit-robotics[rosbag]    # ROS bag format support
pip install orbit-robotics[all]       # Everything

Requirements

  • Python 3.10+
  • No GPU needed
  • No API keys needed (AI features are optional enhancements)

What's New in v0.6.0

  • Local AI via Ollama — --deep, --ai, and orbit assist now work locally with Gemma 4. No API keys, no cost, no data leaves your machine.
  • Multi-provider AI — auto-detects Ollama > Gemini > OpenAI. Configure with orbit auth set ai-provider.
  • orbit setup-ai — check and configure AI providers in one command.
  • Grading accuracy overhaul — task difficulty adjustment from ground truth, policy-specific divergence penalties (BC penalized more than Diffusion Policy), calibrated summaries that tell you when a task is inherently hard vs when your data is bad.
  • Nearest-match display — shows actual success rates from similar datasets instead of abstract confidence intervals.
  • Sampling warnings — tells you when you're analyzing too few episodes for a reliable grade.
  • Smarter orbit doctor — checks Ollama, Gemini, and OpenAI availability alongside all dependencies.

License

MIT
