
Data engineering copilot for robot imitation learning datasets


ORBIT — Know If Your Robot Data Will Train Before You Burn GPU Hours

You collected 200 demonstrations. You trained for 12 hours. The robot doesn't move. Was it the data? The hyperparameters? A dead servo you didn't notice?

ORBIT tells you in 10 seconds.

pip install orbit-robotics
orbit analyze lerobot/pusht
ORBIT Analysis: lerobot/pusht
206 episodes · DIFFUSION_POLICY

  Grade: A (98/100) — Ready to train — expect strong results
  Similar datasets trained at: 63%, 84%, 91%, 95%, 100% (5 nearest matches)

  1 issue found:
    ! Jerk varies across episodes (CV=0.57) — some demos are much jerkier

  Run with --detail for full diagnostics

No GPU. No API keys. No setup. Just answers.


The Problem

Every robotics lab has hit this: you collect demos, start training, wait hours — and the policy fails. You don't know if you need more data, better data, or different hyperparameters. There's no tool that checks your data quality before training.

ORBIT is that tool. It catches dead joints, contradictory demonstrations, inconsistent episodes, poor coverage, and clipping — the silent killers of robot policy training. Grades are calibrated against 82 real training runs with known success rates, so when ORBIT says "A", it means datasets like yours actually worked.

What You Get

Quality Grade (A through F)

A single, calibrated score. Not a vague "looks okay" — a grade backed by 82 ground-truth training outcomes from published results across ACT, Diffusion Policy, BC, and more.

  • A (85+) — Ready to train. Datasets like yours succeed.
  • B (72-84) — Good data with minor issues. Should train well.
  • C (58-71) — Has problems. Clean your data first or expect poor results.
  • D (40-57) — Significant issues. Collect more or better demonstrations.
  • F (below 40) — Critical problems. Don't waste compute on this.
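
As a rough sketch, the banding above maps a numeric score to a letter like this (a hypothetical helper for illustration — not ORBIT's actual code):

```python
def grade_from_score(score: float) -> str:
    """Map a 0-100 quality score to a letter grade using the bands above."""
    if score >= 85:
        return "A"  # ready to train
    if score >= 72:
        return "B"  # minor issues
    if score >= 58:
        return "C"  # clean first
    if score >= 40:
        return "D"  # collect more or better demos
    return "F"      # don't waste compute

print(grade_from_score(98))  # A
```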

12 Diagnostic Checks

Every analysis runs these automatically:

  • Dead joint detection — servos that never move; wastes model capacity and masks hardware failures
  • Action divergence — same state, different actions; directly confuses the policy
  • Joint clipping — joints hitting mechanical limits in >10% of frames
  • Episode consistency — wild variation in demo length, speed, or strategy
  • Outlier episodes — demos that are statistically different from the rest
  • Workspace coverage — whether demos actually cover the task space
  • Temporal alignment — state-action lag that causes the policy to learn the wrong timing
  • Directional bias — joints that only move one way (ignoring grippers, where this is normal)
  • Smoothness analysis — jerk and curvature variation across episodes
  • Policy fit scoring — how well your data matches ACT vs Diffusion Policy vs BC vs SmolVLA
  • Episode count validation — whether you have enough demos for your chosen policy
  • Scaling advice — how many more episodes you actually need
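
Two of these checks are simple enough to sketch in plain Python — purely illustrative, not ORBIT's implementation. A joint whose position trace never leaves a tiny band is "dead"; jerk consistency can be measured as a coefficient of variation (std/mean) over per-episode jerk, the same CV figure shown in the sample output above.

```python
from statistics import mean, pstdev

def is_dead_joint(positions: list[float], tol: float = 1e-6) -> bool:
    """Flag a joint as dead if its position range never exceeds tol."""
    return max(positions) - min(positions) <= tol

def jerk_cv(per_episode_jerk: list[float]) -> float:
    """Coefficient of variation (std / mean) of mean jerk across episodes."""
    m = mean(per_episode_jerk)
    return pstdev(per_episode_jerk) / m if m else 0.0

print(is_dead_joint([0.5, 0.5, 0.5]))  # True — this servo never moved
print(jerk_cv([1.0, 1.0, 1.0]))        # 0.0 — perfectly consistent demos
```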

Benchmark Comparison

ORBIT knows what worked. It compares your dataset against 82 validated training runs and shows you the nearest matches:

Similar datasets trained at: 63%, 84%, 91%, 95%, 100% (5 nearest matches)
Closest match: Push-T (state) — 206 episodes, 91% success (diffusion_policy)
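
Conceptually, the nearest-match lookup is a distance search over dataset features. Here is a toy sketch, assuming made-up features (episode count, mean episode length) and an invented mini benchmark table — the real feature set and distance formula are ORBIT internals not shown here:

```python
import math

# Invented benchmark rows: (name, episodes, mean_episode_length, success_rate)
BENCHMARKS = [
    ("pusht_state", 206, 160, 0.91),
    ("aloha_cube", 50, 400, 0.84),
    ("kuka_multimodal", 3000, 120, 0.30),
]

def nearest_matches(episodes: int, mean_len: float, k: int = 2):
    """Return the k benchmark runs closest in (log-episodes, length) space."""
    def dist(row):
        _, n, length, _ = row
        return math.hypot(math.log(episodes / n), (mean_len - length) / 100)
    return sorted(BENCHMARKS, key=dist)[:k]

for name, _, _, rate in nearest_matches(200, 150, k=1):
    print(f"Closest match: {name} — {rate:.0%} success")
```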

Ready-to-Run Training Commands

Don't guess hyperparameters. ORBIT picks the best policy for your data and gives you a command you can copy-paste:

orbit suggest lerobot/my-dataset
Recommended: Diffusion Policy (fit: 0.90)

Copy and run:
lerobot-train \
  --dataset.repo_id=lerobot/my-dataset \
  --policy.type=diffusion_policy \
  --batch_size=32 \
  --steps=500000 \
  ...

Training tips:
  - Loss should drop below 0.1 by step 100000
  - If loss plateaus above 0.2: your demonstrations may be too inconsistent

Automatic Cleaning

Bad episodes drag your whole training down. ORBIT finds them and removes them:

orbit clean lerobot/my-dataset              # Remove bad episodes
orbit fix lerobot/my-dataset                # Analyze + clean + suggest, one shot

CI/CD Quality Gate

Block bad data from entering your training pipeline:

orbit gate lerobot/my-dataset --policy act --min-grade B
# Exit code 0 = pass, 1 = fail

Full Command Reference

Core Workflow

  • orbit analyze <dataset> — full quality analysis: grade, diagnostics, recommendations
  • orbit suggest <dataset> — best policy + ready-to-run training command with tuned hyperparameters
  • orbit clean <dataset> — find and remove bad episodes automatically
  • orbit fix <dataset> — analyze, clean, and suggest; one command does it all
  • orbit compare <a> <b> — side-by-side comparison of two datasets

Discovery & Benchmarking

  • orbit explore — browse and discover LeRobot datasets on HuggingFace
  • orbit benchmark <task> — search 82 validated results by task, policy, or hardware
  • orbit leaderboard — quality leaderboard of 575 scored robotics datasets

Training Pipeline

  • orbit gate <dataset> — CI/CD quality gate; pass/fail with exit codes
  • orbit train <dataset> — full pipeline: gate, train, evaluate
  • orbit monitor — watch a training run in real time
  • orbit debug — diagnose a failed or underperforming training run
  • orbit verify — compare training outcome against predicted quality
  • orbit report — save training results locally and to the dashboard

Data Engineering

  • orbit curate <dataset> — select the best episodes from a dataset
  • orbit improve <dataset> — clean bad episodes and prove the score went up
  • orbit convert <dataset> — convert from any supported format to LeRobot v3
  • orbit plan — plan a data collection strategy
  • orbit coach — real-time guidance during data collection
  • orbit badge <dataset> — generate a shields.io quality badge for your dataset card

ROS 2 & Live Collection

  • orbit ros-doctor — pre-collection ROS 2 hardware diagnostics: joints, cameras, URDF, DDS
  • orbit watch — real-time quality monitor with Foxglove dashboard during collection
  • orbit coach <dir> — live feedback as new episodes are added to a dataset directory

Setup & Config

  • orbit doctor — check environment health: Python, deps, AI providers
  • orbit setup-ai — check and configure AI providers (Ollama, Gemini, OpenAI)
  • orbit quickstart — get started in 30 seconds
  • orbit init — scaffold a training project with Makefile and config
  • orbit assist — interactive AI troubleshooter; ask questions about your data

Analyze Options

# Basics
orbit analyze lerobot/my-dataset                    # Quick analysis (samples 50 episodes)
orbit analyze lerobot/my-dataset --full              # Analyze every episode
orbit analyze lerobot/my-dataset --episodes 100      # Analyze exactly 100 episodes
orbit analyze lerobot/my-dataset --detail            # Full diagnostic report with all sections
orbit -q analyze lerobot/my-dataset                  # Quiet: just "A (98/100)" — for scripts

# Policy-specific
orbit analyze lerobot/my-dataset --policy act        # Check fit for ACT
orbit analyze lerobot/my-dataset --policy diffusion_policy
orbit analyze lerobot/my-dataset --policy smolvla

# AI-powered (optional — works without, better with)
orbit analyze lerobot/my-dataset --deep              # LLM diagnosis with specific fix instructions
orbit analyze lerobot/my-dataset --ai                # AI second opinion on A grades
orbit analyze lerobot/my-dataset --vlm               # Vision-language model assessment of frames
orbit analyze lerobot/my-dataset --proxy             # Train a quick BC model as ground truth signal

# Output
orbit analyze lerobot/my-dataset --json              # Machine-readable JSON for pipelines
orbit analyze lerobot/my-dataset --json | jq '.readiness.grade'

# Local files
orbit analyze ./my-local-data/                       # Local LeRobot directory
orbit analyze ./data.hdf5 --format hdf5              # HDF5 file (RoboMimic, robosuite)
orbit analyze ./recording.bag --format rosbag         # ROS bag file
orbit analyze ./data/ --format rlds                   # RLDS TFRecord dataset
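
In a Python pipeline, the --json output can be parsed directly instead of going through jq. A minimal sketch — the readiness.grade path comes from the jq example above; everything else about the schema is assumed:

```python
import json

def passes_gate(report_json: str, min_grades=("A", "B")) -> bool:
    """Parse `orbit analyze --json` output and accept only good grades."""
    grade = json.loads(report_json)["readiness"]["grade"]
    return grade in min_grades

# Stand-in payload shaped like the jq path above (the score field is assumed).
sample = '{"readiness": {"grade": "A", "score": 98}}'
print(passes_gate(sample))  # True
```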

Supported Data Formats

  • LeRobot (HuggingFace) — orbit analyze lerobot/pusht — any dataset on the HuggingFace Hub
  • LeRobot (local) — orbit analyze ./my-data/ — local LeRobot recordings
  • HDF5 — orbit analyze data.hdf5 --format hdf5 — RoboMimic, robosuite, custom
  • RLDS — orbit analyze ./data/ --format rlds — TFRecord datasets (needs pip install orbit-robotics[rlds])
  • ROS bags — orbit analyze rec.bag --format rosbag — .bag and .mcap files (needs pip install orbit-robotics[rosbag])

ORBIT auto-detects the format. Use --format to override if needed.
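
Auto-detection of this kind can be as simple as an extension map with a directory fallback. An illustrative sketch only — not ORBIT's actual detector:

```python
from pathlib import Path

# Illustrative extension -> format table; anything else defaults to LeRobot.
EXT_FORMATS = {".hdf5": "hdf5", ".h5": "hdf5", ".bag": "rosbag", ".mcap": "rosbag"}

def guess_format(path: str) -> str:
    """Guess a dataset format from its file extension."""
    return EXT_FORMATS.get(Path(path).suffix.lower(), "lerobot")

print(guess_format("recording.bag"))  # rosbag
print(guess_format("./my-data/"))     # lerobot
```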


AI Features (Optional)

The core analysis — grading, diagnostics, all 12 checks — works fully without any AI or API keys. AI adds deeper natural-language diagnosis on top.

Local AI with Ollama (Free, Private)

# One-time setup
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4

# Now these just work — ORBIT auto-detects Ollama
orbit analyze lerobot/my-dataset --deep     # "Your action divergence is high because..."
orbit assist                                # "Why is my policy not learning?" → specific answers

Your data never leaves your machine. No API keys. No cost.

Cloud AI (Gemini / OpenAI)

export GOOGLE_API_KEY=your-key              # Gemini (~$0.001 per analysis)
# or
export OPENAI_API_KEY=your-key              # OpenAI (~$0.01 per analysis)

ORBIT auto-detects: Ollama (local) > Gemini > OpenAI. Check your setup:

orbit setup-ai

Override:

orbit auth set ai-provider ollama           # Force a specific provider
orbit auth set ai-model gemma4              # Force a specific model

How Grading Works

ORBIT doesn't guess. Every grade is calibrated against real outcomes.

We collected 82 training runs from published papers and community results — datasets where we know the actual success rate of the trained policy. ORBIT's grading formula was tuned to match: datasets that trained successfully get A's, datasets that failed get D's and F's.

When ORBIT shows "Similar datasets trained at: 63%, 84%, 91%", those are real results from real papers. When it says "Grade B — should train well", that's based on what actually happened to similar data.

What each grade means in practice

  • A — Your data is solid; policies trained on similar datasets succeeded. Start training.
  • B — Minor issues detected, but datasets like this usually train fine. Train, but review the flagged issues if results disappoint.
  • C — Real problems found; training might work, but expect lower success rates. Run orbit clean first, fix the flagged issues, then re-analyze.
  • D — Significant quality problems; training is likely to fail or underperform. Collect more data, fix hardware issues, or change your collection strategy.
  • F — Critical failures: dead joints, extreme divergence, broken data. Don't train on this; fix the root cause first.

Real-World Impact

We analyzed 55 popular datasets on HuggingFace. Findings:

  • Only 46% were ready to train (Grade A). 19% had significant problems.
  • 75% had action divergence — the #1 silent training killer.
  • stanford_kuka_multimodal has 3,000 episodes but scores Grade D (43/100) — 3 dead joints and 50% outlier episodes. More data doesn't mean better data.
  • 7 of 15 community datasets failed to load at all — over 81,000 downloads of broken data.

A 10-second orbit analyze would have caught every one of these issues.


Installation

pip install orbit-robotics

That's it. Everything works out of the box.

Optional extras

pip install orbit-robotics[vlm]       # Vision-language model features
pip install orbit-robotics[rlds]      # RLDS/TFRecord format support
pip install orbit-robotics[rosbag]    # ROS bag format support
pip install orbit-robotics[all]       # Everything

Requirements

  • Python 3.10+
  • No GPU needed
  • No API keys needed (AI features are optional enhancements)

What's New in v0.7.1

  • orbit ros-doctor — Pre-collection hardware diagnostics for ROS 2 robots. Checks joints, cameras, URDF, DDS/QoS, TF tree, and optionally records a test bag. Every issue includes a specific fix. Works via rclpy or subprocess fallback; gracefully reports when ROS 2 is not available.
  • orbit watch — Real-time data quality monitor with Foxglove dashboard. Monitors demonstrations during collection and gives per-episode quality feedback. Supports simulation mode (--sim), live ROS 2, AI-powered smart setup (--smart), and audio/speech feedback.
  • orbit coach — Real-time data collection coach. Watches a dataset directory as new episodes are added and gives live feedback on consistency, smoothness, and quality.
  • Ollama backend for orbit assist — The interactive AI assistant now supports Ollama as a backend alongside Gemini.
  • orbit convert — Convert datasets from ROS bags (.bag/.mcap), HDF5, RLDS, or local LeRobot directories to LeRobot v3 format.
  • 575-entry quality leaderboard — orbit leaderboard now ranks 575 scored community datasets, up from the benchmark-only view.
  • Input validation — --episodes 0 and --episodes -1 now give clear errors instead of silently running.
  • Python 3.11 compatibility fix — Fixed f-string syntax error in orbit debug/orbit verify that crashed on Python < 3.12.

Previous: v0.6.0

  • Local AI via Ollama — --deep, --ai, and orbit assist work locally with Gemma 4.
  • Multi-provider AI — auto-detects Ollama > Gemini > OpenAI.
  • orbit setup-ai — check and configure AI providers.
  • Grading accuracy overhaul — task difficulty adjustment, policy-specific divergence penalties, calibrated summaries.
  • Nearest-match display, sampling warnings, smarter orbit doctor.

License

MIT
