Data engineering copilot for robot imitation learning datasets

ORBIT

Data quality tool for robot imitation learning. Analyzes your demonstration datasets, finds problems, and tells you what to fix before you waste hours training a policy that won't work.

Who is this for? Anyone training robot policies with imitation learning — whether you're using LeRobot, RoboMimic, or your own pipeline. If you've collected teleoperation demonstrations and want to know if they're good enough to train on, ORBIT tells you.

Install

pip install orbit-robotics

Requires Python 3.10+. No GPU needed for core analysis.

# Optional extras
pip install orbit-robotics[vision]   # SigLIP embedding analysis (needs torch)
pip install orbit-robotics[vlm]      # Gemini VLM task analysis
pip install orbit-robotics[hdf5]     # HDF5 dataset support
pip install orbit-robotics[rosbag]   # ROS bag support
pip install orbit-robotics[all]      # Everything

Verify your install:

orbit doctor

Setting Up API Keys

Some features use Google's Gemini API. These are optional — core analysis works without any API keys.

# Get a free API key from https://aistudio.google.com/apikey
export GOOGLE_API_KEY="your-key-here"

# Add to your shell profile to persist:
echo 'export GOOGLE_API_KEY="your-key-here"' >> ~/.zshrc   # macOS
echo 'export GOOGLE_API_KEY="your-key-here"' >> ~/.bashrc  # Linux

What uses GOOGLE_API_KEY:

Feature           Flag           Cost          What it does
AI quality judge  --ai           ~$0.001/run   Verifies A grades with Gemini
Deep analysis     --deep         ~$0.01/run    AI-powered root cause analysis
VLM assessment    --vlm          ~$0.01/run    Vision-language task understanding
AI assistant      orbit assist   varies        Interactive AI copilot

Quick Start

Try ORBIT in 30 seconds — no setup, no data, no API keys:

orbit quickstart

This downloads a public reference dataset and runs a full analysis so you can see what ORBIT does. Then point it at your own data:

orbit quickstart lerobot/your-dataset

The Workflow

1. Analyze your dataset

Point ORBIT at any dataset on HuggingFace Hub, a local directory, or a file:

orbit analyze lerobot/xarm_lift_medium
Dataset Readiness: C (score: 65/100)
Usable but has problems — run orbit clean first

  ✓ High consistency (1.00)
  ✓ Sufficient episodes (800) for diffusion_policy
  ✓ Good policy fit (1.00)
  ✗ 4 joints clipping (>10% of frames)
  ✗ High action divergence (0.46) — demos contradict each other

YOUR DATA AT A GLANCE
  Episodes:       800     (top 25%)
  Coverage:       0.84  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░
  Signal Health:  0.00  ░░░░░░░░░░░░░░░░░░░░░░░░░

Common options:

orbit analyze lerobot/my-dataset --policy act          # Check fit for specific policy
orbit analyze lerobot/my-dataset --deep                # AI-powered deep analysis (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --ai                  # AI verification of grades (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --vlm                 # VLM vision assessment (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --proxy               # Proxy training go/no-go (needs torch, ~2 min)
orbit analyze lerobot/my-dataset --json                # Machine-readable output
orbit analyze lerobot/my-dataset --full                # All episodes (no sampling)
orbit analyze lerobot/my-dataset --episodes 100        # Limit to 100 episodes
orbit analyze ./local-data/ --format hdf5              # Local HDF5 files
orbit analyze ./recording.mcap --format rosbag         # ROS bag files

2. Clean bad episodes

orbit clean lerobot/my-dataset --dry-run    # Preview what would be removed
orbit clean lerobot/my-dataset              # Actually remove bad episodes

Identifies and removes bad episodes — aborted demos, dead servos, outliers. Outputs a cleaned dataset you can train on.

orbit clean lerobot/my-dataset -o user/my-clean-dataset   # Save to new repo
orbit clean lerobot/my-dataset --aggressive               # Also remove borderline episodes
orbit clean lerobot/my-dataset -p act                     # Policy-aware cleaning

3. Get a training command

orbit suggest lerobot/my-dataset

Recommends the best policy for your data and generates a ready-to-run training command with tuned hyperparameters (batch size, learning rate, horizon, steps).

orbit suggest lerobot/my-dataset --policy act              # Force specific policy
orbit suggest lerobot/my-dataset --gpu-memory 16           # Optimize for your GPU
orbit suggest lerobot/my-dataset --framework openvla       # Use OpenVLA instead of LeRobot
orbit suggest lerobot/my-dataset --framework all           # Show all framework options

Supported frameworks: lerobot, openvla, openpi, groot, osmo, custom.

4. Or do it all in one command

orbit fix lerobot/my-dataset

Runs the full pipeline — analyze, clean, suggest — in one shot.

orbit fix lerobot/my-dataset --dry-run                     # Preview changes first
orbit fix lerobot/my-dataset -p act -o user/clean-data     # Policy + output path

Understanding Grades

ORBIT grades your dataset A through F based on detected problems:

Grade   Score    Meaning
A       85-100   Ready to train — expect strong results
B       72-84    Good data — minor issues, should train well
C       58-71    Usable but has problems — clean first
D       40-57    Significant issues — collect more or better data
F       0-39     Critical problems — fix before training

Grades are calibrated against 27 real datasets with known training outcomes. An A means the data actually trained successfully; an F means it didn't.
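The score-to-grade mapping follows directly from the table above. A minimal sketch (illustrative only; ORBIT's internal scoring may apply extra logic beyond these thresholds):

```python
def grade(score: int) -> str:
    """Map a 0-100 readiness score to a letter grade,
    using the thresholds from the grade table above."""
    for letter, floor in (("A", 85), ("B", 72), ("C", 58), ("D", 40)):
        if score >= floor:
            return letter
    return "F"

print(grade(65))  # the sample analysis earlier scored 65/100 -> C
```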

What ORBIT Checks

  • Dead servos — joints that stopped moving during collection, wasting model capacity on zero outputs
  • Aborted/corrupted episodes — too short, too long, or no meaningful motion
  • Clipping joints — hitting position limits, creating discontinuous action targets
  • Inconsistent demonstrations — doing different things in similar states, confusing the policy
  • Wrong policy for your data — ACT needs consistent demos, Diffusion Policy handles multimodal strategies
  • Insufficient data — not enough episodes for your chosen policy (ACT wants 50+, DP wants 100+)
  • Timing problems — frame drops, FPS jitter, state-action lag
  • Low workspace coverage — demos that only cover a narrow region of the task space
  • Episode outliers — demonstrations that are statistically different from the rest (Modified Z-Score detection)
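The Modified Z-Score used for outlier detection is a standard robust statistic (0.6745 × (x − median) / MAD). A minimal sketch of flagging episode-level outliers with it — illustrative, not ORBIT's internal implementation:

```python
from statistics import median

def modified_z_scores(values):
    """Modified Z-Score: robust to outliers because it uses the
    median and MAD rather than the mean and standard deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

# Episode lengths in frames; the last demo ran far too long.
lengths = [120, 118, 125, 122, 119, 480]
flagged = [abs(z) > 3.5 for z in modified_z_scores(lengths)]  # 3.5 is the usual cutoff
# flagged is True only for the 480-frame episode
```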

Supported Data Formats

Format           Flag                    Source
LeRobot (Hub)    --format lerobot        HuggingFace datasets (lerobot/...)
LeRobot (local)  --format lerobot-local  Local LeRobot directories
HDF5             --format hdf5           RoboMimic, robosuite, custom .hdf5 files
RLDS             --format rlds           TFRecord-based datasets (requires pip install orbit-robotics[rlds])
ROS bags         --format rosbag         .bag and .mcap files (requires pip install orbit-robotics[rosbag])
Directory        --format directory      Flat directories of numpy/CSV files

Format is auto-detected in most cases. Use --format to override when needed.
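How such auto-detection can work in practice — a hypothetical sketch based only on the path shapes in the table above, not ORBIT's actual detection code (a real detector would also inspect file contents):

```python
from pathlib import Path

def guess_format(path: str) -> str:
    """Hypothetical format sniffing from the path alone."""
    p = Path(path)
    if "/" in path and not p.exists():
        return "lerobot"          # looks like a Hub repo id, e.g. lerobot/...
    if p.suffix in {".hdf5", ".h5"}:
        return "hdf5"
    if p.suffix in {".bag", ".mcap"}:
        return "rosbag"
    if (p / "meta").is_dir():
        return "lerobot-local"    # LeRobot datasets ship a meta/ directory
    return "directory"
```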

Policy Support

Policy            Flag                       Best for
ACT               --policy act               Consistent, high-res demos (50+ episodes)
Diffusion Policy  --policy diffusion_policy  Multimodal strategies (100+ episodes)
SmolVLA           --policy smolvla           Vision-language tasks, fewer episodes needed
DP3               --policy dp3               3D point cloud observations
BC / BC-RNN       --policy bc                Large datasets (200+ episodes)

--policy auto (default) recommends the best policy for your data.
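A toy heuristic for what auto-selection might weigh — not ORBIT's actual logic, just the rules of thumb from this page (ACT wants 50+ consistent episodes; Diffusion Policy wants 100+ and tolerates multimodal demos; SmolVLA needs fewer episodes):

```python
def recommend_policy(episodes: int, action_divergence: float) -> str:
    """Illustrative only. High action divergence suggests multimodal
    demos, which Diffusion Policy handles better than ACT."""
    if action_divergence > 0.3:
        return "diffusion_policy" if episodes >= 100 else "smolvla"
    if episodes >= 50:
        return "act"
    return "smolvla"  # VLA-style policies need fewer episodes

print(recommend_policy(800, 0.46))  # the sample dataset above -> diffusion_policy
```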

All Commands

Data Collection

# Plan a data collection session — generates task-specific checklists
orbit plan "pick up cups" --robot so100 --policy act

# Real-time coach — watches as you collect and gives live feedback
orbit coach ./my-dataset/ --target 50 --policy act

# Select the best episodes from a large dataset
orbit curate lerobot/my-dataset --budget 50 -o curated.json

Training Pipeline

# Initialize a training project with Makefile and scripts
orbit init --dataset lerobot/my-dataset --policy act -o ./my-project/

# CI/CD quality gate — exits 1 if grade below threshold
orbit gate lerobot/my-dataset -p act --min-grade C

# Full pipeline: gate → train → verify (wraps ANY training command)
orbit train --gate "act --min-grade B" -- lerobot-train --dataset.repo_id=./data --policy.type=act

# Monitor a training run in real-time (alerts on NaN loss, plateaus)
orbit monitor ./outputs/train/my-run/

# Verify training outcome against predicted quality
orbit verify ./outputs/train/my-run/ --success-rate 0.82

# Diagnose a failed training run
orbit debug ./outputs/train/my-run/ --ai

Discovery & Benchmarks

# Browse LeRobot datasets on HuggingFace
orbit explore --robot koch --limit 10

# Browse 82 published benchmarks with known success rates
orbit benchmark --task pick_and_place --min-success 0.7
orbit benchmark --policy act --top 5

# Compare your results against the community
orbit report lerobot/my-dataset --policy act --success-rate 0.82 --eval-trials 20

Data Conversion

# Convert between formats (HDF5, ROS bags, RLDS → LeRobot v3)
orbit convert ./recording.mcap --to lerobot-v3 -o ./my-dataset/
orbit convert ./demo.hdf5 --to lerobot-v3 -o ./my-dataset/
orbit convert ./rlds-data/ --to lerobot-v3 -o ./my-dataset/

Utilities

# Check environment health and dependencies
orbit doctor

# Generate an ORBIT quality badge for your dataset
orbit badge lerobot/my-dataset --push

# Interactive AI assistant for your entire workflow
orbit assist
orbit assist --task "analyze my dataset and tell me if it's ready for ACT"

# Manage configuration
orbit config --show
orbit config --init
orbit config --set default_policy act

# Track progress against a collection plan
orbit track plan.json --dataset lerobot/my-dataset

# Install shell tab-completion
orbit install-completion zsh

JSON Output

Every analysis command supports --json for scripting and CI/CD:

orbit analyze lerobot/my-dataset --json | jq '.grade'
orbit gate lerobot/my-dataset -p act --min-grade B --json
orbit clean lerobot/my-dataset --dry-run --json
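For CI logic richer than a jq one-liner, the JSON can be consumed from a script. A sketch assuming only the grade field shown in the jq example above; any other field names would need checking against the real schema:

```python
import json

GRADE_RANK = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def meets_min_grade(report: dict, min_grade: str) -> bool:
    """True if the analysis report's grade meets the threshold."""
    return GRADE_RANK[report["grade"]] >= GRADE_RANK[min_grade]

# In CI you would capture the real output, e.g. with
# subprocess.run(["orbit", "analyze", "lerobot/my-dataset", "--json"], ...)
raw = '{"grade": "B"}'  # stand-in for a real report
if not meets_min_grade(json.loads(raw), "C"):
    raise SystemExit("dataset below quality bar")
```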

Configuration

ORBIT can be configured via ~/.orbit/config.yaml:

orbit config --init          # Create default config
orbit config --show          # View current config
orbit config --set default_policy act --set gpu_memory 24
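The resulting file might look like the fragment below. Only default_policy and gpu_memory are confirmed by the --set examples above; for the full key list, generate a default config with orbit config --init rather than guessing:

```yaml
# ~/.orbit/config.yaml
default_policy: act
gpu_memory: 24
```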

Requirements

  • Python 3.10+ (tested on 3.10, 3.11, 3.12)
  • Internet for HuggingFace Hub datasets (local files work fully offline)
  • GPU optional — only needed for --proxy training signal and SigLIP embeddings
  • GOOGLE_API_KEY optional — only for --deep, --ai, --vlm, and orbit assist

License

MIT — see LICENSE for details.
