ORBIT
Data engineering copilot for robot imitation learning datasets
Data quality tool for robot imitation learning. Analyzes your demonstration datasets, finds problems, and tells you what to fix before you waste hours training a policy that won't work.
Who is this for? Anyone training robot policies with imitation learning — whether you're using LeRobot, RoboMimic, or your own pipeline. If you've collected teleoperation demonstrations and want to know if they're good enough to train on, ORBIT tells you.
New in v0.5.0: Quality Leaderboard scoring every public robotics dataset | GitHub Action for CI/CD quality gates | orbit compare for side-by-side dataset diffs | orbit auth for API key management | --output-format github for Actions annotations
Install
pip install orbit-robotics
Requires Python 3.10+. No GPU needed for core analysis.
# Optional extras
pip install orbit-robotics[vision] # SigLIP embedding analysis (needs torch)
pip install orbit-robotics[vlm] # Gemini VLM task analysis
pip install orbit-robotics[hdf5] # HDF5 dataset support
pip install orbit-robotics[rosbag] # ROS bag support
pip install orbit-robotics[all] # Everything
Verify your install:
orbit doctor
Setting Up API Keys
Some features use Google's Gemini API. These are optional — core analysis works without any API keys.
# Get a free API key from https://aistudio.google.com/apikey
export GOOGLE_API_KEY="your-key-here"
# Add to your shell profile to persist:
echo 'export GOOGLE_API_KEY="your-key-here"' >> ~/.zshrc # macOS
echo 'export GOOGLE_API_KEY="your-key-here"' >> ~/.bashrc # Linux
What uses GOOGLE_API_KEY:
| Feature | Flag | Cost | What it does |
|---|---|---|---|
| AI quality judge | --ai | ~$0.001/run | Verifies A grades with Gemini |
| Deep analysis | --deep | ~$0.01/run | AI-powered root cause analysis |
| VLM assessment | --vlm | ~$0.01/run | Vision-language task understanding |
| AI assistant | orbit assist | varies | Interactive AI copilot |
Quick Start
Try ORBIT in 30 seconds — no setup, no data, no API keys:
orbit quickstart
This downloads a public reference dataset and runs a full analysis so you can see what ORBIT does. Then point it at your own data:
orbit quickstart lerobot/your-dataset
The Workflow
1. Analyze your dataset
Point ORBIT at any dataset on HuggingFace Hub, a local directory, or a file:
orbit analyze lerobot/xarm_lift_medium
Dataset Readiness: C (score: 65/100)
Usable but has problems — run orbit clean first
✓ High consistency (1.00)
✓ Sufficient episodes (800) for diffusion_policy
✓ Good policy fit (1.00)
✗ 4 joints clipping (>10% of frames)
✗ High action divergence (0.46) — demos contradict each other
YOUR DATA AT A GLANCE
Episodes: 800 (top 25%)
Coverage: 0.84 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░
Signal Health: 0.00 ░░░░░░░░░░░░░░░░░░░░░░░░░
Common options:
orbit analyze lerobot/my-dataset --policy act # Check fit for specific policy
orbit analyze lerobot/my-dataset --deep # AI-powered deep analysis (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --ai # AI verification of grades (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --vlm # VLM vision assessment (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --proxy # Proxy training go/no-go (needs torch, ~2 min)
orbit analyze lerobot/my-dataset --json # Machine-readable output
orbit analyze lerobot/my-dataset --full # All episodes (no sampling)
orbit analyze lerobot/my-dataset --episodes 100 # Limit to 100 episodes
orbit analyze ./local-data/ --format hdf5 # Local HDF5 files
orbit analyze ./recording.mcap --format rosbag # ROS bag files
2. Clean bad episodes
orbit clean lerobot/my-dataset --dry-run # Preview what would be removed
orbit clean lerobot/my-dataset # Actually remove bad episodes
Identifies and removes bad episodes — aborted demos, dead servos, outliers. Outputs a cleaned dataset you can train on.
orbit clean lerobot/my-dataset -o user/my-clean-dataset # Save to new repo
orbit clean lerobot/my-dataset --aggressive # Also remove borderline episodes
orbit clean lerobot/my-dataset -p act # Policy-aware cleaning
2b. Prove that cleaning helps
orbit improve lerobot/my-dataset --policy bc
Runs the full pipeline twice — before and after outlier removal — to demonstrate measurable improvement in data learnability with proxy training.
3. Get a training command
orbit suggest lerobot/my-dataset
Recommends the best policy for your data and generates a ready-to-run training command with tuned hyperparameters (batch size, learning rate, horizon, steps).
orbit suggest lerobot/my-dataset --policy act # Force specific policy
orbit suggest lerobot/my-dataset --gpu-memory 16 # Optimize for your GPU
orbit suggest lerobot/my-dataset --framework openvla # Use OpenVLA instead of LeRobot
orbit suggest lerobot/my-dataset --framework all # Show all framework options
Supported frameworks: lerobot, openvla, openpi, groot, osmo, custom.
4. Or do it all in one command
orbit fix lerobot/my-dataset
Runs the full pipeline — analyze, clean, suggest — in one shot.
orbit fix lerobot/my-dataset --dry-run # Preview changes first
orbit fix lerobot/my-dataset -p act -o user/clean-data # Policy + output path
Understanding Grades
ORBIT grades your dataset A through F based on detected problems:
| Grade | Score | Meaning |
|---|---|---|
| A | 85-100 | Ready to train — expect strong results |
| B | 72-84 | Good data — minor issues, should train well |
| C | 58-71 | Usable but has problems — clean first |
| D | 40-57 | Significant issues — collect more or better data |
| F | 0-39 | Critical problems — fix before training |
Grades are calibrated against 82 real datasets with known training outcomes. An A means the data actually trained successfully; an F means it didn't.
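The score bands above can be expressed as a simple threshold function. This is a sketch of the documented score-to-grade mapping for use in your own scripts, not ORBIT's internal code:

```python
def grade_for(score: int) -> str:
    """Map a 0-100 readiness score to a letter grade using ORBIT's published bands."""
    if score >= 85:
        return "A"  # ready to train
    if score >= 72:
        return "B"  # minor issues
    if score >= 58:
        return "C"  # usable, clean first
    if score >= 40:
        return "D"  # significant issues
    return "F"      # critical problems

print(grade_for(65))  # the sample analysis above scored 65/100 -> "C"
```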
What ORBIT Checks
- Dead servos — joints that stopped moving during collection, wasting model capacity on zero outputs
- Aborted/corrupted episodes — too short, too long, or no meaningful motion
- Clipping joints — hitting position limits, creating discontinuous action targets
- Inconsistent demonstrations — doing different things in similar states, confusing the policy
- Wrong policy for your data — ACT needs consistent demos, Diffusion Policy handles multimodal strategies
- Insufficient data — not enough episodes for your chosen policy (ACT wants 50+, DP wants 100+)
- Timing problems — frame drops, FPS jitter, state-action lag
- Low workspace coverage — demos that only cover a narrow region of the task space
- Episode outliers — demonstrations that are statistically different from the rest (Modified Z-Score detection)
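The outlier check in the last bullet is based on the Modified Z-Score (Iglewicz and Hoaglin). Below is a self-contained sketch of that statistic applied to per-episode lengths; the conventional 3.5 cutoff is used here, and the exact features and threshold ORBIT uses internally are assumptions:

```python
from statistics import median

def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD, robust to the outliers themselves."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def outlier_indices(values, threshold=3.5):
    """Indices whose modified Z-score exceeds the conventional 3.5 cutoff."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]

# Episode lengths in frames; episode 4 was aborted early, episode 5 ran far too long.
lengths = [310, 298, 305, 301, 42, 960, 295, 303]
print(outlier_indices(lengths))  # -> [4, 5]
```

Because it is median-based, this statistic still flags the aborted episode even when extreme values would have inflated an ordinary mean/standard-deviation Z-score.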
Supported Data Formats
| Format | Flag | Source |
|---|---|---|
| LeRobot (Hub) | --format lerobot | HuggingFace datasets (lerobot/...) |
| LeRobot (local) | --format lerobot-local | Local LeRobot directories |
| HDF5 | --format hdf5 | RoboMimic, robosuite, custom .hdf5 files |
| RLDS | --format rlds | TFRecord-based datasets (requires pip install orbit-robotics[rlds]) |
| ROS bags | --format rosbag | .bag and .mcap files (requires pip install orbit-robotics[rosbag]) |
| Directory | --format directory | Flat directories of numpy/CSV files |
Format is auto-detected in most cases. Use --format to override when needed.
Policy Support
| Policy | Flag | Best for |
|---|---|---|
| ACT | --policy act | Consistent, high-res demos (50+ episodes) |
| Diffusion Policy | --policy diffusion_policy | Multimodal strategies (100+ episodes) |
| SmolVLA | --policy smolvla | Vision-language tasks, fewer episodes needed |
| DP3 | --policy dp3 | 3D point cloud observations |
| BC / BC-RNN | --policy bc | Large datasets (200+ episodes) |
--policy auto (default) recommends the best policy for your data.
All Commands
Data Collection
# Plan a data collection session — generates task-specific checklists
orbit plan "pick up cups" --robot so100 --policy act
# Real-time coach — watches as you collect and gives live feedback
orbit coach ./my-dataset/ --target 50 --policy act
# Select the best episodes from a large dataset
orbit curate lerobot/my-dataset --budget 50 -o curated.json
Training Pipeline
# Initialize a training project with Makefile and scripts
orbit init --dataset lerobot/my-dataset --policy act -o ./my-project/
# CI/CD quality gate — exits 1 if grade below threshold
orbit gate lerobot/my-dataset -p act --min-grade C
# Full pipeline: gate → train → verify (wraps ANY training command)
orbit train --gate "act --min-grade B" -- lerobot-train --dataset.repo_id=./data --policy.type=act
# Monitor a training run in real-time (alerts on NaN loss, plateaus)
orbit monitor ./outputs/train/my-run/
# Verify training outcome against predicted quality
orbit verify ./outputs/train/my-run/ --success-rate 0.82
# Diagnose a failed training run
orbit debug ./outputs/train/my-run/ --ai
Discovery & Benchmarks
# Browse LeRobot datasets on HuggingFace
orbit explore --robot koch --limit 10
# Browse 82 published benchmarks with known success rates
orbit benchmark --task pick_and_place --min-success 0.7
orbit benchmark --policy act --top 5
# Compare your results against the community
orbit report lerobot/my-dataset --policy act --success-rate 0.82 --eval-trials 20
Data Conversion
# Convert between formats (HDF5, ROS bags, RLDS → LeRobot v3)
orbit convert ./recording.mcap --to lerobot-v3 -o ./my-dataset/
orbit convert ./demo.hdf5 --to lerobot-v3 -o ./my-dataset/
orbit convert ./rlds-data/ --to lerobot-v3 -o ./my-dataset/
Compare & Leaderboard
# Compare two datasets side-by-side
orbit compare lerobot/dataset-v1 lerobot/dataset-v2
orbit compare ./before-clean/ ./after-clean/ --policy act
# Browse the quality leaderboard of public datasets
orbit leaderboard # Top 20 datasets
orbit leaderboard --grade A # Only training-ready
orbit leaderboard --robot aloha # Filter by robot
CI/CD Integration
# Quality gate — exits 1 if grade below threshold
orbit gate lerobot/my-dataset -p act --min-grade B
# GitHub Actions annotation format
orbit gate lerobot/my-dataset -p act --output-format github
Or use the GitHub Action directly in your workflow:
- uses: orbit-robotics/gate-action@v1
  with:
    dataset: ./data/
    policy: act
    min-grade: B
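For context, here is what that step might look like inside a complete workflow file. The file name, trigger, and checkout step are assumptions on my part; the gate step itself matches the snippet above:

```yaml
# .github/workflows/data-quality.yml (hypothetical example)
name: dataset-quality
on: [pull_request]
jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: orbit-robotics/gate-action@v1
        with:
          dataset: ./data/
          policy: act
          min-grade: B
```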
Utilities
# Check environment health and dependencies
orbit doctor
# Generate an ORBIT quality badge for your dataset
orbit badge lerobot/my-dataset --push
# Manage API keys (Google API for --ai/--deep, ORBIT dashboard token)
orbit auth set google-api-key AIza...
orbit auth show
# Interactive AI assistant for your entire workflow
orbit assist
orbit assist --task "analyze my dataset and tell me if it's ready for ACT"
# Manage configuration
orbit config --show
orbit config --init
orbit config --set default_policy act
# Track progress against a collection plan
orbit track plan.json --dataset lerobot/my-dataset
# Install shell tab-completion
orbit install-completion zsh
JSON Output
Every analysis command supports --json for scripting and CI/CD:
orbit analyze lerobot/my-dataset --json | jq '.grade'
orbit gate lerobot/my-dataset -p act --min-grade B --json
orbit clean lerobot/my-dataset --dry-run --json
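Beyond jq one-liners, the JSON report is easy to consume from Python. A minimal sketch, assuming only the `grade` field shown in the jq example above; any other fields in the report schema are not guaranteed here:

```python
import json
import subprocess

GRADE_ORDER = "FDCBA"  # worst to best

def meets_grade(report: dict, min_grade: str = "B") -> bool:
    """True if the report's letter grade is at least min_grade."""
    return GRADE_ORDER.index(report["grade"]) >= GRADE_ORDER.index(min_grade)

def analyze(dataset: str) -> dict:
    """Run `orbit analyze --json` and parse the report (requires orbit on PATH)."""
    out = subprocess.run(
        ["orbit", "analyze", dataset, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

# Example (needs orbit installed and network access):
# report = analyze("lerobot/my-dataset")
# print(report["grade"], meets_grade(report, "B"))
```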
Configuration
ORBIT can be configured via ~/.orbit/config.yaml:
orbit config --init # Create default config
orbit config --show # View current config
orbit config --set default_policy act --set gpu_memory 24
Requirements
- Python 3.10+ (tested on 3.10, 3.11, 3.12)
- Internet for HuggingFace Hub datasets (local files work fully offline)
- GPU optional — only needed for --proxy training signal and SigLIP embeddings
- GOOGLE_API_KEY optional — only for --deep, --ai, --vlm, and orbit assist
License
MIT — see LICENSE for details.