
The normalization layer for robotics data — convert between RLDS, LeRobot, Zarr, HDF5, MCAP, Rosbag, and more


███████╗ ██████╗ ██████╗  ██████╗ ███████╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝
█████╗  ██║   ██║██████╔╝██║  ███╗█████╗
██╔══╝  ██║   ██║██╔══██╗██║   ██║██╔══╝
██║     ╚██████╔╝██║  ██║╚██████╔╝███████╗
╚═╝      ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝

⚒ Robotics Data Toolkit ⚒

Convert, inspect, visualize, score, and discover robotics datasets across every major format.

Website Python 3.10+ License: MIT

RLDS ═══╗         ╔═══► LeRobot
Zarr ═══╬════⚙════╬═══► RoboDM
HDF5 ═══╝         ╚═══► RLDS

Convert between robotics dataset formats with one command. Score demonstration quality with research-backed metrics. Segment episodes into sub-skills with changepoint detection.

Format         Read  Write  Visualize  Notes
RLDS           ✅    ✅     ✅         Open-X, TensorFlow Datasets
LeRobot v2/v3  ✅    ✅     ✅         HuggingFace, Parquet + MP4
GR00T          ✅    –      ✅         NVIDIA Isaac, LeRobot v2 with embodiment metadata
RoboDM         ✅    ✅     ✅         Berkeley's .vla format, up to 70x compression*
Zarr           ✅    ✅     –          Diffusion Policy, UMI
HDF5           ✅    ✅     –          robomimic, ACT/ALOHA
MCAP           ✅    ✅     ✅         ROS2 CDR + Foxglove Protobuf, no ROS install required
Rosbag         ✅    –      ✅         ROS1 .bag, ROS2 SQLite3

*RoboDM requires manual installation from GitHub (see below)

See docs/model_formats.md for which models (Octo, OpenVLA, ACT, Diffusion Policy, etc.) use which format. See docs/format_reference.md for detailed format specifications.

Why Forge?

Every robotics lab has its own data format: Open-X uses RLDS, HuggingFace uses LeRobot, Diffusion Policy uses Zarr, robomimic uses HDF5. Want to train Octo on your ALOHA data? Write a converter. Want to use LeRobot on Open-X datasets? Write another.

Forge uses a hub-and-spoke architecture — one intermediate representation, O(N) format support:

Any Reader → Episode/Frame → Any Writer

Add a reader, get all writers for free. Add a writer, get all readers for free. No N×M conversion logic. See docs/architecture.md for details.
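In code, the pattern looks roughly like this. A minimal sketch: only `Episode` and `Frame` come from the description above; the `Reader`/`Writer` protocols, their fields, and the `convert` helper are illustrative, not Forge's actual API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Frame:
    observation: dict   # e.g. {"image": ..., "state": ...}
    action: list


@dataclass
class Episode:
    frames: list
    metadata: dict = field(default_factory=dict)


class Reader(Protocol):
    """Parses one on-disk format into the shared IR."""
    def read(self, path: str) -> Iterator[Episode]: ...


class Writer(Protocol):
    """Serializes the shared IR into one on-disk format."""
    def write(self, episodes: Iterator[Episode], path: str) -> None: ...


def convert(reader: Reader, writer: Writer, src: str, dst: str) -> None:
    # Any reader composes with any writer: adding one reader (or one
    # writer) yields all N new conversion pairs at once.
    writer.write(reader.read(src), dst)
```

Because both sides only speak the IR, a new format touches exactly one class, never the other N−1.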

Quick Start

git clone https://github.com/arpitg1304/forge.git
cd forge
pip install -e ".[all]"

RoboDM Support (Optional)

RoboDM requires manual installation from GitHub (PyPI version has a codec bug):

git clone https://github.com/BerkeleyAutomation/robodm.git
pip install -e robodm

Usage

# See what's in a dataset
forge inspect /path/to/dataset

# Convert it
forge convert /path/to/rlds ./output --format lerobot-v3
forge convert hf://arpitg1304/stack_lego ./stack_lego_rlds --format rlds --workers 4 --visualize
forge convert hf://lerobot/pusht ./pusht_robodm --format robodm

Works with HuggingFace Hub too:

forge inspect hf://lerobot/pusht
forge convert hf://lerobot/pusht ./output --format lerobot-v3

Python API

import forge

# Inspect
info = forge.inspect("/path/to/dataset")
print(info.format, info.num_episodes, info.cameras)

# Convert
forge.convert(
    "/path/to/rlds",
    "/path/to/output",
    target_format="lerobot-v3"
)

Quality Metrics

Automated episode-level quality scoring from proprioception data alone — no video processing needed.

forge quality ./my_dataset
forge quality hf://lerobot/aloha_sim_cube --export report.json

Scores each episode 0-10 based on 8 research-backed metrics:

  • Smoothness (LDLJ) — jerk-based smoothness from motor control literature (Hogan & Sternad, 2009)
  • Dead actions — zero/constant action detection (Kim et al. "OpenVLA", 2024)
  • Gripper chatter — rapid open/close transitions (Sakr et al., 2024)
  • Static detection — idle periods where the robot isn't moving (Liu et al. "SCIZOR", 2025)
  • Timestamp regularity — dropped frames and frequency jitter
  • Action saturation — time spent at hardware limits
  • Action entropy — diversity vs repetitiveness (Belkhale et al. "DemInf", 2025)
  • Path length — wandering/hesitation in joint space
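As an illustration of the first metric, here is a rough sketch of a velocity-based log dimensionless jerk (LDLJ) score. The scaling follows the velocity-normalized variant from the motor-control literature; the exact normalization Forge uses may differ.

```python
import numpy as np


def ldlj(positions: np.ndarray, dt: float) -> float:
    """Velocity-based log dimensionless jerk (after Hogan & Sternad).

    More negative values mean a jerkier, less smooth trajectory.
    """
    vel = np.gradient(positions, dt)
    acc = np.gradient(vel, dt)
    jerk = np.gradient(acc, dt)
    duration = dt * (len(positions) - 1)
    peak_vel = np.max(np.abs(vel))
    # Dimensionless jerk: integral of squared jerk, scaled so that
    # units of time and amplitude cancel out.
    dj = (duration**3 / peak_vel**2) * np.sum(jerk**2) * dt
    return -np.log(dj)
```

A minimum-jerk reach scores higher than the same reach with tremor added, which is exactly the ordering a demonstration-quality filter needs.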

See forge/quality/README.md for full metric details, paper references, and how to add new metrics.

Episode Filtering

Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.

forge filter ./my_dataset --min-quality 6.0                          # Dry-run preview
forge filter ./my_dataset ./filtered --min-quality 6.0               # Write filtered dataset
forge filter ./my_dataset ./filtered --exclude-flags jerky,mostly_static
forge filter ./my_dataset ./filtered --from-report report.json       # Skip re-analysis

See forge/filter/README.md for full details.
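Conceptually, filtering from a pre-computed report is a single thresholding pass. A minimal sketch, assuming a hypothetical report schema with per-episode `id`, `score`, and `flags` fields (the real report.json layout may differ):

```python
def select_episodes(report: dict, min_quality: float = 6.0,
                    exclude_flags: tuple = ()) -> list:
    """Return the episode ids that survive quality filtering."""
    keep = []
    for ep in report["episodes"]:
        if ep["score"] < min_quality:
            continue  # below the quality bar
        if set(ep.get("flags", [])) & set(exclude_flags):
            continue  # carries an excluded flag, e.g. "jerky"
        keep.append(ep["id"])
    return keep
```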

Dataset Registry

A curated catalog of 23+ prominent robotics datasets — browse, search, and download by name instead of memorizing URIs. Browse the registry online

# Browse all datasets
forge registry list

# Open an interactive HTML browser with filtering
forge registry list --html

# Filter by format, embodiment, or tags
forge registry list --format rlds --embodiment franka
forge registry list --tag manipulation --demo

# Get detailed info on a dataset
forge registry info droid

# Search across names, tags, embodiments, and task types
forge registry search "franka manipulation"

# Validate the registry (for contributors)
forge registry validate

Registry ID Resolution

Use dataset IDs directly in any command — no need for full paths or URIs:

forge inspect droid          # resolves to hf://lerobot/droid
forge quality pusht          # resolves to hf://lerobot/pusht
forge convert droid ./output --format lerobot-v3

Quick Start with forge demo

Download a small demo dataset, inspect it, and run quality scoring — all in one command:

forge demo                   # uses pusht by default
forge demo aloha_sim_cube    # or pick any demo-suitable dataset

See forge/registry/CONTRIBUTING.md for how to add new datasets to the registry.

Episode Segmentation

Automatic episode segmentation via PELT changepoint detection on proprioception signals. Splits episodes into contiguous phases (sub-skills, regime changes, idle periods) without video processing.

forge segment ./my_dataset
forge segment hf://lerobot/droid_100 --export segments.json --plot timeline.png
forge segment ./my_dataset --signal action --penalty bic --cost-model rbf
forge segment ./my_dataset --sample 20

Detects where the statistical properties of the proprio signal change abruptly — e.g., transitions between reaching, grasping, and placing phases. Configurable cost models (rbf, l2, l1), penalty methods (bic, aic, or numeric), and signal selection (observation.state, action, qpos).
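To make the idea concrete, here is a toy PELT-style detector with an L2 (piecewise-constant mean) cost on a 1-D signal. This is a sketch of the algorithm, not Forge's implementation, which supports rbf/l1 costs, BIC/AIC penalties, and multi-dimensional proprio signals.

```python
import numpy as np


def pelt_l2(signal: np.ndarray, penalty: float) -> list:
    """Toy PELT changepoint detection with an L2 cost.

    Returns the indices where a new segment begins.
    """
    n = len(signal)
    s1 = np.concatenate(([0.0], np.cumsum(signal)))
    s2 = np.concatenate(([0.0], np.cumsum(signal**2)))

    def cost(a: int, b: int) -> float:
        # Squared deviation of signal[a:b] from its own mean.
        return (s2[b] - s2[a]) - (s1[b] - s1[a]) ** 2 / (b - a)

    F = np.full(n + 1, np.inf)   # F[t]: best penalized cost of signal[:t]
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + penalty for s in candidates]
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = candidates[best]
        # Pruning step: discard segment starts that can never win again;
        # this is what makes PELT linear-time in practice.
        candidates = [s for s, v in zip(candidates, vals) if v - penalty <= F[t]]
        candidates.append(t)

    cps, t = [], n               # backtrack the optimal segmentation
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps)
```

With this cost, a clean mean shift in the proprio signal is recovered exactly; the penalty trades off segment count against fit, which is what the bic/aic options control.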

See forge/segment/README.md for full details.

Visualization

Forge ships three visualization backends selectable with --backend:

forge visualize pusht                             # web (default) — browser-based, no install
forge visualize pusht --backend matplotlib        # matplotlib — sliders, comparison mode
forge visualize pusht --backend rerun             # Rerun — cameras + time-series on one timeline
forge visualize pusht --backend rerun --segment   # with PELT phase labels
forge visualize pusht --backend rerun --samples 3 # stream multiple episodes

The Rerun backend logs each frame's camera images, per-dimension action and state scalars, and segment labels into the Rerun viewer — all aligned on a shared frame timeline.

Rerun viewer showing camera stream alongside action and state time series

Install the Rerun extra to use it:

pip install "forge-robotics[rerun]"

CLI Reference

See docs/cli.md for the full command reference including:

  • forge inspect - Dataset inspection and schema analysis
  • forge convert - Format conversion with camera mapping
  • forge visualize - Interactive dataset viewer (backends: web, matplotlib, rerun)
  • forge quality - Episode-level quality scoring (details)
  • forge filter - Quality-based episode filtering (details)
  • forge registry - Browse and search the dataset registry
  • forge demo - Quick-start with a demo dataset
  • forge segment - Episode segmentation via changepoint detection (details)
  • forge stats - Compute dataset statistics
  • forge export-video - Extract camera videos as MP4
  • forge hub - Search and download from HuggingFace

Configuration

For complex conversions, use a YAML config:

forge inspect my_dataset/ --generate-config config.yaml
forge convert my_dataset/ output/ --config config.yaml

See docs/configuration.md for details.

Roadmap

Planned features (contributions welcome!):

  • Dataset merging - Combine multiple datasets into one (forge merge ds1/ ds2/ --output combined/)
  • Train/val/test splitting - Split datasets with stratification (--split 80/10/10)
  • Dataset registry - Curated catalog of 23+ robotics datasets with CLI browser and HTML viewer
  • Streaming reads - Process HuggingFace datasets without full download
  • Episode filtering - Filter by quality score, flags, or episode IDs (forge filter --min-quality 6.0)
  • Depth/point cloud support - Preserve depth streams from RLDS/Open-X
  • GR00T writer - Write to NVIDIA Isaac GR00T training format (read support complete)
  • Distributed conversion - Scale to 100K+ episode datasets across nodes
  • Conversion verification - Automated diff between source and converted data

Development

make venv && source .venv/bin/activate
make install-dev
make test

License

MIT
