The normalization layer for robotics data — convert between RLDS, LeRobot, Zarr, HDF5, MCAP, Rosbag, and more
```
███████╗ ██████╗ ██████╗  ██████╗ ███████╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝
█████╗  ██║   ██║██████╔╝██║  ███╗█████╗
██╔══╝  ██║   ██║██╔══██╗██║   ██║██╔══╝
██║     ╚██████╔╝██║  ██║╚██████╔╝███████╗
╚═╝      ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝
```
⚒ Robotics Data Toolkit ⚒
Convert, inspect, visualize, score, and discover robotics datasets across every major format.

```
RLDS ═══╗          ╔═══► LeRobot
Zarr ═══╬════⚙════╬═══► RoboDM
HDF5 ═══╝          ╚═══► RLDS
```
Convert between robotics dataset formats with one command. Score demonstration quality with research-backed metrics. Segment episodes into sub-skills with changepoint detection.
| Format | Read | Write | Visualize | Notes |
|---|---|---|---|---|
| RLDS | ✓ | ✓ | ✓ | Open-X, TensorFlow Datasets |
| LeRobot v2/v3 | ✓ | ✓ | ✓ | HuggingFace, Parquet + MP4 |
| GR00T | ✓ | - | ✓ | NVIDIA Isaac, LeRobot v2 with embodiment metadata |
| RoboDM | ✓ | ✓ | ✓ | Berkeley's .vla format, up to 70x compression* |
| Zarr | ✓ | - | ✓ | Diffusion Policy, UMI |
| HDF5 | ✓ | - | ✓ | robomimic, ACT/ALOHA |
| MCAP | ✓ | ✓ | ✓ | ROS2 CDR + Foxglove Protobuf, no ROS install required |
| Rosbag | ✓ | - | ✓ | ROS1 .bag, ROS2 SQLite3 |
*RoboDM requires manual installation from GitHub (see below)
See docs/model_formats.md for which models (Octo, OpenVLA, ACT, Diffusion Policy, etc.) use which format. See docs/format_reference.md for detailed format specifications.
Why Forge?
Every robotics lab has its own data format: Open-X uses RLDS, HuggingFace uses LeRobot, Diffusion Policy uses Zarr, robomimic uses HDF5. Want to train Octo on your ALOHA data? Write a converter. Want to use LeRobot on Open-X datasets? Write another.
Forge uses a hub-and-spoke architecture — one intermediate representation, O(n) format support:
```
Any Reader  →  Episode/Frame  →  Any Writer
```
Add a reader, get all writers for free. Add a writer, get all readers for free. No N×M conversion logic. See docs/architecture.md for details.
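The hub-and-spoke idea is easy to sketch in plain Python. The names below (`Frame`, `Episode`, `Reader`, `Writer`, `convert`) are illustrative stand-ins, not Forge's actual classes; they only show why N readers plus M writers cover all N×M conversion paths:

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol

@dataclass
class Frame:
    observation: dict
    action: list

@dataclass
class Episode:
    frames: list = field(default_factory=list)

class Reader(Protocol):
    def read(self) -> Iterator[Episode]: ...

class Writer(Protocol):
    def write(self, episodes: Iterator[Episode]) -> None: ...

def convert(reader: Reader, writer: Writer) -> None:
    # Conversion is just composition through the shared representation.
    writer.write(reader.read())

# Toy in-memory spokes to show the round trip:
class ListReader:
    def __init__(self, episodes): self.episodes = episodes
    def read(self): return iter(self.episodes)

class ListWriter:
    def __init__(self): self.out = []
    def write(self, episodes): self.out.extend(episodes)

src = [Episode(frames=[Frame(observation={"qpos": [0.0]}, action=[0.1])])]
sink = ListWriter()
convert(ListReader(src), sink)   # sink.out now equals src
```

Any new format only has to implement one side of this interface to interoperate with everything else.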
Quick Start
```bash
git clone https://github.com/arpitg1304/forge.git
cd forge
pip install -e ".[all]"
```
RoboDM Support (Optional)
RoboDM requires manual installation from GitHub (PyPI version has a codec bug):
```bash
git clone https://github.com/BerkeleyAutomation/robodm.git
pip install -e robodm
```
Usage
```bash
# See what's in a dataset
forge inspect /path/to/dataset

# Convert it
forge convert /path/to/rlds ./output --format lerobot-v3
forge convert hf://arpitg1304/stack_lego ./stack_lego_rlds --format rlds --workers 4 --visualize
forge convert hf://lerobot/pusht ./pusht_robodm --format robodm
```

Works with HuggingFace Hub too:

```bash
forge inspect hf://lerobot/pusht
forge convert hf://lerobot/pusht ./output --format lerobot-v3
```
Python API
```python
import forge

# Inspect
info = forge.inspect("/path/to/dataset")
print(info.format, info.num_episodes, info.cameras)

# Convert
forge.convert(
    "/path/to/rlds",
    "/path/to/output",
    target_format="lerobot-v3",
)
```
Quality Metrics
Automated episode-level quality scoring from proprioception data alone — no video processing needed.
```bash
forge quality ./my_dataset
forge quality hf://lerobot/aloha_sim_cube --export report.json
```
Scores each episode 0-10 based on 8 research-backed metrics:
- Smoothness (LDLJ) — jerk-based smoothness from motor control literature (Hogan & Sternad, 2009)
- Dead actions — zero/constant action detection (Kim et al. "OpenVLA", 2024)
- Gripper chatter — rapid open/close transitions (Sakr et al., 2024)
- Static detection — idle periods where the robot isn't moving (Liu et al. "SCIZOR", 2025)
- Timestamp regularity — dropped frames and frequency jitter
- Action saturation — time spent at hardware limits
- Action entropy — diversity vs repetitiveness (Belkhale et al. "DemInf", 2025)
- Path length — wandering/hesitation in joint space
See forge/quality/README.md for full metric details, paper references, and how to add new metrics.
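As a taste of what these metrics compute, here is a minimal sketch of the velocity-based LDLJ smoothness measure (after Hogan & Sternad, 2009). This is an illustration written from the published formula, not Forge's exact implementation; see `forge/quality/README.md` for that:

```python
import numpy as np

def ldlj(velocity, dt):
    """Velocity-based log dimensionless jerk: closer to zero = smoother,
    more negative = jerkier."""
    T = len(velocity) * dt                              # movement duration
    v_peak = np.max(np.abs(velocity))                   # peak speed
    jerk = np.gradient(np.gradient(velocity, dt), dt)   # d^2 v / dt^2 = jerk
    return -np.log((T ** 3 / v_peak ** 2) * np.sum(jerk ** 2) * dt)

t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]
smooth = np.sin(np.pi * t)                              # bell-shaped speed profile
noisy = smooth + 0.05 * np.random.default_rng(0).standard_normal(t.size)

score_smooth = ldlj(smooth, dt)
score_noisy = ldlj(noisy, dt)                           # much more negative
```

Because the measure is dimensionless, episodes of different durations and speeds can be compared on the same scale.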
Episode Filtering
Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.
```bash
forge filter ./my_dataset --min-quality 6.0                       # Dry-run preview
forge filter ./my_dataset ./filtered --min-quality 6.0            # Write filtered dataset
forge filter ./my_dataset ./filtered --exclude-flags jerky,mostly_static
forge filter ./my_dataset ./filtered --from-report report.json    # Skip re-analysis
```
See forge/filter/README.md for full details.
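The same selection can also be scripted from an exported report. This sketch assumes a hypothetical report shape, a list of `{episode_id, score}` records; the actual schema is documented in `forge/filter/README.md`:

```python
import json

# Hypothetical report shape (illustrative only):
report_json = '''[
  {"episode_id": 0, "score": 7.2},
  {"episode_id": 1, "score": 4.1},
  {"episode_id": 2, "score": 8.9}
]'''

report = json.loads(report_json)
keep = [ep["episode_id"] for ep in report if ep["score"] >= 6.0]
```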
Dataset Registry
A curated catalog of 23+ prominent robotics datasets — browse, search, and download by name instead of memorizing URIs. Browse the registry online
```bash
# Browse all datasets
forge registry list

# Open an interactive HTML browser with filtering
forge registry list --html

# Filter by format, embodiment, or tags
forge registry list --format rlds --embodiment franka
forge registry list --tag manipulation --demo

# Get detailed info on a dataset
forge registry info droid

# Search across names, tags, embodiments, and task types
forge registry search "franka manipulation"

# Validate the registry (for contributors)
forge registry validate
```
Registry ID Resolution
Use dataset IDs directly in any command — no need for full paths or URIs:
```bash
forge inspect droid                              # resolves to hf://lerobot/droid
forge quality pusht                              # resolves to hf://lerobot/pusht
forge convert droid ./output --format lerobot-v3
```
Quick Start with forge demo
Download a small demo dataset, inspect it, and run quality scoring — all in one command:
```bash
forge demo                   # uses pusht by default
forge demo aloha_sim_cube    # or pick any demo-suitable dataset
```
See forge/registry/CONTRIBUTING.md for how to add new datasets to the registry.
Episode Segmentation
Automatic episode segmentation via PELT changepoint detection on proprioception signals. Splits episodes into contiguous phases (sub-skills, regime changes, idle periods) without video processing.
```bash
forge segment ./my_dataset
forge segment hf://lerobot/droid_100 --export segments.json --plot timeline.png
forge segment ./my_dataset --signal action --penalty bic --cost-model rbf
forge segment ./my_dataset --sample 20
```
Detects where the statistical properties of the proprio signal change abruptly — e.g., transitions between reaching, grasping, and placing phases. Configurable cost models (rbf, l2, l1), penalty methods (bic, aic, or numeric), and signal selection (observation.state, action, qpos).
See forge/segment/README.md for full details.
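The core of PELT itself is compact. Below is a minimal sketch with an L2 (piecewise-constant mean) cost, written from the published algorithm rather than from Forge's implementation, which adds the configurable cost models and penalties listed above:

```python
import numpy as np

def pelt_l2(signal, penalty):
    """Minimal PELT changepoint detector with an L2 (piecewise-constant
    mean) cost. Returns breakpoints, ending with len(signal)."""
    n = len(signal)
    s1 = np.concatenate(([0.0], np.cumsum(signal)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(signal ** 2)))   # prefix sums of squares

    def seg_cost(a, b):
        # within-segment sum of squared deviations from the segment mean
        m = b - a
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / m

    F = np.full(n + 1, np.inf)   # F[t] = optimal cost of signal[:t]
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[s] + seg_cost(s, t) + penalty for s in candidates]
        best = int(np.argmin(vals))
        F[t], last[t] = vals[best], candidates[best]
        # PELT pruning: drop start points that can never become optimal
        candidates = [s for s, v in zip(candidates, vals) if v - penalty <= F[t]]
        candidates.append(t)

    bkps, t = [], n
    while t > 0:                 # backtrack through best predecessors
        bkps.append(int(t))
        t = last[t]
    return sorted(bkps)

# A step signal whose mean jumps from 0 to 1 at index 50:
bkps = pelt_l2(np.array([0.0] * 50 + [1.0] * 50), penalty=1.0)
```

The pruning step is what makes PELT near-linear in episode length, which matters when segmenting long demonstrations.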
Visualization
Forge ships three visualization backends selectable with --backend:
```bash
forge visualize pusht                            # web (default) — browser-based, no install
forge visualize pusht --backend matplotlib       # matplotlib — sliders, comparison mode
forge visualize pusht --backend rerun            # Rerun — cameras + time-series on one timeline
forge visualize pusht --backend rerun --segment  # with PELT phase labels
forge visualize pusht --backend rerun --samples 3  # stream multiple episodes
```
The Rerun backend logs each frame's camera images, per-dimension action and state scalars, and segment labels into the Rerun viewer — all aligned on a shared frame timeline.
Install the Rerun extra to use it:
```bash
pip install "forge-robotics[rerun]"
```
CLI Reference
See docs/cli.md for the full command reference including:
- `forge inspect` - Dataset inspection and schema analysis
- `forge convert` - Format conversion with camera mapping
- `forge visualize` - Interactive dataset viewer (backends: `web`, `matplotlib`, `rerun`)
- `forge quality` - Episode-level quality scoring
- `forge filter` - Quality-based episode filtering
- `forge registry` - Browse and search the dataset registry
- `forge demo` - Quick-start with a demo dataset
- `forge segment` - Episode segmentation via changepoint detection
- `forge stats` - Compute dataset statistics
- `forge export-video` - Extract camera videos as MP4
- `forge hub` - Search and download from HuggingFace
Configuration
For complex conversions, use a YAML config:
```bash
forge inspect my_dataset/ --generate-config config.yaml
forge convert my_dataset/ output/ --config config.yaml
```
See docs/configuration.md for details.
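For illustration only, a generated config might look like the fragment below; the field names here are hypothetical, so always start from the schema that `--generate-config` emits for your dataset:

```yaml
# Illustrative fragment, not the real schema
target_format: lerobot-v3
camera_mapping:
  wrist_cam: observation.images.wrist
  top_cam: observation.images.top
```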
Roadmap
Planned features (contributions welcome!):
- Dataset merging - Combine multiple datasets into one (`forge merge ds1/ ds2/ --output combined/`)
- Train/val/test splitting - Split datasets with stratification (`--split 80/10/10`)
- Dataset registry - Curated catalog of 23+ robotics datasets with CLI browser and HTML viewer
- Streaming reads - Process HuggingFace datasets without full download
- Episode filtering - Filter by quality score, flags, or episode IDs (`forge filter --min-quality 6.0`)
- Depth/point cloud support - Preserve depth streams from RLDS/Open-X
- GR00T writer - Write to NVIDIA Isaac GR00T training format (read support complete)
- Distributed conversion - Scale to 100K+ episode datasets across nodes
- Conversion verification - Automated diff between source and converted data
Development
```bash
make venv && source .venv/bin/activate
make install-dev
make test
```
License
MIT