
The normalization layer for robotics data — convert between RLDS, LeRobot, Zarr, HDF5, MCAP, Rosbag, and more


███████╗ ██████╗ ██████╗  ██████╗ ███████╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝
█████╗  ██║   ██║██████╔╝██║  ███╗█████╗
██╔══╝  ██║   ██║██╔══██╗██║   ██║██╔══╝
██║     ╚██████╔╝██║  ██║╚██████╔╝███████╗
╚═╝      ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝

⚒ Robotics Data Toolkit ⚒

Convert, inspect, visualize, score, and discover robotics datasets across every major format.

Website Python 3.10+ License: MIT

RLDS ═══╗         ╔═══► LeRobot
Zarr ═══╬════⚙════╬═══► RoboDM
HDF5 ═══╝         ╚═══► RLDS

Convert between robotics dataset formats with one command. Score demonstration quality with research-backed metrics. Segment episodes into sub-skills with changepoint detection.

Format         Read  Write  Visualize  Notes
RLDS           ✅    ✅     ✅         Open-X, TensorFlow Datasets
LeRobot v2/v3  ✅    ✅     ✅         HuggingFace, Parquet + MP4
GR00T          ✅    –      ✅         NVIDIA Isaac, LeRobot v2 with embodiment metadata
RoboDM         ✅    ✅     ✅         Berkeley's .vla format, up to 70x compression*
Zarr           ✅    ✅     –          Diffusion Policy, UMI
HDF5           ✅    ✅     –          robomimic, ACT/ALOHA
MCAP           ✅    ✅     ✅         ROS2 CDR + Foxglove Protobuf, no ROS install required
Rosbag         ✅    –      ✅         ROS1 .bag, ROS2 SQLite3

*RoboDM requires manual installation from GitHub (see below)

See docs/model_formats.md for which models (Octo, OpenVLA, ACT, Diffusion Policy, etc.) use which format. See docs/format_reference.md for detailed format specifications.

Why Forge?

Every robotics lab has its own data format: Open-X uses RLDS, HuggingFace uses LeRobot, Diffusion Policy uses Zarr, robomimic uses HDF5. Want to train Octo on your ALOHA data? Write a converter. Want to use LeRobot on Open-X datasets? Write another.

Forge uses a hub-and-spoke architecture — one intermediate representation, O(N) format support:

Any Reader → Episode/Frame → Any Writer

Add a reader, get all writers for free. Add a writer, get all readers for free. No N×M conversion logic. See docs/architecture.md for details.
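In code, the pattern looks roughly like this. A minimal sketch: only `Episode` and `Frame` come from the description above; the `Reader`/`Writer` protocols, their fields, and the `convert` helper are illustrative, not Forge's actual API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Frame:
    observation: dict   # e.g. {"image": ..., "state": ...}
    action: list


@dataclass
class Episode:
    frames: list
    metadata: dict = field(default_factory=dict)


class Reader(Protocol):
    """Parses one on-disk format into the shared IR."""
    def read(self, path: str) -> Iterator[Episode]: ...


class Writer(Protocol):
    """Serializes the shared IR into one on-disk format."""
    def write(self, episodes: Iterator[Episode], path: str) -> None: ...


def convert(reader: Reader, writer: Writer, src: str, dst: str) -> None:
    # Any reader composes with any writer: adding one reader (or one
    # writer) yields all N new conversion pairs at once.
    writer.write(reader.read(src), dst)
```

Because both sides only speak the IR, a new format touches exactly one class, never the other N−1.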

Quick Start

git clone https://github.com/arpitg1304/forge.git
cd forge
pip install -e ".[all]"

RoboDM Support (Optional)

RoboDM requires manual installation from GitHub (PyPI version has a codec bug):

git clone https://github.com/BerkeleyAutomation/robodm.git
pip install -e robodm

Usage

# See what's in a dataset
forge inspect /path/to/dataset

# Convert it
forge convert /path/to/rlds ./output --format lerobot-v3
forge convert hf://arpitg1304/stack_lego ./stack_lego_rlds --format rlds --workers 4 --visualize
forge convert hf://lerobot/pusht ./pusht_robodm --format robodm

Works with HuggingFace Hub too:

forge inspect hf://lerobot/pusht
forge convert hf://lerobot/pusht ./output --format lerobot-v3

Python API

import forge

# Inspect
info = forge.inspect("/path/to/dataset")
print(info.format, info.num_episodes, info.cameras)

# Convert
forge.convert(
    "/path/to/rlds",
    "/path/to/output",
    target_format="lerobot-v3"
)

Quality Metrics

Automated episode-level quality scoring from proprioception data alone — no video processing needed.

forge quality ./my_dataset
forge quality hf://lerobot/aloha_sim_cube --export report.json

Scores each episode 0-10 based on 8 research-backed metrics:

  • Smoothness (LDLJ) — jerk-based smoothness from motor control literature (Hogan & Sternad, 2009)
  • Dead actions — zero/constant action detection (Kim et al. "OpenVLA", 2024)
  • Gripper chatter — rapid open/close transitions (Sakr et al., 2024)
  • Static detection — idle periods where the robot isn't moving (Liu et al. "SCIZOR", 2025)
  • Timestamp regularity — dropped frames and frequency jitter
  • Action saturation — time spent at hardware limits
  • Action entropy — diversity vs repetitiveness (Belkhale et al. "DemInf", 2025)
  • Path length — wandering/hesitation in joint space
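As an illustration of the first metric, here is a rough sketch of a velocity-based log dimensionless jerk (LDLJ) score. The scaling follows the velocity-normalized variant from the motor-control literature; the exact normalization Forge uses may differ.

```python
import numpy as np


def ldlj(positions: np.ndarray, dt: float) -> float:
    """Velocity-based log dimensionless jerk (after Hogan & Sternad).

    More negative values mean a jerkier, less smooth trajectory.
    """
    vel = np.gradient(positions, dt)
    acc = np.gradient(vel, dt)
    jerk = np.gradient(acc, dt)
    duration = dt * (len(positions) - 1)
    peak_vel = np.max(np.abs(vel))
    # Dimensionless jerk: integral of squared jerk, scaled so that
    # units of time and amplitude cancel out.
    dj = (duration**3 / peak_vel**2) * np.sum(jerk**2) * dt
    return -np.log(dj)
```

A minimum-jerk reach scores higher than the same reach with tremor added, which is exactly the ordering a demonstration-quality filter needs.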

See forge/quality/README.md for full metric details, paper references, and how to add new metrics.

Episode Filtering

Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.

forge filter ./my_dataset --min-quality 6.0                          # Dry-run preview
forge filter ./my_dataset ./filtered --min-quality 6.0               # Write filtered dataset
forge filter ./my_dataset ./filtered --exclude-flags jerky,mostly_static
forge filter ./my_dataset ./filtered --from-report report.json       # Skip re-analysis

See forge/filter/README.md for full details.
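Conceptually, filtering from a pre-computed report is a single thresholding pass. A minimal sketch, assuming a hypothetical report schema with per-episode `id`, `score`, and `flags` fields (the real report.json layout may differ):

```python
def select_episodes(report: dict, min_quality: float = 6.0,
                    exclude_flags: tuple = ()) -> list:
    """Return the episode ids that survive quality filtering."""
    keep = []
    for ep in report["episodes"]:
        if ep["score"] < min_quality:
            continue  # below the quality bar
        if set(ep.get("flags", [])) & set(exclude_flags):
            continue  # carries an excluded flag, e.g. "jerky"
        keep.append(ep["id"])
    return keep
```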

Dataset Registry

A curated catalog of 23+ prominent robotics datasets — browse, search, and download by name instead of memorizing URIs. Browse the registry online

# Browse all datasets
forge registry list

# Open an interactive HTML browser with filtering
forge registry list --html

# Filter by format, embodiment, or tags
forge registry list --format rlds --embodiment franka
forge registry list --tag manipulation --demo

# Get detailed info on a dataset
forge registry info droid

# Search across names, tags, embodiments, and task types
forge registry search "franka manipulation"

# Validate the registry (for contributors)
forge registry validate

Registry ID Resolution

Use dataset IDs directly in any command — no need for full paths or URIs:

forge inspect droid          # resolves to hf://lerobot/droid
forge quality pusht          # resolves to hf://lerobot/pusht
forge convert droid ./output --format lerobot-v3

Quick Start with forge demo

Download a small demo dataset, inspect it, and run quality scoring — all in one command:

forge demo                   # uses pusht by default
forge demo aloha_sim_cube    # or pick any demo-suitable dataset

See forge/registry/CONTRIBUTING.md for how to add new datasets to the registry.

Episode Segmentation

Automatic episode segmentation via PELT changepoint detection on proprioception signals. Splits episodes into contiguous phases (sub-skills, regime changes, idle periods) without video processing.

forge segment ./my_dataset
forge segment hf://lerobot/droid_100 --export segments.json --plot timeline.png
forge segment ./my_dataset --signal action --penalty bic --cost-model rbf
forge segment ./my_dataset --sample 20

Detects where the statistical properties of the proprio signal change abruptly — e.g., transitions between reaching, grasping, and placing phases. Configurable cost models (rbf, l2, l1), penalty methods (bic, aic, or numeric), and signal selection (observation.state, action, qpos).
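To make the idea concrete, here is a toy PELT-style detector with an L2 (piecewise-constant mean) cost on a 1-D signal. This is a sketch of the algorithm, not Forge's implementation, which supports rbf/l1 costs, BIC/AIC penalties, and multi-dimensional proprio signals.

```python
import numpy as np


def pelt_l2(signal: np.ndarray, penalty: float) -> list:
    """Toy PELT changepoint detection with an L2 cost.

    Returns the indices where a new segment begins.
    """
    n = len(signal)
    s1 = np.concatenate(([0.0], np.cumsum(signal)))
    s2 = np.concatenate(([0.0], np.cumsum(signal**2)))

    def cost(a: int, b: int) -> float:
        # Squared deviation of signal[a:b] from its own mean.
        return (s2[b] - s2[a]) - (s1[b] - s1[a]) ** 2 / (b - a)

    F = np.full(n + 1, np.inf)   # F[t]: best penalized cost of signal[:t]
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + penalty for s in candidates]
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = candidates[best]
        # Pruning step: discard segment starts that can never win again;
        # this is what makes PELT linear-time in practice.
        candidates = [s for s, v in zip(candidates, vals) if v - penalty <= F[t]]
        candidates.append(t)

    cps, t = [], n               # backtrack the optimal segmentation
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps)
```

With this cost, a clean mean shift in the proprio signal is recovered exactly; the penalty trades off segment count against fit, which is what the bic/aic options control.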

See forge/segment/README.md for full details.

Visualization

Forge ships three visualization backends selectable with --backend:

forge visualize pusht                             # web (default) — browser-based, no install
forge visualize pusht --backend matplotlib        # matplotlib — sliders, comparison mode
forge visualize pusht --backend rerun             # Rerun — cameras + time-series on one timeline
forge visualize pusht --backend rerun --segment   # with PELT phase labels
forge visualize pusht --backend rerun --samples 3 # stream multiple episodes

The Rerun backend logs each frame's camera images, per-dimension action and state scalars, and segment labels into the Rerun viewer — all aligned on a shared frame timeline.

Rerun viewer showing camera stream alongside action and state time series

Install the Rerun extra to use it:

pip install "forge-robotics[rerun]"

CLI Reference

See docs/cli.md for the full command reference including:

  • forge inspect - Dataset inspection and schema analysis
  • forge convert - Format conversion with camera mapping
  • forge visualize - Interactive dataset viewer (backends: web, matplotlib, rerun)
  • forge quality - Episode-level quality scoring (details)
  • forge filter - Quality-based episode filtering (details)
  • forge registry - Browse and search the dataset registry
  • forge demo - Quick-start with a demo dataset
  • forge segment - Episode segmentation via changepoint detection (details)
  • forge stats - Compute dataset statistics
  • forge export-video - Extract camera videos as MP4
  • forge hub - Search and download from HuggingFace

Configuration

For complex conversions, use a YAML config:

forge inspect my_dataset/ --generate-config config.yaml
forge convert my_dataset/ output/ --config config.yaml

See docs/configuration.md for details.

Roadmap

Planned features (contributions welcome!):

  • Dataset merging - Combine multiple datasets into one (forge merge ds1/ ds2/ --output combined/)
  • Train/val/test splitting - Split datasets with stratification (--split 80/10/10)
  • Dataset registry - Curated catalog of 23+ robotics datasets with CLI browser and HTML viewer
  • Streaming reads - Process HuggingFace datasets without full download
  • Episode filtering - Filter by quality score, flags, or episode IDs (forge filter --min-quality 6.0)
  • Depth/point cloud support - Preserve depth streams from RLDS/Open-X
  • GR00T writer - Write to NVIDIA Isaac GR00T training format (read support complete)
  • Distributed conversion - Scale to 100K+ episode datasets across nodes
  • Conversion verification - Automated diff between source and converted data

Development

make venv && source .venv/bin/activate
make install-dev
make test

License

MIT
