embodied-data

Bidirectional converter and validator for AgiBot World ↔ LeRobot v3 datasets.

What it does

  • Bidirectional conversion between AgiBot World (DigitalWorld sim + Beta/Alpha real hardware) and LeRobot v3.
  • Schema-detect dispatcher: point convert at any AgiBot root and the right reader fires automatically.
  • Five-check validator: schema conformance, fps consistency, timestamp monotonicity, action-dim consistency, frame ↔ video alignment.
  • Batch + resume: --max-episodes for parallel conversion, meta/uuid_map.parquet for restartable jobs.
  • Lightweight dependencies: h5py + pyarrow + av; no PyTorch dependency in the data path.
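The list above names the validator's checks but not how they work. As a rough illustration only (this is not embodied-data's code), four of them can be expressed as per-episode checks on plain Python lists. The function names and the 1 ms fps tolerance are assumptions, and frame ↔ video alignment is reduced here to a simple frame-count comparison.

```python
# Hypothetical per-episode checks, loosely modeled on the validator's
# check names. Not embodied-data's implementation.


def timestamps_monotonic(timestamps: list[float]) -> bool:
    # Each timestamp must be strictly greater than the one before it.
    return all(b > a for a, b in zip(timestamps, timestamps[1:]))


def fps_consistent(timestamps: list[float], fps: float, tol: float = 1e-3) -> bool:
    # Every inter-frame gap should equal 1 / fps to within `tol` seconds.
    period = 1.0 / fps
    return all(abs((b - a) - period) <= tol for a, b in zip(timestamps, timestamps[1:]))


def action_dim_consistent(actions: list[list[float]], expected_dim: int) -> bool:
    # Every action vector must have the dimension declared in the metadata.
    return all(len(a) == expected_dim for a in actions)


def frames_match_video(num_frames: int, num_video_frames: int) -> bool:
    # Simplest form of frame <-> video alignment: identical frame counts.
    return num_frames == num_video_frames


ts = [i / 30 for i in range(90)]  # three seconds of timestamps at 30 fps
print(timestamps_monotonic(ts), fps_consistent(ts, fps=30))  # True True
```

A real validator would read timestamps and actions from the dataset's parquet files and count decoded video frames; keeping each check as a pure function over plain values makes it easy to test in isolation.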
(Screenshot: embodied-data convert running on a Beta task root; one episode succeeds, and one without upstream video is logged to .beta_batch_errors.jsonl.)

Quick start

LeRobot's pusht is the fastest end-to-end check (no HuggingFace gating, ~30 s):

pip install --upgrade embodied-data
huggingface-cli download lerobot/pusht --repo-type dataset --local-dir ./pusht

embodied-data preview  ./pusht
embodied-data validate ./pusht

preview prints a per-episode stats table; validate runs all five checks and exits non-zero on failure.

💡 New on main (unreleased): convert --dry-run to preview a conversion plan without writing files, convert --verify to auto-validate the output, and inspect <dataset_dir> --summary for a high-level dataset overview.

Real AgiBot data (HuggingFace gated)

AgiBot World Beta and Alpha live on HuggingFace under a gated license. Request access on the AgiBotWorld-Beta page first, then:

huggingface-cli login
huggingface-cli download agibot-world/AgiBotWorld-Beta \
    --repo-type dataset \
    --include "task_info_675.json" "observations/675/936938/**" "proprio_stats/675/936938.h5" \
    --local-dir ./agibot_beta_root

embodied-data convert \
    ./agibot_beta_root/675/936938 \
    /tmp/beta_v3 \
    --from agibot --to lerobot-v3

embodied-data validate /tmp/beta_v3

For batch conversion of a whole task, point convert at the task root and pass --max-episodes N. Streaming-extraction tips for partial Beta downloads are in docs/schema/beta.md.

Validation example

(Screenshot: embodied-data validate output, a Rich-rendered table with five PASS rows.)

Why this exists

Robotics researchers spend days rewriting the same dataset conversion scripts. AgiBot World's official convert_to_lerobot.py has carried unresolved issues for months; LeRobot's v2.0, v2.1, and v3.0 dataset formats are not compatible with one another; and every lab writes its own timestamp-alignment check. This tool aims to be the shared layer that ends that duplication.

Concrete upstream issues this project addresses or works around:

Roadmap

  • v0.3.0 (released): observation.images.head_color video for Beta / Alpha (single + batch), so v3 datasets are usable end-to-end for VLA training.
  • v0.3.x patches: multi-camera (fisheye / hand / back), sparse action/*/index masks, end-pose flattening, and the reverse Beta path. Roadmap input is welcome on Discussions / Ideas (see also docs/v0.3.x-patches.md).
  • v0.4+: ALOHA HDF5 ingest, RLDS export, OpenX Embodiment alignment.

Cross-embodiment action-space retargeting and Chinese prompt embedding remain explicit non-goals.

Schema reference

Install

pip install embodied-data
embodied-data --help

Python 3.12+ required.

Development

git clone https://github.com/allenwu-blip/embodied-data.git
cd embodied-data
uv sync
uv run pytest

Coverage

  • 65+ commits, 4 PyPI releases (0.1.0 / 0.1.1 / 0.2.0 / 0.3.0)
  • 115 passing tests + 1 skipped (gated dataset)
  • 4 upstream issue threads engaged
  • 4 HuggingFace datasets exercised end-to-end (lerobot/pusht, AgiBotWorld-Beta, AgiBotWorld-Alpha, agibot-world/agibot_digital_world)

Acknowledgments

  • HuggingFace LeRobot team for the v3 schema and reference datasets
  • OpenDriveLab AgiBot World team for releasing Beta and Alpha under HF gating

License

MIT — see LICENSE.

Contact

Bug reports and feature requests: GitHub Issues.


Download files

Source Distribution

embodied_data-0.3.1.tar.gz (45.8 kB)

Built Distribution

embodied_data-0.3.1-py3-none-any.whl (58.0 kB)

File details

Details for the file embodied_data-0.3.1.tar.gz.

File metadata

  • Download URL: embodied_data-0.3.1.tar.gz
  • Size: 45.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for embodied_data-0.3.1.tar.gz
Algorithm Hash digest
SHA256 65b5dcc2c16e913354c1194212f4d8f2fdc4df2d8c46abf4a98e5e2e2b354a3d
MD5 b4a8535674fbd024fa9fe2e8d9cdd2ac
BLAKE2b-256 4a6fbe94e783308df6f3c0278ddac27d6383265af3c6c38b6a5978f40c5ffd1e

File details

Details for the file embodied_data-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: embodied_data-0.3.1-py3-none-any.whl
  • Size: 58.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for embodied_data-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e4a645c1d2a916e08a9c12d6580fe43dfc4803f49e157321b45a3d0ff2ca8494
MD5 94d1fbef2a00e2b6b3f1f913e255df7b
BLAKE2b-256 81e7483b512e046a78ca0b0d47b9ed4b26404d2da1ff050b64b5a2fe3d2d4646
