
Bidirectional converter and validator for AgiBot World ↔ LeRobot v3 datasets.


embodied-data



What it does

  • Bidirectional conversion between AgiBot World (DigitalWorld sim + Beta/Alpha real hardware) and LeRobot v3.
  • Schema-detect dispatcher: point convert at any AgiBot root and the right reader fires automatically.
  • Five-check validator: schema conformance, fps consistency, timestamp monotonicity, action-dim consistency, and frame ↔ video alignment.
  • Batch + resume: --max-episodes for parallel conversion, meta/uuid_map.parquet for restartable jobs.
  • Lightweight dependencies: h5py + pyarrow + av; no PyTorch dependency in the data path.
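The schema-detect dispatcher can be pictured as a marker check followed by a table lookup. The sketch below is only an illustration of that pattern, not the package's code: for simplicity it distinguishes a LeRobot v3 root from an AgiBot root, whereas the markers the package actually uses to tell DigitalWorld sim from Beta/Alpha are not documented here. The function names, the reader stubs, and the "any *.h5 file means AgiBot" heuristic are all assumptions; the meta/info.json codebase_version field is where LeRobot datasets record their format version.

```python
import json
from collections.abc import Callable
from pathlib import Path


def detect_schema(root: Path) -> str:
    """Guess the dataset family from on-disk markers (hypothetical heuristics)."""
    info = root / "meta" / "info.json"
    if info.is_file():
        # LeRobot datasets record their format version in meta/info.json.
        version = json.loads(info.read_text()).get("codebase_version", "")
        if version.startswith("v3"):
            return "lerobot-v3"
        raise ValueError(f"unsupported LeRobot codebase_version {version!r}")
    # Assumed AgiBot marker: HDF5 proprioception files anywhere under the root.
    if any(root.rglob("*.h5")):
        return "agibot"
    raise ValueError(f"unrecognised dataset layout under {root}")


# Hypothetical reader stubs standing in for the real per-schema readers.
def read_lerobot_v3(root: Path) -> str:
    return f"lerobot-v3 reader on {root.name}"


def read_agibot(root: Path) -> str:
    return f"agibot reader on {root.name}"


READERS: dict[str, Callable[[Path], str]] = {
    "lerobot-v3": read_lerobot_v3,
    "agibot": read_agibot,
}


def dispatch(root: Path) -> str:
    """Pick and run the reader that matches the detected schema."""
    return READERS[detect_schema(root)](root)
```

Keeping detection separate from the reader table means a new source format only needs one new marker rule and one new table entry.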

Quick start

LeRobot's pusht is the fastest end-to-end check (no HuggingFace gating, ~30 s):

pip install --upgrade embodied-data
huggingface-cli download lerobot/pusht --repo-type dataset --local-dir ./pusht

embodied-data preview  ./pusht
embodied-data validate ./pusht

preview prints a per-episode stats table; validate runs all five checks and exits non-zero on failure.
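To make the checks concrete, here is a minimal pure-Python sketch of four of the five (schema conformance is omitted because the v3 schema itself is not reproduced here) plus the non-zero exit rule. It is not the package's implementation: the CheckResult type, the function names, the 1 ms timestamp tolerance, and the one-frame alignment slack are all assumptions chosen for illustration.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def check_timestamps(ts: Sequence[float], fps: float, tol: float = 1e-3) -> list[CheckResult]:
    """Timestamps must strictly increase and step by ~1/fps (tol is an assumed tolerance)."""
    steps = [b - a for a, b in zip(ts, ts[1:])]
    monotonic = all(s > 0 for s in steps)
    off = sum(1 for s in steps if abs(s - 1.0 / fps) > tol)
    return [
        CheckResult("timestamp monotonicity", monotonic,
                    "" if monotonic else "found a non-increasing step"),
        CheckResult("fps consistency", off == 0,
                    "" if off == 0 else f"{off} step(s) deviate from 1/{fps} s"),
    ]


def check_action_dims(actions: Sequence[Sequence[float]]) -> CheckResult:
    """Every action vector in an episode should have the same length."""
    dims = sorted({len(a) for a in actions})
    return CheckResult("action-dim consistency", len(dims) <= 1, f"dims seen: {dims}")


def check_frame_video_alignment(n_frames: int, video_seconds: float, fps: float) -> CheckResult:
    """Frame count should match video duration * fps, with an assumed one-frame slack."""
    expected = round(video_seconds * fps)
    return CheckResult("frame <-> video alignment", abs(expected - n_frames) <= 1,
                       f"{n_frames} frames vs ~{expected} implied by the video")


def exit_code(results: Sequence[CheckResult]) -> int:
    """Non-zero if any check fails, matching the validate behaviour described above."""
    return 0 if all(r.passed for r in results) else 1


if __name__ == "__main__":
    results = check_timestamps([0.0, 0.1, 0.2, 0.3], fps=10)
    results += [check_action_dims([[0.0] * 7, [0.0] * 7]),
                check_frame_video_alignment(4, 0.4, 10)]
    print(exit_code(results))  # 0
```

Returning result objects instead of raising on the first failure lets a caller render every check in one table and decide the exit status at the end.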

Real AgiBot data (HuggingFace gated)

AgiBot World Beta and Alpha live on HuggingFace under a gated license. Request access on the AgiBotWorld-Beta page first, then:

huggingface-cli login
huggingface-cli download agibot-world/AgiBotWorld-Beta \
    --repo-type dataset \
    --include "task_info_675.json" "observations/675/936938/**" "proprio_stats/675/936938.h5" \
    --local-dir ./agibot_beta_root

embodied-data convert \
    ./agibot_beta_root/675/936938 \
    /tmp/beta_v3 \
    --from agibot --to lerobot-v3

embodied-data validate /tmp/beta_v3

For batch conversion of a whole task, point convert at the task root and pass --max-episodes N. Streaming-extraction tips for partial Beta downloads are in docs/schema/beta.md.

Validation example

[Screenshot: embodied-data validate output showing five PASS rows in a Rich-rendered table]

Why this exists

Robotics researchers spend days rewriting the same dataset conversion scripts. AgiBot World's official convert_to_lerobot.py has carried unresolved issues for months; LeRobot's v2.0, v2.1, and v3.0 formats are mutually incompatible; and every lab writes its own timestamp-alignment check. This tool aims to be the shared layer that ends that duplication.

Concrete upstream issues this project addresses or works around:

Roadmap

  • v0.3 (shipped on main, awaiting tag): observation.images.head_color video for Beta / Alpha (single + batch), so v3 datasets are usable end-to-end for VLA training.
  • v0.3.x patches: multi-camera (fisheye / hand / back), sparse action/*/index masks, end-pose flattening, reverse Beta path (see docs/v0.3.x-patches.md).
  • v0.4+: ALOHA HDF5 ingest, RLDS export, Open X-Embodiment alignment.

Cross-embodiment action-space retargeting and Chinese prompt embedding remain explicit non-goals.

Schema reference

Install

pip install embodied-data
embodied-data --help

Python 3.12+ required.

Development

git clone https://github.com/allenwu-blip/embodied-data.git
cd embodied-data
uv sync
uv run pytest

Coverage

  • 56 commits, 3 PyPI releases (0.1.0 / 0.1.1 / 0.2.0); v0.3 staged on main
  • 114 passing tests + 1 skipped (gated dataset)
  • 4 upstream issue threads engaged
  • 4 HuggingFace datasets exercised end-to-end (lerobot/pusht, AgiBotWorld-Beta, AgiBotWorld-Alpha, agibot-world/agibot_digital_world)

Acknowledgments

  • HuggingFace LeRobot team for the v3 schema and reference datasets
  • OpenDriveLab AgiBot World team for releasing Beta and Alpha under HF gating

License

MIT — see LICENSE.

Contact

Bug reports and feature requests: GitHub Issues.


Download files


Source Distribution

embodied_data-0.3.0.tar.gz (41.1 kB)

Uploaded Source

Built Distribution


embodied_data-0.3.0-py3-none-any.whl (53.0 kB)

Uploaded Python 3

File details

Details for the file embodied_data-0.3.0.tar.gz.

File metadata

  • Download URL: embodied_data-0.3.0.tar.gz
  • Size: 41.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for embodied_data-0.3.0.tar.gz
  • SHA256: 95a7f5e09903abf3d93ef796da0b259f693fb53521ef86ceeb8ff7b39425be8d
  • MD5: 3de4a5236e29600bab33161eedbb40a0
  • BLAKE2b-256: 1de20897bdc23125014268ec0fae43c48f4734e4864d7dda8b88aa3386490d87


File details

Details for the file embodied_data-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: embodied_data-0.3.0-py3-none-any.whl
  • Size: 53.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for embodied_data-0.3.0-py3-none-any.whl
  • SHA256: 9798cc824981340d66c33d874e74600077bf499e64aca85ef3413480c035f3bc
  • MD5: 88aae86b7df9e5c98644f71d3ac45877
  • BLAKE2b-256: 19df05fb254eeb9c63b1e23015ac07d91df38f9bffc67c6f10079d83a4c313a1

