
Lance-backed datasets for LeRobot — frame-level random access on local disk and cloud (S3 / GCS / HF Hub / HF Buckets).


lerobot-lancedb

📖 Docs: https://lancedb.github.io/lerobot-lancedb/

Lance-backed datasets for LeRobot. Drop-in replacement for LeRobotDataset with two storage layouts:

  • LeRobotLanceDataset — per-frame JPEG bytes (lossy, fastest at single-frame access, optional GPU NVJPEG decode).
  • LeRobotLanceVideoDataset — per-file mp4 bytes stored via Lance blob v2, decoded on the fly with torchcodec. Bit-exact pixels, ~same disk size as upstream.

Both subclass LeRobotDataset so existing trainers / samplers / isinstance checks accept them transparently.

Install

pip install lerobot-lancedb

For local development:

git clone https://github.com/lancedb/lerobot-lancedb.git
cd lerobot-lancedb
pip install -e '.[dev]'

Quickstart

# Convert (recommended path for dtype=video sources)
lerobot-convert-to-lance-video \
    --repo-id=lerobot/aloha_static_cups_open \
    --output=./aloha_cups_open_lance_video --overwrite

# Load the converted dataset in Python
from lerobot_lancedb import LeRobotLanceVideoDataset

ds = LeRobotLanceVideoDataset(root="./aloha_cups_open_lance_video")

For the JPEG layout, use lerobot-convert-to-lance and LeRobotLanceDataset instead. See the docs for the full CLI / API reference.

Benchmark

Realistic training read pattern (delta_timestamps, 8 frames / sample, batch 32, num_workers 4, CPU decode, H100):

dataset                           format                                                    size (MB)  delta_ts (fps)  speedup
pusht (96×96, 1-cam)              upstream parquet+mp4                                            7.3             750    1.00×
                                  convert_to_lance (JPEG-95)                                     60.0            3510    4.68×
                                  convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0      105.6            2909    3.88×
                                  convert_to_lance_video                                          8.0            2853    3.80×
ALOHA cups_open (480×640, 4-cam)  upstream parquet+mp4                                          485.6            18.7    1.00×
                                  convert_to_lance (JPEG-95)                                   3626.0            46.0    2.46×
                                  convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0     8735.4            32.5    1.74×
                                  convert_to_lance_video                                        487.4            45.6    2.44×
Koch lego (480×640, 2-cam)        upstream parquet+mp4                                         2014.1            26.6    1.00×
                                  convert_to_lance (JPEG-95)                                   8541.0            70.8    2.66×
                                  convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0    17335.3            49.0    1.84×
                                  convert_to_lance_video                                       2015.9            53.8    2.02×
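The speedup column is each format's delta_ts throughput divided by the upstream throughput for the same dataset. Computing the size ratio the same way makes the disk-versus-speed trade-off explicit. The sketch below uses only the sizes and fps values from the table above; the helper name relative_to_upstream is illustrative and not part of the package.

```python
# (size in MB, delta_ts throughput in fps), copied from the benchmark table.
RESULTS = {
    "pusht (96x96, 1-cam)": {
        "upstream parquet+mp4": (7.3, 750.0),
        "convert_to_lance (JPEG-95)": (60.0, 3510.0),
        "convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0": (105.6, 2909.0),
        "convert_to_lance_video": (8.0, 2853.0),
    },
    "ALOHA cups_open (480x640, 4-cam)": {
        "upstream parquet+mp4": (485.6, 18.7),
        "convert_to_lance (JPEG-95)": (3626.0, 46.0),
        "convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0": (8735.4, 32.5),
        "convert_to_lance_video": (487.4, 45.6),
    },
    "Koch lego (480x640, 2-cam)": {
        "upstream parquet+mp4": (2014.1, 26.6),
        "convert_to_lance (JPEG-95)": (8541.0, 70.8),
        "convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0": (17335.3, 49.0),
        "convert_to_lance_video": (2015.9, 53.8),
    },
}


def relative_to_upstream(rows, baseline="upstream parquet+mp4"):
    """Return {format: (size ratio, speed ratio)} relative to the baseline row."""
    base_size, base_fps = rows[baseline]
    return {
        name: (round(size / base_size, 2), round(fps / base_fps, 2))
        for name, (size, fps) in rows.items()
    }


for dataset, rows in RESULTS.items():
    for name, (size_x, speed_x) in relative_to_upstream(rows).items():
        print(f"{dataset:34s} {name:58s} size {size_x:5.2f}x  speed {speed_x:5.2f}x")
```

On pusht, for instance, the JPEG-95 layout costs about 8.2× the disk space for a 4.68× speedup, while the video layout stays within roughly 10% of the upstream size at 3.80×.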

These numbers can be reproduced with examples/benchmark_formats.py.
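The read pattern benchmarked above (delta_timestamps, 8 frames per sample) means every training sample gathers several frames at fixed time offsets around an anchor frame, so the loader issues many small random reads. A minimal sketch of how such offsets could map to frame indices follows. It assumes the usual LeRobot convention of offsets given in seconds, converted at the dataset's fps and clamped to the episode bounds. The function name, the 10 fps rate, and the specific offsets are hypothetical and are not taken from the benchmark script.

```python
def query_indices(anchor, deltas_s, fps, ep_start, ep_end):
    """Map time offsets (in seconds) around `anchor` to frame indices.

    Indices are clamped to [ep_start, ep_end - 1]. The second returned list
    marks positions that fell outside the episode and were clamped.
    """
    indices, is_pad = [], []
    for delta in deltas_s:
        raw = anchor + round(delta * fps)
        clamped = max(ep_start, min(ep_end - 1, raw))
        indices.append(clamped)
        is_pad.append(clamped != raw)
    return indices, is_pad


# Eight offsets covering 0.7 s of history at an assumed 10 fps.
deltas = [-0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0]
idx, pad = query_indices(anchor=3, deltas_s=deltas, fps=10, ep_start=0, ep_end=100)
print(idx)
print(pad)
```

Near the start of an episode, several offsets collapse onto the first frame and are flagged as padding. Further into the episode, each sample touches eight distinct frames, which is the case where frame-level random access matters most.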

Training parity

A DiffusionPolicy trained on the convert_to_lance_video layout of pusht reaches 68.4% gym-pusht success (seed=42, 500 rollouts), matching the head-to-head upstream parquet+mp4 result (68.0%) and the published lerobot/diffusion_pusht figure (65.4%).

Full numbers (pusht env-eval and ALOHA cups_open held-out MSE across all storage modes) are in docs/benchmarks.md. Reproducers: examples/train_and_eval_lance.py and examples/aloha_loader_parity.py.
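To gauge how close 68.4% and 68.0% really are, one rough check is the sampling noise of a success rate measured over 500 rollouts. The sketch below uses a normal-approximation (Wald) interval. It assumes the upstream run was also evaluated over 500 rollouts, which the text implies with "head-to-head" but does not state outright. The helper name is illustrative.

```python
import math


def success_rate_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a success rate."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Success rates quoted above; n=500 for the upstream run is an assumption.
lance_lo, lance_hi = success_rate_interval(0.684, 500)
upstream_lo, upstream_hi = success_rate_interval(0.680, 500)

print(f"lance video: [{lance_lo:.3f}, {lance_hi:.3f}]")
print(f"upstream:    [{upstream_lo:.3f}, {upstream_hi:.3f}]")
```

Under these assumptions the 95% interval is about ±4 percentage points wide on each side, so a 0.4-point gap between the two runs is well within evaluation noise.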

Cloud / Hub

Both readers accept s3://, gs://, hf://datasets/..., hf://buckets/... URIs and pick up credentials from the usual env vars (AWS_*, GOOGLE_APPLICATION_CREDENTIALS, HF_TOKEN). Lance does byte-range fetches — no full-dataset download.
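Before pointing a reader at a remote URI, it can help to check that the matching credentials are present in the environment. The helper below is a hypothetical sketch, not part of lerobot-lancedb. It pairs each URI scheme with the environment variables named above, in the order they are listed, and that pairing is an assumption. AWS_ is treated as a prefix because the text names the whole AWS_* family.

```python
import os
from urllib.parse import urlparse

# Scheme -> credential variables (assumed pairing, see lead-in).
# Names ending in "_" are matched as prefixes, all others exactly.
CREDENTIAL_VARS = {
    "s3": ("AWS_",),
    "gs": ("GOOGLE_APPLICATION_CREDENTIALS",),
    "hf": ("HF_TOKEN",),
}


def credential_status(uri, environ=None):
    """Return (scheme, sorted names of matching credential variables that are set)."""
    environ = os.environ if environ is None else environ
    scheme = urlparse(uri).scheme
    if scheme == "":
        return "local", []  # plain filesystem path, no credentials needed
    wanted = CREDENTIAL_VARS.get(scheme)
    if wanted is None:
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    found = sorted(
        name
        for name in environ
        if any(name == w or (w.endswith("_") and name.startswith(w)) for w in wanted)
    )
    return scheme, found


print(credential_status("hf://datasets/lance-format/pusht-lerobot-lancedb"))
```

An empty list for a remote scheme suggests that the relevant variables are not set in the current environment.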

Pre-converted reference datasets you can paste directly:

from lerobot_lancedb import LeRobotLanceDataset, LeRobotLanceVideoDataset

LeRobotLanceDataset(repo_id="lance-format/pusht-lerobot-lancedb")        # 60 MB JPEG layout
LeRobotLanceVideoDataset(repo_id="lance-format/pusht-lerobot-lancedb-video")  # 8 MB video-blob layout

License

Apache 2.0.


