Lance-backed datasets for LeRobot — frame-level random access on local disk and cloud (S3 / GCS / HF Hub / HF Buckets).
lerobot-lancedb
📖 Docs: https://lancedb.github.io/lerobot-lancedb/
Lance-backed datasets for LeRobot. Drop-in replacement for LeRobotDataset with two storage layouts:
- LeRobotLanceDataset: per-frame JPEG bytes (lossy, fastest at single-frame access, optional GPU NVJPEG decode).
- LeRobotLanceVideoDataset: per-file mp4 bytes stored via Lance blob v2, decoded on the fly with torchcodec. Bit-exact pixels, ~same disk size as upstream.
Both subclass LeRobotDataset so existing trainers / samplers / isinstance checks accept them transparently.
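As a minimal illustration of why the subclassing matters, the sketch below uses stand-in classes defined inline rather than the real lerobot imports. `trainer_accepts` is a hypothetical guard, and the classes only mimic the names; nothing here reflects the actual implementation. It shows the `isinstance` behaviour the sentence above relies on.

```python
class LeRobotDataset:
    """Stand-in for the upstream base class (not the real implementation)."""

    def __init__(self, frames):
        self._frames = list(frames)

    def __len__(self):
        return len(self._frames)

    def __getitem__(self, idx):
        return self._frames[idx]


class LeRobotLanceVideoDataset(LeRobotDataset):
    """Stand-in subclass: same interface, different storage behind it."""


def trainer_accepts(dataset):
    # Hypothetical type check of the kind a trainer or sampler might perform.
    return isinstance(dataset, LeRobotDataset)


ds = LeRobotLanceVideoDataset(frames=["frame0", "frame1"])
print(trainer_accepts(ds), len(ds), ds[1])
```

Because the check is written against the base class, code that was written before the Lance readers existed does not need to know about them.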
Install
pip install lerobot-lancedb
For local development:
git clone https://github.com/lancedb/lerobot-lancedb.git
cd lerobot-lancedb
pip install -e '.[dev]'
Quickstart
# Convert (recommended path for dtype=video sources)
lerobot-convert-to-lance-video \
--repo-id=lerobot/aloha_static_cups_open \
--output=./aloha_cups_open_lance_video --overwrite
# Load the converted dataset in Python
from lerobot_lancedb import LeRobotLanceVideoDataset
ds = LeRobotLanceVideoDataset(root="./aloha_cups_open_lance_video")
For the JPEG layout, use lerobot-convert-to-lance and LeRobotLanceDataset instead. See the docs for the full CLI / API reference.
Benchmark
Realistic training read pattern (delta_timestamps, 8 frames / sample, batch 32, num_workers 4, CPU decode, H100):
| dataset | format | size (MB) | delta_ts fps | speedup |
|---|---|---|---|---|
| pusht (96×96, 1-cam) | upstream parquet+mp4 | 7.3 | 750 | 1.00× |
| | convert_to_lance (JPEG-95) | 60.0 | 3510 | 4.68× |
| | convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0 | 105.6 | 2909 | 3.88× |
| | convert_to_lance_video | 8.0 | 2853 | 3.80× |
| ALOHA cups_open (480×640, 4-cam) | upstream parquet+mp4 | 485.6 | 18.7 | 1.00× |
| | convert_to_lance (JPEG-95) | 3626.0 | 46.0 | 2.46× |
| | convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0 | 8735.4 | 32.5 | 1.74× |
| | convert_to_lance_video | 487.4 | 45.6 | 2.44× |
| Koch lego (480×640, 2-cam) | upstream parquet+mp4 | 2014.1 | 26.6 | 1.00× |
| | convert_to_lance (JPEG-95) | 8541.0 | 70.8 | 2.66× |
| | convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0 | 17335.3 | 49.0 | 1.84× |
| | convert_to_lance_video | 2015.9 | 53.8 | 2.02× |
Reproducible via examples/benchmark_formats.py.
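The speedup column is each format's delta_ts fps divided by the upstream fps for the same dataset. The sketch below recomputes it from the table's numbers and also derives the disk-size ratio, which the table does not list. The short keys (`jpeg95`, `jpeg100`, `video`) are labels chosen here for brevity, not names from the project.

```python
# dataset -> format -> (size in MB, delta_ts fps), copied from the benchmark table.
ROWS = {
    "pusht": {
        "upstream": (7.3, 750.0),
        "jpeg95": (60.0, 3510.0),
        "jpeg100": (105.6, 2909.0),
        "video": (8.0, 2853.0),
    },
    "aloha_cups_open": {
        "upstream": (485.6, 18.7),
        "jpeg95": (3626.0, 46.0),
        "jpeg100": (8735.4, 32.5),
        "video": (487.4, 45.6),
    },
    "koch_lego": {
        "upstream": (2014.1, 26.6),
        "jpeg95": (8541.0, 70.8),
        "jpeg100": (17335.3, 49.0),
        "video": (2015.9, 53.8),
    },
}


def relative_to_upstream(rows):
    """Return {format: (speedup, size_ratio)} relative to the upstream row."""
    base_size, base_fps = rows["upstream"]
    return {
        name: (fps / base_fps, size / base_size)
        for name, (size, fps) in rows.items()
        if name != "upstream"
    }


for dataset, rows in ROWS.items():
    for name, (speedup, size_ratio) in relative_to_upstream(rows).items():
        print(f"{dataset:16s} {name:8s} speedup {speedup:.2f}x  size {size_ratio:.2f}x")
```

With these numbers, the JPEG-95 layout costs roughly 4× to 8× the upstream disk size, while the video layout stays within about 10 % of it.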
Training parity
A DiffusionPolicy trained on the convert_to_lance_video copy of pusht reaches 68.4 % gym-pusht success (seed=42, 500 rollouts). This is on par with the head-to-head upstream parquet+mp4 result (68.0 %) and the published lerobot/diffusion_pusht result (65.4 %).
Full numbers (pusht env-eval + ALOHA cups_open held-out MSE across all storage modes) in docs/benchmarks.md. Reproducers: examples/train_and_eval_lance.py and examples/aloha_loader_parity.py.
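To gauge how close the pusht success rates are, a rough sampling-noise estimate helps. The sketch below treats the 500 rollouts as independent Bernoulli trials and uses a normal approximation; both are simplifying assumptions made here, not part of the project's evaluation. `success_interval` is a hypothetical helper.

```python
import math


def success_interval(rate, n_rollouts, z=1.96):
    """Normal-approximation interval for a binomial success rate (z=1.96 ~ 95 %)."""
    stderr = math.sqrt(rate * (1.0 - rate) / n_rollouts)
    return rate - z * stderr, rate + z * stderr


# Lance video run: 68.4 % success over 500 rollouts.
low, high = success_interval(0.684, 500)
print(f"95 % interval: [{low:.3f}, {high:.3f}]")

# Compare against the other two reported success rates.
for label, other in [("upstream parquet+mp4", 0.680), ("lerobot/diffusion_pusht", 0.654)]:
    print(f"{label}: {other:.3f} inside interval -> {low <= other <= high}")
```

Under these assumptions the half-width is about 4 percentage points, so both comparison figures fall inside the interval around 68.4 %.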
Cloud / Hub
Both readers accept s3://, gs://, hf://datasets/..., hf://buckets/... URIs and pick up credentials from the usual env vars (AWS_*, GOOGLE_APPLICATION_CREDENTIALS, HF_TOKEN). Lance does byte-range fetches — no full-dataset download.
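A small pre-flight check can catch a missing credential before a remote read fails. The sketch below maps each URI scheme to the environment variables named above and reports which ones are set. `credential_status` and the example bucket name are hypothetical, and this is not how the library itself resolves credentials.

```python
import os
from urllib.parse import urlparse

# Scheme -> environment variable hints from the paragraph above.
# A hint ending in "_" is treated as a prefix, since the text only says AWS_*.
CREDENTIAL_HINTS = {
    "s3": ("AWS_",),
    "gs": ("GOOGLE_APPLICATION_CREDENTIALS",),
    "hf": ("HF_TOKEN",),
}


def _matches(var_name, hint):
    # Prefix match for "AWS_"-style hints, exact match otherwise.
    if hint.endswith("_"):
        return var_name.startswith(hint)
    return var_name == hint


def credential_status(uri, environ=None):
    """Return (scheme, sorted matching env var names); non-remote paths give ("local", [])."""
    environ = os.environ if environ is None else environ
    scheme = urlparse(uri).scheme
    if scheme not in CREDENTIAL_HINTS:
        return "local", []
    hints = CREDENTIAL_HINTS[scheme]
    found = sorted(k for k in environ if any(_matches(k, h) for h in hints))
    return scheme, found


# Example with an explicit environment mapping instead of the real os.environ.
print(credential_status("s3://my-bucket/pusht", {"AWS_ACCESS_KEY_ID": "x", "HOME": "/root"}))
print(credential_status("./aloha_cups_open_lance_video", {}))
```

An empty list for a remote scheme suggests the credentials still need to be exported, although public datasets may be readable without them.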
Pre-converted reference datasets you can paste directly:
from lerobot_lancedb import LeRobotLanceDataset, LeRobotLanceVideoDataset
LeRobotLanceDataset(repo_id="lance-format/pusht-lerobot-lancedb") # 60 MB JPEG layout
LeRobotLanceVideoDataset(repo_id="lance-format/pusht-lerobot-lancedb-video") # 8 MB video-blob layout
License
Apache 2.0.