lerobot-doctor
Dataset quality diagnostics for LeRobot v3 datasets.
Catches issues that waste debugging time: corrupted timestamps, dropped frames, frozen actions, clipped values, metadata inconsistencies, video problems, stuck actuators, and more.
Works on local datasets and on datasets hosted on the HuggingFace Hub. Does not depend on the lerobot package.
Install
pip install lerobot-doctor
Or from source:
git clone https://github.com/jashshah999/lerobot-doctor.git
cd lerobot-doctor
pip install .
Usage
# Check a local dataset
lerobot-doctor /path/to/dataset
# Check a HuggingFace dataset
lerobot-doctor lerobot/pusht
# Run specific checks only
lerobot-doctor /path/to/dataset --checks metadata,temporal,actions
# JSON output (for CI/CD integration)
lerobot-doctor /path/to/dataset --json
# Limit episodes checked (for large datasets)
lerobot-doctor /path/to/dataset --max-episodes 10
# Verbose (show PASS details)
lerobot-doctor /path/to/dataset -v
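For scripted runs, the flags shown above can be assembled programmatically. The following is a minimal sketch and not part of lerobot-doctor: `build_doctor_command` is a hypothetical helper, and it only uses the flags listed in this section.

```python
from typing import List, Optional, Sequence


def build_doctor_command(
    dataset: str,
    checks: Optional[Sequence[str]] = None,
    json_output: bool = False,
    max_episodes: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argv list for the lerobot-doctor CLI."""
    cmd = ["lerobot-doctor", dataset]
    if checks:
        # --checks takes a comma-separated list, e.g. metadata,temporal,actions
        cmd += ["--checks", ",".join(checks)]
    if json_output:
        cmd.append("--json")
    if max_episodes is not None:
        cmd += ["--max-episodes", str(max_episodes)]
    if verbose:
        cmd.append("-v")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd)`, which avoids shell quoting problems with dataset paths.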
Checks (10 total)
| Check | What it catches |
|---|---|
| metadata | Missing/invalid info.json, wrong episode/frame counts, missing data files, tasks.parquet issues |
| temporal | Non-monotonic timestamps, dropped frames, inconsistent fps, broken frame/episode indices |
| actions | NaN/Inf values, clipped actions, frozen (stuck) actions, sudden action jumps |
| videos | Missing video files, decode errors, fps/resolution mismatches, frame count mismatches |
| statistics | NaN/Inf in observations, zero-variance features, extreme outliers, stats.json drift |
| episodes | Short/empty episodes, length distribution, policy window compatibility (ACT/Diffusion), metadata-data length mismatches, task imbalance |
| consistency | Cross-episode feature schema changes (missing columns, dtype/shape mismatches), within-episode shape inconsistencies |
| training | Policy compatibility (ACT/Diffusion/VLA), normalization readiness (zero-std dims), action space sanity, delta_timestamps compatibility |
| anomalies | Stuck actuators (>80% static), near-duplicate episodes, distribution shift across dataset, broken sensors (constant observations) |
| portability | Absolute paths, symlinks, large files, HF Hub compatibility, non-standard files |
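To make the temporal row concrete, here is an illustrative sketch of how non-monotonic timestamps and dropped frames could be detected for one episode. It is not lerobot-doctor's actual implementation: the function name and the `tolerance` threshold are assumptions chosen for the example.

```python
from typing import List, Tuple


def find_timestamp_issues(
    timestamps: List[float], fps: float, tolerance: float = 0.5
) -> Tuple[List[int], List[int]]:
    """Return (non_monotonic_indices, dropped_frame_indices).

    Index i is flagged when the step from frame i-1 to frame i is
    non-positive (non-monotonic) or longer than (1 + tolerance) frame
    periods (a likely dropped frame). The tolerance is an assumed value.
    """
    period = 1.0 / fps
    non_monotonic: List[int] = []
    dropped: List[int] = []
    for i in range(1, len(timestamps)):
        step = timestamps[i] - timestamps[i - 1]
        if step <= 0:
            non_monotonic.append(i)
        elif step > (1.0 + tolerance) * period:
            dropped.append(i)
    return non_monotonic, dropped
```

For example, at 10 fps the sequence `[0.0, 0.1, 0.2, 0.4, 0.3, 0.5]` has one backwards step (index 4) and two gaps of two frame periods (indices 3 and 5).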
Exit codes
0: All checks PASS or WARN
1: At least one check FAIL
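The exit-code rule can be stated as a short function. This is a sketch of the rule as documented here, not the tool's source; `exit_code_for` is a hypothetical name, and the severity strings are taken from the report labels.

```python
from typing import Iterable


def exit_code_for(severities: Iterable[str]) -> int:
    """Return 1 if any check reported FAIL, otherwise 0.

    PASS and WARN both count as success, so warnings alone
    do not break a CI job.
    """
    return 1 if any(s == "FAIL" for s in severities) else 0
```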
Example output
lerobot-doctor v0.1.0 -- Dataset Quality Report
Dataset: lerobot/pusht (v3.0)
Episodes: 206 | Frames: 25,650 | FPS: 10
[PASS] Metadata & Format Compliance
[PASS] Temporal Consistency
[WARN] Action Quality
- action: Episode 2 has 1 sudden large action jumps (>5 std)
- action: Episode 3 has 2 sudden large action jumps (>5 std)
[WARN] Video Integrity
- Skipping video decode checks for remote dataset
[WARN] Data Distribution
- next.success: zero variance (constant value 0.0000)
[WARN] Episode Health
- 2/10 episodes shorter than chunk_size=100 (used by ACT/Diffusion policies)
[PASS] Feature Consistency
[PASS] Training Readiness
[WARN] Anomaly Detection
- next.success: ALL 1 dimensions constant across ALL episodes
Summary: 5 PASS | 5 WARN
Suggested fixes:
Check sensor connections -- constant readings indicate hardware issues
Filter episodes shorter than your policy's chunk_size before training
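The second suggested fix can be sketched as follows. This is a minimal illustration, assuming you already have a mapping from episode index to frame count; `episodes_long_enough` is a hypothetical helper, not a lerobot-doctor or lerobot API.

```python
from typing import Dict, List


def episodes_long_enough(episode_lengths: Dict[int, int], chunk_size: int) -> List[int]:
    """Return the sorted indices of episodes with at least chunk_size frames."""
    return sorted(ep for ep, n in episode_lengths.items() if n >= chunk_size)
```

The resulting index list can then be used to select which episodes to load for training.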
JSON output
Use --json for CI integration. The command exits with code 1 if any check reports FAIL.
lerobot-doctor /path/to/dataset --json | jq '.overall_severity'
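The same field can be read from Python when jq is not available. This sketch assumes `overall_severity` is a top-level string in the JSON report, as the jq filter above suggests. The value `"FAIL"` is an assumption based on the report labels, and `should_fail_build` is a hypothetical helper.

```python
import json


def should_fail_build(report_json: str) -> bool:
    """Return True when the report's overall_severity is FAIL.

    Assumes overall_severity is a top-level string; the exact set of
    values ("PASS", "WARN", "FAIL") is an assumption.
    """
    report = json.loads(report_json)
    return str(report.get("overall_severity", "")).upper() == "FAIL"
```

In practice the process exit code already carries this information, so parsing the report is mainly useful when you also want to log or archive its contents.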
Development
git clone https://github.com/jashshah999/lerobot-doctor.git
cd lerobot-doctor
pip install -e ".[dev]"
PYTHONPATH=src pytest tests/ -v