Score robot demonstrations by motion quality
democlean
Quality scoring for robot demonstration datasets.
Why
Robot learning datasets contain bad demonstrations—jerky movements, hesitation, inconsistent timing. Training on these hurts performance. Manual review doesn't scale.
democlean scores episodes by motion quality using mutual information (MI) between states and actions.
Install
pip install democlean
Usage
democlean analyze lerobot/pusht
Dataset lerobot/pusht
Episodes: 50 | Dims: 2→2
Distribution
████████████████████ High 30
██████████ Medium 15
█████ Low 5
Mean 2.55 (typical for human teleop)
Flagged (lowest MI)
ep 46 1.897
ep 6 1.984
Filtering
democlean analyze lerobot/pusht --keep 0.8 # keep top 80%
democlean analyze lerobot/pusht --min-mi 2.0 # drop below threshold
democlean analyze lerobot/pusht --keep 0.8 -r out.json
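The two filtering modes above can be illustrated with a small standalone sketch. This is not democlean's implementation: `keep_top_fraction` and `drop_below` are hypothetical helpers, and the scores are illustrative values (only episodes 6 and 46 come from the sample output above).

```python
def keep_top_fraction(scores, fraction):
    """Return the episode ids of the highest-scoring `fraction` of episodes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    return sorted(ranked[:n_keep])


def drop_below(scores, min_mi):
    """Return the episode ids whose score is at least `min_mi`."""
    return sorted(ep for ep, mi in scores.items() if mi >= min_mi)


# Illustrative per-episode MI scores (hypothetical apart from ep 6 and ep 46).
scores = {0: 2.61, 1: 2.40, 2: 3.05, 6: 1.984, 46: 1.897}

kept_by_fraction = keep_top_fraction(scores, 0.8)   # analogous to --keep 0.8
kept_by_threshold = drop_below(scores, 2.0)         # analogous to --min-mi 2.0
print(kept_by_fraction, kept_by_threshold)
```

A fraction adapts to each dataset's score distribution, while a fixed threshold gives the same cut-off across datasets.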
What MI Measures
MI quantifies how predictable actions are given states.
- High MI → smooth, purposeful motion
- Low MI → jerky, hesitant, inconsistent
MI measures how the robot moved, not what it achieved. Use task-specific metrics for success rates.
| MI | Interpretation |
|---|---|
| >3.0 | Very smooth |
| 2.0–3.0 | Typical human teleop |
| 1.0–2.0 | Moderate |
| <1.0 | Noisy/random |
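The `-k` option refers to KSG neighbors, which suggests a Kraskov-Stögbauer-Grassberger (KSG) nearest-neighbor estimator. The following is a minimal pure-Python sketch of KSG algorithm 1, assuming states and actions are given as sequences of equal-length tuples. It is not democlean's code; `ksg_mi` and the toy data are assumptions for illustration, and the result is in nats.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def chebyshev(a, b):
    """Max-norm distance between two equal-length tuples."""
    return max(abs(p - q) for p, q in zip(a, b))


def ksg_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X; Y) in nats, brute force O(N^2)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbor in the joint (x, y) space.
        joint = sorted(
            max(chebyshev(xs[i], xs[j]), chebyshev(ys[i], ys[j]))
            for j in range(n) if j != i
        )
        eps = joint[k - 1]
        # Count neighbors strictly inside eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and chebyshev(xs[i], xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and chebyshev(ys[i], ys[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Toy data: actions that track the state vs. actions unrelated to it.
rng = random.Random(0)
states = [(rng.gauss(0, 1),) for _ in range(200)]
coupled = [(s[0] + rng.gauss(0, 0.1),) for s in states]
unrelated = [(rng.gauss(0, 1),) for _ in range(200)]

mi_coupled = ksg_mi(states, coupled)
mi_unrelated = ksg_mi(states, unrelated)
print(f"coupled: {mi_coupled:.2f}  unrelated: {mi_unrelated:.2f}")
```

Actions that follow the state closely score well above actions drawn independently of it, which is the behavior the High/Low MI bullets describe.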
When to Use
Good fit:
- Human teleoperation data
- 50+ episodes
- Quick triage before training
Not a good fit:
- Scripted simulation (already uniform)
- Multi-task datasets
- Need task success metrics
Limitations
- Length correlation: MI correlates with episode length (r≈0.8). Use --normalize-length to adjust.
- Not task success: Measures motion quality, not task completion.
- Sample size: Works best with 50+ episodes.
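One possible way to adjust for the length correlation is to regress scores on episode length and keep the residuals. The sketch below shows that idea only; the README does not say how --normalize-length works, so `length_adjusted`, `pearson_r`, and the sample numbers are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def length_adjusted(scores, lengths):
    """Remove the least-squares linear trend of score on length, keeping the mean."""
    n = len(scores)
    mean_len = sum(lengths) / n
    mean_score = sum(scores) / n
    sxx = sum((l - mean_len) ** 2 for l in lengths)
    slope = sum(
        (l - mean_len) * (s - mean_score) for l, s in zip(lengths, scores)
    ) / sxx
    return [s - slope * (l - mean_len) for l, s in zip(lengths, scores)]


# Illustrative (made-up) episode lengths in steps and raw MI scores.
lengths = [80, 100, 120, 150, 200, 260]
scores = [1.9, 2.1, 2.2, 2.6, 2.8, 3.3]

adjusted = length_adjusted(scores, lengths)
r_before = pearson_r(lengths, scores)
r_after = pearson_r(lengths, adjusted)
print(f"r before: {r_before:.2f}  r after: {r_after:.2e}")
```

After the adjustment, ranking reflects how an episode scores relative to others of similar length rather than how long it is.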
Python API
from democlean import DemoScorer
scorer = DemoScorer(k=3)
scores = scorer.score_dataset("lerobot/pusht")
keep = scorer.filter_top_k(scores, percentile=80)
CLI Options
| Flag | Description |
|---|---|
| --keep R | Keep top R fraction (0–1) |
| --top-k K | Keep top K episodes |
| --min-mi T | Drop below threshold |
| --normalize-length | Adjust for episode length |
| -k N | KSG neighbors (default: 3) |
| --max-dim D | PCA reduce dimensions |
| --ci | Bootstrap confidence intervals |
| -r FILE | Save JSON report |
| -q | Quiet mode |
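For readers unfamiliar with bootstrap confidence intervals, here is a minimal percentile-bootstrap sketch for the mean of a list of scores. What --ci actually resamples is not documented here, so `bootstrap_mean_ci` and the input values are assumptions for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Illustrative (made-up) per-episode MI scores.
episode_scores = [1.897, 1.984, 2.40, 2.50, 2.61, 2.70, 3.05]

ci_low, ci_high = bootstrap_mean_ci(episode_scores)
print(f"mean {statistics.fmean(episode_scores):.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

A wide interval is a hint that the dataset is too small for the scores to be trusted, which ties in with the 50+ episode guidance above.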
Credits
Inspired by DemInf (Hejna et al., RSS 2025).
Complements score_lerobot_episodes for visual quality.
License
MIT