Score robot demonstrations by motion quality
democlean
Quality scoring for robot demonstration datasets.
The Problem
Robot learning datasets often contain bad demonstrations—jerky movements, hesitation, inconsistent timing. Training on these hurts policy performance. Manual review doesn't scale.
democlean automatically scores episodes by motion quality using mutual information (MI) between states and actions. Episodes with smooth, purposeful motion score high. Jerky, inconsistent episodes score low.
Install
pip install democlean
Quick Start
democlean analyze lerobot/pusht
Output:
Dataset lerobot/pusht
Episodes: 50 | Dims: 2→2
Distribution
████████████████████ High 30
██████████ Medium 15
█████ Low 5
Mean 2.55 (typical for human teleop)
Std 0.27
Flagged (lowest MI)
ep 46 1.897
ep 6 1.984
What MI Measures
MI quantifies how predictable actions are given states.
- High MI → actions are temporally smooth, low jerk, purposeful
- Low MI → actions are jerky, hesitant, inconsistent timing
This is useful because motion quality correlates with demonstration quality. But MI is not a direct measure of task success—it measures how the robot moved, not what it achieved.
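For intuition, the KSG estimator that democlean builds on can be sketched in pure Python. This is a brute-force illustration of the first algorithm from Kraskov et al. (2004), not democlean's own code. The names `ksg_mi`, `digamma_int`, and `chebyshev` are made up for this sketch, and it assumes one (state, action) sample per timestep, which the README does not confirm.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def chebyshev(a, b):
    """Max-norm distance between two equal-length vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


def ksg_mi(states, actions, k=3):
    """KSG estimate of I(states; actions) in nats. Brute force, O(N^2)."""
    n = len(states)
    if n != len(actions) or n <= k:
        raise ValueError("need paired samples and more than k of them")
    total = 0.0
    for i in range(n):
        dx = [chebyshev(states[i], states[j]) for j in range(n)]
        dy = [chebyshev(actions[i], actions[j]) for j in range(n)]
        # Distance to the k-th nearest neighbour in the joint space.
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]
        # Count neighbours strictly inside eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Synthetic data: actions that track the state versus unrelated actions.
rng = random.Random(0)
states = [(rng.gauss(0, 1),) for _ in range(200)]
actions_coupled = [(s[0] + 0.1 * rng.gauss(0, 1),) for s in states]
actions_random = [(rng.gauss(0, 1),) for _ in range(200)]

mi_coupled = ksg_mi(states, actions_coupled)
mi_random = ksg_mi(states, actions_random)
print(f"coupled: {mi_coupled:.2f} nats, random: {mi_random:.2f} nats")
```

Actions that follow the state closely give a clearly higher estimate than actions drawn independently of it, which is the signal democlean uses to rank episodes.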
Score Ranges
| MI | Interpretation |
|---|---|
| >3.0 | Very smooth |
| 2.0–3.0 | Typical human teleop |
| 1.0–2.0 | Moderate |
| <1.0 | Noisy/random |
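The table can be applied to a score with a small lookup. This is only a sketch: `interpret_mi` is a hypothetical helper, not part of democlean, and the table does not say which band the exact boundary values 1.0, 2.0, and 3.0 fall into, so the choice below is an assumption.

```python
def interpret_mi(mi):
    """Map an MI score to the label from the score-range table.

    Assumption: boundary values belong to the band above them,
    except 3.0, which stays in "Typical human teleop".
    """
    if mi > 3.0:
        return "Very smooth"
    if mi >= 2.0:
        return "Typical human teleop"
    if mi >= 1.0:
        return "Moderate"
    return "Noisy/random"


# The two flagged episodes from the Quick Start output.
for episode, mi in [(46, 1.897), (6, 1.984)]:
    print(f"ep {episode}: {mi:.3f} -> {interpret_mi(mi)}")
```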
Filtering Episodes
Keep top 80%:
democlean analyze lerobot/pusht --keep 0.8
Drop below threshold:
democlean analyze lerobot/pusht --min-mi 2.0
Save report:
democlean analyze lerobot/pusht --keep 0.8 -r report.json
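The two filters amount to a rank cut and a threshold cut over per-episode scores. The sketch below illustrates that logic on a plain dict. `keep_top_fraction` and `keep_at_least` are hypothetical names, the rounding rule for the fraction is an assumption, and only the scores for episodes 6 and 46 come from the Quick Start output; the others are illustrative.

```python
def keep_top_fraction(scores, fraction):
    """Return episode ids of the highest-scoring `fraction` of episodes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, round(fraction * len(ranked)))  # rounding is an assumption
    return sorted(ranked[:n_keep])


def keep_at_least(scores, min_mi):
    """Return episode ids whose score is at or above `min_mi`."""
    return sorted(ep for ep, mi in scores.items() if mi >= min_mi)


# Episode id -> MI score (illustrative values apart from episodes 6 and 46).
scores = {0: 2.61, 3: 2.40, 6: 1.984, 12: 2.90, 46: 1.897}

print(keep_top_fraction(scores, 0.8))  # [0, 3, 6, 12]
print(keep_at_least(scores, 2.0))      # [0, 3, 12]
```

The rank cut always keeps a fixed share of the data, while the threshold cut can keep everything or nothing depending on the dataset, so the two are not interchangeable.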
Limitations
- Length correlation: MI correlates with episode length (r≈0.8). Longer episodes score higher. Use --normalize-length to adjust.
- Not task success: MI measures motion smoothness, not whether the task was completed. Use task-specific metrics for that.
- Sample size: Works best with 50+ episodes. Small datasets may not show meaningful variation.
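One common way to adjust for a length effect like the one in the first limitation is to fit a line of score against episode length and subtract the fitted trend. The README does not describe how `--normalize-length` works, so treat the sketch below as an assumed approach; `pearson_r` and `remove_length_trend` are hypothetical helpers and the data is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def remove_length_trend(scores, lengths):
    """Subtract the least-squares linear trend of score on length.

    The mean score is preserved, so adjusted values stay on the same scale.
    """
    n = len(scores)
    mean_len = sum(lengths) / n
    mean_score = sum(scores) / n
    cov = sum((l - mean_len) * (s - mean_score) for l, s in zip(lengths, scores))
    var = sum((l - mean_len) ** 2 for l in lengths)
    slope = cov / var
    return [s - slope * (l - mean_len) for s, l in zip(scores, lengths)]


# Illustrative episode lengths (timesteps) and raw MI scores.
lengths = [100, 150, 200, 250, 300]
raw = [2.0, 2.3, 2.4, 2.8, 2.9]

adjusted = remove_length_trend(raw, lengths)
print(f"r before: {pearson_r(lengths, raw):.2f}")
print(f"r after:  {abs(pearson_r(lengths, adjusted)):.2f}")
```

After the adjustment, ranking reflects how an episode scores relative to others of similar length rather than how long it is.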
Python API
from democlean import DemoScorer
scorer = DemoScorer(k=3)
scores = scorer.score_dataset("lerobot/pusht")
# Filter
keep = scorer.filter_top_k(scores, percentile=80)
print(f"Keep episodes: {keep}")
CLI Reference
democlean analyze <dataset> [options]
Options:
--keep R Keep top R fraction (0-1)
--top-k K Keep top K episodes
--min-mi T Drop episodes below MI threshold
--normalize-length Adjust for episode length
-k N KSG neighbors (default: 3)
--max-dim D PCA reduce to D dimensions
--ci Bootstrap confidence intervals
-r FILE Save JSON report
-q Quiet mode (JSON output only)
--explain Show interpretation guide
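As background for the `--ci` option, a percentile bootstrap can be sketched as follows. The reference above does not say what democlean resamples, so this example resamples per-episode scores to bound the dataset mean; the tool may instead resample timesteps within an episode. `bootstrap_mean_ci` is a hypothetical helper and the scores are illustrative.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


# Illustrative per-episode MI scores.
episode_scores = [1.897, 1.984, 2.31, 2.48, 2.55, 2.62, 2.70, 2.85]

low, high = bootstrap_mean_ci(episode_scores)
mean = statistics.fmean(episode_scores)
print(f"mean {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A wide interval is a hint that the dataset is too small for the scores to separate episodes reliably, which ties back to the sample-size limitation.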
Credits
Built on the KSG mutual information estimator (Kraskov et al., 2004).
Complements score_lerobot_episodes, which catches visual issues, while democlean catches motion issues.
License
MIT