
Version control and lineage tracking for robot training episode data

Project description

EpisodeVault: find out why your robot model regressed.



The problem

Every robotics ML engineer has retrained a model and watched performance drop with no clear cause. DVC tracks which files changed. MLflow tracks which hyperparameters were used. Nobody tracks what changed at the episode level: which tasks dropped out, which quality metrics shifted, and how the task distribution moved between v1 and v2 of your dataset.

EpisodeVault fills that gap.


What EpisodeVault does

Run episodevault diff v1.0 v2.0 and get this:

Dataset diff: v1.0 → v2.0
────────────────────────────────────────────────────
Episodes added:    +0
Episodes removed:  -7

Distribution shift:
  factory_pick                     2 → 6  ↑ 200%  ⚠️
  kitchen_grasp                    4 → 1  ↓ 75%  ⚠️

Quality metrics:
  avg episode length:    3.7s → 3.0s  ↓
  success_rate:          0.88 → 0.38  ↓
  camera_sync_score:     1.00 → 1.00  →

Regression candidates (ranked by magnitude; correlate with your eval):
  - 'kitchen_grasp' episodes dropped 75% (4 → 1). Restore from prior
    version if this task is in your eval benchmark.
  - Success rate fell 50 points (0.88 → 0.38). New episodes may contain failed
    demonstrations. Run score_lerobot_episodes to identify low-quality additions.
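The task lines in the distribution-shift block are relative changes in episode count per task (2 → 6 is +200%, 4 → 1 is -75%). A minimal sketch of that arithmetic, assuming each version can be reduced to one task label per episode; counts_from_episodes and task_shift are hypothetical names for illustration, not part of the episodevault package.

```python
from __future__ import annotations

from collections import Counter


def counts_from_episodes(task_per_episode: list[str]) -> dict[str, int]:
    """Count episodes per task from one task label per episode."""
    return dict(Counter(task_per_episode))


def task_shift(old: dict[str, int], new: dict[str, int]) -> dict[str, float | None]:
    """Relative change in episode count per task, in percent.

    Tasks absent from the old version get None, since a change from 0 is undefined.
    """
    shift: dict[str, float | None] = {}
    for task in sorted(set(old) | set(new)):
        before, after = old.get(task, 0), new.get(task, 0)
        shift[task] = None if before == 0 else (after - before) / before * 100
    return shift


# Toy counts taken from the sample diff output above.
v1 = counts_from_episodes(["factory_pick"] * 2 + ["kitchen_grasp"] * 4)
v2 = counts_from_episodes(["factory_pick"] * 6 + ["kitchen_grasp"] * 1)
for task, pct in task_shift(v1, v2).items():
    label = "new task" if pct is None else f"{pct:+.0f}%"
    print(f"{task:<16} {v1.get(task, 0)} → {v2.get(task, 0)}  {label}")
```

Reporting relative change per task makes a small task that loses most of its episodes stand out even when the absolute count change is tiny.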

Install

pip install episodevault

Requires Python 3.10+. Key dependencies: pyarrow, pandas, duckdb, click, rich, pydantic.


Quickstart

# Start tracking a local LeRobot dataset
episodevault track ./my_dataset

# Snapshot the current state with a message
episodevault commit -m "added 500 kitchen episodes"

# Compare two snapshots
episodevault diff v1.0 v2.0

# Find what dataset a model was trained on and diff against the prior version
episodevault blame model_v3

track initializes a .episodevault/ store inside your dataset directory. commit snapshots the episode manifest, not the raw sensor data, which keeps it fast. diff computes task distribution shift and quality deltas between any two versions. blame looks up which dataset version a given model was trained on and diffs it against the preceding version.
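Because commit stores only the manifest, a snapshot can be small and cheap. One common way to build such a store is content addressing: serialize the manifest canonically, hash it, and store it under that hash so an unchanged manifest is written only once. The sketch below assumes a manifest is a list of per-episode dicts and invents a layout (an objects/ directory plus an append-only log.jsonl); commit_manifest, load_manifest, and that layout are hypothetical and are not EpisodeVault's actual on-disk format.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path


def commit_manifest(store: Path, manifest: list[dict], message: str) -> str:
    """Snapshot an episode manifest into a content-addressed store; return its id."""
    # Canonical JSON (sorted keys, compact separators) so identical manifests hash identically.
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(blob).hexdigest()
    # Hypothetical layout: one file per distinct manifest under objects/.
    objects = store / "objects"
    objects.mkdir(parents=True, exist_ok=True)
    path = objects / f"{digest}.json"
    if not path.exists():  # an unchanged manifest is stored only once
        path.write_bytes(blob)
    # Append-only commit log, one JSON record per commit, oldest first.
    with (store / "log.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps({"id": digest, "message": message, "time": time.time()}) + "\n")
    return digest


def load_manifest(store: Path, snapshot_id: str) -> list[dict]:
    """Read a snapshot back by its id (the SHA-256 of its canonical JSON)."""
    return json.loads((store / "objects" / f"{snapshot_id}.json").read_text(encoding="utf-8"))
```

Two commits of the same manifest share one stored object but still produce two log entries, so history stays complete without duplicating data.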


Python API

Log a training run from your training script so blame can trace it back:

import episodevault as ev

ev.log_training_run(
    model_version="model_v3",
    dataset_version="v2.0",
    framework="lerobot"
)

One call. That's all blame needs.
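The lookup half of blame can be sketched as: find the dataset version logged for the model, then take the version committed just before it. This assumes run records mirror the log_training_run arguments and that the version history is available oldest first; TrainingRun and blame_pair are hypothetical names, not the episodevault API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    # Hypothetical record mirroring the log_training_run keyword arguments.
    model_version: str
    dataset_version: str
    framework: str


def blame_pair(
    runs: list[TrainingRun], versions: list[str], model_version: str
) -> tuple[str | None, str]:
    """Return (previous_dataset_version, dataset_version) for a model.

    `versions` is the commit history, oldest first. The previous version is
    None when the model was trained on the first snapshot.
    """
    matches = [r for r in runs if r.model_version == model_version]
    if not matches:
        raise KeyError(f"no training run logged for {model_version!r}")
    current = matches[-1].dataset_version  # the most recently logged run wins
    idx = versions.index(current)  # raises ValueError if the version was never committed
    previous = versions[idx - 1] if idx > 0 else None
    return previous, current
```

For model_v3 trained on v2.0 with history ["v1.0", "v2.0"], this yields ("v1.0", "v2.0"), the same pair you would pass to episodevault diff by hand.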


Compatibility

Tested against real Hugging Face LeRobot v3 datasets:

Dataset          Robot   Format       Episodes   Parse time (s)   Status
aloha_pencil     aloha   LeRobot v3   25         0.33             OK
aloha_shrimp     aloha   LeRobot v3   18         0.38             OK
so100_stacking   so100   LeRobot v3   56         0.65             OK
aloha_cabinet    aloha   LeRobot v3   85         2.65             OK

Parse time is for the episode manifest only. Raw sensor data (video, joint trajectories) is never loaded.
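To illustrate reading metadata without touching sensor data, here is a minimal standard-library sketch that pulls dataset-level fields from meta/info.json alone. The key names (codebase_version, robot_type, total_episodes, total_frames, fps) are an assumption about the LeRobot info.json layout, not something this page specifies, and manifest_summary is a hypothetical helper rather than part of episodevault.

```python
import json
from pathlib import Path

# Assumed LeRobot info.json keys; any that are missing come back as None.
SUMMARY_KEYS = ("codebase_version", "robot_type", "total_episodes", "total_frames", "fps")


def manifest_summary(dataset_root: Path) -> dict:
    """Read dataset-level fields from meta/info.json; no video or trajectory files are opened."""
    info_path = dataset_root / "meta" / "info.json"
    info = json.loads(info_path.read_text(encoding="utf-8"))
    return {key: info.get(key) for key in SUMMARY_KEYS}
```

Only one small JSON file is read here, which is why this kind of parse does not grow with video size or frame count.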


How it works

  • Parses episode manifests (meta/episodes/, meta/tasks.parquet, meta/info.json) without loading raw sensor data, so parse time tracks manifest size rather than frame count or video size.
  • Snapshots manifests into a version store on every commit, so diffing and time travel are available from the first snapshot.
  • The diff engine computes task distribution shift and quality deltas between any two snapshots. Regression candidates are ranked by a normalized severity score, and the top few are surfaced as leads to check against your eval, not asserted as proven causes.
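The page does not define the severity score, so the following is only a sketch under one plausible assumption: severity is the relative drop (before - after) / before, which lies in [0, 1] for non-negative metrics, increases score zero, and the top k drops are reported. Candidate and top_candidates are hypothetical names, and this normalization is a stand-in, not EpisodeVault's actual formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    before: float
    after: float

    @property
    def severity(self) -> float:
        """Assumed normalization: relative drop in [0, 1]; increases and zero baselines score 0."""
        if self.before <= 0 or self.after >= self.before:
            return 0.0
        return (self.before - max(self.after, 0.0)) / self.before


def top_candidates(candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    """Keep only drops, strongest first; sorted() is stable, so ties keep input order."""
    drops = [c for c in candidates if c.severity > 0]
    return sorted(drops, key=lambda c: c.severity, reverse=True)[:k]
```

Fed the numbers from the sample diff, this ranks the kitchen_grasp drop (0.75) ahead of the success-rate drop (about 0.57) and ignores the factory_pick increase. Normalizing by the baseline lets task counts and 0-to-1 metrics share one ranking.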

Credits


License

MIT. See LICENSE.



Download files


Source Distribution

episodevault-0.1.1.tar.gz (17.4 kB)

Uploaded Source

Built Distribution


episodevault-0.1.1-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file episodevault-0.1.1.tar.gz.

File metadata

  • Download URL: episodevault-0.1.1.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for episodevault-0.1.1.tar.gz
Algorithm Hash digest
SHA256 17ac2bc9e810f3325d4bfca330c418b9f3fad4798db552853a42290071a7f5a1
MD5 d122c4ec90bfc5aedd926cfe1984b014
BLAKE2b-256 fbfbfc878f42de06cd817df95de8aaa59f1cc3fcd046e73692eec11ae6dfdfcd


File details

Details for the file episodevault-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: episodevault-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for episodevault-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9de448107f4acaaff0bf589f1ca889b576fb902e2611843cecb04e3875c8aed0
MD5 0ca8b22fea08713847d9225d7faaf06e
BLAKE2b-256 f5e5c2185ee69b669547bcd418077f791681b7932f47ea4fe5729811549fbaff

