Version control and lineage tracking for robot training episode data

Project description

EpisodeVault: find out exactly why your robot model regressed.

The problem

Every robotics ML engineer has retrained a model and watched performance drop with no clear cause. DVC tracks which files changed. MLflow tracks which hyperparameters were used. Nobody tracks what changed at the episode level: which tasks dropped out, which quality metrics shifted, and how the task distribution moved between v1 and v2 of your dataset.

EpisodeVault fills that gap.

What EpisodeVault does

Run episodevault diff v1.0 v2.0 and get this:

Dataset diff: v1.0 → v2.0
────────────────────────────────────────────────────
Episodes added:    +0
Episodes removed:  -7

Distribution shift:
  factory_pick                     2 → 6  ↑ 200%  ⚠️
  kitchen_grasp                    4 → 1  ↓ 75%  ⚠️

Quality metrics:
  avg episode length:    3.7s → 3.0s  ↓
  success_rate:          0.88 → 0.38  ↓
  camera_sync_score:     1.00 → 1.00  →

Regression candidates (ranked by magnitude; correlate with your eval):
  - 'kitchen_grasp' episodes dropped 75% (4 → 1). Restore from prior
    version if this task is in your eval benchmark.
  - Success rate fell 50% (0.88 → 0.38). New episodes may contain failed
    demonstrations. Run score_lerobot_episodes to identify low-quality additions.
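
The percentages in the sample output can be reproduced by hand. The sketch below is not EpisodeVault's code; `pct_change` is a hypothetical helper that only illustrates the arithmetic. It also separates two readings of the success-rate line: going from 0.88 to 0.38 is a drop of 0.50 in absolute terms (50 percentage points), which is roughly 57% relative to the starting value.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative change is undefined when the starting value is 0")
    return (after - before) / before * 100.0

# Task counts taken from the sample diff.
factory_pick = pct_change(2, 6)
kitchen_grasp = pct_change(4, 1)

# Success rate: absolute drop vs. relative drop.
absolute_drop = 0.88 - 0.38
relative_drop = pct_change(0.88, 0.38)

print(f"factory_pick:  {factory_pick:+.0f}%")
print(f"kitchen_grasp: {kitchen_grasp:+.0f}%")
print(f"success_rate:  {absolute_drop:.2f} absolute, {relative_drop:.1f}% relative")
```

When comparing against your own eval numbers, it helps to decide up front whether you are reading a rate change as percentage points or as a relative change.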

Install

pip install episodevault

Requires Python 3.10+. Key dependencies: pyarrow, pandas, duckdb, click, rich, pydantic.

Quickstart

# Start tracking a local LeRobot dataset
episodevault track ./my_dataset

# Snapshot the current state with a message
episodevault commit -m "added 500 kitchen episodes"

# Compare two snapshots
episodevault diff v1.0 v2.0

# Find what dataset a model was trained on and diff against the prior version
episodevault blame model_v3

track initializes a .episodevault/ store inside your dataset directory. commit snapshots the episode manifest only, not the raw sensor data, which keeps it fast. diff computes task distribution shift and quality deltas between any two versions. blame looks up which dataset version a given model was trained on and diffs it against the previous version.
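
To see why snapshotting a manifest is cheap, here is a minimal sketch of one way a snapshot could be identified: hash a canonical form of the manifest rows. This is an assumption for illustration, not EpisodeVault's actual storage format; `snapshot_id` and the row fields (`episode_index`, `task`, `length_s`) are hypothetical.

```python
import hashlib
import json

def snapshot_id(manifest: list[dict]) -> str:
    """Deterministic ID for a manifest: hash of its canonical JSON form."""
    canonical = json.dumps(
        sorted(manifest, key=lambda row: row["episode_index"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

# Hypothetical manifest rows: a few bytes of metadata per episode, no video.
manifest_a = [
    {"episode_index": 0, "task": "kitchen_grasp", "length_s": 3.9},
    {"episode_index": 1, "task": "factory_pick", "length_s": 3.5},
]
manifest_b = list(reversed(manifest_a))  # same rows, different order
manifest_c = manifest_a + [
    {"episode_index": 2, "task": "factory_pick", "length_s": 2.8},
]  # one episode added

same_content = snapshot_id(manifest_a) == snapshot_id(manifest_b)
changed_content = snapshot_id(manifest_a) != snapshot_id(manifest_c)
print(same_content, changed_content)
```

Because only per-episode metadata is hashed, the cost grows with the number of episodes rather than with the size of the recordings.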

Python API

Log a training run from your training script so blame can trace it back:

import episodevault as ev

ev.log_training_run(
    model_version="model_v3",
    dataset_version="v2.0",
    framework="lerobot"
)

One call. That's all blame needs.

Compatibility

Tested against real HuggingFace LeRobot v3 datasets:

Dataset	Robot	Format	Episodes	Parse time	Status
aloha_pencil	aloha	LeRobot v3	25	0.33s	OK
aloha_shrimp	aloha	LeRobot v3	18	0.38s	OK
so100_stacking	so100	LeRobot v3	56	0.65s	OK
aloha_cabinet	aloha	LeRobot v3	85	2.65s	OK

Parse time is for the episode manifest only. Raw sensor data (video, joint trajectories) is never loaded.

How it works

Parses episode manifests (meta/episodes/, meta/tasks.parquet, meta/info.json) without loading raw sensor data -- parse time depends on the manifest, not on frame count or video size (0.33s to 2.65s on the datasets listed under Compatibility).
Snapshots manifests into a version store on every commit -- diff and time travel are built in from the start.
Diff engine computes task distribution shift and quality deltas between any two snapshots -- regression candidates are ranked by a normalized severity score and the top few are surfaced, not asserted as proven causes.
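
The ranking step can be sketched as follows. The actual severity formula is not documented here, so this example assumes a simple one: the absolute relative change, clipped to the range 0 to 1. `Candidate`, `top_candidates`, and the threshold are hypothetical, and the inputs are a subset of the metrics from the sample diff.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    before: float
    after: float

    @property
    def severity(self) -> float:
        """Assumed score: |relative change|, clipped to [0, 1]."""
        if self.before == 0:
            return 1.0 if self.after != 0 else 0.0
        return min(abs(self.after - self.before) / abs(self.before), 1.0)

def top_candidates(candidates, k=2, threshold=0.1):
    """Keep candidates at or above the threshold, most severe first, at most k."""
    flagged = [c for c in candidates if c.severity >= threshold]
    return sorted(flagged, key=lambda c: c.severity, reverse=True)[:k]

candidates = [
    Candidate("kitchen_grasp episode count", 4, 1),
    Candidate("success_rate", 0.88, 0.38),
    Candidate("avg episode length (s)", 3.7, 3.0),
    Candidate("camera_sync_score", 1.00, 1.00),
]

ranked = top_candidates(candidates)
for c in ranked:
    print(f"{c.severity:.2f}  {c.name}: {c.before} -> {c.after}")
```

Clipping puts metrics with different units on one scale, and the cutoff at k keeps the report short. Either way, the result is a list of leads to check against your eval, not a diagnosis.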

Credits

HuggingFace LeRobot team for the v3 dataset format that EpisodeVault parses.
Berkeley AutoLab (Kaiyuan Chen et al.) for Robo-DM / fog_x, prior work on robot dataset management.
score_lerobot_episodes by RobotData for quality signal methodology.
Evidently AI for drift detection methodology that informed the distribution shift logic.

License

MIT. See LICENSE.

Project details

Release history

0.2.0 (Jun 12, 2026)
0.1.1 (Jun 10, 2026, this version)
0.0.2 (Jun 10, 2026)

Download files

Source Distribution

episodevault-0.1.1.tar.gz (17.4 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

episodevault-0.1.1-py3-none-any.whl (18.8 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file episodevault-0.1.1.tar.gz.

File metadata

Download URL: episodevault-0.1.1.tar.gz
Upload date: Jun 10, 2026
Size: 17.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for episodevault-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`17ac2bc9e810f3325d4bfca330c418b9f3fad4798db552853a42290071a7f5a1`
MD5	`d122c4ec90bfc5aedd926cfe1984b014`
BLAKE2b-256	`fbfbfc878f42de06cd817df95de8aaa59f1cc3fcd046e73692eec11ae6dfdfcd`

File details

Details for the file episodevault-0.1.1-py3-none-any.whl.

File metadata

Download URL: episodevault-0.1.1-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 18.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for episodevault-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9de448107f4acaaff0bf589f1ca889b576fb902e2611843cecb04e3875c8aed0`
MD5	`0ca8b22fea08713847d9225d7faaf06e`
BLAKE2b-256	`f5e5c2185ee69b669547bcd418077f791681b7932f47ea4fe5729811549fbaff`
