Dataset surgery utils (e.g., merge, episode remove) for LeRobotDataset v3

🔬 lerobot-surgery

Python 3.8+

⚠️ IMPORTANT: LeRobotDataset v3.0 only. It may work with future v3.x versions, but the code has only been tested on v3.0 so far. For v2.1 datasets, migrate to v3.0 first.


✨ Features

  • 🔀 Merge multiple datasets seamlessly into a single dataset
  • ✂️ Remove specific episodes with automatic re-indexing
  • 🖥️ CLI & Python API for flexible usage

📋 Requirements

  • Python: 3.8 or higher
  • LeRobot: LeRobotDataset version 3.0
  • Dependencies: Automatically installed with the package

🚀 Installation

From PyPI

pip install lerobot-surgery

From source

git clone https://github.com/spiglerg/lerobot-surgery.git
cd lerobot-surgery
pip install -e .

Development installation

git clone https://github.com/spiglerg/lerobot-surgery.git
cd lerobot-surgery
pip install -e ".[dev]"

Verify installation:

lerobot-surgery --version
# Output: lerobot-surgery, version 0.1.0

lerobot-surgery --help
# Shows CLI help

📖 Quick Start

Command Line Interface

# Merge multiple datasets
lerobot-surgery merge dataset_a/ dataset_b/ dataset_c/ -o merged_dataset/

# Remove specific episodes (by index)
lerobot-surgery remove my_dataset/ 0 5 10 -o filtered_dataset/

# Display dataset information
lerobot-surgery info my_dataset/

# Show help
lerobot-surgery --help

Python API

from lerobot_surgery import merge_datasets, remove_episodes

# Merge datasets
merged = merge_datasets(
    source_paths=["dataset_a/", "dataset_b/", "dataset_c/"],
    output_path="merged_dataset/",
)
print(f"Merged {merged.num_episodes} episodes, {merged.num_frames} frames")

# Remove episodes
filtered = remove_episodes(
    dataset_path="my_dataset/",
    episode_indices=[0, 5, 10],  # Episodes to remove
    output_path="filtered_dataset/",
)
print(f"Remaining: {filtered.num_episodes} episodes")

📚 Detailed Usage

Merging Datasets

Combine multiple LeRobot datasets with identical structures (features, fps) into a single consolidated dataset.

Requirements:

  • All datasets must have the same FPS
  • All datasets must have identical features
  • Datasets must be in v3.0 format
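
The requirements above can be checked before attempting a merge. The following is a minimal sketch, not the package's own validation: it assumes each dataset stores its metadata in `meta/info.json` with `codebase_version`, `fps`, and `features` keys, and the helper names `load_info` and `check_mergeable` are hypothetical.

```python
import json
from pathlib import Path


def load_info(dataset_root):
    """Read meta/info.json from a dataset directory (assumed on-disk layout)."""
    info_path = Path(dataset_root) / "meta" / "info.json"
    with open(info_path, encoding="utf-8") as f:
        return json.load(f)


def check_mergeable(dataset_roots):
    """Return a list of problems; an empty list means all checks passed."""
    if not dataset_roots:
        raise ValueError("at least one dataset path is required")

    infos = [load_info(root) for root in dataset_roots]
    reference = infos[0]  # every dataset is compared against the first one
    problems = []

    for root, info in zip(dataset_roots, infos):
        version = info.get("codebase_version")  # assumed key name
        if version != "v3.0":
            problems.append(f"{root}: version is {version!r}, expected 'v3.0'")
        if info.get("fps") != reference.get("fps"):
            problems.append(f"{root}: fps differs from {dataset_roots[0]}")
        if info.get("features") != reference.get("features"):
            problems.append(f"{root}: features differ from {dataset_roots[0]}")

    return problems
```

Calling `check_mergeable(["dataset_a/", "dataset_b/"])` and printing any returned messages gives a quick diagnosis before running the merge itself.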

Episode Re-indexing: Episodes are automatically re-indexed sequentially starting from 0 in the merged dataset.
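
To illustrate what sequential re-indexing means for a merge, here is a small sketch. It assumes that sources are concatenated in the order they are given and that episode order within each source is preserved; the package may order episodes differently. The function name `merged_episode_index_map` is hypothetical and is not part of the lerobot-surgery API.

```python
def merged_episode_index_map(episode_counts):
    """Map (source_id, old_episode_index) to the new index in the merged dataset.

    episode_counts[i] is the number of episodes in the i-th source dataset.
    Assumes sources are appended one after another in the given order.
    """
    mapping = {}
    offset = 0  # number of episodes contributed by earlier sources
    for source_id, count in enumerate(episode_counts):
        for old_index in range(count):
            mapping[(source_id, old_index)] = offset + old_index
        offset += count
    return mapping


# Two sources with 2 and 3 episodes: the second source starts at index 2.
index_map = merged_episode_index_map([2, 3])
print(index_map[(1, 0)])  # 2
```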

Removing Episodes

Remove specific episodes from a dataset while maintaining data integrity and re-indexing remaining episodes.

Episode Re-indexing: Remaining episodes are automatically re-indexed sequentially starting from 0.
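
The effect of re-indexing after a removal can be sketched as follows. This is an illustration only, assuming the remaining episodes keep their relative order; `reindex_after_removal` is a hypothetical helper, not a function exported by the package.

```python
def reindex_after_removal(num_episodes, removed):
    """Map each surviving old episode index to its new sequential index."""
    removed_set = set(removed)
    invalid = sorted(i for i in removed_set if not 0 <= i < num_episodes)
    if invalid:
        raise ValueError(f"episode indices out of range: {invalid}")

    kept = [i for i in range(num_episodes) if i not in removed_set]
    return {old: new for new, old in enumerate(kept)}


# Removing episodes 0, 5 and 10 from a 12-episode dataset leaves 9 episodes.
mapping = reindex_after_removal(12, [0, 5, 10])
print(mapping[1], mapping[6], mapping[11])  # 0 4 8
```

Under these assumptions, any external annotation keyed by the old episode index would need to be remapped through such a table after removal.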

🧪 Testing

Run the test suite:

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=lerobot_surgery --cov-report=html

# Run specific test file
pytest tests/test_merge.py -v

🙏 Acknowledgments

Built for the LeRobot framework.
