🔬 lerobot-surgery
⚠️ IMPORTANT: LeRobotDataset v3.0 only. It may work with future v3.x versions, but the code has only been tested on v3.0 so far. For v2.1 datasets, migrate to v3.0 first.
Dataset surgery utils (e.g., merge, episode remove) for LeRobotDataset v3.
✨ Features
- 🔀 Merge multiple datasets seamlessly into a single dataset
- ✂️ Remove specific episodes with automatic re-indexing
- 🖥️ CLI & Python API for flexible usage
📋 Requirements
- Python: 3.8 or higher
- LeRobot: LeRobotDataset version 3.0
- Dependencies: Automatically installed with the package
🚀 Installation
From PyPI
pip install lerobot-surgery
From source
git clone https://github.com/spiglerg/lerobot-surgery.git
cd lerobot-surgery
pip install -e .
Development installation
git clone https://github.com/spiglerg/lerobot-surgery.git
cd lerobot-surgery
pip install -e ".[dev]"
Verify installation:
lerobot-surgery --version
# Output: lerobot-surgery, version 0.1.0
lerobot-surgery --help
# Shows CLI help
📖 Quick Start
Command Line Interface
# Merge multiple datasets
lerobot-surgery merge dataset_a/ dataset_b/ dataset_c/ -o merged_dataset/
# Remove specific episodes (by index)
lerobot-surgery remove my_dataset/ 0 5 10 -o filtered_dataset/
# Display dataset information
lerobot-surgery info my_dataset/
# Show help
lerobot-surgery --help
Python API
from lerobot_surgery import merge_datasets, remove_episodes
# Merge datasets
merged = merge_datasets(
    source_paths=["dataset_a/", "dataset_b/", "dataset_c/"],
    output_path="merged_dataset/",
)
print(f"Merged {merged.num_episodes} episodes, {merged.num_frames} frames")
# Remove episodes
filtered = remove_episodes(
    dataset_path="my_dataset/",
    episode_indices=[0, 5, 10],  # Episodes to remove
    output_path="filtered_dataset/",
)
print(f"Remaining: {filtered.num_episodes} episodes")
📚 Detailed Usage
Merging Datasets
Combine multiple LeRobot datasets with identical structures (features, fps) into a single consolidated dataset.
Requirements:
- All datasets must have the same FPS
- All datasets must have identical features
- Datasets must be in v3.0 format
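The requirements above can be checked before starting a merge. The following is a minimal sketch, not the package's own validation: `check_mergeable` is a hypothetical helper, and the key names (`codebase_version`, `fps`, `features`) are assumptions about how each dataset's metadata is represented once loaded into a dict.

```python
from typing import Any, Dict, List


def check_mergeable(infos: List[Dict[str, Any]]) -> None:
    """Raise ValueError if the described datasets cannot be merged.

    Each dict is assumed (hypothetically) to hold "codebase_version",
    "fps", and "features" entries for one dataset.
    """
    if not infos:
        raise ValueError("no datasets given")

    reference = infos[0]
    for i, info in enumerate(infos):
        # Requirement: every dataset must be in v3.0 format.
        if info["codebase_version"] != "v3.0":
            raise ValueError(
                f"dataset {i}: expected v3.0, got {info['codebase_version']}"
            )
        # Requirement: same FPS as the first dataset.
        if info["fps"] != reference["fps"]:
            raise ValueError(
                f"dataset {i}: fps {info['fps']} != {reference['fps']}"
            )
        # Requirement: identical features (names, dtypes, shapes).
        if info["features"] != reference["features"]:
            raise ValueError(f"dataset {i}: features differ from dataset 0")
```

Comparing every dataset against the first one is enough here, because equality is transitive: if all datasets match dataset 0, they match each other.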
Episode Re-indexing: Episodes are automatically re-indexed sequentially starting from 0 in the merged dataset.
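To illustrate what sequential re-indexing means for a merge, here is a small sketch. It assumes that episodes keep the order of the source datasets as given on the command line, which this README does not state explicitly; `merged_episode_indices` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List, Tuple


def merged_episode_indices(episode_counts: List[int]) -> Dict[Tuple[int, int], int]:
    """Map (dataset position, old episode index) to the merged episode index.

    `episode_counts[k]` is the number of episodes in the k-th source dataset.
    Assumes sources are concatenated in the order they are listed.
    """
    mapping: Dict[Tuple[int, int], int] = {}
    next_index = 0  # merged indices start at 0 and increase by 1
    for dataset_pos, count in enumerate(episode_counts):
        for old_index in range(count):
            mapping[(dataset_pos, old_index)] = next_index
            next_index += 1
    return mapping


# Two datasets with 2 and 3 episodes yield merged indices 0..4.
print(merged_episode_indices([2, 3]))
# {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3, (1, 2): 4}
```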
Removing Episodes
Remove specific episodes from a dataset while maintaining data integrity and re-indexing remaining episodes.
Episode Re-indexing: Remaining episodes are automatically re-indexed sequentially starting from 0.
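As a sketch of the re-indexing described above, the mapping from old to new episode indices can be computed as follows. This is an illustration under the assumption that the indices passed for removal refer to the original dataset's numbering; `reindex_after_removal` is a hypothetical helper, not the package's implementation.

```python
from typing import Dict, Iterable


def reindex_after_removal(num_episodes: int, removed: Iterable[int]) -> Dict[int, int]:
    """Return {old episode index: new episode index} for the episodes that remain."""
    removed_set = set(removed)

    # Reject indices that do not exist in the original dataset.
    out_of_range = sorted(i for i in removed_set if not 0 <= i < num_episodes)
    if out_of_range:
        raise IndexError(f"episode indices out of range: {out_of_range}")

    # Remaining episodes keep their relative order and are renumbered from 0.
    kept = [i for i in range(num_episodes) if i not in removed_set]
    return {old: new for new, old in enumerate(kept)}


# Removing episodes 0 and 3 from a 6-episode dataset.
print(reindex_after_removal(6, [0, 3]))
# {1: 0, 2: 1, 4: 2, 5: 3}
```

Any external record that refers to episodes by index (for example, a list of episodes flagged for review) would need to be translated through such a mapping after removal.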
🧪 Testing
Run the test suite:
# Install development dependencies
pip install -e ".[dev]"
# Run all tests
pytest
# Run with coverage
pytest --cov=lerobot_surgery --cov-report=html
# Run specific test file
pytest tests/test_merge.py -v
🙏 Acknowledgments
Built for the LeRobot framework.