MCAP Data Loader

A Python library for loading and processing MCAP data files in a form suited to machine learning and robotics training pipelines.

Features

  • Dataset-style APIs for iterating MCAP data as episodes/samples
  • Built-in statistics utilities (dataset-level and episode-level)
  • Convenient access to topics and attachments
  • Integration CLI for training with LeRobot using MCAP as the dataset backend

Installation

Install from PyPI:

pip install mcap-data-loader

Or install from source:

git clone https://github.com/OpenGHz/MCAP-DataLoader.git --depth 1
cd MCAP-DataLoader
pip install -e .

Quickstart (basic usage)

A basic example showing how to load MCAP files from a directory, inspect statistics, and iterate through episodes/samples:

from mcap_data_loader.datasets.mcap_dataset import (
    McapFlatBuffersEpisodeDataset,
    McapFlatBuffersEpisodeDatasetConfig,
)
from pprint import pprint

dataset = McapFlatBuffersEpisodeDataset(
    McapFlatBuffersEpisodeDatasetConfig(
        data_root="data/example",
        # keys typically include topic names and optional special fields (e.g. "log_stamps")
        keys=["/follow/arm/joint_state/position", "log_stamps"],
    )
)

print(f"All files: {dataset.all_files}")
print(f"Dataset length: {len(dataset)}")

print("Dataset statistics:")
pprint(dataset.statistics())

for episode in dataset:
    print(f"Current file: {episode.config.data_root}")

    for sample in episode:
        print(f"Sample keys: {sample.keys()}")
        break

    print(f"Episode length: {len(episode)}")
    print(f"All topics: {episode.reader.all_topic_names()}")
    print(f"All attachments: {episode.reader.all_attachment_names()}")

    print("Episode statistics:")
    pprint(episode.statistics())
    print("----" * 10)

More examples and detailed usage can be found in the examples directory.
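
When feeding samples to a training pipeline, it is often handy to regroup them by key. The sketch below is not part of the library: `collect_columns` is a hypothetical helper, and it assumes only that each sample behaves like a mapping (the quickstart calls `sample.keys()`). The stand-in samples and their values are made up for illustration; with the real library they would come from `for sample in episode`.

```python
from typing import Any, Dict, Iterable, List, Mapping


def collect_columns(
    samples: Iterable[Mapping[str, Any]], keys: List[str]
) -> Dict[str, List[Any]]:
    """Gather per-sample values into one list per key, preserving sample order."""
    columns: Dict[str, List[Any]] = {key: [] for key in keys}
    for sample in samples:
        for key in keys:
            columns[key].append(sample[key])
    return columns


# Made-up stand-in samples (the real value types may differ).
fake_episode = [
    {"/follow/arm/joint_state/position": [0.0, 0.1], "log_stamps": 100},
    {"/follow/arm/joint_state/position": [0.2, 0.3], "log_stamps": 200},
]

columns = collect_columns(
    fake_episode, ["/follow/arm/joint_state/position", "log_stamps"]
)
print(columns["log_stamps"])
```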

Integration with LeRobot training

MCAP Data Loader provides a CLI to train LeRobot models using MCAP data files. This allows you to use MCAP datasets directly as the training data source for LeRobot, without needing to convert them into a different format.

This feature requires LeRobot to be installed in your environment. You can install it from PyPI (version 0.4.3 has been tested):

pip install lerobot

Train with an MCAP dataset

Run:

mcap_lerobot_train -c configs/config.yaml

Recommended: place your config file under a configs/ directory in your current working directory.

Configuration reference

The top level is the standard LeRobot configuration, with an additional mcap section for MCAP dataset loading settings:

batch_size: 2
num_workers: 1
policy:
  type: act
  push_to_hub: false
  chunk_size: 2
  n_action_steps: 2

dataset:
  root: data
  repo_id: example
  streaming: true

mcap:
  states:
    - /follow/arm/joint_state/position
    - /follow/eef/joint_state/position
  actions:
    - /lead/arm/pose/position
    - /lead/arm/pose/orientation
  images:
    - /env_camera/color/image_raw

The topics listed under states and actions are loaded and concatenated to form the observation.state and action fields that LeRobot requires, serving as the low-dimensional state and action inputs for training. Each topic listed under images is added under observation.images, keyed by the first segment of its topic name (env_camera in the example above), so the image input used during training becomes observation.images.env_camera.
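
The mapping described above can be sketched in plain Python. This is an illustration of the naming and concatenation rules only, not the library's implementation: `image_key` and `build_frame` are hypothetical names, and the message values are made-up stand-ins.

```python
def image_key(topic: str) -> str:
    """Map an image topic to a LeRobot-style key using its first path segment."""
    first_segment = topic.strip("/").split("/")[0]
    return f"observation.images.{first_segment}"


def build_frame(messages, states, actions, images):
    """Concatenate state/action topics in list order and key images by camera."""
    frame = {
        "observation.state": [v for topic in states for v in messages[topic]],
        "action": [v for topic in actions for v in messages[topic]],
    }
    for topic in images:
        frame[image_key(topic)] = messages[topic]
    return frame


# Made-up values for one time step; real data would come from the MCAP files.
messages = {
    "/follow/arm/joint_state/position": [0.1, 0.2],
    "/follow/eef/joint_state/position": [0.3],
    "/lead/arm/pose/position": [1.0, 2.0, 3.0],
    "/lead/arm/pose/orientation": [0.0, 0.0, 0.0, 1.0],
    "/env_camera/color/image_raw": "<image placeholder>",
}

frame = build_frame(
    messages,
    states=["/follow/arm/joint_state/position", "/follow/eef/joint_state/position"],
    actions=["/lead/arm/pose/position", "/lead/arm/pose/orientation"],
    images=["/env_camera/color/image_raw"],
)
print(sorted(frame))
```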

Organizing processed data

With MCAP, it is better to write processed data to a new file that contains only the processed topics than to append it to the original file. For an example of generating processed topics, see Data Processing.

During training, you can specify both the original dataset directory and the processed dataset directory at the same time. MCAP Data Loader will merge them automatically at runtime, so they can be consumed as if they were read from a single dataset.

A typical configuration looks like this:

dataset:
  root: data
  repo_id:
    - mujoco
    - mujoco_processed
  streaming: true
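
Conceptually, merging means that the topics of matching episodes from both directories end up in one view. The sketch below shows that idea with plain dictionaries. It assumes episodes are paired by file name, which is a guess rather than documented behavior, and `merge_datasets` and the processed topic name are hypothetical.

```python
def merge_datasets(*datasets):
    """Union the topics of episodes that share a name across datasets.

    Later datasets win if the same topic appears twice in one episode.
    """
    merged = {}
    for dataset in datasets:
        for episode_name, topics in dataset.items():
            merged.setdefault(episode_name, {}).update(topics)
    return merged


# Made-up stand-ins: episode file name -> {topic name -> data}.
mujoco = {"0.mcap": {"/follow/arm/pose/orientation": [0.0, 0.0, 0.0, 1.0]}}
mujoco_processed = {"0.mcap": {"/hypothetical/processed_topic": [1.0, 0.0]}}

merged = merge_datasets(mujoco, mujoco_processed)
print(sorted(merged["0.mcap"]))
```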

Notes:

  • dataset.root and dataset.repo_id are reused to specify the MCAP dataset root directory and dataset name.
  • Command-line overrides compatible with LeRobot are supported and take the highest priority (they override values in the config file). For example:
    mcap_lerobot_train -c configs/config.yaml --dataset.repo_id=example_task
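
The precedence rule can be pictured as a dotted key overwriting one entry in the nested config. The following is a simplified sketch of that idea, not the parser the CLI actually uses: `apply_override` is a hypothetical helper, and it keeps every value as a string.

```python
def apply_override(config: dict, argument: str) -> None:
    """Apply one `--a.b.c=value` argument to a nested dict in place."""
    dotted_key, value = argument.lstrip("-").split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections if the file did not define them.
        node = node.setdefault(name, {})
    node[leaf] = value


# Values as they would be read from configs/config.yaml.
config = {"dataset": {"root": "data", "repo_id": "example", "streaming": True}}

apply_override(config, "--dataset.repo_id=example_task")
print(config["dataset"]["repo_id"])
```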
    

Train with LeRobot’s original dataset format

If you want to use LeRobot’s original data format (while still using this CLI), add --ori:

mcap_lerobot_train -c configs/ori.yaml --ori

Make sure the dataset path in your config points to the actual LeRobot dataset location.

Help / supported CLI args

Show supported parameters:

mcap_lerobot_train -h

If the output is long, redirect to a file:

mcap_lerobot_train -h > lerobot_help.txt

Data Processing

For pose-topic post-processing, see docs/poses.md.

The script mcap_data_loader/scripts/data_process/poses.py can be used to generate:

  • relative pose topics with _rela suffix
  • rotation_6d topics converted from quaternion pose topics

Example:

python mcap_data_loader/scripts/data_process/poses.py \
  data/example \
  --keys /follow/arm/pose/position /follow/arm/pose/orientation \
  --targets rela rotation_6d
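
For readers unfamiliar with the 6D rotation representation, here is a minimal standalone sketch of one common convention: convert the quaternion to a rotation matrix and keep its first two columns. It assumes (x, y, z, w) component order. The script's actual component order and output layout may differ, and `quaternion_to_rotation_6d` is a hypothetical name, not a function from this package.

```python
import math


def quaternion_to_rotation_6d(x, y, z, w):
    """Return the first two columns of the rotation matrix as a flat list."""
    # Normalize so slightly non-unit quaternions still give a valid rotation.
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    x, y, z, w = x / norm, y / norm, z / norm, w / norm
    first_column = (
        1 - 2 * (y * y + z * z),
        2 * (x * y + z * w),
        2 * (x * z - y * w),
    )
    second_column = (
        2 * (x * y - z * w),
        1 - 2 * (x * x + z * z),
        2 * (y * z + x * w),
    )
    return [*first_column, *second_column]


identity_6d = quaternion_to_rotation_6d(0.0, 0.0, 0.0, 1.0)

# A 90 degree rotation about the z axis.
half_angle = math.pi / 4
quarter_turn_6d = quaternion_to_rotation_6d(
    0.0, 0.0, math.sin(half_angle), math.cos(half_angle)
)
print(identity_6d)
```

The six numbers are enough to recover the full rotation, because the third column is the cross product of the first two.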

License

See LICENSE.
