Early/experimental LeRobot dataloader for Apple Silicon & Linux: FFmpeg decode, NumPy/MLX/PyTorch/JAX output, off-GIL parallel prefetch (hardware decode and zero-copy MLX in progress)

PyRoboFrames

A robotics data platform for training robots from recorded demonstrations — ingest, query, and a fast dataloader, built for Apple Silicon and Linux.

⚠️ Early / experimental (0.x, expect API changes). What works today: the LeRobot dataloader (state/action + camera frames, temporal windows, off-GIL prefetch, NumPy / MLX / PyTorch / JAX output); ingest from MCAP (JSON/protobuf/CDR) and ROS 2 .db3 bags; a time-indexed Robotics DataFrame (slice + as-of align + resample); and LeRobot write-back. Still in progress: the Apple-Silicon zero-copy-MLX fast path and the NVIDIA CUDA/NVDEC path (built behind a feature flag, pending verification on a GPU machine). See What works today.

What is this, in plain terms?

Modern robots are increasingly trained the way large language models are: you record lots of demonstrations (a robot arm doing a task, teleoperated or scripted), then train a neural network to imitate them. Each demonstration is mostly camera video (often several cameras) plus sensor readings (joint positions, the actions taken).

When you train on that data, the computer has to constantly pull frames out of the videos and feed them to the model. This step is slow, often slow enough that the expensive GPU sits idle waiting for video to be decoded, which makes it one of the biggest bottlenecks in robot-learning training pipelines.

PyRoboFrames is the piece that feeds that data to your training loop, and it is being built to make that step fast. It reads your robot dataset, decodes the video, and hands batches to your model as NumPy, MLX, PyTorch, or JAX arrays, with a focus on Apple Silicon Macs, where the usual CUDA-centric tools serve you poorly.

It's also growing into a small data platform: convert raw robot logs (MCAP, ROS 2 bags) into columnar Parquet, work with them through a time-indexed Robotics DataFrame (slice, time-align, resample), and write datasets back out in LeRobot v3.0 format.

Honest status on speed: decode today runs through FFmpeg (the Apple Media Engine hardware path is still in progress). The off-GIL prefetch pipeline works (num_workers=): on synthetic data, 4 workers make an FFmpeg camera-decode epoch about 2.7× faster than the synchronous loader (benches/throughput.py). That is a relative sync-vs-prefetch measurement on one machine, not yet an absolute benchmark against other libraries. See What works today.
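The sync-vs-prefetch gap comes from overlapping decode with training. A minimal Python sketch of that idea (not PyRoboFrames' Rust pipeline) keeps a bounded window of batches in flight on a thread pool and yields them in their original order. `prefetch`, `fake_decode`, and the `depth` parameter are hypothetical, and `time.sleep` stands in for decode work that runs outside the Python interpreter lock, as decoding inside a native library can.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(batch_ids: Iterable[int], load: Callable[[int], T],
             num_workers: int = 4, depth: int = 8) -> Iterator[T]:
    """Yield load(i) for every id, in order, keeping up to `depth` loads in flight."""
    ids = iter(batch_ids)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Fill the window with the first `depth` loads.
        window = deque(pool.submit(load, i) for i in islice(ids, depth))
        for i in ids:
            yield window.popleft().result()      # oldest batch first, so order is preserved
            window.append(pool.submit(load, i))  # refill the freed slot
        while window:                            # drain the remaining in-flight loads
            yield window.popleft().result()


def fake_decode(batch_id: int) -> int:
    time.sleep(0.02)                             # stand-in for decoding one batch of frames
    return batch_id


if __name__ == "__main__":
    start = time.perf_counter()
    sync = [fake_decode(i) for i in range(16)]
    t_sync = time.perf_counter() - start

    start = time.perf_counter()
    fetched = list(prefetch(range(16), fake_decode, num_workers=4))
    t_pre = time.perf_counter() - start

    assert fetched == sync
    print(f"sync {t_sync:.2f}s, prefetch {t_pre:.2f}s")
```

Bounding the window with `depth` roughly caps how many decoded batches sit in memory ahead of the consumer, which is the usual trade-off between throughput and RSS in prefetching loaders.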

When would I use it?

You're training (or fine-tuning) a robot policy / VLA model from demonstration data.
Your dataset is in the LeRobot format — the open standard from Hugging Face's LeRobot project, now used by tens of thousands of shared robot datasets. (Support for other formats is on the roadmap.)
Your data loading is slow, or you're developing on a Mac and the usual CUDA-centric tools don't serve you well.

Why it's different

Apple Silicon first. MLX (and PyTorch/JAX) output works today, and one script runs unchanged across Mac and CPU (device="auto"). The headline goal, decoding on the Mac's hardware video engine (VideoToolbox) straight into MLX with zero copies, is still in progress; no other robot dataloader we know of targets it.
More than a loader. Ingest MCAP / ROS 2 bags → columnar Parquet, query a time-indexed Robotics DataFrame (as-of align + resample for multi-sensor fusion), and write back LeRobot datasets — the data layer most robot-learning stacks lack.
Rust core, simple Python. The engine is Rust (native speed, hardware access, off-GIL prefetch); you just pip install and import it.
Runs on Linux too, with an NVIDIA CUDA/NVDEC decode path built behind a feature flag (functional sign-off pending on a GPU machine).

Status: early (0.1.x — expect API changes)

Works today on any LeRobotDataset v3.0: the dataloader (state/action and camera frames, temporal windows incl. video, curriculum / goal-conditioned / balanced / episode-chunking sampling), off-GIL prefetch, validate(), stats + normalization, train/val split, checkpoint/resume, NumPy / MLX / PyTorch / JAX output; ingest (MCAP JSON/protobuf/CDR, ROS 2 .db3); the Robotics DataFrame; and LeRobot write-back. Camera frames decode via FFmpeg. Not yet functional: Apple-Silicon zero-copy MLX (decode → IOSurface → MLX), native VideoToolbox, and NVIDIA NVDEC (built behind a feature flag, pending verification on a GPU machine). See What works today.

Installation

Requires Python ≥ 3.10.

# pip
pip install pyroboframes

# uv
uv pip install pyroboframes
#   or, in a uv project:
uv add pyroboframes

# one-line installer (uses uv if present, else pip)
curl -LsSf https://raw.githubusercontent.com/Mullassery/PyRoboFrames/main/install.sh | sh

Prebuilt wheels are published for macOS (Apple Silicon); on other platforms pip builds from the source distribution (a Rust toolchain is required for that until more wheels ship).

The curl one-liner fetches install.sh from this repo; it needs the repository to be public.

Quickstart

Load states & actions (works today)

import pyroboframes as prf

# Open a LeRobot dataset on disk (the folder containing meta/, data/, videos/)
ds = prf.RoboFrameDataset.from_path("/path/to/lerobot_dataset")
print(ds)                 # RoboFrameDataset(episodes=…, frames=…, cameras=[…])

loader = ds.loader(
    batch_size=64,
    shuffle=True,         # buffered/quasi-random shuffle (keeps decode locality)
    seed=0,               # reproducible
    drop_last=False,
)

for batch in loader:                       # dict of NumPy arrays
    state  = batch["observation.state"]    # shape [64, state_dim], float32
    action = batch["action"]               # shape [64, action_dim], float32
    episodes = batch["episode_index"]      # which episode each row came from
    ...                                    # your training step
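The shuffle=True comment above describes a buffered, quasi-random shuffle that keeps decode locality. A common way to get that is a shuffle buffer (the technique behind tf.data's `shuffle(buffer_size)`), sketched below. PyRoboFrames' actual buffer size and algorithm are not documented in this README, so `buffered_shuffle` and its `buffer_size` are assumptions.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Quasi-random order: emit a random element from a bounded buffer of upcoming items."""
    rng = random.Random(seed)                   # seeded, so the order is reproducible
    buf: List[T] = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            j = rng.randrange(len(buf))
            buf[j], buf[-1] = buf[-1], buf[j]   # move the random pick to the end...
            yield buf.pop()                     # ...so removing it is O(1)
    rng.shuffle(buf)                            # drain whatever is left at the end
    yield from buf
```

Because output position k can only hold one of the first k + buffer_size input items, reads never jump far ahead of sequential order, which is what keeps video decode local.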

Temporal windows (works today)

Ask for several timesteps per sample with LeRobot-style delta_timestamps (seconds relative to the current frame):

loader = ds.loader(
    batch_size=64,
    delta_timestamps={"observation.state": [-0.1, 0.0]},  # one step of history + current
    tolerance_s=1e-4,                                      # nearest-frame match tolerance
)

for batch in loader:
    state = batch["observation.state"]   # shape [64, 2, state_dim]  (2 = num timesteps)
    ...
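Each entry in delta_timestamps is added to the current frame's timestamp and matched to the nearest recorded frame, which is accepted only if it lies within tolerance_s. A minimal NumPy sketch of that matching for a single episode follows, assuming sorted per-episode timestamps with at least two frames. This README does not say what happens when a window reaches past an episode's start or end; the sketch clamps to the nearest frame and returns an `is_pad` mask, a convention borrowed from LeRobot and labeled here as an assumption. `window_indices` is a hypothetical helper.

```python
import numpy as np


def window_indices(timestamps, current, deltas, tolerance_s):
    """Nearest-frame indices for timestamps[current] + delta, plus an is_pad mask.

    `timestamps` are one episode's frame times in seconds, sorted ascending
    (at least two frames). Queries farther than `tolerance_s` from every frame
    are clamped to the nearest frame and flagged in `is_pad`.
    """
    ts = np.asarray(timestamps, dtype=np.float64)
    targets = ts[current] + np.asarray(deltas, dtype=np.float64)
    # Candidate neighbours: the frame just before and just after each target.
    right = np.clip(np.searchsorted(ts, targets), 1, len(ts) - 1)
    left = right - 1
    nearest = np.where(np.abs(ts[left] - targets) <= np.abs(ts[right] - targets), left, right)
    is_pad = np.abs(ts[nearest] - targets) > tolerance_s
    return nearest, is_pad


if __name__ == "__main__":
    ts = np.arange(20) / 10.0                      # a 2-second episode at 10 fps
    idx, pad = window_indices(ts, current=5, deltas=[-0.1, 0.0], tolerance_s=1e-4)
    print(idx.tolist(), pad.tolist())              # [4, 5] [False, False]
```

With deltas [-0.1, 0.0] at 10 fps, frame 5 pulls in frames 4 and 5, which is why the example above yields a time axis of length 2.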

Camera frames (works via FFmpeg → NumPy)

Requires ffmpeg and ffprobe on your PATH. Frames come back as uint8 arrays shaped [batch, H, W, 3]:

# output="numpy" (default) | "mlx" | "torch"
loader = ds.loader(batch_size=64, cameras=["observation.images.top"], output="torch")
for batch in loader:
    frames = batch["observation.images.top"]   # torch.Tensor [64, H, W, 3] uint8
    state  = batch["observation.state"]         # torch.Tensor [64, state_dim]

output="torch" is zero-copy from the NumPy buffers; output="mlx" copies into unified memory. Decoding straight into MLX on the Apple Media Engine with zero copies (no NumPy hop) is the next milestone — see Roadmap.

Sequence batches for sequence models (works today)

chunk_size draws contiguous, in-episode chunks (never crossing a boundary) and shuffles them as units — sequence-friendly batches with decode locality. Pair it with delta_timestamps and MLX:

loader = ds.loader(
    batch_size=32,
    chunk_size=16,                                          # contiguous 16-frame chunks
    delta_timestamps={"observation.state": [-0.2, -0.1, 0.0]},
    output="mlx",
)
for batch in loader:
    seq = batch["observation.state"]   # mlx.core.array [32, 3, state_dim]
    ...
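The chunk_size behaviour described above (contiguous chunks that never cross an episode boundary, shuffled as whole units) can be sketched in a few lines. `episode_chunks` is a hypothetical helper that works on global frame indices. How PyRoboFrames handles an episode's final, shorter chunk is not documented, so the `drop_partial` switch is an assumption.

```python
import random
from typing import List, Sequence


def episode_chunks(episode_lengths: Sequence[int], chunk_size: int,
                   seed: int = 0, drop_partial: bool = False) -> List[List[int]]:
    """Split each episode into contiguous chunks of global frame indices, then shuffle the chunks."""
    rng = random.Random(seed)
    chunks: List[List[int]] = []
    start = 0                                    # global index of the episode's first frame
    for length in episode_lengths:
        for offset in range(0, length, chunk_size):
            stop = min(offset + chunk_size, length)   # never run past this episode
            if drop_partial and stop - offset < chunk_size:
                continue
            chunks.append(list(range(start + offset, start + stop)))
        start += length
    rng.shuffle(chunks)                          # shuffle whole chunks, not frames
    return chunks
```

Shuffling at chunk granularity keeps each chunk's frames adjacent in the video, so a batch of 32 chunks triggers far fewer seeks than 512 independently shuffled frames would.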

Convert a robotics log to columnar Parquet (works today)

Turn a raw robotics log (an MCAP file, such as a Foxglove or teleop recording, or a ROS 2 .db3 bag) into one flattened Parquet table per topic, plus a self-describing metadata.json and a loader-ready stats.json. MCAP messages in JSON, protobuf (via the embedded descriptor set), and cdr/ros2msg encodings all decode; ROS 2 bags decode their CDR payloads against the embedded message definitions:

import pyroboframes as prf

report = prf.convert_mcap("run.mcap", "out/")          # or prf.convert_ros2_bag("bag.db3", "out/")
for t in report["topics"]:
    print(t["topic"], t["messages"], "msgs ->", t["path"])  # e.g. /state 2 msgs -> out/state.parquet
print("skipped (undecodable):", report["skipped"])
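"Flattened" here means nested message fields become dotted column names, like the "imu.accel.x" column used in the DataFrame example that follows. A minimal sketch of that step, assuming decoded messages arrive as nested dicts (JSON-style); `flatten` and `to_columns` are hypothetical, and how the converter names array or repeated fields is not documented here, so the sketch simply keeps lists as values.

```python
from typing import Any, Dict, Iterable, List


def flatten(msg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"accel": {"x": 1}} -> {"accel.x": 1}."""
    out: Dict[str, Any] = {}
    for key, value in msg.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name))     # recurse into nested message fields
        else:
            out[name] = value                    # scalars (and lists) become one column
    return out


def to_columns(messages: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn one topic's decoded messages into column lists, filling missing fields with None."""
    rows = [flatten(m) for m in messages]
    names = sorted({name for row in rows for name in row})
    return {name: [row.get(name) for row in rows] for name in names}
```

Passing the resulting column dict to a Parquet writer (for example `pyarrow.Table.from_pydict` followed by `pyarrow.parquet.write_table`) would give one table per topic.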

Query + time-align sensors with a Robotics DataFrame (works today)

Load the converted output as a typed, time-indexed, multi-sensor table — then slice by time or snap every sensor onto a reference topic's timestamps (backward as-of join = time-synced fusion):

df = prf.RoboticsDataFrame.from_mcap("run.mcap")   # or .from_converted("out/") / .from_ros2_bag(...)
print(df.topics, df.time_range())

window = df.slice(start_ns, end_ns)                # start_ns/end_ns: your log-time bounds in ns; every topic restricted to that window
fused = df.align("/joint_states", tolerance=10_000_000)  # 10 ms; columns like "imu.accel.x"
print(fused.log_time, fused["imu.accel.x"])        # NaN where no sample within tolerance

grid = df.resample(period=20_000_000, method="linear")   # 50 Hz uniform grid, interpolated
df.save("native_out/")                                   # round-trips via from_converted(...)
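The two time operations above can be sketched on single-channel NumPy arrays. `asof_align` implements the backward as-of join the text describes (latest sample at or before each reference time, NaN when none lies within the tolerance), and `resample_linear` interpolates onto a uniform grid, as method="linear" suggests. Both are hypothetical helpers rather than the DataFrame's implementation; treating an exact timestamp match as "at or before" is an assumption.

```python
import numpy as np


def asof_align(ref_t, src_t, src_v, tolerance):
    """Backward as-of join of one source signal onto reference timestamps.

    For each reference time, take the latest source sample at or before it;
    return NaN when there is none within `tolerance`. Times are int64
    nanoseconds sorted ascending.
    """
    ref_t = np.asarray(ref_t, dtype=np.int64)
    src_t = np.asarray(src_t, dtype=np.int64)
    src_v = np.asarray(src_v, dtype=np.float64)
    idx = np.searchsorted(src_t, ref_t, side="right") - 1   # last sample with src_t <= ref_t
    ok = idx >= 0
    ok[ok] = (ref_t[ok] - src_t[idx[ok]]) <= tolerance        # drop samples that are too old
    out = np.full(ref_t.shape, np.nan)
    out[ok] = src_v[idx[ok]]
    return out


def resample_linear(t, v, period):
    """Linearly interpolate (t, v) onto a uniform grid of `period` ns from t[0] to t[-1]."""
    t = np.asarray(t, dtype=np.int64)
    grid = np.arange(t[0], t[-1] + 1, period, dtype=np.int64)
    return grid, np.interp(grid, t, np.asarray(v, dtype=np.float64))
```

pandas offers the same backward semantics as `pandas.merge_asof(..., direction="backward", tolerance=...)`; a period of 20_000_000 ns is 20 ms, i.e. the 50 Hz grid in the example above.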

Write a dataset back out in LeRobot format (works today)

import numpy as np, pyroboframes as prf

prf.write_lerobot_dataset(
    "my_dataset/",
    features={"observation.state": np.zeros((100, 7), np.float32),
              "action": np.zeros((100, 7), np.float32)},
    episode_lengths=[50, 50],   # two episodes
    fps=30.0,
)
ds = prf.RoboFrameDataset.from_path("my_dataset/")   # read it straight back
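In the write-back example the feature arrays hold 100 rows and episode_lengths=[50, 50] splits them into two episodes, so the lengths must sum to the row count. LeRobot datasets also carry per-frame bookkeeping columns (`index`, `episode_index`, `frame_index`, `timestamp`). A sketch of deriving them from episode_lengths and fps follows; whether write_lerobot_dataset() computes them exactly this way is an assumption, and `index_columns` is hypothetical.

```python
from typing import Dict, Sequence

import numpy as np


def index_columns(episode_lengths: Sequence[int], fps: float) -> Dict[str, np.ndarray]:
    """Per-frame bookkeeping columns for a dataset with the given episode lengths."""
    lengths = np.asarray(episode_lengths, dtype=np.int64)
    total = int(lengths.sum())                                # must equal the feature row count
    episode_index = np.repeat(np.arange(len(lengths)), lengths)
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))   # global index of each episode's first frame
    frame_index = np.arange(total) - np.repeat(starts, lengths)
    return {
        "index": np.arange(total),
        "episode_index": episode_index,
        "frame_index": frame_index,
        "timestamp": frame_index / fps,                       # seconds since episode start
    }
```

For episode_lengths=[2, 3] at 10 fps this gives episode_index [0, 0, 1, 1, 1] and frame_index [0, 1, 0, 1, 2].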

Validate a dataset before training

report = ds.validate()          # checks frame-range contiguity, lengths, timestamps, totals
report.raise_if_errors()        # raises if integrity errors were found
print(report.ok, report.warnings)
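One of the checks validate() names is frame-range contiguity. A sketch of just that check follows, assuming each episode's frames are recorded as a half-open [start, stop) range of global frame indices; `check_ranges` and `RangeReport` are hypothetical stand-ins, not the library's report type.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class RangeReport:
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def check_ranges(ranges: Sequence[Tuple[int, int]], total_frames: int) -> RangeReport:
    """Check that half-open [start, stop) episode ranges tile [0, total_frames) with no gaps or overlaps."""
    report = RangeReport()
    expected = 0                                 # where the next episode should begin
    for ep, (start, stop) in enumerate(ranges):
        if start != expected:
            report.errors.append(f"episode {ep}: starts at {start}, expected {expected}")
        if stop <= start:
            report.errors.append(f"episode {ep}: empty range [{start}, {stop})")
        expected = stop
    if expected != total_frames:
        report.errors.append(f"ranges end at {expected}, but total_frames is {total_frames}")
    return report
```

A gap or overlap here would make index-based sampling read frames from the wrong episode, which is why it is worth failing fast before training.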

What works today

Capability	Status
Read LeRobotDataset v3.0 (schema, episodes, state/action)	✅
Dataloader: batches of state/action as NumPy	✅
Shuffling (buffered/quasi-random), `drop_last`, seeding	✅
Temporal windows (`delta_timestamps`, `tolerance_s`) — tabular and video	✅
macOS and Linux	✅
Decoded-frame cache, batched-seek API, backend selection	✅
Camera frame decoding (FFmpeg → NumPy)	✅ (needs `ffmpeg` on `PATH`)
Dataset validation (`ds.validate()`)	✅
Dataset statistics (`ds.stats()`) + normalization (`loader(normalize=…)`)	✅
Train/val split (`ds.train_val_split()` + `loader(episodes=…)`)	✅
Episode iteration (`ds.episodes()`)	✅
Loader checkpoint/resume (`loader.position` / `seek()`)	✅
Off-GIL prefetch pipeline (`loader(num_workers=…)`)	✅
Balanced sampling (`loader(balanced=True)`, by episode)	✅
Episode-chunking sampler (`loader(chunk_size=N)`, sequence-friendly)	✅
Curriculum (`curriculum=True`) + goal-conditioned (`goal="final"`) sampling	✅
MCAP → columnar (Parquet) converter (`convert_mcap()`)	✅ JSON · protobuf · cdr/ros2msg
ROS 2 bag → columnar converter (`convert_ros2_bag()`, `.db3`)	✅
Converter metadata.json + stats.json (self-describing, loader-ready)	✅
Robotics DataFrame (time-index, `slice`, as-of `align`, `resample`, `save`)	✅
LeRobot write-back (`write_lerobot_dataset()`, v3.0)	✅
HF Hub importer (`download_lerobot_dataset()`)	✅ (needs `huggingface_hub`)
Memory-mapped data shards (lower RSS on large datasets)	✅
Image transforms + augments (Resize bilinear, Flip/Crop/ColorJitter)	✅ (NumPy; GPU later)
Backend parity (`to_backend`, `default_framework`, transform fallback chain)	✅
Device/backend selection (`resolve_device`, `DataLoader`, MPS)	✅
Loader profiling (`DataLoader(on_batch=…)`, `loader.stats`)	✅
Throughput benchmark harness (`benches/throughput.py`)	✅
NumPy / MLX / PyTorch / JAX output (`output=`)	✅ (torch is zero-copy from NumPy)
NVIDIA NVDEC decode (`CudaDecoder`, `--features cuda`)	🚧 built; verify on a GPU box
Native VideoToolbox decode	🚧
Zero-copy MLX (decode → IOSurface → MLX, no NumPy hop)	🚧 (upstream `mlx#2855`)
CV-CUDA compute · HF Hub streaming	🚧

The 🚧 rows are the honest gaps — see the Roadmap for sequencing.

How it works

LeRobotDataset       PyRoboFrames (Rust core)                     your training loop
┌────────────────┐   ┌────────────────────────────────────────┐   ┌──────────────────┐
│ parquet        │   │ episode index → sampler → per-camera   │   │ NumPy / MLX /    │
│ (state/action) │──▶│ decode → frame cache → time-synced     │──▶│ PyTorch / JAX    │
│ + mp4 video    │   │ windows                                │   │                  │
└────────────────┘   └────────────────────────────────────────┘   └──────────────────┘

Decode today uses FFmpeg; the Apple VideoToolbox hardware path is in progress, and the NVIDIA NVDEC path is built behind a feature flag but not yet verified on a GPU.

The engine is Rust (crate pyroboframes-core); the Python package is a thin PyO3/maturin binding. Full design, decisions, and trade-offs are in ARCHITECTURE.md.

Cross-platform — Train Anywhere

The goal: one script runs unchanged on a Mac, a rented NVIDIA box, or a CPU — the environment picks the backend (device="auto"), not your code. See docs/ROADMAP.md for the design and build order.
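The README leaves the exact precedence behind device="auto" to docs/ROADMAP.md. One plausible policy is sketched below as a pure function so it can be tested without the hardware. The order (MLX, then MPS on Apple Silicon, CUDA elsewhere, CPU last) and all names here are assumptions, not PyRoboFrames' `resolve_device`.

```python
import importlib.util
import platform


def has_module(name: str) -> bool:
    """True if `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_backend(system: str, machine: str, has_mlx: bool, has_mps: bool, has_cuda: bool) -> str:
    """Hypothetical precedence: MLX, then MPS on Apple Silicon; CUDA elsewhere; CPU as the fallback."""
    apple_silicon = system == "Darwin" and machine == "arm64"
    if apple_silicon and has_mlx:
        return "mlx"
    if apple_silicon and has_mps:
        return "mps"
    if has_cuda:
        return "cuda"
    return "cpu"


def auto_backend() -> str:
    """Probe the current machine and apply pick_backend."""
    has_mps = has_cuda = False
    if has_module("torch"):
        import torch
        has_cuda = torch.cuda.is_available()
        has_mps = torch.backends.mps.is_available()
    return pick_backend(platform.system(), platform.machine(), has_module("mlx"), has_mps, has_cuda)
```

Keeping the decision separate from the probing is what lets the selection logic be unit-tested on a Mac before the NVIDIA paths are signed off.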

Target	Decode	Compute / transforms	Output	Status
macOS (Apple Silicon) — MLX	FFmpeg	MLX	`mlx.core.array`	✅ output · ⏳ transforms
macOS (Apple Silicon) — MPS	FFmpeg	Torch (MPS)	`torch.Tensor`	⏳
RTX 5090 / H100 / RunPod	NVDEC	CV-CUDA	`torch.Tensor` (cuda)	⏳
Local CPU	FFmpeg (software)	NumPy / Torch	`np.ndarray` / `torch.Tensor`	✅
macOS (Apple Silicon)	FFmpeg	—	NumPy · MLX · PyTorch	✅ · VideoToolbox→zero-copy MLX ⏳

How it compares

PyRoboFrames deliberately does not reinvent robotics middleware (use Zenoh / dora-rs) or the dataset format (it reads LeRobot's). It targets the training data feed, especially on Apple Silicon. The libraries below overlap with that job from different angles. Full write-up in docs/COMPARISON.md.

Legend: ✅ works today · ⏳ planned / in progress · ⚠️ partial · ❌ no.

Library	Primary use	LeRobot-native	Apple HW decode	NVIDIA CUDA/NVDEC	Temporal windows	Frame cache	Core
PyRoboFrames	Robot-learning dataloader	✅	⏳	⏳	✅	✅	Rust
LeRobot (built-in loader)	Robot-learning stack + loader	✅	❌	✅	✅	❌	Python
Robo-DM	Robot dataset mgmt + loading	❌ (own EBML)	❌	✅	⚠️	✅ (mmap)	C++/Python
torchcodec	Video decode for PyTorch	n/a	❌	✅	❌	❌	C++/Python
NVIDIA DALI	GPU data loading (vision)	❌	❌	✅	❌	⚠️	C++/CUDA
FFCV	Fast vision dataloader	❌ (own format)	❌	✅	❌	✅ (RAM)	Python/C
WebDataset	Sharded streaming format	❌	❌	n/a	❌	❌	Python
decord	Video reading for DL	n/a	❌	✅	❌	❌	C++

Which should I use?

Training a LeRobot policy on a Mac (or want MLX output): PyRoboFrames — it runs today (FFmpeg decode, MLX/PyTorch output) and is the only one targeting Apple-Silicon hardware decode + zero-copy MLX next.
Training a LeRobot policy on NVIDIA today: LeRobot's built-in loader (uses torchcodec) is the mature path; PyRoboFrames' CUDA backend is in progress.
Huge robot datasets, framework-agnostic, max raw loading speed: Robo-DM.
General (non-robot) GPU vision pipelines on NVIDIA: DALI or FFCV.
Just decoding video frames into PyTorch tensors: torchcodec.

The gap PyRoboFrames fills: a LeRobot-native dataloader that treats Apple Silicon as a first-class target (hardware decode + MLX), which none of the other libraries listed above do.

⏳ = designed and scaffolded but not yet functional (see What works today). PyRoboFrames already runs on a Mac with MLX/PyTorch output today via FFmpeg decode; the remaining piece is the hardware decode path.

Roadmap

Direction is informed by where robot learning is heading — Vision-Language-Action (VLA) models trained on ever-larger, multimodal, increasingly streamed datasets, with a growing need for data-quality curation.

Shipped so far (0.1.0 → 0.1.3): dataloader (state/action + camera frames), buffered shuffle, temporal windows, ds.validate(), ds.stats(), train/val split (train_val_split + loader(episodes=…)), checkpoint/resume, FFmpeg decode, and NumPy / MLX / PyTorch output, on macOS & Linux. (Those releases were single-threaded and published no throughput benchmarks; the off-GIL prefetch pipeline and the benches/throughput.py harness have since landed, see What works today.)

Next up:

Performance: the actual speed story. The off-GIL parallel prefetch + worker pipeline is now wired (loader(num_workers=…)); the remaining step is a reproducible, published throughput benchmark against the FFmpeg/CPU baseline. That benchmark is what justifies the word "fast"; until it lands, the claim stays a goal.
Train Anywhere (multi-backend core). One script, unchanged, across MacBook (MLX / MPS), NVIDIA (RTX 5090 / H100 / RunPod, via CV-CUDA + NVDEC), and CPU — the runtime auto-selects the backend. Sequenced test-first: the backend-detection seam, the unified tensor/transforms abstraction, and the CPU/MPS/MLX paths (verifiable on a Mac) land before the NVIDIA paths (built feature-gated, functionally signed off on a GPU box). Full plan + priority tiers in docs/ROADMAP.md.
0.1.x — The Apple fast path. Native VideoToolbox (macOS) hardware decode → zero-copy MLX (no NumPy hop, gated on mlx#2855) and NVIDIA NVDEC on Linux; a published decode-throughput benchmark vs. the FFmpeg/CPU baseline.
0.2 — Streaming. Stream datasets directly from the Hugging Face Hub without a full download (à la LeRobot's StreamingLeRobotDataset).
0.3 — More formats. RLDS / Open X-Embodiment and HDF5 ingestion, plus MCAP beyond today's convert-to-Parquet path, behind the same loader API.
0.4 — Data-quality curation. Beyond validate(): trajectory scoring (jitter, diversity, sharpness, state-variance) to filter low-quality segments before training.
0.5+ — Scale. Multi-GPU / multi-Mac distributed loading, on-the-fly augmentation, and interop with synthetic-data / world-model pipelines.
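The 0.4 curation item names metrics without defining them. As a hedged illustration only, here are two simple stand-ins: jitter as the mean absolute second difference of an action trajectory, and state variance as the mean per-dimension variance of the states. The formulas, thresholds, and the `keep_segments` helper are hypothetical; the project's eventual scoring may differ.

```python
from typing import List, Sequence, Tuple

import numpy as np


def jitter_score(actions) -> float:
    """Mean absolute second difference of a [T, action_dim] trajectory; higher means jerkier."""
    a = np.asarray(actions, dtype=np.float64)
    if len(a) < 3:
        return 0.0                               # too short to have a second difference
    return float(np.abs(np.diff(a, n=2, axis=0)).mean())


def state_variance(states) -> float:
    """Mean per-dimension variance of a [T, state_dim] trajectory; near zero means the robot barely moved."""
    return float(np.asarray(states, dtype=np.float64).var(axis=0).mean())


def keep_segments(segments: Sequence[Tuple[np.ndarray, np.ndarray]],
                  max_jitter: float, min_variance: float) -> List[int]:
    """Indices of (states, actions) segments that are smooth enough and not static."""
    return [i for i, (s, a) in enumerate(segments)
            if jitter_score(a) <= max_jitter and state_variance(s) >= min_variance]
```

A linear ramp scores zero jitter, while an alternating 0, 1, 0, 1 signal scores 2.0, so a threshold between the two separates smooth motion from chatter.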

See docs/ROADMAP.md for the "Train Anywhere" multi-backend plan and priority tiers, and docs/IMPLEMENTATION_PLAN.md for the original v0.1 build sequence.

Documentation

ARCHITECTURE.md — design, the gap, and decisions.
docs/COMPARISON.md — alternatives and adopted techniques.
docs/IMPLEMENTATION_PLAN.md — phased build plan.
AGENTS.md — orientation for contributors and AI coding agents.
CONTRIBUTING.md · CHANGELOG.md

Contributing

Contributions welcome — see CONTRIBUTING.md. The highest-impact work right now is the video-decode backends and the MLX zero-copy path (mlx#2855).

License

MIT © Georgi Mammen Mullassery

Release history

0.1.10 (this version), Jun 27, 2026
0.1.9, Jun 27, 2026
0.1.8, Jun 27, 2026
0.1.7, Jun 25, 2026
0.1.6, Jun 25, 2026
0.1.5, Jun 25, 2026
0.1.4, Jun 25, 2026
0.1.3, Jun 25, 2026
0.1.2, Jun 24, 2026
0.1.1, Jun 24, 2026
0.1.0, Jun 24, 2026
0.1.0a0 (pre-release), Jun 24, 2026

Download files

Source distribution: pyroboframes-0.1.10.tar.gz (96.5 kB), uploaded Jun 27, 2026

Built distribution: pyroboframes-0.1.10-cp310-abi3-macosx_11_0_arm64.whl (5.1 MB), uploaded Jun 27, 2026; CPython 3.10+, macOS 11.0+ ARM64

File details

Details for the file pyroboframes-0.1.10.tar.gz.

File metadata

Download URL: pyroboframes-0.1.10.tar.gz
Upload date: Jun 27, 2026
Size: 96.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pyroboframes-0.1.10.tar.gz
Algorithm	Hash digest
SHA256	`6eaf4c2961f18d1fa695c392a319223c1bb231bca0824890df359f425eebc3eb`
MD5	`6ad2e6601b0c21bc35560101e03f706f`
BLAKE2b-256	`6f1fce45f356f2dbb62479157c2238a9c858536a3f3af28f14cb9ab270e6e6a7`

File details

Details for the file pyroboframes-0.1.10-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: pyroboframes-0.1.10-cp310-abi3-macosx_11_0_arm64.whl
Upload date: Jun 27, 2026
Size: 5.1 MB
Tags: CPython 3.10+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pyroboframes-0.1.10-cp310-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`bdf8bd820b4650474889f229ae04df13e1f08d2f0ff205b6b5d10acfaad7cb0f`
MD5	`f17ee289c9495faeac4daac7c12c88c6`
BLAKE2b-256	`924a5171a65dfa6525659e3a441dd40383bb428b73edfa259ebfe669f5b32703`
