Zero-copy, hardware-accelerated robot-learning dataloader for Apple Silicon (MLX)

PyRoboFrames

Zero-copy, hardware-accelerated robot-learning dataloader for Apple Silicon.

PyRoboFrames is built to feed robot-learning training loops on Apple Silicon as fast as the hardware allows. It reads robot datasets (LeRobotDataset v3.0, with MCAP planned), decodes their multi-camera video on the Apple Media Engine via VideoToolbox, and hands the frames to MLX (and PyTorch-MPS) as arrays without a single CPU copy, so the data path stops being the training bottleneck.

Status: pre-alpha, under active construction. APIs will change and it is not yet on PyPI. The sections below describe the v0.1 goal — see What works today for the current state.

What works today

Implemented and tested (Rust core + Python):

✅ LeRobotDataset v3.0 readers — schema / cameras / fps; a per-episode index that resolves a global frame to (camera, video file, timestamp); and tabular state/action reading.
✅ Working dataloader (tabular) — RoboFrameDataset.from_path(...).loader(...) iterates NumPy batches of observation.state / action with a buffered/quasi-random shuffle, drop_last, and seeded reproducibility. Works today on any LeRobotDataset v3.0.
✅ Temporal windows — LeRobot-style delta_timestamps return [batch, steps, dim] arrays.
✅ Decode scaffolding — the Decoder trait (batched seeks), a decoded-frame LRU cache, a frame-buffer pool, and per-platform backend selection (VideoToolbox / FFmpeg / CUDA NVDEC).
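The buffered, quasi-random shuffle listed above can be sketched in pure Python. Instead of permuting the whole dataset, a fixed-size buffer is filled from an in-order stream and a random element is emitted at each step, which keeps reads mostly sequential while still mixing samples. This is a minimal sketch assuming a plain iterable of frame indices; `buffered_shuffle` and `batches` are hypothetical helpers, not the Rust implementation.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Yield every item exactly once, in quasi-random order, via a shuffle buffer."""
    rng = random.Random(seed)  # seeded: the same seed gives the same order
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Stream exhausted: drain the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


def batches(items: Iterable[T], batch_size: int, drop_last: bool) -> Iterator[List[T]]:
    """Group items into lists of batch_size; optionally drop a short final batch."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_last:
        yield batch
```

For example, `batches(buffered_shuffle(range(num_frames), buffer_size=1024, seed=epoch), 64, drop_last=True)` would produce one epoch of index batches. A larger buffer mixes samples more thoroughly but gives up some read locality.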

Not usable yet (in progress):

🚧 Video frames: the VideoToolbox (macOS), FFmpeg, and CUDA NVDEC (Linux) decode backends are still feature-gated stubs, although their integration into the pipeline is done and tested.
🚧 Zero-copy MLX output (the Apple-Silicon differentiator).
🚧 The validation pass (ds.validate()).

Try the working part now (state / action → NumPy)

import pyroboframes as prf

ds = prf.RoboFrameDataset.from_path("/path/to/lerobot_dataset")
print(ds)                                   # episodes / frames / cameras
loader = ds.loader(batch_size=64, shuffle=True)

for batch in loader:                        # dict of NumPy arrays
    state  = batch["observation.state"]     # [64, state_dim], float32
    action = batch["action"]                # [64, action_dim], float32
    ...                                      # your training step

The video/MLX dataloader shown further below is the v0.1 target, not yet shipped.

The problem

Robot-learning datasets store observations as MP4 video (often several cameras per episode). During training, every sample requires seeking into those videos and decoding the right frames. This decode step is the dominant cost of the data pipeline. Hugging Face's own LeRobot issue tracker reports training that is "completely bottlenecked by video decoding even on servers with hundreds of cores," spending more time waiting on the dataloader than on backprop (lerobot#1623).

On Apple Silicon the problem is worse, and avoidably so: the standard Python stack (torchvision / PyAV / FFmpeg software decode) runs on the CPU and leaves the dedicated Media Engine idle, then copies frames across to the GPU — copies that are pure waste on a unified-memory machine. Meanwhile the compute side (MLX, M5 Neural Accelerators) is fast and underfed.

What PyRoboFrames does

This is the v0.1 design; see What works today for what's currently built.

LeRobotDataset / MCAP   PyRoboFrames (Rust core)                 your training loop
┌───────────────────┐   ┌────────────────────────────────────┐   ┌────────────────────┐
│ parquet (state /  │   │ index → sample → VideoToolbox HW   │   │ MLX (Neural        │
│ action) + mp4     │──▶│ decode → IOSurface (shared mem) →  │──▶│ Accelerators) or   │
│ video shards      │   │ time-synced windows, no copy       │   │ PyTorch-MPS        │
└───────────────────┘   └────────────────────────────────────┘   └────────────────────┘

Hardware decode via Apple VideoToolbox — uses the Media Engine, not the CPU.
Zero-copy — decoded frames live in IOSurface-backed unified memory and are wrapped as MLX arrays without a host→device transfer (there is no "device transfer" on unified memory; we stop pretending there is).
Time-synced windows — assembles (multi-camera frames, joint state, action) windows by joining the parquet tabular data with the decoded video at matching timestamps.
Built-in validation — flags missing frames, non-monotonic timestamps, and camera/state misalignment before they silently corrupt a training run.
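The three validation checks above can be sketched over plain per-episode timestamp lists. This is a hypothetical illustration, not the planned `ds.validate()` API: `validate_episode` and `ValidationReport` are made-up names, and treating any step longer than 1.5 frame periods as "missing frames" is an assumed threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ValidationReport:
    errors: List[str] = field(default_factory=list)

    def raise_if_errors(self) -> None:
        if self.errors:
            raise ValueError("\n".join(self.errors))


def validate_episode(
    episode: int,
    state_ts: Sequence[float],
    camera_ts: Dict[str, Sequence[float]],
    fps: float,
    tolerance_s: float = 1e-4,
) -> ValidationReport:
    report = ValidationReport()
    period = 1.0 / fps
    streams = {"observation.state": state_ts, **camera_ts}
    # 1) Non-monotonic timestamps and 2) gaps that suggest dropped frames.
    for name, ts in streams.items():
        for i in range(1, len(ts)):
            step = ts[i] - ts[i - 1]
            if step <= 0:
                report.errors.append(f"ep {episode} {name}: non-monotonic timestamp at index {i}")
            elif step > 1.5 * period:  # assumed threshold for "missing frames"
                report.errors.append(f"ep {episode} {name}: {step:.3f}s gap before index {i}")
    # 3) Camera/state misalignment: frame counts and per-frame timestamp offsets.
    for cam, ts in camera_ts.items():
        if len(ts) != len(state_ts):
            report.errors.append(f"ep {episode} {cam}: {len(ts)} frames vs {len(state_ts)} state rows")
            continue
        worst = max((abs(a - b) for a, b in zip(ts, state_ts)), default=0.0)
        if worst > tolerance_s:
            report.errors.append(f"ep {episode} {cam}: off from state by up to {worst:.4f}s")
    return report
```

A camera stream with a dropped frame, such as `[0.0, 0.1, 0.3, 0.4]` against five 10 fps state rows, triggers both a gap error and a frame-count mismatch.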

Why a Rust core with a Python API

The audience is ML researchers, so the product is a pip-installable Python package and the Rust stays invisible. Rust is the implementation because the hot path (HW decode, IOSurface lifetime management, off-GIL prefetch, zero-copy buffer hand-off) is exactly where a safe systems language with no GIL earns its keep. PyO3 provides the Python bindings and maturin builds the wheels, so the result is a fast, safe core with an ergonomic Python shell that installs with pip and needs no Rust toolchain.

Installation

Not yet released. When v0.1 ships:

pip install pyroboframes        # macOS / Apple Silicon wheels, no Rust toolchain needed

Wheels are built for Apple Silicon (primary target) with a portable FFmpeg fallback for other platforms.

Quickstart (planned v0.1 API)

import pyroboframes as prf

# Open a LeRobot dataset by Hugging Face Hub repo id (from_path opens a local copy)
ds = prf.RoboFrameDataset.from_hub("lerobot/aloha_sim_insertion_human")

# Validate before training
report = ds.validate()
report.raise_if_errors()        # missing frames, timestamp gaps, cam/state mismatch

# Build a dataloader that yields MLX arrays, zero-copy, decoded on the Media Engine
loader = ds.loader(
    batch_size=64,
    cameras=["observation.images.top", "observation.images.wrist"],
    delta_timestamps={"observation.images.top": [-0.1, 0.0]},  # temporal context (LeRobot-style)
    tolerance_s=1e-4,           # snap to the nearest frame within this tolerance
    shuffle=True,
    num_workers=4,              # Rust worker pool, runs off the GIL
    output="mlx",               # or "numpy" / "torch" (MPS)
)

for batch in loader:
    frames = batch["observation.images.top"]   # mlx.core.array, already on GPU
    state  = batch["observation.state"]
    action = batch["action"]
    ...                                          # your MLX training step
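The `delta_timestamps` / `tolerance_s` pair in the call above resolves each sample time t into a window of frame indices. The sketch below is loosely modelled on LeRobot's semantics: pick the nearest frame for t + delta, clamp queries that fall outside the episode to the boundary frame and flag them as padding, and fail if an in-range query has no frame within tolerance. `window_indices` is a hypothetical helper, and PyRoboFrames' actual edge handling may differ. Stacking the resulting frames per key is what produces `[batch, steps, ...]` arrays.

```python
import bisect
from typing import List, Sequence, Tuple


def window_indices(
    frame_ts: Sequence[float],
    t: float,
    deltas: Sequence[float],
    tolerance_s: float,
) -> Tuple[List[int], List[bool]]:
    """Map t + each delta to the nearest frame index in one episode.

    frame_ts must be non-empty and sorted ascending. Queries before the first
    or after the last frame are clamped to that frame and flagged as padding.
    """
    indices: List[int] = []
    is_pad: List[bool] = []
    first, last = frame_ts[0], frame_ts[-1]
    for d in deltas:
        q = t + d
        pad = q < first - tolerance_s or q > last + tolerance_s
        q_clamped = min(max(q, first), last)
        # Nearest neighbour: binary search, then compare the frames on either side.
        j = bisect.bisect_left(frame_ts, q_clamped)
        candidates = [k for k in (j - 1, j) if 0 <= k < len(frame_ts)]
        best = min(candidates, key=lambda k: abs(frame_ts[k] - q_clamped))
        if not pad and abs(frame_ts[best] - q) > tolerance_s:
            raise ValueError(f"no frame within {tolerance_s}s of t={q:.4f}")
        indices.append(best)
        is_pad.append(pad)
    return indices, is_pad
```

With 10 fps frames at `[0.0, 0.1, 0.2, 0.3, 0.4]` and `deltas=[-0.1, 0.0]`, t = 0.3 maps to frames 2 and 3, while t = 0.0 maps to frames 0 and 0 with the first entry marked as padding.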

Cross-platform

PyRoboFrames runs on both macOS and Linux from the same API and the same Rust core. The platform-specific part is decode and output, selected behind a single Decoder trait:

macOS (Apple Silicon) — the optimized path: VideoToolbox hardware decode → IOSurface → zero-copy MLX. This is the differentiator.
Linux — the same engine, decoding via FFmpeg (VAAPI where available, software otherwise) and outputting NumPy / PyTorch.
Linux + CUDA — when CUDA libraries are present (build with --features cuda), NVIDIA NVDEC hardware decode with CUDA output for PyTorch.
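The backend choice described above is made in the Rust core (partly at compile time via `--features cuda`). The decision rules can be mirrored in a few lines of Python. `select_backend` and the backend name strings are hypothetical illustrations, not part of the package's API.

```python
import platform


def select_backend(system: str, machine: str, cuda_available: bool) -> str:
    """Return a decode backend name following the platform rules listed above."""
    if system == "Darwin" and machine == "arm64":
        return "videotoolbox"  # Apple Silicon: Media Engine HW decode -> zero-copy MLX
    if system == "Linux" and cuda_available:
        return "nvdec"         # CUDA build with NVIDIA hardware decode
    return "ffmpeg"            # VAAPI where available, software otherwise


if __name__ == "__main__":
    # platform.system() is "Darwin" on macOS; platform.machine() is "arm64" on Apple Silicon.
    print(select_backend(platform.system(), platform.machine(), cuda_available=False))
```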

Supported (target matrix)

Area	v0.1	Planned
Datasets	LeRobotDataset v3.0	MCAP, RLDS, HDF5
Decode (HW)	macOS: VideoToolbox · Linux: FFmpeg (VAAPI) + software · Linux+CUDA: NVDEC	ProRes, AV1 (M3+)
Output	macOS: MLX · all: NumPy	PyTorch (MPS/CUDA) via DLPack
Platform	macOS (Apple Silicon) · Linux (x86_64, aarch64) · Linux+CUDA	CUDA zero-copy output

Benchmarks

The headline metric is decode+load throughput on Apple Silicon vs. the PyAV/CPU path. Numbers will be published here with a reproducible harness once v0.1 lands.

Pipeline	Frames/s (M-series)	Notes
PyAV / CPU (baseline)	TBD	torchvision default backend
PyRoboFrames (VideoToolbox, zero-copy)	TBD	target: multiple× baseline
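Until the official harness lands, the headline metric (frames delivered per second) can be measured around any loader that yields dicts of arrays. This is a minimal sketch: `frames_per_second` is a hypothetical helper, and it counts only the leading (batch) dimension of a single key, so multi-camera or windowed batches would need their frame counts scaled accordingly.

```python
import time
from typing import Any, Iterable, Mapping


def frames_per_second(
    loader: Iterable[Mapping[str, Any]], key: str, warmup_batches: int = 2
) -> float:
    """Measure decode+load throughput as frames delivered per wall-clock second."""
    it = iter(loader)
    for _ in range(warmup_batches):  # let caches, worker pools and decoders spin up
        next(it, None)
    frames = 0
    start = time.perf_counter()
    for batch in it:
        frames += len(batch[key])    # leading dimension = number of samples in the batch
    elapsed = time.perf_counter() - start
    return frames / elapsed if elapsed > 0 else float("inf")
```

If the batches hold MLX arrays, note that MLX evaluates lazily, so a fair measurement may need to force evaluation inside the loop (for example with `mx.eval`). Otherwise the timer could stop before the work it is meant to measure has run.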

Roadmap

See ARCHITECTURE.md for the full design and decisions.

v0.1 — LeRobotDataset v3.0 → hardware decode (VideoToolbox on macOS, FFmpeg on Linux) → dataloader with zero-copy MLX (macOS) / NumPy (Linux), validation, and a benchmark harness.
v0.2 — MCAP ingest, PyTorch-MPS output via DLPack.
v0.3 — RLDS / HDF5 ingest, multi-Mac distributed loading.

Contributing

Contributions welcome — see CONTRIBUTING.md. The Rust core lives in crates/, the Python package in python/. The most valuable early contributions are around the MLX zero-copy init path (see mlx#2855) and the benchmark harness.

Prior art & acknowledgements

docs/COMPARISON.md compares PyRoboFrames against LeRobot, torchcodec, Robo-DM, DALI, FFCV and others, and records which of their techniques we adopt (a decoded-frame cache, buffered shuffle, batched seeks, and LeRobot's delta_timestamps/tolerance_s API).
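Of the adopted techniques, the decoded-frame cache is the easiest to illustrate. When the temporal windows of nearby samples overlap, the same frames are requested repeatedly, so keeping recently decoded frames avoids a second seek-and-decode. The sketch below is a generic LRU built on `collections.OrderedDict`. `FrameCache`, keying by `(video_file, frame_index)`, and a capacity counted in frames are all assumptions, not the Rust cache's actual design.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class FrameCache:
    """Least-recently-used cache for decoded frames (capacity counted in frames)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_decode(self, key: Hashable, decode: Callable[[Hashable], Any]) -> Any:
        if key in self._data:
            self._data.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = decode(key)                 # e.g. seek + decode one frame
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used frame
        return value
```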

PyRoboFrames stands on LeRobot, MLX, Apple VideoToolbox, PyO3, and the Rust FFmpeg ecosystem. It deliberately does not reinvent robotics middleware — that space is well served by Zenoh and dora-rs. It targets the one layer they leave unsolved on Apple Silicon: the training data feed.

License

MIT © Georgi Mammen Mullassery

Release history

0.1.10 · Jun 27, 2026
0.1.9 · Jun 27, 2026
0.1.8 · Jun 27, 2026
0.1.7 · Jun 25, 2026
0.1.6 · Jun 25, 2026
0.1.5 · Jun 25, 2026
0.1.4 · Jun 25, 2026
0.1.3 · Jun 25, 2026
0.1.2 · Jun 24, 2026
0.1.1 · Jun 24, 2026
0.1.0 · Jun 24, 2026
0.1.0a0 (pre-release, this version) · Jun 24, 2026

Download files

Source distribution: pyroboframes-0.1.0a0.tar.gz (40.9 kB), uploaded Jun 24, 2026 via maturin/1.14.0 (not via Trusted Publishing)
SHA256	896717fba4544b3f64d8222f45c40614c36e4141990ea0ebf8555dde67b0093d
MD5	b2a8415f013edf083c1ba14faefd60a2
BLAKE2b-256	f3eac9ba8751913da7484c3033f967db092c34616775c69803c74bd6b57d0cef

Built distribution: pyroboframes-0.1.0a0-cp310-abi3-macosx_11_0_arm64.whl (3.1 MB; CPython 3.10+, macOS 11.0+ ARM64), uploaded Jun 24, 2026 via maturin/1.14.0 (not via Trusted Publishing)
SHA256	c33a77c5b06bc257073886b286a8504f6bf0adf67921d87c398f0dda41d388b5
MD5	34c0f2acd23a53476993c83cd7d74a73
BLAKE2b-256	892c7948335d25d3a49ae46d8cd6627c0603b720c1f04d14c916850536dca36d
