Pydantic media reference for images and video frames with lazy loading and optimized batch decoding
MediaRef
The portable frame-level media reference primitive — container-agnostic, fps-free, RFC-based.
(uri, pts_ns) is the entire schema. URIs follow RFC 3986 (with RFC 2397 for embedded data); pts_ns is an int64 nanosecond presentation timestamp. The schema is frozen for the life of MediaRef Spec 1.x. Works in any container (Parquet, mcap, rosbag, HDF5) and any standard media format (JPEG, PNG, H.264, H.265, AV1).
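To make the two-field schema concrete, here is a standard-library sketch of a record with the same shape. `FrameRef` and `data_uri` are hypothetical names for illustration, not part of the `mediaref` package. The sketch assumes `pts_ns` is left unset (`None`) for still images, as the image references in the Quick Start suggest.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Optional

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass
class FrameRef:
    """Hypothetical stand-in with the same (uri, pts_ns) shape as MediaRef."""

    uri: str  # RFC 3986 URI, or an RFC 2397 data: URI for embedded bytes
    pts_ns: Optional[int] = None  # assumed to be None for still images

    def __post_init__(self) -> None:
        # The spec defines pts_ns as an int64 nanosecond timestamp.
        if self.pts_ns is not None and not INT64_MIN <= self.pts_ns <= INT64_MAX:
            raise ValueError("pts_ns must fit in a signed 64-bit integer")

    def to_json(self) -> str:
        # Compact JSON with the two fields in declaration order.
        return json.dumps(asdict(self), separators=(",", ":"))


def data_uri(payload: bytes, mediatype: str) -> str:
    """Build an RFC 2397 URI of the form data:<mediatype>;base64,<data>."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mediatype};base64,{encoded}"


video_frame = FrameRef(uri="video.mp4", pts_ns=1_000_000_000)  # frame at 1.0 s
print(video_frame.to_json())  # {"uri":"video.mp4","pts_ns":1000000000}
print(data_uri(b"abc", "image/png"))  # data:image/png;base64,YWJj
```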
Quick Start
```python
from mediaref import MediaRef, DataURI, batch_decode
import numpy as np

# 1. Create references: local file, HTTP(S), cloud, or video frame.
ref = MediaRef(uri="image.png")
ref = MediaRef(uri="https://example.com/image.jpg")
ref = MediaRef(uri="s3://bucket/image.jpg")  # any fsspec scheme
ref = MediaRef(uri="video.mp4", pts_ns=1_000_000_000)  # frame at 1.0 s

# 2. Load.
rgb = ref.to_ndarray()  # (H, W, 3) RGB
pil = ref.to_pil_image()

# 3. Embed bytes inside a MediaRef (self-contained reference).
ref = MediaRef(uri=DataURI.from_image(rgb, format="png"))

# 4. Batch-decode many frames from one video; opens the container once.
refs = [MediaRef(uri="video.mp4", pts_ns=int(i * 1e9)) for i in range(10)]
frames = batch_decode(refs)

# 5. Serialize for storage in any string-based format.
json_str = ref.model_dump_json()  # '{"uri":"...","pts_ns":...}'
```
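Step 5 shows that a reference serializes to a short string, which is what lets it sit in any tabular format next to a sample's other metadata while the video stays in separate storage. A minimal standard-library sketch, with CSV standing in for Parquet or similar; the column names and the `videos/ep0.mp4` path are hypothetical, and the JSON shape follows the `model_dump_json()` comment in step 5.

```python
import csv
import io
import json

# Hypothetical episode: 100 steps sampled at 20 Hz from one video file.
rows = [
    {
        "step": i,
        "action": "noop",
        "frame_ref": json.dumps(
            {"uri": "videos/ep0.mp4", "pts_ns": i * 50_000_000},
            separators=(",", ":"),
        ),
    }
    for i in range(100)
]

# Write the metadata table; only reference strings are stored, never pixels.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["step", "action", "frame_ref"])
writer.writeheader()
writer.writerows(rows)
table_text = buffer.getvalue()

# Read it back and recover the (uri, pts_ns) pair for each row.
restored = [
    json.loads(row["frame_ref"])
    for row in csv.DictReader(io.StringIO(table_text))
]
```

The whole table is a few kilobytes; the video it points to can live on local disk or in object storage.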
See the API Reference for full details on `DataURI`, `batch_decode`, cloud URIs, HuggingFace datasets integration, lerobot interop, and the `mediaref` CLI.
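Step 4 of the Quick Start builds timestamps with `int(i * 1e9)`, which is exact for whole seconds. For frames on a fractional-rate grid such as 30000/1001 fps, integer arithmetic avoids float rounding altogether. The helper below is a hypothetical sketch, not a `mediaref` function. It rounds up on the assumption that the decoder returns the frame whose display interval contains the requested time, so a timestamp must not land just before the frame's exact rational start; check the Playback Semantics document for the actual rule. Real frame timestamps come from the container and may not sit on a perfect grid.

```python
def frame_pts_ns(index: int, fps_num: int, fps_den: int = 1) -> int:
    """Nominal pts_ns of frame `index` at fps_num/fps_den frames per second.

    The exact start time is index * fps_den / fps_num seconds; this computes
    ceil(index * fps_den * 1e9 / fps_num) using integers only.
    """
    if index < 0 or fps_num <= 0 or fps_den <= 0:
        raise ValueError("index must be >= 0 and the frame rate positive")
    numerator = index * fps_den * 1_000_000_000
    return -(-numerator // fps_num)  # ceiling division on integers


# One frame per second for ten seconds, matching step 4 of the Quick Start.
whole_seconds = [frame_pts_ns(i, 1) for i in range(10)]

# 29.97 fps (30000/1001): frame 1 starts at 33_366_666.67 ns, rounded up.
ntsc = [frame_pts_ns(i, 30000, 1001) for i in range(3)]
```

A reference such as `MediaRef(uri="video.mp4", pts_ns=frame_pts_ns(i, 30000, 1001))` then targets the i-th nominal frame.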
Why MediaRef?
1. Separate heavy media from lightweight metadata. Store 1 TB of videos separately and keep only a few KB of references in your tables. MediaRef is decoupled, format-agnostic, and works wherever you can store a string. Already used in production: the D2E research project stores 10 TB+ of gameplay data referenced by MediaRef via OWAMcap.
2. Permanent schema built on RFCs. (uri, pts_ns) is frozen for the life of Spec 1.x. No proprietary formats, no breaking changes.
3. Sparse-frame batch decoding. When loading many frames from a single video, batch_decode() opens the container once and seeks monotonically — 4.9× faster decoding throughput and 2.2× better I/O efficiency vs per-frame decoding on a sparse-frame ML dataloader workload. Methodology: D2E paper Section 3 / Appendix A.
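The access pattern behind `batch_decode()` can be sketched independently of any decoder: group requests by URI so each container is opened once, sort each group by `pts_ns` so seeks only move forward, then scatter results back to the caller's order. This is an illustrative sketch of the idea described above, not MediaRef's implementation; `decode_sorted` is a hypothetical callback standing in for a real per-container decoder.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Ref = Tuple[str, int]  # (uri, pts_ns)


def plan_batch(refs: Sequence[Ref]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each uri to its (original_index, pts_ns) pairs, sorted by pts_ns."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for index, (uri, pts_ns) in enumerate(refs):
        groups[uri].append((index, pts_ns))
    return {uri: sorted(items, key=lambda item: item[1]) for uri, items in groups.items()}


def run_batch(
    refs: Sequence[Ref],
    decode_sorted: Callable[[str, List[int]], List[object]],
) -> List[object]:
    """Decode every ref, calling decode_sorted once per uri with ascending pts."""
    results: List[object] = [None] * len(refs)
    for uri, items in plan_batch(refs).items():
        frames = decode_sorted(uri, [pts for _, pts in items])  # one open per uri
        for (index, _), frame in zip(items, frames):
            results[index] = frame  # restore the caller's original order
    return results


# Usage with a fake decoder that records how often each container is opened.
opened: List[str] = []


def fake_decode(uri: str, pts_list: List[int]) -> List[str]:
    assert pts_list == sorted(pts_list), "seeks must be monotonic"
    opened.append(uri)
    return [f"{uri}@{pts}" for pts in pts_list]


frames = run_batch([("a.mp4", 3), ("b.mp4", 1), ("a.mp4", 1)], fake_decode)
```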
Installation
```bash
pip install mediaref              # core: image loading + cloud-storage URIs (fsspec)
pip install 'mediaref[video]'     # + PyAV for video frame decoding
pip install 'mediaref[hf]'        # + HuggingFace datasets feature registration
pip install 'mediaref[video,hf]'  # all extras
```
For uv: uv add 'mediaref[video,hf]~=1.0.0'. MediaRef follows semantic versioning; patch releases are bug-only, minor releases are backward-compatible. The wire schema (uri, pts_ns) is frozen for the life of Spec 1.x.
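The `~=1.0.0` specifier is a PEP 440 compatible-release clause, equivalent to `>=1.0.0, ==1.0.*`: it accepts 1.0.x patch releases but not 1.1.0. To also pick up backward-compatible minor releases, the corresponding clause is `~=1.0` (that is, `>=1.0, ==1.*`). The check below is a simplified sketch that handles only plain numeric versions (no pre-, post-, or dev-release segments); for real use, `packaging.specifiers.SpecifierSet` implements the full rules.

```python
from typing import Tuple


def _release(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as '1.0.3' into (1, 0, 3)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Simplified PEP 440 `~=spec` check for numeric release segments only."""
    spec_parts = _release(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two release components")
    cand_parts = _release(candidate)
    # Pad with zeros so that 1.0 and 1.0.0 compare as equal.
    width = max(len(spec_parts), len(cand_parts))
    cand = cand_parts + (0,) * (width - len(cand_parts))
    spec_padded = spec_parts + (0,) * (width - len(spec_parts))
    prefix = spec_parts[:-1]  # every component except the last must match
    return cand >= spec_padded and cand[: len(prefix)] == prefix
```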
Optional TorchCodec backend. batch_decode(refs, decoder="torchcodec") uses TorchCodec for CUDA-accelerated decoding. TorchCodec ships its own FFmpeg shared-library expectations that may not match PyAV's bundled copies; if you see libavcodec.so.NN: cannot open shared object file after pip install torchcodec, repair the install with patch-torchcodec (it patches torchcodec's RPATH onto PyAV's bundled FFmpeg):
```bash
pip install patch-torchcodec && patch-torchcodec
```
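When the same code has to run on machines with and without a working TorchCodec install, one option is to probe the import once and pass `decoder="torchcodec"` only when it succeeds, otherwise leaving `batch_decode` on its default backend. A minimal sketch: the helper names are hypothetical, and it assumes that broken FFmpeg linkage surfaces as an exception when `torchcodec` is imported. The broad `except` is deliberate, since the probe only decides whether to opt in.

```python
import importlib
from functools import lru_cache
from typing import Dict


@lru_cache(maxsize=None)
def torchcodec_usable() -> bool:
    """Return True if torchcodec imports cleanly (assumed to load its FFmpeg libs)."""
    try:
        importlib.import_module("torchcodec")
    except Exception:  # ImportError, or OSError/RuntimeError from missing .so files
        return False
    return True


def decoder_kwargs() -> Dict[str, str]:
    """Keyword arguments for batch_decode: opt into TorchCodec only when usable."""
    return {"decoder": "torchcodec"} if torchcodec_usable() else {}


# Usage (requires mediaref[video]):
#   frames = batch_decode(refs, **decoder_kwargs())
```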
Documentation
- API Reference: full API for `MediaRef`, `DataURI`, `batch_decode`, cloud URIs, HuggingFace integration, lerobot interop, and the CLI.
- MediaRef Specification 1.0: wire format, URI grammar, `pts_ns` semantics, conformance criteria.
- Comparisons: how MediaRef relates to `datasets.Video` and lerobot's `VideoFrame`.
- Playback Semantics: how frame selection works at specific timestamps.
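A common rule for mapping a timestamp to a frame is to pick the last frame whose presentation timestamp is at or before the requested time, that is, the frame on screen at that instant. The sketch below assumes that rule and clamps times before the first frame to frame 0. Both are assumptions for illustration; the Playback Semantics document is the authority on what MediaRef actually does.

```python
from bisect import bisect_right
from typing import Sequence


def frame_at(frame_pts_ns: Sequence[int], t_ns: int) -> int:
    """Index of the frame displayed at t_ns, given ascending frame timestamps.

    Assumed rule: the last frame with pts <= t_ns; earlier times clamp to 0.
    """
    if not frame_pts_ns:
        raise ValueError("no frames")
    return max(bisect_right(frame_pts_ns, t_ns) - 1, 0)


# Three frames at 29.97 fps (timestamps in nanoseconds).
pts = [0, 33_366_667, 66_733_334]
```

Under this rule, a request 1 ns before a frame's timestamp still returns the previous frame, which is why exact integer timestamps matter.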
Examples
- ROS bag conversion: convert ROS1/ROS2 bags with `CompressedImage` topics to MediaRef-referenced video, saving 70–90% of storage through inter-frame compression. Works without a ROS install (uses `rosbags`).
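The timestamp side of such a conversion is plain integer arithmetic. ROS 2 message stamps carry `sec` and `nanosec` fields (ROS 1 uses `secs` and `nsecs`). A converter can fold them into a single nanosecond count and, assuming the encoded video's first frame sits at presentation time 0, subtract the earliest stamp to get each frame's `pts_ns`. The function and the `episode.mp4` name below are hypothetical illustrations, not the example script's code.

```python
from typing import Iterable, List, Tuple

NS_PER_SEC = 1_000_000_000


def stamps_to_pts_ns(stamps: Iterable[Tuple[int, int]]) -> List[int]:
    """Convert (sec, nanosec) message stamps to pts_ns relative to the earliest one.

    Assumes the video written from these messages starts at pts 0.
    """
    absolute = [sec * NS_PER_SEC + nanosec for sec, nanosec in stamps]
    if not absolute:
        return []
    start = min(absolute)
    return [t - start for t in absolute]


stamps = [(1_700_000_000, 500), (1_700_000_000, 50_000_500), (1_700_000_001, 500)]
refs = [{"uri": "episode.mp4", "pts_ns": p} for p in stamps_to_pts_ns(stamps)]
```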
Datasets shipped with MediaRef
These are projects from the author's own work that use MediaRef on the storage path. External adopters welcome — open a PR to add yours.
| Dataset | Domain | Scale |
|---|---|---|
| open-world-agents/D2E-Original | Game agents (29 PC games) | 273.4 hours, 1.83 TB |
| open-world-agents/D2E-480p | Game agents (downsampled) | — |
| maum-ai/CostNav-Teleop-Dataset | Delivery-robot navigation / teleop | — |
Tagging a HuggingFace dataset with mediaref makes it discoverable at huggingface.co/datasets?other=mediaref.
Citation
If you reference MediaRef in writing, the CITATION.cff file at repo root has the canonical metadata. BibTeX:
```bibtex
@software{mediaref,
  author = {Choi, Suhwan},
  title  = {MediaRef: a portable frame-level media reference primitive},
  url    = {https://github.com/open-world-agents/MediaRef},
  year   = {2025}
}
```
Acknowledgments
The video decoder interface is modeled on TorchCodec's API.
License
MediaRef is released under the MIT License.
File details
Details for the file mediaref-1.0.0.tar.gz.
File metadata
- Download URL: mediaref-1.0.0.tar.gz
- Upload date:
- Size: 6.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b124448e1c9ba527d61f699314a9feb7ebd8487871740dd8a982bee53698cbb5 |
| MD5 | e8e3fbfca54205a7d6e786d9ac9ca009 |
| BLAKE2b-256 | 646deba5db1ba9da2fde72350e94278106490fb5dc240b3ad103c53a629304bb |
File details
Details for the file mediaref-1.0.0-py3-none-any.whl.
File metadata
- Download URL: mediaref-1.0.0-py3-none-any.whl
- Upload date:
- Size: 43.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f893493707f0114062239a9466ce554b86cdca2b4807dd89a967af6805448c54 |
| MD5 | eb0fa98d37c85c100ad0533ffd8a0a6b |
| BLAKE2b-256 | cea6506fa90ca97ef1626c12557d74dc748ffad64d4ed97588f42e31b2cef3ea |