Pydantic media reference for images and video frames with lazy loading and optimized batch decoding

MediaRef

The portable frame-level media reference primitive — container-agnostic, fps-free, RFC-based.

(uri, pts_ns) is the entire schema. URIs follow RFC 3986 (with RFC 2397 for embedded data); pts_ns is an int64 nanosecond presentation timestamp. The schema is frozen for the life of MediaRef Spec 1.x. Works in any container (Parquet, mcap, rosbag, HDF5) and any standard media format (JPEG, PNG, H.264, H.265, AV1).
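
To make the two-field schema concrete, here is a minimal standard-library sketch of a (uri, pts_ns) record and its JSON round trip. FrameRef, seconds_to_pts_ns, and the int64 range check are hypothetical names chosen for illustration; this is not the mediaref implementation, and treating a missing pts_ns as "still image" is an assumption drawn from the Quick Start examples.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # pts_ns is described as an int64


@dataclass(frozen=True)
class FrameRef:
    """Hypothetical stand-in for a (uri, pts_ns) reference; not the mediaref class."""

    uri: str
    pts_ns: Optional[int] = None  # assumption: None means no timestamp (e.g. a still image)

    def __post_init__(self) -> None:
        if self.pts_ns is not None and not (INT64_MIN <= self.pts_ns <= INT64_MAX):
            raise ValueError("pts_ns must fit in a signed 64-bit integer")


def seconds_to_pts_ns(seconds: float) -> int:
    """Convert seconds to integer nanoseconds, rounding to the nearest value."""
    return round(seconds * 1_000_000_000)


# Round-trip a reference through a plain JSON string.
ref = FrameRef(uri="video.mp4", pts_ns=seconds_to_pts_ns(1.0))
encoded = json.dumps(asdict(ref))
decoded = FrameRef(**json.loads(encoded))
```

Because the record is just a string and an integer, it fits in any format that can hold those two types, which is the property the schema relies on.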

Quick Start

from mediaref import MediaRef, DataURI, batch_decode
import numpy as np

# 1. Create references — local file, HTTP(S), cloud, or video frame.
ref = MediaRef(uri="image.png")
ref = MediaRef(uri="https://example.com/image.jpg")
ref = MediaRef(uri="s3://bucket/image.jpg")             # any fsspec scheme
ref = MediaRef(uri="video.mp4", pts_ns=1_000_000_000)   # frame at 1.0s

# 2. Load.
rgb = ref.to_ndarray()      # (H, W, 3) RGB
pil = ref.to_pil_image()

# 3. Embed bytes inside a MediaRef (self-contained reference).
ref = MediaRef(uri=DataURI.from_image(rgb, format="png"))

# 4. Batch-decode many frames from one video — opens the container once.
refs = [MediaRef(uri="video.mp4", pts_ns=i * 1_000_000_000) for i in range(10)]
frames = batch_decode(refs)

# 5. Serialize for storage in any string-based format.
json_str = ref.model_dump_json()   # '{"uri":"...","pts_ns":...}'

See API Reference for full details — DataURI, batch_decode, cloud URIs, HuggingFace datasets integration, lerobot interop, the mediaref CLI.
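
Step 3 of the Quick Start embeds image bytes directly in the URI, which the schema permits through RFC 2397 data URIs. The sketch below shows what such a URI looks like using only the standard library. make_data_uri and parse_data_uri are hypothetical helpers, not part of mediaref, and only the base64 form is handled.

```python
import base64


def make_data_uri(payload: bytes, mime: str = "image/png") -> str:
    """Build an RFC 2397 data URI with base64-encoded content."""
    return f"data:{mime};base64,{base64.b64encode(payload).decode('ascii')}"


def parse_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (MIME type, raw bytes)."""
    header, _, body = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(body)


png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature that starts every PNG file
uri = make_data_uri(png_signature)
mime, payload = parse_data_uri(uri)
```

A reference built this way carries its own bytes, so it can be resolved without any filesystem or network access, at the cost of a larger string.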

Why MediaRef?

1. Separate heavy media from lightweight metadata. Store 1 TB of videos separately and keep only a few KB of references in your tables. MediaRef is decoupled, format-agnostic, and works wherever you can store a string. Already used in production: the D2E research project stores 10 TB+ of gameplay data referenced by MediaRef via OWAMcap.

2. Permanent schema built on RFCs. (uri, pts_ns) is frozen for the life of Spec 1.x. No proprietary formats, no breaking changes.

3. Sparse-frame batch decoding. When loading many frames from a single video, batch_decode() opens the container once and seeks monotonically — 4.9× faster decoding throughput and 2.2× better I/O efficiency vs per-frame decoding on a sparse-frame ML dataloader workload. Methodology: D2E paper Section 3 / Appendix A.
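
The idea behind point 3 can be sketched without any video library: group requests by URI so each container is opened once, sort each group by pts_ns so seeks only move forward, then put the frames back in the caller's order. Ref, plan_batch, batch_decode_sketch, and fake_decoder are hypothetical names for illustration. This is a sketch of the general technique, not the code inside batch_decode().

```python
from collections import defaultdict
from typing import NamedTuple


class Ref(NamedTuple):
    uri: str
    pts_ns: int


def plan_batch(refs):
    """Group request indices by URI, each group sorted by ascending pts_ns."""
    groups = defaultdict(list)
    for index, ref in enumerate(refs):
        groups[ref.uri].append(index)
    return {
        uri: sorted(indices, key=lambda i: refs[i].pts_ns)
        for uri, indices in groups.items()
    }


def batch_decode_sketch(refs, decode_container):
    """Call decode_container(uri, sorted_pts) once per distinct URI."""
    results = [None] * len(refs)
    for uri, indices in plan_batch(refs).items():
        frames = decode_container(uri, [refs[i].pts_ns for i in indices])
        for i, frame in zip(indices, frames):
            results[i] = frame  # restore the caller's original order
    return results


calls = []


def fake_decoder(uri, sorted_pts):
    """Stand-in decoder that records each call and returns labels, not pixels."""
    calls.append((uri, sorted_pts))
    return [f"{uri}@{pts}" for pts in sorted_pts]


refs = [Ref("a.mp4", 3), Ref("b.mp4", 1), Ref("a.mp4", 1), Ref("a.mp4", 2)]
frames = batch_decode_sketch(refs, fake_decoder)
```

Here four requests across two files result in two decoder calls, each with ascending timestamps, while frames still lines up index-for-index with refs.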

[Figure: Decoding Benchmark]

Installation

pip install mediaref                  # core: image loading + cloud-storage URIs (fsspec)
pip install 'mediaref[video]'         # + PyAV for video frame decoding
pip install 'mediaref[hf]'            # + HuggingFace datasets feature registration
pip install 'mediaref[video,hf]'      # all extras

For uv: uv add 'mediaref[video,hf]~=1.0.0'. MediaRef follows semantic versioning: patch releases contain only bug fixes, and minor releases are backward-compatible. The wire schema (uri, pts_ns) is frozen for the life of Spec 1.x.

Optional TorchCodec backend. batch_decode(refs, decoder="torchcodec") uses TorchCodec for CUDA-accelerated decoding. TorchCodec expects specific FFmpeg shared-library versions, which may not match the copies bundled with PyAV. If you see libavcodec.so.NN: cannot open shared object file after pip install torchcodec, repair the install with patch-torchcodec, which patches torchcodec's RPATH to point at PyAV's bundled FFmpeg:

pip install patch-torchcodec && patch-torchcodec

Documentation

  • API Reference — full API: MediaRef, DataURI, batch_decode, cloud URIs, HuggingFace integration, lerobot interop, the CLI.
  • MediaRef Specification 1.0 — wire format, URI grammar, pts_ns semantics, conformance criteria.
  • Comparisons — how MediaRef relates to datasets.Video and lerobot's VideoFrame.
  • Playback Semantics — how frame selection works at specific timestamps.
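
The Playback Semantics document is the authority on how a pts_ns value maps to a frame. Purely as an illustration, the sketch below assumes one common convention: select the frame that would be on screen at the requested time, meaning the last frame whose timestamp is at or before it. select_frame_index and the sample timestamps are hypothetical, and the actual rule in MediaRef may differ.

```python
from bisect import bisect_right


def select_frame_index(frame_pts_ns, target_pts_ns):
    """Index of the last frame with pts <= target, given ascending frame_pts_ns."""
    position = bisect_right(frame_pts_ns, target_pts_ns)
    if position == 0:
        raise ValueError("target precedes the first frame")
    return position - 1


# A variable-frame-rate stream: the gaps between frames are not uniform,
# so an index cannot be derived from a fixed fps value.
frame_pts_ns = [0, 33_000_000, 100_000_000, 133_000_000]

just_before = select_frame_index(frame_pts_ns, 99_999_999)
exact_match = select_frame_index(frame_pts_ns, 100_000_000)
```

Working from timestamps instead of frame numbers is what lets the same reference stay valid for variable-frame-rate video.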

Examples

  • ROS bag conversion: convert ROS1/ROS2 bags with CompressedImage topics to MediaRef-referenced video, saving 70–90% of storage through inter-frame compression. Works without a ROS install (uses rosbags).

Datasets shipped with MediaRef

These are projects from the author's own work that use MediaRef on the storage path. External adopters welcome — open a PR to add yours.

| Dataset | Domain | Scale |
|---|---|---|
| open-world-agents/D2E-Original | Game agents (29 PC games) | 273.4 hours, 1.83 TB |
| open-world-agents/D2E-480p | Game agents (downsampled) | |
| maum-ai/CostNav-Teleop-Dataset | Delivery-robot navigation / teleop | |

Tagging a HuggingFace dataset with mediaref makes it discoverable at huggingface.co/datasets?other=mediaref.

Citation

If you reference MediaRef in writing, the CITATION.cff file at repo root has the canonical metadata. BibTeX:

@software{mediaref,
  author = {Choi, Suhwan},
  title  = {MediaRef: a portable frame-level media reference primitive},
  url    = {https://github.com/open-world-agents/MediaRef},
  year   = {2025}
}

Acknowledgments

The video decoder interface draws on TorchCodec's API design.

License

MediaRef is released under the MIT License.
