Pydantic media reference for images and video frames with lazy loading and optimized batch decoding

MediaRef

The portable frame-level media reference primitive — container-agnostic, fps-free, RFC-based.

(uri, pts_ns) is the entire schema. URIs follow RFC 3986 (with RFC 2397 for embedded data); pts_ns is an int64 nanosecond presentation timestamp. The schema is frozen for the life of MediaRef Spec 1.x. Works in any container (Parquet, mcap, rosbag, HDF5) and any standard media format (JPEG, PNG, H.264, H.265, AV1).
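Both fields are plain values, so they can be produced with the standard library alone. The sketch below does two things. It converts a time in seconds to an integer pts_ns without floating-point rounding, and it builds an RFC 2397 data URI. The helper names `seconds_to_pts_ns` and `to_data_uri` are hypothetical illustrations and not part of MediaRef's API. For embedding, the library's own `DataURI` class, shown in the Quick Start, covers this case.

```python
import base64
from decimal import Decimal
from fractions import Fraction
from typing import Union

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def seconds_to_pts_ns(seconds: Union[int, str, Decimal, Fraction]) -> int:
    """Convert seconds to integer nanoseconds exactly (hypothetical helper)."""
    # Fraction keeps the arithmetic exact. Pass decimals as strings ("0.1"),
    # since a binary float like 0.1 is not a whole number of nanoseconds.
    ns = Fraction(seconds) * 1_000_000_000
    if ns.denominator != 1:
        raise ValueError(f"{seconds!r} s is not a whole number of nanoseconds")
    value = int(ns)
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("pts_ns must fit in a signed 64-bit integer")
    return value

def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Build an RFC 2397 data URI of the form data:<mediatype>;base64,<data>."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

print(seconds_to_pts_ns("1.5"))          # 1500000000
print(to_data_uri(b"abc", "image/png"))  # data:image/png;base64,YWJj
```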

Quick Start

```python
from mediaref import MediaRef, DataURI, batch_decode

# 1. Create references: local file, HTTP(S), cloud, or video frame.
ref = MediaRef(uri="image.png")
ref = MediaRef(uri="https://example.com/image.jpg")
ref = MediaRef(uri="s3://bucket/image.jpg")             # any fsspec scheme
ref = MediaRef(uri="video.mp4", pts_ns=1_000_000_000)   # frame at 1.0 s

# 2. Load.
rgb = ref.to_ndarray()      # (H, W, 3) RGB
pil = ref.to_pil_image()

# 3. Embed bytes inside a MediaRef (self-contained reference).
ref = MediaRef(uri=DataURI.from_image(rgb, format="png"))

# 4. Batch-decode many frames from one video; the container is opened once.
refs = [MediaRef(uri="video.mp4", pts_ns=i * 1_000_000_000) for i in range(10)]  # 0 s .. 9 s
frames = batch_decode(refs)

# 5. Serialize for storage in any string-based format.
json_str = ref.model_dump_json()   # '{"uri":"...","pts_ns":...}'
```

See API Reference for full details — DataURI, batch_decode, cloud URIs, HuggingFace datasets integration, lerobot interop, the mediaref CLI.

Why MediaRef?

1. Separate heavy media from lightweight metadata. Store 1 TB of videos separately and keep only a few KB of references in your tables. MediaRef is decoupled, format-agnostic, and works wherever you can store a string. Already used in production: the D2E research project stores 1 TB+ of gameplay data referenced by MediaRef via OWAMcap.
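To make the "store a string" point concrete, the sketch below uses only the standard `json` module to build reference rows. Each row carries the same two keys as the Quick Start's serialized output and comes to a few dozen bytes. The video path, frame count, and 20 fps rate are hypothetical.

```python
import json
from typing import Optional

def reference_row(uri: str, pts_ns: Optional[int] = None) -> str:
    """Serialize one reference as compact JSON with the two schema keys."""
    return json.dumps({"uri": uri, "pts_ns": pts_ns}, separators=(",", ":"))

# Hypothetical episode: 1,000 frames sampled at 20 fps (50 ms apart).
FRAME_INTERVAL_NS = 50_000_000
rows = [reference_row("videos/episode_0001.mp4", i * FRAME_INTERVAL_NS) for i in range(1_000)]

# Total size of the references in bytes; the video itself is stored elsewhere.
total_bytes = sum(len(row.encode("utf-8")) for row in rows)
print(f"{len(rows)} references, {total_bytes} bytes")

# Reading a row back needs nothing beyond the standard library.
print(json.loads(rows[1]))
```

Because each row is an ordinary string, the same rows can go into a Parquet string column, an mcap message, or a JSON Lines file unchanged.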

2. Permanent schema built on RFCs. (uri, pts_ns) is frozen for the life of Spec 1.x. No proprietary formats, no breaking changes.

3. Sparse-frame batch decoding. When loading many frames from a single video, batch_decode() opens the container once and seeks monotonically — 4.9× faster decoding throughput and 2.2× better I/O efficiency vs per-frame decoding on a sparse-frame ML dataloader workload. Methodology: D2E paper Section 3 / Appendix A.
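The access pattern described above can be illustrated with a small planner. It groups references by URI so each container is opened once, sorts each group by pts_ns so seeks only move forward, and then scatters decoded frames back into the caller's order. This is a sketch of the idea, not MediaRef's implementation. `plan_batch`, `run_plan`, and the `decode_frames` callback are hypothetical, and a real callback would wrap an actual video decoder.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Ref = Tuple[str, int]                    # (uri, pts_ns)
Plan = Dict[str, List[Tuple[int, int]]]  # uri -> [(pts_ns, original_index), ...]

def plan_batch(refs: Sequence[Ref]) -> Plan:
    """Group references by URI and sort each group by pts_ns."""
    plan: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for index, (uri, pts_ns) in enumerate(refs):
        plan[uri].append((pts_ns, index))
    for entries in plan.values():
        entries.sort()  # ascending pts_ns gives forward-only seeks per container
    return dict(plan)

def run_plan(plan: Plan,
             decode_frames: Callable[[str, List[int]], List[object]],
             n: int) -> List[object]:
    """Decode each URI once and place frames back in the caller's order."""
    results: List[object] = [None] * n
    for uri, entries in plan.items():
        # Hypothetical callback: opens `uri` once and decodes the timestamps in order.
        frames = decode_frames(uri, [pts for pts, _ in entries])
        for (_, index), frame in zip(entries, frames):
            results[index] = frame
    return results

def fake_decode(uri: str, pts_list: List[int]) -> List[object]:
    """Stand-in decoder that labels each requested frame."""
    return [f"{uri}@{pts}" for pts in pts_list]

refs = [("a.mp4", 2_000_000_000), ("b.mp4", 0), ("a.mp4", 1_000_000_000)]
print(run_plan(plan_batch(refs), fake_decode, len(refs)))
# ['a.mp4@2000000000', 'b.mp4@0', 'a.mp4@1000000000']
```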

[Figure: Decoding Benchmark]

Installation

```shell
pip install mediaref                  # core: image loading + cloud-storage URIs (fsspec)
pip install 'mediaref[video]'         # + PyAV for video frame decoding
pip install 'mediaref[hf]'            # + HuggingFace datasets feature registration
pip install 'mediaref[video,hf]'      # all extras
```

For uv: uv add 'mediaref[video,hf]'. MediaRef follows semantic versioning; the wire schema (uri, pts_ns) is frozen for the life of Spec 1.x.

Optional TorchCodec backend. batch_decode(refs, decoder="torchcodec") uses TorchCodec for CUDA-accelerated decoding. TorchCodec expects particular FFmpeg shared libraries, which may not match the copies bundled with PyAV. If you see libavcodec.so.NN: cannot open shared object file after pip install torchcodec, repair the install with patch-torchcodec, which patches torchcodec's RPATH to point at PyAV's bundled FFmpeg:

```shell
pip install patch-torchcodec && patch-torchcodec
```
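To opt into TorchCodec only when it actually loads, one approach is to probe the import first and otherwise leave batch_decode() on its default decoder. The helper name `batch_decode_kwargs` is hypothetical. The set of exceptions caught is an assumption about how a missing package or a broken FFmpeg link surfaces at import time.

```python
import importlib

def batch_decode_kwargs(prefer_torchcodec: bool = True) -> dict:
    """Return extra keyword arguments for batch_decode() (hypothetical helper)."""
    if not prefer_torchcodec:
        return {}
    try:
        importlib.import_module("torchcodec")
    except (ImportError, OSError, RuntimeError):
        # Not installed, or its FFmpeg shared libraries failed to load.
        return {}
    return {"decoder": "torchcodec"}

print(batch_decode_kwargs())
```

Usage, with `refs` built as in the Quick Start: `frames = batch_decode(refs, **batch_decode_kwargs())`.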

Documentation

  • API Reference — full API: MediaRef, DataURI, batch_decode, cloud URIs, HuggingFace integration, lerobot interop, the CLI.
  • MediaRef Specification 1.0 — wire format, URI grammar, pts_ns semantics, conformance criteria.
  • Comparisons — how MediaRef relates to datasets.Video and lerobot's VideoFrame.
  • Playback Semantics — how frame selection works at specific timestamps.
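The normative frame-selection rule is defined in the Playback Semantics document. For illustration only, the sketch below assumes a common display convention: each frame stays on screen from its own pts until the next frame's pts. Under that convention, the frame shown at a target time is the last one whose pts is at or before it. Whether the spec uses exactly this rule, including how it treats targets before the first frame, is an assumption to check against that document. `select_frame_index` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Sequence

def select_frame_index(frame_pts_ns: Sequence[int], target_pts_ns: int) -> int:
    """Index of the frame displayed at target_pts_ns (assumed display convention).

    frame_pts_ns must be sorted in ascending order.
    """
    i = bisect_right(frame_pts_ns, target_pts_ns) - 1
    if i < 0:
        raise ValueError("target precedes the first frame")
    return i

# Roughly 30 fps timestamps in nanoseconds.
pts = [0, 33_333_333, 66_666_667]
print(select_frame_index(pts, 50_000_000))  # 1
```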

Examples

  • ROS bag conversion — convert ROS1/ROS2 bags with CompressedImage topics to MediaRef-referenced video, recovering 70–90% storage via inter-frame compression. Works without a ROS install (uses rosbags).
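A step such a conversion needs is turning message timestamps into pts_ns values. ROS 2 header stamps carry `sec` and `nanosec` fields, and ROS 1 stamps carry `secs` and `nsecs`. The sketch below combines them into integer nanoseconds. It then rebases them so the first message lands at pts_ns 0, which assumes the encoded video starts at the first message. The example's actual converter may choose a different time origin. `stamps_to_pts_ns` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

def stamps_to_pts_ns(stamps: Iterable[Tuple[int, int]]) -> List[int]:
    """Convert (sec, nanosec) stamps to pts_ns relative to the first message."""
    absolute = [sec * 1_000_000_000 + nanosec for sec, nanosec in stamps]
    if not absolute:
        return []
    t0 = absolute[0]  # assumption: the video's first frame corresponds to the first message
    return [t - t0 for t in absolute]

stamps = [(10, 0), (10, 500_000_000), (11, 250_000_000)]
print(stamps_to_pts_ns(stamps))  # [0, 500000000, 1250000000]
```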

Datasets shipped with MediaRef

These are projects from the author's own work that use MediaRef on the storage path. External adopters welcome — open a PR to add yours.

| Dataset | Domain | Scale |
|---|---|---|
| open-world-agents/D2E-Original | Game agents (29 PC games) | 273.4 hours, 1.83 TB |
| open-world-agents/D2E-480p | Game agents (downsampled) | |
| maum-ai/CostNav-Teleop-Dataset | Delivery-robot navigation / teleop | |

Tagging a HuggingFace dataset with mediaref makes it discoverable at huggingface.co/datasets?other=mediaref.

Citation

If you reference MediaRef in writing, the CITATION.cff file at repo root has the canonical metadata. BibTeX:

```bibtex
@software{mediaref,
  author  = {Choi, Suhwan},
  title   = {MediaRef: a portable frame-level media reference primitive},
  version = {1.0.0},
  year    = {2026},
  doi     = {10.5281/zenodo.19892316},
  url     = {https://github.com/open-world-agents/MediaRef}
}
```

The doi above is the Zenodo concept DOI — it always resolves to the latest published release. To cite v1.0.0 specifically, use 10.5281/zenodo.19892317.

Acknowledgments

The video decoder interface design draws on TorchCodec's API.

License

MediaRef is released under the MIT License.

Download files

Source Distribution

mediaref-1.1.0.tar.gz (6.6 MB)

Built Distribution

mediaref-1.1.0-py3-none-any.whl (44.0 kB)

File details

Details for the file mediaref-1.1.0.tar.gz.

File metadata

  • Download URL: mediaref-1.1.0.tar.gz
  • Size: 6.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.10

File hashes

Hashes for mediaref-1.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbbcb9f4dfdc079a528d8caa43c00e01fe24ec333c0aaecf061d5ff599424d5f |
| MD5 | 09cd1a6d4d9d2a7dbc426295ac84c9e1 |
| BLAKE2b-256 | 913b03b88b30503b24a9cb181ced0780dc6539858f67c45088ceb56a3607aa76 |

File details

Details for the file mediaref-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: mediaref-1.1.0-py3-none-any.whl
  • Size: 44.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.10

File hashes

Hashes for mediaref-1.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0f191f37c7337429e2281dcc21a9c0658920059d25afb413b65499084fa4e8d |
| MD5 | f59e25e666a635d2a6aed691ea6b5940 |
| BLAKE2b-256 | 09afacb4ddc5660a1368a926a8b3612cfe6cc652d62d4d34a9315c99e475884e |
