Pydantic media reference for images and video frames with lazy loading and optimized batch decoding

MediaRef

Pydantic media reference for images and video frames (with timestamp support) from data URIs, HTTP URLs, file URIs, and local paths. Features lazy loading and optimized batch video decoding.

Works with any container format (Parquet, HDF5, mcap, rosbag, etc.) and any media format (JPEG, PNG, H.264, H.265, AV1, etc.).

Why MediaRef?

1. Separate heavy media from lightweight metadata

Store 1TB of videos separately while keeping only 1MB of references in your dataset tables. Break free from rigid structures where media must be embedded inside tables—MediaRef enables flexible, decoupled storage architectures for any format that stores strings.

# Store lightweight references in your dataset, not heavy media
import pandas as pd
from mediaref import MediaRef

# Image references: 37 bytes vs. an entire embedded image (>100 KB)
df_images = pd.DataFrame([
    {"action": [0.1, 0.2], "observation": MediaRef(uri="frame_001.png").model_dump()},
    {"action": [0.3, 0.4], "observation": MediaRef(uri="frame_002.png").model_dump()},
])

# Video frame references: 35-42 bytes vs. an entire embedded video file (several GB)
df_video = pd.DataFrame([
    {"action": [0.1, 0.2], "observation": MediaRef(uri="episode_01.mp4", pts_ns=0).model_dump()},
    {"action": [0.3, 0.4], "observation": MediaRef(uri="episode_01.mp4", pts_ns=50_000_000).model_dump()},
])

# Works with any container format (Parquet, HDF5, mcap, rosbag, etc.)
# and any media format (JPEG, PNG, H.264, H.265, AV1, etc.)
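
The byte counts quoted in the comments above can be reproduced with the standard library alone. The sketch below is an illustration, not MediaRef code: it assumes the stored form is compact JSON containing the two schema fields (uri, pts_ns), with pts_ns serialized as null for still images. The helper name compact_ref is hypothetical.

```python
import json


def compact_ref(uri, pts_ns=None):
    """Serialize a (uri, pts_ns) pair as compact JSON (assumed storage form)."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps({"uri": uri, "pts_ns": pts_ns}, separators=(",", ":"))


image_ref = compact_ref("frame_001.png")
first_frame = compact_ref("episode_01.mp4", 0)
later_frame = compact_ref("episode_01.mp4", 50_000_000)

for serialized in (image_ref, first_frame, later_frame):
    # Report the encoded size in bytes next to the string itself.
    print(len(serialized.encode("utf-8")), serialized)
```

Under these assumptions the image reference comes to 37 bytes and the two video references to 35 and 42 bytes, which matches the ranges in the comments.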

MediaRef is already used in production ML data formats at scale. For example, the D2E research project uses MediaRef via OWAMcap to store 10TB+ of gameplay data with screen observations.

2. Future-proof specification built on standards

The MediaRef schema (uri, pts_ns) is designed to be permanent and is built entirely on established standards (RFC 2397 for data URIs, RFC 3986 for URI syntax). Use it anywhere with confidence: no proprietary formats, no breaking changes.
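
Because the uri field follows standard URI syntax, the kind of source a reference points at can be recovered from the string alone. The following is a minimal sketch using only urllib.parse. The function classify_uri and its four labels are hypothetical and are not part of the MediaRef API; how the library itself dispatches is not shown here.

```python
from urllib.parse import urlsplit


def classify_uri(uri: str) -> str:
    """Return 'data', 'http', 'file', or 'path' for a reference string."""
    # RFC 2397 data URIs always start with the literal "data:" prefix.
    if uri.startswith("data:"):
        return "data"
    scheme = urlsplit(uri).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "file":
        return "file"
    # Anything else (no scheme, or a Windows drive letter) is treated as a path.
    return "path"


for example in (
    "data:image/png;base64,iVBORw0KG",
    "https://example.com/image.jpg",
    "file:///data/image.png",
    "frame_001.png",
):
    print(classify_uri(example), example)
```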

3. Optimized performance where it matters

Due to lazy loading, MediaRef has zero CPU and I/O overhead when the media is not accessed. When you do need to load the media, convenient APIs handle the complexity of multi-source media (local files, URLs, embedded data) with a single unified interface.

When loading multiple frames from the same video, batch_decode() opens the video file once and reuses the handle, achieving 4.9× faster throughput and 2.2× better I/O efficiency compared to sequential decoding.

[Figure: Decoding Benchmark]

Benchmark details: Decoding throughput = decoded frames per second during dataloading; I/O efficiency = inverse of disk I/O operations per frame loaded. Measured on real ML dataloader workloads (Minecraft dataset: 64×5 min episodes, 640×360 @ 20Hz, FSLDataset with 4096 token sequences). See D2E paper Section 3 and Appendix A for full methodology.
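
The open-once idea can be sketched without any video library. The code below is an assumption-laden illustration, not the implementation of batch_decode(): it groups requests by URI so each file would be opened a single time, and orders the requests inside each group by timestamp so a decoder could read forward. Ref and plan_batches are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    """Stand-in for a (uri, pts_ns) reference; not the MediaRef class."""
    uri: str
    pts_ns: Optional[int] = None


def plan_batches(refs):
    """Map each URI to the request indices it serves, sorted by timestamp."""
    groups = defaultdict(list)
    for index, ref in enumerate(refs):
        # Keep the original index so results can be returned in request order.
        groups[ref.uri].append((ref.pts_ns or 0, index))
    return {uri: [i for _, i in sorted(items)] for uri, items in groups.items()}


refs = [
    Ref("a.mp4", 2_000_000_000),
    Ref("b.mp4", 0),
    Ref("a.mp4", 1_000_000_000),
    Ref("a.mp4", 0),
]
plan = plan_batches(refs)
print(plan)  # {'a.mp4': [3, 2, 0], 'b.mp4': [1]}
```

Four requests touch only two files here, so a decoder following this plan would open two handles instead of four.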

Installation

Quick install:

# Core package with image loading support
pip install mediaref

# With video decoding support (adds PyAV for video frame extraction)
pip install 'mediaref[video]'

Add to your project:

# Core package
uv add mediaref~=0.5.0

# With video decoding support
uv add 'mediaref[video]~=0.5.0'

Versioning Policy: MediaRef follows semantic versioning. Patch releases (e.g., 0.5.0 → 0.5.1) contain only bug fixes and performance improvements with no API changes. Minor releases (e.g., 0.5.x → 0.6.0) may introduce new features while maintaining backward compatibility. Use ~=0.5.0 to automatically receive patch updates.
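
To make the ~=0.5.0 constraint concrete: under PEP 440 a compatible-release specifier of this form is equivalent to >=0.5.0 combined with ==0.5.*. The sketch below hand-rolls that check for plain numeric X.Y.Z versions only. The helper names are hypothetical, and real tools such as pip and uv handle many more version forms than this.

```python
def parse(version: str) -> tuple:
    """Split a purely numeric dotted version into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate: str, base: str) -> bool:
    """True if candidate satisfies ~=base for numeric X.Y.Z versions."""
    cand, floor = parse(candidate), parse(base)
    # ~=0.5.0 means: at least 0.5.0, and identical up to the last component.
    return cand >= floor and cand[: len(floor) - 1] == floor[:-1]


for version in ("0.4.4", "0.5.0", "0.5.3", "0.6.0", "1.0.0"):
    print(version, is_compatible(version, "0.5.0"))
```

With this rule, 0.5.0 and 0.5.3 are accepted, while 0.4.4, 0.6.0, and 1.0.0 are rejected.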

Quick Start

Basic Usage

from mediaref import MediaRef, DataURI, batch_decode
import numpy as np

# 1. Create references (lightweight, no loading yet)
ref = MediaRef(uri="image.png")                        # Local file
ref = MediaRef(uri="https://example.com/image.jpg")    # Remote URL
ref = MediaRef(uri="video.mp4", pts_ns=1_000_000_000)  # Video frame at 1.0s

# 2. Load media
rgb = ref.to_ndarray()                                 # Returns (H, W, 3) RGB array
pil = ref.to_pil_image()                               # Returns PIL.Image

# 3. Embed as data URI
data_uri = DataURI.from_image(rgb, format="png")       # e.g., "data:image/png;base64,iVBORw0KG..."
ref = MediaRef(uri=data_uri)                           # Self-contained reference

# 4. Batch decode video frames (opens video once, reuses handle)
refs = [MediaRef(uri="video.mp4", pts_ns=int(i*1e9)) for i in range(10)]
frames = batch_decode(refs)                            # Much faster than loading individually

# 5. Serialize for storage in any container format (Parquet, HDF5, mcap, rosbag, etc.)
json_str = ref.model_dump_json()                       # Lightweight JSON string
# Store in your dataset format of choice - works with any format that stores strings

Batch Decoding - Optimized Video Frame Loading

When loading multiple frames from the same video, use batch_decode() to open the video file once and reuse the handle—achieving significantly better performance than loading frames individually.

from mediaref import MediaRef, batch_decode

# Use optimized batch decoding (default: PyAV backend)
refs = [MediaRef(uri="video.mp4", pts_ns=int(i*1e9)) for i in range(10)]
frames = batch_decode(refs)

# Or use TorchCodec for GPU-accelerated decoding
frames = batch_decode(refs, decoder="torchcodec")  # Requires: pip install torchcodec

Both decoders follow unified playback semantics—querying a timestamp returns the frame being displayed at that moment, ensuring consistent behavior across backends.
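
One way to picture these playback semantics: if each frame stays on screen from its own timestamp until the next frame starts, then the frame shown at a query time is the last frame whose timestamp is less than or equal to the query. The sketch below illustrates that rule with bisect. It is not taken from either decoder backend, frame_at is a hypothetical helper, and its handling of queries before the first frame or after the last one is an assumption.

```python
from bisect import bisect_right


def frame_at(frame_pts_ns, query_ns):
    """Index of the frame on screen at query_ns, given sorted start times."""
    # bisect_right counts the frames that start at or before the query.
    index = bisect_right(frame_pts_ns, query_ns) - 1
    if index < 0:
        raise ValueError("query precedes the first frame")
    return index


# Three frames of a 20 Hz video: one frame every 50 ms.
pts = [0, 50_000_000, 100_000_000]

print(frame_at(pts, 0))           # 0
print(frame_at(pts, 49_999_999))  # 0: the first frame is still displayed
print(frame_at(pts, 50_000_000))  # 1: the second frame starts exactly here
print(frame_at(pts, 75_000_000))  # 1
```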

Embedding Media Directly in MediaRef

You can embed image data directly into MediaRef objects, making them self-contained and portable (useful for serialization, caching, or sharing).

from mediaref import MediaRef, DataURI
import numpy as np

# Create embedded MediaRef from numpy array
rgb = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
embedded_ref = MediaRef(uri=DataURI.from_image(rgb, format="png"))

# Or from file
embedded_ref = MediaRef(uri=DataURI.from_file("image.png"))

# Or from PIL Image
from PIL import Image
pil_img = Image.open("image.png")
embedded_ref = MediaRef(uri=DataURI.from_image(pil_img, format="jpeg", quality=90))

# Or from BGR array (OpenCV uses BGR by default - input_format="bgr" is REQUIRED)
import cv2
bgr_array = cv2.imread("image.jpg")  # OpenCV loads as BGR, not RGB!
embedded_ref = MediaRef(uri=DataURI.from_image(bgr_array, format="png", input_format="bgr"))

# Use just like any other MediaRef
rgb = embedded_ref.to_ndarray()                        # (H, W, 3) RGB array
pil = embedded_ref.to_pil_image()                      # PIL Image

# Serialize with embedded data
serialized = embedded_ref.model_dump_json()            # Contains image data
restored = MediaRef.model_validate_json(serialized)    # No external file needed!

# DataURI properties
data_uri = DataURI.from_image(rgb, format="png")
print(data_uri.mimetype)                               # "image/png"
print(len(data_uri))                                   # URI length in bytes
print(data_uri.is_image)                               # True for image/* types
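
For readers unfamiliar with RFC 2397, the structure of a base64 data URI can be shown with the standard library alone. This is a minimal sketch, not the DataURI class: make_data_uri and parse_data_uri are hypothetical helpers, and the payload is just the 8-byte PNG signature rather than a complete image.

```python
import base64


def make_data_uri(payload: bytes, mimetype: str) -> str:
    """Wrap raw bytes as data:<mimetype>;base64,<encoded payload>."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mimetype};base64,{encoded}"


def parse_data_uri(uri: str):
    """Return (mimetype, payload bytes) from a base64 data URI."""
    header, _, body = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mimetype = header[len("data:"):-len(";base64")]
    return mimetype, base64.b64decode(body)


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
uri = make_data_uri(PNG_SIGNATURE, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
print(parse_data_uri(uri))
```

Base64 encodes every 3 bytes as 4 characters, so an embedded reference is roughly a third larger than the file it carries.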

Path Resolution & Serialization

Resolve relative paths and serialize MediaRef objects for storage in any container format (Parquet, HDF5, mcap, rosbag, etc.).

# Resolve relative paths
ref = MediaRef(uri="relative/video.mkv", pts_ns=123456)
resolved = ref.resolve_relative_path("/data/recordings")

# Handle unresolvable URIs (embedded/remote)
remote = MediaRef(uri="https://example.com/image.jpg")
resolved = remote.resolve_relative_path("/data", on_unresolvable="ignore")  # No warning

# Serialization (Pydantic-based) - works with any container format
ref = MediaRef(uri="video.mp4", pts_ns=1_500_000_000)

# As dict (for Python-based formats)
data = ref.model_dump()
# Output: {'uri': 'video.mp4', 'pts_ns': 1500000000}

# As JSON string (for Parquet, HDF5, mcap, rosbag, etc.)
json_str = ref.model_dump_json()
# Output: '{"uri":"video.mp4","pts_ns":1500000000}'

# Deserialization
ref = MediaRef.model_validate(data)                    # From dict
ref = MediaRef.model_validate_json(json_str)           # From JSON
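
The idea behind resolving relative paths can be sketched as follows. This is a hypothetical helper, not resolve_relative_path() itself: it assumes the base is a directory, that only scheme-less relative paths are rewritten, and that embedded or remote URIs pass through unchanged. The library's exact rules may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def resolve_uri(uri: str, base_dir: str) -> str:
    """Join a relative local path onto base_dir; return anything else as-is."""
    if uri.startswith("data:") or urlsplit(uri).scheme in ("http", "https", "file"):
        return uri  # embedded or remote: nothing to resolve
    path = PurePosixPath(uri)
    if path.is_absolute():
        return uri  # already anchored to the filesystem root
    return str(PurePosixPath(base_dir) / path)


print(resolve_uri("relative/video.mkv", "/data/recordings"))
print(resolve_uri("https://example.com/image.jpg", "/data"))
```

PurePosixPath is used so the example behaves the same on every platform; it never touches the filesystem.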

Documentation

API Reference - Detailed API documentation
Playback Semantics - How frame selection works at specific timestamps

Potential Future Enhancements

HuggingFace datasets integration: Add native MediaRef feature type to HuggingFace datasets for seamless integration with the ML ecosystem
msgspec support: Replace the pydantic BaseModel with msgspec
Thread-safe resource caching: Implement thread-safe ResourceCache for concurrent video decoding workloads
Audio support: Extend MediaRef to support audio references with timestamp-based extraction
Cloud storage support: Integrate fsspec for cloud URIs (e.g., s3://, gs://, az://)
Additional video decoders: Support for more decoder backends (e.g., OpenCV, decord)

Dependencies

Core dependencies (automatically installed):

pydantic>=2.0 - Data validation and serialization (requires Pydantic v2 API)
numpy - Array operations
opencv-python - Image loading and color conversion
pillow>=9.4.0 - Image loading from various sources
requests>=2.32.2 - HTTP/HTTPS URL loading
loguru - Logging (disabled by default for library code)

Optional dependencies:

[video] extra: av>=15.0 (PyAV for video frame extraction, 15.0+ for FFmpeg 7.0 support)
TorchCodec: torchcodec (install separately for GPU-accelerated decoding)

Acknowledgments

The video decoder interface design references TorchCodec's API design.

License

MediaRef is released under the MIT License.

Release history

Version	Release date
1.1.0	May 6, 2026
1.0.0	Apr 29, 2026
0.5.3 (this version)	Mar 14, 2026
0.5.2	Feb 20, 2026
0.5.1	Feb 20, 2026
0.5.0	Feb 6, 2026
0.4.4	Nov 25, 2025
0.4.3	Nov 17, 2025
0.4.2	Nov 17, 2025
0.4.1	Oct 30, 2025
0.4.0 (yanked)	Oct 30, 2025
0.3.1	Oct 29, 2025
0.3.0	Oct 29, 2025
0.2.0	Oct 28, 2025
0.1.0	Oct 28, 2025

Download files

Source Distribution

mediaref-0.5.3.tar.gz (6.6 MB)

Uploaded Mar 14, 2026 Source

Built Distribution

mediaref-0.5.3-py3-none-any.whl (35.2 kB)

Uploaded Mar 14, 2026 Python 3

File details

Details for the file mediaref-0.5.3.tar.gz.

File metadata

Download URL: mediaref-0.5.3.tar.gz
Upload date: Mar 14, 2026
Size: 6.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.10

File hashes

Hashes for mediaref-0.5.3.tar.gz
Algorithm	Hash digest
SHA256	`849b6148550bb717fe155c424b973ec86eb941afddca7b055169505b5ab909f5`
MD5	`5e690886086fc3c24486589b159d46ec`
BLAKE2b-256	`17b6f1f32c41bc3e09d00ebb6fb17ab179c74e09ba56bb414a28017874921f24`

File details

Details for the file mediaref-0.5.3-py3-none-any.whl.

File metadata

Download URL: mediaref-0.5.3-py3-none-any.whl
Upload date: Mar 14, 2026
Size: 35.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.10

File hashes

Hashes for mediaref-0.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b387b5e9e224d45d6883316c5f9ba778e55e7b1171c3e4234be38a4679cbd0cd`
MD5	`2d5aa21d952d9d92841c176d94acf538`
BLAKE2b-256	`05bd3cf69f6c0c6acbdd962dcf1ca94dc8b22f06cce8c6de7b1ae615353b2ab5`
