Pydantic media reference for images and video frames with lazy loading and optimized batch decoding

MediaRef

Pydantic media reference for images and video frames (with timestamp support) from data URIs, HTTP URLs, file URIs, and local paths. Features lazy loading and optimized batch video decoding.

Installation

# Core package with image loading support
pip install mediaref

# With video decoding support (adds PyAV for video frame extraction)
pip install "mediaref[video]"

Quick Start

Basic Usage

from mediaref import MediaRef, DataURI, batch_decode
import numpy as np

# 1. Create references (lightweight, no loading yet)
ref = MediaRef(uri="image.png")                        # Local file
ref = MediaRef(uri="https://example.com/image.jpg")    # Remote URL
ref = MediaRef(uri="video.mp4", pts_ns=1_000_000_000)  # Video frame at 1.0s

# 2. Load media
rgb = ref.to_ndarray()                                 # Returns (H, W, 3) RGB array
pil = ref.to_pil_image()                               # Returns PIL.Image

# 3. Embed as data URI
data_uri = DataURI.from_image(rgb, format="png")       # e.g., "data:image/png;base64,iVBORw0KG..."
ref = MediaRef(uri=data_uri)                           # Self-contained reference

# 4. Batch decode video frames (opens video once, reuses handle)
refs = [MediaRef(uri="video.mp4", pts_ns=int(i*1e9)) for i in range(10)]
frames = batch_decode(refs)                            # Much faster than loading individually
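
The pts_ns values above are nanosecond timestamps (1_000_000_000 is 1.0 s). When timestamps come from a frame rate rather than whole seconds, float arithmetic such as int(t * 1e9) may be off by a nanosecond for some fractional values, so exact integer or rational arithmetic is safer. A minimal sketch, assuming a constant frame rate; frame_to_pts_ns is a hypothetical helper and not part of mediaref:

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def frame_to_pts_ns(frame_index: int, fps: Fraction) -> int:
    """Return the timestamp in nanoseconds of a frame index.

    Hypothetical helper (not part of mediaref); assumes a constant frame rate.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    # Exact rational arithmetic, rounded to the nearest nanosecond.
    return round(Fraction(frame_index * NS_PER_SECOND) / fps)


# Timestamps for the first three frames of a 20 Hz video.
timestamps = [frame_to_pts_ns(i, Fraction(20)) for i in range(3)]
```

The resulting integers can be passed as pts_ns when building MediaRef objects.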

Batch Decoding - Optimized Video Frame Loading

When loading multiple frames from the same video, batch_decode() opens the video file once and reuses the handle. In the project's benchmark this gives 4.9× higher throughput and 41× better I/O efficiency than the methods it was compared against (a baseline and TorchCodec v0.6.0).

[Figure: Decoding Benchmark]

Benchmark details: Measured on real ML dataloader workloads (Minecraft dataset: 64×5 min episodes, 640×360 @ 20Hz, FSLDataset with 4096 token sequences) vs baseline and TorchCodec v0.6.0. See D2E paper Section 3 and Appendix A for full methodology.

from mediaref import MediaRef, batch_decode
from mediaref.video_decoder import BatchDecodingStrategy

# Use optimized batch decoding with adaptive strategy (default, recommended)
refs = [MediaRef(uri="video.mp4", pts_ns=int(i*1e9)) for i in range(10)]
frames = batch_decode(
    refs,
    # Our optimized implementation based on PyAV
    decoder="pyav",
    # Our adaptive strategy for optimal performance
    strategy=BatchDecodingStrategy.SEQUENTIAL_PER_KEYFRAME_BLOCK
)

# Or use TorchCodec for GPU-accelerated decoding
frames = batch_decode(refs, decoder="torchcodec")  # Requires: pip install "torchcodec>=0.4.0"
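
To illustrate why opening each video once helps, the sketch below groups frame requests by video and sorts them by timestamp, so that each file would need to be opened only once and read in order. This is an illustration of the general idea only, not mediaref's actual implementation; plan_batch and the plain (uri, pts_ns) tuples are hypothetical stand-ins for MediaRef objects.

```python
from collections import defaultdict


def plan_batch(requests: list[tuple[str, int]]) -> dict[str, list[tuple[int, int]]]:
    """Group (uri, pts_ns) requests by video, sorted by timestamp.

    Each value is a list of (pts_ns, original_index) pairs, so decoded
    frames can later be returned in the order they were requested.
    Hypothetical helper, not part of mediaref.
    """
    plan = defaultdict(list)
    for index, (uri, pts_ns) in enumerate(requests):
        plan[uri].append((pts_ns, index))
    for entries in plan.values():
        entries.sort()  # ascending timestamps allow one forward pass per file
    return dict(plan)


requests = [("a.mp4", 2_000_000_000), ("b.mp4", 0), ("a.mp4", 1_000_000_000)]
plan = plan_batch(requests)
```

Here a.mp4 appears twice in the request list but only once in the plan, with its two timestamps in ascending order.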

Embedding Media Directly in MediaRef

You can embed image data directly into MediaRef objects, making them self-contained and portable (useful for serialization, caching, or sharing).

from mediaref import MediaRef, DataURI
import numpy as np

# Create embedded MediaRef from numpy array
rgb = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)  # upper bound is exclusive
embedded_ref = MediaRef(uri=DataURI.from_image(rgb, format="png"))

# Or from file
embedded_ref = MediaRef(uri=DataURI.from_file("image.png"))

# Or from PIL Image
from PIL import Image
pil_img = Image.open("image.png")
embedded_ref = MediaRef(uri=DataURI.from_image(pil_img, format="jpeg", quality=90))

# Or from BGR array (OpenCV uses BGR by default - input_format="bgr" is REQUIRED)
import cv2
bgr_array = cv2.imread("image.jpg")  # OpenCV loads as BGR, not RGB!
embedded_ref = MediaRef(uri=DataURI.from_image(bgr_array, format="png", input_format="bgr"))

# Use just like any other MediaRef
rgb = embedded_ref.to_ndarray()                        # (H, W, 3) RGB array
pil = embedded_ref.to_pil_image()                      # PIL Image

# Serialize with embedded data
serialized = embedded_ref.model_dump_json()            # Contains image data
restored = MediaRef.model_validate_json(serialized)    # No external file needed!

# Properties
data_uri = DataURI.from_image(rgb, format="png")
print(data_uri.mimetype)                               # "image/png"
print(len(data_uri))                                   # URI length in bytes
print(data_uri.is_image)                               # True for image/* types
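
For readers unfamiliar with the data URI layout shown earlier ("data:image/png;base64,..."), the following standard-library sketch builds and parses one by hand. It only illustrates the format; make_data_uri and parse_data_uri are hypothetical helpers and do not reflect how DataURI is implemented.

```python
import base64


def make_data_uri(payload: bytes, mimetype: str) -> str:
    """Encode raw bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mimetype};base64,{encoded}"


def parse_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (mimetype, payload bytes)."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mimetype = header[len("data:"):-len(";base64")]
    return mimetype, base64.b64decode(encoded)


# The 8-byte PNG file signature stands in for real image bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
uri = make_data_uri(PNG_SIGNATURE, "image/png")
mimetype, payload = parse_data_uri(uri)
```

Because the image bytes travel inside the string, such a reference survives serialization without any external file, at the cost of roughly a third more size from base64 encoding.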

Path Resolution & Serialization

Resolve relative paths and serialize MediaRef objects for dataset metadata and storage.

from mediaref import MediaRef

# Resolve relative paths
ref = MediaRef(uri="relative/video.mkv", pts_ns=123456)
resolved = ref.resolve_relative_path("/data/recordings")

# Handle unresolvable URIs (embedded/remote)
remote = MediaRef(uri="https://example.com/image.jpg")
resolved = remote.resolve_relative_path("/data", on_unresolvable="ignore")  # No warning

# Serialization (Pydantic-based)
data = ref.model_dump()                                # {'uri': '...', 'pts_ns': ...}
json_str = ref.model_dump_json()                       # JSON string
ref = MediaRef.model_validate(data)                    # From dict
ref = MediaRef.model_validate_json(json_str)           # From JSON
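
The resolution behaviour described above can be pictured with a small standard-library sketch. It assumes POSIX-style paths and assumes that unresolvable URIs are returned unchanged; resolve_uri is a hypothetical function, and the real resolve_relative_path may differ in its details.

```python
import posixpath
import warnings
from urllib.parse import urlparse


def resolve_uri(uri: str, base_path: str, on_unresolvable: str = "warn") -> str:
    """Join a relative path onto base_path; leave other URIs untouched.

    Hypothetical sketch, not mediaref's implementation.
    """
    scheme = urlparse(uri).scheme
    if scheme in ("data", "http", "https"):
        # Embedded and remote URIs have no meaningful base directory.
        message = f"cannot resolve embedded or remote URI against {base_path!r}"
        if on_unresolvable == "error":
            raise ValueError(message)
        if on_unresolvable == "warn":
            warnings.warn(message)
        return uri
    if scheme == "file" or posixpath.isabs(uri):
        return uri  # already absolute
    return posixpath.join(base_path, uri)


resolved = resolve_uri("relative/video.mkv", "/data/recordings")
unchanged = resolve_uri("https://example.com/image.jpg", "/data", on_unresolvable="ignore")
```

Storing relative paths in dataset metadata and resolving them at load time keeps the metadata portable when the dataset directory moves.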

API Reference

MediaRef(uri: str | DataURI, pts_ns: int | None = None)

Properties: is_embedded, is_video, is_remote, is_relative_path

Methods:

to_ndarray(format="rgb", **kwargs) -> np.ndarray - Load as numpy array
- Formats: "rgb" (default), "bgr", "rgba", "bgra", "gray"
- Returns: (H, W, 3) for RGB/BGR, (H, W, 4) for RGBA/BGRA, (H, W) for grayscale
to_pil_image(**kwargs) -> PIL.Image - Load as PIL Image
resolve_relative_path(base_path, on_unresolvable="warn") -> MediaRef - Resolve relative paths
- on_unresolvable: How to handle embedded/remote URIs: "error", "warn" (default), or "ignore"
validate_uri() -> bool - Check if URI exists (local files only)
model_dump() -> dict - Serialize to dict
model_dump_json() -> str - Serialize to JSON
model_validate(data) -> MediaRef - Deserialize from dict
model_validate_json(json_str) -> MediaRef - Deserialize from JSON

DataURI (for embedding media)

Class Methods:

from_image(image: np.ndarray | PIL.Image, format="png", quality=None, input_format="rgb") -> DataURI - Create from image
- format: Output format ("png", "jpeg", "bmp")
- quality: JPEG quality (1-100), ignored for PNG/BMP
- input_format: Input channel order for numpy arrays. Default: "rgb". Ignored for PIL Images.
  - "rgb": RGB format (3 channels)
  - "bgr": BGR format (3 channels) - REQUIRED for OpenCV arrays (e.g., cv2.imread())
  - "rgba": RGBA format (4 channels)
  - "bgra": BGRA format (4 channels)
- PNG format preserves alpha channel; JPEG/BMP drop alpha
from_file(path: str | Path, format=None) -> DataURI - Create from file
from_uri(uri: str) -> DataURI - Parse data URI string

Methods:

to_ndarray(format="rgb") -> np.ndarray - Convert to numpy array
- Formats: "rgb" (default), "bgr", "rgba", "bgra", "gray"
to_pil_image() -> PIL.Image - Convert to PIL Image

Properties:

uri: str - Full data URI string
is_image: bool - True if MIME type is image/*

Functions

batch_decode(refs, strategy=None, decoder="pyav", **kwargs) -> list[np.ndarray] - Batch decode using optimized batch decoding API
- refs: List of MediaRef objects to decode
- strategy: Batch decoding strategy (PyAV only): SEPARATE, SEQUENTIAL, or SEQUENTIAL_PER_KEYFRAME_BLOCK (default)
- decoder: Decoder backend ("pyav" or "torchcodec")
cleanup_cache() - Clear video container cache (PyAV only)

Video Decoders (requires `[video]` extra)

PyAVVideoDecoder(source) - PyAV-based decoder with batch decoding strategies
- Supports batch decoding strategies: SEPARATE, SEQUENTIAL, SEQUENTIAL_PER_KEYFRAME_BLOCK
- CPU-based decoding using FFmpeg
- Automatic container caching with reference counting
TorchCodecVideoDecoder(source) - TorchCodec-based decoder for GPU acceleration
- Requires torchcodec>=0.4.0 (install separately)
- GPU-accelerated decoding with CUDA support
- Does not support batch decoding strategies (parameter ignored)

Decoder Comparison:

Feature	PyAVVideoDecoder	TorchCodecVideoDecoder
Batch decoding strategies	✅ Full support	❌ Not supported (ignored)
GPU acceleration	❌ CPU only	✅ CUDA support
Backend	PyAV (FFmpeg)	TorchCodec (FFmpeg)
Installation	`pip install "mediaref[video]"`	`pip install "torchcodec>=0.4.0"`

Design Notes

Video container caching: Uses reference counting with LRU eviction (default: 10 containers)
Garbage collection: Triggered every 10 PyAV operations to handle FFmpeg reference cycles
Cache size: Configurable via AV_CACHE_SIZE environment variable
Lazy loading: Video dependencies only imported when needed (not at module import time)
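
The container caching note can be illustrated with a small cache that combines LRU ordering with reference counts, so that entries still in use are never evicted. This is a sketch under stated assumptions, not mediaref's code: RefCountedLRU, acquire, and release are hypothetical names, and reading AV_CACHE_SIZE with a fallback of 10 mirrors the notes above without confirming how the library parses it.

```python
import os
from collections import OrderedDict


class RefCountedLRU:
    """LRU cache that only evicts entries whose reference count is zero."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._entries = OrderedDict()  # key -> [resource, refcount]

    def acquire(self, key, opener):
        """Return the cached resource for key, opening it if needed."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
        else:
            self._entries[key] = [opener(key), 0]
        self._entries[key][1] += 1
        self._evict()
        return self._entries[key][0]

    def release(self, key) -> None:
        """Signal that one user of key is done with it."""
        self._entries[key][1] -= 1
        self._evict()

    def _evict(self) -> None:
        # Walk from least to most recently used, dropping unused entries
        # until the cache fits within max_size again.
        for key in list(self._entries):
            if len(self._entries) <= self.max_size:
                break
            if self._entries[key][1] == 0:
                del self._entries[key]


cache = RefCountedLRU(max_size=int(os.environ.get("AV_CACHE_SIZE", "10")))
```

In this design the cache may temporarily exceed max_size while every entry is in use; it shrinks back as soon as a release makes an entry evictable.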

Acknowledgments

The video decoder interface design references TorchCodec's API design.

Dependencies

Core dependencies (automatically installed):

pydantic>=2.0 - Data validation and serialization (requires Pydantic v2 API)
numpy - Array operations
opencv-python - Image loading and color conversion
pillow>=9.4.0 - Image loading from various sources
requests>=2.32.2 - HTTP/HTTPS URL loading
loguru - Logging (disabled by default for library code)

Optional dependencies:

[video] extra: av>=15.0 (PyAV for video frame extraction)
TorchCodec: torchcodec>=0.4.0 (install separately for GPU-accelerated decoding)

Release history

1.1.0 - May 6, 2026
1.0.0 - Apr 29, 2026
0.5.3 - Mar 14, 2026
0.5.2 - Feb 20, 2026
0.5.1 - Feb 20, 2026
0.5.0 - Feb 6, 2026
0.4.4 - Nov 25, 2025
0.4.3 - Nov 17, 2025
0.4.2 - Nov 17, 2025
0.4.1 - Oct 30, 2025 (this version)
0.4.0 - Oct 30, 2025 (yanked)
0.3.1 - Oct 29, 2025
0.3.0 - Oct 29, 2025
0.2.0 - Oct 28, 2025
0.1.0 - Oct 28, 2025

Download files

Source Distribution

mediaref-0.4.1.tar.gz (108.5 kB)

Uploaded Oct 30, 2025 Source

Built Distribution

mediaref-0.4.1-py3-none-any.whl (34.7 kB)

Uploaded Oct 30, 2025 Python 3

File details

Details for the file mediaref-0.4.1.tar.gz.

File metadata

Download URL: mediaref-0.4.1.tar.gz
Upload date: Oct 30, 2025
Size: 108.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.6

File hashes

Hashes for mediaref-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`933024a8638f9d63e04c217b9dd11a16b87b9a6789a1d28c87c82c2a7f0975d8`
MD5	`2795f1dc2dbce89283bcf8a2ff0d29ba`
BLAKE2b-256	`e0409108e91cc39bb3bc925b021bdb547d374ad914b8e1a281530fd1e7796321`
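
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. A minimal sketch; sha256_of and verify are hypothetical helper names, and the expected digest is copied from the table for mediaref-0.4.1.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for mediaref-0.4.1.tar.gz.
EXPECTED_SHA256 = "933024a8638f9d63e04c217b9dd11a16b87b9a6789a1d28c87c82c2a7f0975d8"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Calling verify(Path("mediaref-0.4.1.tar.gz")) on the downloaded file should return True if the download is intact.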

File details

Details for the file mediaref-0.4.1-py3-none-any.whl.

File metadata

Download URL: mediaref-0.4.1-py3-none-any.whl
Upload date: Oct 30, 2025
Size: 34.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.6

File hashes

Hashes for mediaref-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f8709aaee183163e4f97f96669a6162cfc9cf23216ce0ecb7090c39df4e5cf4c`
MD5	`1ce32b85de97b6864ac632a21ceaf53f`
BLAKE2b-256	`23c1a02207de373f5264fcd0bfb9c01e3fb23b0f40876465e1fc10855b2dc2ec`
