Tools for reading and writing videos, and loading them efficiently with PyTorch.

parallel-video-io

Tools for reading and writing videos and for efficient frame-level loading with PyTorch.

This repository provides small, focused utilities for video I/O, plus a PyTorch-friendly iterable dataset and dataloader that make it easy to stream frames in parallel from many videos or from directories of image frames.

Key features

Read frames from videos (random access or sequential) using imageio/ffmpeg.
Write sequences of numpy frames to H.264 MP4 files with sane defaults.
PyTorch-compatible VideoCollectionDataset and VideoCollectionDataLoader that provide a simple iterator that uses multiple processes to load data from different videos under the hood. This is especially handy for running trained deep learning models on many videos in production.

Installation
Quick examples
Testing
Notes & troubleshooting

Installation

This project targets Python >= 3.10. The library's runtime dependencies are listed in pyproject.toml (torch, imageio, imageio-ffmpeg, torchcodec, joblib, tqdm, numpy, pytest).

If you're using pip in a development environment, install editable with:

pip install -e .

Or with Poetry:

poetry install

To use this package as a dependency of your own project, add the following to your pyproject.toml:

[project]
# ... other stuff
dependencies = [
    # ...
    "parallel-video-io @ git+https://github.com/sibocw/parallel-video-io.git",
]

Make sure ffmpeg is available on your $PATH (required by imageio-ffmpeg).
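
A quick way to confirm the ffmpeg requirement before running anything else is to look the executable up on $PATH from Python. This is a minimal sketch using only the standard library; the helper name `find_ffmpeg` is hypothetical and not part of this package.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable found on $PATH, or None.

    Hypothetical helper for illustration; it is not provided by pvio.
    """
    return shutil.which("ffmpeg")


# Report what was found so a missing install is caught early.
ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg was not found on $PATH")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```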

Quick examples

These examples use NumPy arrays for frames in (height, width, channels) order and uint8 dtype.

Reading video metadata

from pvio.video_io import get_video_metadata, check_num_frames

# To get the number of frames in a video
n_frames = check_num_frames("example.mp4")
print(n_frames)  # this is an integer frame count

# To get more information
# Note that this function caches the metadata in a JSON file. To control whether the
# cache file is written and whether an existing cache file is read, set the
# `cache_metadata` (default True) and `use_cached_metadata` (default True) arguments.
meta = get_video_metadata("example.mp4")
print(meta)  # meta is a dictionary containing the keys "n_frames", "frame_size", "fps"

Reading video frames

from pvio.video_io import read_frames_from_video

# You can read a whole video
frames, fps = read_frames_from_video("example.mp4")

# ... or just some frames
frames, fps = read_frames_from_video("example.mp4", frame_indices=[0, 5])
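
When only a preview of a long video is needed, one option is to compute a handful of evenly spaced indices and pass them as `frame_indices`. The sketch below assumes the frame count is already known (for example from `check_num_frames`); the helper `evenly_spaced_indices` is hypothetical and not part of this package.

```python
import numpy as np


def evenly_spaced_indices(n_frames: int, n_samples: int) -> list:
    """Return up to n_samples distinct frame indices spread over [0, n_frames - 1].

    Hypothetical helper for illustration; it is not provided by pvio.
    """
    if n_frames <= 0 or n_samples <= 0:
        return []
    # Never ask for more samples than there are frames.
    n_samples = min(n_samples, n_frames)
    positions = np.linspace(0, n_frames - 1, num=n_samples)
    # Round to the nearest frame and drop any duplicates; np.unique also sorts.
    return [int(i) for i in np.unique(np.round(positions).astype(int))]


indices = evenly_spaced_indices(n_frames=100, n_samples=5)
print(indices)

# Assumed usage with the reader shown above:
# frames, fps = read_frames_from_video("example.mp4", frame_indices=indices)
```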

Writing a video

import numpy as np
from pvio.video_io import write_frames_to_video

# Create dummy 32x32 RGB frames (H, W, C)
frames = [np.full((32, 32, 3), fill_value=i, dtype=np.uint8) for i in range(10)]

# Save them to file
# There are more complex video writing parameters that can be tuned - see the docstring
# for details.
write_frames_to_video("example.mp4", frames, fps=25.0)

Note: the writer verifies that all frames share the same (height, width). FFmpeg may automatically resize frames to meet codec alignment requirements, so for deterministic results use frame dimensions that are divisible by 16.
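
If the source frames do not already have dimensions divisible by 16, one way to avoid implicit resizing is to pad them before writing. This is a minimal NumPy sketch, assuming (height, width, channels) frames as used in these examples; `pad_to_multiple` is a hypothetical helper, not part of this package, and padding by edge replication is only one possible choice.

```python
import numpy as np


def pad_to_multiple(frame: np.ndarray, multiple: int = 16) -> np.ndarray:
    """Pad an (H, W, C) frame on the bottom and right so H and W divide by `multiple`.

    Hypothetical helper for illustration; it is not provided by pvio.
    """
    height, width = frame.shape[:2]
    # (-x) % m is the amount needed to reach the next multiple of m (0 if aligned).
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    # Replicate edge pixels rather than adding a black border.
    return np.pad(frame, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")


frame = np.zeros((30, 50, 3), dtype=np.uint8)
padded = pad_to_multiple(frame)
print(padded.shape)  # (32, 64, 3)
```

Apply the same padding to every frame so that the writer's check for a shared (height, width) still passes.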

Using the PyTorch dataset and dataloader

The VideoCollectionDataset iterates frames either from video files or from directories containing individual image frames.

from pvio.torch import VideoCollectionDataset, VideoCollectionDataLoader

# Initialize Dataset from video files
paths = ["/path/to/video1.mp4", "/path/to/video2.mp4"]
ds = VideoCollectionDataset(paths)
# ... or from directories containing individual frames as images
paths = ["/path/to/frames_dir1", "/path/to/frames_dir2"]
# To control sorting of frame files within each dir, use the `frame_sorting` argument
# (see docstring for details)
ds = VideoCollectionDataset(paths, as_image_dirs=True)

# Wrap in the special DataLoader
# (you can add other DataLoader keyword arguments if you wish)
loader = VideoCollectionDataLoader(ds, batch_size=8, num_workers=4)

# Now you can iterate over all frames from all videos in a single iterator. Behind the
# scenes, these frames are fetched in parallel (each worker handles one video at a time)
for batch in loader:
    frames = batch["frames"]  # torch.Tensor: B x C x H x W
    video_paths = batch["video_paths"]  # list of Path or str, depending on input
    frame_indices = batch["frame_indices"]  # list of int

When loading from video files (as_image_dirs=False), the dataset uses torchcodec's VideoDecoder to decode frames and get_video_metadata to build per-video frame counts; you may want to enable caching if you index many large files.
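
Because a single batch can mix frames from several videos, downstream code often needs to regroup per-frame results by video. The sketch below relies only on the three batch keys shown above and is written in plain Python so it can be tried without torch; `group_by_video` is a hypothetical helper, and sorting by frame index is a precaution based on the assumption that arrival order is not guaranteed.

```python
from collections import defaultdict


def group_by_video(batches):
    """Collect (frame_index, item) pairs per video path, sorted by frame index.

    `batches` is any iterable of dicts with the keys "frames", "video_paths"
    and "frame_indices". Hypothetical helper; it is not provided by pvio.
    """
    per_video = defaultdict(list)
    for batch in batches:
        # Iterating over "frames" yields one item per sample in the batch.
        for path, index, frame in zip(
            batch["video_paths"], batch["frame_indices"], batch["frames"]
        ):
            per_video[str(path)].append((int(index), frame))
    # Sort on the frame index only, so the frame objects are never compared.
    return {
        path: sorted(items, key=lambda item: item[0])
        for path, items in per_video.items()
    }


# Stand-in batches; with the real loader, pass `loader` instead.
fake_batches = [
    {"frames": ["b1", "a0"], "video_paths": ["b.mp4", "a.mp4"], "frame_indices": [1, 0]},
    {"frames": ["b0"], "video_paths": ["b.mp4"], "frame_indices": [0]},
]
print(group_by_video(fake_batches))
```

In practice you would usually store a model output per frame rather than the frame itself, to keep memory use bounded.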

Testing

The test suite uses pytest. Run it from the repository root:

pytest tests

There are a few tests that write small MP4 files using imageio/ffmpeg; ensure ffmpeg is available in the environment where tests run.

Notes & troubleshooting

FFmpeg macroblock constraints: some ffmpeg builds require frame dimensions to be divisible by 16. If you see a warning about macro_block_size=16 and unexpected resizing, choose frame sizes divisible by 16 in production pipelines.
If you plan to decode many large videos, enabling metadata caching (the package writes a .metadata.json next to each video when get_video_metadata is called) will speed up repeated indexing.
The PyTorch loader expects the dataset passed to VideoCollectionDataLoader to be an instance of VideoCollectionDataset and enforces the built-in collate function.
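
Since the metadata cache lives next to the videos, it can be useful to list the cache files under a directory, for example to delete them after videos have been re-encoded. The sketch below assumes the cache file names end in `.metadata.json`, which is inferred from the note above and not verified against the package; `find_metadata_caches` is a hypothetical helper.

```python
import os
from pathlib import Path


def find_metadata_caches(root):
    """Return sorted paths of files under `root` whose names end in '.metadata.json'.

    The suffix is an assumption based on the note above. Hypothetical helper;
    it is not provided by pvio.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".metadata.json"):
                found.append(Path(dirpath) / name)
    return sorted(found)


# List cache files under the current directory without deleting anything.
for cache_file in find_metadata_caches("."):
    print(cache_file)
```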

Release history

0.1.5 (May 20, 2026)
0.1.4 (Oct 28, 2025)
0.1.3 (Oct 28, 2025)
