Codec-aware video preprocessing for training and inference
codec-video-prep (v0.2.4)
Codec-aware video preprocessing for training and inference. Extracts codec-level bitcost information from H.264 / HEVC / VP9 videos and turns it into patch canvases ready for downstream vision models.
What it does
- Patched FFmpeg decoder – Instruments the H.264 / HEVC / VP9 decoder to export per-macroblock (H.264) or per-CTU (HEVC) bitcost maps during decoding.
- Fast C++ extension (`cv_reader_fast`) – Decodes video with loop-filter / IDCT skipped and optionally returns bitcost data as NumPy arrays.
- Readiness grouping – Groups frames by compressibility (bitcost) so that hard-to-decode regions get more patches.
- Top-K patch selection – Selects the most informative 2×2 patch blocks from each group and packs them into JPG/PNG canvases.
- One-command pipeline – From a raw video to a folder of canvases + metadata in a single call.
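The README does not spell out how a 2×2 block is scored or ranked. As a rough illustration only, the sketch below assumes a block's score is the sum of its bitcost cells and that blocks tile the grid without overlapping; `topk_blocks` is a hypothetical helper, not part of codec_video_prep.

```python
import heapq
from typing import List, Sequence, Tuple


def topk_blocks(
    cost: Sequence[Sequence[float]], k: int, block: int = 2
) -> List[Tuple[float, int, int]]:
    """Return up to k non-overlapping block x block windows with the highest
    summed bitcost, as (score, row, col) with row/col the top-left grid cell."""
    h = len(cost)
    w = len(cost[0]) if h else 0
    scores = []
    # Tile the grid with non-overlapping blocks; a ragged right/bottom edge
    # that cannot hold a full block is ignored.
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            s = sum(cost[r + i][c + j] for i in range(block) for j in range(block))
            scores.append((s, r, c))
    # nlargest with a key behaves like a stable descending sort, so ties
    # keep their row-major order.
    return heapq.nlargest(k, scores, key=lambda t: t[0])
```

Passing `block=3` mirrors `--block_size 3`.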
Install
From PyPI (recommended)
python -m pip install -i https://pypi.org/simple/ codec-video-prep==0.2.4
Verify the installation:
codec-video-prep-doctor
From wheel file
python -m pip install codec_video_prep-0.2.4-*.whl
Build from source
- Build the patched FFmpeg shared libraries (pick one):
  - Pixel-capable (recommended; supports both bitcost and BGR pixel export):
    bash scripts/build_pixel_ffmpeg.sh
  - Legacy skip-IDCT (faster bitcost-only scan, no pixel output):
    bash scripts/build_patched_ffmpeg.sh
- Build and install the Python package:
  python -m pip install -e .
CLI Usage (codec-video-prep)
Quick start
codec-video-prep \
--video /path/to/video.mp4 \
--out_dir ./preinfer_out \
--num_sampled_frames 1024 \
--group_size 32 \
--images_per_group 4 \
--patch 14 \
--max_pixels 153664
Full parameter list
Input / Output
| Parameter | Default | Description |
|---|---|---|
| `--video` | required | Path to input video file |
| `--out_dir` | required | Output directory for canvases and metadata |
| `--canvas_format` | `jpg` | Canvas image format: jpg or png |
| `--save_mask_video` | `False` | Save a side-by-side mask visualization video |
Frame Sampling
| Parameter | Default | Description |
|---|---|---|
| `--frame_sampling_mode` | `uniform_count` | How to sample frames: fps, uniform_count, pkt_size_peak, fps_plus_pkt_size_peak, all_frames |
| `--sample_fps` | `4.0` | Target FPS when frame_sampling_mode=fps |
| `--num_sampled_frames` | `1024` | Exact number of frames to uniformly sample when frame_sampling_mode=uniform_count |
| `--avoid_keyframes` / `--no_avoid_keyframes` | `True` | Shift sampled frames away from keyframes to avoid decoder drift |
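A minimal sketch of uniform_count sampling plus keyframe avoidance. Assumptions (not confirmed by the README): indices are taken at the centre of n equal-width bins, and a sampled frame that lands on a keyframe is moved forward to the next non-keyframe. Both helper names are hypothetical.

```python
from typing import Iterable, List


def uniform_frame_ids(total_frames: int, n: int) -> List[int]:
    """Pick n frame indices spread evenly over [0, total_frames)."""
    if n >= total_frames:
        return list(range(total_frames))
    # Take the centre of each of n equal-width bins.
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


def shift_off_keyframes(
    ids: Iterable[int], keyframes: Iterable[int], total_frames: int
) -> List[int]:
    """Move any sampled index that lands on a keyframe to the next
    non-keyframe, without running past the last frame."""
    keys = set(keyframes)
    out = []
    for f in ids:
        while f in keys and f + 1 < total_frames:
            f += 1
        out.append(f)
    return out
```

Keyframe indices could be collected, for example, from frames whose pict_type is 'I'. A shifted index can collide with the next sample when samples are dense; this sketch does not deduplicate.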
Grouping
| Parameter | Default | Description |
|---|---|---|
| `--grouping_mode` | `readiness` | Grouping strategy: readiness (dynamic) or fixed (fixed-size) |
| `--group_size` | `32` | Max frames per group (for fixed mode or readiness window) |
| `--images_per_group` | `4` | Number of patch canvases to extract per group |
| `--min_group_frames` | `8` | Minimum frames per readiness group |
| `--max_group_frames` | `64` | Maximum frames per readiness group |
Readiness Threshold (when --grouping_mode readiness)
| Parameter | Default | Description |
|---|---|---|
| `--readiness_sum_threshold_mode` | `legacy` | Threshold mode: legacy, auto, fixed, clamped_sqrt_bpppf |
| `--readiness_sum_threshold` | `0.0` | Fixed threshold (used by legacy and fixed modes) |
| `--readiness_norm_sum_threshold` | `2250000.0` | Normalized threshold (used by clamped_sqrt_bpppf mode) |
| `--readiness_coverage_bins` | `3` | Minimum number of temporal bins the selected patches must cover |
| `--readiness_delta_ratio` | `0.05` | Stop extending a group when the score gain drops below this ratio |
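To make the readiness parameters concrete, here is one plausible greedy rule, sketched under assumptions: each frame has a scalar score (for instance its summed, normalized bitcost); a group closes once it holds at least min_frames frames and either its running total reaches the sum threshold or the next frame would add less than delta_ratio of that total; and it always closes at max_frames. This is not the package's actual algorithm, `readiness_groups` is hypothetical, and `--readiness_coverage_bins` is not modeled.

```python
from typing import List, Sequence


def readiness_groups(
    scores: Sequence[float],
    sum_threshold: float,
    min_frames: int = 8,
    max_frames: int = 64,
    delta_ratio: float = 0.05,
) -> List[List[int]]:
    """Greedily split frame indices into variable-length groups."""
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for idx, s in enumerate(scores):
        # Close the group before adding this frame if it is "ready" or the
        # frame's marginal gain is too small relative to the running total.
        if len(current) >= min_frames and (
            total >= sum_threshold or s < delta_ratio * total
        ):
            groups.append(current)
            current, total = [], 0.0
        current.append(idx)
        total += s
        # Hard cap on group length.
        if len(current) == max_frames:
            groups.append(current)
            current, total = [], 0.0
    if current:
        groups.append(current)
    return groups
```

With non-negative scores and the legacy default `--readiness_sum_threshold 0.0`, this particular rule would close every group at min_frames, so the threshold only matters once it is raised.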
Resolution
| Parameter | Default | Description |
|---|---|---|
| `--patch` | `14` | Vision model patch size (e.g. 14 for ViT) |
| `--max_pixels` | `153664` | Max pixels per canvas (resize limit) |
| `--max_dim` | `616` | Max dimension (width or height) before resize |
| `--block_size` | `2` | Block size for patch grouping (2×2 or 3×3) |
| `--no_resize` | `False` | Disable resize entirely |
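The defaults correspond to whole ViT patches: 153664 = 392², i.e. 28 × 28 patches of 14 px; 313600 (used in the legacy benchmark example) = 560² = 40 × 40 patches; and 616 = 44 × 14. The sketch below shows one way to honour `--max_dim`, `--max_pixels` and `--patch` together; the exact rounding the package uses is an assumption here, and `fit_to_patch_grid` is a hypothetical name.

```python
import math
from typing import Tuple


def fit_to_patch_grid(
    width: int, height: int, patch: int = 14,
    max_pixels: int = 153664, max_dim: int = 616,
) -> Tuple[int, int]:
    """Scale (width, height) down so it respects both limits, then snap each
    side down to a multiple of `patch` (never below one patch)."""
    scale = min(
        1.0,                                          # never upscale
        max_dim / max(width, height),                 # longest-side limit
        math.sqrt(max_pixels / (width * height)),     # area limit
    )
    w = max(patch, int(width * scale) // patch * patch)
    h = max(patch, int(height * scale) // patch * patch)
    return w, h
```

Snapping down (rather than rounding to nearest) keeps the result inside both limits.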
Bitcost Scoring
| Parameter | Default | Description |
|---|---|---|
| `--bitcost_grid` | `adaptive` | Bitcost granularity: sub, mb, ctu, adaptive |
| `--bitcost_pct` | `99.0` | Percentile for bitcost normalization |
| `--bitcost_log_scale` / `--no_bitcost_log_scale` | `True` | Apply log scale to bitcost scores |
| `--disable_target_only` | `False` | Disable decoder-internal target-frame-only bitcost pruning |
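A minimal sketch of percentile normalization with optional log scaling, assuming the package divides (log-)bitcost by the `--bitcost_pct` percentile and clips to [0, 1]; the README does not confirm the exact formula. The percentile uses linear interpolation, the same convention as NumPy's default. Both functions are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Linearly interpolated percentile, pct in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normalize_bitcost(
    costs: Sequence[float], pct: float = 99.0, log_scale: bool = True
) -> List[float]:
    """Optionally log1p-compress costs, divide by the pct-th percentile,
    and clip the result to at most 1.0."""
    vals = [math.log1p(c) if log_scale else float(c) for c in costs]
    ref = percentile(vals, pct)
    if ref <= 0:
        return [0.0 for _ in vals]  # flat or all-zero map
    return [min(v / ref, 1.0) for v in vals]
```

Using a high percentile instead of the maximum keeps a few extreme cells from compressing every other score toward zero.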
Decode Backend
| Parameter | Default | Description |
|---|---|---|
| `--decode_backend` | `ffmpeg_native` | Decoder backend: ffmpeg_native or cv_reader_pixels |
| `--parallel_segments` | `0` | Number of parallel decode segments (0 = serial) |
| `--threads_per_segment` | `4` | FFmpeg thread count per segment worker |
| `--segment_guard_frames` | `30` | Extra frames around segment boundaries for keyframe-seek safety |
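Parallel decoding pairs naturally with `read_video_fast_selected_segment`, whose `seek_frame` / `end_frame` arguments appear in the Python API section. The sketch below is a hypothetical planner, not the package's scheduler: it splits the sorted target frames into at most `parallel_segments` contiguous chunks and starts each chunk `guard_frames` early (padding only the start is an assumption).

```python
import math
from typing import Dict, Iterable, List


def plan_segments(
    frame_ids: Iterable[int], parallel_segments: int, guard_frames: int = 30
) -> List[Dict[str, object]]:
    """Split target frames into contiguous per-worker segments."""
    ids = sorted(set(frame_ids))
    if not ids:
        return []
    # parallel_segments=0 means serial decoding: a single segment.
    n = min(max(parallel_segments, 1), len(ids))
    size = math.ceil(len(ids) / n)
    segments = []
    for start in range(0, len(ids), size):
        chunk = ids[start:start + size]
        segments.append({
            "frame_ids": chunk,
            # Begin guard_frames early so the seek has slack before the first target.
            "seek_frame": max(0, chunk[0] - guard_frames),
            "end_frame": chunk[-1],  # last target frame in this chunk
        })
    return segments
```

Each dict's keys match the keyword arguments shown for `read_video_fast_selected_segment`, so, assuming that signature, a worker could call it with `path=..., thread_count=..., export_bitcost=1, **seg`.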
CLI Example: Reproduce legacy benchmark settings
for id in 001 002 003 004 005 006 007 008 009 010; do
codec-video-prep \
--video /data/videommev2/${id}.mp4 \
--out_dir ./output/${id} \
--num_sampled_frames 512 \
--group_size 32 \
--images_per_group 4 \
--patch 14 \
--max_pixels 313600 \
--min_group_frames 8 \
--max_group_frames 128 \
--bitcost_grid sub \
--grouping_mode readiness \
--frame_sampling_mode uniform_count \
--readiness_sum_threshold_mode auto \
--decode_backend cv_reader_pixels \
--no_avoid_keyframes \
--parallel_segments 32 \
--threads_per_segment 1 \
--disable_target_only
done
Output files
After running, the output directory contains:
| File | Description |
|---|---|
| `canvas_*.jpg` | Packed patch canvases (image format set by `--canvas_format`) |
| `meta.json` | Full metadata, config, timing breakdown, and group info |
| `frame_ids.npy` | Sampled frame indices |
| `src_patch_position.npy` | Source patch positions (group, patch, y1, x1, y2, x2) |
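A small sketch for regrouping the rows of `src_patch_position.npy` by group, assuming the file holds an (N, 6) integer array with columns in the documented order (group, patch, y1, x1, y2, x2). Whether the coordinates are in source pixels or patch units is not stated here; `boxes_by_group` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (y1, x1, y2, x2)


def boxes_by_group(rows: Iterable[Sequence[int]]) -> Dict[int, List[Tuple[int, Box]]]:
    """Map group id -> [(patch id, box), ...], sorted by patch id."""
    out: Dict[int, List[Tuple[int, Box]]] = defaultdict(list)
    for g, p, y1, x1, y2, x2 in rows:
        # int() also converts NumPy integer scalars to plain ints.
        out[int(g)].append((int(p), (int(y1), int(x1), int(y2), int(x2))))
    for g in out:
        out[g].sort()
    return dict(out)
```

With NumPy installed, `rows = np.load("./preinfer_out/src_patch_position.npy")` can be passed straight in, since iterating a 2-D array yields its rows.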
Python API
High-level one-shot call (run_preinfer)
from codec_video_prep import run_preinfer
result = run_preinfer(
video="/path/to/video.mp4",
out_dir="./preinfer_out",
num_sampled_frames=1024,
group_size=32,
images_per_group=4,
patch=14,
max_pixels=153664,
min_group_frames=8,
max_group_frames=64,
bitcost_grid="adaptive",
grouping_mode="readiness",
frame_sampling_mode="uniform_count",
sample_fps=4.0,
readiness_sum_threshold=0.0,
readiness_sum_threshold_mode="legacy",
readiness_norm_sum_threshold=2250000.0,
avoid_keyframes=True,
decode_backend="cv_reader_pixels", # or "ffmpeg_native"
parallel_segments=4,
threads_per_segment=4,
segment_guard_frames=30,
)
print(result.out_dir) # output directory path
print(result.meta_path) # path to meta.json
print(result.canvas_files) # list of canvas image paths
print(result.timings) # dict of timing breakdowns
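All keyword arguments mirror the CLI options of the same name (without the leading --); paired switches such as --avoid_keyframes / --no_avoid_keyframes become a single boolean.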
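Because the keyword arguments mirror the CLI, a `run_preinfer` call can be translated into a `codec-video-prep` command line, for example to launch it in a subprocess. This hypothetical helper assumes a one-to-one name mapping, that `avoid_keyframes` and `bitcost_log_scale` are the paired `--x` / `--no_x` switches listed in the parameter tables, and that the other boolean options are plain on/off switches.

```python
from typing import Any, List

# Options documented with a paired --<name> / --no_<name> form.
PAIRED_BOOL_FLAGS = {"avoid_keyframes", "bitcost_log_scale"}


def to_cli_args(**kwargs: Any) -> List[str]:
    """Translate run_preinfer-style keyword arguments into an argv list."""
    argv = ["codec-video-prep"]
    for name, value in kwargs.items():
        if value is None:
            continue  # leave the CLI default in place
        if isinstance(value, bool):
            if name in PAIRED_BOOL_FLAGS:
                argv.append(f"--{name}" if value else f"--no_{name}")
            elif value:
                # Assumed on/off switch, e.g. --save_mask_video, --disable_target_only.
                argv.append(f"--{name}")
        else:
            argv += [f"--{name}", str(value)]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)`.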
Using PreinferConfig directly
from codec_video_prep import run_preinfer_config, PreinferConfig
cfg = PreinferConfig(
video="/path/to/video.mp4",
out_dir="./preinfer_out",
num_sampled_frames=512,
group_size=32,
images_per_group=4,
patch=14,
max_pixels=313600,
decode_backend="cv_reader_pixels",
parallel_segments=32,
threads_per_segment=1,
)
result = run_preinfer_config(cfg)
Low-level fast decoder (cv_reader_fast)
from codec_video_prep import cv_reader_fast
# Decode ALL frames with bitcost export
frames = cv_reader_fast.read_video_fast(
path="/path/to/video.mp4",
thread_count=16,
export_bitcost=1,
thread_type="auto", # "auto" selects "slice" when export_bitcost=1
)
# Decode SELECTED frames only (bitcost + optional pixels)
selected = cv_reader_fast.read_video_fast_selected(
path="/path/to/video.mp4",
frame_ids=[0, 30, 60, 90],
thread_count=16,
export_bitcost=1,
export_pixels=1, # also return BGR pixels
out_w=224, # optional resize width
out_h=224, # optional resize height
thread_type="slice", # recommended for bitcost stability
)
# Segment seek + decode (used internally for parallel workers)
segment = cv_reader_fast.read_video_fast_selected_segment(
path="/path/to/video.mp4",
frame_ids=[30, 60, 90],
seek_frame=0, # seek target (decoder lands on nearest keyframe before this)
end_frame=120, # stop after this frame index
thread_count=4,
export_bitcost=1,
export_pixels=1,
out_w=224,
out_h=224,
)
Each returned frame dict contains:
| Key | Type | Description |
|---|---|---|
| `frame_idx` | `int` | Frame index |
| `pict_type` | `str` | 'I', 'P' or 'B' |
| `width` / `height` | `int` | Frame resolution |
| `codec_name` | `str` | Decoder name (h264, hevc, vp9, ...) |
| `bitcost` | `dict` | MB/CTU bitcost arrays (when export_bitcost=1) |
| `pixels` | `np.ndarray` | (H, W, 3) uint8 BGR array (when export_pixels=1) |
The bitcost dict has one or more of these keys depending on codec and grid:
| Key | Shape | Description |
|---|---|---|
| `mb_bit_cost` | (mb_h, mb_w) | Macroblock-level bitcost (H.264) |
| `ctu_bit_cost` | (ctu_h, ctu_w) | CTU-level bitcost (HEVC/VP9) |
| `sub_mb_bit_cost` | (sub_h, sub_w) | Sub-block bitcost (finer granularity) |
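When a downstream step needs just one grid, a helper can pick whichever documented key is present. This sketch prefers the finest grid (sub-block, then macroblock, then CTU); that order is an assumption and is not necessarily what `--bitcost_grid adaptive` does. `pick_bitcost` is hypothetical.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Documented bitcost keys, ordered finest grid first (ordering is an assumption).
GRID_PREFERENCE = ("sub_mb_bit_cost", "mb_bit_cost", "ctu_bit_cost")


def pick_bitcost(
    frame: Dict[str, Any], prefer: Sequence[str] = GRID_PREFERENCE
) -> Optional[Tuple[str, Any]]:
    """Return (key, array) for the first preferred grid present, else None."""
    bitcost = frame.get("bitcost") or {}
    for key in prefer:
        arr = bitcost.get(key)
        if arr is not None:  # explicit check; NumPy arrays have no plain truth value
            return key, arr
    return None
```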
Important: Threading mode for bitcost
When export_bitcost=1, always use thread_type="slice" (or "auto", which automatically selects "slice" for HEVC/H.264). Frame threading (thread_type="frame") can drop opaque_ref under the new bitcost_only patch, so some frames come back with an empty bitcost.
# Correct — stable bitcost
selected = cv_reader_fast.read_video_fast_selected(
path="video.mp4",
frame_ids=[0, 10, 20],
export_bitcost=1,
thread_type="slice",
)
# Risky — may lose bitcost on some frames
selected = cv_reader_fast.read_video_fast_selected(
path="video.mp4",
frame_ids=[0, 10, 20],
export_bitcost=1,
thread_type="frame",
)
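To spot frames affected by the issue above, one can scan the returned frame dicts for a missing or empty bitcost entry. A sketch, assuming the decoder returns an iterable of the frame dicts described earlier; `frames_missing_bitcost` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def _is_empty(arr: Any) -> bool:
    size = getattr(arr, "size", None)  # NumPy arrays expose .size
    return size == 0 if size is not None else len(arr) == 0


def frames_missing_bitcost(frames: Iterable[Dict[str, Any]]) -> List[int]:
    """Return frame_idx of every frame whose bitcost is absent or empty."""
    missing = []
    for frame in frames:
        bitcost = frame.get("bitcost")
        # Absent dict, empty dict, or a dict whose arrays are all empty.
        if not bitcost or all(_is_empty(v) for v in bitcost.values()):
            missing.append(frame.get("frame_idx", -1))
    return missing
```

If the list is non-empty, re-decoding those frame_ids with thread_type="slice" is the fix the note above suggests.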
Build a manylinux wheel
# Build cp310 first (compiles FFmpeg)
PY_TAG=cp310-cp310 bash scripts/build_manylinux_wheel.sh
# Build remaining versions reusing FFmpeg
REUSE_FFMPEG=1 PY_TAG=cp311-cp311 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp312-cp312 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp313-cp313 bash scripts/build_manylinux_wheel.sh
Output (one wheel per PY_TAG; for example, cp311):
wheelhouse/codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Diagnostics
codec-video-prep-doctor
Checks:
- `cv_reader_fast` C extension can be imported
- Bundled FFmpeg shared libraries are present
- Threading defaults (`slice` for bitcost, 16 threads)
Backward Compatibility
The old import path and CLI names are kept as aliases:
- `compressed_video_preinfer`
- `cv-preinfer`
- `cv-preinfer-doctor`
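Code that must run against both the new and the old package can fall back between import paths. A minimal sketch, assuming the old compressed_video_preinfer package exposes the same cv_reader_fast name:

```python
try:
    from codec_video_prep import cv_reader_fast
except ImportError:
    try:
        # Pre-rename package name, kept as an alias (assumed to export the same name).
        from compressed_video_preinfer import cv_reader_fast
    except ImportError:
        cv_reader_fast = None  # neither package is installed

HAVE_FAST_READER = cv_reader_fast is not None
```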
Requirements
- Python ≥ 3.10
- numpy >= 1.23, < 2.0
- opencv-python-headless < 4.12
- Pillow
- Patched FFmpeg shared libraries (bundled in the wheel, or built with scripts/build_pixel_ffmpeg.sh)