Codec-aware video preprocessing for training and inference

codec-video-prep (v0.2.4)

Codec-aware video preprocessing for training and inference. Extracts codec-level bitcost information (the number of bits the encoder spent on each block) from H.264 / HEVC / VP9 videos and turns the videos into patch canvases ready for downstream vision models.

What it does

- Patched FFmpeg decoder: Instruments the H.264 / HEVC / VP9 decoders to export per-macroblock (H.264) or per-CTU (HEVC) bitcost maps during decoding.
- Fast C++ extension (`cv_reader_fast`): Decodes video with the loop filter and IDCT skipped, and optionally returns bitcost data as NumPy arrays.
- Readiness grouping: Groups frames by compressibility (bitcost) so that hard-to-compress regions get more patches.
- Top-K patch selection: Selects the most informative 2×2 patch blocks from each group and packs them into JPG/PNG canvases.
- One-command pipeline: Goes from a raw video to a folder of canvases plus metadata in a single call.
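
To make the top-K step concrete, here is a minimal NumPy sketch of picking the highest-scoring 2×2 patch blocks from a per-patch score map. The function name `top_k_blocks` and the scoring rule (sum of scores inside each block) are assumptions for illustration, not the package's actual implementation.

```python
import numpy as np

def top_k_blocks(score: np.ndarray, k: int, block: int = 2) -> np.ndarray:
    """Return (row, col) block coordinates of the k highest-scoring blocks.

    `score` is a per-patch score map of shape (H, W). Trailing rows or
    columns that do not fill a whole block are ignored.
    """
    h, w = score.shape
    bh, bw = h // block, w // block
    # Sum the scores inside each non-overlapping block x block tile.
    tiles = score[: bh * block, : bw * block].reshape(bh, block, bw, block)
    block_score = tiles.sum(axis=(1, 3))
    # A stable sort on the negated scores gives descending order with
    # deterministic (row-major) tie-breaking.
    order = np.argsort(-block_score.ravel(), kind="stable")[:k]
    return np.stack(np.unravel_index(order, block_score.shape), axis=1)

score = np.zeros((4, 6))
score[0:2, 2:4] = 5.0   # block (0, 1)
score[2:4, 4:6] = 3.0   # block (1, 2)
print(top_k_blocks(score, k=2).tolist())  # [[0, 1], [1, 2]]
```

Multiplying a block coordinate by `block * patch` would give the pixel offset of that block in the source frame, under the same assumptions.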

Install

From PyPI (recommended)

python -m pip install codec-video-prep==0.2.4

Verify the installation:

codec-video-prep-doctor

From wheel file

python -m pip install codec_video_prep-0.2.4-*.whl

Build from source

Build the patched FFmpeg shared libraries:
- Pixel-capable (recommended — supports both bitcost and BGR pixel export):
```
bash scripts/build_pixel_ffmpeg.sh
```
- Legacy skip-IDCT (faster bitcost-only scan, no pixel output):
```
bash scripts/build_patched_ffmpeg.sh
```
Build and install the Python package:

python -m pip install -e .

CLI Usage (`codec-video-prep`)

Quick start

codec-video-prep \
  --video /path/to/video.mp4 \
  --out_dir ./preinfer_out \
  --num_sampled_frames 1024 \
  --group_size 32 \
  --images_per_group 4 \
  --patch 14 \
  --max_pixels 153664

Full parameter list

Input / Output

| Parameter | Default | Description |
| --- | --- | --- |
| `--video` | required | Path to input video file |
| `--out_dir` | required | Output directory for canvases and metadata |
| `--canvas_format` | `jpg` | Canvas image format: `jpg` or `png` |
| `--save_mask_video` | `False` | Save a side-by-side mask visualization video |

Frame Sampling

| Parameter | Default | Description |
| --- | --- | --- |
| `--frame_sampling_mode` | `uniform_count` | How to sample frames: `fps`, `uniform_count`, `pkt_size_peak`, `fps_plus_pkt_size_peak`, `all_frames` |
| `--sample_fps` | `4.0` | Target FPS when `frame_sampling_mode=fps` |
| `--num_sampled_frames` | `1024` | Exact number of frames to uniformly sample when `frame_sampling_mode=uniform_count` |
| `--avoid_keyframes` / `--no_avoid_keyframes` | `True` | Shift sampled frames away from keyframes to avoid decoder drift |
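
As a rough illustration of the `uniform_count` and `fps` modes, the sketch below picks frame indices with NumPy. The helper names and the bin-center rule are assumptions; the package may choose indices differently, and the keyframe shifting controlled by `--avoid_keyframes` is not modeled here.

```python
import numpy as np

def sample_uniform_count(total_frames: int, num_sampled: int) -> np.ndarray:
    """Pick up to `num_sampled` evenly spaced frame indices."""
    n = min(num_sampled, total_frames)
    if n <= 0:
        return np.empty(0, dtype=np.int64)
    # Centers of n equal-width bins over [0, total_frames).
    idx = (np.arange(n) + 0.5) * total_frames / n
    return idx.astype(np.int64)

def sample_by_fps(total_frames: int, video_fps: float, target_fps: float) -> np.ndarray:
    """Pick one frame every video_fps / target_fps source frames."""
    step = max(video_fps / target_fps, 1.0)  # never sample faster than the source
    return np.arange(0, total_frames, step).astype(np.int64)

print(sample_uniform_count(100, 4).tolist())   # [12, 37, 62, 87]
print(len(sample_by_fps(120, 30.0, 4.0)))      # 16
```

With `uniform_count` the number of frames is fixed regardless of video length, while `fps` scales the frame count with duration.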

Grouping

| Parameter | Default | Description |
| --- | --- | --- |
| `--grouping_mode` | `readiness` | Grouping strategy: `readiness` (dynamic) or `fixed` (fixed-size) |
| `--group_size` | `32` | Max frames per group (for `fixed` mode or readiness window) |
| `--images_per_group` | `4` | Number of patch canvases to extract per group |
| `--min_group_frames` | `8` | Minimum frames per readiness group |
| `--max_group_frames` | `64` | Maximum frames per readiness group |

Readiness Threshold (when `--grouping_mode readiness`)

| Parameter | Default | Description |
| --- | --- | --- |
| `--readiness_sum_threshold_mode` | `legacy` | Threshold mode: `legacy`, `auto`, `fixed`, `clamped_sqrt_bpppf` |
| `--readiness_sum_threshold` | `0.0` | Fixed threshold (used by `legacy` and `fixed` modes) |
| `--readiness_norm_sum_threshold` | `2250000.0` | Normalized threshold (used by `clamped_sqrt_bpppf` mode) |
| `--readiness_coverage_bins` | `3` | Minimum temporal bins that selected patches must cover |
| `--readiness_delta_ratio` | `0.05` | Stop extending group when score gain drops below this ratio |
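
The sketch below shows one simplified way a sum-threshold grouping could work: a group keeps growing until its accumulated per-frame score reaches the threshold, bounded by the minimum and maximum group sizes. The function `group_by_readiness` is hypothetical, and the coverage-bin and delta-ratio criteria from the table are deliberately left out, so treat it as a mental model rather than the package's algorithm.

```python
from typing import List, Sequence, Tuple

def group_by_readiness(
    frame_scores: Sequence[float],
    sum_threshold: float,
    min_group_frames: int = 8,
    max_group_frames: int = 64,
) -> List[Tuple[int, int]]:
    """Split frames into consecutive [start, end) groups."""
    groups: List[Tuple[int, int]] = []
    start, acc = 0, 0.0
    for i, s in enumerate(frame_scores):
        acc += s
        size = i - start + 1
        # A group is "ready" once it is long enough and has enough score.
        ready = size >= min_group_frames and acc >= sum_threshold
        if ready or size >= max_group_frames:
            groups.append((start, i + 1))
            start, acc = i + 1, 0.0
    if start < len(frame_scores):
        groups.append((start, len(frame_scores)))  # trailing partial group
    return groups

busy = group_by_readiness([1.0] * 20, sum_threshold=5,
                          min_group_frames=4, max_group_frames=8)
calm = group_by_readiness([0.0] * 20, sum_threshold=5,
                          min_group_frames=4, max_group_frames=8)
print(busy)  # [(0, 5), (5, 10), (10, 15), (15, 20)]
print(calm)  # [(0, 8), (8, 16), (16, 20)]
```

High-bitcost content closes groups early and therefore receives more canvases per second of video, while static content falls back to the maximum group size.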

Resolution

| Parameter | Default | Description |
| --- | --- | --- |
| `--patch` | `14` | Vision model patch size (e.g. 14 for ViT) |
| `--max_pixels` | `153664` | Max pixels per canvas (resize limit) |
| `--max_dim` | `616` | Max dimension (width or height) before resize |
| `--block_size` | `2` | Block size for patch grouping (2×2 or 3×3) |
| `--no_resize` | `False` | Disable resize entirely |
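
The defaults line up with the patch size: 153664 = 392 × 392 = (28 × 14)², which is 784 patches of 14 × 14 pixels, and 616 = 44 × 14. The sketch below shows one plausible way to fit a frame into these limits while keeping both sides a multiple of `patch * block_size`. The function `fit_resolution` and its rounding policy are assumptions, not the package's resize code.

```python
import math

def fit_resolution(width, height, patch=14, block_size=2,
                   max_pixels=153664, max_dim=616):
    """Scale (width, height) down to respect both limits, snapped to blocks."""
    unit = patch * block_size  # sides snap down to whole patch blocks
    scale = min(
        1.0,                                      # never upscale
        math.sqrt(max_pixels / (width * height)), # pixel budget
        max_dim / max(width, height),             # longest-side limit
    )
    new_w = max(unit, int(width * scale) // unit * unit)
    new_h = max(unit, int(height * scale) // unit * unit)
    return new_w, new_h

w, h = fit_resolution(1920, 1080)
print((w, h))                    # (504, 280)
print((w // 14) * (h // 14))     # 720 patches
```

Rounding down rather than to the nearest multiple keeps the result within `max_pixels` at the cost of a slightly smaller canvas.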

Bitcost Scoring

| Parameter | Default | Description |
| --- | --- | --- |
| `--bitcost_grid` | `adaptive` | Bitcost granularity: `sub`, `mb`, `ctu`, `adaptive` |
| `--bitcost_pct` | `99.0` | Percentile for bitcost normalization |
| `--bitcost_log_scale` / `--no_bitcost_log_scale` | `True` | Apply log scale to bitcost scores |
| `--disable_target_only` | `False` | Disable decoder-internal target-frame-only bitcost pruning |
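
A percentile-based normalization with optional log scaling can be sketched as follows. The function `normalize_bitcost` and the order of operations (log first, then percentile) are assumptions chosen for illustration; the package's scoring may differ.

```python
import numpy as np

def normalize_bitcost(bitcost, pct: float = 99.0, log_scale: bool = True) -> np.ndarray:
    """Map raw per-block bit counts to scores in [0, 1]."""
    x = np.asarray(bitcost, dtype=np.float64)
    if log_scale:
        x = np.log1p(x)           # compress the heavy tail of bit counts
    ref = np.percentile(x, pct)   # robust reference instead of the maximum
    if ref <= 0:
        return np.zeros_like(x)   # an all-zero map stays all zero
    return np.clip(x / ref, 0.0, 1.0)

raw = np.array([[0, 10], [100, 100000]])
scores = normalize_bitcost(raw)
print(scores.min(), scores.max())  # 0.0 1.0
```

Using a high percentile instead of the maximum keeps a single very expensive block from flattening the scores of every other block.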

Decode Backend

| Parameter | Default | Description |
| --- | --- | --- |
| `--decode_backend` | `ffmpeg_native` | Decoder backend: `ffmpeg_native` or `cv_reader_pixels` |
| `--parallel_segments` | `0` | Number of parallel decode segments (0 = serial) |
| `--threads_per_segment` | `4` | FFmpeg thread count per segment worker |
| `--segment_guard_frames` | `30` | Extra frames around segment boundaries for keyframe-seek safety |
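
One way to picture the segment parameters is a planner that splits the sampled frame indices into contiguous chunks and starts each seek a few guard frames early. The helper `plan_segments` below is hypothetical, and how the package actually applies `--segment_guard_frames` is an assumption here.

```python
from typing import Iterable, List, Tuple

def plan_segments(
    frame_ids: Iterable[int],
    parallel_segments: int,
    guard_frames: int = 30,
) -> List[Tuple[int, int, List[int]]]:
    """Return (seek_frame, end_frame, ids) for each segment worker."""
    ids = sorted(frame_ids)
    if not ids:
        return []
    # 0 means serial decoding, which is a single segment.
    n = max(1, min(parallel_segments, len(ids)))
    plans = []
    for k in range(n):
        chunk = ids[k * len(ids) // n : (k + 1) * len(ids) // n]
        # Seek early so the decoder reaches the first wanted frame
        # with its reference frames already decoded.
        seek = max(0, chunk[0] - guard_frames)
        plans.append((seek, chunk[-1], chunk))
    return plans

for seek, end, chunk in plan_segments(range(0, 120, 10), parallel_segments=3):
    print(seek, end, chunk)
# 0 30 [0, 10, 20, 30]
# 10 70 [40, 50, 60, 70]
# 50 110 [80, 90, 100, 110]
```

Each tuple maps naturally onto the `seek_frame`, `end_frame`, and `frame_ids` arguments of a per-segment decode call, with `--threads_per_segment` threads inside each worker.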

CLI Example: Reproduce legacy benchmark settings

for id in 001 002 003 004 005 006 007 008 009 010; do
  codec-video-prep \
    --video /data/videommev2/${id}.mp4 \
    --out_dir ./output/${id} \
    --num_sampled_frames 512 \
    --group_size 32 \
    --images_per_group 4 \
    --patch 14 \
    --max_pixels 313600 \
    --min_group_frames 8 \
    --max_group_frames 128 \
    --bitcost_grid sub \
    --grouping_mode readiness \
    --frame_sampling_mode uniform_count \
    --readiness_sum_threshold_mode auto \
    --decode_backend cv_reader_pixels \
    --no_avoid_keyframes \
    --parallel_segments 32 \
    --threads_per_segment 1 \
    --disable_target_only
done

Output files

After running, the output directory contains:

| File | Description |
| --- | --- |
| `canvas_*.jpg` | Packed patch canvases |
| `meta.json` | Full metadata, config, timing breakdown, and group info |
| `frame_ids.npy` | Sampled frame indices |
| `src_patch_position.npy` | Source patch positions `(group, patch, y1, x1, y2, x2)` |
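
The files above can be read back with the standard library and NumPy. The sketch below assumes that `src_patch_position.npy` is a 2-D array with one row per patch and the six columns in the order listed in the table; the helper names are illustrative only.

```python
import json
from pathlib import Path

import numpy as np

def load_outputs(out_dir):
    """Load metadata, frame indices, patch positions, and canvas paths."""
    out = Path(out_dir)
    meta = json.loads((out / "meta.json").read_text())
    frame_ids = np.load(out / "frame_ids.npy")
    positions = np.load(out / "src_patch_position.npy")
    canvases = sorted(out.glob("canvas_*.*"))  # matches .jpg and .png
    return meta, frame_ids, positions, canvases

def patches_for_group(positions: np.ndarray, group: int) -> np.ndarray:
    """Return the (patch, y1, x1, y2, x2) rows that belong to one group."""
    return positions[positions[:, 0] == group][:, 1:]
```

For example, `patches_for_group(positions, 0)` would list where each patch on the first group's canvases came from in the source frames, under the column layout assumed above.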

Python API

High-level one-shot call (`run_preinfer`)

from codec_video_prep import run_preinfer

result = run_preinfer(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=1024,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=153664,
    min_group_frames=8,
    max_group_frames=64,
    bitcost_grid="adaptive",
    grouping_mode="readiness",
    frame_sampling_mode="uniform_count",
    sample_fps=4.0,
    readiness_sum_threshold=0.0,
    readiness_sum_threshold_mode="legacy",
    readiness_norm_sum_threshold=2250000.0,
    avoid_keyframes=True,
    decode_backend="cv_reader_pixels",   # or "ffmpeg_native"
    parallel_segments=4,
    threads_per_segment=4,
    segment_guard_frames=30,
)

print(result.out_dir)       # output directory path
print(result.meta_path)     # path to meta.json
print(result.canvas_files)  # list of canvas image paths
print(result.timings)       # dict of timing breakdowns

All parameters mirror the CLI arguments.

Using `PreinferConfig` directly

from codec_video_prep import run_preinfer_config, PreinferConfig

cfg = PreinferConfig(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=512,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=313600,
    decode_backend="cv_reader_pixels",
    parallel_segments=32,
    threads_per_segment=1,
)

result = run_preinfer_config(cfg)

Low-level fast decoder (`cv_reader_fast`)

from codec_video_prep import cv_reader_fast

# Decode ALL frames with bitcost export
frames = cv_reader_fast.read_video_fast(
    path="/path/to/video.mp4",
    thread_count=16,
    export_bitcost=1,
    thread_type="auto",   # "auto" selects "slice" when export_bitcost=1
)

# Decode SELECTED frames only (bitcost + optional pixels)
selected = cv_reader_fast.read_video_fast_selected(
    path="/path/to/video.mp4",
    frame_ids=[0, 30, 60, 90],
    thread_count=16,
    export_bitcost=1,
    export_pixels=1,      # also return BGR pixels
    out_w=224,            # optional resize width
    out_h=224,            # optional resize height
    thread_type="slice",  # recommended for bitcost stability
)

# Segment seek + decode (used internally for parallel workers)
segment = cv_reader_fast.read_video_fast_selected_segment(
    path="/path/to/video.mp4",
    frame_ids=[30, 60, 90],
    seek_frame=0,         # seek target (decoder lands on the nearest keyframe at or before this)
    end_frame=120,        # stop after this frame index
    thread_count=4,
    export_bitcost=1,
    export_pixels=1,
    out_w=224,
    out_h=224,
)

Each returned frame dict contains:

| Key | Type | Description |
| --- | --- | --- |
| `frame_idx` | `int` | Frame index |
| `pict_type` | `str` | `'I'`, `'P'` or `'B'` |
| `width` / `height` | `int` | Frame resolution |
| `codec_name` | `str` | Decoder name (`h264`, `hevc`, `vp9`, ...) |
| `bitcost` | `dict` | MB/CTU bitcost arrays (when `export_bitcost=1`) |
| `pixels` | `np.ndarray` | `(H, W, 3)` uint8 BGR array (when `export_pixels=1`) |

The bitcost dict has one or more of these keys depending on codec and grid:

| Key | Shape | Description |
| --- | --- | --- |
| `mb_bit_cost` | `(mb_h, mb_w)` | Macroblock-level bitcost (H.264) |
| `ctu_bit_cost` | `(ctu_h, ctu_w)` | CTU-level bitcost (HEVC/VP9) |
| `sub_mb_bit_cost` | `(sub_h, sub_w)` | Sub-block bitcost (finer granularity) |
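
Because the available key depends on codec and grid, downstream code usually has to pick whichever map is present. The sketch below works on a plain frame dict shaped like the tables above and upsamples the chosen map to pixel resolution with nearest-neighbour indexing. The key preference order and the proportional mapping from pixels to cells are assumptions.

```python
import numpy as np

# Finest granularity first; the order is an illustrative choice.
PREFERRED_KEYS = ("sub_mb_bit_cost", "mb_bit_cost", "ctu_bit_cost")

def bitcost_to_pixel_grid(frame: dict):
    """Return an (height, width) bitcost map, or None if no bitcost is present."""
    bitcost = frame.get("bitcost") or {}
    for key in PREFERRED_KEYS:
        if key in bitcost:
            grid = np.asarray(bitcost[key])
            break
    else:
        return None
    h, w = frame["height"], frame["width"]
    # Nearest-neighbour upsample: map each pixel row/column to a source cell.
    rows = np.arange(h) * grid.shape[0] // h
    cols = np.arange(w) * grid.shape[1] // w
    return grid[np.ix_(rows, cols)]

frame = {"frame_idx": 0, "width": 32, "height": 16,
         "bitcost": {"mb_bit_cost": np.array([[1, 2]])}}
heat = bitcost_to_pixel_grid(frame)
print(heat.shape, heat[0, 0], heat[0, 31])  # (16, 32) 1 2
```

The resulting array can be overlaid on the `pixels` array for a quick visual check of where the encoder spent its bits.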

Important: Threading mode for bitcost

When `export_bitcost=1`, always use `thread_type="slice"`, or `"auto"`, which selects `"slice"` for H.264 and HEVC. Frame threading (`"frame"`) can drop `opaque_ref` under the new `bitcost_only` patch, so some frames come back with empty bitcost.

# Correct — stable bitcost
selected = cv_reader_fast.read_video_fast_selected(
    path="video.mp4",
    frame_ids=[0, 10, 20],
    export_bitcost=1,
    thread_type="slice",
)

# Risky — may lose bitcost on some frames
selected = cv_reader_fast.read_video_fast_selected(
    path="video.mp4",
    frame_ids=[0, 10, 20],
    export_bitcost=1,
    thread_type="frame",
)
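
A cheap guard against this failure mode is to scan the returned frame dicts for missing bitcost before using them. The helper below is a sketch that assumes an affected frame has either no `bitcost` key, an empty dict, or only zero-size arrays; the exact shape of an "empty" result is not specified above.

```python
import numpy as np

def frames_missing_bitcost(frames):
    """Return the frame_idx of every frame without usable bitcost data."""
    missing = []
    for frame in frames:
        bitcost = frame.get("bitcost")
        # Treat None, {}, and dicts holding only empty arrays as missing.
        if not bitcost or all(np.size(v) == 0 for v in bitcost.values()):
            missing.append(frame["frame_idx"])
    return missing

frames = [
    {"frame_idx": 0, "bitcost": {"mb_bit_cost": np.ones((2, 2))}},
    {"frame_idx": 10, "bitcost": {}},
    {"frame_idx": 20, "bitcost": {"mb_bit_cost": np.empty((0, 0))}},
]
print(frames_missing_bitcost(frames))  # [10, 20]
```

If the list is non-empty, re-decoding those frame ids with `thread_type="slice"` is the obvious retry.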

Build a manylinux wheel

# Build cp310 first (compiles FFmpeg)
PY_TAG=cp310-cp310 bash scripts/build_manylinux_wheel.sh

# Build remaining versions reusing FFmpeg
REUSE_FFMPEG=1 PY_TAG=cp311-cp311 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp312-cp312 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp313-cp313 bash scripts/build_manylinux_wheel.sh

Each run writes one wheel into wheelhouse/, for example:

wheelhouse/codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl

Diagnostics

codec-video-prep-doctor

Checks:

- `cv_reader_fast` C extension can be imported
- Bundled FFmpeg shared libraries are present
- Threading defaults (slice for bitcost, 16 threads)

Backward Compatibility

The old import path and CLI names are kept as aliases:

- `compressed_video_preinfer` (import path)
- `cv-preinfer` (CLI)
- `cv-preinfer-doctor` (CLI)

Requirements

- Python >= 3.10
- numpy >= 1.23, < 2.0
- opencv-python-headless < 4.12
- Pillow
- Patched FFmpeg shared libraries (bundled in the wheel or built with `scripts/build_pixel_ffmpeg.sh`)

Project details

Release history

- 0.2.5: May 28, 2026
- 0.2.4: May 27, 2026 (this version)
- 0.2.3: May 22, 2026
- 0.2.2: May 21, 2026
- 0.2.1: May 21, 2026
- 0.2.0: May 21, 2026
- 0.1.1: May 19, 2026
- 0.1.0: May 19, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

- codec_video_prep-0.2.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB): uploaded May 27, 2026; CPython 3.13; manylinux: glibc 2.17+ x86-64
- codec_video_prep-0.2.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB): uploaded May 27, 2026; CPython 3.12; manylinux: glibc 2.17+ x86-64
- codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB): uploaded May 27, 2026; CPython 3.11; manylinux: glibc 2.17+ x86-64
- codec_video_prep-0.2.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB): uploaded May 27, 2026; CPython 3.10; manylinux: glibc 2.17+ x86-64
- codec_video_prep-0.2.4-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB): uploaded May 27, 2026; CPython 3.9; manylinux: glibc 2.17+ x86-64

File details

Details for the file codec_video_prep-0.2.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

Download URL: codec_video_prep-0.2.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Upload date: May 27, 2026
Size: 29.8 MB
Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for codec_video_prep-0.2.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`e6cef9ed0e3126ecfe7785ee899547c35a2040a733f05edf5043692ab5e619f2`
MD5	`2495b4d20cd5197352a073caa0a6224d`
BLAKE2b-256	`7301409060108333dd84b2e2e788d11710a4cd396e8e6c3dd22eb3ae4316279a`

File details

Details for the file codec_video_prep-0.2.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

Download URL: codec_video_prep-0.2.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Upload date: May 27, 2026
Size: 29.8 MB
Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for codec_video_prep-0.2.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`a11d5f3a1e4c4d6de9768ebbe461bae2391c1e11ba50e48b9836cb2896087077`
MD5	`e5a22ea52bd0f7254458d8ac6fe344fc`
BLAKE2b-256	`a5f85cd17da9bef737adeed0d4cc8432ae3f10f7b266da6842cf99db8f38baf4`

File details

Details for the file codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

Download URL: codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Upload date: May 27, 2026
Size: 29.8 MB
Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`7c917942c51d1b7cdca95ed33fec4ab90384125c846a0917401db1ec05f85ce1`
MD5	`869bb77c1e46a3f10772f877e7cdbce0`
BLAKE2b-256	`dbba25669768aeed9cb3026165d5168dd61b970a192781d0868b2e70a26cd1bf`

File details

Details for the file codec_video_prep-0.2.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

Download URL: codec_video_prep-0.2.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Upload date: May 27, 2026
Size: 29.8 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for codec_video_prep-0.2.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`5f0fc8295b08eb621bf119ffa5068b9fa0d827a67e43e99053188a5394fbee9d`
MD5	`1e066a2d573c52e8248c88ec10fbf909`
BLAKE2b-256	`63f77de5bc73bfee2f50f88e92b07e39c1ea672b1eae70d032cecfd096affe67`

File details

Details for the file codec_video_prep-0.2.4-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

Download URL: codec_video_prep-0.2.4-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Upload date: May 27, 2026
Size: 29.8 MB
Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for codec_video_prep-0.2.4-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`40e508089f2167745b39be153c6b2f21c4aea1b318bffbe38fc16c9130d436c6`
MD5	`65c82f6afaca9049d12bf7b68589fb3f`
BLAKE2b-256	`942193d80dbc3cf554896f89b0ff5163068582547d445fb5323b62dbbffb89b4`
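
A downloaded wheel can be checked against the SHA256 digests listed above with the standard library. The helper names below are illustrative; the file is read in chunks so that a wheel of roughly 30 MB does not need to fit in memory at once.

```python
import hashlib

def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_wheel(path, expected_sha256: str) -> bool:
    """Compare a file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Calling `verify_wheel` with a local wheel path and the matching SHA256 value from the listing returns `True` only if the file is byte-for-byte identical to the uploaded one.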