Codec-aware video preprocessing for training and inference

codec-video-prep (v0.2.4)

Codec-aware video preprocessing for training and inference. Extracts codec-level bitcost information from H.264 / HEVC / VP9 videos and turns it into patch canvases ready for downstream vision models.

What it does

  • Patched FFmpeg decoder – Instruments the H.264 / HEVC / VP9 decoders to export per-macroblock (H.264) or per-CTU (HEVC / VP9) bitcost maps during decoding.
  • Fast C++ extension (cv_reader_fast) – Decodes video with the loop filter / IDCT skipped for bitcost-only scans, and optionally returns bitcost data and BGR pixels as NumPy arrays.
  • Readiness grouping – Groups frames by compressibility (bitcost) so that hard-to-decode regions get more patches.
  • Top-K patch selection – Selects the most informative patch blocks (2×2 by default, see --block_size) from each group and packs them into JPG/PNG canvases.
  • One-command pipeline – From a raw video to a folder of canvases + metadata in a single call.
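The top-K selection step can be illustrated with a small sketch. The package's actual scoring is not described here, so this assumes a per-frame bitcost map given as a 2D list, scores each non-overlapping block × block tile by its summed bitcost, and keeps the k highest tiles. block_scores and top_k_blocks are hypothetical names, not part of codec_video_prep.

```python
from typing import List, Tuple

Grid = List[List[float]]


def block_scores(bitcost: Grid, block: int = 2) -> List[Tuple[float, int, int]]:
    """Sum bitcost over non-overlapping block x block tiles (hypothetical helper).

    Returns (score, row, col) tuples indexed on the tile grid.
    Partial tiles at the right/bottom edges are dropped.
    """
    h = len(bitcost)
    w = len(bitcost[0]) if bitcost else 0
    scores = []
    for r in range(h // block):
        for c in range(w // block):
            total = sum(
                bitcost[r * block + dy][c * block + dx]
                for dy in range(block)
                for dx in range(block)
            )
            scores.append((total, r, c))
    return scores


def top_k_blocks(bitcost: Grid, k: int, block: int = 2) -> List[Tuple[int, int]]:
    """Return (row, col) tile coordinates of the k highest-scoring tiles."""
    # Highest score first; ties broken by tile position for determinism.
    ranked = sorted(block_scores(bitcost, block), key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c) for _, r, c in ranked[:k]]
```

For example, on a 4×4 map whose bottom-right 2×2 tile has the largest bitcost, top_k_blocks(grid, 1) returns [(1, 1)].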

Install

From PyPI (recommended)

python -m pip install -i https://pypi.org/simple/ codec-video-prep==0.2.4

Verify the installation:

codec-video-prep-doctor

From wheel file

python -m pip install codec_video_prep-0.2.4-*.whl

Build from source

  1. Build the patched FFmpeg shared libraries:

    • Pixel-capable (recommended — supports both bitcost and BGR pixel export):
      bash scripts/build_pixel_ffmpeg.sh
      
    • Legacy skip-IDCT (faster bitcost-only scan, no pixel output):
      bash scripts/build_patched_ffmpeg.sh
      
  2. Build and install the Python package:

python -m pip install -e .

CLI Usage (codec-video-prep)

Quick start

codec-video-prep \
  --video /path/to/video.mp4 \
  --out_dir ./preinfer_out \
  --num_sampled_frames 1024 \
  --group_size 32 \
  --images_per_group 4 \
  --patch 14 \
  --max_pixels 153664

Full parameter list

Input / Output

| Parameter | Default | Description |
|---|---|---|
| --video | required | Path to input video file |
| --out_dir | required | Output directory for canvases and metadata |
| --canvas_format | jpg | Canvas image format: jpg or png |
| --save_mask_video | False | Save a side-by-side mask visualization video |

Frame Sampling

| Parameter | Default | Description |
|---|---|---|
| --frame_sampling_mode | uniform_count | How to sample frames: fps, uniform_count, pkt_size_peak, fps_plus_pkt_size_peak, or all_frames |
| --sample_fps | 4.0 | Target FPS when frame_sampling_mode=fps |
| --num_sampled_frames | 1024 | Exact number of frames to sample uniformly when frame_sampling_mode=uniform_count |
| --avoid_keyframes / --no_avoid_keyframes | True | Shift sampled frames away from keyframes to avoid decoder drift |
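How the CLI rounds sampled indices is not documented. As an illustration only, here is one plausible way to compute indices for the uniform_count and fps modes; uniform_count_ids and fps_ids are hypothetical helpers, and the bin-midpoint rule is an assumption.

```python
from typing import List


def uniform_count_ids(total_frames: int, n: int) -> List[int]:
    """Pick up to n distinct frame indices spread evenly over [0, total_frames).

    Each index is the midpoint of one of n equal-width bins (assumed rule).
    """
    if total_frames <= 0 or n <= 0:
        return []
    n = min(n, total_frames)  # cannot sample more frames than exist
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


def fps_ids(total_frames: int, src_fps: float, sample_fps: float) -> List[int]:
    """Pick frames at roughly sample_fps from a stream decoded at src_fps."""
    if src_fps <= 0 or sample_fps <= 0:
        raise ValueError("fps values must be positive")
    # Never step by less than one frame, so indices stay distinct.
    step = max(src_fps / sample_fps, 1.0)
    ids: List[int] = []
    t = 0.0
    while int(t) < total_frames:
        ids.append(int(t))
        t += step
    return ids
```

With these rules, sampling 4 frames from a 100-frame clip gives [12, 37, 62, 87], and a 30 fps clip sampled at 10 fps keeps every third frame.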

Grouping

| Parameter | Default | Description |
|---|---|---|
| --grouping_mode | readiness | Grouping strategy: readiness (dynamic) or fixed (fixed-size) |
| --group_size | 32 | Max frames per group (for fixed mode or the readiness window) |
| --images_per_group | 4 | Number of patch canvases to extract per group |
| --min_group_frames | 8 | Minimum frames per readiness group |
| --max_group_frames | 64 | Maximum frames per readiness group |
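For fixed grouping the arithmetic is simple enough to sketch. Assuming fixed mode splits the sampled frames into consecutive chunks of --group_size and every group yields --images_per_group canvases (an assumption; readiness mode sizes groups dynamically), the quick-start settings of 1024 frames, groups of 32 and 4 images per group would give 32 groups and 128 canvases. Both helpers below are hypothetical.

```python
from typing import List, Sequence


def fixed_groups(frame_ids: Sequence[int], group_size: int) -> List[List[int]]:
    """Split sampled frame ids into consecutive chunks of group_size.

    The final chunk is shorter when len(frame_ids) is not a multiple.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [list(frame_ids[i:i + group_size]) for i in range(0, len(frame_ids), group_size)]


def expected_canvases(num_frames: int, group_size: int, images_per_group: int) -> int:
    """Canvas count if every group, including a short last one, yields images_per_group."""
    n_groups = -(-num_frames // group_size)  # ceiling division
    return n_groups * images_per_group
```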

Readiness Threshold (when --grouping_mode readiness)

| Parameter | Default | Description |
|---|---|---|
| --readiness_sum_threshold_mode | legacy | Threshold mode: legacy, auto, fixed, or clamped_sqrt_bpppf |
| --readiness_sum_threshold | 0.0 | Fixed threshold (used by the legacy and fixed modes) |
| --readiness_norm_sum_threshold | 2250000.0 | Normalized threshold (used by the clamped_sqrt_bpppf mode) |
| --readiness_coverage_bins | 3 | Minimum number of temporal bins that selected patches must cover |
| --readiness_delta_ratio | 0.05 | Stop extending a group when the score gain drops below this ratio |
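The readiness parameters can be read as a greedy stopping rule. The sketch below is a hypothetical reconstruction, not the package's algorithm: it assumes one precomputed score per frame, closes a group once it holds at least min_frames frames and either the score sum reaches sum_threshold or the newest frame adds less than delta_ratio of the running sum, and always closes at max_frames. The legacy and auto threshold modes, and coverage bins, are not modeled.

```python
from typing import List, Sequence


def readiness_groups(
    frame_scores: Sequence[float],
    sum_threshold: float,
    min_frames: int = 8,
    max_frames: int = 64,
    delta_ratio: float = 0.05,
) -> List[List[int]]:
    """Greedy, hypothetical readiness grouping over per-frame scores.

    Returns groups as lists of positions into frame_scores.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for i, score in enumerate(frame_scores):
        current.append(i)
        prev_total = total
        total += score
        # Marginal gain is "small" relative to what the group already holds.
        gain_small = prev_total > 0 and score < delta_ratio * prev_total
        ready = len(current) >= min_frames and (total >= sum_threshold or gain_small)
        if ready or len(current) >= max_frames:
            groups.append(current)
            current, total = [], 0.0
    if current:  # trailing partial group
        groups.append(current)
    return groups
```

Under this rule a threshold of 0.0 would close every group at exactly min_frames, which is one reason the real modes presumably derive the threshold from the video.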

Resolution

| Parameter | Default | Description |
|---|---|---|
| --patch | 14 | Vision model patch size (e.g. 14 for ViT) |
| --max_pixels | 153664 | Max pixels per canvas (resize limit) |
| --max_dim | 616 | Max dimension (width or height) before resize |
| --block_size | 2 | Block size for patch grouping (2 for 2×2, 3 for 3×3) |
| --no_resize | False | Disable resize entirely |
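Several defaults are patch-aligned: 153664 is 392 × 392, a 28 × 28 grid of 14-pixel patches; 313600 (the benchmark setting) is 560 × 560, or 40 × 40 patches; and 616 is 44 × 14. The exact resize rule is not documented here, so this sketch assumes a simple policy: scale down (never up) with the aspect ratio preserved until both --max_pixels and --max_dim hold, then floor each side to a multiple of --patch. fit_to_patch_grid is a hypothetical helper.

```python
import math
from typing import Tuple


def fit_to_patch_grid(
    width: int,
    height: int,
    patch: int = 14,
    max_pixels: int = 153664,
    max_dim: int = 616,
) -> Tuple[int, int]:
    """Downscale keeping aspect ratio, then snap both sides to multiples of patch."""
    scale = min(
        1.0,  # never upscale
        math.sqrt(max_pixels / (width * height)),  # area cap
        max_dim / max(width, height),  # longest-side cap
    )
    # Flooring only shrinks the frame, so both caps still hold afterwards.
    w = max(patch, int(width * scale) // patch * patch)
    h = max(patch, int(height * scale) // patch * patch)
    return w, h
```

Under this policy a 640×480 frame becomes 448×336 (32 × 24 patches), while a 200×100 frame is only snapped down to 196×98.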

Bitcost Scoring

| Parameter | Default | Description |
|---|---|---|
| --bitcost_grid | adaptive | Bitcost granularity: sub, mb, ctu, or adaptive |
| --bitcost_pct | 99.0 | Percentile for bitcost normalization |
| --bitcost_log_scale / --no_bitcost_log_scale | True | Apply log scale to bitcost scores |
| --disable_target_only | False | Disable decoder-internal target-frame-only bitcost pruning |
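The scoring flags suggest a percentile-based normalization. A minimal sketch, assuming scores are optionally log-compressed with log1p, divided by the --bitcost_pct percentile and clipped to 1.0; the package's exact formula is not documented here. percentile uses linear interpolation between order statistics, and normalize_bitcost is a hypothetical helper.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile over the sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normalize_bitcost(
    values: Sequence[float], pct: float = 99.0, log_scale: bool = True
) -> List[float]:
    """Map raw bitcosts to [0, 1]: optional log1p, divide by the pct-th percentile, clip."""
    xs = [math.log1p(v) if log_scale else float(v) for v in values]
    ref = percentile(xs, pct)
    if ref <= 0:  # all-zero input: nothing to rank
        return [0.0 for _ in xs]
    return [min(x / ref, 1.0) for x in xs]
```

Normalizing by a high percentile rather than the maximum keeps a few outlier blocks from compressing every other score toward zero.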

Decode Backend

| Parameter | Default | Description |
|---|---|---|
| --decode_backend | ffmpeg_native | Decoder backend: ffmpeg_native or cv_reader_pixels |
| --parallel_segments | 0 | Number of parallel decode segments (0 = serial) |
| --threads_per_segment | 4 | FFmpeg thread count per segment worker |
| --segment_guard_frames | 30 | Extra frames around segment boundaries for keyframe-seek safety |
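Parallel decoding can be pictured as splitting the sampled frame ids into contiguous segments, one per worker. The planner below is a sketch under assumptions: segments are equal-sized by frame count, the seek point backs off by --segment_guard_frames, and parallel_segments=0 collapses to a single serial segment. plan_segments is hypothetical; its (seek_frame, end_frame, frame_ids) tuples only mirror the argument names of read_video_fast_selected_segment in the Python API section.

```python
from typing import List, Sequence, Tuple


def plan_segments(
    frame_ids: Sequence[int],
    parallel_segments: int,
    guard_frames: int = 30,
) -> List[Tuple[int, int, List[int]]]:
    """Split frame ids into up to parallel_segments contiguous chunks.

    Each chunk becomes (seek_frame, end_frame, ids): seek_frame backs off by
    guard_frames (not below 0) and end_frame is the last wanted id.
    """
    ids = sorted(set(frame_ids))
    if not ids:
        return []
    n = max(1, min(parallel_segments, len(ids)))  # 0 means one serial segment
    size = -(-len(ids) // n)  # ceiling division
    plan = []
    for start in range(0, len(ids), size):
        chunk = ids[start:start + size]
        plan.append((max(0, chunk[0] - guard_frames), chunk[-1], chunk))
    return plan
```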

CLI Example: Reproduce legacy benchmark settings

for id in 001 002 003 004 005 006 007 008 009 010; do
  codec-video-prep \
    --video /data/videommev2/${id}.mp4 \
    --out_dir ./output/${id} \
    --num_sampled_frames 512 \
    --group_size 32 \
    --images_per_group 4 \
    --patch 14 \
    --max_pixels 313600 \
    --min_group_frames 8 \
    --max_group_frames 128 \
    --bitcost_grid sub \
    --grouping_mode readiness \
    --frame_sampling_mode uniform_count \
    --readiness_sum_threshold_mode auto \
    --decode_backend cv_reader_pixels \
    --no_avoid_keyframes \
    --parallel_segments 32 \
    --threads_per_segment 1 \
    --disable_target_only
done

Output files

After running, the output directory contains:

| File | Description |
|---|---|
| canvas_*.jpg | Packed patch canvases (.png when --canvas_format png) |
| meta.json | Full metadata, config, timing breakdown, and group info |
| frame_ids.npy | Sampled frame indices |
| src_patch_position.npy | Source patch positions as (group, patch, y1, x1, y2, x2) |
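A small loader for the output directory. It assumes only the file names listed above; the structure of meta.json is not specified here, so it is returned as a plain dict. load_outputs is a hypothetical helper, and reading frame_ids.npy or src_patch_position.npy would additionally need numpy.load.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_outputs(out_dir: str) -> Tuple[Dict[str, Any], List[Path]]:
    """Return (meta dict, sorted canvas paths) from an output directory.

    Globs both .jpg and .png canvases because --canvas_format can be either.
    """
    root = Path(out_dir)
    meta = json.loads((root / "meta.json").read_text())
    canvases = sorted(list(root.glob("canvas_*.jpg")) + list(root.glob("canvas_*.png")))
    return meta, canvases
```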

Python API

High-level one-shot call (run_preinfer)

from codec_video_prep import run_preinfer

result = run_preinfer(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=1024,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=153664,
    min_group_frames=8,
    max_group_frames=64,
    bitcost_grid="adaptive",
    grouping_mode="readiness",
    frame_sampling_mode="uniform_count",
    sample_fps=4.0,
    readiness_sum_threshold=0.0,
    readiness_sum_threshold_mode="legacy",
    readiness_norm_sum_threshold=2250000.0,
    avoid_keyframes=True,
    decode_backend="cv_reader_pixels",   # or "ffmpeg_native"
    parallel_segments=4,
    threads_per_segment=4,
    segment_guard_frames=30,
)

print(result.out_dir)       # output directory path
print(result.meta_path)     # path to meta.json
print(result.canvas_files)  # list of canvas image paths
print(result.timings)       # dict of timing breakdowns

All parameters mirror the CLI arguments (same names, without the leading --).

Using PreinferConfig directly

from codec_video_prep import run_preinfer_config, PreinferConfig

cfg = PreinferConfig(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=512,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=313600,
    decode_backend="cv_reader_pixels",
    parallel_segments=32,
    threads_per_segment=1,
)

result = run_preinfer_config(cfg)

Low-level fast decoder (cv_reader_fast)

from codec_video_prep import cv_reader_fast

# Decode ALL frames with bitcost export
frames = cv_reader_fast.read_video_fast(
    path="/path/to/video.mp4",
    thread_count=16,
    export_bitcost=1,
    thread_type="auto",   # "auto" selects "slice" when export_bitcost=1
)

# Decode SELECTED frames only (bitcost + optional pixels)
selected = cv_reader_fast.read_video_fast_selected(
    path="/path/to/video.mp4",
    frame_ids=[0, 30, 60, 90],
    thread_count=16,
    export_bitcost=1,
    export_pixels=1,      # also return BGR pixels
    out_w=224,            # optional resize width
    out_h=224,            # optional resize height
    thread_type="slice",  # recommended for bitcost stability
)

# Segment seek + decode (used internally for parallel workers)
segment = cv_reader_fast.read_video_fast_selected_segment(
    path="/path/to/video.mp4",
    frame_ids=[30, 60, 90],
    seek_frame=0,         # seek target (decoder lands on nearest keyframe before this)
    end_frame=120,        # stop after this frame index
    thread_count=4,
    export_bitcost=1,
    export_pixels=1,
    out_w=224,
    out_h=224,
)

Each returned frame dict contains:

| Key | Type | Description |
|---|---|---|
| frame_idx | int | Frame index |
| pict_type | str | Picture type: 'I', 'P', or 'B' |
| width / height | int | Frame resolution |
| codec_name | str | Decoder name (h264, hevc, vp9, ...) |
| bitcost | dict | MB/CTU bitcost arrays (when export_bitcost=1) |
| pixels | np.ndarray | (H, W, 3) uint8 BGR array (when export_pixels=1) |

The bitcost dict has one or more of these keys depending on codec and grid:

| Key | Shape | Description |
|---|---|---|
| mb_bit_cost | (mb_h, mb_w) | Macroblock-level bitcost (H.264) |
| ctu_bit_cost | (ctu_h, ctu_w) | CTU-level bitcost (HEVC/VP9) |
| sub_mb_bit_cost | (sub_h, sub_w) | Sub-block bitcost (finer granularity) |
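Because the set of keys varies by codec and grid, downstream code has to pick whichever grid is present. A minimal sketch, assuming a finest-first preference (sub_mb_bit_cost, then mb_bit_cost, then ctu_bit_cost) that is a choice made here rather than documented package behavior; pick_bitcost is hypothetical. It tests `is not None` instead of truthiness, since calling bool() on a multi-element NumPy array raises an error.

```python
from typing import Any, Dict, Optional, Tuple

# Preference order is an assumption: finest grid first.
_GRID_KEYS = ("sub_mb_bit_cost", "mb_bit_cost", "ctu_bit_cost")


def pick_bitcost(frame: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return (key, array) for the finest bitcost grid present, or None.

    None covers both a missing 'bitcost' entry and an empty dict.
    """
    bitcost = frame.get("bitcost") or {}
    for key in _GRID_KEYS:
        grid = bitcost.get(key)
        if grid is not None:
            return key, grid
    return None
```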

Important: Threading mode for bitcost

When export_bitcost=1, always use thread_type="slice" (or "auto", which automatically selects "slice" for HEVC/H.264). Frame threading ("frame") can drop the opaque_ref under the new bitcost_only patch, causing some frames to return an empty bitcost.

# Correct — stable bitcost
selected = cv_reader_fast.read_video_fast_selected(
    path="video.mp4",
    frame_ids=[0, 10, 20],
    export_bitcost=1,
    thread_type="slice",
)

# Risky — may lose bitcost on some frames
selected = cv_reader_fast.read_video_fast_selected(
    path="video.mp4",
    frame_ids=[0, 10, 20],
    export_bitcost=1,
    thread_type="frame",
)
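To catch this failure mode, a returned frame list can be checked for frames whose bitcost came back empty. This hypothetical helper assumes the frame-dict layout from the table above, in which every frame carries frame_idx.

```python
from typing import Any, Dict, Iterable, List


def frames_missing_bitcost(frames: Iterable[Dict[str, Any]]) -> List[int]:
    """Return frame_idx values whose 'bitcost' entry is absent or an empty dict."""
    return [f["frame_idx"] for f in frames if not f.get("bitcost")]
```

A non-empty result after decoding with thread_type="frame" is the symptom described above; re-decoding with thread_type="slice" is the remedy this section recommends.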

Build a manylinux wheel

# Build cp310 first (compiles FFmpeg)
PY_TAG=cp310-cp310 bash scripts/build_manylinux_wheel.sh

# Build remaining versions reusing FFmpeg
REUSE_FFMPEG=1 PY_TAG=cp311-cp311 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp312-cp312 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp313-cp313 bash scripts/build_manylinux_wheel.sh

Output:

wheelhouse/codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl

Diagnostics

codec-video-prep-doctor

Checks:

  • cv_reader_fast C extension can be imported
  • Bundled FFmpeg shared libraries are present
  • Threading defaults (slice for bitcost, 16 threads)
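The first check can be approximated in plain Python. This sketch mirrors `from codec_video_prep import cv_reader_fast` and reports failure instead of raising; can_import_from is hypothetical and does not replicate the doctor's FFmpeg-library or threading checks. A bundled shared library that fails to load surfaces as an ImportError when the extension is imported, so it is caught by the same except clause.

```python
import importlib


def can_import_from(package: str, name: str) -> bool:
    """Return True if `from package import name` would succeed."""
    try:
        module = importlib.import_module(package)
        if hasattr(module, name):
            return True
        # Fall back to importing it as a submodule, as `from ... import` does.
        importlib.import_module(f"{package}.{name}")
    except ImportError:
        return False
    return True
```

Usage: `can_import_from("codec_video_prep", "cv_reader_fast")`.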

Backward Compatibility

The old import path and CLI names are kept as aliases:

  • compressed_video_preinfer
  • cv-preinfer
  • cv-preinfer-doctor

Requirements

  • Python ≥ 3.10
  • numpy >= 1.23, < 2.0
  • opencv-python-headless < 4.12
  • Pillow
  • Patched FFmpeg shared libraries (bundled in the wheel or built from scripts/build_pixel_ffmpeg.sh)


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

codec_video_prep-0.2.5-cp313-cp313-manylinux_2_35_aarch64.whl (22.8 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.35+ ARM64

codec_video_prep-0.2.5-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.17+ x86-64

codec_video_prep-0.2.5-cp312-cp312-manylinux_2_35_aarch64.whl (22.8 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.35+ ARM64

codec_video_prep-0.2.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.17+ x86-64

codec_video_prep-0.2.5-cp311-cp311-manylinux_2_35_aarch64.whl (22.8 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.35+ ARM64

codec_video_prep-0.2.5-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.17+ x86-64

codec_video_prep-0.2.5-cp310-cp310-manylinux_2_35_aarch64.whl (22.8 MB)

Uploaded: CPython 3.10, manylinux: glibc 2.35+ ARM64

codec_video_prep-0.2.5-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB)

Uploaded: CPython 3.10, manylinux: glibc 2.17+ x86-64

codec_video_prep-0.2.5-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.8 MB)

Uploaded: CPython 3.9, manylinux: glibc 2.17+ x86-64

File details

Details for the file codec_video_prep-0.2.5-cp313-cp313-manylinux_2_35_aarch64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp313-cp313-manylinux_2_35_aarch64.whl
Algorithm Hash digest
SHA256 fec907da252d692039f47c4559aed67862f493a982cfcae5d9d64cc023d4499f
MD5 93e3b108573059e60decac5aaa8c30fc
BLAKE2b-256 eba1055d3b0067c0b0f8f94f2633a8a34f34dc8d91f207a74b29572798f08e23


File details

Details for the file codec_video_prep-0.2.5-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 da54cea57483f309106b679436e23817dc1150e91c8c1a69d5fe76d31edb63d7
MD5 5dfc7f82c90ffa98c7e44e329fe6cc5e
BLAKE2b-256 0da717e4778a63cd885691a0f968ba7b4b9aaa27caf2d6d3c8dd4be3b2e0f077


File details

Details for the file codec_video_prep-0.2.5-cp312-cp312-manylinux_2_35_aarch64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp312-cp312-manylinux_2_35_aarch64.whl
Algorithm Hash digest
SHA256 8c992c577430ce9b43545df23acd29eb79e7eace860fef942496b3a37a25b6ef
MD5 29ef52349006f456ec288829fc858688
BLAKE2b-256 8f38081090c9ade816ba0ba1456afe09aba6bb9cf0809eeeb0716e0f1f3f300a


File details

Details for the file codec_video_prep-0.2.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 1fdf52a26a3499b915a3921926391ab78afe0bc703697eacf7da187c43bfbab6
MD5 78b1051106d3eda834cb315d69a5bbb6
BLAKE2b-256 b4422d51d621a61b604dd70a748e8acee51e432471d488f61dddded4f52ab255


File details

Details for the file codec_video_prep-0.2.5-cp311-cp311-manylinux_2_35_aarch64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp311-cp311-manylinux_2_35_aarch64.whl
Algorithm Hash digest
SHA256 af4a2aefb707a190f720d3b298d58acb5c1a531f9c9bc0ff9b9b01507811fb80
MD5 8268d8fac83083ee329b96d2ae604007
BLAKE2b-256 183377a31206054918270779776af81848bcd463cdc8422d38e4d57ba212a043


File details

Details for the file codec_video_prep-0.2.5-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 a1e43ef733a84ca52ef5e73c372cf3db3646b8a75ee27e2f83e23ae51a430d6b
MD5 b7906d11e558ceb87b7ce05ce3bb4ed3
BLAKE2b-256 8ed38bbf55f09c55274f498efbe1873559e1da4cdfce0bdd53226230e92e53b8


File details

Details for the file codec_video_prep-0.2.5-cp310-cp310-manylinux_2_35_aarch64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp310-cp310-manylinux_2_35_aarch64.whl
Algorithm Hash digest
SHA256 a7c32e51dbc6b0045d2d71db04550279db067730e3b5551ec86cbae591c713a9
MD5 18d4bc18d5b33a3512f5c8b87a942f1f
BLAKE2b-256 156b2903cc5b4d3de23d80b422f186f0b95c1fe90b93d1860128e17de20f4e0c


File details

Details for the file codec_video_prep-0.2.5-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 0826583f16efcf19523b69a7c102b87332cbaf6c78ecc6fab92c081ac54df666
MD5 6a5c74a1af2bebeb9b29f1ad332b6fe1
BLAKE2b-256 77c5edd9759ce8b24038ebc6f5a6a0979eb3ea1cd13259227d8dd0ee737430e4


File details

Details for the file codec_video_prep-0.2.5-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for codec_video_prep-0.2.5-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 cda839fe4cb481deb9d4bbaeda44195aef929f3741368e712db4bc3f2d7a4961
MD5 a26f79058c17346bbeaeca838ff7feab
BLAKE2b-256 82f7db9c442cccbcd53c0b2e2a1888d78c420ab51dbed2f72dd53dd52a8ee895
