
Codec-aware video preprocessing for training and inference


codec-video-prep (v0.2.4)

Codec-aware video preprocessing for training and inference. It extracts codec-level bitcost information from H.264 / HEVC / VP9 videos and packs the most informative patches into canvases ready for downstream vision models.

What it does

  • Patched FFmpeg decoder – Instruments the H.264 / HEVC / VP9 decoders to export per-macroblock (H.264) or per-CTU (HEVC / VP9) bitcost maps during decoding.
  • Fast C++ extension (cv_reader_fast) – Decodes video with loop-filter / IDCT skipped and optionally returns bitcost data as NumPy arrays.
  • Readiness grouping – Groups frames by compressibility (bitcost) so that hard-to-compress, high-bitcost regions get more patches.
  • Top-K patch selection – Selects the most informative 2×2 patch blocks from each group and packs them into JPG/PNG canvases.
  • One-command pipeline – From a raw video to a folder of canvases + metadata in a single call.
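
The Top-K patch selection step can be pictured with a small NumPy sketch. This is an illustration only, not the package's implementation: the function name top_k_blocks and the choice to score a block by summing its bitcost are assumptions.

```python
import numpy as np

def top_k_blocks(bitcost, k, block=2):
    """Return (row, col) indices of the k highest-scoring block x block cells.

    Hypothetical helper: scoring a block by its summed bitcost is an assumption.
    """
    h, w = bitcost.shape
    bh, bw = h // block, w // block
    # Crop to a multiple of the block size, then sum each block x block tile.
    tiles = bitcost[: bh * block, : bw * block].reshape(bh, block, bw, block)
    scores = tiles.sum(axis=(1, 3))
    # Flattened indices of the k largest block scores, highest first.
    order = np.argsort(scores, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, scores.shape)
    return list(zip(rows.tolist(), cols.tolist()))

cost = np.zeros((4, 6))
cost[0, 4] = 9.0   # falls in block (0, 2)
cost[3, 1] = 5.0   # falls in block (1, 0)
picked = top_k_blocks(cost, k=2)
print(picked)
```

With block=3 the same function covers the 3×3 setting of --block_size.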

Install

From PyPI (recommended)

python -m pip install codec-video-prep==0.2.4

Verify the installation:

codec-video-prep-doctor

From wheel file

python -m pip install codec_video_prep-0.2.4-*.whl

Build from source

  1. Build the patched FFmpeg shared libraries:

    • Pixel-capable (recommended; supports both bitcost and BGR pixel export):
      bash scripts/build_pixel_ffmpeg.sh
      
    • Legacy skip-IDCT (faster bitcost-only scan, no pixel output):
      bash scripts/build_patched_ffmpeg.sh
      
  2. Build and install the Python package:

python -m pip install -e .

CLI Usage (codec-video-prep)

Quick start

codec-video-prep \
  --video /path/to/video.mp4 \
  --out_dir ./preinfer_out \
  --num_sampled_frames 1024 \
  --group_size 32 \
  --images_per_group 4 \
  --patch 14 \
  --max_pixels 153664

Full parameter list

Input / Output

| Parameter | Default | Description |
| --- | --- | --- |
| --video | required | Path to input video file |
| --out_dir | required | Output directory for canvases and metadata |
| --canvas_format | jpg | Canvas image format: jpg or png |
| --save_mask_video | False | Save a side-by-side mask visualization video |

Frame Sampling

| Parameter | Default | Description |
| --- | --- | --- |
| --frame_sampling_mode | uniform_count | How to sample frames: fps, uniform_count, pkt_size_peak, fps_plus_pkt_size_peak, all_frames |
| --sample_fps | 4.0 | Target FPS when frame_sampling_mode=fps |
| --num_sampled_frames | 1024 | Exact number of frames to uniformly sample when frame_sampling_mode=uniform_count |
| --avoid_keyframes / --no_avoid_keyframes | True | Shift sampled frames away from keyframes to avoid decoder drift |
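
As a rough picture of what uniform_count sampling means, the sketch below spreads a fixed number of indices evenly over the video. It is a minimal illustration, assuming evenly spaced indices rounded to the nearest frame; the helper name uniform_frame_ids is hypothetical, and the keyframe-avoidance shift is not modelled.

```python
import numpy as np

def uniform_frame_ids(total_frames, num_sampled_frames):
    """Evenly spaced frame indices over [0, total_frames - 1] (illustrative)."""
    # Never ask for more frames than the video contains.
    n = min(num_sampled_frames, total_frames)
    ids = np.linspace(0, total_frames - 1, num=n)
    return np.round(ids).astype(np.int64)

ids = uniform_frame_ids(total_frames=100, num_sampled_frames=5)
print(ids)
```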

Grouping

| Parameter | Default | Description |
| --- | --- | --- |
| --grouping_mode | readiness | Grouping strategy: readiness (dynamic) or fixed (fixed-size) |
| --group_size | 32 | Max frames per group (for fixed mode or readiness window) |
| --images_per_group | 4 | Number of patch canvases to extract per group |
| --min_group_frames | 8 | Minimum frames per readiness group |
| --max_group_frames | 64 | Maximum frames per readiness group |

Readiness Threshold (when --grouping_mode readiness)

| Parameter | Default | Description |
| --- | --- | --- |
| --readiness_sum_threshold_mode | legacy | Threshold mode: legacy, auto, fixed, clamped_sqrt_bpppf |
| --readiness_sum_threshold | 0.0 | Fixed threshold (used by legacy and fixed modes) |
| --readiness_norm_sum_threshold | 2250000.0 | Normalized threshold (used by clamped_sqrt_bpppf mode) |
| --readiness_coverage_bins | 3 | Minimum temporal bins that selected patches must cover |
| --readiness_delta_ratio | 0.05 | Stop extending group when score gain drops below this ratio |
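
To give an intuition for dynamic grouping, here is a deliberately simplified sketch: a group closes once it has at least min_frames frames and its accumulated score reaches a threshold, or once it hits max_frames. This is an assumption-laden toy, not the package's algorithm. The function group_by_readiness is hypothetical, and the coverage-bin check, the delta-ratio stop rule, and the threshold modes are not modelled.

```python
def group_by_readiness(frame_scores, sum_threshold, min_frames=8, max_frames=64):
    """Greedily split frames into consecutive groups (illustrative only)."""
    groups, start, running = [], 0, 0.0
    for i, score in enumerate(frame_scores):
        running += score
        size = i - start + 1
        # A group is "ready" once it is long enough and has enough total score.
        ready = size >= min_frames and running >= sum_threshold
        if ready or size >= max_frames:
            groups.append((start, i + 1))   # half-open range [start, end)
            start, running = i + 1, 0.0
    if start < len(frame_scores):
        groups.append((start, len(frame_scores)))  # trailing partial group
    return groups

# Ten low-score frames followed by ten high-score frames.
scores = [1.0] * 10 + [10.0] * 10
groups = group_by_readiness(scores, sum_threshold=20.0, min_frames=4, max_frames=8)
print(groups)
```

In this toy run the low-score stretch is closed by the max_frames cap, while the high-score stretch is split into shorter groups as soon as the threshold is met.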

Resolution

| Parameter | Default | Description |
| --- | --- | --- |
| --patch | 14 | Vision model patch size (e.g. 14 for ViT) |
| --max_pixels | 153664 | Max pixels per canvas (resize limit) |
| --max_dim | 616 | Max dimension (width or height) before resize |
| --block_size | 2 | Block size for patch grouping (2×2 or 3×3) |
| --no_resize | False | Disable resize entirely |
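
The resolution defaults line up with the patch size, which a little arithmetic makes visible. The snippet only divides the documented numbers; how the tool actually applies these limits during resize is not shown here, and patch_budget is a hypothetical helper.

```python
def patch_budget(max_pixels, patch):
    """Number of patch x patch tiles that fit within max_pixels."""
    return max_pixels // (patch * patch)

default_budget = patch_budget(153664, 14)   # 784 patches, e.g. a 28 x 28 grid
legacy_budget = patch_budget(313600, 14)    # 1600 patches, e.g. a 40 x 40 grid
max_side_patches = 616 // 14                # 44 patches along the longest side
print(default_budget, legacy_budget, max_side_patches)
```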

Bitcost Scoring

| Parameter | Default | Description |
| --- | --- | --- |
| --bitcost_grid | adaptive | Bitcost granularity: sub, mb, ctu, adaptive |
| --bitcost_pct | 99.0 | Percentile for bitcost normalization |
| --bitcost_log_scale / --no_bitcost_log_scale | True | Apply log scale to bitcost scores |
| --disable_target_only | False | Disable decoder-internal target-frame-only bitcost pruning |
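
One plausible reading of percentile normalization with log scaling is sketched below. It is an assumption, not the package's code: the helper normalize_bitcost is hypothetical, and both the use of log1p and the order of operations (log first, then percentile clipping) are guesses.

```python
import numpy as np

def normalize_bitcost(bitcost, pct=99.0, log_scale=True):
    """Map raw bitcost to [0, 1] scores (illustrative only)."""
    x = np.asarray(bitcost, dtype=np.float64)
    if log_scale:
        x = np.log1p(x)            # compress the heavy tail of large bitcosts
    cap = np.percentile(x, pct)    # values above this percentile saturate at 1
    if cap <= 0:
        return np.zeros_like(x)
    return np.clip(x / cap, 0.0, 1.0)

raw = np.array([[0, 10, 20], [40, 80, 100000]])
scores = normalize_bitcost(raw)
print(scores)
```

Clipping at a high percentile keeps a single outlier block from flattening every other score toward zero.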

Decode Backend

| Parameter | Default | Description |
| --- | --- | --- |
| --decode_backend | ffmpeg_native | Decoder backend: ffmpeg_native or cv_reader_pixels |
| --parallel_segments | 0 | Number of parallel decode segments (0 = serial) |
| --threads_per_segment | 4 | FFmpeg thread count per segment worker |
| --segment_guard_frames | 30 | Extra frames around segment boundaries for keyframe-seek safety |
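
The interaction between parallel segments and guard frames can be sketched as a planning step that splits the sampled frames into chunks and backs each seek point off by the guard. This is a hypothetical illustration: plan_segments is not part of the package, and the real splitting rule may differ. The seek_frame and end_frame names are borrowed from read_video_fast_selected_segment.

```python
def plan_segments(frame_ids, parallel_segments, guard_frames=30):
    """Split sampled frames into per-worker plans (illustrative only)."""
    ids = sorted(frame_ids)
    if not ids:
        return []
    # parallel_segments=0 means serial decoding, i.e. a single segment.
    n = max(1, min(parallel_segments, len(ids)))
    size = -(-len(ids) // n)  # ceiling division
    plans = []
    for s in range(0, len(ids), size):
        chunk = ids[s : s + size]
        plans.append({
            "frame_ids": chunk,
            # Seek early so the decoder can land on a keyframe before the chunk.
            "seek_frame": max(0, chunk[0] - guard_frames),
            "end_frame": chunk[-1],
        })
    return plans

plans = plan_segments([0, 100, 200, 300, 400], parallel_segments=2, guard_frames=30)
print(plans)
```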

CLI Example: Reproduce legacy benchmark settings

for id in 001 002 003 004 005 006 007 008 009 010; do
  codec-video-prep \
    --video /data/videommev2/${id}.mp4 \
    --out_dir ./output/${id} \
    --num_sampled_frames 512 \
    --group_size 32 \
    --images_per_group 4 \
    --patch 14 \
    --max_pixels 313600 \
    --min_group_frames 8 \
    --max_group_frames 128 \
    --bitcost_grid sub \
    --grouping_mode readiness \
    --frame_sampling_mode uniform_count \
    --readiness_sum_threshold_mode auto \
    --decode_backend cv_reader_pixels \
    --no_avoid_keyframes \
    --parallel_segments 32 \
    --threads_per_segment 1 \
    --disable_target_only
done
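
When the batch is driven from Python rather than a shell loop, the argument list can be assembled programmatically. The helper below is a hypothetical convenience, not part of the package; it only uses flags listed above and assumes that boolean options are passed as bare flags, as --no_avoid_keyframes and --disable_target_only are in the loop.

```python
import shlex

def build_command(video, out_dir, **options):
    """Build a codec-video-prep argv list from keyword options (illustrative)."""
    cmd = ["codec-video-prep", "--video", str(video), "--out_dir", str(out_dir)]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            cmd.append(flag)                 # bare switch
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])   # flag with a value
    return cmd

cmd = build_command(
    "/data/videommev2/001.mp4",
    "./output/001",
    num_sampled_frames=512,
    no_avoid_keyframes=True,
    disable_target_only=True,
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the package is installed.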

Output files

After running, the output directory contains:

| File | Description |
| --- | --- |
| canvas_*.jpg | Packed patch canvases |
| meta.json | Full metadata, config, timing breakdown, and group info |
| frame_ids.npy | Sampled frame indices |
| src_patch_position.npy | Source patch positions (group, patch, y1, x1, y2, x2) |
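
A downstream script might load these files as follows. This is a minimal sketch based only on the file names listed above; load_preinfer_outputs is a hypothetical helper, and looking for canvas_*.png as well is an assumption for runs that use --canvas_format png.

```python
import json
from pathlib import Path

import numpy as np

def load_preinfer_outputs(out_dir):
    """Load metadata, arrays, and canvas paths from an output directory."""
    out = Path(out_dir)
    with open(out / "meta.json", "r", encoding="utf-8") as f:
        meta = json.load(f)
    frame_ids = np.load(out / "frame_ids.npy")
    positions = np.load(out / "src_patch_position.npy")
    # Canvas paths in name order; the .png pattern is an assumption.
    canvases = sorted(out.glob("canvas_*.jpg")) + sorted(out.glob("canvas_*.png"))
    return meta, frame_ids, positions, canvases
```

Typical use would be meta, frame_ids, positions, canvases = load_preinfer_outputs("./preinfer_out").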

Python API

High-level one-shot call (run_preinfer)

from codec_video_prep import run_preinfer

result = run_preinfer(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=1024,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=153664,
    min_group_frames=8,
    max_group_frames=64,
    bitcost_grid="adaptive",
    grouping_mode="readiness",
    frame_sampling_mode="uniform_count",
    sample_fps=4.0,
    readiness_sum_threshold=0.0,
    readiness_sum_threshold_mode="legacy",
    readiness_norm_sum_threshold=2250000.0,
    avoid_keyframes=True,
    decode_backend="cv_reader_pixels",   # or "ffmpeg_native"
    parallel_segments=4,
    threads_per_segment=4,
    segment_guard_frames=30,
)

print(result.out_dir)       # output directory path
print(result.meta_path)     # path to meta.json
print(result.canvas_files)  # list of canvas image paths
print(result.timings)       # dict of timing breakdowns

All parameters mirror the CLI arguments.

Using PreinferConfig directly

from codec_video_prep import run_preinfer_config, PreinferConfig

cfg = PreinferConfig(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=512,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=313600,
    decode_backend="cv_reader_pixels",
    parallel_segments=32,
    threads_per_segment=1,
)

result = run_preinfer_config(cfg)

Low-level fast decoder (cv_reader_fast)

from codec_video_prep import cv_reader_fast

# Decode ALL frames with bitcost export
frames = cv_reader_fast.read_video_fast(
    path="/path/to/video.mp4",
    thread_count=16,
    export_bitcost=1,
    thread_type="auto",   # "auto" selects "slice" when export_bitcost=1
)

# Decode SELECTED frames only (bitcost + optional pixels)
selected = cv_reader_fast.read_video_fast_selected(
    path="/path/to/video.mp4",
    frame_ids=[0, 30, 60, 90],
    thread_count=16,
    export_bitcost=1,
    export_pixels=1,      # also return BGR pixels
    out_w=224,            # optional resize width
    out_h=224,            # optional resize height
    thread_type="slice",  # recommended for bitcost stability
)

# Segment seek + decode (used internally for parallel workers)
segment = cv_reader_fast.read_video_fast_selected_segment(
    path="/path/to/video.mp4",
    frame_ids=[30, 60, 90],
    seek_frame=0,         # seek target (decoder lands on nearest keyframe before this)
    end_frame=120,        # stop after this frame index
    thread_count=4,
    export_bitcost=1,
    export_pixels=1,
    out_w=224,
    out_h=224,
)

Each returned frame dict contains:

| Key | Type | Description |
| --- | --- | --- |
| frame_idx | int | Frame index |
| pict_type | str | 'I', 'P' or 'B' |
| width / height | int | Frame resolution |
| codec_name | str | Decoder name (h264, hevc, vp9, ...) |
| bitcost | dict | MB/CTU bitcost arrays (when export_bitcost=1) |
| pixels | np.ndarray | (H, W, 3) uint8 BGR array (when export_pixels=1) |

The bitcost dict has one or more of these keys depending on codec and grid:

| Key | Shape | Description |
| --- | --- | --- |
| mb_bit_cost | (mb_h, mb_w) | Macroblock-level bitcost (H.264) |
| ctu_bit_cost | (ctu_h, ctu_w) | CTU-level bitcost (HEVC/VP9) |
| sub_mb_bit_cost | (sub_h, sub_w) | Sub-block bitcost (finer granularity) |
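
Because the available keys vary by codec and grid, calling code usually needs a small lookup. The sketch below picks the first non-empty array from a preference order. Both the helper finest_bitcost and the ordering in GRID_PREFERENCE are assumptions, and the frame dict here is a hand-built stand-in rather than real decoder output.

```python
import numpy as np

# Assumed preference: finest grid first, then codec-specific coarse grids.
GRID_PREFERENCE = ("sub_mb_bit_cost", "mb_bit_cost", "ctu_bit_cost")

def finest_bitcost(frame):
    """Return (key, array) for the first available bitcost grid, else (None, None)."""
    bitcost = frame.get("bitcost") or {}
    for key in GRID_PREFERENCE:
        grid = bitcost.get(key)
        if grid is not None and np.size(grid) > 0:
            return key, np.asarray(grid)
    return None, None

# Stand-in for a decoded HEVC frame dict.
fake_frame = {"frame_idx": 0, "pict_type": "I", "bitcost": {"ctu_bit_cost": np.ones((2, 3))}}
key, grid = finest_bitcost(fake_frame)
print(key, grid.shape)
```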

Important: Threading mode for bitcost

When export_bitcost=1, always use thread_type="slice" (or "auto", which selects "slice" automatically for HEVC/H.264). Frame threading ("frame") can drop opaque_ref under the new bitcost_only patch, so some frames may come back with empty bitcost.

# Correct — stable bitcost
selected = cv_reader_fast.read_video_fast_selected(
    path="video.mp4",
    frame_ids=[0, 10, 20],
    export_bitcost=1,
    thread_type="slice",
)

# Risky — may lose bitcost on some frames
selected = cv_reader_fast.read_video_fast_selected(
    path="video.mp4",
    frame_ids=[0, 10, 20],
    export_bitcost=1,
    thread_type="frame",
)
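
A quick sanity check after decoding can flag frames whose bitcost went missing. This is a sketch under an assumption: the exact shape of an "empty" bitcost at runtime is not documented here, so the hypothetical helper frames_missing_bitcost treats a missing key, an empty dict, or all-empty arrays as missing.

```python
import numpy as np

def frames_missing_bitcost(frames):
    """Return frame_idx values whose bitcost is absent or empty (illustrative)."""
    missing = []
    for frame in frames:
        bitcost = frame.get("bitcost")
        if not bitcost or all(np.size(v) == 0 for v in bitcost.values()):
            missing.append(frame["frame_idx"])
    return missing

# Hand-built stand-ins for decoder output.
sample = [
    {"frame_idx": 0, "bitcost": {"mb_bit_cost": np.ones((2, 2))}},
    {"frame_idx": 10, "bitcost": {}},
    {"frame_idx": 20},
]
print(frames_missing_bitcost(sample))
```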

Build a manylinux wheel

# Build cp310 first (compiles FFmpeg)
PY_TAG=cp310-cp310 bash scripts/build_manylinux_wheel.sh

# Build remaining versions reusing FFmpeg
REUSE_FFMPEG=1 PY_TAG=cp311-cp311 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp312-cp312 bash scripts/build_manylinux_wheel.sh
REUSE_FFMPEG=1 PY_TAG=cp313-cp313 bash scripts/build_manylinux_wheel.sh

Output (one wheel per PY_TAG, for example):

wheelhouse/codec_video_prep-0.2.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl

Diagnostics

codec-video-prep-doctor

Checks:

  • cv_reader_fast C extension can be imported
  • Bundled FFmpeg shared libraries are present
  • Threading defaults (slice for bitcost, 16 threads)

Backward Compatibility

The old import path and CLI names are kept as aliases:

  • compressed_video_preinfer
  • cv-preinfer
  • cv-preinfer-doctor
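
Code that has to run against both old and new installs can try the new import path first and fall back to the alias. The helper import_first_available is a hypothetical convenience, not part of the package; it relies only on the two module names mentioned in this README.

```python
import importlib

def import_first_available(names):
    """Import and return the first module in names that is installed, else None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

# New name first, legacy alias second; pkg is None if neither is installed.
pkg = import_first_available(["codec_video_prep", "compressed_video_preinfer"])
```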

Requirements

  • Python ≥ 3.10
  • numpy >= 1.23, < 2.0
  • opencv-python-headless < 4.12
  • Pillow
  • Patched FFmpeg shared libraries (bundled in the wheel or built from scripts/build_pixel_ffmpeg.sh)
