Codec-aware video preprocessing for training and inference

codec-video-prep

Codec-aware video preprocessing for training and inference. Extracts codec-level bitcost information from H.264 / HEVC / VP9 videos and turns it into patch-canvases ready for downstream vision models.

What it does

  • Patched FFmpeg decoder – Instruments the H.264, HEVC, and VP9 decoders to export per-macroblock (H.264) or per-CTU (HEVC) bitcost maps during decoding.
  • Fast C++ extension (cv_reader_fast) – Decodes video with the loop filter and IDCT skipped (in the legacy skip-IDCT build) and optionally returns bitcost data as NumPy arrays.
  • Readiness grouping – Groups frames by compressibility (bitcost) so that hard-to-compress, high-bitcost regions get more patches.
  • Top-K patch selection – Selects the most informative 2×2 patch blocks from each group and packs them into JPG/PNG canvases.
  • One-command pipeline – From a raw video to a folder of canvases + metadata in a single call.
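
The top-K selection step listed above can be illustrated with a small sketch. This is not the package's implementation: the function name top_k_blocks, the non-overlapping 2×2 tiling, and the sum-of-bitcost score are assumptions chosen only to show the idea.

```python
from typing import List, Tuple


def top_k_blocks(grid: List[List[float]], k: int) -> List[Tuple[int, int]]:
    """Return (row, col) of the k highest-scoring non-overlapping 2x2 blocks.

    `grid` is a hypothetical per-patch bitcost map. A block's score is the
    sum of its four cells (an assumed scoring rule, not the package's).
    """
    rows, cols = len(grid), len(grid[0])
    scored = []
    for r in range(0, rows - 1, 2):
        for c in range(0, cols - 1, 2):
            score = (grid[r][c] + grid[r][c + 1]
                     + grid[r + 1][c] + grid[r + 1][c + 1])
            scored.append((score, r, c))
    # Highest score first; ties are broken by position for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c) for _, r, c in scored[:k]]


bitcost = [
    [1, 1, 9, 8],
    [1, 1, 7, 9],
    [5, 5, 0, 0],
    [5, 5, 0, 1],
]
print(top_k_blocks(bitcost, 2))  # → [(0, 2), (2, 0)]
```

The selected block origins would then be packed into a canvas; how the real packer orders them is not covered here.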

Install

From wheel (recommended)

python -m pip install codec_video_prep-*.whl

Verify the installation:

codec-video-prep-doctor

Build from source

  1. Build the patched FFmpeg shared libraries:

    • Pixel-capable (recommended; supports both bitcost and BGR pixel export):
      bash scripts/build_pixel_ffmpeg.sh
      
    • Legacy skip-IDCT (faster bitcost-only scan, no pixel output):
      bash scripts/build_patched_ffmpeg.sh
      
  2. Build and install the Python package:

python -m pip install -e .

Quick start (CLI)

codec-video-prep \
  --video /path/to/video.mp4 \
  --out_dir ./preinfer_out \
  --num_sampled_frames 1024 \
  --group_size 32 \
  --images_per_group 4 \
  --max_pixels 153664

Output directory will contain:

  • canvas_*.jpg – Packed patch canvases
  • meta.json – Full metadata, timing, and group info
  • frame_ids.npy – Sampled frame indices
  • src_patch_position.npy – Patch source positions
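
A quick way to sanity-check an output folder is sketched below. It relies only on the file names listed above; the helper name summarize_output is hypothetical, and the schema of meta.json is not assumed beyond it being a JSON object.

```python
import json
from pathlib import Path


def summarize_output(out_dir: str) -> dict:
    """Collect a quick summary of a pipeline output folder."""
    root = Path(out_dir)
    meta_path = root / "meta.json"
    # Only the top-level keys are reported; the schema itself is not assumed.
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    return {
        "canvases": sorted(p.name for p in root.glob("canvas_*.jpg")),
        "meta_keys": sorted(meta),
        "has_frame_ids": (root / "frame_ids.npy").exists(),
        "has_patch_positions": (root / "src_patch_position.npy").exists(),
    }


# Example (requires an existing output folder):
# print(summarize_output("./preinfer_out"))
```

The two .npy files can be loaded with numpy.load once their presence is confirmed.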

Decode backends

Two decode backends are available:

| Backend | Description | Best for |
| --- | --- | --- |
| ffmpeg_native (default) | FFmpeg subprocess decode + cv_reader_fast bitcost scan | General use |
| cv_reader_pixels | Single-pass decode via cv_reader_fast that returns both bitcost and BGR pixels | Speed (~1.8–1.9× faster end-to-end) |

Switch backend:

codec-video-prep --decode_backend cv_reader_pixels ...
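
When the CLI is driven from a script, the backend choice can be validated before the command is launched. The sketch below uses only flags shown in this README; the helper name build_command is hypothetical.

```python
import shlex
from typing import List

# The two backend names documented in the table above.
BACKENDS = ("ffmpeg_native", "cv_reader_pixels")


def build_command(video: str, out_dir: str,
                  decode_backend: str = "ffmpeg_native") -> List[str]:
    """Build an argv list for the codec-video-prep CLI."""
    if decode_backend not in BACKENDS:
        raise ValueError(f"unknown decode backend: {decode_backend!r}")
    return [
        "codec-video-prep",
        "--video", video,
        "--out_dir", out_dir,
        "--decode_backend", decode_backend,
    ]


cmd = build_command("/path/to/video.mp4", "./preinfer_out", "cv_reader_pixels")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```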

Parallel segment decoding

For long videos with dense frame sampling, the bitcost-scan step dominates total time. You can split the workload into N segments that are decoded in parallel by worker processes (ProcessPoolExecutor):

codec-video-prep \
  --decode_backend cv_reader_pixels \
  --parallel_segments 4 \
  --threads_per_segment 4 \
  --segment_guard_frames 30 \
  ...

| Parameter | Default | Description |
| --- | --- | --- |
| --parallel_segments | 0 (disabled) | Number of parallel segments. Set to 0 or 1 to use serial decoding. |
| --threads_per_segment | 4 | FFmpeg thread_count inside each worker process. |
| --segment_guard_frames | 30 | Extra frames decoded before/after each segment boundary to compensate for seek-to-keyframe inaccuracy. |

Note: Parallel segment decoding incurs process-spawn overhead. For short clips (under a few thousand frames), serial decoding is usually faster. The benefit appears on long videos with dense sampling (e.g. 10k+ frames).
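
To make the role of the guard frames concrete, here is a minimal sketch of how sampled frame ids could be split into segments. It is an illustration under assumptions, not the package's internal planner: plan_segments is a hypothetical name, and mapping the guard to a seek_frame / end_frame pair is assumed from the parameter descriptions above.

```python
from typing import Iterable, List, Tuple


def plan_segments(frame_ids: Iterable[int], num_segments: int,
                  guard_frames: int = 30) -> List[Tuple[int, int, List[int]]]:
    """Split frame ids into contiguous chunks with guarded decode bounds.

    Returns (seek_frame, end_frame, chunk) per segment. The guard widens
    each range so a seek that lands on an earlier keyframe still covers
    the requested frames (assumed behaviour).
    """
    ids = sorted(frame_ids)
    if not ids:
        return []
    num_segments = max(1, min(num_segments, len(ids)))
    base, extra = divmod(len(ids), num_segments)
    plans, start = [], 0
    for i in range(num_segments):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        chunk = ids[start:start + size]
        start += size
        seek_frame = max(0, chunk[0] - guard_frames)
        end_frame = chunk[-1] + guard_frames
        plans.append((seek_frame, end_frame, chunk))
    return plans


for seek, end, chunk in plan_segments(range(0, 1000, 100), 2, guard_frames=30):
    print(seek, end, chunk)
# → 0 430 [0, 100, 200, 300, 400]
# → 470 930 [500, 600, 700, 800, 900]
```

Each tuple could then be handed to one worker process; the overlap created by the guard is why frames outside the requested ids should be discarded after decoding.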

Python API

High-level one-shot call

from codec_video_prep import run_preinfer

result = run_preinfer(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=1024,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=153664,
    min_group_frames=8,
    max_group_frames=64,
    bitcost_grid="adaptive",
    decode_backend="cv_reader_pixels",   # or "ffmpeg_native"
    parallel_segments=4,                  # 0 = serial
    threads_per_segment=4,
    segment_guard_frames=30,
)

print(result.out_dir)       # output directory
print(result.meta_path)     # path to meta.json
print(result.timings)       # timing breakdown
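
The values patch=14 and max_pixels=153664 in the call above fit together arithmetically. The sketch below assumes, without confirmation from the source, that patch is the patch side length in pixels and that max_pixels caps the pixel area of one canvas; patch_budget is a hypothetical helper.

```python
import math


def patch_budget(max_pixels: int, patch: int) -> int:
    """Number of patch x patch tiles that fit in an area of max_pixels."""
    return max_pixels // (patch * patch)


budget = patch_budget(153664, 14)
side = math.isqrt(budget)            # patches per side if the canvas is square
print(budget, side, side * 14)       # → 784 28 392
```

Under these assumptions, 153664 pixels corresponds to a 392×392 canvas holding a 28×28 grid of 14-pixel patches.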

Low-level fast decoder

from codec_video_prep import cv_reader_fast

# Decode all frames with bitcost export
frames = cv_reader_fast.read_video_fast(
    path="/path/to/video.mp4",
    thread_count=16,
    export_bitcost=1,
    thread_type="auto",
)

# Decode selected frames only (bitcost + optional pixels)
selected = cv_reader_fast.read_video_fast_selected(
    path="/path/to/video.mp4",
    frame_ids=[0, 30, 60, 90],
    thread_count=16,
    export_bitcost=1,
    export_pixels=1,   # also return BGR pixels
    out_w=224,         # optional resize width
    out_h=224,         # optional resize height
)

# Segment seek + decode (used internally for parallel workers)
segment = cv_reader_fast.read_video_fast_selected_segment(
    path="/path/to/video.mp4",
    frame_ids=[30, 60, 90],
    seek_frame=0,       # seek target (decoder lands on nearest keyframe before this)
    end_frame=120,      # stop after this frame index
    thread_count=4,
    export_bitcost=1,
    export_pixels=1,
    out_w=224,
    out_h=224,
)
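
The frame_ids lists above are hand-written. For a real video they would usually be derived from the frame count. The sketch below is a stand-in sampler, not the package's own: uniform_frame_ids is a hypothetical name and evenly spaced, bin-centred sampling is an assumption.

```python
from typing import List


def uniform_frame_ids(total_frames: int, num_samples: int) -> List[int]:
    """Pick up to num_samples evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    n = min(num_samples, total_frames)
    # Take the centre of each of n equal-width bins over the frame range.
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


print(uniform_frame_ids(120, 4))  # → [15, 45, 75, 105]
```

The resulting list can be passed as frame_ids to the selected-frame readers shown above.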

Each frame dict contains:

| Key | Description |
| --- | --- |
| frame_idx | Frame index |
| pict_type | 'I', 'P' or 'B' |
| width / height | Frame resolution |
| codec_name | Decoder name (h264, hevc, vp9, …) |
| bitcost | Dict with MB/CTU bitcost arrays (when export_bitcost=1) |
| pixels | (H, W, 3) uint8 BGR array (when export_pixels=1) |
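
Because each frame is a plain dict, post-processing needs no special types. The sketch below filters a decoded list down to the requested frames and counts picture types. It uses stand-in dicts with only the documented keys; summarize_frames is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def summarize_frames(frames: List[dict],
                     wanted_ids: Optional[Iterable[int]] = None
                     ) -> Tuple[List[dict], Counter]:
    """Keep requested frames (sorted by frame_idx) and count pict_type."""
    wanted = None if wanted_ids is None else set(wanted_ids)
    kept = [f for f in frames if wanted is None or f["frame_idx"] in wanted]
    kept.sort(key=lambda f: f["frame_idx"])
    return kept, Counter(f["pict_type"] for f in kept)


# Stand-in data shaped like the table above (no bitcost or pixels).
frames = [
    {"frame_idx": 60, "pict_type": "B", "width": 224, "height": 224, "codec_name": "h264"},
    {"frame_idx": 0, "pict_type": "I", "width": 224, "height": 224, "codec_name": "h264"},
    {"frame_idx": 30, "pict_type": "P", "width": 224, "height": 224, "codec_name": "h264"},
]
kept, counts = summarize_frames(frames, wanted_ids=[30, 60])
print([f["frame_idx"] for f in kept])  # → [30, 60]
```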

Project structure

├── src/codec_video_prep/    # Python package
│   ├── api.py                        # run_preinfer() entrypoint
│   ├── cli.py                        # codec-video-prep CLI
│   ├── doctor.py                     # codec-video-prep-doctor diagnostics
│   ├── config.py                     # PreinferConfig
│   └── libs/                         # Bundled FFmpeg .so files
├── codec_selector/                   # Frame sampling / grouping / patch selection
│   ├── core/                         # Pipeline, probe, decode, config
│   ├── plugins/                      # Samplers, scorers, groupers, selectors, packers
│   └── codec_patch_gop/              # Legacy GOP-based utilities
├── native/                           # C++ Python extension
│   └── cv_reader_fast.cpp            # Fast decoder with bitcost + pixel export, segment seek API
├── ffmpeg_patch/                     # FFmpeg source patches
│   ├── bitcost_only/                 # Pixel-capable patches (H.264 + HEVC + VP9, keeps full IDCT)
│   │   ├── h264_cabac.c / h264_cavlc.c
│   │   ├── hevcdec.c / hevcdec.h / hevc_refs.c
│   │   ├── vp9.c / vp9dec.h / vp9shared.h
│   │   └── h264_bitcost_only.patch
│   └── full_skip/                    # Legacy skip-IDCT patches (faster, no pixel output)
│       ├── h264_*.c
│       ├── hevc_*.c
│       └── patch.sh
├── scripts/
│   ├── build_patched_ffmpeg.sh       # Build legacy skip-IDCT FFmpeg libs
│   ├── build_pixel_ffmpeg.sh         # Build pixel-capable FFmpeg libs
│   └── build_manylinux_wheel.sh      # Build manylinux wheel
├── setup.py                          # setuptools build (C++ extension + FFmpeg libs)
└── pyproject.toml                    # PEP 517 project metadata

Build a manylinux wheel

PIP_INDEX_URL=https://mirrors.aliyun.com/pypi/simple \
PIP_TRUSTED_HOST=mirrors.aliyun.com \
bash scripts/build_manylinux_wheel.sh

Output:

wheelhouse/codec_video_prep-0.1.0-cp310-cp310-manylinux2014_x86_64.whl

Install and check:

python -m pip install wheelhouse/codec_video_prep-*.whl
codec-video-prep-doctor

To target a different Python ABI, set PY_TAG:

PY_TAG=cp311-cp311 bash scripts/build_manylinux_wheel.sh
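
The PY_TAG value appears to match the Python and ABI tags embedded in the wheel file name. The sketch below splits a wheel name into its standard components so a built file can be checked against the tag that was requested; parse_wheel_name is a hypothetical helper, and the link to PY_TAG is inferred from the example above.

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components.

    Layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_name(
    "codec_video_prep-0.1.0-cp310-cp310-manylinux2014_x86_64.whl"
)
print(f"{info['python_tag']}-{info['abi_tag']}")  # → cp310-cp310
```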

Diagnostics

codec-video-prep-doctor checks:

  • cv_reader_fast C extension can be imported
  • Bundled FFmpeg shared libraries are present
  • Threading defaults (auto thread type, 16 threads)
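
The first check in the list can be approximated without the doctor command. The sketch below only tests whether a module can be located. It is not the doctor's implementation, and treating cv_reader_fast as a submodule of codec_video_prep is an assumption based on the import shown earlier.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no spec.
        return False


for name in ("codec_video_prep", "codec_video_prep.cv_reader_fast"):
    print(name, is_importable(name))
```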

Backward compatibility

The old import path and CLI names are kept as aliases:

  • compressed_video_preinfer
  • cv-preinfer
  • cv-preinfer-doctor
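
Code that has to run against both old and new installs can try the current import name first and fall back to the alias. A minimal sketch, with import_first as a hypothetical helper:

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Import and return the first module in `names` that is available."""
    last_error = None
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            last_error = exc
    raise ImportError(f"none of {names} could be imported") from last_error


# Intended use with this package (needs it installed):
# prep = import_first("codec_video_prep", "compressed_video_preinfer")

# Demonstration with a standard-library module as the fallback:
mod = import_first("no_such_module_for_demo", "json")
print(mod.__name__)  # → json
```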

Requirements

  • Python ≥ 3.10
  • numpy >= 1.23, < 2.0
  • opencv-python-headless < 4.12
  • Pillow
  • Patched FFmpeg shared libraries (bundled in the wheel; when building from source, build them with scripts/build_pixel_ffmpeg.sh or scripts/build_patched_ffmpeg.sh)

Download files

No source distribution is available for this release.

Built distributions (29.8 MB each; manylinux, glibc 2.17+, x86-64):

  • codec_video_prep-0.2.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl – CPython 3.13
  • codec_video_prep-0.2.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl – CPython 3.12
  • codec_video_prep-0.2.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl – CPython 3.11
  • codec_video_prep-0.2.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl – CPython 3.10
  • codec_video_prep-0.2.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl – CPython 3.9
