Codec-aware video preprocessing for training and inference
codec-video-prep
Codec-aware video preprocessing for training and inference. Extracts codec-level bitcost information from H.264 / HEVC / VP9 videos and turns it into patch-canvases ready for downstream vision models.
What it does
- Patched FFmpeg decoder – Instruments the H.264 / HEVC / VP9 decoder to export per-macroblock (H.264) or per-CTU (HEVC) bitcost maps during decoding.
- Fast C++ extension (cv_reader_fast) – Decodes video with loop-filter / IDCT skipped and optionally returns bitcost data as NumPy arrays.
- Readiness grouping – Groups frames by compressibility (bitcost) so that hard-to-decode regions get more patches.
- Top-K patch selection – Selects the most informative 2×2 patch blocks from each group and packs them into JPG/PNG canvases.
- One-command pipeline – From a raw video to a folder of canvases + metadata in a single call.
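As an illustration of the Top-K step listed above, here is a minimal sketch that ranks 2×2 blocks of a per-cell cost map by their summed cost. The function name select_topk_blocks and the plain sum-based score are assumptions made for this example; the package's own selectors may score and pack patches differently.

```python
import numpy as np


def select_topk_blocks(bitcost_map, k, block=2):
    """Rank block x block cells of a 2-D cost map and return the top k.

    Hypothetical helper for illustration; not part of codec_video_prep.
    Returns ([(block_row, block_col), ...], block_cost_array).
    """
    cost = np.asarray(bitcost_map)
    hb, wb = cost.shape[0] // block, cost.shape[1] // block
    # Drop trailing rows/columns that do not fill a whole block.
    cropped = cost[: hb * block, : wb * block]
    # View as (hb, block, wb, block) and sum the cost inside each block.
    block_cost = cropped.reshape(hb, block, wb, block).sum(axis=(1, 3))
    k = min(k, block_cost.size)
    # Flat indices of the k most expensive blocks, highest cost first.
    order = np.argsort(block_cost, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, block_cost.shape)
    return list(zip(rows.tolist(), cols.tolist())), block_cost


# Synthetic 4x6 map: one costly corner cell and one costly 2x2 region.
demo = np.zeros((4, 6), dtype=np.int64)
demo[0, 0] = 5
demo[2:4, 4:6] = 3
top_blocks, per_block = select_topk_blocks(demo, k=2)
print(top_blocks)  # [(1, 2), (0, 0)]
```

The returned coordinates are in block units; multiplying by the block size gives the position in the original cost map.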
Install
From wheel (recommended)
python -m pip install codec_video_prep-*.whl
Verify the installation:
codec-video-prep-doctor
Build from source
- Build the patched FFmpeg shared libraries:
  - Pixel-capable (recommended; supports both bitcost and BGR pixel export):
    bash scripts/build_pixel_ffmpeg.sh
  - Legacy skip-IDCT (faster bitcost-only scan, no pixel output):
    bash scripts/build_patched_ffmpeg.sh
- Build and install the Python package:
  python -m pip install -e .
Quick start (CLI)
codec-video-prep \
  --video /path/to/video.mp4 \
  --out_dir ./preinfer_out \
  --num_sampled_frames 1024 \
  --group_size 32 \
  --images_per_group 4 \
  --max_pixels 153664
The output directory will contain:
- canvas_*.jpg – Packed patch canvases
- meta.json – Full metadata, timing, and group info
- frame_ids.npy – Sampled frame indices
- src_patch_position.npy – Patch source positions
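The files listed above can be read back with the standard library and NumPy. The sketch below is a hypothetical helper (load_preinfer_outputs is not part of the package) and makes no assumption about the keys inside meta.json or the shapes of the arrays.

```python
import json
from pathlib import Path

import numpy as np


def load_preinfer_outputs(out_dir):
    """Collect the pipeline outputs from out_dir into one dict.

    Hypothetical helper for illustration; only the file names come from
    the output listing above.
    """
    out = Path(out_dir)
    return {
        "meta": json.loads((out / "meta.json").read_text(encoding="utf-8")),
        "frame_ids": np.load(out / "frame_ids.npy"),
        "src_patch_position": np.load(out / "src_patch_position.npy"),
        # Lexicographic sort; assumes zero-padded canvas numbering.
        "canvases": sorted(out.glob("canvas_*.jpg")),
    }
```

Calling load_preinfer_outputs("./preinfer_out") after the CLI run would then give access to the metadata, the sampled frame indices, and the canvas paths in one place.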
Decode backends
Two decode backends are available:
| Backend | Description | Best for |
|---|---|---|
| ffmpeg_native (default) | FFmpeg subprocess decode + cv_reader_fast bitcost scan | General use |
| cv_reader_pixels | Single-pass decode via cv_reader_fast that returns both bitcost and BGR pixels | Speed (~1.8–1.9× faster end-to-end) |
Switch backend:
codec-video-prep --decode_backend cv_reader_pixels ...
Parallel segment decoding
For long videos with dense frame sampling, the bitcost-scan step dominates total time. You can split the workload into N parallel decode segments using ProcessPoolExecutor:
codec-video-prep \
  --decode_backend cv_reader_pixels \
  --parallel_segments 4 \
  --threads_per_segment 4 \
  --segment_guard_frames 30 \
  ...
| Parameter | Default | Description |
|---|---|---|
| --parallel_segments | 0 (disabled) | Number of parallel segments. Set to 0 or 1 to use serial decoding. |
| --threads_per_segment | 4 | FFmpeg thread_count inside each worker process. |
| --segment_guard_frames | 30 | Extra frames decoded before/after each segment boundary to compensate for seek-to-keyframe inaccuracy. |
Note: Parallel segment decoding incurs process-spawn overhead. For short clips (fewer than a few thousand frames), serial decoding is usually faster. The benefit shows up on long videos with dense sampling (e.g. 10k+ frames).
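To make the segment parameters concrete, the following sketch shows one way a sampled frame list could be split into contiguous segments with a guard margin on each side. The helper plan_segments and its even-split policy are assumptions for illustration; the package may partition the workload differently.

```python
def plan_segments(frame_ids, num_segments, guard_frames=30):
    """Split frame ids into contiguous chunks, each with a decode window.

    Hypothetical helper for illustration; not part of codec_video_prep.
    A num_segments of 0 or 1 yields a single (serial) segment.
    """
    ids = sorted(set(frame_ids))
    if not ids:
        return []
    n = max(1, min(num_segments, len(ids)))
    base, extra = divmod(len(ids), n)
    plans, start = [], 0
    for i in range(n):
        # Spread the remainder over the first `extra` segments.
        size = base + (1 if i < extra else 0)
        chunk = ids[start:start + size]
        start += size
        plans.append({
            "frame_ids": chunk,
            # Start early so a seek that lands on an earlier keyframe
            # still covers the first requested frame.
            "seek_frame": max(0, chunk[0] - guard_frames),
            # May exceed the video length; a real caller would clamp it.
            "end_frame": chunk[-1] + guard_frames,
        })
    return plans


for plan in plan_segments(range(0, 180, 30), num_segments=2):
    print(plan)
```

Each plan could then be submitted to a concurrent.futures.ProcessPoolExecutor worker, with threads_per_segment controlling the decoder threads inside that worker.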
Python API
High-level one-shot call
from codec_video_prep import run_preinfer

result = run_preinfer(
    video="/path/to/video.mp4",
    out_dir="./preinfer_out",
    num_sampled_frames=1024,
    group_size=32,
    images_per_group=4,
    patch=14,
    max_pixels=153664,
    min_group_frames=8,
    max_group_frames=64,
    bitcost_grid="adaptive",
    decode_backend="cv_reader_pixels",  # or "ffmpeg_native"
    parallel_segments=4,                # 0 = serial
    threads_per_segment=4,
    segment_guard_frames=30,
)

print(result.out_dir)    # output directory
print(result.meta_path)  # path to meta.json
print(result.timings)    # timing breakdown
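The values patch=14 and max_pixels=153664 in the call above are related by simple arithmetic: 153664 = 392 × 392 and 392 = 28 × 14. The sketch below works this out, assuming that max_pixels acts as a pixel budget for one canvas; that interpretation and the helper names are assumptions, not documented behavior.

```python
import math


def patch_budget(max_pixels, patch):
    """Number of whole patch x patch tiles that fit in max_pixels."""
    return max_pixels // (patch * patch)


def square_canvas_side(max_pixels, patch):
    """Side length in pixels of the largest square canvas of whole patches."""
    return math.isqrt(patch_budget(max_pixels, patch)) * patch


patches = patch_budget(153664, 14)      # 784 patches
side = square_canvas_side(153664, 14)   # 392 px, i.e. 28 x 28 patches
blocks = patches // 4                   # 196 blocks of 2 x 2 patches
print(patches, side, blocks)            # 784 392 196
```

Under the same assumption, images_per_group=4 would give each group a budget of 4 × 784 = 3136 patches.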
Low-level fast decoder
from codec_video_prep import cv_reader_fast

# Decode all frames with bitcost export
frames = cv_reader_fast.read_video_fast(
    path="/path/to/video.mp4",
    thread_count=16,
    export_bitcost=1,
    thread_type="auto",
)

# Decode selected frames only (bitcost + optional pixels)
selected = cv_reader_fast.read_video_fast_selected(
    path="/path/to/video.mp4",
    frame_ids=[0, 30, 60, 90],
    thread_count=16,
    export_bitcost=1,
    export_pixels=1,  # also return BGR pixels
    out_w=224,        # optional resize width
    out_h=224,        # optional resize height
)

# Segment seek + decode (used internally for parallel workers)
segment = cv_reader_fast.read_video_fast_selected_segment(
    path="/path/to/video.mp4",
    frame_ids=[30, 60, 90],
    seek_frame=0,     # seek target (decoder lands on nearest keyframe before this)
    end_frame=120,    # stop after this frame index
    thread_count=4,
    export_bitcost=1,
    export_pixels=1,
    out_w=224,
    out_h=224,
)
Each frame dict contains:
| Key | Description |
|---|---|
| frame_idx | Frame index |
| pict_type | 'I', 'P' or 'B' |
| width / height | Frame resolution |
| codec_name | Decoder name (h264, hevc, vp9, …) |
| bitcost | Dict with MB/CTU bitcost arrays (when export_bitcost=1) |
| pixels | (H, W, 3) uint8 BGR array (when export_pixels=1) |
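A frame dict of this shape can be summarized without knowing the key names inside bitcost, which are not listed here. The sketch below simply sums every numeric NumPy array it finds under bitcost and averages the totals per picture type. Both helpers are hypothetical, and the "mb" key in the synthetic data is a placeholder, not a documented key.

```python
from collections import defaultdict

import numpy as np


def frame_bitcost_total(frame):
    """Sum all numeric arrays stored under frame['bitcost'] (0.0 if absent)."""
    total = 0.0
    for value in (frame.get("bitcost") or {}).values():
        if isinstance(value, np.ndarray) and np.issubdtype(value.dtype, np.number):
            total += float(value.sum())
    return total


def mean_bitcost_by_pict_type(frames):
    """Average the per-frame totals for each pict_type ('I', 'P', 'B')."""
    sums, counts = defaultdict(float), defaultdict(int)
    for frame in frames:
        sums[frame["pict_type"]] += frame_bitcost_total(frame)
        counts[frame["pict_type"]] += 1
    return {ptype: sums[ptype] / counts[ptype] for ptype in sums}


# Synthetic frames; "mb" is a made-up key used only for this demo.
demo_frames = [
    {"frame_idx": 0, "pict_type": "I", "bitcost": {"mb": np.array([[10, 20], [30, 40]])}},
    {"frame_idx": 1, "pict_type": "P", "bitcost": {"mb": np.array([[1, 2], [3, 4]])}},
]
print(mean_bitcost_by_pict_type(demo_frames))  # {'I': 100.0, 'P': 10.0}
```

If the bitcost dict holds several arrays that describe the same bits in different ways, summing all of them would double count, so a real consumer should pick the specific array it needs.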
Project structure
├── src/codec_video_prep/ # Python package
│ ├── api.py # run_preinfer() entrypoint
│ ├── cli.py # codec-video-prep CLI
│ ├── doctor.py # codec-video-prep-doctor diagnostics
│ ├── config.py # PreinferConfig
│ └── libs/ # Bundled FFmpeg .so files
├── codec_selector/ # Frame sampling / grouping / patch selection
│ ├── core/ # Pipeline, probe, decode, config
│ ├── plugins/ # Samplers, scorers, groupers, selectors, packers
│ └── codec_patch_gop/ # Legacy GOP-based utilities
├── native/ # C++ Python extension
│ └── cv_reader_fast.cpp # Fast decoder with bitcost + pixel export, segment seek API
├── ffmpeg_patch/ # FFmpeg source patches
│ ├── bitcost_only/ # Pixel-capable patches (H.264 + HEVC + VP9, keeps full IDCT)
│ │ ├── h264_cabac.c / h264_cavlc.c
│ │ ├── hevcdec.c / hevcdec.h / hevc_refs.c
│ │ ├── vp9.c / vp9dec.h / vp9shared.h
│ │ └── h264_bitcost_only.patch
│ └── full_skip/ # Legacy skip-IDCT patches (faster, no pixel output)
│ ├── h264_*.c
│ ├── hevc_*.c
│ └── patch.sh
├── scripts/
│ ├── build_patched_ffmpeg.sh # Build legacy skip-IDCT FFmpeg libs
│ ├── build_pixel_ffmpeg.sh # Build pixel-capable FFmpeg libs
│ └── build_manylinux_wheel.sh # Build manylinux wheel
├── setup.py # setuptools build (C++ extension + FFmpeg libs)
└── pyproject.toml # PEP 517 project metadata
Build a manylinux wheel
PIP_INDEX_URL=https://mirrors.aliyun.com/pypi/simple \
PIP_TRUSTED_HOST=mirrors.aliyun.com \
bash scripts/build_manylinux_wheel.sh
Output:
wheelhouse/codec_video_prep-0.1.0-cp310-cp310-manylinux2014_x86_64.whl
Install and check:
python -m pip install wheelhouse/codec_video_prep-*.whl
codec-video-prep-doctor
To target a different Python ABI, set PY_TAG:
PY_TAG=cp311-cp311 bash scripts/build_manylinux_wheel.sh
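The PY_TAG value corresponds to the Python and ABI tags embedded in the wheel file name. As a rough sketch, a wheel name can be split into its tags and compared with the running interpreter before installing. The helper names are hypothetical, and the parsing follows the standard wheel naming layout (name, version, optional build tag, Python tag, ABI tag, platform tag).

```python
import sys
from pathlib import PurePath


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    name = PurePath(filename).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel file: {name}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name layout: {name}")
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def running_cpython_tag():
    """Tag such as 'cp310' for the current interpreter (CPython only)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


info = parse_wheel_filename(
    "wheelhouse/codec_video_prep-0.1.0-cp310-cp310-manylinux2014_x86_64.whl"
)
print(info["python_tag"], info["python_tag"] == running_cpython_tag())
```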
Diagnostics
codec-video-prep-doctor checks:
- cv_reader_fast C++ extension can be imported
- Bundled FFmpeg shared libraries are present
- Threading defaults (auto thread type, 16 threads)
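For a quick manual check along the same lines, the sketch below looks for shared libraries in a libs/ directory next to an installed package. This is not the doctor's implementation: the helper names are hypothetical, and the assumption that the libraries sit in a libs subdirectory of the installed package is taken from the source layout shown under Project structure.

```python
import importlib.util
from pathlib import Path


def list_shared_libs(libs_dir):
    """Return sorted names of *.so* files in libs_dir ([] if it is missing)."""
    libs_dir = Path(libs_dir)
    if not libs_dir.is_dir():
        return []
    return sorted(p.name for p in libs_dir.glob("*.so*"))


def bundled_libs(package_name="codec_video_prep", subdir="libs"):
    """List shared libraries bundled with a package, or None if not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    names = []
    for location in spec.submodule_search_locations:
        names.extend(list_shared_libs(Path(location) / subdir))
    return names


found = bundled_libs()
if found is None:
    print("codec_video_prep is not installed")
else:
    print(f"{len(found)} bundled shared libraries found")
```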
Backward compatibility
The old import path and CLI names are kept as aliases:
- compressed_video_preinfer (import path)
- cv-preinfer (CLI)
- cv-preinfer-doctor (CLI)
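Code that has to run against both older and newer installs can try the new import path first and fall back to the old one. The helper below is a generic, hypothetical sketch built on importlib; it does not assume anything about the package beyond the two module names given above.

```python
import importlib


def import_first_available(names):
    """Import and return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {list(names)} could be imported")
```

For example, import_first_available(["codec_video_prep", "compressed_video_preinfer"]) would return whichever of the two names is importable in the current environment.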
Requirements
- Python ≥ 3.10
- numpy >= 1.23, < 2.0
- opencv-python-headless < 4.12
- Pillow
- Patched FFmpeg shared libraries (built automatically by scripts/build_patched_ffmpeg.sh)