Codec-stream video frontend for LMMs (LLaVA-OneVision-2 style).
lmms-video-utils
A codec-stream style video frontend for large multimodal models, modeled
after LLaVA-OneVision-2's codec tokenization. Each video is decoded,
partitioned into adaptive GOPs, and each GOP emits one I-canvas plus a
small number of P-canvases that pack the highest-scoring 2x2 patch blocks
from later frames. The output keeps a patch_positions table aligned with
2D-MRoPE block layouts, so downstream VLMs can place every patch back at
its source (t, h, w).
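To make the P-canvas idea concrete, here is a minimal NumPy sketch of frame-diff block selection. It is not the package's implementation: the function name `top_blocks_by_frame_diff`, the grayscale input, the mean-absolute-difference score, and the default patch size are all assumptions chosen for illustration. It scores every 2x2 patch block of the later frames against the first frame of a GOP, keeps the `k` highest-scoring blocks, and records a (t, h, w) position for each one in patch-grid units.

```python
import numpy as np


def top_blocks_by_frame_diff(frames, patch=14, block=2, k=4):
    """Hypothetical helper: pick the k highest-scoring patch blocks in a GOP.

    frames: array of shape (T, H, W), grayscale; frames[0] plays the I-frame.
    Returns (positions, scores), where positions has shape (k, 3) and holds
    (t, h, w) of each block's top-left patch in patch-grid coordinates.
    """
    frames = np.asarray(frames, dtype=np.float32)
    T, H, W = frames.shape
    side = patch * block                 # pixel side of one block of patches
    gh, gw = H // side, W // side        # block grid size (remainder cropped)

    # Absolute difference of every later frame against the first frame.
    diff = np.abs(frames[1:] - frames[0])[:, : gh * side, : gw * side]

    # Mean difference inside each block -> scores of shape (T-1, gh, gw).
    scores = diff.reshape(T - 1, gh, side, gw, side).mean(axis=(2, 4))

    # Indices of the k largest scores, best first.
    flat = np.argsort(scores, axis=None)[::-1][:k]
    t, bh, bw = np.unravel_index(flat, scores.shape)

    # Frame index is offset by 1 because frames[0] is excluded from scoring;
    # block coordinates are converted to patch coordinates.
    positions = np.stack([t + 1, bh * block, bw * block], axis=1)
    return positions, scores
```

Packing the selected blocks into a canvas is then a matter of copying each block's pixels into the next free canvas slot while storing its row of `positions` alongside, which is the role the patch_positions table plays in the description above.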
Install
pip install -e .
pip install -e ".[gpu]"  # add TorchCodec
pip install -e ".[all]"  # everything
PyAV is the default backend for portability; install [gpu] for TorchCodec.
Three usage levels
Level 1 - direct fetch:
from lmms_video_utils import fetch_codec_video
out = fetch_codec_video("clip.mp4", target_canvas=8)
print(out.canvases.shape, out.patch_positions.shape)
Level 2 - qwen-vl-utils-like:
from lmms_video_utils import process_video_info
messages = [{"role": "user", "content": [
{"type": "video", "video": "clip.mp4",
"video_start": 0.0, "video_end": 5.0,
"fps": 2.0, "max_pixels": 100_000},
{"type": "text", "text": "describe"},
]}]
_, videos = process_video_info(messages, video_backend="codec")
Inline keys recognized on each video / video_url item map to
CodecConfig fields:
| Inline key (qwen-vl-utils) | CodecConfig field |
|---|---|
| video_start | start_time |
| video_end | end_time |
| fps | target_fps |
| nframes | max_frames |
| max_pixels | max_pixels |
| min_pixels | min_pixels |
Inline overrides win over defaults passed as kwargs to
process_video_info(messages, **defaults). total_pixels is silently
ignored.
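The precedence rule can be sketched in a few lines of plain Python. This is an illustration only: `resolve_codec_kwargs` is a hypothetical helper, not a function exported by the package, and it assumes that the defaults are given under CodecConfig field names.

```python
# Inline key (qwen-vl-utils style) -> CodecConfig field, as in the table above.
INLINE_TO_CONFIG = {
    "video_start": "start_time",
    "video_end": "end_time",
    "fps": "target_fps",
    "nframes": "max_frames",
    "max_pixels": "max_pixels",
    "min_pixels": "min_pixels",
}


def resolve_codec_kwargs(item, **defaults):
    """Hypothetical helper: merge one video item with call-level defaults.

    Inline keys on the item override defaults. Keys without a mapping,
    such as total_pixels, are dropped without raising an error.
    """
    resolved = dict(defaults)
    for inline_key, field in INLINE_TO_CONFIG.items():
        if inline_key in item:
            resolved[field] = item[inline_key]
    return resolved


item = {"type": "video", "video": "clip.mp4", "fps": 2.0, "total_pixels": 10**6}
print(resolve_codec_kwargs(item, target_fps=1.0, max_frames=32))
```

Here the inline `fps` replaces the default `target_fps`, `max_frames` falls through from the defaults, and `total_pixels` does not appear in the result.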
Level 3 - reader object:
from lmms_video_utils import CodecVideoReader
reader = CodecVideoReader("clip.mp4")
for i in range(len(reader)):
    canvas = reader[i]
Roadmap
| Feature | Status |
|---|---|
| Uniform GOP, frame-diff scoring, PyAV/TorchCodec backends | implemented |
| MV-warp residual scoring (score_mode="mvwarp") | implemented |
| Bit-cost-adaptive GOP (gop_mode="bitcost") + per-frame bit-cost score multiplier | implemented |
| qwen-vl-utils style per-message overrides (process_video_info) | implemented |
| On-disk caching, batched GPU scoring | planned |
| Optional patched-libav backend for true codec residual | planned |
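The difference between a uniform GOP and a bit-cost-adaptive GOP can be illustrated with a short sketch. Both functions below are hypothetical and are not taken from the package; in particular, the greedy budget rule in `bitcost_gops` is only one plausible way to let per-frame bit cost drive GOP boundaries, and how gop_mode="bitcost" actually decides them is not documented here.

```python
def uniform_gops(num_frames, gop_size):
    """Split frame indices into consecutive GOPs of at most gop_size frames."""
    return [
        list(range(start, min(start + gop_size, num_frames)))
        for start in range(0, num_frames, gop_size)
    ]


def bitcost_gops(packet_sizes, budget):
    """Greedy sketch: close the current GOP once adding the next frame
    would push its total compressed size above budget (same unit as sizes)."""
    gops, current, spent = [], [], 0
    for index, size in enumerate(packet_sizes):
        if current and spent + size > budget:
            gops.append(current)
            current, spent = [], 0
        current.append(index)
        spent += size
    if current:
        gops.append(current)
    return gops


print(uniform_gops(10, 4))
print(bitcost_gops([100, 10, 10, 90, 10, 10], budget=110))
```

With a fixed size, boundaries fall every four frames regardless of content. With a budget, expensive frames (typically scene changes or high motion) use up the budget sooner, so the GOPs around them contain fewer frames.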
Download files
File details
Details for the file lmms_video_utils-0.1.0.tar.gz.
File metadata
- Download URL: lmms_video_utils-0.1.0.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7df49f8eb3d3b9c0605324edd78082f30930868e92b592cb15048a11daa5075a |
| MD5 | d88b3126715713347a117d78ef66df68 |
| BLAKE2b-256 | ce8f6c461ffa0a2e39d04fb33d01fd8f0ade77533e3d8b64d6525e2bcd9f8156 |
File details
Details for the file lmms_video_utils-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lmms_video_utils-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84165f41622f70f8ea696c0728e2a382f07c67ff46a18b8f18ae4e116bd5079e |
| MD5 | 518eb70bb0da5380c803f305989de034 |
| BLAKE2b-256 | 026f9eea437fee0cc5e89ffee9634ba5ad8728e99d0d0cb70bca3685f333d7ed |