High-speed video mosaic on CUDA: NVDEC/NVENC + torch-lap Hungarian (sibling of mosaic-temporal)

mosaic-temporal-gpu

The high-speed sibling of mosaic-temporal. NVDEC/NVENC + torch-lap Hungarian + on-GPU torch kernels (Triton port queued for v0.2).

⚠️ Status: 0.1.0 release candidate. Public API (run_pipeline), kernels, solver, NVDEC/NVENC bridge, config schema, and CPU-host tests are in place. The remaining work toward 0.1.0 final is the parity-gate CI on a CUDA runner and the bench-spike sign-off on Kaggle T4 — see Roadmap. The Quickstart below is the supported API; the 3-stream CUDA-overlap optimization that motivated this repo lands in 0.2 without changing the signature.

Positioning

This is the high-speed build of the video mosaic pipeline. The portable sibling mosaic-temporal keeps a CPU fallback at every step for users without a GPU; this repo drops every fallback so the hot path can be NVDEC → torch kernels on CUDA (Triton port in v0.2) → torch-lap → NVENC end-to-end. The cost is a hard requirement: an NVIDIA GPU with CUDA ≥ 12.0. The benefit is higher throughput on long clips.

Feature	mosaic-temporal	mosaic-temporal-gpu (high-speed)
Hungarian assignment	scipy CPU (default)	torch-linear-assignment (only)
Cost matrix	numpy CPU loop	torch.cdist on CUDA (Triton in v0.2)
Oklab grid mean	numpy	torch view+reduce on CUDA (Triton v0.2)
Video I/O	cv2 PNG round-trip	PyAV NVDEC → ndarray → NVENC
RAFT optical flow	CPU torch (slow)	not in v0.1.0 — queued for v0.3
Bit-exact CPU output	yes (`bit-exact-cpu`)	no — parity gated at SSIM ≥ 0.98
Runtime requirement	none	NVIDIA GPU with CUDA ≥ 12.0

If you need the CPU fallback, the bit-exact reference, or Windows/macOS support, use mosaic-temporal. If you have a CUDA GPU and want speed, you're in the right place.

Install (once 0.1.0 ships to PyPI)

mosaic-temporal-gpu requires a CUDA build of PyTorch. Install torch first from the official CUDA wheel index, then install this package:

# 1. CUDA 12.1 wheels (adjust cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision

# 2. Pure compute kernels only (no video I/O — no PyAV)
pip install mosaic-temporal-gpu

# 2'. With NVDEC/NVENC video I/O (needs a cuvid-enabled FFmpeg + PyAV).
#     The PyPI `av` wheel is software-only — see benchmarks/README.md for
#     the FFmpeg+PyAV self-build recipe. The `[nvdec]` extra declares the
#     `av>=12` dependency; it does NOT build FFmpeg for you.
pip install "mosaic-temporal-gpu[nvdec]"

If you skip step 1, pip resolves torch from PyPI, which, depending on platform, may give you a CPU-only build or a CUDA build that does not match your driver. With a CPU-only build every CUDA-only call fails at runtime, because there is deliberately no CPU fallback. NVIDIA driver ≥ R535 and CUDA ≥ 12.0 are prerequisites. Until 0.1.0 ships to PyPI, install from source:

git clone https://github.com/hinanohart/mosaic-temporal-gpu
cd mosaic-temporal-gpu
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision
pip install -e ".[dev]"
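
Because a mismatched torch build only surfaces at runtime, a quick preflight can catch it early. The sketch below is not part of mosaic-temporal-gpu: cuda_preflight_problems and run_preflight are hypothetical helper names. It relies only on torch.version.cuda (None on CPU-only builds) and torch.cuda.is_available().

```python
def cuda_preflight_problems(torch_cuda_version, cuda_available, min_major=12):
    """Return a list of human-readable problems; an empty list means OK.

    torch_cuda_version: value of torch.version.cuda (e.g. "12.1", or None
                        when torch is a CPU-only build).
    cuda_available:     value of torch.cuda.is_available().
    """
    problems = []
    if torch_cuda_version is None:
        problems.append("torch is a CPU-only build (torch.version.cuda is None)")
    else:
        major = int(torch_cuda_version.split(".")[0])
        if major < min_major:
            problems.append(
                f"torch was built for CUDA {torch_cuda_version}, need >= {min_major}.0"
            )
    if not cuda_available:
        problems.append("torch.cuda.is_available() is False (driver or GPU missing)")
    return problems


def run_preflight():
    """Hypothetical entry point: query the installed torch and report."""
    import torch  # imported lazily so the checker itself has no hard dependency

    return cuda_preflight_problems(torch.version.cuda, torch.cuda.is_available())
```

Calling run_preflight() before the first run_pipeline call turns a late CUDA failure into an early, readable message.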

Quickstart

from pathlib import Path
from mosaic_temporal_gpu import run_pipeline

stats = run_pipeline(
    input_video=Path("input.mp4"),
    output_video=Path("output.mp4"),
    tile_dir=Path("tiles/"),       # keyword-only
    fps=30,                        # NVENC output frame rate (input fps
                                   # auto-detection lands in 0.2)
    cq=19,                         # h264_nvenc constant-quality (lower = better)
)
print(stats)
# {"frames": 720, "width": 1920, "height": 1080,
#  "fps": 30, "active_codec": "h264_cuvid"}

Pass a D1Config via the config argument to select the preset explicitly (vivid_b is the default):

from mosaic_temporal_gpu import D1Config, run_pipeline
run_pipeline(..., config=D1Config.from_preset("vivid_b"))

For 0.1.0 we ship the vivid_b preset only (saturation_boost=2.10, mkl_hybrid, neighbor_swap_rounds=5). Additional presets and a CLI front-end are deferred to 0.2 to keep the launch surface narrow.

The active_codec field in the return value is how you confirm that NVDEC engaged on the decode side ("h264_cuvid" / "hevc_cuvid"). A software fallback is never silent: if the hardware decoder is not active, the reader raises before any frame is processed (see the R8 assertion in io/nvdec.py).
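
A caller that wants an explicit check on top of the reader's own assertion can inspect the returned stats. This is a minimal sketch assuming the stats mapping has the shape shown in the Quickstart; decoded_on_nvdec is a hypothetical helper, not a function of the package.

```python
# Decoder names the README lists as proof that NVDEC engaged.
HW_DECODERS = {"h264_cuvid", "hevc_cuvid"}


def decoded_on_nvdec(stats):
    """True if the stats mapping reports a hardware (cuvid) decoder."""
    return stats.get("active_codec") in HW_DECODERS


# Example stats, copied from the Quickstart output above.
stats = {"frames": 720, "width": 1920, "height": 1080,
         "fps": 30, "active_codec": "h264_cuvid"}

if not decoded_on_nvdec(stats):
    raise RuntimeError(f"unexpected decoder: {stats.get('active_codec')!r}")
```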

What works today (component-level)

import torch
from mosaic_temporal_gpu import D1Config
from mosaic_temporal_gpu.kernels.cost_matrix import compute_cost_matrix_gpu
from mosaic_temporal_gpu.solvers.torch_lap import TorchLapSolver

cfg = D1Config.from_preset("vivid_b")          # ✅ schema + preset
cost = compute_cost_matrix_gpu(cells, tiles)   # ✅ GPU cost matrix (CUDA req'd)
assignment = TorchLapSolver().solve(cost)      # ✅ GPU Hungarian

NvdecReader / NvencWriter are likewise importable, and their error paths are tested on a CPU host; a full round-trip needs CUDA.
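
To make the cost-matrix and assignment steps concrete, here is a CPU-only sketch of what they compute. It uses numpy and scipy.optimize.linear_sum_assignment as stand-ins for the GPU kernel and TorchLapSolver. The (N, 3) mean-color inputs and the Euclidean cost are assumptions for illustration; the package's actual tensor shapes and cost definition may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_cost(cells, tiles):
    """Euclidean distance between every cell color and every tile color.

    cells: (N, 3) array, tiles: (M, 3) array -> (N, M) cost matrix.
    """
    diff = cells[:, None, :] - tiles[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Three grid cells and three candidate tiles, described by mean color.
cells = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
tiles = np.array([[0.9, 0.9, 0.9], [0.1, 0.1, 0.1], [0.5, 0.5, 0.5]])

cost = pairwise_cost(cells, tiles)
# One tile per cell, minimizing the summed cost over all cells.
cell_idx, tile_idx = linear_sum_assignment(cost)
total = cost[cell_idx, tile_idx].sum()
```

Here the dark cell takes the dark tile, the bright cell the bright tile, and the grey cell the grey tile. The assignment is a global optimum, not a greedy per-cell choice.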

Parity guarantee (planned, not yet wired)

The release contract is: for each frame of a fixed 24-frame synthetic clip, SSIM(mosaic_temporal_gpu candidate, mosaicraft CPU reference) ≥ 0.98. The test exists (tests/test_parity_vs_mosaicraft.py, @pytest.mark.parity), but GitHub's free runners have no CUDA, so the parity job is not in CI today; it runs locally on a CUDA host with pytest -m parity. A scheduled GPU runner (Modal / RunPod) is queued for 0.1.0 final. Output is not bit-exact, because floating-point addition is not associative and the order of GPU reductions is not fixed; the SSIM gate is the operative contract.
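
The shape of such a gate can be sketched in a few lines. The version below uses a simplified single-window (global) SSIM in numpy rather than the usual sliding-window variant, and it is not the code in tests/test_parity_vs_mosaicraft.py. global_ssim and parity_gate are hypothetical names, and frames are assumed to be float arrays in [0, 1].

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole frame as one window (simplified)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x = ((x - mu_x) ** 2).mean()
    var_y = ((y - mu_y) ** 2).mean()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def parity_gate(candidate_frames, reference_frames, threshold=0.98):
    """Pass only if every frame pair reaches the SSIM threshold."""
    if len(candidate_frames) != len(reference_frames):
        raise ValueError("frame counts differ")
    scores = [global_ssim(c, r)
              for c, r in zip(candidate_frames, reference_frames)]
    return min(scores) >= threshold, scores
```

The gate is per frame: one frame below the threshold fails the whole clip, which is stricter than gating on the mean score.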

Repository layout

src/mosaic_temporal_gpu/
  __init__.py            # version, public API (run_pipeline, D1Config, exceptions)
  _version.py            # single source of truth
  config.py              # D1Config schema (mirror of mosaic-temporal's GPU-valid subset)
  kernels/
    cost_matrix.py       # GPU cost matrix (torch.cdist on CUDA; Triton port = v0.2)
    oklab_grid.py        # GPU Oklab grid mean (torch view+reduce; Triton port = v0.2)
  solvers/
    torch_lap.py         # torch-linear-assignment wrapper
  io/
    nvdec.py             # PyAV NVDEC reader
    nvenc.py             # PyAV NVENC writer
  pipeline.py            # end-to-end run_pipeline (single CUDA stream;
                         # 3-stream overlap is v0.2)
tests/
  test_parity_vs_mosaicraft.py   # SSIM ≥ 0.98 gate (xfail until CUDA CI)
  test_pipeline_smoke.py         # run_pipeline public-API contract
  test_kernel_shapes.py
  test_solver_torch_lap.py
  test_io_bridges.py
  test_config_schema.py
  test_version_smoke.py
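
The "view+reduce" approach noted for kernels/oklab_grid.py can be illustrated on the CPU with numpy: reshape the image so each grid cell gets its own axes, then take the mean over them. This is a sketch of the general technique only. grid_cell_means is a hypothetical name, the color-space conversion is omitted, and the package's real kernel operates on CUDA tensors.

```python
import numpy as np


def grid_cell_means(image, cell_h, cell_w):
    """Mean color of each cell in a regular grid.

    image: (H, W, C) array with H and W divisible by the cell size.
    Returns an (H // cell_h, W // cell_w, C) array.
    """
    h, w, c = image.shape
    if h % cell_h or w % cell_w:
        raise ValueError("image size must be a multiple of the cell size")
    # Split rows into (grid_row, row_in_cell) and columns likewise.
    cells = image.reshape(h // cell_h, cell_h, w // cell_w, cell_w, c)
    # Reduce over the two within-cell axes.
    return cells.mean(axis=(1, 3))


image = np.arange(16, dtype=np.float64).reshape(4, 4, 1)
means = grid_cell_means(image, 2, 2)
```

The reshape does not copy data for a contiguous array, so the whole operation is a single reduction, which is why the same pattern maps well to a GPU.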

Roadmap

0.1.0 — run_pipeline() shipped (single-stream NVDEC → mosaic → NVENC); parity gate green on a CUDA runner (Modal / RunPod queued); bench-spike sign-off on Kaggle T4.
0.2 — 3-stream CUDA overlap (decode | compute | encode); DLPack zero-copy on both ends of the video bridge; Triton kernels for cost matrix and Oklab grid (replace torch.cdist / torch.view+mean once we benchmark a real win); CLI front-end; additional presets.
0.3 — RAFT optical flow on GPU for temporal coherence; flow_warp module.
1.0 — Stable parity gate across two driver/CUDA upgrades; one breaking-change cycle behind us.
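
The decode | compute | encode overlap planned for 0.2 is a classic three-stage pipeline. As an analogy only, the sketch below shows the same structure with Python threads and bounded queues instead of CUDA streams. run_three_stage and the stage callables are hypothetical and say nothing about how the package will implement it.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the frame stream


def _stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_three_stage(frames, decode, compute, encode, depth=2):
    """Run decode -> compute -> encode concurrently, preserving order."""
    q0, q1, q2, q3 = (queue.Queue(maxsize=depth) for _ in range(4))

    def feed():
        for frame in frames:
            q0.put(frame)
        q0.put(_DONE)

    workers = [threading.Thread(target=feed)]
    for fn, src, dst in ((decode, q0, q1), (compute, q1, q2), (encode, q2, q3)):
        workers.append(threading.Thread(target=_stage, args=(fn, src, dst)))
    for worker in workers:
        worker.start()

    results = []
    while (item := q3.get()) is not _DONE:
        results.append(item)
    for worker in workers:
        worker.join()
    return results


out = run_three_stage(range(5), lambda x: x + 1, lambda x: x * 2, str)
```

The bounded queues provide back-pressure: a fast decoder cannot run arbitrarily far ahead of a slow encoder. Error handling is left out; a stage that raises would stall this sketch.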

Relation to siblings

mosaicraft (image mosaic, pure numpy/cv2/scipy) — used here as the CPU reference for the parity gate and for the Oklab / MKL OT / Laplacian primitives.
mosaic-temporal (video mosaic, CPU/GPU dual path) — the portable sibling. Same D1Config surface, so config files port between the two.

License

MIT. See LICENSE.

Release history

This version

0.1.0

May 12, 2026

Download files

Source Distribution

mosaic_temporal_gpu-0.1.0.tar.gz (36.5 kB)

Uploaded May 12, 2026 Source

Built Distribution

mosaic_temporal_gpu-0.1.0-py3-none-any.whl (25.7 kB)

Uploaded May 12, 2026 Python 3

File details

Details for the file mosaic_temporal_gpu-0.1.0.tar.gz.

File metadata

Download URL: mosaic_temporal_gpu-0.1.0.tar.gz
Upload date: May 12, 2026
Size: 36.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mosaic_temporal_gpu-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`43d262b3e8eaab4503a8b137d42a6d7b85529a68b9735da8ae94f91652340d30`
MD5	`f2d110f3f2fdaedd39f718af6e5156b0`
BLAKE2b-256	`0dbd2c82729a87935cb500c73e4b9c9ecf0005ae0dd7ecdfde8aa09fb0c7714a`
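
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: sha256_of_file and matches_published_hash are hypothetical helper names, and the expected digest is the sdist value from the table.

```python
import hashlib
from pathlib import Path

# SHA256 of mosaic_temporal_gpu-0.1.0.tar.gz as published in the table above.
EXPECTED_SDIST_SHA256 = (
    "43d262b3e8eaab4503a8b137d42a6d7b85529a68b9735da8ae94f91652340d30"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected
```

pip performs the same comparison itself when a requirements file pins hashes and is installed with --require-hashes.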

Provenance

The following attestation bundles were made for mosaic_temporal_gpu-0.1.0.tar.gz:

Publisher: release.yml on hinanohart/mosaic-temporal-gpu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mosaic_temporal_gpu-0.1.0.tar.gz
- Subject digest: 43d262b3e8eaab4503a8b137d42a6d7b85529a68b9735da8ae94f91652340d30
- Sigstore transparency entry: 1520054037
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: hinanohart/mosaic-temporal-gpu@a34091e2328c48e2d026d3c0129ee4cf8ddd0a2d
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/hinanohart
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a34091e2328c48e2d026d3c0129ee4cf8ddd0a2d
- Trigger Event: push

File details

Details for the file mosaic_temporal_gpu-0.1.0-py3-none-any.whl.

File metadata

Download URL: mosaic_temporal_gpu-0.1.0-py3-none-any.whl
Upload date: May 12, 2026
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mosaic_temporal_gpu-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2cb34b357f248a245f18e46e44b4cedfd4e5348a2b71c1c1691f3d046855bd07`
MD5	`ec706f825904ccc2004f711102719839`
BLAKE2b-256	`3058be40084e074c88db7793f68e594394e91e6e36a70f17cdd6feaf303e27ab`

Provenance

The following attestation bundles were made for mosaic_temporal_gpu-0.1.0-py3-none-any.whl:

Publisher: release.yml on hinanohart/mosaic-temporal-gpu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mosaic_temporal_gpu-0.1.0-py3-none-any.whl
- Subject digest: 2cb34b357f248a245f18e46e44b4cedfd4e5348a2b71c1c1691f3d046855bd07
- Sigstore transparency entry: 1520054047
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: hinanohart/mosaic-temporal-gpu@a34091e2328c48e2d026d3c0129ee4cf8ddd0a2d
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/hinanohart
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a34091e2328c48e2d026d3c0129ee4cf8ddd0a2d
- Trigger Event: push
