Zero-PyAV macOS H.264 encode (Apple VideoToolbox) for pdum.rfb

habemus-papadum-vtenc (import pdum.vtenc)

Encodes host NV12 frames to H.264 Annex B on macOS via Apple's VideoToolbox (VTCompressionSession), with no PyAV and no ffmpeg. It is the companion encoder for pdum.rfb (PyPI: habemus-papadum-rfb) on Apple Silicon, and the counterpart of habemus-papadum-nvenc on NVIDIA. A uv workspace member of this repo. Design notes: docs/mlx_metal_videotoolbox_encoder_design.md.

Why it exists:

  1. Hardware H.264 on macOS without PyAV. VideoToolbox is Apple's framework for the hardware video encoder on Apple Silicon; this package binds it directly, so the GPU path needs no ffmpeg layer.
  2. MLX-friendly. Its encode() takes any Python buffer-protocol object, so an evaluated MLX mx.array (Apple-Silicon unified memory) feeds it directly.

What's ours

Everything is ours — there is no vendored SDK (VideoToolbox/CoreVideo/CoreMedia are macOS system frameworks):

src/cpp/vtenc_ext.mm        OURS — the only native code; a thin pybind11 binding over
                            VTCompressionSession (Objective-C++).
src/pdum/vtenc/__init__.py  OURS — Python surface + single-extension loader.
CMakeLists.txt              OURS — pybind11 3.0.4; -framework links; one _vtenc module.
build-wheel.sh              OURS — self-contained wheel build (delocate).

Behaviour (matches the pdum.rfb invariants)

  • NV12 in → H.264 Annex B out (start codes, in-band SPS/PPS on every IDR — what the browser's WebCodecs VideoDecoder wants).
  • Low-latency with no frame reordering (no B-frames, so output order == input order) and synchronous 1-in-1-out: each encode() returns the access unit of the frame it was given (CompleteFrames after each submit). This is required for correct seq attribution.
  • BT.601 limited range VUI (matches pdum.rfb's gpu.rgb_to_nv12 kernel), so a browser decodes the color correctly.
  • Fixed-resolution, even dimensions; one VTCompressionSession per instance.
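
To check the Annex B invariant above on real output, it helps to split the byte stream into NAL units and look at their types. The following is a minimal sketch, not part of pdum.vtenc: split_annexb and nal_types are hypothetical helper names, and the sample bytes are synthetic rather than actual encoder output.

```python
# Hypothetical helpers (not part of pdum.vtenc) for inspecting Annex B bytes.

START_CODE = b"\x00\x00\x01"  # a 4-byte start code is this with one leading zero

def split_annexb(data: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units, start codes removed."""
    nals = []
    pos = data.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = data.find(START_CODE, begin)
        end = len(data) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros belong to the
        # next (4-byte) start code or to stream padding.
        nal = data[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals

def nal_types(data: bytes) -> list[int]:
    """Return nal_unit_type (low 5 bits of the NAL header) for each NAL unit."""
    return [nal[0] & 0x1F for nal in split_annexb(data)]

# Synthetic access unit: SPS (7), PPS (8), IDR slice (5).
sample = (
    b"\x00\x00\x00\x01\x67\x42\x00\x28"
    b"\x00\x00\x00\x01\x68\xce"
    b"\x00\x00\x01\x65\x88"
)
print(nal_types(sample))
```

With in-band parameter sets, a keyframe's access unit would be expected to contain types 7 and 8 ahead of the type 5 slice; other NAL units (for example SEI, type 6) may also appear.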

Usage

import numpy as np
from pdum.vtenc import VtEncoder

enc = VtEncoder(1920, 1080, fps=30, bitrate=12_000_000)
nv12 = np.zeros((1080 * 3 // 2, 1920), dtype=np.uint8)   # contiguous NV12 (Y then UV)
# ... fill nv12 (e.g. from an evaluated MLX array) ...
annexb = enc.encode(nv12, force_idr=True)                # bytes; H.264 Annex B
annexb += enc.flush()
print(enc.codec_string)                                  # e.g. "avc1.420028" (from the SPS)
enc.close()

encode() accepts any contiguous (H*3//2, W) uint8 buffer-protocol object — numpy or an evaluated MLX mx.array (call mx.eval(frame) first; MLX is lazy).
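
As an illustration of the expected layout, the sketch below builds a solid-color NV12 frame using only the standard library: a full-resolution Y plane followed by a half-height plane of interleaved Cb/Cr pairs. The conversion uses the standard BT.601 limited-range coefficients; whether it matches pdum.rfb's gpu.rgb_to_nv12 kernel bit-for-bit is an assumption, and both function names are hypothetical.

```python
# Hypothetical helpers (not part of pdum.vtenc): BT.601 limited-range NV12.

def rgb_to_ycbcr_bt601_limited(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert 8-bit RGB to 8-bit Y'CbCr, BT.601, limited (16-235/240) range."""
    y = 16 + (65.481 * r + 128.553 * g + 24.966 * b) / 255
    cb = 128 + (-37.797 * r - 74.203 * g + 112.0 * b) / 255
    cr = 128 + (112.0 * r - 93.786 * g - 18.214 * b) / 255
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

def solid_nv12_frame(width: int, height: int, rgb: tuple[int, int, int]) -> bytearray:
    """Return a contiguous NV12 frame (Y plane, then interleaved CbCr plane)."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even dimensions")
    y, cb, cr = rgb_to_ycbcr_bt601_limited(*rgb)
    luma = bytes([y]) * (width * height)
    # One (Cb, Cr) pair per 2x2 block of pixels.
    chroma = bytes([cb, cr]) * ((width // 2) * (height // 2))
    return bytearray(luma + chroma)

frame = solid_nv12_frame(1920, 1080, (255, 0, 0))  # solid red
# View the flat buffer with the (H*3//2, W) shape described above.
view = memoryview(frame).cast("B", [1080 * 3 // 2, 1920])
print(len(frame), view.shape)
```

Whether encode() accepts a shaped memoryview like view, as opposed to a numpy array, has not been verified here; wrapping the same bytes with numpy and reshaping to (H*3//2, W) is the form the usage example shows.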

VtEncoder.codec_string is the avc1.PPCCLL string derived from the actual emitted SPS. VideoToolbox picks the level from the resolution, so the string is not a constant: 1080p Baseline is avc1.420028, not avc1.42E01F. It is empty until the first keyframe.
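
For reference, the three hex pairs in avc1.PPCCLL are the profile_idc, constraint-flags, and level_idc bytes that follow the NAL header in an H.264 SPS. The sketch below shows how such a string can be formed and read back. It is an illustration under that assumption, not the package's actual implementation, and both function names are hypothetical.

```python
# Hypothetical helpers (not part of pdum.vtenc) for avc1.PPCCLL strings.

def codec_string_from_sps(sps_nal: bytes) -> str:
    """Build 'avc1.PPCCLL' from an SPS NAL unit (start code already removed)."""
    if len(sps_nal) < 4 or (sps_nal[0] & 0x1F) != 7:
        raise ValueError("not an SPS NAL unit")
    profile_idc, constraints, level_idc = sps_nal[1], sps_nal[2], sps_nal[3]
    return f"avc1.{profile_idc:02X}{constraints:02X}{level_idc:02X}"

def parse_codec_string(codec: str) -> tuple[int, int, float]:
    """Return (profile_idc, constraint flags, level) from 'avc1.PPCCLL'."""
    prefix, _, hexpart = codec.partition(".")
    if prefix != "avc1" or len(hexpart) != 6:
        raise ValueError(f"unexpected codec string: {codec!r}")
    profile_idc = int(hexpart[0:2], 16)
    constraints = int(hexpart[2:4], 16)
    level_idc = int(hexpart[4:6], 16)
    return profile_idc, constraints, level_idc / 10  # level_idc 40 means level 4.0

sps = bytes([0x67, 0x42, 0x00, 0x28, 0xAC])  # synthetic SPS prefix
print(codec_string_from_sps(sps))
print(parse_codec_string("avc1.420028"))
print(parse_codec_string("avc1.42E01F"))
```

Read this way, avc1.420028 is profile_idc 66 (Baseline) at level 4.0, while avc1.42E01F is the same profile_idc at level 3.1 with constraint flags 0xE0.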

Build & test (local, CMake)

cmake -S . -B build -G Ninja
cmake --build build -j

Build wheels (maintainer)

./build-wheel.sh                                 # cp314 -> dist/habemus_papadum_vtenc-*.whl
PYTHON_VERSIONS="3.12 3.13 3.14" ./build-wheel.sh

Requires only Xcode Command Line Tools (clang + the macOS SDK frameworks); the full Metal toolchain is not needed for v1. The wheel bundles nothing beyond the extension — the frameworks come from macOS, as they must. Publishing to PyPI is done by scripts/publish.sh, not from CI.

Scope / caveats

  • Fixed-resolution NV12 in, Annex B out, one encoder per instance. H.264 only (HEVC is a follow-up). No EncoderBackend/serve() wiring yet — that's the pdum.rfb integration.
  • Input is a host-visible (CPU / unified-memory) NV12 buffer, memcpy'd into an encoder-owned CVPixelBuffer. Wrapping an MLX unified-memory buffer as the CVPixelBuffer backing directly (true zero-copy) is a follow-up.
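
The copy step in the last caveat can be pictured as a row-by-row plane copy. The sketch below is plain Python with hypothetical names, not the Objective-C++ in vtenc_ext.mm. It assumes the destination rows may be padded to a stride larger than the frame width, which is why a single flat copy is not always enough.

```python
# Hypothetical illustration (not the package's native code): copy a tightly
# packed plane into a destination whose rows are padded to dst_stride bytes.

def copy_plane(src, width: int, rows: int, dst: bytearray, dst_stride: int) -> None:
    """Copy rows of `width` bytes from src into dst, honoring the row stride."""
    if dst_stride < width:
        raise ValueError("destination stride is smaller than the row width")
    for row in range(rows):
        s = row * width          # source rows are tightly packed
        d = row * dst_stride     # destination rows start every dst_stride bytes
        dst[d:d + width] = src[s:s + width]

# Tiny example: a 3x2 plane copied into a buffer with a 4-byte row stride.
src = bytes([1, 2, 3, 4, 5, 6])
dst = bytearray(8)
copy_plane(src, width=3, rows=2, dst=dst, dst_stride=4)
print(list(dst))
```

For NV12 this would run twice per frame: once for the Y plane (height rows) and once for the interleaved CbCr plane (height // 2 rows of the same byte width).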

Source Distributions

No source distribution files are available for this release.

Built Distributions

habemus_papadum_vtenc-0.2.1-cp314-cp314-macosx_12_0_arm64.whl   92.6 kB   CPython 3.14, macOS 12.0+ ARM64
habemus_papadum_vtenc-0.2.1-cp313-cp313-macosx_12_0_arm64.whl   92.6 kB   CPython 3.13, macOS 12.0+ ARM64
habemus_papadum_vtenc-0.2.1-cp312-cp312-macosx_12_0_arm64.whl   92.5 kB   CPython 3.12, macOS 12.0+ ARM64

File hashes

habemus_papadum_vtenc-0.2.1-cp314-cp314-macosx_12_0_arm64.whl
  SHA256       ac4c9769b1e08f7f589cba884d370eba11701d1a48e6cb1b8bc195c125f5f01f
  MD5          a77431c24c0d1fddf1d69d9e7893ce3e
  BLAKE2b-256  6aa5f7018586ea2481d92355ced9400d797102cbfcb573eaebf0e96cd392d045

habemus_papadum_vtenc-0.2.1-cp313-cp313-macosx_12_0_arm64.whl
  SHA256       a4f408b491740d781e9f04a7ebc1168cab29dcf51c2f7275662848fc23c74017
  MD5          aac054c8fb5b2ba1870ecc6db7809d67
  BLAKE2b-256  7c3b428be71cc3c5f9c6b394c3fe0036e120827e57e4b2953aee4030b70636c2

habemus_papadum_vtenc-0.2.1-cp312-cp312-macosx_12_0_arm64.whl
  SHA256       c595299379f520ec2b068b3d58e836b0a0884b5811e6b8a5f836a1cabb2288fb
  MD5          212ffa232193022cf514956c8f07417f
  BLAKE2b-256  ac8b3f44fb46e9479d1fe61102baebbf8531ccd67fd93e10e39fec6adffc605d
