
Zero-PyAV macOS H.264 encode (Apple VideoToolbox) for pdum.rfb


habemus-papadum-vtenc (import pdum.vtenc)

macOS host NV12 → H.264 Annex B via Apple's VideoToolbox (VTCompressionSession), with no PyAV and no ffmpeg. The companion encoder for pdum.rfb (PyPI: habemus-papadum-rfb) on Apple Silicon — the counterpart of habemus-papadum-nvenc on NVIDIA. A uv workspace member of this repo. Design notes: docs/mlx_metal_videotoolbox_encoder_design.md.

Why it exists:

  1. Hardware H.264 on macOS without PyAV. VideoToolbox is the macOS framework that drives the Apple-Silicon hardware video encoder; this package binds it directly, so the GPU path needs no ffmpeg layer.
  2. MLX-friendly. Its encode() takes any Python buffer-protocol object, so an evaluated MLX mx.array (Apple-Silicon unified memory) feeds it directly.
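Because encode() only needs the buffer protocol, the checks a consumer can make before reading a frame are easy to express with Python's memoryview. The following is a minimal sketch of such checks; as_nv12_view is a hypothetical helper, not part of pdum.vtenc, and the binding's real validation lives in native code.

```python
def as_nv12_view(frame, width: int, height: int) -> memoryview:
    """Return a flat byte view of `frame` after checking it looks like one NV12 frame.

    Hypothetical helper: illustrates buffer-protocol checks, not pdum.vtenc's own code.
    """
    if width <= 0 or height <= 0 or width % 2 or height % 2:
        raise ValueError("NV12 needs positive, even width and height")
    view = memoryview(frame)  # any buffer-protocol object: bytes, bytearray, numpy, ...
    if view.format != "B" or view.itemsize != 1:
        raise ValueError(f"expected uint8 data, got format {view.format!r}")
    if not view.c_contiguous:
        raise ValueError("frame must be C-contiguous (Y plane followed by interleaved UV)")
    expected = width * height * 3 // 2  # W*H luma bytes + W*H/2 chroma bytes
    if view.nbytes != expected:
        raise ValueError(f"expected {expected} bytes, got {view.nbytes}")
    return view.cast("B")  # flatten e.g. (H*3//2, W) to 1-D without copying
```

A strided numpy slice such as arr[:, ::2] fails the contiguity check, which is why an MLX array must be evaluated (and contiguous) before it is handed over.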

What's ours

Everything is ours — there is no vendored SDK (VideoToolbox/CoreVideo/CoreMedia are macOS system frameworks):

src/cpp/vtenc_ext.mm        OURS — the only native code; a thin pybind11 binding over
                            VTCompressionSession (Objective-C++).
src/pdum/vtenc/__init__.py  OURS — Python surface + single-extension loader.
CMakeLists.txt              OURS — pybind11 3.0.4; -framework links; one _vtenc module.
build-wheel.sh              OURS — self-contained wheel build (delocate).

Behaviour (matches the pdum.rfb invariants)

  • NV12 in → H.264 Annex B out (start codes, in-band SPS/PPS on every IDR — what the browser's WebCodecs VideoDecoder wants).
  • Low-latency, no frame reordering (no B-frames ⇒ output order == input order) and synchronous 1-in-1-out: each encode() returns its own frame's access unit (VTCompressionSessionCompleteFrames is called after each submit). This is required for correct seq attribution.
  • BT.601 limited-range VUI (matching pdum.rfb's gpu.rgb_to_nv12 kernel), so a browser decodes the color correctly.
  • Fixed-resolution, even dimensions; one VTCompressionSession per instance.
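The in-band SPS/PPS property can be checked from Python using only the H.264 Annex B byte-stream format: split on 00 00 01 start codes, read nal_unit_type from the low five bits of each NAL header, and require an SPS (7) and a PPS (8) before the IDR slice (5). A minimal sketch; split_annexb, nal_types and idr_has_parameter_sets are hypothetical helpers, not part of pdum.vtenc.

```python
SPS, PPS, IDR = 7, 8, 5  # nal_unit_type values from the H.264 spec


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units, dropping the start codes."""
    nals = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        start = pos + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves one extra 0x00 at the end of the
        # previous unit; a NAL unit never ends in 0x00, so trailing zeros can be stripped.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def nal_types(stream: bytes) -> list[int]:
    """nal_unit_type of each NAL unit, in stream order."""
    return [nal[0] & 0x1F for nal in split_annexb(stream)]


def idr_has_parameter_sets(access_unit: bytes) -> bool:
    """True if an SPS and a PPS precede the first IDR slice (or there is no IDR)."""
    types = nal_types(access_unit)
    if IDR not in types:
        return True  # non-IDR frames do not need to repeat SPS/PPS
    before = types[: types.index(IDR)]
    return SPS in before and PPS in before
```

If the behaviour above holds, idr_has_parameter_sets(enc.encode(nv12, force_idr=True)) should be True; the helper tolerates extra units such as SEI or access unit delimiters without assuming VideoToolbox emits them.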

Usage

import numpy as np
from pdum.vtenc import VtEncoder

enc = VtEncoder(1920, 1080, fps=30, bitrate=12_000_000)
nv12 = np.zeros((1080 * 3 // 2, 1920), dtype=np.uint8)   # contiguous NV12 (Y then UV)
# ... fill nv12 (e.g. from an evaluated MLX array) ...
annexb = enc.encode(nv12, force_idr=True)                # bytes; H.264 Annex B
annexb += enc.flush()
print(enc.codec_string)                                  # e.g. "avc1.420028" (from the SPS)
enc.close()

encode() accepts any contiguous (H*3//2, W) uint8 buffer-protocol object — numpy or an evaluated MLX mx.array (call mx.eval(frame) first; MLX is lazy).
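To fill the buffer without MLX or numpy, the following slow, pure-Python sketch converts packed RGB to NV12 using BT.601 limited-range coefficients (Y in [16, 235], Cb/Cr in [16, 240]) and simple 2x2 box-averaged chroma. It shows the layout encode() expects (W×H luma bytes, then H/2 rows of W interleaved Cb,Cr bytes). It is not pdum.rfb's gpu.rgb_to_nv12 kernel, whose rounding and chroma siting may differ; rgb_to_nv12 is a hypothetical helper.

```python
def rgb_to_nv12(rgb: bytes, width: int, height: int) -> bytearray:
    """Convert packed 8-bit RGB (3 bytes/pixel, row-major) to NV12 (BT.601, limited range)."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    if len(rgb) != width * height * 3:
        raise ValueError("rgb must hold width * height * 3 bytes")

    def pixel(x: int, y: int) -> tuple[float, float, float]:
        i = (y * width + x) * 3
        return rgb[i] / 255, rgb[i + 1] / 255, rgb[i + 2] / 255  # normalise to [0, 1]

    out = bytearray(width * height * 3 // 2)  # Y plane, then interleaved UV plane

    # Luma at full resolution: Y = 16 + 219 * (0.299 R + 0.587 G + 0.114 B)
    for y in range(height):
        for x in range(width):
            r, g, b = pixel(x, y)
            out[y * width + x] = round(16 + 65.481 * r + 128.553 * g + 24.966 * b)

    # Chroma at half resolution: one Cb,Cr pair per 2x2 block (box average)
    uv_base = width * height
    for cy in range(height // 2):
        for cx in range(width // 2):
            block = [pixel(2 * cx + dx, 2 * cy + dy) for dy in (0, 1) for dx in (0, 1)]
            r = sum(p[0] for p in block) / 4
            g = sum(p[1] for p in block) / 4
            b = sum(p[2] for p in block) / 4
            i = uv_base + cy * width + 2 * cx  # each chroma row is `width` bytes
            out[i] = round(128 - 37.797 * r - 74.203 * g + 112.0 * b)      # Cb
            out[i + 1] = round(128 + 112.0 * r - 93.786 * g - 18.214 * b)  # Cr
    return out
```

With numpy available, np.frombuffer(out, dtype=np.uint8).reshape(H * 3 // 2, W) gives the array shape used in the usage example.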

VtEncoder.codec_string is the avc1.PPCCLL string derived from the actual emitted SPS (VideoToolbox picks the level from the resolution, so it is not a constant — 1080p Baseline is avc1.420028, not avc1.42E01F). Empty until the first keyframe.
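The avc1.PPCCLL string (RFC 6381) is three SPS bytes in hex: profile_idc, the constraint-flag byte and level_idc, which directly follow the one-byte NAL header. A minimal sketch of that derivation, assuming an SPS NAL unit with its start code already removed; avc1_codec_string is a hypothetical helper and not necessarily how the binding computes codec_string.

```python
def avc1_codec_string(sps_nal: bytes) -> str:
    """Build the RFC 6381 'avc1.PPCCLL' string from an H.264 SPS NAL unit (no start code)."""
    if len(sps_nal) < 4 or (sps_nal[0] & 0x1F) != 7:  # nal_unit_type 7 == SPS
        raise ValueError("not an SPS NAL unit")
    profile_idc = sps_nal[1]       # 0x42 == 66 == Baseline
    constraint_flags = sps_nal[2]  # constraint_set0..5 flags + reserved bits
    level_idc = sps_nal[3]         # level * 10, e.g. 40 == level 4.0
    return f"avc1.{profile_idc:02X}{constraint_flags:02X}{level_idc:02X}"
```

Since level_idc is ten times the level, 0x28 (40) is level 4.0 and 0x1F (31) is level 3.1. A 1920×1080 frame is 8160 macroblocks (1920×1088 / 256), above level 3.1's 3600-macroblock limit, which is why 1080p cannot advertise avc1.42E01F.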

Build & test (local, CMake)

cmake -S . -B build -G Ninja
cmake --build build -j

Build wheels (maintainer)

./build-wheel.sh                                 # cp314 -> dist/habemus_papadum_vtenc-*.whl
PYTHON_VERSIONS="3.12 3.13 3.14" ./build-wheel.sh

Requires only Xcode Command Line Tools (clang + the macOS SDK frameworks); the full Metal toolchain is not needed for v1. The wheel bundles nothing beyond the extension — the frameworks come from macOS, as they must. Publishing to PyPI is done by scripts/publish.sh, not from CI.

Scope / caveats

  • Fixed-resolution NV12 in, Annex B out, one encoder per instance. H.264 only (HEVC is a follow-up). No EncoderBackend/serve() wiring yet — that's the pdum.rfb integration.
  • Input is a host-visible (CPU / unified-memory) NV12 buffer, memcpy'd into an encoder-owned CVPixelBuffer. Wrapping an MLX unified-memory buffer as the CVPixelBuffer backing directly (true zero-copy) is a follow-up.
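A CVPixelBuffer plane can have more bytes per row than the image is wide (CVPixelBufferGetBytesPerRowOfPlane reports the actual stride), so copying a tightly packed frame in is, in general, a row-by-row copy rather than one memcpy. This README does not say how vtenc_ext.mm handles it; the following is only a pure-Python model of the idea, with hypothetical names and bytearrays standing in for the two planes.

```python
def copy_nv12_rows(src, width: int, height: int,
                   y_plane: bytearray, y_stride: int,
                   uv_plane: bytearray, uv_stride: int) -> None:
    """Copy a packed NV12 frame into a luma plane and a chroma plane with padded rows."""
    src = memoryview(src).cast("B")  # flat byte view, no copy
    if len(src) != width * height * 3 // 2:
        raise ValueError("src is not one packed NV12 frame")
    if y_stride < width or uv_stride < width:
        raise ValueError("a plane stride cannot be narrower than the image row")
    if len(y_plane) < y_stride * height or len(uv_plane) < uv_stride * (height // 2):
        raise ValueError("destination plane too small")

    # Luma: `height` rows of `width` bytes each.
    for row in range(height):
        s, d = row * width, row * y_stride
        y_plane[d:d + width] = src[s:s + width]

    # Chroma: height/2 rows, each holding width/2 interleaved (Cb, Cr) byte pairs.
    uv_base = width * height
    for row in range(height // 2):
        s, d = uv_base + row * width, row * uv_stride
        uv_plane[d:d + width] = src[s:s + width]
```

When both strides equal the width, each plane is contiguous and the loops reduce to one copy per plane.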



Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


  • habemus_papadum_vtenc-0.3.0-cp314-cp314-macosx_12_0_arm64.whl (92.6 kB): CPython 3.14, macOS 12.0+ ARM64
  • habemus_papadum_vtenc-0.3.0-cp313-cp313-macosx_12_0_arm64.whl (92.6 kB): CPython 3.13, macOS 12.0+ ARM64
  • habemus_papadum_vtenc-0.3.0-cp312-cp312-macosx_12_0_arm64.whl (92.5 kB): CPython 3.12, macOS 12.0+ ARM64

File details

Details for the file habemus_papadum_vtenc-0.3.0-cp314-cp314-macosx_12_0_arm64.whl.


File hashes

Hashes for habemus_papadum_vtenc-0.3.0-cp314-cp314-macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 c85ffa478a6e803882feffdee3904f94fd6bd034dba7aa9005d47ac4008a0907
MD5 24d915e58a52c87f08445a24e50e16d3
BLAKE2b-256 61c9c21f2c4e56351f86534718526ee6cb5d91ce4c8c19ab998368c801a59cf5


File details

Details for the file habemus_papadum_vtenc-0.3.0-cp313-cp313-macosx_12_0_arm64.whl.


File hashes

Hashes for habemus_papadum_vtenc-0.3.0-cp313-cp313-macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 aada6be76c8aed64f1d319ea0d2ae7b2eb2143707049aed270f7ba2f224054fb
MD5 445cc172de612b0ea35b52b44315e543
BLAKE2b-256 922071422d427a1b409e4fde57689b00bcf9fc407b57928c43abc36f0d5a5d6d


File details

Details for the file habemus_papadum_vtenc-0.3.0-cp312-cp312-macosx_12_0_arm64.whl.


File hashes

Hashes for habemus_papadum_vtenc-0.3.0-cp312-cp312-macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 59f474b4f43866fcea3d16c3b0a3c85da16ed9fa9fc055b3beb4331d381aedb2
MD5 4ce164e0961e0c6990e3687d8ac369bf
BLAKE2b-256 829e8f524459497659498b8646c37e87cb16a7bd8a2973d7fd9923c760bb5d07

