GPU-accelerated, cross-platform CLI to blur human faces in MP4 videos using PyTorch (YOLOv8-face / MTCNN backends), with audio passthrough and built-in evaluation metrics.
Blurface
Blurface is a cross-platform command-line tool — and a tiny Python library — that blurs every human face in an MP4 video with a fully GPU-accelerated PyTorch pipeline. The default detector is YOLOv8-face via ultralytics (a state-of-the-art single-stage detector, robust on moving and partially-occluded faces); a lighter facenet-pytorch MTCNN backend is available as a fallback. The pixel mosaic is computed on the GPU with torch.nn.functional.interpolate, and the original audio track is re-muxed back into the output via ffmpeg. A built-in evaluation module emits a CSV, a JSON metrics report, and six PNG plots so you can quantify every run.
Highlights
- Pure PyTorch, end-to-end. No TensorFlow anywhere on the hot path. Detection and mosaic both live on the same torch.device.
- State-of-the-art detector for motion. Default backend is YOLOv8-face: single forward pass per frame, low jitter on moving faces, no transformers import noise.
- Cross-platform GPU acceleration. Auto-selects CUDA on Windows / Linux, MPS on Apple Silicon, CPU otherwise, with graceful fallback.
- Batched inference + FP16. Set --batch-size to whatever your GPU can hold; add --half for FP16 on CUDA.
- Rectangular or elliptical mosaic with a configurable block size.
- Audio passthrough via the ffmpeg CLI (preferred) or ffmpeg-python (fallback).
- Built-in evaluation. Per-frame metrics CSV + JSON summary + six PNG plots and an optional CPU-vs-GPU benchmark.
- Three console scripts. blurface, blurface-eval, and blurface-install-gpu are registered on install.
Installation
Blurface targets Python ≥ 3.9 and is verified on Windows, Linux, and macOS.
1. Create / activate a Python environment
# Recommended: a clean conda env
conda create -n blurface python=3.11 -y
conda activate blurface
2. Install PyTorch — with CUDA wheels if you have an NVIDIA GPU
This is the single most common failure point. On Windows, a plain pip install torch installs the CPU-only build, and --device cuda then refuses to run.
NVIDIA GPU (recommended) — CUDA 12.1 wheels:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
Newer GPUs (e.g. RTX 50-series / Blackwell, sm_120 compute): Standard CUDA 12.1/12.4 builds lack kernels for your GPU's architecture and crash with CUDA error: no kernel image is available. Install the PyTorch nightly bundled with CUDA 13.0 (or newer):
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130 --upgrade
If your NVIDIA driver is older, you may need cu118 instead. Check with nvidia-smi and the official PyTorch install matrix.
Apple Silicon (MPS):
pip install torch torchvision
CPU only:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
3. Install Blurface
From PyPI:
pip install blurface
Or from a git clone (editable):
git clone https://github.com/Ezharjan/blurface.git
cd blurface
pip install -e .
This pulls ultralytics, opencv-python, ffmpeg-python, matplotlib, pandas, tqdm, Pillow, … and registers three console scripts: blurface, blurface-eval, blurface-install-gpu.
Optional MTCNN fallback backend:
pip install "blurface[mtcnn]"
4. Install FFmpeg
The audio re-mux step needs the ffmpeg binary on PATH:
| Platform | Command |
|---|---|
| Windows | choco install ffmpeg (or download from https://ffmpeg.org/download.html and add ffmpeg.exe to PATH) |
| macOS | brew install ffmpeg |
| Linux | sudo apt install ffmpeg |
If ffmpeg isn't available the pipeline still produces a video-only MP4 — it just skips the audio.
Verify your GPU
After installation, run the diagnostic:
blurface-install-gpu
You should see something like:
========================================================================
PyTorch
========================================================================
torch : 2.4.1+cu121
CUDA build : 12.1
cuda avail. : True
device[0] : NVIDIA GeForce RTX 4090 (sm_89, 24.0 GB)
If cuda avail. is False but nvidia-smi works, you're on the CPU build of torch — repair it with:
blurface-install-gpu --fix --cuda 12.1
The same script also accepts --cpu (force CPU wheels) and --nightly (use the PyTorch nightly index for very new architectures).
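The same availability check can be made from Python. The sketch below follows the CUDA, then MPS, then CPU priority that Blurface advertises; available_devices and pick_device are hypothetical helper names for illustration, not Blurface's own select_device, and a missing torch install is treated as CPU-only by assumption.

```python
def available_devices():
    """Return usable device names, best first (CUDA, then MPS, then CPU)."""
    try:
        import torch
    except ImportError:
        return ["cpu"]  # assumption: treat a missing torch as CPU-only
    devices = []
    if torch.cuda.is_available():
        devices.append("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent on old torch builds
    if mps is not None and mps.is_available():
        devices.append("mps")
    devices.append("cpu")
    return devices


def pick_device(preferred="auto", allow_cpu_fallback=True):
    """Resolve 'auto' or an explicit request to a device name."""
    devices = available_devices()
    if preferred == "auto":
        return devices[0]
    if preferred in devices:
        return preferred
    if allow_cpu_fallback:
        return "cpu"
    raise RuntimeError(f"{preferred} requested but not available")


print(pick_device("auto"))
```

If this prints cpu on a machine where nvidia-smi works, the installed torch is most likely the CPU build.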
Usage: blurface CLI
blurface <input.mp4> [options]
The most common flags:
| Flag | Default | Description |
|---|---|---|
| input | (none) | Path to the input MP4 video (required). |
| --output, -o | <stem><YYMMDDHHMM>.mp4 | Output file path. |
| --mosaic-size, -m | 10 | Mosaic block size in pixels; higher = coarser blur. |
| --blur-shape, -s | ellipse | ellipse or rectangle. |
| --device, -d | auto | auto, cuda, mps, or cpu. |
| --backend | auto | auto (→ yolo), yolo, or mtcnn. |
| --batch-size, -b | 8 | Frames per detection batch. |
| --half | off | FP16 inference on CUDA. |
| --confidence, -c | 0.5 | Minimum face confidence in [0, 1]. |
| --imgsz | 640 | YOLO inference image size. Raise for tiny faces, lower for speed. |
| --min-face-size | 20 | MTCNN minimum face edge in px. |
| --model-path | (none) | Local YOLO-face .pt file (skips the download). |
| --model-url | (none) | Custom URL for YOLO-face weights. |
| --no-cpu-fallback | off | Hard-fail when CUDA/MPS is requested but unavailable. |
| --report | (none) | Path for a JSON metrics report. |
| --plots-dir | (none) | If set, evaluation PNGs and CSV are written here. |
| --quiet / --verbose | off | Lower / raise the log level. |
| --version | (none) | Print the installed version and exit. |
Run blurface --help for the full reference and worked examples.
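The default output name listed in the table, <stem><YYMMDDHHMM>.mp4, can be reproduced roughly as follows. This is a sketch under assumptions: default_output_path is a hypothetical helper, and writing the file next to the input and using local time are guesses rather than documented behaviour.

```python
from datetime import datetime
from pathlib import Path


def default_output_path(input_path, now=None):
    """Build <stem><YYMMDDHHMM>.mp4 beside the input (assumed location)."""
    src = Path(input_path)
    stamp = (now or datetime.now()).strftime("%y%m%d%H%M")
    return src.with_name(f"{src.stem}{stamp}.mp4")


# A fixed timestamp keeps the example deterministic.
print(default_output_path("clips/input.mp4", datetime(2026, 1, 2, 3, 4)))
```

Passing --output explicitly avoids relying on the timestamped name at all.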
Worked examples
# 1. Defaults: ellipse mosaic, auto device, YOLOv8-face detector.
blurface input.mp4
# 2. Force CUDA, FP16, larger batch, custom output path.
blurface input.mp4 -d cuda -b 32 --half -o out/blurred.mp4
# 3. Coarser rectangular mosaic (block size 20).
blurface input.mp4 -m 20 -s rectangle
# 4. Use the MTCNN fallback backend (needs the [mtcnn] extra).
blurface input.mp4 --backend mtcnn
# 5. Emit a full JSON metrics report and a directory of PNG plots.
blurface input.mp4 --report out/report.json --plots-dir out/plots
# 6. Provide your own YOLO-face weights (skips the download).
blurface input.mp4 --model-path /path/to/yolov8n-face.pt
# 7. Raise the inference image size for lots of tiny faces.
blurface input.mp4 --imgsz 1280 --batch-size 4
# 8. Full evaluation: report + plots + CPU-vs-GPU benchmark
blurface-eval video.mp4 --output D:\blurface\out\blurred.mp4 --report-dir D:\blurface\out\report --device auto --batch-size 8 --benchmark --benchmark-frames 120
Python API
from blurface import FaceMosaicProcessor
from blurface.evaluate import render_plots
proc = FaceMosaicProcessor(
    device="auto",      # cuda > mps > cpu, with fallback
    backend="yolo",     # or "mtcnn", or "auto"
    batch_size=16,
    half=True,          # FP16 on CUDA (no-op elsewhere)
    imgsz=640,
    confidence=0.5,
)
report = proc.process_video(
    "input.mp4", "output.mp4",
    report_path="out/report.json",
    collect_metrics=True,
)
render_plots(report, "out/plots")
print(f"{report.realtime_fps:.1f} fps on {report.device} ({report.backend})")
Public objects re-exported from the top-level package:
- FaceMosaicProcessor: the pipeline.
- RunReport, FrameMetric: dataclasses returned by process_video.
- select_device(preferred, allow_cpu_fallback): the device picker.
- describe_device(device): human-readable device label.
- build_detector(...), YoloFaceDetector, MtcnnDetector: detection backends.
Pipeline internals
The video is processed in five clearly separated stages. Detection and mosaic run on the same torch.device to avoid host round-trips between them:
- Decode (CPU). cv2.VideoCapture reads MP4 frames as BGR uint8 numpy arrays. Frames are accumulated into a list of length --batch-size.
- Detect (device). The batch is converted to RGB and handed to the active detector backend. The detector returns, per frame, an (N, 4) array of [x1, y1, x2, y2] boxes in original pixel space and an (N,) array of confidences.
- Mosaic (device). Each frame is uploaded once to the device as a CHW float tensor (FP16 if --half). For every box:
  - the cropped face region is down-sampled to mosaic_size × mosaic_size with F.interpolate(mode="bilinear", align_corners=False);
  - it is then up-sampled back to the box size with F.interpolate(mode="nearest"), which gives the classic pixelation effect, computed in a single bilinear + nearest kernel pair;
  - for blur_shape="ellipse" an inscribed elliptical mask is built on-device ((x − cx)² / rx² + (y − cy)² / ry² ≤ 1) and the mosaic is alpha-blended over the original, so only the elliptical region is replaced and the corners of the bounding box are preserved.
- Encode (CPU). The blurred frame is clamped, cast back to uint8, transposed to HWC, copied to the CPU, and written to a temporary mp4v-encoded MP4 with cv2.VideoWriter.
- Mux (FFmpeg). Finally ffmpeg re-encodes the temporary video as H.264 (libx264, CRF 20, medium preset) and stream-copies the original audio track with -c:a copy -map 0:v:0 -map 1:a:0?. The audio is preserved bit-for-bit: no re-encoding, no quality loss, same codec / bitrate / sample rate as the source. If stream-copy is rejected (rare; happens when the source audio codec isn't allowed in the MP4 container, e.g. PCM) Blurface falls back to a 192 kbit/s AAC re-encode. ffprobe then verifies the output actually contains audio when the source did; mismatches raise rather than silently producing a muted file. If ffmpeg is missing and the source has audio, Blurface fails loudly with install instructions instead of dropping the audio.
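The mosaic stage can be sketched in a few lines of PyTorch. This is a minimal illustration of the bilinear down-sample, nearest up-sample, and elliptical blend described above, not Blurface's actual implementation: mosaic_box is a hypothetical name, box coordinates are assumed to be integers inside the frame, and the ellipse centre and radii are one reasonable choice among several.

```python
import torch
import torch.nn.functional as F


def mosaic_box(frame, box, mosaic_size=10, ellipse=True):
    """Pixelate one [x1, y1, x2, y2] box of a CHW float frame."""
    x1, y1, x2, y2 = box
    face = frame[:, y1:y2, x1:x2].unsqueeze(0)  # NCHW, as interpolate expects
    h, w = face.shape[-2:]
    # Down-sample to a tiny grid, then blow it back up with hard edges.
    small = F.interpolate(face, size=(mosaic_size, mosaic_size),
                          mode="bilinear", align_corners=False)
    blocky = F.interpolate(small, size=(h, w), mode="nearest").squeeze(0)
    if ellipse:
        ys = torch.arange(h, device=frame.device, dtype=frame.dtype).view(-1, 1)
        xs = torch.arange(w, device=frame.device, dtype=frame.dtype).view(1, -1)
        cy, cx = (h - 1) / 2, (w - 1) / 2  # assumed centre of the box
        ry, rx = h / 2, w / 2              # assumed inscribed radii
        mask = (((xs - cx) / rx) ** 2 + ((ys - cy) / ry) ** 2 <= 1).to(frame.dtype)
        # Keep the original pixels outside the ellipse (box corners).
        blocky = mask * blocky + (1 - mask) * face.squeeze(0)
    out = frame.clone()
    out[:, y1:y2, x1:x2] = blocky
    return out


frame = torch.rand(3, 64, 64)
blurred = mosaic_box(frame, (16, 16, 48, 48), mosaic_size=4)
print(blurred.shape)
```

The same function runs unchanged on a CUDA or MPS tensor, since every intermediate is created on frame.device.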
Throughout the run, optional per-frame metrics (detect / mosaic latency, GPU memory, face counts, mean confidence) are collected into a RunReport, which render_plots turns into PNG charts and a CSV.
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ ┌──────────┐
│ decode │ → │ detect │ → │ mosaic │ → │ encode │ → │ mux │
│ (cv2) │ │ (YOLO/MTCNN) │ │ (torch.F) │ │ (cv2) │ │ (ffmpeg) │
│ CPU │ │ device │ │ device │ │ CPU │ │ CPU │
└──────────┘ └──────────────┘ └──────────────┘ └──────────┘ └──────────┘
│ │
▼ ▼
per-frame metrics ──→ RunReport ──→ CSV / JSON / PNG plots
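For the mux stage, the ffmpeg invocation can be assembled as an argument list. The sketch below uses only the flags named in the stage description (libx264, CRF 20, medium preset, -c:a copy, the two -map selectors, and the 192 kbit/s AAC fallback); the function names, the -y overwrite flag, and the argument order are assumptions for illustration.

```python
import shutil
import subprocess


def build_mux_command(temp_video, source, output, copy_audio=True):
    """Return an ffmpeg argv: video from temp_video, audio from source."""
    audio = ["-c:a", "copy"] if copy_audio else ["-c:a", "aac", "-b:a", "192k"]
    return [
        "ffmpeg", "-y",
        "-i", str(temp_video),   # input 0: blurred, video-only file
        "-i", str(source),       # input 1: original file with audio
        "-map", "0:v:0", "-map", "1:a:0?",  # '?' tolerates a missing audio stream
        "-c:v", "libx264", "-crf", "20", "-preset", "medium",
        *audio,
        str(output),
    ]


def mux(temp_video, source, output):
    """Try stream-copy first, then fall back to an AAC re-encode."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    try:
        subprocess.run(build_mux_command(temp_video, source, output),
                       check=True, capture_output=True)
    except subprocess.CalledProcessError:
        subprocess.run(build_mux_command(temp_video, source, output, copy_audio=False),
                       check=True, capture_output=True)


print(" ".join(build_mux_command("tmp.mp4", "input.mp4", "output.mp4")))
```

Keeping the command builder separate from the subprocess call makes the flags easy to inspect without running ffmpeg.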
Performance knobs
- --batch-size is the single biggest lever once CUDA is enabled. Raise it until you hit your GPU's memory limit.
- --half roughly halves the detector's memory footprint on CUDA and is faster on Ampere/Ada/Hopper. It has no effect on CPU or MPS.
- --imgsz trades detector accuracy for speed. Default 640 is a good compromise; 1280 helps on tiny faces in 4K footage; 480 is markedly faster on tight latency budgets.
- --mosaic-size is not a speed knob (the down-sample target is tiny either way) but it changes the visual effect. 4–8 = strongly recognisable as pixelation; 12–20 = blocky, friendlier on small faces; 30+ = single coloured patch.
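To make the effect of --batch-size concrete: decoded frames are grouped into lists of that length before each detector call. A minimal sketch of the grouping, with a hypothetical batched helper that works over any frame iterator:

```python
from itertools import islice


def batched(frames, batch_size):
    """Yield lists of up to batch_size items; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(frames)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Stand-in integers play the role of decoded frames here.
print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Larger batches mean fewer detector calls per video, at the cost of holding more frames in memory at once.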
Detection methods explained
Blurface ships two interchangeable backends with the same detect(frames_rgb) API.
YOLOv8-face (default, --backend yolo)
A single-stage anchor-free detector built on Ultralytics' YOLOv8 backbone, fine-tuned on a face-detection dataset. Why it is the default:
- Single forward pass per frame. Detection is a single conv-net evaluation, so latency stays flat as the number of faces grows. Cascade detectors (MTCNN, Haar, etc.) keep proposing and refining candidates, which inflates per-frame cost on busy scenes.
- Robust to motion blur, profile angles and partial occlusion. The anchor-free head and the deep backbone learn richer face priors than the small classification networks inside MTCNN's P/R/O stages.
- Lower jitter across frames. Because the model is deeper and operates at a single scale per call, box positions are noticeably more stable from frame to frame than MTCNN's, giving smoother mosaics in the output.
- GPU-friendly. Batched inference on CUDA is the design point; FP16 is a one-flag switch.
Weights (yolov8n-face.pt, ~6 MB) are downloaded once from the akanametov/yolo-face release into ~/.cache/blurface/ and reused on subsequent runs. Override with --model-path or --model-url.
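The download-once behaviour can be sketched as a small cache helper. Everything here is illustrative: ensure_weights is a hypothetical name, the URL must be supplied by the caller, and writing to a .part file before renaming is an assumption chosen so that an interrupted download never leaves a truncated file under the final name.

```python
import os
from pathlib import Path
from urllib.request import urlretrieve


def ensure_weights(url, cache_dir=None, name="yolov8n-face.pt", fetch=urlretrieve):
    """Return the cached weights path, downloading only on a cache miss."""
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "blurface"
    target = cache_dir / name
    if target.is_file() and target.stat().st_size > 0:
        return target  # cache hit: reuse the existing file
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(name + ".part")
    fetch(url, str(partial))      # download to a temporary name first
    os.replace(partial, target)   # then move it into place in one step
    return target
```

The fetch parameter exists so the function can be exercised without network access, for example with a stub that writes a few bytes to the destination path.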
facenet-pytorch MTCNN (fallback, --backend mtcnn)
A three-stage cascade detector (P-Net → R-Net → O-Net) from facenet-pytorch. Useful when:
- you cannot install ultralytics (e.g. very old Python, restricted environments),
- you want a second opinion on a hard clip,
- you specifically need MTCNN's facial landmark output (landmarks are computed internally but not exposed by Blurface today),
- you're CPU-only and prefer MTCNN's lighter memory footprint.
Trade-offs: MTCNN is slower per frame on GPU than YOLOv8-face, less robust on motion-blurred or sideways faces, and produces more frame-to-frame jitter. The --min-face-size flag is honoured only by this backend.
Install with pip install "blurface[mtcnn]".
--backend auto
Tries YOLOv8-face first; if its ultralytics import or weight download fails, falls back to MTCNN. This is the default.
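The import half of that fallback can be sketched as follows. resolve_backend is a hypothetical name, and this version only reacts to a failed ultralytics import; the weight-download failure path mentioned above is not modelled.

```python
import importlib


def resolve_backend(requested="auto", importer=importlib.import_module):
    """Map 'auto' to 'yolo' when ultralytics imports, else to 'mtcnn'."""
    if requested != "auto":
        return requested  # explicit choices are passed through unchanged
    try:
        importer("ultralytics")
        return "yolo"
    except ImportError:
        return "mtcnn"


print(resolve_backend("auto"))
```

The importer argument is injectable so the fallback branch can be tested on a machine where ultralytics is installed.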
Evaluation: blurface-eval
blurface-eval runs the full pipeline and writes a complete report directory:
blurface-eval input.mp4 \
--output out/blurred.mp4 \
--report-dir out/report \
--device cuda --half --batch-size 16 \
--benchmark --benchmark-frames 240
It accepts the same backend / device / mosaic options as blurface, plus --benchmark and --benchmark-frames N, which produce a CPU-vs-GPU bar chart on a short subclip. Run blurface-eval --help for the full reference.
The output directory ends up looking like:
out/report/
├── report.json # full RunReport (incl. per-frame metrics)
├── summary.json # aggregate scorecard
├── per_frame_metrics.csv # one row per processed frame
├── summary.png # text scorecard, ready to share
├── faces_per_frame.png # detections across the timeline
├── latency_per_frame.png # detect vs mosaic vs total latency
├── fps_rolling.png # rolling throughput vs source FPS
├── gpu_memory.png # allocated GPU memory (CUDA only)
├── confidence_histogram.png # distribution of per-frame mean confidence
└── benchmark/ # only with --benchmark
├── cpu_vs_gpu.png
├── cpu_vs_gpu.json
├── benchmark_cpu.mp4
└── benchmark_cuda.mp4
Metrics reference
Every run produces, conceptually, three artefacts:
- report.json: the full RunReport dataclass, covering device, backend, source resolution / FPS, frames processed, processing FPS, total wall time, detect / mosaic / mux time breakdowns, total faces detected, average faces per frame, frames with faces, peak GPU memory, batch size, FP16 flag, mosaic configuration, confidence threshold, and the full per-frame metrics list.
- per_frame_metrics.csv: one row per processed frame with columns frame_idx, num_faces, mean_confidence, detect_ms, mosaic_ms, total_ms, gpu_mem_mb.
- PNG plots, each focused on a single question:
  - faces_per_frame.png: how many faces were detected across the timeline.
  - latency_per_frame.png: detect vs mosaic vs total latency per frame.
  - fps_rolling.png: rolling throughput, overlaid with the source FPS line and the run's average processing FPS.
  - gpu_memory.png: allocated GPU memory over time (CUDA only).
  - confidence_histogram.png: distribution of per-frame mean detection confidences (on frames that had faces).
  - summary.png: a monospaced text scorecard you can drop into a slide.
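The per-frame CSV is easy to post-process with the standard library. The sketch below assumes the file starts with a header row carrying the column names listed above; summarize is a hypothetical helper, and the sample rows are made up purely to show the shape of the data.

```python
import csv
import io
from statistics import mean


def summarize(csv_file):
    """Aggregate a per-frame metrics CSV into a few headline numbers."""
    rows = list(csv.DictReader(csv_file))
    total_ms = [float(r["total_ms"]) for r in rows]
    faces = [int(r["num_faces"]) for r in rows]
    return {
        "frames": len(rows),
        "frames_with_faces": sum(1 for n in faces if n > 0),
        "total_faces": sum(faces),
        "mean_total_ms": mean(total_ms),
        "implied_fps": 1000.0 / mean(total_ms),
    }


# Illustrative data only; real files come from --plots-dir or blurface-eval.
sample = io.StringIO(
    "frame_idx,num_faces,mean_confidence,detect_ms,mosaic_ms,total_ms,gpu_mem_mb\n"
    "0,2,0.9,15,5,20,100\n"
    "1,0,,25,5,30,100\n"
)
summary = summarize(sample)
print(summary)
```

For a real run, pass an open file handle instead of the in-memory sample, e.g. open("out/report/per_frame_metrics.csv", newline="").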
GPU diagnostic: blurface-install-gpu
A standalone helper to inspect and repair your PyTorch install:
# 1. Diagnose only (the default)
blurface-install-gpu
# 2. Reinstall with the right wheels for your CUDA driver
blurface-install-gpu --fix --cuda 12.1
# 3. Very new architectures (RTX 50-series / Blackwell, sm_120)
blurface-install-gpu --fix --nightly --cuda 13.0
# 4. Force the CPU build
blurface-install-gpu --fix --cpu
It reports Python, conda env, platform, PyTorch version + CUDA build, every visible CUDA device (with its compute capability and memory), MPS availability on Apple Silicon, the NVIDIA driver via nvidia-smi, and whether ffmpeg is on PATH. With --fix, it pip uninstalls torch + torchvision and reinstalls them from the appropriate wheel index.
Run as a module too: python -m blurface.install_gpu.
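The wheel index URLs used throughout this README follow a simple pattern, which the sketch below reproduces. wheel_index_url is a hypothetical helper written for illustration; it is not taken from blurface-install-gpu, whose internals may differ.

```python
def wheel_index_url(cuda="12.1", nightly=False, cpu=False):
    """Build a PyTorch wheel index URL from a CUDA version string."""
    base = "https://download.pytorch.org/whl"
    if nightly:
        base += "/nightly"
    # "12.1" becomes the tag "cu121"; the CPU build uses the tag "cpu".
    tag = "cpu" if cpu else "cu" + cuda.replace(".", "")
    return f"{base}/{tag}"


print(wheel_index_url("12.1"))
print(wheel_index_url("13.0", nightly=True))
print(wheel_index_url(cpu=True))
```

The resulting URL is what goes after --index-url in a manual pip install torch torchvision command.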
Testing
A minimal pytest suite ships with the repo. It builds a tiny synthetic clip and runs the pipeline end-to-end on CPU — no GPU or face dataset required.
pip install pytest
pytest -q
Tests live in tests/test_pipeline.py.
Troubleshooting
RuntimeError: CUDA requested but no CUDA device is available.
Your installed torch is the CPU build. Repair with the bundled diagnostic:
blurface-install-gpu --fix --cuda 12.1
…or manually:
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
CUDA error: no kernel image is available for execution on the device
Your GPU's compute capability is newer than the CUDA version your PyTorch was built against (typical on RTX 50-series / Blackwell). Use the nightly + CUDA 13 wheels:
blurface-install-gpu --fix --nightly --cuda 13.0
Disabling PyTorch because PyTorch >= 2.4 is required but found 2.2.2
That's a warning emitted by the transformers library when something else in your environment imports it. Blurface's default --backend yolo does not pull transformers in, so the warning is harmless. If you need --backend mtcnn with an old torch, upgrade torch (see above) or pin an older release with pip install "transformers<4.40".
ImportError: ultralytics is required for the YOLO backend.
pip install ultralytics — or simply pip install blurface, which already depends on it.
CUDA out of memory. Lower --batch-size, enable --half, or lower --imgsz.
No audio in the output. This should never happen silently in v0.2.0 — if the source has audio and ffmpeg can't preserve it, Blurface raises with install instructions. If you do see a muted output, first check: did the source have an audio track? (Run ffprobe -i your_input.mp4 and look for a Stream #0:1: Audio: line.) If the source genuinely has no audio, the muted output is correct. If the source does have audio and you got a muted output anyway, please file a bug at https://github.com/Ezharjan/blurface/issues.
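The audio-track check can also be scripted. The sketch below shells out to ffprobe with standard options that print one codec_type line per audio stream; the helper names are hypothetical, and this is an independent illustration rather than the verification code Blurface runs internally.

```python
import subprocess


def ffprobe_audio_command(path):
    """Return an ffprobe argv that lists the codec_type of each audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",
        str(path),
    ]


def parse_has_audio(ffprobe_stdout):
    """True if ffprobe reported at least one audio stream."""
    return any(line.strip().startswith("audio")
               for line in ffprobe_stdout.splitlines())


def has_audio(path):
    """Run ffprobe on path (requires ffprobe on PATH)."""
    result = subprocess.run(ffprobe_audio_command(path),
                            capture_output=True, text=True, check=True)
    return parse_has_audio(result.stdout)
```

Running has_audio on both the source and the output makes it quick to tell whether a muted result reflects a source with no audio track.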
macOS MPS warnings about unimplemented ops. Harmless — those ops automatically fall back to CPU.
The downloaded YOLO weights file is corrupted / partial. Delete ~/.cache/blurface/yolov8n-face.pt and let the next run re-download, or pass --model-path to use a known-good copy.
Changelog
0.2.0 — 2026
- Audio preservation (bug fix). Previously, three silent-failure paths in the mux step could quietly produce a muted output: the outer wrapper caught any ffmpeg error and copied the audio-less temp file, the ffmpeg-python fallback re-encoded video alone on failure, and even on the happy path the audio was re-encoded to AAC 192k (a quality loss). The mux now:
  - Stream-copies the original audio (-c:a copy): preserved bit-for-bit, same codec / bitrate / sample rate as the source. No re-encoding.
  - Probes the source with ffprobe to decide whether to expect audio at all.
  - Falls back to AAC 192k only if stream-copy is rejected by the MP4 container.
  - Verifies the output actually contains audio when the source did; raises if not.
  - Raises a clear, actionable error (with install instructions) when ffmpeg is missing and the source has audio, instead of silently dropping the track.
- Packaging: blurface-install-gpu now ships inside the installed package, so the console script works after pip install (it was broken before). PyPI metadata (project_urls, keywords, full classifiers, MANIFEST, pyproject.toml) brought up to standard.
- Pipeline: fixed an aggregation bug where RunReport.total_faces_detected, frames_with_faces, detect_time_s, and mosaic_time_s were 0 when process_video(..., collect_metrics=False). They are now tracked independently of the per-frame list.
- Report: new frames_processed and total_faces_detected fields on RunReport; summary.json and the PNG scorecard updated to match.
- CLI: richer --help output (epilog with worked examples), new --verbose flag, more actionable error messages, validated --confidence range, cleaner exit codes (0/1/2/130).
- blurface-install-gpu: lists every visible CUDA device (with compute capability + memory), reports ffmpeg presence, gains --nightly for new architectures, gains a module form (python -m blurface.install_gpu).
- blurface-eval: aligned defaults with blurface (confidence 0.5, benchmark-frames 240), exposes --backend, --imgsz, --half, --quiet.
- Public API: top-level package re-exports select_device, describe_device, build_detector, YoloFaceDetector, MtcnnDetector alongside the existing FaceMosaicProcessor, RunReport, FrameMetric.
- Docs: README rewritten with explicit pipeline-internals and detection-methods sections.
0.1.0
- Initial public release: GPU PyTorch pipeline, YOLOv8-face + MTCNN backends, FFmpeg audio re-mux, evaluation plots, blurface and blurface-eval CLIs.
License
MIT — see LICENSE.
Contact
Issues and PRs welcome at https://github.com/Ezharjan/blurface.