PyMTCNN

High-performance cross-platform MTCNN face detection with CUDA and Apple Neural Engine support.

Overview

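PyMTCNN is a pure Python implementation of MTCNN (Multi-task Cascaded Convolutional Networks) with multiple backends (CoreML, ONNX Runtime with CUDA, and ONNX Runtime on CPU), so it can use the fastest option available on each hardware platform. It achieves a 175.7x speedup over the baseline pure-Python CNN implementation (0.195 FPS to 34.26 FPS with CoreML on Apple Silicon) while maintaining a 95% mean IoU against the C++ OpenFace baseline.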

Key Features

  • Cross-Platform: Works on Windows, Linux, and macOS
  • Multi-Backend: Auto-selects best backend (CoreML, CUDA, or CPU)
  • High Performance:
    • Apple Silicon (CoreML): 34.26 FPS
    • NVIDIA GPUs (CUDA): 50+ FPS
    • CPU fallback: 5-10 FPS
  • Accurate: 95% IoU agreement with C++ OpenFace baseline
  • Easy to Use: Simple, unified Python API
  • Hardware Accelerated: Leverages Apple Neural Engine or NVIDIA CUDA
  • Flexible: Single-frame or batch processing modes
  • Production Ready: Optimized for real-time video analysis

Performance

| Backend   | Hardware        | FPS   | ms/frame |
|-----------|-----------------|-------|----------|
| CoreML    | Apple M1/M2/M3  | 34.26 | 29.2     |
| ONNX+CUDA | NVIDIA RTX 3090 | 50+   | <20      |
| ONNX+CPU  | Intel/AMD CPU   | 5-10  | 100-200  |

Speedup: 175.7x faster than baseline Python implementation
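
The FPS and ms/frame columns above are two views of the same measurement: latency per frame is 1000 divided by the frame rate. A minimal sketch of that conversion (the helper name `ms_per_frame` is illustrative and not part of PyMTCNN):

```python
def ms_per_frame(fps: float) -> float:
    """Convert throughput (frames per second) to latency (milliseconds per frame)."""
    return 1000.0 / fps


# FPS figures taken from the performance table; "50+" is treated as 50
# and the CPU range is shown at both ends.
for backend, fps in [("CoreML", 34.26), ("ONNX+CUDA", 50.0),
                     ("ONNX+CPU", 10.0), ("ONNX+CPU", 5.0)]:
    print(f"{backend}: {fps} FPS = {ms_per_frame(fps):.1f} ms/frame")
```

Because 50 FPS is quoted as a lower bound, the matching latency of 20 ms is an upper bound, which is why the table lists it as <20.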

Requirements

  • Python: 3.8 or later
  • OS: macOS, Windows, or Linux
  • Hardware (one of):
    • Apple Silicon (M1, M2, M3) for CoreML
    • NVIDIA GPU with CUDA for GPU acceleration
    • Any CPU for CPU fallback

Installation

From PyPI (Recommended)

Choose the installation that matches your hardware:

macOS with Apple Silicon

pip install pymtcnn[coreml]

NVIDIA GPU (CUDA)

pip install pymtcnn[onnx-gpu]

CPU only

pip install pymtcnn[onnx]

All backends (development)

pip install pymtcnn[all]

From Source

git clone https://github.com/johnwilsoniv/pymtcnn.git
cd pymtcnn
pip install -e .[coreml]  # or [onnx-gpu] or [onnx]
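
After installing, it can help to confirm which inference runtimes are importable. The sketch below assumes that the `coreml` extra provides the `coremltools` module and that the `onnx` and `onnx-gpu` extras provide `onnxruntime`; this README does not list the extras' dependencies, so treat that mapping as an assumption.

```python
import importlib.util

# Assumed mapping from install extra to the module it makes importable.
# The real dependency lists of the extras are not stated in this README.
ASSUMED_RUNTIME_MODULES = {
    "coreml": "coremltools",
    "onnx / onnx-gpu": "onnxruntime",
}


def installed_runtimes(modules=ASSUMED_RUNTIME_MODULES):
    """Return {label: bool} telling whether each top-level module can be found."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, found in installed_runtimes().items():
        print(f"{label}: {'found' if found else 'missing'}")
```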

Quick Start

Auto-Backend Selection (Recommended)

PyMTCNN automatically selects the best available backend:

import cv2
from pymtcnn import MTCNN

# Auto-select best backend (CoreML on Mac, CUDA on NVIDIA, CPU fallback)
detector = MTCNN(verbose=True)  # Shows which backend was selected

# Load image
img = cv2.imread("image.jpg")

# Detect faces
bboxes, landmarks = detector.detect(img)

# Process results
print(f"Detected {len(bboxes)} faces")
for i, bbox in enumerate(bboxes):
    x, y, w, h, conf = bbox
    print(f"Face {i+1}: ({x:.0f}, {y:.0f}) {w:.0f}×{h:.0f} (confidence: {conf:.3f})")

Force Specific Backend

from pymtcnn import MTCNN

# Force CoreML (Apple Neural Engine)
detector = MTCNN(backend='coreml')

# Force CUDA (NVIDIA GPU)
detector = MTCNN(backend='cuda')

# Force CPU
detector = MTCNN(backend='cpu')
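
If you prefer to choose the backend yourself instead of relying on auto-selection, the preference order described above (CoreML on Apple Silicon, then CUDA, then CPU) can be sketched as a small pure function. This is an illustration only: `pick_backend` and its arguments are hypothetical, and PyMTCNN's internal selection logic may differ.

```python
import platform


def pick_backend(system=None, machine=None, coreml_ok=False, cuda_ok=False):
    """Hypothetical preference order: CoreML on Apple Silicon, then CUDA, then CPU.

    coreml_ok / cuda_ok are supplied by the caller, e.g. after checking that
    the corresponding runtime is installed and usable.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64" and coreml_ok:
        return "coreml"
    if cuda_ok:
        return "cuda"
    return "cpu"


print(pick_backend(coreml_ok=False, cuda_ok=False))  # falls back to "cpu"
```

The returned string matches the values accepted by the `backend` argument shown above, so it could be passed as `MTCNN(backend=pick_backend(...))`. One way to fill in `cuda_ok`, assuming ONNX Runtime is installed, is to check whether `"CUDAExecutionProvider"` appears in `onnxruntime.get_available_providers()`.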

Batch Video Processing

import cv2
from pymtcnn import MTCNN

# Initialize detector
detector = MTCNN()

# Load up to 4 frames from the video
cap = cv2.VideoCapture("video.mp4")
frames = []
for _ in range(4):  # Process 4 frames at a time
    ret, frame = cap.read()
    if not ret:  # End of video or read error
        break
    frames.append(frame)
cap.release()

# Batch detection (cross-frame batching for maximum throughput)
results = detector.detect_batch(frames)

# Process results
for i, (bboxes, landmarks) in enumerate(results):
    print(f"Frame {i+1}: {len(bboxes)} faces detected")
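
The snippet above handles a single group of four frames. To process a whole video, the frames can be streamed in groups of four. The helpers below are a minimal sketch, not part of PyMTCNN; `chunked` and `read_frames` are illustrative names, and `read_frames` only assumes an object with a `read()` method that returns `(ok, frame)` like `cv2.VideoCapture`.

```python
from itertools import islice


def chunked(iterable, size=4):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def read_frames(cap):
    """Yield frames from a cv2.VideoCapture-like object until read() fails."""
    while True:
        ok, frame = cap.read()
        if not ok:
            return
        yield frame


# Intended use (requires cv2 and pymtcnn, so it is left as a comment):
#   for batch in chunked(read_frames(cap), size=4):
#       results = detector.detect_batch(batch)
print(list(chunked(range(10), size=4)))
```

The last group may contain fewer than four frames when the video length is not a multiple of the batch size.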

Advanced: Direct Backend Access

For advanced users who need backend-specific features:

from pymtcnn import CoreMLMTCNN, ONNXMTCNN

# Use CoreML directly
coreml_detector = CoreMLMTCNN(verbose=True)

# Use ONNX directly with specific provider
onnx_detector = ONNXMTCNN(provider='cuda', verbose=True)

API Reference

CoreMLMTCNN

Main face detector class.

Constructor

CoreMLMTCNN(
    min_face_size=60,
    thresholds=[0.6, 0.7, 0.7],
    factor=0.709,
    coreml_dir=None,
    verbose=False
)

Parameters:

  • min_face_size (int): Minimum face size in pixels. Default: 60
  • thresholds (list): Detection thresholds for [PNet, RNet, ONet]. Default: [0.6, 0.7, 0.7]
  • factor (float): Image pyramid scale factor. Default: 0.709
  • coreml_dir (str): Path to CoreML models directory. Default: bundled models
  • verbose (bool): Enable verbose logging. Default: False
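
To see how min_face_size and factor interact, the sketch below builds the image pyramid scales the way MTCNN implementations conventionally do: the first scale maps a face of min_face_size pixels onto PNet's 12×12 input, and each further level shrinks by factor until the shorter image side drops below 12 pixels. It assumes PyMTCNN follows this convention; `pyramid_scales` is an illustrative helper, not a PyMTCNN function.

```python
def pyramid_scales(height, width, min_face_size=60, factor=0.709, net_input=12):
    """Return the resize scales for the PNet image pyramid (conventional MTCNN scheme)."""
    scale = net_input / min_face_size        # maps a min_face_size face onto 12 px
    min_side = min(height, width) * scale    # shorter side after the first resize
    scales = []
    while min_side >= net_input:             # stop once the image is smaller than PNet's input
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


default = pyramid_scales(1080, 1920)
larger = pyramid_scales(1080, 1920, min_face_size=100)
print(len(default), len(larger))
```

Under this scheme a 1920×1080 frame yields 9 pyramid levels at the default min_face_size of 60 and 7 levels at 100, which illustrates why raising min_face_size reduces the work PNet has to do.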

Methods

detect(image)

Detect faces in a single image using within-frame batching.

Parameters:

  • image (numpy.ndarray): Input image (BGR format, H×W×3)

Returns:

  • bboxes (numpy.ndarray): Bounding boxes (N×5), format: [x, y, w, h, confidence]
  • landmarks (numpy.ndarray): Facial landmarks (N×5×2), 5 points per face: left eye, right eye, nose, left mouth, right mouth
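
Because boxes are returned as [x, y, w, h, confidence] rather than as corner coordinates, a small conversion is usually needed before drawing or cropping. The helpers below are a sketch with illustrative names; they assume each landmark point is stored as an (x, y) pair, which the shape N×5×2 suggests but this README does not state.

```python
# Landmark order as documented for detect().
LANDMARK_NAMES = ("left_eye", "right_eye", "nose", "left_mouth", "right_mouth")


def bbox_to_corners(bbox):
    """Convert [x, y, w, h, confidence] to integer corners (x1, y1, x2, y2) plus confidence."""
    x, y, w, h, conf = bbox
    return (int(round(x)), int(round(y)),
            int(round(x + w)), int(round(y + h)), float(conf))


def name_landmarks(points):
    """Map the 5 landmark points of one face to names (assumes (x, y) ordering)."""
    return {name: (float(px), float(py))
            for name, (px, py) in zip(LANDMARK_NAMES, points)}


print(bbox_to_corners([10.4, 20.6, 30.0, 40.0, 0.99]))
```

The corner form is what drawing calls such as `cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness)` expect.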

Performance: 31.88 FPS (31.4 ms/frame)

detect_batch(frames)

Detect faces in multiple frames using cross-frame batching.

Parameters:

  • frames (list): List of images (each BGR format, H×W×3)

Returns:

  • results (list): List of (bboxes, landmarks) tuples, one per frame

Performance: 34.26 FPS (29.2 ms/frame) with batches of 4 frames

Recommended batch size: 4 frames for optimal throughput

Performance Guide

When to Use Each Method

  • detect(): Use for real-time per-frame processing, webcam feeds, or when you need lowest latency
  • detect_batch(): Use for offline batch video processing, maximum throughput, or when processing multiple frames simultaneously
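
The figures in this README were measured on specific hardware, so it is worth timing both methods on your own machine before choosing. A minimal timing sketch follows; `measure_fps` is an illustrative helper that accepts any callable, so it can wrap `detector.detect` or a function that calls `detector.detect_batch` on one batch.

```python
import time


def measure_fps(process, items, warmup=1):
    """Call process(item) for every item and return items processed per second.

    The first `warmup` items are processed once beforehand and excluded from
    timing, since the first call to an accelerator backend is often slower.
    """
    for item in items[:warmup]:
        process(item)
    start = time.perf_counter()
    for item in items:
        process(item)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")


# Stand-in workload so the sketch runs without a detector installed.
fps = measure_fps(lambda frame: sum(range(1000)), list(range(50)))
print(f"{fps:.1f} items/s")
```

When timing `detect_batch`, pass a list of batches as `items` and multiply the result by the batch size to get frames per second.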

Optimization Tips

  1. Batch Size: Use 4 frames for optimal throughput
    • Larger batches (8, 16) are slower due to overhead
  2. Frame Resolution: Performance tested on 1920×1080
    • Lower resolution → faster processing
    • Higher resolution → more candidates, may require batch splitting
  3. Min Face Size: Increase min_face_size for better performance
    • Default: 60 pixels
    • 80-100 pixels: 1.2-1.5x faster (may miss smaller faces)

Examples

See the examples/ directory for complete examples:

  • single_frame_detection.py: Basic single-frame face detection
  • batch_processing.py: Batch video processing
  • s1_integration_example.py: Integration with S1 video pipeline

Accuracy

PyMTCNN maintains high accuracy while achieving exceptional performance:

  • Mean IoU: 95% vs C++ OpenFace baseline
  • Detection Agreement: 100% (same faces detected)
  • Validation: Tested on 30 frames from real-world patient videos
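
For reference, IoU (intersection over union) between two boxes in the [x, y, w, h] format used by this package can be computed as below. This is the standard definition written as a sketch; the validation script that produced the figures above is not shown in this README, and `iou_xywh` is an illustrative name.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, w, h, ...)."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 px horizontally overlap in a 5x10 region.
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
```

A mean IoU is then the average of this value over all matched pairs of detected and reference boxes.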

Architecture

PyMTCNN uses a three-stage cascade architecture:

  1. PNet (Proposal Network): Fast candidate generation using image pyramid
  2. RNet (Refinement Network): Candidate refinement with batching
  3. ONet (Output Network): Final bbox regression and landmark prediction

For the CoreML backend, all three networks are converted to CoreML FP32 format with flexible batch dimensions (1-50) for optimal Apple Neural Engine (ANE) utilization.
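
The control flow of the three-stage cascade can be illustrated with a toy sketch in which each stage scores the surviving candidates and drops those below its threshold. The scores here are made-up numbers standing in for network outputs, and the sketch deliberately omits non-maximum suppression, bounding-box regression, and landmark prediction; `run_cascade` is an illustrative name, not a PyMTCNN function.

```python
def run_cascade(candidates, stages, thresholds=(0.6, 0.7, 0.7)):
    """Filter candidates through (name, score_fn) stages, one threshold per stage."""
    survivors = list(candidates)
    counts = []
    for (name, score_fn), threshold in zip(stages, thresholds):
        survivors = [c for c in survivors if score_fn(c) >= threshold]
        counts.append((name, len(survivors)))
    return survivors, counts


# Toy candidates: each dict carries a made-up score for every stage.
toy = [
    {"pnet": 0.9, "rnet": 0.8, "onet": 0.95},
    {"pnet": 0.7, "rnet": 0.5, "onet": 0.90},
    {"pnet": 0.3, "rnet": 0.9, "onet": 0.90},
]
stages = [(n, (lambda c, n=n: c[n])) for n in ("pnet", "rnet", "onet")]
faces, counts = run_cascade(toy, stages)
print(counts)
```

The default thresholds match the documented [PNet, RNet, ONet] defaults. Each stage sees fewer candidates than the one before it, which is what makes running the larger RNet and ONet networks affordable.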

Optimization Journey

PyMTCNN achieved a 175.7x speedup through multiple optimization phases:

| Phase    | Implementation        | FPS   | Speedup |
|----------|-----------------------|-------|---------|
| Baseline | Pure Python CNN       | 0.195 | 1.0x    |
| Phase 1  | Vectorized NumPy      | 0.910 | 4.7x    |
| Phase 2  | ONNX Runtime CPU      | 5.870 | 30.1x   |
| Phase 3  | CoreML + ANE          | 13.56 | 69.5x   |
| Phase 4  | Within-Frame Batching | 31.88 | 163.5x  |
| Phase 5  | Cross-Frame Batching  | 34.26 | 175.7x  |

See docs/OPTIMIZATION_JOURNEY.md for the complete story.
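
Each speedup figure in the table above is the phase's FPS divided by the baseline's 0.195 FPS. The short check below reproduces that column from the FPS values (`speedup` is an illustrative helper):

```python
BASELINE_FPS = 0.195  # Pure Python CNN baseline from the table


def speedup(fps, baseline=BASELINE_FPS):
    """Speedup of a phase relative to the baseline, rounded to one decimal."""
    return round(fps / baseline, 1)


phases = [
    ("Vectorized NumPy", 0.910),
    ("ONNX Runtime CPU", 5.870),
    ("CoreML + ANE", 13.56),
    ("Within-Frame Batching", 31.88),
    ("Cross-Frame Batching", 34.26),
]
for name, fps in phases:
    print(f"{name}: {speedup(fps)}x")
```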

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

You are free to:

  • Share: Copy and redistribute the material
  • Adapt: Remix, transform, and build upon the material

Under the following terms:

  • Attribution: You must give appropriate credit
  • NonCommercial: You may not use the material for commercial purposes

See LICENSE for full terms.

Citation

If you use PyMTCNN in your research, please cite:

@software{pymtcnn2025,
  title={PyMTCNN: High-Performance MTCNN Face Detection for Apple Silicon},
  author={SplitFace},
  year={2025},
  url={https://github.com/your-org/PyMTCNN}
}

Acknowledgments

  • Original MTCNN paper: Zhang et al., "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks"
  • C++ OpenFace implementation: Tadas Baltrušaitis et al.
  • Apple Neural Engine optimization insights from the CoreML community

Support

For issues, questions, or contributions, please visit the GitHub repository (https://github.com/johnwilsoniv/pymtcnn).
