PyMTCNN
High-performance cross-platform MTCNN face detection with CUDA and Apple Neural Engine support.
Overview
PyMTCNN is a pure Python implementation of MTCNN (Multi-task Cascaded Convolutional Networks) with multiple inference backends, so the same API runs efficiently on different hardware. With CoreML and cross-frame batching it reaches 34.26 FPS, a 175.7x speedup over the baseline pure-Python implementation (0.195 FPS), while maintaining 95% mean IoU agreement with the C++ OpenFace baseline.
Key Features
- Cross-Platform: Works on Windows, Linux, and macOS
- Multi-Backend: Auto-selects best backend (CoreML, CUDA, or CPU)
- High Performance:
  - Apple Silicon (CoreML): 34.26 FPS
  - NVIDIA GPUs (CUDA): 50+ FPS
  - CPU fallback: 5-10 FPS
- Accurate: 95% IoU agreement with C++ OpenFace baseline
- Easy to Use: Simple, unified Python API
- Hardware Accelerated: Leverages Apple Neural Engine or NVIDIA CUDA
- Flexible: Single-frame or batch processing modes
- Production Ready: Optimized for real-time video analysis
Performance
| Backend | Hardware | FPS | ms/frame |
|---|---|---|---|
| CoreML | Apple M1/M2/M3 | 34.26 | 29.2 |
| ONNX+CUDA | NVIDIA RTX 3090 | 50+ | <20 |
| ONNX+CPU | Intel/AMD CPU | 5-10 | 100-200 |
Speedup: 175.7x faster than baseline Python implementation
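The FPS and ms/frame columns are two views of the same measurement. As a small sanity check (a sketch, with `fps_to_ms` being a hypothetical helper name, not part of PyMTCNN), the conversion is simply 1000 divided by the frame rate:

```python
def fps_to_ms(fps: float) -> float:
    """Convert throughput (frames per second) to mean time per frame in ms."""
    return 1000.0 / fps


# Figures quoted in this README for the CoreML backend.
for label, fps in [("detect_batch, batch of 4", 34.26), ("detect, single frame", 31.88)]:
    print(f"{label}: {fps} FPS = {fps_to_ms(fps):.1f} ms/frame")
```

For batched processing the ms/frame figure is an average over the batch, so it describes throughput rather than the latency of any individual frame.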
Requirements
- Python: 3.8 or later
- OS: macOS, Windows, or Linux
- Hardware (one of):
  - Apple Silicon (M1, M2, M3) for CoreML
  - NVIDIA GPU with CUDA for GPU acceleration
  - Any CPU for CPU fallback
Installation
From PyPI (Recommended)
Choose the installation that matches your hardware:
macOS with Apple Silicon
pip install "pymtcnn[coreml]"
NVIDIA GPU (CUDA)
pip install "pymtcnn[onnx-gpu]"
CPU only
pip install "pymtcnn[onnx]"
All backends (development)
pip install "pymtcnn[all]"
From Source
git clone https://github.com/johnwilsoniv/pymtcnn.git
cd pymtcnn
pip install -e ".[coreml]"  # or ".[onnx-gpu]" or ".[onnx]"
Quick Start
Auto-Backend Selection (Recommended)
PyMTCNN automatically selects the best available backend:
import cv2
from pymtcnn import MTCNN
# Auto-select best backend (CoreML on Mac, CUDA on NVIDIA, CPU fallback)
detector = MTCNN(verbose=True) # Shows which backend was selected
# Load image
img = cv2.imread("image.jpg")
# Detect faces
bboxes, landmarks = detector.detect(img)
# Process results
print(f"Detected {len(bboxes)} faces")
for i, bbox in enumerate(bboxes):
    x, y, w, h, conf = bbox
    print(f"Face {i+1}: ({x:.0f}, {y:.0f}) {w:.0f}×{h:.0f} (confidence: {conf:.3f})")
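Drawing and cropping routines usually expect integer corner coordinates rather than the `[x, y, w, h, confidence]` rows returned by `detect()`. The following is a minimal sketch of that conversion; `bbox_to_corners` and `filter_by_confidence` are hypothetical helper names, and the sample rows are made-up values, not real detector output:

```python
def bbox_to_corners(bbox):
    """Convert one [x, y, w, h, confidence] row to integer (x1, y1, x2, y2)."""
    x, y, w, h, _conf = bbox
    return (
        int(round(x)),
        int(round(y)),
        int(round(x + w)),
        int(round(y + h)),
    )


def filter_by_confidence(bboxes, min_conf=0.9):
    """Keep only rows whose confidence (last column) is at least min_conf."""
    return [b for b in bboxes if b[4] >= min_conf]


# Illustrative rows only; in practice these come from detector.detect(img).
sample = [
    [10.4, 20.6, 100.0, 50.0, 0.99],
    [300.0, 40.0, 30.0, 30.0, 0.65],
]
for row in filter_by_confidence(sample):
    print(bbox_to_corners(row))
```

The resulting corners can be passed straight to OpenCV, for example `cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)`.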
Force Specific Backend
from pymtcnn import MTCNN
# Force CoreML (Apple Neural Engine)
detector = MTCNN(backend='coreml')
# Force CUDA (NVIDIA GPU)
detector = MTCNN(backend='cuda')
# Force CPU
detector = MTCNN(backend='cpu')
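If you want to decide the backend yourself before constructing the detector, the priority order described above (CoreML, then CUDA, then CPU) can be expressed as a small pure function. This is only a sketch of that ordering under stated assumptions: `pick_backend` is a hypothetical helper, and it is not PyMTCNN's actual selection code.

```python
def pick_backend(coreml_available: bool, onnx_providers: list) -> str:
    """Return 'coreml', 'cuda', or 'cpu' following the README's priority order.

    coreml_available: whether a CoreML runtime can be used on this machine
                      (how you determine that is up to the caller).
    onnx_providers:   ONNX Runtime execution provider names, e.g. the list
                      returned by onnxruntime.get_available_providers().
    """
    if coreml_available:
        return "coreml"
    if "CUDAExecutionProvider" in onnx_providers:
        return "cuda"
    return "cpu"


print(pick_backend(False, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_backend(False, ["CPUExecutionProvider"]))
```

The returned string matches the values accepted by the `backend` argument shown above, so it could be passed as `MTCNN(backend=pick_backend(...))`.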
Batch Video Processing
import cv2
from pymtcnn import MTCNN
# Initialize detector
detector = MTCNN()
# Load video frames
cap = cv2.VideoCapture("video.mp4")
frames = []
for _ in range(4):  # Process 4 frames at a time
    ret, frame = cap.read()
    if ret:
        frames.append(frame)
# Batch detection (cross-frame batching for maximum throughput)
results = detector.detect_batch(frames)
# Process results
for i, (bboxes, landmarks) in enumerate(results):
    print(f"Frame {i+1}: {len(bboxes)} faces detected")
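The snippet above processes only the first four frames. For a whole video, one option is to group frames into fixed-size batches and call `detect_batch` once per group. The generator below is a sketch of that grouping; `batched` is a hypothetical helper written here for illustration, not a PyMTCNN function, and it works on any iterable:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 4) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


# Integers stand in for video frames here.
print(list(batched(range(10), batch_size=4)))
```

With a frame iterator in place of `range(10)`, each yielded list can be passed to `detector.detect_batch(batch)`. The final batch may contain fewer than four frames.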
Advanced: Direct Backend Access
For advanced users who need backend-specific features:
from pymtcnn import CoreMLMTCNN, ONNXMTCNN
# Use CoreML directly
coreml_detector = CoreMLMTCNN(verbose=True)
# Use ONNX directly with specific provider
onnx_detector = ONNXMTCNN(provider='cuda', verbose=True)
API Reference
CoreMLMTCNN
Main face detector class.
Constructor
CoreMLMTCNN(
    min_face_size=60,
    thresholds=[0.6, 0.7, 0.7],
    factor=0.709,
    coreml_dir=None,
    verbose=False
)
Parameters:
- min_face_size (int): Minimum face size in pixels. Default: 60
- thresholds (list): Detection thresholds for [PNet, RNet, ONet]. Default: [0.6, 0.7, 0.7]
- factor (float): Image pyramid scale factor. Default: 0.709
- coreml_dir (str): Path to CoreML models directory. Default: bundled models
- verbose (bool): Enable verbose logging. Default: False
Methods
detect(image)
Detect faces in a single image using within-frame batching.
Parameters:
- image (numpy.ndarray): Input image (BGR format, H×W×3)
Returns:
- bboxes (numpy.ndarray): Bounding boxes (N×5), format: [x, y, w, h, confidence]
- landmarks (numpy.ndarray): Facial landmarks (N×5×2), 5 points per face: left eye, right eye, nose, left mouth, right mouth
Performance: 31.88 FPS (31.4 ms/frame)
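Because each face's landmarks arrive as a bare 5×2 array, it can help to attach names in the documented order. The sketch below assumes the order listed above and uses a hypothetical helper `name_landmarks`; the coordinates are made-up values:

```python
# Order taken from the Returns description above.
LANDMARK_NAMES = ("left_eye", "right_eye", "nose", "left_mouth", "right_mouth")


def name_landmarks(face_landmarks):
    """Map one face's 5x2 landmark rows to a {name: (x, y)} dictionary."""
    if len(face_landmarks) != len(LANDMARK_NAMES):
        raise ValueError("expected exactly 5 landmark points per face")
    return {
        name: (float(x), float(y))
        for name, (x, y) in zip(LANDMARK_NAMES, face_landmarks)
    }


# Illustrative points only; real values come from landmarks[i] for face i.
example_face = [[30, 40], [70, 40], [50, 60], [35, 80], [65, 80]]
points = name_landmarks(example_face)
print(points["nose"])
```

The helper accepts either nested lists or one 5×2 slice of the NumPy array returned by `detect()`.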
detect_batch(frames)
Detect faces in multiple frames using cross-frame batching.
Parameters:
- frames (list): List of images (each BGR format, H×W×3)
Returns:
- results (list): List of (bboxes, landmarks) tuples, one per frame
Performance: 34.26 FPS (29.2 ms/frame) with batch_size=4
Recommended batch size: 4 frames for optimal throughput
Performance Guide
When to Use Each Method
- detect(): Use for real-time per-frame processing, webcam feeds, or when you need the lowest latency
- detect_batch(): Use for offline batch video processing, maximum throughput, or when processing multiple frames simultaneously
Optimization Tips
- Batch Size: Use 4 frames for optimal throughput
  - Larger batches (8, 16) are slower due to overhead
- Frame Resolution: Performance tested on 1920×1080
  - Lower resolution → faster processing
  - Higher resolution → more candidates, may require batch splitting
- Min Face Size: Increase min_face_size for better performance
  - Default: 60 pixels
  - 80-100 pixels: 1.2-1.5x faster (may miss smaller faces)
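To see why `min_face_size` and resolution affect speed, it helps to look at how an MTCNN image pyramid is typically built. The sketch below follows the scheme from the original MTCNN reference implementation, where PNet has a 12×12 input. It is an assumption that PyMTCNN computes its scales in exactly this way, and `pyramid_scales` is a hypothetical helper:

```python
def pyramid_scales(height, width, min_face_size=60, factor=0.709, net_size=12):
    """Return the list of image scales PNet would be run at.

    The first scale maps a face of min_face_size pixels onto the net_size
    (12 px) PNet input; each further level shrinks the image by `factor`
    until the shorter side would fall below net_size.
    """
    scales = []
    scale = net_size / min_face_size
    min_side = min(height, width) * scale
    while min_side >= net_size:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


# Compare pyramid depth at the tested resolution (1920x1080).
for size in (60, 100):
    levels = pyramid_scales(1080, 1920, min_face_size=size)
    print(f"min_face_size={size}: {len(levels)} levels, largest scale {levels[0]:.2f}")
```

Under these assumptions a larger `min_face_size` yields both fewer pyramid levels and smaller images at every level, which is consistent with the speedup described in the tips above.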
Examples
See the examples/ directory for complete examples:
- single_frame_detection.py: Basic single-frame face detection
- batch_processing.py: Batch video processing
- s1_integration_example.py: Integration with S1 video pipeline
Accuracy
PyMTCNN maintains high accuracy while achieving exceptional performance:
- Mean IoU: 95% vs C++ OpenFace baseline
- Detection Agreement: 100% (same faces detected)
- Validation: Tested on 30 frames from real-world patient videos
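For reference, IoU (intersection over union) between two boxes in the `[x, y, w, h]` format used by this library can be computed as follows. This is a generic sketch of the metric, not the project's validation script, and `iou_xywh` is a hypothetical helper name:

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as [x, y, w, h, ...]."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    # Overlap extent along each axis (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 px horizontally: overlap 50, union 150.
print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))
```

A mean IoU is then the average of this value over matched pairs of detections from the two implementations.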
Architecture
PyMTCNN uses a three-stage cascade architecture:
- PNet (Proposal Network): Fast candidate generation using image pyramid
- RNet (Refinement Network): Candidate refinement with batching
- ONet (Output Network): Final bbox regression and landmark prediction
For the CoreML backend, all three networks are converted to CoreML FP32 format with flexible batch dimensions (1-50) for optimal ANE utilization.
Optimization Journey
PyMTCNN achieved a 175.7x speedup through multiple optimization phases:
| Phase | Implementation | FPS | Speedup | Status |
|---|---|---|---|---|
| Baseline | Pure Python CNN | 0.195 | 1.0x | ✅ |
| Phase 1 | Vectorized NumPy | 0.910 | 4.7x | ✅ |
| Phase 2 | ONNX Runtime CPU | 5.870 | 30.1x | ✅ |
| Phase 3 | CoreML + ANE | 13.56 | 69.5x | ✅ |
| Phase 4 | Within-Frame Batching | 31.88 | 163.5x | ✅ |
| Phase 5 | Cross-Frame Batching | 34.26 | 175.7x | ✅ |
See docs/OPTIMIZATION_JOURNEY.md for the complete story.
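The Speedup column is each phase's FPS divided by the baseline FPS. The short script below reproduces the column from the table's own numbers, as a way to check the arithmetic; the variable names are illustrative:

```python
# (phase, FPS) pairs copied from the table above.
PHASES = [
    ("Baseline", 0.195),
    ("Phase 1", 0.910),
    ("Phase 2", 5.870),
    ("Phase 3", 13.56),
    ("Phase 4", 31.88),
    ("Phase 5", 34.26),
]

baseline_fps = PHASES[0][1]
speedups = {name: round(fps / baseline_fps, 1) for name, fps in PHASES}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")

# Gain of cross-frame batching (Phase 5) over within-frame batching (Phase 4).
phase5_gain = 34.26 / 31.88
print(f"Phase 5 vs Phase 4: {100 * (phase5_gain - 1):.1f}% faster")
```

Most of the gain comes from moving inference to ONNX Runtime and CoreML and from within-frame batching. Cross-frame batching adds a further improvement of roughly 7% on top of Phase 4.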
License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
- Share: Copy and redistribute the material
- Adapt: Remix, transform, and build upon the material
Under the following terms:
- Attribution: You must give appropriate credit
- NonCommercial: You may not use the material for commercial purposes
See LICENSE for full terms.
Citation
If you use PyMTCNN in your research, please cite:
@software{pymtcnn2025,
  title={PyMTCNN: High-Performance MTCNN Face Detection for Apple Silicon},
  author={SplitFace},
  year={2025},
  url={https://github.com/johnwilsoniv/pymtcnn}
}
Acknowledgments
- Original MTCNN paper: Zhang et al., "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks"
- C++ OpenFace implementation: Tadas Baltrušaitis et al.
- Apple Neural Engine optimization insights from the CoreML community
Support
For issues, questions, or contributions, please visit the GitHub repository.