PyMTCNN

High-performance cross-platform MTCNN face detection with CUDA and Apple Neural Engine support.

Overview

PyMTCNN is a pure Python implementation of MTCNN (Multi-task Cascaded Convolutional Networks) with multiple inference backends, so it can use the fastest option available on each hardware platform. On Apple Silicon it reaches 34.26 FPS, a 175.7x speedup over the 0.195 FPS pure-Python baseline, while maintaining 95% mean IoU against the C++ OpenFace baseline.

Key Features

Cross-Platform: Works on Windows, Linux, and macOS
Multi-Backend: Auto-selects best backend (CoreML, CUDA, or CPU)
High Performance:
- Apple Silicon (CoreML): 34.26 FPS
- NVIDIA GPUs (CUDA): 50+ FPS
- CPU fallback: 5-10 FPS
Accurate: 95% IoU agreement with C++ OpenFace baseline
Easy to Use: Simple, unified Python API
Hardware Accelerated: Leverages Apple Neural Engine or NVIDIA CUDA
Flexible: Single-frame or batch processing modes
Production Ready: Optimized for real-time video analysis

Performance

Backend	Hardware	FPS	ms/frame
CoreML	Apple M1/M2/M3	34.26	29.2
ONNX+CUDA	NVIDIA RTX 3090	50+	<20
ONNX+CPU	Intel/AMD CPU	5-10	100-200

Speedup: 175.7x over the baseline pure-Python implementation (CoreML with cross-frame batching at 34.26 FPS vs. 0.195 FPS for the baseline)
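
The ms/frame column is the reciprocal of the FPS column. A small sketch of that conversion, using figures from the table above (the helper name `ms_per_frame` is illustrative and not part of PyMTCNN):

```python
def ms_per_frame(fps: float) -> float:
    """Convert throughput in frames per second to per-frame time in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# CoreML row: 34.26 FPS
print(f"CoreML: {ms_per_frame(34.26):.1f} ms/frame")

# CPU row: 5-10 FPS (the faster rate gives the smaller time)
print(f"CPU: {ms_per_frame(10):.0f}-{ms_per_frame(5):.0f} ms/frame")
```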

Requirements

Python: 3.8 or later
OS: macOS, Windows, or Linux
Hardware (one of):
- Apple Silicon (M1, M2, M3) for CoreML
- NVIDIA GPU with CUDA for GPU acceleration
- Any CPU for CPU fallback

Installation

From PyPI (Recommended)

Choose the installation that matches your hardware:

macOS with Apple Silicon

pip install "pymtcnn[coreml]"  # quotes keep zsh (the macOS default shell) from globbing the brackets

NVIDIA GPU (CUDA)

pip install "pymtcnn[onnx-gpu]"

CPU only

pip install "pymtcnn[onnx]"

All backends (development)

pip install "pymtcnn[all]"

From Source

git clone https://github.com/johnwilsoniv/pymtcnn.git
cd pymtcnn
pip install -e ".[coreml]"  # or ".[onnx-gpu]" or ".[onnx]"

Quick Start

Auto-Backend Selection (Recommended)

PyMTCNN automatically selects the best available backend:

import cv2
from pymtcnn import MTCNN

# Auto-select best backend (CoreML on Mac, CUDA on NVIDIA, CPU fallback)
detector = MTCNN(verbose=True)  # Shows which backend was selected

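# Load image (cv2.imread returns None if the file cannot be read)
img = cv2.imread("image.jpg")
if img is None:
    raise FileNotFoundError("Could not read image.jpg")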

# Detect faces
bboxes, landmarks = detector.detect(img)

# Process results
print(f"Detected {len(bboxes)} faces")
for i, bbox in enumerate(bboxes):
    x, y, w, h, conf = bbox
    print(f"Face {i+1}: ({x:.0f}, {y:.0f}) {w:.0f}×{h:.0f} (confidence: {conf:.3f})")
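
The selection order mentioned in the comment above (CoreML on Mac, CUDA on NVIDIA, CPU fallback) can be pictured as a simple priority check. The sketch below is an assumption about how such a check might look, not PyMTCNN's actual logic; `pick_backend` and its arguments are hypothetical names.

```python
import platform


def pick_backend(system: str, machine: str, has_coreml: bool, has_cuda: bool) -> str:
    """Return 'coreml', 'cuda', or 'cpu' using a fixed priority order (hypothetical)."""
    # CoreML is only considered on Apple Silicon Macs with a CoreML runtime installed.
    if system == "Darwin" and machine == "arm64" and has_coreml:
        return "coreml"
    # Otherwise prefer an NVIDIA GPU when one is usable.
    if has_cuda:
        return "cuda"
    # The CPU path is always available.
    return "cpu"


# Describe the current machine, pretending no accelerator runtime is installed.
print(pick_backend(platform.system(), platform.machine(), has_coreml=False, has_cuda=False))  # prints "cpu"
```

Keeping the decision in a pure function like this makes the priority order easy to unit-test without any accelerator present.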

Force Specific Backend

from pymtcnn import MTCNN

# Force CoreML (Apple Neural Engine)
detector = MTCNN(backend='coreml')

# Force CUDA (NVIDIA GPU)
detector = MTCNN(backend='cuda')

# Force CPU
detector = MTCNN(backend='cpu')

Batch Video Processing

import cv2
from pymtcnn import MTCNN

# Initialize detector
detector = MTCNN()

# Load video frames
cap = cv2.VideoCapture("video.mp4")
frames = []
for _ in range(4):  # Process 4 frames at a time
    ret, frame = cap.read()
    if not ret:  # End of video or read error
        break
    frames.append(frame)
cap.release()  # Free the video file handle

# Batch detection (cross-frame batching for maximum throughput)
results = detector.detect_batch(frames)

# Process results
for i, (bboxes, landmarks) in enumerate(results):
    print(f"Frame {i+1}: {len(bboxes)} faces detected")

Advanced: Direct Backend Access

For advanced users who need backend-specific features:

from pymtcnn import CoreMLMTCNN, ONNXMTCNN

# Use CoreML directly
coreml_detector = CoreMLMTCNN(verbose=True)

# Use ONNX directly with specific provider
onnx_detector = ONNXMTCNN(provider='cuda', verbose=True)

API Reference

`CoreMLMTCNN`

Face detector class for the CoreML backend.

Constructor

CoreMLMTCNN(
    min_face_size=60,
    thresholds=[0.6, 0.7, 0.7],
    factor=0.709,
    coreml_dir=None,
    verbose=False
)

Parameters:

min_face_size (int): Minimum face size in pixels. Default: 60
thresholds (list): Detection thresholds for [PNet, RNet, ONet]. Default: [0.6, 0.7, 0.7]
factor (float): Image pyramid scale factor. Default: 0.709
coreml_dir (str): Path to CoreML models directory. Default: bundled models
verbose (bool): Enable verbose logging. Default: False

Methods

`detect(image)`

Detect faces in a single image using within-frame batching.

Parameters:

image (numpy.ndarray): Input image (BGR format, H×W×3)

Returns:

bboxes (numpy.ndarray): Bounding boxes (N×5), format: [x, y, w, h, confidence]
landmarks (numpy.ndarray): Facial landmarks (N×5×2), 5 points per face: left eye, right eye, nose, left mouth, right mouth

Performance: 31.88 FPS (31.4 ms/frame)
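
Given the documented return format, each row of bboxes is [x, y, w, h, confidence]. A minimal sketch of turning such rows into corner coordinates (for example, for drawing rectangles) while dropping low-confidence rows follows. Plain lists stand in for the NumPy arrays, and `to_corners` and `min_conf` are illustrative names, not part of the PyMTCNN API.

```python
from typing import Iterable, List, Sequence, Tuple


def to_corners(
    bboxes: Iterable[Sequence[float]], min_conf: float = 0.0
) -> List[Tuple[int, int, int, int]]:
    """Convert [x, y, w, h, confidence] rows to integer (x1, y1, x2, y2) corners."""
    corners = []
    for x, y, w, h, conf in bboxes:
        if conf < min_conf:
            continue  # skip detections below the caller's confidence cutoff
        corners.append((round(x), round(y), round(x + w), round(y + h)))
    return corners


# Stand-in for detector output: two detections in [x, y, w, h, confidence] form.
sample = [
    [100.2, 50.7, 80.0, 96.1, 0.998],
    [400.0, 300.0, 30.0, 30.0, 0.62],
]
print(to_corners(sample, min_conf=0.9))
```

The same loop works on a real N×5 array, since iterating over a 2-D NumPy array yields one row at a time.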

`detect_batch(frames)`

Detect faces in multiple frames using cross-frame batching.

Parameters:

frames (list): List of images (each BGR format, H×W×3)

Returns:

results (list): List of (bboxes, landmarks) tuples, one per frame

Performance: 34.26 FPS (29.2 ms/frame) with 4 frames per call

Recommended batch size: 4 frames for optimal throughput
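
One way to feed a long frame sequence to detect_batch() in groups of 4 is a small chunking generator. This is a sketch using only the standard library; `batched` is an illustrative helper, not something PyMTCNN provides, and integers stand in for decoded frames.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 4) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Integers stand in for video frames; each list would be passed to detect_batch().
for batch in batched(range(10), batch_size=4):
    print(batch)
```

Because the generator pulls items lazily, only one batch of frames needs to be held in memory at a time.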

Performance Guide

When to Use Each Method

detect(): Use for real-time per-frame processing, webcam feeds, or when you need lowest latency
detect_batch(): Use for offline batch video processing, maximum throughput, or when processing multiple frames simultaneously
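
The latency trade-off can be estimated from the figures quoted in this document. Assuming all frames in a batch finish together (an assumption, not a measured property of PyMTCNN), the time before any result from a batch is available is the batch size divided by the throughput. The helper name below is illustrative.

```python
def batch_turnaround_ms(fps: float, batch_size: int) -> float:
    """Time for one batch to complete, given sustained throughput in frames per second."""
    if fps <= 0 or batch_size < 1:
        raise ValueError("fps must be positive and batch_size at least 1")
    return 1000.0 * batch_size / fps


per_frame = batch_turnaround_ms(31.88, 1)    # detect(): one frame per call
per_batch = batch_turnaround_ms(34.26, 4)    # detect_batch(): four frames per call

print(f"detect(): about {per_frame:.1f} ms until a result is available")
print(f"detect_batch(): about {per_batch:.1f} ms until a batch of 4 is available")
```

Under that assumption, batching raises throughput slightly but roughly quadruples the wait for each result, which is why detect() suits live feeds.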

Optimization Tips

Batch Size: Use 4 frames for optimal throughput
- Larger batches (8, 16) are slower due to overhead
Frame Resolution: Performance tested on 1920×1080
- Lower resolution → faster processing
- Higher resolution → more candidates, may require batch splitting
Min Face Size: Increase min_face_size for better performance
- Default: 60 pixels
- 80-100 pixels: 1.2-1.5x faster (may miss smaller faces)
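
Why a larger min_face_size helps: in the conventional MTCNN formulation (Zhang et al.), PNet runs over an image pyramid whose first scale is 12 / min_face_size, and each further level shrinks by factor until the shorter image side falls below PNet's 12-pixel input. The sketch below follows that conventional recipe; PyMTCNN's internal pyramid code may differ, and `pyramid_scales` is an illustrative name.

```python
from typing import List

PNET_INPUT = 12  # PNet input size in pixels in the original MTCNN design


def pyramid_scales(
    height: int, width: int, min_face_size: int = 60, factor: float = 0.709
) -> List[float]:
    """Return the image scales a conventional MTCNN pyramid would evaluate."""
    scales = []
    scale = PNET_INPUT / min_face_size
    min_side = min(height, width) * scale
    # Keep shrinking until the scaled image is smaller than PNet's input.
    while min_side >= PNET_INPUT:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


# Compare pyramid depth for a 1920x1080 frame at two minimum face sizes.
for size in (60, 100):
    print(size, len(pyramid_scales(1080, 1920, min_face_size=size)))
```

Fewer and smaller pyramid levels mean fewer PNet evaluations and fewer candidates for the later stages, at the cost of missing faces below the chosen size.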

Examples

See the examples/ directory for complete examples:

single_frame_detection.py: Basic single-frame face detection
batch_processing.py: Batch video processing
s1_integration_example.py: Integration with S1 video pipeline

Accuracy

PyMTCNN maintains high accuracy while achieving exceptional performance:

Mean IoU: 95% vs C++ OpenFace baseline
Detection Agreement: 100% (same faces detected)
Validation: Tested on 30 frames from real-world patient videos
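
For reference, IoU (intersection over union) between two boxes in the [x, y, w, h] format used by this package can be computed as sketched below. This is the standard definition; the validation script behind the figures above is not shown in this document, so `iou_xywh` is only an illustration.

```python
from typing import Sequence


def iou_xywh(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# Two nearly coincident 80x80 boxes, offset by a couple of pixels.
print(f"{iou_xywh((100, 100, 80, 80), (102, 101, 80, 80)):.3f}")
```

A mean IoU is then the average of this value over matched pairs of detections from the two implementations.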

Architecture

PyMTCNN uses a three-stage cascade architecture:

PNet (Proposal Network): Fast candidate generation using image pyramid
RNet (Refinement Network): Candidate refinement with batching
ONet (Output Network): Final bbox regression and landmark prediction

For the CoreML backend, all three networks are converted to CoreML FP32 format with flexible batch dimensions (1-50) for optimal ANE utilization.
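
The three-stage cascade can be sketched as successive filters, each keeping only the candidates that its network scores at or above the matching entry in thresholds. The code below is a simplified illustration with stand-in scoring functions. Real MTCNN stages also apply bounding-box regression and non-maximum suppression between stages, and none of the names here come from PyMTCNN.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Candidate = Dict[str, float]


def run_cascade(
    candidates: Iterable[Candidate],
    stages: Sequence[Callable[[Candidate], float]],
    thresholds: Sequence[float],
) -> List[Candidate]:
    """Pass candidates through each stage, keeping those that meet its threshold."""
    if len(stages) != len(thresholds):
        raise ValueError("need exactly one threshold per stage")
    survivors = list(candidates)
    for score, threshold in zip(stages, thresholds):
        survivors = [c for c in survivors if score(c) >= threshold]
    return survivors


# Stand-in scorers: each "network" just reads a precomputed score from the candidate.
stages = [lambda c: c["pnet"], lambda c: c["rnet"], lambda c: c["onet"]]
candidates = [
    {"pnet": 0.90, "rnet": 0.95, "onet": 0.99},  # passes every stage
    {"pnet": 0.65, "rnet": 0.50, "onet": 0.99},  # rejected by the second stage
    {"pnet": 0.30, "rnet": 0.99, "onet": 0.99},  # rejected by the first stage
]
print(len(run_cascade(candidates, stages, thresholds=[0.6, 0.7, 0.7])))  # prints 1
```

Because each stage only sees the survivors of the previous one, the cheaper early network removes most candidates before the more expensive ones run.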

Optimization Journey

PyMTCNN achieved a 175.7x speedup through multiple optimization phases:

Phase	Implementation	FPS	Speedup	Status
Baseline	Pure Python CNN	0.195	1.0x	✅
Phase 1	Vectorized NumPy	0.910	4.7x	✅
Phase 2	ONNX Runtime CPU	5.870	30.1x	✅
Phase 3	CoreML + ANE	13.56	69.5x	✅
Phase 4	Within-Frame Batching	31.88	163.5x	✅
Phase 5	Cross-Frame Batching	34.26	175.7x	✅
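
The Speedup column is each phase's FPS divided by the baseline FPS. The short check below recomputes it from the FPS values in the table; the variable names are illustrative.

```python
BASELINE_FPS = 0.195  # "Pure Python CNN" row

# FPS per phase, copied from the table above.
phases = {
    "Vectorized NumPy": 0.910,
    "ONNX Runtime CPU": 5.870,
    "CoreML + ANE": 13.56,
    "Within-Frame Batching": 31.88,
    "Cross-Frame Batching": 34.26,
}

speedups = {name: fps / BASELINE_FPS for name, fps in phases.items()}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x")
```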

See docs/OPTIMIZATION_JOURNEY.md for the complete story.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

You are free to:

Share: Copy and redistribute the material
Adapt: Remix, transform, and build upon the material

Under the following terms:

Attribution: You must give appropriate credit
NonCommercial: You may not use the material for commercial purposes

See LICENSE for full terms.

Citation

If you use PyMTCNN in your research, please cite:

@software{pymtcnn2025,
  title={PyMTCNN: High-Performance MTCNN Face Detection for Apple Silicon},
  author={SplitFace},
  year={2025},
  url={https://github.com/johnwilsoniv/pymtcnn}
}

Acknowledgments

Original MTCNN paper: Zhang et al., "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks"
C++ OpenFace implementation: Tadas Baltrušaitis et al.
Apple Neural Engine optimization insights from the CoreML community

Support

For issues, questions, or contributions, please visit the GitHub repository.


