# SyncNet Python
Audio-visual synchronization detection using deep learning with modern Python architecture.
This is a refactored and enhanced version of the original SyncNet implementation by Joon Son Chung, updated for Python 3.9+ with clean architecture, comprehensive error handling, and performance optimizations.
## Overview
SyncNet Python is a PyTorch implementation of the SyncNet model, which detects audio-visual synchronization in videos. It can identify lip-sync errors by analyzing the correspondence between mouth movements and spoken audio.
## Features

### Core Functionality
- 🎥 Audio-Visual Sync Detection: Accurately detect synchronization between audio and video
- 🔍 Face Detection: Automatic face detection and tracking using S3FD
- 📊 Detailed Analysis: Per-crop offsets, confidence scores, and minimum distances
- 🚀 Batch Processing: Process multiple videos efficiently
- 🐍 Python API: Easy-to-use Python interface with proper error handling
### Architecture Improvements
- 🏗️ Clean Architecture: Abstract base classes and factory patterns
- ⚡ Performance Optimized: Parallel processing and memory management
- 🛡️ Robust Error Handling: Comprehensive exception hierarchy
- ⚙️ Configuration Management: YAML/JSON configuration support
- 📝 Advanced Logging: Structured logging with progress tracking
- 🔄 Backward Compatibility: Maintains compatibility with original API
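The configuration support can be exercised with a minimal sketch. The `PipelineConfig` fields below are illustrative assumptions for demonstration, not the package's actual schema:

```python
import json
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    """Illustrative config object; field names are assumptions."""
    device: str = "cpu"   # "cpu" or "cuda"
    batch_size: int = 20  # frames per inference batch


def load_config(path: str) -> PipelineConfig:
    """Load a JSON config file into a PipelineConfig."""
    with open(path) as fh:
        return PipelineConfig(**json.load(fh))
```

A YAML file could be loaded the same way with `yaml.safe_load` in place of `json.load`.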
## Installation

```bash
pip install syncnet-python
```
### Additional Requirements
1. **FFmpeg**: Required for video processing.

   ```bash
   # Ubuntu/Debian
   sudo apt-get install ffmpeg
   # macOS
   brew install ffmpeg
   ```

2. **Model Weights**: Download the pre-trained weights `sfd_face.pth` and `syncnet_v2.model`, and place them in a `weights/` directory.
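Since FFmpeg must be on the `PATH`, a quick stdlib check can catch a missing install before processing begins. The helper name here is ours, not part of the package:

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None
```

Call it at startup and raise or log a clear error if it returns `False`, rather than letting video extraction fail later with a cryptic subprocess error.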
## Quick Start
```python
from syncnet_python import SyncNetPipeline

# Initialize pipeline
pipeline = SyncNetPipeline(
    s3fd_weights="weights/sfd_face.pth",
    syncnet_weights="weights/syncnet_v2.model",
    device="cuda",  # or "cpu"
)

# Process video
results = pipeline.inference(
    video_path="video.mp4",
    audio_path=None,  # extract audio from the video
)

# Extract results (returns a tuple)
(offset_list, confidence_list, min_dist_list,
 best_confidence, best_min_dist, detections_json, success) = results

# Get best results
offset = offset_list[0]          # AV offset in frames
confidence = confidence_list[0]  # confidence score
min_distance = min_dist_list[0]  # minimum distance

print(f"AV Offset: {offset} frames")
print(f"Confidence: {confidence:.3f}")
print(f"Min Distance: {min_distance:.3f}")
```
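The reported offset is measured in video frames, so converting it to milliseconds requires the video's frame rate. SyncNet is commonly run on 25 fps material, but verify your input's actual rate; the default below is an assumption:

```python
def offset_to_ms(offset_frames: int, fps: float = 25.0) -> float:
    """Convert an AV offset in frames to milliseconds at the given frame rate."""
    return offset_frames * 1000.0 / fps
```

At 25 fps a one-frame offset corresponds to 40 ms, which is near the threshold at which lip-sync errors become perceptible to viewers.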
### Detailed Analysis
```python
import json

# Per-crop analysis
for i, (offset, conf, dist) in enumerate(zip(offset_list, confidence_list, min_dist_list)):
    print(f"Crop {i + 1}: offset={offset}, confidence={conf:.3f}, min_dist={dist:.3f}")

# Parse face detections
detections = json.loads(detections_json)
print(f"Total frames with face detection: {len(detections)}")
```
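When a video yields several face crops, the parallel per-crop lists can be reduced to the single most confident track. This helper is a sketch of ours, not a package API:

```python
def best_crop(offsets, confidences, min_dists):
    """Return (index, offset, confidence, min_dist) for the most confident crop."""
    if not confidences:
        raise ValueError("no crops to choose from")
    i = max(range(len(confidences)), key=lambda k: confidences[k])
    return i, offsets[i], confidences[i], min_dists[i]
```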
## Command Line Usage
```bash
# Process a single video
syncnet-python video.mp4

# Process multiple videos
syncnet-python video1.mp4 video2.mp4 --output results.json

# Use CPU instead of GPU
syncnet-python video.mp4 --device cpu
```
## Performance
Tested with example files:
- Processing Speed: 191.4 fps
- Face Detection: 100% success rate
- Accuracy: Detects 1-frame offsets with high confidence (4.5+)
- Compute Time: ~0.65 seconds for 134 frames
## Architecture

### Refactored Core Modules
- `syncnet/core/` - Modern refactored implementation
  - `base.py` - Abstract base classes and interfaces
  - `models.py` - Enhanced SyncNet model with factory pattern
  - `audio.py` - MFCC audio processing with streaming support
  - `video.py` - Parallel video processing with OpenCV
  - `sync_analyzer.py` - Optimized sync analysis with caching
  - `config.py` - Configuration management system
  - `exceptions.py` - Comprehensive error handling
  - `logging.py` - Advanced logging with progress tracking
  - `utils.py` - Memory management and utility functions
### Legacy Compatibility

- `syncnet_python/` - Maintains original API compatibility
- Full backward compatibility with existing code
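The abstract-base-class and factory patterns referenced above can be illustrated with a minimal sketch. All names below are illustrative stand-ins, not the package's actual classes:

```python
from abc import ABC, abstractmethod


class SyncModel(ABC):
    """Interface every sync model implements."""

    @abstractmethod
    def predict_offset(self, audio_feats, video_feats) -> int:
        """Return the estimated AV offset in frames."""


class DummySyncNet(SyncModel):
    """Trivial stand-in used only to demonstrate the pattern."""

    def predict_offset(self, audio_feats, video_feats) -> int:
        return 0


# Factory: map model names to classes, so callers never import concrete types.
_MODEL_REGISTRY = {"dummy": DummySyncNet}


def create_model(name: str) -> SyncModel:
    """Look up a model class by name and instantiate it."""
    try:
        return _MODEL_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None
```

The registry keeps the pipeline decoupled from concrete model classes, so new model variants can be added without touching calling code.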
## Requirements
- Python 3.9+ (tested on 3.13)
- PyTorch 2.0+
- CUDA (optional but recommended)
- FFmpeg
- Additional dependencies: OpenCV, SciPy, NumPy, pandas
## Credits
This package is based on the original SyncNet implementation by Joon Son Chung, enhanced with modern Python architecture and performance optimizations.
## Citation
If you use this code in your research, please cite the original paper:
```bibtex
@inproceedings{chung2016out,
  title={Out of time: automated lip sync in the wild},
  author={Chung, Joon Son and Zisserman, Andrew},
  booktitle={Asian Conference on Computer Vision},
  year={2016}
}
```
## License
MIT License - see LICENSE file for details.