LASER ASD - Lip Landmark Assisted Speaker Detection for Active Speaker Detection
LASER ASD - Lip Landmark Assisted Speaker Detection (PyPI Package)
Note: This is a PyPI packaging wrapper for easy installation.
Original Repository: https://github.com/plnguyen2908/LASER_ASD
All credit for the LASER ASD model architecture and training goes to the original authors.
This package provides a simple Python API for the LASER ASD (Lip Landmark Assisted Speaker Detection) model, enabling easy installation via pip and straightforward inference.
Original Work
This package is based on the LASER ASD implementation by Phat Lam Nguyen et al.:
- Original Repository: https://github.com/plnguyen2908/LASER_ASD
- Paper: LASER: Lip Landmark Assisted Speaker Detection
If you use this code, please cite the original work (see Citation section).
Features
- Active Speaker Detection using audio-visual fusion
- Based on LoCoNet architecture with LASER landmark injection
- Simple Python API for inference
- GPU acceleration with CUDA support
- Compatible with PyTorch 1.10+
Installation
pip install laser-asd
Quick Start
import numpy as np
from laser_asd import LaserASDModel
# Initialize model
model = LaserASDModel(device="cuda")
# Load pre-trained weights (download from original repo)
model.load_weights("/path/to/loconet_laser.model")
# Prepare inputs
# face_crops: numpy array of shape [T, H, W, C] or [T, H, W]
# audio_data: numpy array of shape [samples] at 16kHz
face_crops = np.random.rand(100, 112, 112, 3).astype(np.float32)
audio_data = np.random.rand(64000).astype(np.float32) # 4 seconds at 16kHz
# Predict speaking scores
scores = model.predict(face_crops, audio_data, sample_rate=16000, fps=25.0)
# scores >= 0 indicates speaking
is_speaking = scores >= 0
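The Quick Start leaves two practical steps implicit: making the audio span the same duration as the face crops, and turning per-frame scores into time ranges. The helpers below are a minimal sketch of both. The names `match_audio_to_frames` and `speaking_segments` are hypothetical and not part of the package, and the source does not say whether `predict` aligns lengths on its own, so the trimming and zero-padding here is an assumption.

```python
import numpy as np

def match_audio_to_frames(audio_data, num_frames, sample_rate=16000, fps=25.0):
    """Trim or zero-pad audio so it spans the same duration as num_frames (assumed requirement)."""
    expected = int(round(num_frames / fps * sample_rate))
    audio_data = np.asarray(audio_data, dtype=np.float32)
    if audio_data.shape[0] >= expected:
        return audio_data[:expected]
    # np.pad defaults to constant zero padding.
    return np.pad(audio_data, (0, expected - audio_data.shape[0]))

def speaking_segments(scores, fps=25.0, threshold=0.0):
    """Group consecutive frames with score >= threshold into (start_s, end_s) spans."""
    active = np.asarray(scores) >= threshold
    # Pad with False on both sides so every run has a clear start and end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]  # end index is exclusive
    return [(float(s) / fps, float(e) / fps) for s, e in zip(starts, ends)]

# Stand-in values; in practice the scores come from model.predict(...).
audio = match_audio_to_frames(np.zeros(70000, dtype=np.float32), num_frames=100)
example_scores = np.array([-1.2, 0.3, 0.8, -0.1, -0.5, 1.1, 0.4, 0.2])
segments = speaking_segments(example_scores, fps=25.0)
```

With 100 frames at 25 fps and 16 kHz audio, the aligned waveform has 64000 samples, matching the Quick Start inputs.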
Model Weights
Pre-trained model weights must be downloaded from the original repository (https://github.com/plnguyen2908/LASER_ASD).
Set the model path via environment variable:
export LASER_ASD_MODEL_PATH=/path/to/loconet_laser.model
Or pass it directly:
model = LaserASDModel(device="cuda", model_path="/path/to/loconet_laser.model")
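When both options are in play, it can help to resolve the weights path and the device before constructing the model. The following is a sketch only: `resolve_model_path` and `pick_device` are hypothetical helpers, and letting an explicit argument take precedence over `LASER_ASD_MODEL_PATH` is an assumption, since the source does not state which one the package prefers.

```python
import os
from pathlib import Path

def resolve_model_path(explicit=None, env_var="LASER_ASD_MODEL_PATH"):
    """Return the explicit path if given, else the environment variable value, else None."""
    candidate = explicit or os.environ.get(env_var)
    return Path(candidate).expanduser() if candidate else None

def pick_device(preferred="cuda"):
    """Fall back to 'cpu' when CUDA was requested but is not usable."""
    if preferred != "cuda":
        return preferred
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

# Usage (requires the package and downloaded weights):
# from laser_asd import LaserASDModel
# path = resolve_model_path()
# model = LaserASDModel(device=pick_device(), model_path=path)
```

Failing early when `resolve_model_path()` returns `None` gives a clearer message than an error raised while loading the weights.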
API Reference
LaserASDModel
class LaserASDModel:
    def __init__(
        self,
        device: str = "cuda",
        model_path: Optional[Path] = None,
        use_landmarks: bool = False,
    ):
        """
        Initialize LASER ASD model.

        Args:
            device: Device to run model on ('cuda' or 'cpu')
            model_path: Path to model weights file
            use_landmarks: Whether to use landmark features (False uses zeros)
        """

    def load_weights(self, model_path: Optional[str] = None):
        """Load model weights."""

    def predict(
        self,
        face_crops: np.ndarray,
        audio_data: np.ndarray,
        sample_rate: int = 16000,
        fps: float = 25.0,
    ) -> np.ndarray:
        """
        Predict speaking probability for each frame.

        Args:
            face_crops: Face crop images [T, H, W, C] or [T, H, W]
            audio_data: Audio waveform [samples]
            sample_rate: Audio sample rate (default: 16000)
            fps: Video frame rate (default: 25.0)

        Returns:
            Per-frame speaking scores [T] (>= 0 means speaking)
        """
Factory Function
def create_laser_model(
    device: str = "cuda",
    model_path: Optional[str] = None,
    **kwargs,
) -> LaserASDModel:
    """Factory function to create LASER ASD model."""
Requirements
- Python >= 3.8
- PyTorch >= 1.10.0
- torchvision >= 0.11.0
- numpy >= 1.20.0
- opencv-python >= 4.5.0
- resampy >= 0.4.0
- python_speech_features >= 0.6
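To see which of these dependencies are present in the current environment, a small check along the following lines may help. This is a sketch using only the standard library: `installed_versions` is a hypothetical helper, the distribution names are the pip names assumed for the list above (`torch` for PyTorch), and it only reports versions without comparing them against the minimums.

```python
from importlib import metadata

# Assumed pip distribution names for the requirements listed above.
REQUIRED = [
    "torch",
    "torchvision",
    "numpy",
    "opencv-python",
    "resampy",
    "python_speech_features",
]

def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```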
Citation
If you use this code, please cite the original LASER ASD paper:
@inproceedings{nguyen2024laser,
  title={LASER: Lip Landmark Assisted Speaker Detection},
  author={Nguyen, Phat Lam and others},
  booktitle={Proceedings},
  year={2024}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
The original LASER ASD code is also MIT licensed.
Acknowledgments
- Original LASER ASD: https://github.com/plnguyen2908/LASER_ASD (Phat Lam Nguyen et al.)
- LoCoNet encoder architecture
- TalkNet audio-visual framework
Differences from Original Repository
This PyPI package adds:
- wrapper.py: Simple inference API (LaserASDModel class)
- pyproject.toml: PyPI packaging configuration
- Modified imports for package compatibility
The core model architecture and weights are unchanged from the original.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
For issues related to the model architecture, please refer to the original repository.