WhisperX-NeMo Pipeline
A production-ready transcription and diarization pipeline with parallel processing.
Features
- Parallel Processing: Runs Whisper transcription and NeMo diarization simultaneously
- Multiple Backends: Supports both faster-whisper and WhisperX
- Speaker Diarization: Uses NeMo MSDD models for accurate speaker identification
- Audio Source Separation: Optional vocal extraction using Demucs
- Punctuation Restoration: Automatic punctuation using deep learning models
- Memory Efficient: Proper GPU memory management and cleanup
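The parallel-processing feature above can be pictured with a small standard-library sketch. It is only an illustration of the idea: `transcribe`, `diarize`, and `run_parallel` are hypothetical stand-ins, not functions from this package, and the README does not say whether the real pipeline uses threads or processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def transcribe(audio_path):
    """Hypothetical stand-in for the Whisper transcription stage."""
    time.sleep(0.2)  # simulate model inference
    return {"stage": "transcription", "audio": audio_path}


def diarize(audio_path):
    """Hypothetical stand-in for the NeMo diarization stage."""
    time.sleep(0.2)  # simulate model inference
    return {"stage": "diarization", "audio": audio_path}


def run_parallel(audio_path):
    """Run both stages concurrently; return (transcript, speakers, elapsed)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Submit both stages before waiting on either result.
        transcript_future = pool.submit(transcribe, audio_path)
        speakers_future = pool.submit(diarize, audio_path)
        transcript = transcript_future.result()
        speakers = speakers_future.result()
    elapsed = time.perf_counter() - start
    return transcript, speakers, elapsed


if __name__ == "__main__":
    transcript, speakers, elapsed = run_parallel("path/to/your/audio.wav")
    print(transcript["stage"], speakers["stage"], f"{elapsed:.2f}s")
```

Because both stages are submitted before either result is awaited, the total wall time is close to that of the slower stage rather than the sum of the two.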
Installation
pip install whisperx-nemo-pipeline
With constraints (recommended for production):
pip install whisperx-nemo-pipeline -c constraints.txt
Quick Start
from whisperx_nemo_pipeline import create_transcription_pipeline

# Create pipeline
pipeline = create_transcription_pipeline(
    audio_path="path/to/your/audio.wav",
    model_name="large-v2",
    device="cuda",  # or "cpu"
    stemming=True,  # Enable source separation
    backend="faster_whisper",  # or "whisperx"
)

# Process audio
transcript_path, srt_path, timing_info = pipeline.process()
print(f"Transcript saved to: {transcript_path}")
print(f"Subtitles saved to: {srt_path}")
print(f"Processing took: {timing_info['total_time']:.2f}s")
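
To work with the subtitle file afterwards, a small reader for the standard SRT layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then text) may be handy. This is a sketch under that assumption only: `parse_srt` is a hypothetical helper, not part of this package, and it makes no assumption about how speaker labels appear in the text.

```python
import re

# Matches the standard SRT timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIME_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        match = TIME_LINE.match(lines[1].strip())
        if match is None:
            continue
        fields = match.groups()
        start = _to_seconds(*fields[:4])
        end = _to_seconds(*fields[4:])
        cues.append((start, end, "\n".join(lines[2:])))
    return cues


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\nHello there\n"
    for start, end, line in parse_srt(sample):
        print(f"{start:.1f}-{end:.1f}: {line}")
```

With a real run, the input would be the contents of the file at `srt_path`, read as text.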
Advanced Usage
from whisperx_nemo_pipeline import TranscriptionPipeline, TranscriptionConfig

# Custom configuration
config = TranscriptionConfig(
    audio_path="path/to/audio.wav",
    model_name="large-v2",
    device="cuda",
    batch_size=8,
    language="en",  # or None for auto-detection
    stemming=True,
    suppress_numerals=False,
    backend="faster_whisper",
)

# Create pipeline with custom config
pipeline = TranscriptionPipeline(config)

# Process
transcript_path, srt_path, timing_info = pipeline.process()
Configuration Options
- audio_path: Path to input audio file
- model_name: Whisper model size ("tiny", "base", "small", "medium", "large-v2")
- device: Computing device ("cuda" or "cpu")
- batch_size: Batch size for inference (default: 4)
- language: Language code or None for auto-detection
- stemming: Enable audio source separation (default: True)
- suppress_numerals: Suppress numerical tokens (default: False)
- backend: "faster_whisper" or "whisperx"
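
The options above can be summarized as a small dataclass. This is a hypothetical mirror for illustration, not the package's `TranscriptionConfig`: the class name `SketchConfig` is invented, the defaults for `model_name`, `device`, and `backend` are assumptions (only `batch_size`, `stemming`, and `suppress_numerals` have documented defaults), and the validation only covers the values listed in this README, which may not be exhaustive.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the option list in this README.
VALID_MODELS = {"tiny", "base", "small", "medium", "large-v2"}
VALID_DEVICES = {"cuda", "cpu"}
VALID_BACKENDS = {"faster_whisper", "whisperx"}


@dataclass
class SketchConfig:
    """Hypothetical mirror of the documented options."""

    audio_path: str
    model_name: str = "large-v2"      # assumption: default not documented
    device: str = "cuda"              # assumption: default not documented
    batch_size: int = 4               # documented default
    language: Optional[str] = None    # None means auto-detection
    stemming: bool = True             # documented default
    suppress_numerals: bool = False   # documented default
    backend: str = "faster_whisper"   # assumption: default not documented

    def __post_init__(self):
        # Reject values outside the sets listed in the README.
        if self.model_name not in VALID_MODELS:
            raise ValueError(f"unknown model_name: {self.model_name!r}")
        if self.device not in VALID_DEVICES:
            raise ValueError(f"unknown device: {self.device!r}")
        if self.backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {self.backend!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be a positive integer")


if __name__ == "__main__":
    print(SketchConfig(audio_path="path/to/audio.wav", batch_size=8))
```

Validating at construction time surfaces a typo such as `device="gpu"` before any model is loaded.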
Requirements
- Python 3.8+
- CUDA-capable GPU (recommended)
- See requirements.txt for full dependency list
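
Before installing, the first two requirements can be checked with a quick standard-library script. This is a rough sketch: `check_environment` is a hypothetical helper, and finding `nvidia-smi` on the PATH is only a heuristic for an NVIDIA driver being present, not proof that CUDA will work with the installed packages.

```python
import shutil
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the interpreter and GPU tooling look suitable."""
    python_ok = sys.version_info >= min_python
    # Heuristic only: nvidia-smi ships with the NVIDIA driver.
    nvidia_smi_found = shutil.which("nvidia-smi") is not None
    return {
        "python_ok": python_ok,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "nvidia_smi_found": nvidia_smi_found,
    }


if __name__ == "__main__":
    report = check_environment()
    print(f"Python {report['python_version']} OK: {report['python_ok']}")
    print(f"nvidia-smi found: {report['nvidia_smi_found']}")
    if not report["nvidia_smi_found"]:
        print('No GPU tooling detected; consider device="cpu".')
```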
License
MIT License