A Python package for easy transcription using WhisperX.
easy-whisperx
A streamlined Python wrapper around the WhisperX project that adds type safety, automatic resource management, and a simplified API for GPU-accelerated audio transcription, word-level alignment, and speaker diarization.
Acknowledgments
This project builds upon the outstanding work of the WhisperX team, particularly:
- Max Bain and contributors for creating WhisperX
- The original Whisper team at OpenAI
- The faster-whisper project for performance improvements
What easy-whisperx adds:
- Type Safety: Comprehensive type hints and mypy compatibility
- Resource Management: Automatic GPU memory cleanup using context managers
- Performance Tracking: Built-in metrics collection for all operations
- Simplified API: Cleaner interface with sensible defaults
- Error Handling: Robust error handling with detailed logging
- Bulk Processing: Efficient batch processing capabilities
All the core transcription, alignment, and diarization capabilities are provided by the underlying WhisperX library.
Python Version Requirements
This package requires Python 3.10, 3.11, or 3.12. Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
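Before installing, you can confirm that the active interpreter falls in the supported range. This is a minimal sketch; `is_supported_python` is a hypothetical helper, not part of easy-whisperx.

```python
import sys

# Supported range stated above: 3.10 through 3.12, inclusive.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) of version_info is in the supported range."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"Python {current} is supported.")
    else:
        print(f"Python {current} is not supported; use 3.10, 3.11, or 3.12.")
```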
Features
- Audio Transcription: WhisperX-powered speech-to-text conversion
- Word-level Alignment: Precise timestamp alignment for individual words
- Speaker Diarization: Automatic speaker identification and assignment
- GPU Acceleration: CUDA support for faster processing
- Performance Tracking: Built-in metrics collection for all operations
- Bulk Processing: Efficient batch processing with individual item tracking
- Type Safety: Comprehensive type hints throughout
- Context Management: Automatic resource cleanup and memory management
Installation
Standard Installation
git clone https://github.com/falahat/easy-whisperx.git
cd easy-whisperx
pip install -e .
Development Installation
git clone https://github.com/falahat/easy-whisperx.git
cd easy-whisperx
pip install -e ".[dev]"
Notebook Support
pip install -e ".[notebook]"
Prerequisites for GPU Transcription
- NVIDIA GPU with CUDA support
- Hugging Face Token (for speaker diarization models)
- PyTorch with GPU support
# Install PyTorch with GPU support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
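After installing, it is worth checking that the PyTorch build includes CUDA and can see the GPU. The sketch below uses standard PyTorch calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`); the `gpu_report` helper itself is hypothetical.

```python
def gpu_report() -> dict:
    """Summarize what the installed PyTorch build can see."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only PyTorch builds
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(gpu_report())
```

If `cuda_build` is `None`, the CPU-only wheel was installed and the `--index-url` step above should be repeated.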
Setting up Transcription Environment
- Get a Hugging Face token:
  - Go to Hugging Face Settings
  - Create a token with "read" permissions
  - Accept the user agreements for the segmentation and diarization models
- Set the environment variable:
  # Windows PowerShell
  $env:HF_TOKEN="your_token_here"
  # Linux/macOS
  export HF_TOKEN="your_token_here"
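In Python, the token is read from the environment with `os.getenv`/`os.environ`. A small helper can also fail early with a clear message when diarization actually needs it. This is a sketch; `get_hf_token` and `MissingTokenError` are hypothetical names, not easy-whisperx APIs.

```python
import os
from typing import Optional


class MissingTokenError(RuntimeError):
    """Raised when a Hugging Face token is required but HF_TOKEN is unset."""


def get_hf_token(required: bool = False) -> Optional[str]:
    """Return HF_TOKEN (whitespace-stripped), or None if it is unset or empty.

    With required=True, raise MissingTokenError instead of returning None.
    """
    token = os.environ.get("HF_TOKEN", "").strip()
    if token:
        return token
    if required:
        raise MissingTokenError(
            "HF_TOKEN is not set; speaker diarization needs a Hugging Face token."
        )
    return None
```

Calling `get_hf_token()` (returns `None`) suits optional diarization; `get_hf_token(required=True)` suits scripts where diarization is mandatory.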
Quick Start
Basic Transcription
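from easy_whisperx import Transcriber
# Initialize the transcriber
transcriber = Transcriber(
    model_size="base",
    device="cuda",  # or "cpu" (then use compute_type="int8")
    compute_type="float16",
    batch_size=16,
)
# Transcribe an audio file; the with block releases GPU memory on exit
with transcriber:
    result = transcriber("path/to/audio.mp3")
# The result holds a list of segments, each carrying its own text
print(" ".join(segment["text"].strip() for segment in result["segments"]))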
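Transcription results expose a `segments` list, as the examples on this page use. Assuming each segment carries `start` and `end` in seconds plus `text`, as WhisperX segments do, this sketch renders a timestamped transcript. `format_timestamp` and `segments_to_lines` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(segments: list) -> list:
    """Render each segment dict as '[start --> end] text'."""
    return [
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]
```

Usage with the result above: `for line in segments_to_lines(result["segments"]): print(line)`.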
Complete Pipeline with Alignment and Diarization
import os
from easy_whisperx import Transcriber, Aligner, Diarizer
audio_path = "path/to/audio.mp3"
hf_token = os.getenv("HF_TOKEN")
# Step 1: Transcribe
with Transcriber("base", "cuda", "float16", 16) as transcriber:
    transcript = transcriber(audio_path)
# Step 2: Align words ("en" is the language code of the audio)
with Aligner("cuda", "en") as aligner:
    aligned_transcript = aligner(transcript["segments"], audio_path)
# Step 3: Diarize speakers (optional; requires HF_TOKEN)
if hf_token:
    with Diarizer("cuda", hf_token) as diarizer:
        final_transcript = diarizer(aligned_transcript, audio_path)
else:
    final_transcript = aligned_transcript
print(final_transcript)
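WhisperX diarization labels segments with a `speaker` key (for example `SPEAKER_00`). Assuming the diarized result keeps that shape under `segments`, the sketch below merges consecutive segments from the same speaker into turns; segments without a label (for example when diarization was skipped) fall back to `UNKNOWN`. `group_by_speaker` is a hypothetical helper.

```python
def group_by_speaker(segments: list) -> list:
    """Merge consecutive segments from the same speaker into (speaker, text) turns."""
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg.get("text", "").strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous turn: append to it
            turns[-1] = (speaker, f"{turns[-1][1]} {text}".strip())
        else:
            turns.append((speaker, text))
    return turns
```

Usage, under the same assumption about the result shape: `for speaker, text in group_by_speaker(final_transcript["segments"]): print(f"{speaker}: {text}")`.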
Bulk Processing
from easy_whisperx import BulkExecutor, Transcriber
audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
# Callback run once per item: receives the model, the item, and a per-item tracker
def transcribe_file(model, audio_path, tracker):
    result = model(audio_path)
    tracker["segments_count"] = len(result.get("segments", []))
with Transcriber("base", "cuda", "float16", 16) as transcriber:
    with BulkExecutor(transcriber) as executor:
        executor.for_each(audio_files, transcribe_file)
        metrics = executor.get_metrics()
print(f"Bulk processing metrics: {metrics}")
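The `for_each` call passes each item to a callback together with the model and a per-item tracker. The sketch below illustrates that pattern in plain Python, assuming that a failure on one file should be recorded rather than abort the whole batch. `run_for_each` is hypothetical and is not necessarily how `BulkExecutor` works internally.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def run_for_each(
    model: Any,
    items: Iterable[Any],
    fn: Callable[[Any, Any, Dict[str, Any]], None],
) -> List[Dict[str, Any]]:
    """Call fn(model, item, tracker) for each item, recording timing and errors."""
    metrics: List[Dict[str, Any]] = []
    for item in items:
        tracker: Dict[str, Any] = {"item": item}
        start = time.perf_counter()
        try:
            fn(model, item, tracker)
            tracker["status"] = "ok"
        except Exception as exc:  # record the failure and continue with the next item
            tracker["status"] = "error"
            tracker["error"] = repr(exc)
        tracker["duration_seconds"] = time.perf_counter() - start
        metrics.append(tracker)
    return metrics
```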
WhisperX Integration
This package uses WhisperX as its core transcription engine. We maintain a fork at falahat/whisperx that may include specific optimizations or compatibility fixes, but all credit for the underlying transcription technology goes to the original WhisperX project.
The original WhisperX provides:
- Fast automatic speech recognition with word-level timestamps
- Speaker diarization capabilities
- Multiple language support
- GPU acceleration with optimized inference
Our wrapper adds the resource management and type safety layer on top of this excellent foundation.
Development
Setting up Development Environment
git clone https://github.com/falahat/easy-whisperx.git
cd easy-whisperx
# Create virtual environment (note the .venv name)
python -m venv .venv
# Activate virtual environment
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# Linux/macOS:
source .venv/bin/activate
# Install in development mode
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest
# Run with coverage report
pytest --cov=easy_whisperx --cov-report=html
# Run specific test file
pytest tests/test_transcriber.py -v
# Run integration tests
pytest -m integration
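`pytest -m integration` selects tests carrying an `integration` marker. If the project does not already register that marker (for example under `[tool.pytest.ini_options]` in `pyproject.toml`), pytest warns about an unknown mark; registering it from a `conftest.py` hook is one option. This is a sketch, and the marker description text is an assumption.

```python
# conftest.py
def pytest_configure(config):
    """Register the custom marker so `pytest -m integration` runs without warnings."""
    config.addinivalue_line(
        "markers", "integration: tests that load real WhisperX models or audio"
    )
```

Tests then opt in with `@pytest.mark.integration`.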
Code Quality Tools
The project uses:
- Black for code formatting
- mypy for type checking
- flake8 for linting
- pytest for testing
# Format code
black src/ tests/
# Type checking
mypy src/easy_whisperx/
# Linting
flake8 src/easy_whisperx/
Core Components
The package is built with a modular architecture:
- Transcriber: Main transcription using WhisperX models
- Aligner: Word-level timestamp alignment
- Diarizer: Speaker identification and assignment
- BulkExecutor: Bulk processing with performance tracking
- PerformanceTracker: Performance metrics collection
- BaseWhisperxModel: Abstract base for model management
- Utility functions: Audio loading and device configuration
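The components share a common lifecycle through `BaseWhisperxModel`, whose exact interface is not documented here. The following is only a sketch of one plausible shape for such a base class: load on `__enter__`, release on `__exit__`, and refuse calls outside the `with` block. `ManagedModel`, `_load`, `_predict`, and `EchoModel` are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class ManagedModel(ABC):
    """Load a model on __enter__ and drop it on __exit__."""

    def __init__(self) -> None:
        self._model: Optional[Any] = None

    @abstractmethod
    def _load(self) -> Any:
        """Create and return the underlying model object."""

    @abstractmethod
    def _predict(self, model: Any, *args: Any, **kwargs: Any) -> Any:
        """Run inference with a loaded model."""

    def __enter__(self) -> "ManagedModel":
        self._model = self._load()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Drop the reference so the model can be garbage-collected.
        self._model = None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._model is None:
            raise RuntimeError("Use the model inside a `with` block.")
        return self._predict(self._model, *args, **kwargs)


class EchoModel(ManagedModel):
    """Toy subclass for illustration: 'predicts' by upper-casing its input."""

    def _load(self) -> Any:
        return str.upper

    def _predict(self, model: Any, *args: Any, **kwargs: Any) -> Any:
        return model(*args, **kwargs)
```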
Performance and Memory Management
The package includes automatic resource management:
- Context Managers: All models automatically clean up GPU memory
- Performance Tracking: Built-in metrics for all operations
- Memory Optimization: Automatic garbage collection and CUDA cache clearing
- Error Handling: Graceful failure handling with detailed logging
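Garbage collection plus CUDA cache clearing is commonly done with `gc.collect()` followed by `torch.cuda.empty_cache()`. A minimal sketch, assuming PyTorch may or may not be installed; `release_gpu_memory` and `GpuScope` are hypothetical, not easy-whisperx APIs. `empty_cache()` only returns cached memory that no live tensor occupies, so model references must be dropped first for it to have an effect.

```python
import gc


def release_gpu_memory() -> bool:
    """Run garbage collection and, if CUDA is available, clear PyTorch's cache.

    Returns True if the CUDA cache was cleared.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        # Hands unused cached blocks back to the driver; live tensors are untouched.
        torch.cuda.empty_cache()
        return True
    return False


class GpuScope:
    """Context manager that calls release_gpu_memory() on exit, even after errors."""

    def __enter__(self) -> "GpuScope":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        release_gpu_memory()
        return False  # do not suppress exceptions
```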
API Reference
Device Configuration
from easy_whisperx.utils import _determine_device_config
# Automatic device selection
device, compute_type = _determine_device_config("auto", "auto")
# Returns ("cuda", "float16") if GPU available, ("cpu", "int8") otherwise
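`_determine_device_config` is underscore-prefixed, which conventionally marks it as internal, so its signature may change between releases. If you prefer not to depend on it, the behaviour described above can be reproduced in a few lines. A sketch; `resolve_device_config` and its `cuda_available` override (added for testability) are hypothetical.

```python
from typing import Optional, Tuple


def resolve_device_config(
    device: str = "auto",
    compute_type: str = "auto",
    cuda_available: Optional[bool] = None,
) -> Tuple[str, str]:
    """Resolve "auto" values to a concrete (device, compute_type) pair."""
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False

    if device == "auto":
        device = "cuda" if cuda_available else "cpu"
    if compute_type == "auto":
        # float16 on GPU, int8 on CPU, matching the behaviour described above
        compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type
```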
Performance Tracking
from easy_whisperx import PerformanceTracker
with PerformanceTracker("my_operation") as tracker:
    # Your code here
    tracker["custom_metric"] = "value"
metrics = tracker.to_dict()
print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
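The example implies a tracker that times the `with` block, accepts custom metrics via item assignment, and reports them under the operation name alongside `duration_seconds`. A minimal sketch of that pattern using `time.perf_counter()`; `SimpleTracker` is hypothetical, and its `success` field is an illustrative addition not confirmed for `PerformanceTracker`.

```python
import time
from typing import Any, Dict


class SimpleTracker:
    """Time a block and collect custom metrics, keyed by operation name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._metrics: Dict[str, Any] = {}
        self._start = 0.0

    def __setitem__(self, key: str, value: Any) -> None:
        self._metrics[key] = value

    def __enter__(self) -> "SimpleTracker":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._metrics["duration_seconds"] = time.perf_counter() - self._start
        self._metrics["success"] = exc_type is None
        return False  # let exceptions propagate

    def to_dict(self) -> Dict[str, Dict[str, Any]]:
        return {self.name: dict(self._metrics)}
```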
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Ensure all tests pass (pytest)
- Check code quality (black src/ tests/ and mypy src/)
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file easy_whisperx-0.0.8.tar.gz.
File metadata
- Download URL: easy_whisperx-0.0.8.tar.gz
- Upload date:
- Size: 20.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4ad0d86f87d919e4189a3935c774caa1c121986e36e776c4a66733d0c9332c1 |
| MD5 | e2d5fe14e2b275a7baf599c7a332a964 |
| BLAKE2b-256 | 20e52e36a04867594c8146edd7596aab8f376f31d0d61440d346f9bd99274caf |
Provenance
The following attestation bundles were made for easy_whisperx-0.0.8.tar.gz:
- Publisher: python-publish.yml on falahat/easy-whisperx
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: easy_whisperx-0.0.8.tar.gz
- Subject digest: c4ad0d86f87d919e4189a3935c774caa1c121986e36e776c4a66733d0c9332c1
- Sigstore transparency entry: 517087845
- Sigstore integration time:
- Permalink: falahat/easy-whisperx@81222964ec5c47ca06a5d86529ed980b37a775d9
- Branch / Tag: refs/tags/0.0.8
- Owner: https://github.com/falahat
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@81222964ec5c47ca06a5d86529ed980b37a775d9
- Trigger Event: release
File details
Details for the file easy_whisperx-0.0.8-py3-none-any.whl.
File metadata
- Download URL: easy_whisperx-0.0.8-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08254d34df05491256c4a71e5e8c719f4e700033800163980174531fd720e688 |
| MD5 | e5b248101a489a6ecc4cb4a3e075eb63 |
| BLAKE2b-256 | 3f25fa25e0459938d52331e36e5a740e1858a1ab67e109d535917d3fffde81d4 |
Provenance
The following attestation bundles were made for easy_whisperx-0.0.8-py3-none-any.whl:
- Publisher: python-publish.yml on falahat/easy-whisperx
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: easy_whisperx-0.0.8-py3-none-any.whl
- Subject digest: 08254d34df05491256c4a71e5e8c719f4e700033800163980174531fd720e688
- Sigstore transparency entry: 517087868
- Sigstore integration time:
- Permalink: falahat/easy-whisperx@81222964ec5c47ca06a5d86529ed980b37a775d9
- Branch / Tag: refs/tags/0.0.8
- Owner: https://github.com/falahat
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@81222964ec5c47ca06a5d86529ed980b37a775d9
- Trigger Event: release