A Python package for easy transcription using WhisperX.

Project description

easy-whisperx

A streamlined Python wrapper around the WhisperX project. It adds type safety, automatic resource management, and a simplified API for audio transcription with GPU acceleration, word-level alignment, and speaker diarization.

Acknowledgments

This project builds upon the outstanding work of the WhisperX team, particularly:

Max Bain and contributors for creating WhisperX
The original Whisper team at OpenAI
The faster-whisper project for performance improvements

What easy-whisperx adds:

Type Safety: Comprehensive type hints and mypy compatibility
Resource Management: Automatic GPU memory cleanup using context managers
Performance Tracking: Built-in metrics collection for all operations
Simplified API: Cleaner interface with sensible defaults
Error Handling: Robust error handling with detailed logging
Bulk Processing: Efficient batch processing capabilities

All the core transcription, alignment, and diarization capabilities are provided by the underlying WhisperX library.

Python Version Requirements

This package requires Python 3.10, 3.11, or 3.12. Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
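As a small illustration of the version constraint above, a script could check the interpreter before importing the package. This is only a sketch; the helper name `is_supported_python` is hypothetical and is not part of easy-whisperx.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True for Python 3.10, 3.11, or 3.12 (the versions listed above)."""
    major_minor = tuple(version_info[:2])
    return (3, 10) <= major_minor <= (3, 12)


if not is_supported_python():
    # Warn rather than exit, so the check can live in a notebook or script header.
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the supported 3.10-3.12 range"
    )
```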

Features

Audio Transcription: WhisperX-powered speech-to-text conversion
Word-level Alignment: Precise timestamp alignment for individual words
Speaker Diarization: Automatic speaker identification and assignment
GPU Acceleration: CUDA support for faster processing
Performance Tracking: Built-in metrics collection for all operations
Bulk Processing: Efficient batch processing with individual item tracking
Type Safety: Comprehensive type hints throughout
Context Management: Automatic resource cleanup and memory management

Installation

Standard Installation

git clone https://github.com/falahat/easy-whisperx.git
cd easy-whisperx
pip install -e .

Development Installation

git clone https://github.com/falahat/easy-whisperx.git
cd easy-whisperx
pip install -e ".[dev]"   # quotes keep shells such as zsh from expanding the brackets

Notebook Support

pip install -e ".[notebook]"

Prerequisites for GPU Transcription

NVIDIA GPU with CUDA support
Hugging Face Token (for speaker diarization models)
PyTorch with GPU support

# Install PyTorch with GPU support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Setting up Transcription Environment

Get a Hugging Face Token:
- Go to Hugging Face Settings
- Create a token with "read" permissions
- Accept user agreements for segmentation and diarization models

Set Environment Variable:

# Windows PowerShell
$env:HF_TOKEN="your_token_here"

# Linux/macOS
export HF_TOKEN="your_token_here"
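Once the variable is exported, Python code can read it from the environment. The helper below is a hypothetical sketch (the name `get_hf_token` is not part of easy-whisperx); it fails early with a hint when the token is missing instead of passing an empty string to the diarization stage.

```python
import os


def get_hf_token(env=None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise with a hint."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running speaker diarization"
        )
    return token
```

The returned value can then be used wherever the examples below refer to `hf_token`.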

Quick Start

Quick start — `transcribe()`

from easy_whisperx import transcribe

# Transcription only. device / compute_type / batch_size are auto-detected.
result = transcribe("audio.mp3", model_size="base")
for seg in result.transcript["segments"]:
    print(seg["start"], seg["text"])
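To make the segment structure more concrete, the sketch below formats segments as timestamped lines. It runs on hand-written sample data, not on real output. The `start` and `text` keys appear in the example above; the `end` key is an assumption based on the usual WhisperX segment layout, and the helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments) -> list[str]:
    """Turn segment dicts into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


# Hand-written sample in the assumed segment shape.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 62.25, "end": 65.0, "text": " Second line."},
]
for line in format_segments(sample_segments):
    print(line)
```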

Alignment and diarization are optional follow-up steps — chain only the ones you want:

import os

from easy_whisperx import transcribe

hf_token = os.environ["HF_TOKEN"]  # set in "Setting up Transcription Environment"

# Transcribe + word-level alignment
result = transcribe("audio.mp3", model_size="base").align()

# Transcribe + speaker diarization (no alignment)
result = transcribe("audio.mp3", model_size="base").diarize(hf_token)

# All three
result = transcribe("audio.mp3", model_size="base").align().diarize(hf_token)

print(result.aligned, result.diarized)   # which stages ran

Each stage loads and unloads one model, so at most one model occupies VRAM at a time, and the audio is decoded once and reused across stages.
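The one-model-at-a-time behaviour can be pictured with a toy context manager. This is not the package's implementation: the `Stage` class and `run_pipeline` function are hypothetical stand-ins that only record which stage is "loaded", to show why sequential `with` blocks never hold two models at once.

```python
class Stage:
    """Toy stand-in for a model stage; loading just records the stage name."""

    loaded = []  # names of stages currently "in memory"
    peak = 0     # highest number of simultaneously loaded stages seen

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        Stage.loaded.append(self.name)   # stand-in for loading model weights
        return self

    def __exit__(self, exc_type, exc, tb):
        Stage.loaded.remove(self.name)   # stand-in for freeing the model
        return False                     # never swallow exceptions

    def __call__(self, data):
        Stage.peak = max(Stage.peak, len(Stage.loaded))
        return data + [self.name]        # pass the same data along, annotated


def run_pipeline(audio):
    data = [audio]                       # the audio is created once and reused
    for name in ("transcribe", "align", "diarize"):
        with Stage(name) as stage:       # previous stage is already unloaded
            data = stage(data)
    return data


result = run_pipeline("audio.mp3")
print(result, "peak loaded stages:", Stage.peak)
```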

Advanced — the stage classes

For full control (custom per-stage handling, per-stage metrics), use the context-managed stage classes directly:

import os

from easy_whisperx import Transcriber, Aligner, Diarizer

hf_token = os.environ["HF_TOKEN"]  # set in "Setting up Transcription Environment"

with Transcriber("base") as transcriber:        # device/compute/batch default to "auto"
    transcript = transcriber("audio.mp3")

with Aligner("en") as aligner:
    aligned = aligner(transcript["segments"], "audio.mp3")

with Diarizer(hf_token) as diarizer:
    final = diarizer(aligned, "audio.mp3")
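One kind of custom handling that per-stage access allows is post-processing the diarized segments. The sketch below merges consecutive segments from the same speaker into turns. It assumes each segment is a dict with `text` and an optional `speaker` key, as WhisperX output usually has; that layout is an assumption here, the `speaker_turns` helper is hypothetical, and the sample data is hand-written.

```python
from itertools import groupby


def speaker_turns(segments):
    """Merge consecutive same-speaker segments into (speaker, text) turns."""
    turns = []
    by_speaker = groupby(segments, key=lambda s: s.get("speaker", "UNKNOWN"))
    for speaker, group in by_speaker:
        text = " ".join(s["text"].strip() for s in group)
        turns.append((speaker, text))
    return turns


# Hand-written sample in the assumed diarized-segment shape.
sample = [
    {"speaker": "SPEAKER_00", "text": " Hi."},
    {"speaker": "SPEAKER_00", "text": " How are you?"},
    {"speaker": "SPEAKER_01", "text": " Fine, thanks."},
    {"text": " (inaudible)"},
]
for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
```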

Batch processing

Each stage is a callable context manager, so batching is a plain loop: load the model once and collect a typed list of results.

from easy_whisperx import Transcriber

audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
with Transcriber("base") as transcriber:
    results = [transcriber(path) for path in audio_files]
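In a longer batch, one unreadable file would abort the list comprehension above. A possible variation, sketched here with a hypothetical `run_batch` helper and a fake stage so that it runs without a GPU, records failures per item and keeps going. Any callable that takes a path, such as an open `Transcriber`, could presumably be passed in place of `fake_stage`.

```python
from typing import Callable, Iterable


def run_batch(stage: Callable[[str], dict], paths: Iterable[str]):
    """Apply one loaded stage to many files, tracking each item separately."""
    results, errors = {}, {}
    for path in paths:
        try:
            results[path] = stage(path)
        except Exception as exc:   # record the failure and continue the batch
            errors[path] = repr(exc)
    return results, errors


def fake_stage(path: str) -> dict:
    """Stand-in for a real stage; rejects anything that is not an .mp3 file."""
    if not path.endswith(".mp3"):
        raise ValueError("not an audio file")
    return {"segments": [], "source": path}


results, errors = run_batch(fake_stage, ["file1.mp3", "notes.txt", "file2.mp3"])
print(sorted(results), errors)
```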

WhisperX Integration

This package is a thin wrapper around the upstream WhisperX project, which is its core transcription engine. All credit for the underlying transcription technology goes to WhisperX.

The original WhisperX provides:

Fast automatic speech recognition with word-level timestamps
Speaker diarization capabilities
Multiple language support
GPU acceleration with optimized inference

Our wrapper adds the resource management and type safety layer on top of this excellent foundation.

Development

Setting up Development Environment

git clone https://github.com/falahat/easy-whisperx.git
cd easy-whisperx

# Create virtual environment (note the .venv name)
python -m venv .venv

# Activate virtual environment
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# Linux/macOS:
source .venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=easy_whisperx --cov-report=html

# Run specific test file
pytest tests/test_transcriber.py -v

# Run integration tests
pytest -m integration

Code Quality Tools

The project uses:

Black for code formatting
mypy for type checking
flake8 for linting
pytest for testing

# Format code
black src/ tests/

# Type checking
mypy src/easy_whisperx/

# Linting
flake8 src/easy_whisperx/

Core Components

The package is built with a modular architecture:

transcribe() - Top-level entry point; returns a Transcription you can .align() / .diarize()
Transcription - Result object carrying the transcript and the optional-stage methods
Transcriber - Main transcription using WhisperX models
Aligner - Word-level timestamp alignment
Diarizer - Speaker identification and assignment
PerformanceTracker - Performance metrics collection
BaseWhisperxModel - Abstract base for model management
Utility functions - Audio loading and device configuration

Performance and Memory Management

The package includes automatic resource management:

Context Managers: All models automatically clean up GPU memory
Performance Tracking: Built-in metrics for all operations
Memory Optimization: Automatic garbage collection and CUDA cache clearing
Error Handling: Graceful failure handling with detailed logging
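The cleanup described in the list above typically combines Python garbage collection with clearing PyTorch's CUDA cache. The following is a sketch of that general pattern under stated assumptions, not the package's actual code; `release_gpu_memory` and `managed_model` are hypothetical names. It falls back gracefully when PyTorch or a GPU is absent.

```python
import gc
from contextlib import contextmanager


def release_gpu_memory() -> bool:
    """Collect garbage, then clear the CUDA cache if PyTorch and a GPU exist.

    Returns True only when the CUDA cache was actually cleared.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


@contextmanager
def managed_model(factory):
    """Create a model with `factory` and always clean up afterwards."""
    model = factory()
    try:
        yield model
    finally:
        del model              # drop this function's reference first
        release_gpu_memory()   # then reclaim whatever can be reclaimed


with managed_model(lambda: {"name": "toy-model"}) as model:
    print("using", model["name"])
```

Because the cleanup sits in a `finally` clause, it also runs when the body of the `with` block raises.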

API Reference

Device Configuration

from easy_whisperx.utils import resolve_device_config

# Automatic device selection
device, compute_type = resolve_device_config("auto", "auto")
# Returns ("cuda", "float16") if GPU available, ("cpu", "int8") otherwise
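The selection rule in the comment above can be written out as a small pure function. This is an illustrative re-creation, not the source of `resolve_device_config`: the name `pick_device_config` and the `cuda_available` parameter are hypothetical and exist so that the logic can be run without a GPU.

```python
def pick_device_config(device="auto", compute_type="auto", cuda_available=False):
    """Resolve "auto" values to a concrete (device, compute_type) pair."""
    if device == "auto":
        device = "cuda" if cuda_available else "cpu"
    if compute_type == "auto":
        # float16 on GPU, int8 on CPU, matching the pairs quoted above
        compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type


print(pick_device_config(cuda_available=True))
print(pick_device_config(cuda_available=False))
```

Explicit values pass through unchanged in this sketch, so a caller can force the CPU on a machine that has a GPU.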

Performance Tracking

from easy_whisperx import PerformanceTracker

with PerformanceTracker("my_operation") as tracker:
    # Your code here
    tracker["custom_metric"] = "value"

metrics = tracker.to_dict()
print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
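For readers curious how a tracker with this interface might work, here is a minimal stand-in built on `time.perf_counter`. It mirrors the usage shown above (context manager, item assignment, `to_dict()` keyed by operation name) but is a hypothetical sketch; the `SimpleTracker` class and its `success` field are not taken from easy-whisperx.

```python
import time


class SimpleTracker:
    """Time a block of code and collect custom metrics alongside the duration."""

    def __init__(self, name):
        self.name = name
        self.metrics = {}
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.metrics["duration_seconds"] = time.perf_counter() - self._start
        self.metrics["success"] = exc_type is None   # assumed extra field
        return False                                 # let exceptions propagate

    def __setitem__(self, key, value):
        self.metrics[key] = value

    def to_dict(self):
        return {self.name: dict(self.metrics)}


with SimpleTracker("my_operation") as tracker:
    tracker["custom_metric"] = "value"

metrics = tracker.to_dict()
print(f"Operation took {metrics['my_operation']['duration_seconds']:.6f} seconds")
```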

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes with tests
Ensure all tests pass (pytest)
Check code quality (black src/ tests/ and mypy src/)
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Maintainers

falahat

Release history

0.1.1 - Jun 28, 2026 (this version)
0.1.0 - Jun 27, 2026
0.0.8 - Sep 15, 2025
0.0.7 - Sep 15, 2025
0.0.4 - Sep 15, 2025
0.0.1 - Sep 15, 2025

Download files

Source Distribution

easy_whisperx-0.1.1.tar.gz (29.6 kB), uploaded Jun 28, 2026, Source

Built Distribution

easy_whisperx-0.1.1-py3-none-any.whl (20.3 kB), uploaded Jun 28, 2026, Python 3

File details

Details for the file easy_whisperx-0.1.1.tar.gz.

File metadata

Download URL: easy_whisperx-0.1.1.tar.gz
Upload date: Jun 28, 2026
Size: 29.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for easy_whisperx-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`860724c40081813efa8a8855f359dd15ca6ab792dd34ae5550459026bf3ab326`
MD5	`cb94c3fdbc1b72a241c065498f055f56`
BLAKE2b-256	`c30236ba4daf423e7e7c3d8c441ca7f0ca1f11cc214dfa6137b4abb8bc2078e1`
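A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The helpers below are a generic sketch (the function names are hypothetical); they read the file in chunks so that large archives do not have to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("easy_whisperx-0.1.1.tar.gz", "<SHA256 from the table above>")` should return `True` for an intact download.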

Provenance

The following attestation bundles were made for easy_whisperx-0.1.1.tar.gz:

Publisher: python-publish.yml on falahat/easy-whisperx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: easy_whisperx-0.1.1.tar.gz
- Subject digest: 860724c40081813efa8a8855f359dd15ca6ab792dd34ae5550459026bf3ab326
- Sigstore transparency entry: 1995231774
- Sigstore integration time: Jun 28, 2026
Source repository:
- Permalink: falahat/easy-whisperx@1969a80565997bb42b94af0c6626c1fb58bfafef
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/falahat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1969a80565997bb42b94af0c6626c1fb58bfafef
- Trigger Event: push

File details

Details for the file easy_whisperx-0.1.1-py3-none-any.whl.

File metadata

Download URL: easy_whisperx-0.1.1-py3-none-any.whl
Upload date: Jun 28, 2026
Size: 20.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for easy_whisperx-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`693f963b27b2d754c3ae5a4bb2f4f8db26d7977433110caef40d7b6d4ba55201`
MD5	`507cc4acfc4c4c04fc9f3fe890173ba4`
BLAKE2b-256	`356d8378f923dded187ba610ac17cbdfe1b6bbbcc24a801a0f07d71f5e14826e`

Provenance

The following attestation bundles were made for easy_whisperx-0.1.1-py3-none-any.whl:

Publisher: python-publish.yml on falahat/easy-whisperx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: easy_whisperx-0.1.1-py3-none-any.whl
- Subject digest: 693f963b27b2d754c3ae5a4bb2f4f8db26d7977433110caef40d7b6d4ba55201
- Sigstore transparency entry: 1995231862
- Sigstore integration time: Jun 28, 2026
Source repository:
- Permalink: falahat/easy-whisperx@1969a80565997bb42b94af0c6626c1fb58bfafef
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/falahat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1969a80565997bb42b94af0c6626c1fb58bfafef
- Trigger Event: push
