Local WhisperX transcription API with speaker diarization
WhisperX API
GPU-powered transcription API in one command
Turn any audio into text with speaker labels. No cloud. No limits. Just run:
uvx whisperx-api
WhisperX API wraps WhisperX in a REST API with speaker diarization, word-level timestamps, and multiple export formats. It is a self-hosted alternative to AssemblyAI, Deepgram, and Rev.ai.
Features
- Speaker Diarization - Identify who said what with pyannote
- Word-Level Timestamps - Precise alignment for every word
- Multiple Export Formats - SRT, WebVTT, TXT, JSON
- Webhook Callbacks - Get notified when transcription completes
- GPU Model Caching - Fast subsequent transcriptions
- Background Processing - Non-blocking async jobs
- Progress Tracking - Poll for real-time status
Quick Start
Prerequisites
- NVIDIA GPU with 6GB+ VRAM (or CPU mode for testing)
- CUDA 12.x drivers installed
Option A: One-Liner Install (Recommended)
curl -fsSL https://raw.githubusercontent.com/namastexlabs/whisperx-api/main/get-whisperx.sh | bash
This installs Python 3.12 and uv, checks CUDA, and sets up whisperx-api.
Option B: Direct Run (if dependencies met)
uvx whisperx-api
Option C: pip install
pip install whisperx-api
whisperx-api
The API starts at http://localhost:8880. Swagger docs are served at /docs.
First Transcription
# Default API key is "namastex888" - works out of the box
curl -X POST http://localhost:8880/v1/transcript \
-H "Authorization: namastex888" \
-F "file=@audio.mp3"
# Check status (replace {id} with returned transcript ID)
curl http://localhost:8880/v1/transcript/{id} \
-H "Authorization: namastex888"
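The status check above can also be scripted. The following is a minimal Python sketch, not part of whisperx-api: `fetch_transcript` and `wait_for_transcript` are hypothetical helper names, the poll interval and timeout are arbitrary, and it assumes the job JSON carries the `status` field shown under Response Format.

```python
import json
import time
import urllib.request

API_URL = "http://localhost:8880"   # default address from this README
API_KEY = "namastex888"             # default key from this README


def fetch_transcript(transcript_id):
    """GET /v1/transcript/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_URL}/v1/transcript/{transcript_id}",
        headers={"Authorization": API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_transcript(transcript_id, fetch=fetch_transcript,
                        interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until status is 'completed' or 'error', or the timeout elapses."""
    waited = 0.0
    while True:
        job = fetch(transcript_id)
        if job.get("status") in ("completed", "error"):
            return job
        if waited >= timeout:
            raise TimeoutError(
                f"transcript {transcript_id} still {job.get('status')!r}"
            )
        sleep(interval)
        waited += interval
```

With the server running, `wait_for_transcript("<returned id>")` would block until the job finishes. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live server.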
API Reference
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/transcript | Submit transcription job |
| GET | /v1/transcript/{id} | Get transcript status/result |
| GET | /v1/transcript/{id}/srt | Export as SRT subtitles |
| GET | /v1/transcript/{id}/vtt | Export as WebVTT |
| GET | /v1/transcript/{id}/txt | Export as plain text |
| GET | /v1/transcript/{id}/json | Export as JSON |
| DELETE | /v1/transcript/{id} | Delete transcript |
| GET | /health | Health check (no auth) |
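The four export endpoints share one URL pattern, so a client can build them from the transcript ID and a format suffix. A small Python sketch under that assumption follows; `export_url` and `EXPORT_FORMATS` are hypothetical names used for illustration, not part of the package.

```python
from urllib.parse import quote

EXPORT_FORMATS = ("srt", "vtt", "txt", "json")  # suffixes from the endpoint table


def export_url(transcript_id, fmt, base_url="http://localhost:8880"):
    """Build the export URL for a transcript in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {EXPORT_FORMATS}"
        )
    # Percent-encode the ID so it cannot alter the path structure.
    safe_id = quote(transcript_id, safe="")
    return f"{base_url.rstrip('/')}/v1/transcript/{safe_id}/{fmt}"
```

The resulting URL still needs the `Authorization` header when it is requested, as with the other authenticated endpoints.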
Submit Transcription
File upload:
curl -X POST http://localhost:8880/v1/transcript \
-H "Authorization: namastex888" \
-F "file=@audio.mp3"
URL download:
curl -X POST http://localhost:8880/v1/transcript \
-H "Authorization: namastex888" \
-F "audio_url=https://example.com/audio.mp3"
With speaker diarization:
curl -X POST http://localhost:8880/v1/transcript \
-H "Authorization: namastex888" \
-F "file=@audio.mp3" \
-F "speaker_labels=true" \
-F "speakers_expected=2"
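The `-F` flags send a multipart/form-data body. For callers who prefer Python, here is a rough standard-library sketch of the diarization request. The field names (`file`, `speaker_labels`, `speakers_expected`) come from the curl examples; `build_multipart` and `submit` are hypothetical helpers, and the `application/octet-stream` part type is an assumption.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit(path, speakers_expected=2,
           base_url="http://localhost:8880", api_key="namastex888"):
    """POST an audio file with speaker_labels=true and return the job JSON."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"speaker_labels": "true",
             "speakers_expected": str(speakers_expected)},
            filename=os.path.basename(path),
            file_bytes=fh.read(),
        )
    req = urllib.request.Request(
        f"{base_url}/v1/transcript",
        data=body,
        headers={"Authorization": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `submit("audio.mp3")` against a running server should be equivalent to the curl command above. An HTTP client library with built-in multipart support would shorten this considerably.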
Response Format
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"text": "Hello world, this is a transcription.",
"words": [
{"text": "Hello", "start": 0, "end": 500, "confidence": 0.98, "speaker": "A"}
],
"utterances": [
{"speaker": "A", "text": "Hello world...", "start": 0, "end": 3000}
],
"language_code": "en"
}
Status values: queued → processing → completed (or error)
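As an illustration of consuming this response, the sketch below turns `utterances` into readable speaker lines. It assumes `start` and `end` are in milliseconds, which the sample values suggest but this README does not state. `format_timestamp` and `speaker_lines` are hypothetical helper names.

```python
def format_timestamp(ms):
    """Render milliseconds as an SRT-style HH:MM:SS,mmm timestamp."""
    hours, rest = divmod(int(ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def speaker_lines(response):
    """Return one 'Speaker X [start --> end]: text' line per utterance.

    Jobs that are not yet 'completed' yield an empty list.
    """
    if response.get("status") != "completed":
        return []
    return [
        f"Speaker {u['speaker']} [{format_timestamp(u['start'])} --> "
        f"{format_timestamp(u['end'])}]: {u['text']}"
        for u in response.get("utterances") or []
    ]


sample = {
    "status": "completed",
    "utterances": [
        {"speaker": "A", "text": "Hello world...", "start": 0, "end": 3000}
    ],
}
for line in speaker_lines(sample):
    print(line)  # Speaker A [00:00:00,000 --> 00:00:03,000]: Hello world...
```

For real subtitle files the `/srt` and `/vtt` export endpoints are the simpler route. This helper is only meant to show how the JSON fields fit together.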
Configuration
All settings are read from environment variables with the WHISPERX_ prefix. Everything has sensible defaults, so no .env file is needed for local use.
| Variable | Default | Description |
|---|---|---|
| WHISPERX_API_KEY | namastex888 | API authentication key |
| WHISPERX_HOST | 0.0.0.0 | Server bind address |
| WHISPERX_PORT | 8880 | Server port |
| WHISPERX_MODEL | large-v3-turbo | WhisperX model |
| WHISPERX_DATA_DIR | ./data | SQLite database location |
| WHISPERX_HF_TOKEN | - | HuggingFace token (for diarization) |
| WHISPERX_DEVICE | 0 | GPU device index |
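To make the prefix-plus-default behaviour concrete, here is a minimal Python sketch of how such settings could be resolved. It is not the project's `config.py`, which may work differently. `Settings` and `load_settings` are hypothetical names, and the defaults are copied from the configuration table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the configuration table in this README.
    api_key: str = "namastex888"
    host: str = "0.0.0.0"
    port: int = 8880
    model: str = "large-v3-turbo"
    data_dir: str = "./data"
    hf_token: Optional[str] = None
    device: int = 0


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read WHISPERX_* variables, falling back to the documented defaults."""
    defaults = Settings()

    def get(name, default):
        return env.get(f"WHISPERX_{name}", default)

    return Settings(
        api_key=get("API_KEY", defaults.api_key),
        host=get("HOST", defaults.host),
        port=int(get("PORT", defaults.port)),
        model=get("MODEL", defaults.model),
        data_dir=get("DATA_DIR", defaults.data_dir),
        hf_token=get("HF_TOKEN", None) or None,  # empty string counts as unset
        device=int(get("DEVICE", defaults.device)),
    )
```

For example, `load_settings({"WHISPERX_PORT": "9000"})` yields port 9000 and leaves every other field at its default.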
Speaker Diarization Setup
To enable speaker_labels=true:
- Accept license at pyannote/speaker-diarization
- Get token at huggingface.co/settings/tokens
- Add to config:
echo "WHISPERX_HF_TOKEN=hf_xxx" >> ~/.config/whisperx-api/.env
Troubleshooting
CUDA not available:
# Check NVIDIA driver
nvidia-smi
# Check PyTorch CUDA
python -c "import torch; print(torch.cuda.is_available())"
Out of VRAM:
- Use smaller model: WHISPERX_MODEL=medium
- Reduce batch size: WHISPERX_BATCH_SIZE=8
Diarization fails:
- Verify HF token: echo $WHISPERX_HF_TOKEN
- Accept license at HuggingFace (link above)
Built On
This project wraps the incredible WhisperX by @m-bain - fast automatic speech recognition with word-level timestamps and speaker diarization.
Development
Setup
git clone https://github.com/namastexlabs/whisperx-api.git
cd whisperx-api
uv sync
Run Tests
uv run pytest tests/ -v
Code Quality
uv run ruff check .
uv run ruff format .
uv run mypy src/
Project Structure
whisperx-api/
├── src/whisperx_api/
│ ├── server.py # FastAPI application
│ ├── transcriber.py # WhisperX pipeline
│ ├── model_manager.py # GPU model caching
│ ├── database.py # SQLite persistence
│ ├── config.py # Settings management
│ ├── auth.py # API authentication
│ ├── models.py # Pydantic schemas
│ ├── deps.py # Dependency checks
│ └── main.py # CLI entry point
├── tests/ # Test suite
├── get-whisperx.sh # One-liner installer
└── pyproject.toml # Project config
CI/CD
- CI: Runs on every push (lint, typecheck, test)
Performance Notes
- First request: ~60-90s (model loading)
- Subsequent requests: processing time is roughly equal to the audio duration
- VRAM usage: ~5-6GB for large-v3-turbo
Made with ❤️ by Namastex Labs