Local WhisperX transcription API with speaker diarization

WhisperX API

GPU-powered transcription API in one command

PyPI | CI | Python 3.12 | MIT License

Features | Quick Start | API | Config | Development


Turn any audio into text with speaker labels. No cloud. No limits. Just run:

uvx whisperx-api

WhisperX API wraps WhisperX in a REST API with speaker diarization, word-level timestamps, and multiple export formats. Self-hosted alternative to AssemblyAI, Deepgram, and Rev.ai.

Features

  • Speaker Diarization - Identify who said what with pyannote
  • Word-Level Timestamps - Precise alignment for every word
  • Multiple Export Formats - SRT, WebVTT, TXT, JSON
  • Webhook Callbacks - Get notified when transcription completes
  • GPU Model Caching - Fast subsequent transcriptions
  • Background Processing - Non-blocking async jobs
  • Progress Tracking - Poll for real-time status

Quick Start

Prerequisites

  • NVIDIA GPU with 6GB+ VRAM (or CPU mode for testing)
  • CUDA 12.x drivers installed

Option A: One-Liner Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/namastexlabs/whisperx-api/main/get-whisperx.sh | bash

This installs Python 3.12 and uv, checks for CUDA, and sets up whisperx-api.

Option B: Direct Run (if dependencies are already met)

uvx whisperx-api

Option C: pip install

pip install whisperx-api
whisperx-api

The API starts at http://localhost:8880. Swagger docs at /docs.

First Transcription

# Default API key is "namastex888" - works out of the box
curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "file=@audio.mp3"

# Check status (replace {id} with returned transcript ID)
curl http://localhost:8880/v1/transcript/{id} \
  -H "Authorization: namastex888"
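The "check status" step above is usually repeated until the job finishes. Here is a minimal Python sketch of that polling loop, assuming the default address and API key from this README and the status values documented under Response Format. The helper names (fetch_transcript, wait_for_transcript) and the interval and timeout defaults are illustrative and not part of whisperx-api.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8880"  # default address from this README
API_KEY = "namastex888"             # default key from this README
TERMINAL_STATUSES = {"completed", "error"}


def fetch_transcript(transcript_id):
    """GET /v1/transcript/{id} and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/transcript/{transcript_id}",
        headers={"Authorization": API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(transcript_id, fetch=fetch_transcript,
                        interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job is completed or errored, or the timeout passes.

    `fetch` and `sleep` are injectable (hypothetical parameters) so the
    loop can be exercised without a running server.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(transcript_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"transcript {transcript_id} still {job.get('status')!r}"
            )
        sleep(interval)
```

With a server running, wait_for_transcript("<id>") returns the final JSON document. A caller would still need to check whether its status is "completed" or "error".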

API Reference

Method  Endpoint                  Description
POST    /v1/transcript            Submit transcription job
GET     /v1/transcript/{id}       Get transcript status/result
GET     /v1/transcript/{id}/srt   Export as SRT subtitles
GET     /v1/transcript/{id}/vtt   Export as WebVTT
GET     /v1/transcript/{id}/txt   Export as plain text
GET     /v1/transcript/{id}/json  Export as JSON
DELETE  /v1/transcript/{id}       Delete transcript
GET     /health                   Health check (no auth)
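The export endpoints follow one pattern: the transcript URL plus a format suffix. Here is a small sketch of a URL builder for them, assuming the default address from this README. The function name transcript_url and the EXPORT_FORMATS tuple are illustrative only.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://localhost:8880"  # default address from this README
EXPORT_FORMATS = ("srt", "vtt", "txt", "json")  # suffixes from the table above


def transcript_url(transcript_id: str, export_format: Optional[str] = None) -> str:
    """Build the status URL, or an export URL when a format is given."""
    url = f"{BASE_URL}/v1/transcript/{quote(transcript_id, safe='')}"
    if export_format is None:
        return url
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {export_format!r}")
    return f"{url}/{export_format}"
```

For example, transcript_url(job_id, "srt") gives the URL to request with the Authorization header in order to download subtitles.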

Submit Transcription

File upload:

curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "file=@audio.mp3"

URL download:

curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "audio_url=https://example.com/audio.mp3"

With speaker diarization:

curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "file=@audio.mp3" \
  -F "speaker_labels=true" \
  -F "speakers_expected=2"
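The curl -F flags above send a multipart/form-data body. For callers who prefer Python without third-party packages, here is a hedged sketch that builds the same kind of request with the standard library. The helper names (encode_multipart, build_submit_request) are illustrative, and the file part's Content-Type of application/octet-stream is an assumption; the server may or may not care about it.

```python
import urllib.request
import uuid
from typing import Optional

BASE_URL = "http://localhost:8880"  # default address from this README
API_KEY = "namastex888"             # default key from this README


def encode_multipart(fields: dict, files: Optional[dict] = None):
    """Encode text fields and {name: (filename, bytes)} files as form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    for name, (filename, content) in (files or {}).items():
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_submit_request(fields: dict, files: Optional[dict] = None):
    """Prepare (but do not send) a POST /v1/transcript request."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        f"{BASE_URL}/v1/transcript",
        data=body,
        headers={"Authorization": API_KEY, "Content-Type": content_type},
        method="POST",
    )
```

To mirror the diarization example, read audio.mp3 into bytes and call build_submit_request({"speaker_labels": "true", "speakers_expected": "2"}, files={"file": ("audio.mp3", audio_bytes)}), then pass the result to urllib.request.urlopen.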

Response Format

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "text": "Hello world, this is a transcription.",
  "words": [
    {"text": "Hello", "start": 0, "end": 500, "confidence": 0.98, "speaker": "A"}
  ],
  "utterances": [
    {"speaker": "A", "text": "Hello world...", "start": 0, "end": 3000}
  ],
  "language_code": "en"
}

Status values: queued, then processing, then completed (or error)
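Once a job is completed, the utterances list can be turned into a readable speaker-by-speaker transcript. The sketch below assumes that start and end are milliseconds. The sample values (0 to 500 for a single word) suggest this, but the README does not state the unit. The function names format_ms and format_utterances are illustrative.

```python
def format_ms(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm (assumes the unit is ms)."""
    minutes, remainder = divmod(int(ms), 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_utterances(transcript: dict) -> list:
    """Return one display line per utterance in a completed transcript."""
    lines = []
    for utterance in transcript.get("utterances", []):
        start = format_ms(utterance["start"])
        end = format_ms(utterance["end"])
        speaker = utterance.get("speaker", "?")
        lines.append(f"[{start} - {end}] Speaker {speaker}: {utterance['text']}")
    return lines
```

Applied to the sample response above, this yields a single line for speaker A spanning 00:00.000 to 00:03.000.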

Configuration

All settings are read from environment variables with the WHISPERX_ prefix. Everything has sensible defaults, so no .env file is needed for local use.

Variable           Default         Description
WHISPERX_API_KEY   namastex888     API authentication key
WHISPERX_HOST      0.0.0.0         Server bind address
WHISPERX_PORT      8880            Server port
WHISPERX_MODEL     large-v3-turbo  WhisperX model
WHISPERX_DATA_DIR  ./data          SQLite database location
WHISPERX_HF_TOKEN  -               HuggingFace token (for diarization)
WHISPERX_DEVICE    0               GPU device index
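To illustrate how prefixed variables with defaults typically resolve, here is a minimal sketch that reads the settings in the table from a mapping such as os.environ. It is not the project's actual config.py. The Settings class and its field names are assumptions made for illustration; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the configuration table in this README.
    api_key: str = "namastex888"
    host: str = "0.0.0.0"
    port: int = 8880
    model: str = "large-v3-turbo"
    data_dir: str = "./data"
    hf_token: Optional[str] = None  # "-" in the table: no default token
    device: int = 0

    @classmethod
    def from_env(cls, environ: Mapping[str, str] = os.environ) -> "Settings":
        """Read WHISPERX_* variables, falling back to the defaults above."""
        prefix = "WHISPERX_"
        return cls(
            api_key=environ.get(prefix + "API_KEY", cls.api_key),
            host=environ.get(prefix + "HOST", cls.host),
            port=int(environ.get(prefix + "PORT", cls.port)),
            model=environ.get(prefix + "MODEL", cls.model),
            data_dir=environ.get(prefix + "DATA_DIR", cls.data_dir),
            hf_token=environ.get(prefix + "HF_TOKEN") or None,
            device=int(environ.get(prefix + "DEVICE", cls.device)),
        )
```

With no variables set, Settings.from_env() reproduces the table's defaults. Exporting WHISPERX_PORT=9000 before the call changes only the port.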

Speaker Diarization Setup

To enable speaker_labels=true:

  1. Accept license at pyannote/speaker-diarization
  2. Get token at huggingface.co/settings/tokens
  3. Add to config:
    echo "WHISPERX_HF_TOKEN=hf_xxx" >> ~/.config/whisperx-api/.env
    

Troubleshooting

CUDA not available:

# Check NVIDIA driver
nvidia-smi

# Check PyTorch CUDA
python -c "import torch; print(torch.cuda.is_available())"

Out of VRAM:

  • Use smaller model: WHISPERX_MODEL=medium
  • Reduce batch size: WHISPERX_BATCH_SIZE=8

Diarization fails:

  • Verify HF token: echo $WHISPERX_HF_TOKEN
  • Accept license at HuggingFace (link above)

Built On

This project wraps the incredible WhisperX by @m-bain - fast automatic speech recognition with word-level timestamps and speaker diarization.


Development

Setup

git clone https://github.com/namastexlabs/whisperx-api.git
cd whisperx-api
uv sync

Run Tests

uv run pytest tests/ -v

Code Quality

uv run ruff check .
uv run ruff format .
uv run mypy src/

Project Structure

whisperx-api/
├── src/whisperx_api/
│   ├── server.py          # FastAPI application
│   ├── transcriber.py     # WhisperX pipeline
│   ├── model_manager.py   # GPU model caching
│   ├── database.py        # SQLite persistence
│   ├── config.py          # Settings management
│   ├── auth.py            # API authentication
│   ├── models.py          # Pydantic schemas
│   ├── deps.py            # Dependency checks
│   └── main.py            # CLI entry point
├── tests/                 # Test suite
├── get-whisperx.sh        # One-liner installer
└── pyproject.toml         # Project config

CI/CD

  • CI: Runs on every push (lint, typecheck, test)

Performance Notes

  • First request: ~60-90s (model loading)
  • Subsequent requests: processing time roughly equal to the audio duration
  • VRAM usage: ~5-6GB for large-v3-turbo

Made with ❤️ by Namastex Labs

Star us on GitHub
