Local WhisperX transcription API with speaker diarization

WhisperX API

GPU-powered transcription API in one command

Turn any audio into text with speaker labels. No cloud. No limits. Just run:

uvx whisperx-api

WhisperX API wraps WhisperX in a REST API with speaker diarization, word-level timestamps, and multiple export formats. It is a self-hosted alternative to AssemblyAI, Deepgram, and Rev.ai.

Features

Speaker Diarization - Identify who said what with pyannote
Word-Level Timestamps - Precise alignment for every word
Multiple Export Formats - SRT, WebVTT, TXT, JSON
Webhook Callbacks - Get notified when transcription completes
GPU Model Caching - Fast subsequent transcriptions
Background Processing - Non-blocking async jobs
Progress Tracking - Poll for real-time status

Quick Start

Prerequisites

NVIDIA GPU with 6GB+ VRAM (or CPU mode for testing)
CUDA 12.x drivers installed

Option A: One-Liner Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/namastexlabs/whisperx-api/main/get-whisperx.sh | bash

This installs Python 3.12 and uv, checks CUDA, and sets up whisperx-api.

Option B: Direct Run (if dependencies met)

uvx whisperx-api

Option C: pip install

pip install whisperx-api
whisperx-api

The API starts at http://localhost:8880. Swagger docs are served at /docs.

First Transcription

# Default API key is "namastex888" - works out of the box
curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "file=@audio.mp3"

# Check status (replace {id} with returned transcript ID)
curl http://localhost:8880/v1/transcript/{id} \
  -H "Authorization: namastex888"
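The status check above usually has to be repeated until the job finishes. A minimal Python sketch of such a polling loop follows, assuming the status values listed under Response Format (queued, processing, completed, error). The helper names `fetch_transcript` and `wait_for_transcript`, the 2-second interval, and the 10-minute timeout are illustrative choices, not part of whisperx-api.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8880"  # default address from this README
API_KEY = "namastex888"             # default key from this README


def fetch_transcript(transcript_id):
    """GET /v1/transcript/{id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/transcript/{transcript_id}",
        headers={"Authorization": API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(transcript_id, fetch=fetch_transcript,
                        interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(transcript_id)
        # "completed" and "error" are the two terminal states.
        if result["status"] in ("completed", "error"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"transcript {transcript_id} is still {result['status']}"
            )
        sleep(interval)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server.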

API Reference

Method	Endpoint	Description
`POST`	`/v1/transcript`	Submit transcription job
`GET`	`/v1/transcript/{id}`	Get transcript status/result
`GET`	`/v1/transcript/{id}/srt`	Export as SRT subtitles
`GET`	`/v1/transcript/{id}/vtt`	Export as WebVTT
`GET`	`/v1/transcript/{id}/txt`	Export as plain text
`GET`	`/v1/transcript/{id}/json`	Export as JSON
`DELETE`	`/v1/transcript/{id}`	Delete transcript
`GET`	`/health`	Health check (no auth)
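The four export endpoints in the table share one URL pattern, so a client can build them from a transcript ID and a format suffix. This is a small sketch of that idea; `export_url` and `EXPORT_FORMATS` are hypothetical names introduced here, and the base URL is the default from this README.

```python
from urllib.parse import quote

# Format suffixes taken from the endpoint table above.
EXPORT_FORMATS = ("srt", "vtt", "txt", "json")


def export_url(transcript_id, fmt, base_url="http://localhost:8880"):
    """Build the export URL for one transcript (hypothetical helper)."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    safe_id = quote(transcript_id, safe="")
    return f"{base_url.rstrip('/')}/v1/transcript/{safe_id}/{fmt}"
```

As with every endpoint except /health, requests to these URLs still need the Authorization header.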

Submit Transcription

File upload:

curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "file=@audio.mp3"

URL download:

curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "audio_url=https://example.com/audio.mp3"

With speaker diarization:

curl -X POST http://localhost:8880/v1/transcript \
  -H "Authorization: namastex888" \
  -F "file=@audio.mp3" \
  -F "speaker_labels=true" \
  -F "speakers_expected=2"
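The curl commands above send a multipart form. For clients that cannot shell out to curl, the same request can be assembled with the Python standard library. The sketch below uses the field names from the examples above (`file`, `speaker_labels`, `speakers_expected`); the functions `encode_multipart` and `build_submit_request` are illustrative names, not part of whisperx-api, and the request is only built here, not sent.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_part):
    """Encode text fields plus one (name, filename, bytes) file part."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    name, filename, content = file_part
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{name}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_submit_request(audio_bytes, filename, speakers_expected=None,
                         base_url="http://localhost:8880",
                         api_key="namastex888"):
    """Build (but do not send) a POST /v1/transcript request."""
    fields = {}
    if speakers_expected is not None:
        # Diarization options, as in the last curl example above.
        fields = {"speaker_labels": "true",
                  "speakers_expected": speakers_expected}
    body, content_type = encode_multipart(
        fields, ("file", filename, audio_bytes)
    )
    return urllib.request.Request(
        f"{base_url}/v1/transcript",
        data=body,
        headers={"Authorization": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job to a running server.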

Response Format

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "text": "Hello world, this is a transcription.",
  "words": [
    {"text": "Hello", "start": 0, "end": 500, "confidence": 0.98, "speaker": "A"}
  ],
  "utterances": [
    {"speaker": "A", "text": "Hello world...", "start": 0, "end": 3000}
  ],
  "language_code": "en"
}

Status values: queued → processing → completed (or error)
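Once a job is completed, the `utterances` array is the most convenient part of the response for producing a readable, speaker-labelled transcript. The sketch below formats it line by line. It assumes that `start` and `end` are in milliseconds, which the sample response suggests but this README does not state; `format_timestamp` and `speaker_lines` are hypothetical helper names.

```python
def format_timestamp(ms):
    """Render a time as MM:SS.mmm (ASSUMPTION: the API reports milliseconds)."""
    minutes, rest = divmod(int(ms), 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def speaker_lines(transcript):
    """Return one formatted line per utterance in a completed transcript."""
    return [
        f"[{format_timestamp(u['start'])} - {format_timestamp(u['end'])}] "
        f"Speaker {u['speaker']}: {u['text']}"
        for u in transcript.get("utterances", [])
    ]
```

A response without an `utterances` key yields an empty list, so the helper can be called on any response body.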

Configuration

All settings are read from environment variables with the WHISPERX_ prefix. Everything has sensible defaults, so no .env file is needed for local use.

Variable	Default	Description
`WHISPERX_API_KEY`	`namastex888`	API authentication key
`WHISPERX_HOST`	`0.0.0.0`	Server bind address
`WHISPERX_PORT`	`8880`	Server port
`WHISPERX_MODEL`	`large-v3-turbo`	WhisperX model
`WHISPERX_DATA_DIR`	`./data`	SQLite database location
`WHISPERX_HF_TOKEN`	-	HuggingFace token (for diarization)
`WHISPERX_DEVICE`	`0`	GPU device index
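To illustrate how prefixed environment variables map onto defaults, here is a minimal sketch that mirrors the table above. It is not the project's actual config.py; the `Settings` dataclass, `INT_FIELDS`, and `load_settings` are names invented for this example.

```python
import os
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Defaults copied from the configuration table above."""
    api_key: str = "namastex888"
    host: str = "0.0.0.0"
    port: int = 8880
    model: str = "large-v3-turbo"
    data_dir: str = "./data"
    hf_token: Optional[str] = None
    device: int = 0


# Fields whose environment values must be converted from text to int.
INT_FIELDS = {"port", "device"}


def load_settings(env=None):
    """Overlay WHISPERX_* variables on the defaults (illustrative only)."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(Settings):
        raw = env.get(f"WHISPERX_{field.name.upper()}")
        if raw is not None:
            overrides[field.name] = (
                int(raw) if field.name in INT_FIELDS else raw
            )
    return Settings(**overrides)
```

Calling `load_settings()` with no argument reads the real process environment; passing a dict makes the behavior easy to check in isolation.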

Speaker Diarization Setup

To enable speaker_labels=true:

Accept the license at pyannote/speaker-diarization
Get a token at huggingface.co/settings/tokens

Add to config:

echo "WHISPERX_HF_TOKEN=hf_xxx" >> ~/.config/whisperx-api/.env

Troubleshooting

CUDA not available:

# Check NVIDIA driver
nvidia-smi

# Check PyTorch CUDA
python -c "import torch; print(torch.cuda.is_available())"

Out of VRAM:

Use a smaller model: WHISPERX_MODEL=medium
Reduce the batch size: WHISPERX_BATCH_SIZE=8

Diarization fails:

Verify HF token: echo $WHISPERX_HF_TOKEN
Accept the license at HuggingFace (see Speaker Diarization Setup above)

Built On

This project wraps the incredible WhisperX by @m-bain - fast automatic speech recognition with word-level timestamps and speaker diarization.

Development

Setup

git clone https://github.com/namastexlabs/whisperx-api.git
cd whisperx-api
uv sync

Run Tests

uv run pytest tests/ -v

Code Quality

uv run ruff check .
uv run ruff format .
uv run mypy src/

Project Structure

whisperx-api/
├── src/whisperx_api/
│   ├── server.py          # FastAPI application
│   ├── transcriber.py     # WhisperX pipeline
│   ├── model_manager.py   # GPU model caching
│   ├── database.py        # SQLite persistence
│   ├── config.py          # Settings management
│   ├── auth.py            # API authentication
│   ├── models.py          # Pydantic schemas
│   ├── deps.py            # Dependency checks
│   └── main.py            # CLI entry point
├── tests/                 # Test suite
├── get-whisperx.sh        # One-liner installer
└── pyproject.toml         # Project config

CI/CD

CI: Runs on every push (lint, typecheck, test)

Performance Notes

First request: ~60-90s (model loading)
Subsequent requests: processing takes roughly as long as the audio itself
VRAM usage: ~5-6GB for large-v3-turbo
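These figures can be turned into a rough wait-time estimate. The sketch below simply adds the 60-90 s model-loading cost to the audio duration for a cold start. It is back-of-the-envelope arithmetic based only on the notes above, not a benchmark, and `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(audio_seconds, cold_start=True,
                     model_load_seconds=(60, 90)):
    """Return a (low, high) processing-time estimate in seconds."""
    # Processing is taken to last about as long as the audio itself.
    low = high = float(audio_seconds)
    if cold_start:
        # The first request also pays the model-loading cost.
        low += model_load_seconds[0]
        high += model_load_seconds[1]
    return low, high
```

For a 10-minute file, `estimate_seconds(600)` gives a range of 660 to 690 seconds on a cold start and 600 seconds once the model is cached.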

Made with ❤️ by Namastex Labs

Version 2.0.1rc2 (pre-release), released Dec 12, 2025
