Time-Accurate Automatic Speech Recognition using Whisper.
whisper(ml)x
Fast, accurate speech recognition on Apple Silicon — powered by MLX.
A fork of WhisperX with the inference backend replaced by mlx-whisper, running natively on Apple Silicon via MLX. Word-level timestamps, speaker diarization, and VAD are all retained.
- ⚡️ MLX inference — runs on Apple Silicon GPU via unified memory
- 🎯 Word-level timestamps via wav2vec2 forced alignment
- 👥 Speaker diarization via pyannote-audio
- 🗣️ VAD preprocessing via pyannote or silero
Installation
pip install whispermlx
Or with uv:
uv add whispermlx
Usage
CLI
# Auto-downloads mlx-community/whisper-large-v3-mlx on first run
whispermlx audio.mp3 --model large-v3
# With speaker diarization
whispermlx audio.mp3 --model large-v3 --diarize --hf_token YOUR_TOKEN
# Use any mlx-community model directly
whispermlx audio.mp3 --model mlx-community/whisper-large-v3-turbo
Python
import whispermlx
# Short name — auto-maps to mlx-community/whisper-large-v3-mlx
model = whispermlx.load_model("large-v3", device="cpu")
result = model.transcribe("audio.mp3")
print(result["segments"])
# With alignment
model_a, metadata = whispermlx.load_align_model(language_code=result["language"], device="cpu")
result = whispermlx.align(result["segments"], model_a, metadata, "audio.mp3", device="cpu")
# With diarization
from whispermlx.diarize import DiarizationPipeline
diarize_model = DiarizationPipeline(token="YOUR_HF_TOKEN", device="cpu")
diarize_segments = diarize_model("audio.mp3")
result = whispermlx.assign_word_speakers(diarize_segments, result)
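The transcription result is plain Python data, so it can be post-processed without any further library calls. The following is a minimal sketch, assuming the result keeps the WhisperX layout in which each entry of `result["segments"]` is a dict with `start`, `end` (both in seconds), and `text` keys. The helper names `format_timestamp` and `to_srt` are hypothetical, and `sample_result` is made-up data standing in for real output.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Turn a list of segment dicts into SRT-style subtitle cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


# Made-up data in the assumed layout; replace with a real `result`.
sample_result = {
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.25, "text": " General Kenobi."},
    ]
}

print(to_srt(sample_result["segments"]))
```

With a real transcription, `to_srt(result["segments"])` could be written to a `.srt` file. If the segment keys differ in your installed version, adjust the lookups accordingly.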
Model Names
Short names are automatically mapped to their mlx-community equivalents. Full HF repo IDs also work.
| Short name | HF repo |
|---|---|
| tiny, base, small, medium | mlx-community/whisper-{name}-mlx |
| large-v3 | mlx-community/whisper-large-v3-mlx |
| large-v3-turbo / turbo | mlx-community/whisper-large-v3-turbo |
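The mapping in the table can be expressed as a small lookup. This is an illustrative sketch only, not whispermlx's actual resolution code: `SHORT_NAME_TO_REPO` and `resolve_model_name` are hypothetical names, and treating any name containing a `/` as a full repo ID is an assumption.

```python
# Short names from the table above, mapped to their mlx-community repos.
SHORT_NAME_TO_REPO = {
    name: f"mlx-community/whisper-{name}-mlx"
    for name in ("tiny", "base", "small", "medium")
}
SHORT_NAME_TO_REPO.update(
    {
        "large-v3": "mlx-community/whisper-large-v3-mlx",
        "large-v3-turbo": "mlx-community/whisper-large-v3-turbo",
        "turbo": "mlx-community/whisper-large-v3-turbo",
    }
)


def resolve_model_name(name: str) -> str:
    """Return the HF repo ID for a short name; pass full repo IDs through."""
    if "/" in name:
        # Assumed convention: "owner/repo" is already a full HF repo ID.
        return name
    try:
        return SHORT_NAME_TO_REPO[name]
    except KeyError:
        known = ", ".join(sorted(SHORT_NAME_TO_REPO))
        raise ValueError(f"Unknown model {name!r}; known short names: {known}") from None


print(resolve_model_name("turbo"))
print(resolve_model_name("mlx-community/whisper-large-v3-turbo"))
```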
Speaker Diarization
Diarization requires a Hugging Face access token and acceptance of the user agreement for the pyannote speaker-diarization-community-1 model.
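To avoid hardcoding the token in scripts, it can be read from the environment. The sketch below assumes the token is exported as `HF_TOKEN` (the variable name used by the Hugging Face tooling); `get_hf_token` is a hypothetical helper and not part of whispermlx.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from the environment, failing early if unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token


# Intended use (not executed here, since it needs a real token):
#   diarize_model = DiarizationPipeline(token=get_hf_token(), device="cpu")
```

Failing before the pipeline is constructed gives a clearer error than a rejected download partway through model loading.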
Acknowledgements
Built on top of WhisperX by Max Bain et al., mlx-whisper, pyannote-audio, and OpenAI Whisper.
@article{bain2022whisperx,
title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio},
author={Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew},
journal={INTERSPEECH 2023},
year={2023}
}