Time-Accurate Automatic Speech Recognition using Whisper.

Project description

whisper(ml)x

Fast, accurate speech recognition on Apple Silicon — powered by MLX.

A fork of WhisperX with the inference backend replaced by mlx-whisper, running natively on Apple Silicon via MLX. Word-level timestamps, speaker diarization, and VAD are all retained.

⚡️ MLX inference — runs on Apple Silicon GPU via unified memory
🎯 Word-level timestamps via wav2vec2 forced alignment
👥 Speaker diarization via pyannote-audio
🗣️ VAD preprocessing via pyannote or silero

Installation

pip install whispermlx

Or with uv:

uv add whispermlx

Usage

CLI

# Auto-downloads mlx-community/whisper-large-v3-mlx on first run
whispermlx audio.mp3 --model large-v3

# With speaker diarization
whispermlx audio.mp3 --model large-v3 --diarize --hf_token YOUR_TOKEN

# Use any mlx-community model directly
whispermlx audio.mp3 --model mlx-community/whisper-large-v3-turbo
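To transcribe several files with the same options, the CLI calls above can be driven from Python. This is a minimal sketch, assuming the `whispermlx` executable is on your PATH. `build_command` and `transcribe_all` are hypothetical helpers, not part of the package, and they use only the flags shown above (`--model`, `--diarize`, `--hf_token`).

```python
import subprocess
from pathlib import Path


def build_command(audio, model="large-v3", hf_token=None):
    """Build the argument list for one whispermlx CLI call (hypothetical helper)."""
    cmd = ["whispermlx", str(audio), "--model", model]
    if hf_token:
        # Diarization is only requested when a Hugging Face token is supplied,
        # mirroring the diarization example above.
        cmd += ["--diarize", "--hf_token", hf_token]
    return cmd


def transcribe_all(folder, pattern="*.mp3", **kwargs):
    """Run the CLI once per matching file; raises on the first failing call."""
    for audio in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_command(audio, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.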

Python

import whispermlx

# Short name — auto-maps to mlx-community/whisper-large-v3-mlx
model = whispermlx.load_model("large-v3", device="cpu")
result = model.transcribe("audio.mp3")
print(result["segments"])

# With alignment
model_a, metadata = whispermlx.load_align_model(language_code=result["language"], device="cpu")
result = whispermlx.align(result["segments"], model_a, metadata, "audio.mp3", device="cpu")

# With diarization
from whispermlx.diarize import DiarizationPipeline
diarize_model = DiarizationPipeline(token="YOUR_HF_TOKEN", device="cpu")
diarize_segments = diarize_model("audio.mp3")
result = whispermlx.assign_word_speakers(diarize_segments, result)
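After alignment and speaker assignment, the result can be flattened into one line per word. The sketch below assumes the result follows the upstream WhisperX layout: a `segments` list whose items hold a `words` list with `word`, `start`, `end`, and an optional `speaker` key. That layout is an assumption carried over from WhisperX, not something this README states, and `format_words` is a hypothetical helper.

```python
def format_words(result):
    """Return one 'start-end speaker: word' line per timed word."""
    lines = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Some words may carry no timestamps (assumed behaviour); skip them.
            if "start" not in word or "end" not in word:
                continue
            # Fall back to the segment speaker, then to a placeholder label.
            speaker = word.get("speaker", segment.get("speaker", "UNKNOWN"))
            lines.append(
                f"{word['start']:7.2f}-{word['end']:7.2f} {speaker}: {word['word']}"
            )
    return lines


# Hand-written sample in the assumed shape, for illustration only.
sample = {
    "segments": [
        {
            "speaker": "SPEAKER_00",
            "words": [
                {"word": "Hello", "start": 0.5, "end": 0.9, "speaker": "SPEAKER_00"},
                {"word": "2024"},  # no timestamps, so it is skipped
                {"word": "world", "start": 1.0, "end": 1.4},
            ],
        }
    ]
}

for line in format_words(sample):
    print(line)
```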

Model Names

Short names are automatically mapped to their mlx-community equivalents. Full Hugging Face (HF) repo IDs also work.

Short name	HF repo
`tiny`, `base`, `small`, `medium`	`mlx-community/whisper-{name}-mlx`
`large-v3`	`mlx-community/whisper-large-v3-mlx`
`large-v3-turbo` / `turbo`	`mlx-community/whisper-large-v3-turbo`
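The table can be read as a small lookup. The sketch below illustrates that mapping only; it is not the package's own code. `resolve_model` is a hypothetical name, and raising `ValueError` for unknown short names is an assumption about how such a helper might behave.

```python
# Short name -> Hugging Face repo ID, copied from the table above.
SHORT_NAMES = {
    **{name: f"mlx-community/whisper-{name}-mlx"
       for name in ("tiny", "base", "small", "medium")},
    "large-v3": "mlx-community/whisper-large-v3-mlx",
    "large-v3-turbo": "mlx-community/whisper-large-v3-turbo",
    "turbo": "mlx-community/whisper-large-v3-turbo",
}


def resolve_model(name):
    """Map a short model name to its repo ID; pass full repo IDs through."""
    if "/" in name:
        # Treated as a full repo ID such as "mlx-community/whisper-large-v3-turbo".
        return name
    try:
        return SHORT_NAMES[name]
    except KeyError:
        raise ValueError(f"Unknown model name: {name!r}") from None


print(resolve_model("small"))
print(resolve_model("turbo"))
```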

Speaker Diarization

Diarization requires a Hugging Face access token. You must also accept the user agreement for the pyannote speaker-diarization-community-1 model on Hugging Face before the model can be downloaded.
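To keep the token out of source files and shell history, it can be read from the environment. This is a minimal sketch; `get_hf_token` is a hypothetical helper, and the variable name `HF_TOKEN` is a choice made for this example rather than something the package requires.

```python
import os


def get_hf_token(var="HF_TOKEN"):
    """Read the Hugging Face token from an environment variable."""
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var} to a Hugging Face access token before running diarization."
        )
    return token


# Intended use, following the Python example in the Usage section:
#   diarize_model = DiarizationPipeline(token=get_hf_token(), device="cpu")
```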

Acknowledgements

Built on top of WhisperX by Max Bain et al., mlx-whisper, pyannote-audio, and OpenAI Whisper.

@article{bain2022whisperx,
  title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio},
  author={Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew},
  journal={INTERSPEECH 2023},
  year={2023}
}

Project details

Release history

Version	Release date
3.12.1	Apr 7, 2026
3.12.0	Apr 4, 2026
3.11.2	Apr 4, 2026
3.11.1	Apr 2, 2026
3.11.0	Mar 31, 2026
3.10.1	Mar 24, 2026
3.10.0	Mar 24, 2026
3.9.3	Mar 23, 2026
3.9.2 (this version)	Mar 23, 2026
3.9.1	Mar 23, 2026
3.9.0	Mar 23, 2026

Download files

Download the file for your platform.

Source Distribution

whispermlx-3.9.2.tar.gz (16.5 MB)

Uploaded Mar 23, 2026 Source

Built Distribution


whispermlx-3.9.2-py3-none-any.whl (16.5 MB)

Uploaded Mar 23, 2026 Python 3

File details

Details for the file whispermlx-3.9.2.tar.gz.

File metadata

Download URL: whispermlx-3.9.2.tar.gz
Upload date: Mar 23, 2026
Size: 16.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.12 (uv publish, Ubuntu 24.04, CI)

File hashes

Hashes for whispermlx-3.9.2.tar.gz
Algorithm	Hash digest
SHA256	`6ddc1243ccd38e9ed34f9160b440d7acc961b0e32256025da2b40db9338279b2`
MD5	`91156b324976a38e53e4ad3bd58a94ca`
BLAKE2b-256	`5c9f7b55e9d8391a934a4624f344f7b36527934ff5e3eda0a61cae8d6aba7b60`
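A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library; `sha256_of` and `verify` are hypothetical helper names, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest for whispermlx-3.9.2.tar.gz, copied from the table above.
EXPECTED_SHA256 = "6ddc1243ccd38e9ed34f9160b440d7acc961b0e32256025da2b40db9338279b2"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Intended use:
#   verify("whispermlx-3.9.2.tar.gz")
```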


File details

Details for the file whispermlx-3.9.2-py3-none-any.whl.

File metadata

Download URL: whispermlx-3.9.2-py3-none-any.whl
Upload date: Mar 23, 2026
Size: 16.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.12 (uv publish, Ubuntu 24.04, CI)

File hashes

Hashes for whispermlx-3.9.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2cdd9b8c1dd9e98d0840462171d047fe415148102dd8baaa337b05f6972f8115`
MD5	`7f04625e25ccc54928160abe9b5d8a11`
BLAKE2b-256	`0c5b4a619fe30ce35b5a9718fabfd37a04516ad6e80346bb6ade951a8a21b9d3`

