Time-Accurate Automatic Speech Recognition using Whisper.

whisper(ml)x

Fast, accurate speech recognition on Apple Silicon — powered by MLX.

A fork of WhisperX with the inference backend replaced by mlx-whisper, running natively on Apple Silicon via MLX. Word-level timestamps, speaker diarization, and VAD are all retained.

  • ⚡️ MLX inference — runs on Apple Silicon GPU via unified memory
  • 🎯 Word-level timestamps via wav2vec2 forced alignment
  • 👥 Speaker diarization via pyannote-audio
  • 🗣️ VAD preprocessing via pyannote or silero

Installation

pip install whispermlx

Or with uv:

uv add whispermlx

Usage

CLI

# Auto-downloads mlx-community/whisper-large-v3-mlx on first run
whispermlx audio.mp3 --model large-v3

# With speaker diarization
whispermlx audio.mp3 --model large-v3 --diarize --hf_token YOUR_TOKEN

# Use any mlx-community model directly
whispermlx audio.mp3 --model mlx-community/whisper-large-v3-turbo
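To run the CLI over several files from a script, the invocations above can be assembled programmatically. This is a minimal sketch that uses only the flags shown above (`--model`, `--diarize`, `--hf_token`); `build_command` and `transcribe_files` are hypothetical helper names, not part of whispermlx.

```python
import shlex
import subprocess
from typing import Iterable, List, Optional


def build_command(
    audio_path: str,
    model: str = "large-v3",
    hf_token: Optional[str] = None,
) -> List[str]:
    """Build a whispermlx CLI argument list (hypothetical helper)."""
    cmd = ["whispermlx", audio_path, "--model", model]
    # Diarization is only requested when a Hugging Face token is supplied.
    if hf_token:
        cmd += ["--diarize", "--hf_token", hf_token]
    return cmd


def transcribe_files(paths: Iterable[str], **kwargs) -> None:
    """Run the CLI once per file, stopping on the first failure."""
    for path in paths:
        cmd = build_command(path, **kwargs)
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `transcribe_files(["a.mp3", "b.mp3"], model="turbo")` would run the CLI once per file. Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces.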

Python

import whispermlx

# Short name — auto-maps to mlx-community/whisper-large-v3-mlx
model = whispermlx.load_model("large-v3", device="cpu")
result = model.transcribe("audio.mp3")
print(result["segments"])

# With alignment
model_a, metadata = whispermlx.load_align_model(language_code=result["language"], device="cpu")
result = whispermlx.align(result["segments"], model_a, metadata, "audio.mp3", device="cpu")

# With diarization
from whispermlx.diarize import DiarizationPipeline
diarize_model = DiarizationPipeline(token="YOUR_HF_TOKEN", device="cpu")
diarize_segments = diarize_model("audio.mp3")
result = whispermlx.assign_word_speakers(diarize_segments, result)
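After alignment and diarization, the result can be turned into a readable word-by-word listing. The sketch below assumes the result keeps the WhisperX layout: a `"segments"` list whose entries carry a `"words"` list with `"word"`, `"start"`, `"end"` and an optional `"speaker"`. That layout is an assumption carried over from upstream, so check it against your installed version. `format_words` is a hypothetical helper and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def format_words(result: Dict[str, Any]) -> List[str]:
    """Return one '[start-end] speaker: word' line per timestamped word."""
    lines = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Skip words that carry no timestamps (assumed possible when
            # a token could not be aligned).
            if "start" not in word or "end" not in word:
                continue
            # Fall back to the segment speaker, then to a placeholder.
            speaker = word.get("speaker", segment.get("speaker", "UNKNOWN"))
            lines.append(
                f"[{word['start']:.2f}-{word['end']:.2f}] {speaker}: {word['word']}"
            )
    return lines


# Illustrative data in the assumed shape, not real model output.
sample = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello world",
            "speaker": "SPEAKER_00",
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.5, "speaker": "SPEAKER_00"},
                {"word": "world", "start": 0.6, "end": 1.2},
            ],
        }
    ]
}

for line in format_words(sample):
    print(line)
```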

Model Names

Short names are automatically mapped to their mlx-community equivalents. Full Hugging Face (HF) repo IDs also work.

Short name                   HF repo
tiny, base, small, medium    mlx-community/whisper-{name}-mlx
large-v3                     mlx-community/whisper-large-v3-mlx
large-v3-turbo / turbo       mlx-community/whisper-large-v3-turbo
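The mapping in the table above can be sketched as a small lookup. This illustrates the documented behaviour only; the package's actual resolution logic may differ. `resolve_model_name` is a hypothetical name, and treating any string containing `/` as a full repo ID is an assumption.

```python
# Short names from the table, mapped to their mlx-community repos.
SHORT_NAMES = {
    "tiny": "mlx-community/whisper-tiny-mlx",
    "base": "mlx-community/whisper-base-mlx",
    "small": "mlx-community/whisper-small-mlx",
    "medium": "mlx-community/whisper-medium-mlx",
    "large-v3": "mlx-community/whisper-large-v3-mlx",
    "large-v3-turbo": "mlx-community/whisper-large-v3-turbo",
    "turbo": "mlx-community/whisper-large-v3-turbo",
}


def resolve_model_name(name: str) -> str:
    """Map a short model name to an HF repo ID (hypothetical helper)."""
    # Assumption: anything with a slash is already a full repo ID.
    if "/" in name:
        return name
    try:
        return SHORT_NAMES[name]
    except KeyError:
        known = ", ".join(sorted(SHORT_NAMES))
        raise ValueError(f"Unknown model {name!r}; expected one of: {known}") from None


print(resolve_model_name("turbo"))
print(resolve_model_name("mlx-community/whisper-large-v3-turbo"))
```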

Speaker Diarization

Diarization requires a Hugging Face access token and acceptance of the user agreement for the pyannote speaker-diarization-community-1 model.
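To keep the token out of source code, it can be read from the environment and passed in explicitly. The sketch below assumes the token is stored in an `HF_TOKEN` environment variable, the name the Hugging Face tooling uses. Whether whispermlx picks that variable up by itself is not stated here, so the example passes it explicitly. `get_hf_token` is a hypothetical helper.

```python
import os
from typing import Mapping


def get_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Hugging Face token from the environment or fail loudly."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create an access token on Hugging Face "
            "and accept the pyannote model agreement first."
        )
    return token


# Usage with the diarization pipeline shown earlier:
#
#   from whispermlx.diarize import DiarizationPipeline
#   diarize_model = DiarizationPipeline(token=get_hf_token(), device="cpu")
```

Failing early with a clear message is usually easier to debug than an authentication error raised partway through a model download.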

Acknowledgements

Built on top of WhisperX by Max Bain et al., mlx-whisper, pyannote-audio, and OpenAI Whisper.

@article{bain2022whisperx,
  title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio},
  author={Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew},
  journal={INTERSPEECH 2023},
  year={2023}
}
