# Whisper

OpenAI Whisper on Apple silicon with MLX and the Hugging Face Hub.
Speech recognition with Whisper in MLX. Whisper is a set of open source speech recognition models from OpenAI, ranging from 39 million to 1.5 billion parameters.[^1]
## Setup
Install `ffmpeg`:

```sh
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
```
Install the `mlx-whisper` package with:

```sh
pip install mlx-whisper
```
## Run

### CLI
At its simplest:

```sh
mlx_whisper audio_file.mp3
```

This will make a text file `audio_file.txt` with the results.
Use `-f` to specify the output format and `--model` to specify the model. There are many other supported command line options. To see them all, run `mlx_whisper -h`.
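For example, here is a quick sketch combining the two flags. The model repo and the `srt` output format are only illustrative values; run `mlx_whisper -h` to check the choices your version actually accepts:

```sh
# Illustrative only: transcribe with a larger MLX Community model
# and write SubRip subtitles instead of plain text.
mlx_whisper audio_file.mp3 --model mlx-community/whisper-large-v3-mlx -f srt
```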
### API
Transcribe audio with:

```python
import mlx_whisper

text = mlx_whisper.transcribe(speech_file)["text"]
```
The default model is "mlx-community/whisper-tiny". Choose the model by setting `path_or_hf_repo`. For example:

```python
result = mlx_whisper.transcribe(speech_file, path_or_hf_repo="models/large")
```
This will load the model contained in `models/large`. The `path_or_hf_repo` argument can also point to an MLX-style Whisper model on the Hugging Face Hub, in which case the model is downloaded automatically. A collection of pre-converted Whisper models is available in the Hugging Face MLX Community.
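As a sketch of the Hub workflow (the repo name below is just an example of an MLX Community checkpoint; substitute whichever model you want):

```python
import mlx_whisper

# Example Hub repo; the checkpoint is downloaded and cached on first use.
result = mlx_whisper.transcribe(
    "audio_file.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
)
print(result["text"])
```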
The `transcribe` function also supports word-level timestamps. You can generate these with:

```python
output = mlx_whisper.transcribe(speech_file, word_timestamps=True)
print(output["segments"][0]["words"])
```
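Each entry in that list is a per-word record. Assuming the usual Whisper-style fields (`word`, `start`, `end`), the timestamps can be printed like this:

```python
# Assumes each word entry has "word", "start", and "end" keys
# (the Whisper-style word-timestamp layout).
for segment in output["segments"]:
    for w in segment["words"]:
        print(f'{w["start"]:6.2f}s - {w["end"]:6.2f}s {w["word"]}')
```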
To see more transcription options use:

```python
>>> help(mlx_whisper.transcribe)
```
## Converting models

> [!TIP]
> Skip the conversion step by using pre-converted checkpoints from the Hugging Face Hub. There are a few available in the MLX Community organization.
To convert a model, first clone the MLX Examples repo:

```sh
git clone https://github.com/ml-explore/mlx-examples.git
```
Then run `convert.py` from `mlx-examples/whisper`. For example, to convert the `tiny` model use:

```sh
python convert.py --torch-name-or-path tiny --mlx-path mlx_models/tiny
```
Note that you can also convert a local PyTorch checkpoint that is in the original OpenAI format.
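For example, a hypothetical invocation pointing at a local checkpoint file (the path is a placeholder):

```sh
# Hypothetical path to a local OpenAI-format Whisper checkpoint
python convert.py --torch-name-or-path /path/to/tiny.pt --mlx-path mlx_models/tiny_local
```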
To generate a 4-bit quantized model, use `-q`. For a full list of options:

```sh
python convert.py --help
```
By default, the conversion script will make the directory `mlx_models` and save the converted `weights.npz` and `config.json` there.
Each time it is run, `convert.py` will overwrite any model in the provided path. To save different models, make sure to set `--mlx-path` to a unique directory for each converted model. For example:
model="tiny"
python convert.py --torch-name-or-path ${model} --mlx-path mlx_models/${model}_fp16
python convert.py --torch-name-or-path ${model} --dtype float32 --mlx-path mlx_models/${model}_fp32
python convert.py --torch-name-or-path ${model} -q --q_bits 4 --mlx-path mlx_models/${model}_quantized_4bits
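A converted directory can then be passed straight to `path_or_hf_repo`, as described in the API section above. For instance, using the 4-bit model produced by the last command:

```python
import mlx_whisper

# Load the locally converted 4-bit model
# (path is relative to where convert.py was run).
result = mlx_whisper.transcribe(
    "audio_file.mp3",
    path_or_hf_repo="mlx_models/tiny_quantized_4bits",
)
print(result["text"])
```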
[^1]: Refer to the arXiv paper, blog post, and code for more details.
## Download files

- Source distribution: `mlx_whisper-0.4.0.tar.gz`
- Built distribution: `mlx_whisper-0.4.0-py3-none-any.whl`
## File details

Details for the file `mlx_whisper-0.4.0.tar.gz`.

### File metadata

- Download URL: mlx_whisper-0.4.0.tar.gz
- Upload date:
- Size: 778.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.1

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `48aed159892c20d89d1acdf8a7a323ce379af0a7670e55b841749ea30774a906` |
| MD5 | `0ed8ac3bda646c3f46f0896c5b9db78a` |
| BLAKE2b-256 | `40012a19456b74ac0e4a8ca7e143df42c00f62b3ff2dfcee6a06c4ee14674e6b` |
## File details

Details for the file `mlx_whisper-0.4.0-py3-none-any.whl`.

### File metadata

- Download URL: mlx_whisper-0.4.0-py3-none-any.whl
- Upload date:
- Size: 782.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.1

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e9fed024d7be32a693d885d1f65012133137a7cc00c113a8e0396c3912bc69ab` |
| MD5 | `9f6a549cba514f9bd6527a8cc827777f` |
| BLAKE2b-256 | `3bc0836d5e9b118fb557dce3fd9848050a52d749af16885b8c8d56d299c8053c` |