Vayu - The fastest Whisper implementation on Apple Silicon
Vayu (وایو)
The fastest Whisper implementation on Apple Silicon.
Vayu (وایو) is the ancient Persian god of wind, the swiftest force in nature. In Zoroastrian mythology, Vayu represents the divine wind that moves faster than any earthly creature. We chose the name because this implementation outperforms even the "lightning-fast" alternatives, which makes the god of wind a fitting namesake for the fastest Whisper on Apple Silicon.
Acknowledgments
This project builds upon the excellent work of others. We are grateful to:
- Apple MLX Team - For the MLX framework and the original Whisper MLX implementation with CLI support, output writers, and numerical stability improvements
- Mustafa Aljadery - For the lightning-fast batched decoding implementation that significantly improves throughput
- Siddharth Sharma - Co-author of lightning-whisper-mlx
- OpenAI - For creating the original Whisper model and making it open source
This unified implementation combines the best of both worlds:
- ml-explore/mlx-examples/whisper - Newer APIs, CLI support, output writers, numerical stability
- lightning-whisper-mlx - Batched decoding for higher throughput
Features
- Batched decoding - Process multiple audio segments in parallel for 3-5x faster transcription
- Multiple output formats - txt, vtt, srt, tsv, json
- Word-level timestamps - Extract precise word timings
- Multiple model support - tiny, base, small, medium, large-v3, turbo, distil variants
- Quantization - 4-bit and 8-bit quantized models for reduced memory usage
- Simple API - Easy-to-use LightningWhisperMLX wrapper class
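Batched decoding groups Whisper's fixed-length audio windows so the model decodes several at once instead of one after another. As a rough illustration (not this package's internal code), the sketch below assumes the standard Whisper front end of 16 kHz audio split into 30-second windows and shows how those windows are grouped into batches of batch_size. The helper names window_offsets and batch_windows are hypothetical.

```python
from typing import List

SAMPLE_RATE = 16_000  # Whisper models consume 16 kHz mono audio
WINDOW_SECONDS = 30  # Whisper's encoder sees fixed 30-second windows


def window_offsets(num_samples: int) -> List[int]:
    """Start offsets (in samples) of each 30-second window covering the audio."""
    step = SAMPLE_RATE * WINDOW_SECONDS
    return list(range(0, num_samples, step))


def batch_windows(offsets: List[int], batch_size: int) -> List[List[int]]:
    """Group window offsets so each inner list is decoded in one batched pass."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [offsets[i : i + batch_size] for i in range(0, len(offsets), batch_size)]


if __name__ == "__main__":
    # 95 seconds of audio -> 4 windows; batch_size=3 -> 2 batched passes
    offsets = window_offsets(95 * SAMPLE_RATE)
    batches = batch_windows(offsets, batch_size=3)
    print([[offset // SAMPLE_RATE for offset in batch] for batch in batches])
```

With batch_size=1 the same 95 seconds would need four sequential passes; larger batches cut the number of passes at the cost of holding more windows in memory at once.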
Installation
```bash
# Clone the repository
git clone <repo-url>
cd vayu

# Install the package
pip install -e .

# Download required assets (mel filters and tokenizer vocabularies)
python -m whisper_mlx.assets.download_assets
```
Requirements
- macOS with Apple Silicon (M1/M2/M3)
- Python 3.10+
- MLX 0.11+
Quick Start
Simple API
```python
from whisper_mlx import LightningWhisperMLX

# Initialize with batched decoding
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)

# Transcribe
result = whisper.transcribe("audio.mp3")
print(result["text"])

# With options
result = whisper.transcribe(
    "audio.mp3",
    language="en",
    word_timestamps=True,
)
```
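With word_timestamps=True, Whisper-style results typically attach a "words" list to each segment, where each entry carries "word", "start", and "end" keys. That is the format used by OpenAI Whisper and the MLX example port; it is an assumption that Vayu's result dict follows it exactly. The hypothetical helper below flattens those entries into (start, end, word) tuples and runs on a hand-built result, so it works without loading a model.

```python
from typing import Any, Dict, List, Tuple


def word_timings(result: Dict[str, Any]) -> List[Tuple[float, float, str]]:
    """Flatten per-segment word entries into (start, end, word) tuples."""
    timings = []
    for segment in result.get("segments", []):
        # Segments decoded without word_timestamps=True have no "words" key
        for word in segment.get("words", []):
            timings.append((word["start"], word["end"], word["word"].strip()))
    return timings


if __name__ == "__main__":
    # Hand-built stand-in for whisper.transcribe("audio.mp3", word_timestamps=True)
    sample = {
        "text": " Hello world.",
        "segments": [
            {
                "start": 0.0,
                "end": 1.2,
                "text": " Hello world.",
                "words": [
                    {"word": " Hello", "start": 0.0, "end": 0.5},
                    {"word": " world.", "start": 0.6, "end": 1.2},
                ],
            }
        ],
    }
    for start, end, word in word_timings(sample):
        print(f"{start:6.2f} {end:6.2f}  {word}")
```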
Full API
```python
from whisper_mlx import transcribe

result = transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    batch_size=6,
    language="en",
    word_timestamps=True,
)
print(result["text"])

# Each segment carries start/end times in seconds plus its text
for segment in result["segments"]:
    print(f"[{segment['start']:.2f} -> {segment['end']:.2f}] {segment['text']}")
```
CLI
```bash
# Basic transcription
vayu audio.mp3

# With batched decoding (faster)
vayu audio.mp3 --batch-size 12

# Specify model and output format
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3 --output-format srt

# Multiple files
vayu audio1.mp3 audio2.mp3 --output-dir ./transcripts

# With word timestamps
vayu audio.mp3 --word-timestamps True

# Translate to English
vayu audio.mp3 --task translate
```
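To script the CLI from Python (for example, over a folder of recordings), one option is to build the argument list and hand it to subprocess.run. The hypothetical helper below uses only the flags shown in the examples above (--batch-size, --model, --output-format, --output-dir) and does not cover any other options the CLI may accept.

```python
from typing import List, Optional, Sequence


def build_vayu_command(
    files: Sequence[str],
    batch_size: int = 12,
    model: Optional[str] = None,
    output_format: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a vayu argv list; optional flags are added only when set."""
    cmd = ["vayu", *files, "--batch-size", str(batch_size)]
    if model is not None:
        cmd += ["--model", model]
    if output_format is not None:
        cmd += ["--output-format", output_format]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_vayu_command(
        ["audio1.mp3", "audio2.mp3"], output_format="srt", output_dir="./transcripts"
    )
    print(" ".join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) runs the transcription and raises CalledProcessError if vayu exits with a non-zero status.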
Available Models
| Model | Hugging Face repo | Parameters | Relative speed |
|---|---|---|---|
| tiny | mlx-community/whisper-tiny-mlx | 39M | Fastest |
| base | mlx-community/whisper-base-mlx | 74M | Fast |
| small | mlx-community/whisper-small-mlx | 244M | Medium |
| medium | mlx-community/whisper-medium-mlx | 769M | Slow |
| large-v3 | mlx-community/whisper-large-v3-mlx | 1.5B | Slowest |
| turbo | mlx-community/whisper-turbo | 809M | Fast |
| distil-large-v3 | mlx-community/distil-whisper-large-v3 | 756M | Fast |
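LightningWhisperMLX takes a short name (model="distil-large-v3"), while transcribe() takes a repo ID (path_or_hf_repo="mlx-community/whisper-turbo"). If you switch between the two APIs, a small lookup built from the table above keeps them in sync. This is a hypothetical helper; the package's own name-to-repo mapping may differ, particularly for quantized variants.

```python
from typing import Dict

# Short names and repos copied from the Available Models table.
MODEL_REPOS: Dict[str, str] = {
    "tiny": "mlx-community/whisper-tiny-mlx",
    "base": "mlx-community/whisper-base-mlx",
    "small": "mlx-community/whisper-small-mlx",
    "medium": "mlx-community/whisper-medium-mlx",
    "large-v3": "mlx-community/whisper-large-v3-mlx",
    "turbo": "mlx-community/whisper-turbo",
    "distil-large-v3": "mlx-community/distil-whisper-large-v3",
}


def resolve_repo(name_or_repo: str) -> str:
    """Map a short model name to its Hugging Face repo; pass full repo IDs through."""
    if "/" in name_or_repo:
        return name_or_repo  # already an "org/repo" ID
    try:
        return MODEL_REPOS[name_or_repo]
    except KeyError:
        known = ", ".join(sorted(MODEL_REPOS))
        raise ValueError(f"unknown model {name_or_repo!r}; expected one of: {known}") from None
```

For example, transcribe("audio.mp3", path_or_hf_repo=resolve_repo("turbo")) selects the same weights as the turbo row of the table.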
Quantized Models
For reduced memory usage, use quantized models:
```python
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", quant="4bit")
```
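The savings are easy to estimate: weight memory is roughly parameters × bits per weight ÷ 8 bytes. The sketch below applies that rule of thumb to the distil-large-v3 size listed in the Available Models table. It ignores the per-group scales and biases that quantized formats also store, as well as activations and the decoder's key-value cache, so treat the numbers as lower bounds rather than measurements. The function name is hypothetical.

```python
def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    params = 756e6  # distil-large-v3, from the Available Models table
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>6}: ~{weight_memory_mb(params, bits):,.0f} MB")
```

By this estimate, 4-bit weights need about a quarter of the memory of 16-bit weights (roughly 378 MB versus 1,512 MB for distil-large-v3).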
Batch Size Recommendations
| Model | Recommended batch_size | Memory Usage |
|---|---|---|
| tiny/base | 24-32 | Low |
| small | 16-24 | Medium |
| medium | 8-12 | High |
| large/turbo | 4-8 | High |
| distil-large-v3 | 12-16 | Medium |
Higher batch sizes improve throughput but require more memory. Start with the recommended values and adjust based on your hardware.
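One way to apply that advice is to start at the low end of the recommended range and step down if the machine runs short of memory. The sketch below encodes the table above and produces a halving sequence of fallback sizes. Both helpers are hypothetical, the "large/turbo" row is mapped to the large-v3 and turbo names, and how an out-of-memory condition surfaces in MLX is not covered here.

```python
from typing import Dict, List, Tuple

# Ranges copied from the Batch Size Recommendations table.
BATCH_SIZE_RANGES: Dict[str, Tuple[int, int]] = {
    "tiny": (24, 32),
    "base": (24, 32),
    "small": (16, 24),
    "medium": (8, 12),
    "large-v3": (4, 8),
    "turbo": (4, 8),
    "distil-large-v3": (12, 16),
}


def starting_batch_size(model: str, conservative: bool = True) -> int:
    """Pick a starting batch_size: the low end of the range by default."""
    low, high = BATCH_SIZE_RANGES[model]
    return low if conservative else high


def fallback_sizes(start: int) -> List[int]:
    """Halve the batch size down to 1, e.g. to retry after running out of memory."""
    sizes = []
    size = start
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes
```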
API Reference
transcribe()
```python
from typing import Optional, Tuple, Union

import mlx.core as mx
import numpy as np


def transcribe(
    audio: Union[str, np.ndarray, mx.array],
    *,
    path_or_hf_repo: str = "mlx-community/whisper-turbo",
    batch_size: int = 1,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    **decode_options,
) -> dict: ...
```
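The temperature tuple and the three thresholds work together. In OpenAI's reference Whisper, which the MLX example port follows, each window is first decoded at temperature 0.0; if the output is too repetitive (zlib compression ratio above compression_ratio_threshold) or too uncertain (average log-probability below logprob_threshold), it is decoded again at the next temperature, while a window whose no-speech probability exceeds no_speech_threshold is treated as silence and not retried. Below is a simplified sketch of that loop with a stand-in decode function. DecodeResult and decode_with_fallback are hypothetical names, and Vayu's exact conditions may differ.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DecodeResult:
    """Hypothetical outcome of decoding one 30-second window."""

    text: str
    avg_logprob: float
    no_speech_prob: float


def compression_ratio(text: str) -> float:
    """Raw bytes / zlib-compressed bytes; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
    no_speech_threshold: float = 0.6,
) -> DecodeResult:
    """Retry decoding at rising temperatures until the output looks acceptable."""
    if not temperatures:
        raise ValueError("need at least one temperature")
    for temperature in temperatures:
        result = decode(temperature)
        needs_fallback = (
            compression_ratio(result.text) > compression_ratio_threshold  # too repetitive
            or result.avg_logprob < logprob_threshold  # too uncertain
        )
        if result.no_speech_prob > no_speech_threshold:
            needs_fallback = False  # likely silence: a hotter retry will not help
        if not needs_fallback:
            break
    return result  # the last attempt if every temperature failed the checks


if __name__ == "__main__":
    def fake_decode(temperature: float) -> DecodeResult:
        # Greedy decoding loops on a phrase; sampling at 0.2 recovers
        if temperature == 0.0:
            return DecodeResult("thank you " * 40, -0.2, 0.05)
        return DecodeResult("Thanks for listening.", -0.4, 0.05)

    print(decode_with_fallback(fake_decode).text)  # prints "Thanks for listening."
```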
LightningWhisperMLX
```python
from typing import Optional


class LightningWhisperMLX:
    def __init__(
        self,
        model: str = "distil-large-v3",
        batch_size: int = 12,
        quant: Optional[str] = None,  # e.g. "4bit"; None loads unquantized weights
    ) -> None: ...

    def transcribe(
        self,
        audio_path: str,
        language: Optional[str] = None,
        task: str = "transcribe",
        verbose: bool = False,
        word_timestamps: bool = False,
        **kwargs,
    ) -> dict: ...
```
License
MIT License - see LICENSE file for details.
Author
Behnam Ebrahimi - Unified implementation, security improvements, and maintenance
Credits
This project would not be possible without:
| Project | Author(s) | Contribution |
|---|---|---|
| mlx-examples/whisper | Apple Inc. | MLX framework, Whisper port, CLI, output writers |
| lightning-whisper-mlx | Mustafa Aljadery, Siddharth Sharma | Batched decoding for 3-5x speedup |
| Whisper | OpenAI | Original model architecture and weights |
Thank you to all contributors who make open source AI accessible and fast on Apple Silicon.
Download files
Source Distribution
Built Distribution
File details
Details for the file vayu_whisper-1.0.0.tar.gz.
File metadata
- Download URL: vayu_whisper-1.0.0.tar.gz
- Upload date:
- Size: 790.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6b35b1f8e9bb6d08edca7e1074d127e2f03d35b4bbe4e2287c99fb475d9f1a3 |
| MD5 | 60c2ffe3015c18567662b873db6ecb0f |
| BLAKE2b-256 | 0c071e106f4bd747d05847909c777cfcf578c24bb925e304efb1e5eccb12981a |
File details
Details for the file vayu_whisper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: vayu_whisper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 794.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8095821105ac31f394f226247f639dd723927c9c4c134270981bae524af9f7f7 |
| MD5 | 6756302d1cad776946f0c1397ade5e1f |
| BLAKE2b-256 | ad44be1b266cdfce696a968a0d6b408d9ef4d3d71736200535612847ee663409 |