
Vayu - The fastest Whisper implementation on Apple Silicon


Vayu (وایو)


The fastest Whisper implementation on Apple Silicon.

Vayu (وایو) is the ancient Persian god of wind — the swiftest force in nature. In Zoroastrian mythology, Vayu represents the divine wind that moves faster than any earthly creature. We chose this name because this implementation outperforms even "lightning-fast" alternatives, making Vayu the most fitting name for the fastest Whisper on Apple Silicon.

Acknowledgments

This project builds upon the excellent work of others. We are grateful to:

  • Apple MLX Team - For the MLX framework and the original Whisper MLX implementation with CLI support, output writers, and numerical stability improvements
  • Mustafa Aljadery - For the lightning-fast batched decoding implementation that significantly improves throughput
  • Siddharth Sharma - Co-author of lightning-whisper-mlx
  • OpenAI - For creating the original Whisper model and making it open source

This unified implementation combines the best of both worlds:

  • ml-explore/mlx-examples/whisper - Newer APIs, CLI support, output writers, numerical stability
  • lightning-whisper-mlx - Batched decoding for higher throughput

Features

  • Batched decoding - Process multiple audio segments in parallel for 3-5x faster transcription
  • Multiple output formats - txt, vtt, srt, tsv, json
  • Word-level timestamps - Extract precise word timings
  • Multiple model support - tiny, base, small, medium, large-v3, turbo, distil variants
  • Quantization - 4-bit and 8-bit quantized models for reduced memory usage
  • Simple API - Easy-to-use LightningWhisperMLX wrapper class
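To make the batched-decoding idea concrete, the sketch below splits an audio file into Whisper's 30-second windows and groups those windows into batches. It is an illustration only, not Vayu's internal code: `plan_batches` and its parameters are hypothetical names, and the 16 kHz / 30-second figures are the standard Whisper input format.

```python
import math

SAMPLE_RATE = 16_000      # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30       # each decoder pass covers one 30-second window


def plan_batches(num_samples: int, batch_size: int) -> list[list[tuple[int, int]]]:
    """Group 30-second windows (as sample ranges) into batches.

    Hypothetical helper for illustration; Vayu's real scheduler may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    window = SAMPLE_RATE * WINDOW_SECONDS
    num_windows = math.ceil(num_samples / window)
    # (start, end) sample offsets for every window; the last one may be short.
    windows = [
        (i * window, min((i + 1) * window, num_samples))
        for i in range(num_windows)
    ]
    # Consecutive windows are decoded together, batch_size at a time.
    return [windows[i:i + batch_size] for i in range(0, num_windows, batch_size)]


# A 10-minute file is 20 windows: 20 decoder passes at batch_size=1,
# but only 2 passes (12 windows, then 8) at batch_size=12.
ten_minutes = 10 * 60 * SAMPLE_RATE
batches = plan_batches(ten_minutes, batch_size=12)
print(len(batches), [len(b) for b in batches])
```

Fewer, larger decoder passes are where the throughput gain comes from; the cost is that every window in a batch must fit in memory at once.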

Installation

# Clone the repository
git clone <repo-url>
cd vayu

# Install the package
pip install -e .

# Download required assets (mel filters and tokenizer vocabularies)
python -m whisper_mlx.assets.download_assets

Requirements

  • macOS with Apple Silicon (M1/M2/M3)
  • Python 3.10+
  • MLX 0.11+

Quick Start

Simple API

from whisper_mlx import LightningWhisperMLX

# Initialize with batched decoding
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)

# Transcribe
result = whisper.transcribe("audio.mp3")
print(result["text"])

# With options
result = whisper.transcribe(
    "audio.mp3",
    language="en",
    word_timestamps=True,
)
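With `word_timestamps=True`, per-word timings can be read from the result. The sketch below assumes the result follows the OpenAI Whisper layout, where each segment carries a `words` list of dicts with `word`, `start`, and `end` keys; that layout is an assumption here, and `iter_words` is a hypothetical helper, not part of the package.

```python
from typing import Iterator


def iter_words(result: dict) -> Iterator[tuple[float, float, str]]:
    """Yield (start, end, word) for every word in a transcription result.

    Assumes the OpenAI Whisper result layout: result["segments"] is a list of
    dicts, each with an optional "words" list of {"word", "start", "end"}.
    """
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["start"], word["end"], word["word"].strip()


# Hand-written stand-in for a real result, used only to show the shape.
sample = {
    "text": " Hello world.",
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello world.",
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.5},
                {"word": " world.", "start": 0.6, "end": 1.2},
            ],
        }
    ],
}

for start, end, word in iter_words(sample):
    print(f"{start:5.2f} -> {end:5.2f}  {word}")
```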

Full API

from whisper_mlx import transcribe

result = transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    batch_size=6,
    language="en",
    word_timestamps=True,
)

print(result["text"])
for segment in result["segments"]:
    print(f"[{segment['start']:.2f} -> {segment['end']:.2f}] {segment['text']}")
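The segment fields printed above (`start`, `end`, `text`) are enough to build subtitle cues by hand. The package already ships output writers (see `--output-format srt` in the CLI section), so the following is only a minimal sketch of how segments map to SRT; `format_srt_time` and `segments_to_srt` are hypothetical names.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_time(segment['start'])} --> {format_srt_time(segment['end'])}\n"
            f"{segment['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Stand-in segments with the same keys as result["segments"].
demo_segments = [
    {"start": 0.0, "end": 1.2, "text": " Hello world."},
    {"start": 1.5, "end": 3.0, "text": " Second line."},
]
print(segments_to_srt(demo_segments))
```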

CLI

# Basic transcription
vayu audio.mp3

# With batched decoding (faster)
vayu audio.mp3 --batch-size 12

# Specify model and output format
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3 --output-format srt

# Multiple files
vayu audio1.mp3 audio2.mp3 --output-dir ./transcripts

# With word timestamps
vayu audio.mp3 --word-timestamps True

# Translate to English
vayu audio.mp3 --task translate

Available Models

| Model | Hugging Face repo | Size (parameters) | Speed |
|---|---|---|---|
| tiny | mlx-community/whisper-tiny-mlx | 39M | Fastest |
| base | mlx-community/whisper-base-mlx | 74M | Fast |
| small | mlx-community/whisper-small-mlx | 244M | Medium |
| medium | mlx-community/whisper-medium-mlx | 769M | Slow |
| large-v3 | mlx-community/whisper-large-v3-mlx | 1.5B | Slowest |
| turbo | mlx-community/whisper-turbo | 809M | Fast |
| distil-large-v3 | mlx-community/distil-whisper-large-v3 | 756M | Fast |

Quantized Models

For reduced memory usage, use quantized models:

whisper = LightningWhisperMLX(model="distil-large-v3", quant="4bit")
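As a rough guide to what quantization saves, weight storage scales with bits per weight. The arithmetic below assumes the sizes in the model table are parameter counts and that unquantized weights are stored in 16 bits; both are assumptions. Quantized formats also store per-group scales, and activations need memory too, so treat these numbers as lower bounds. `weight_memory_mb` is a hypothetical helper.

```python
def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in megabytes (10**6 bytes).

    Ignores quantization scales/biases, activations, and the KV cache, so
    the result is a lower bound on real memory use.
    """
    return num_params * bits_per_weight / 8 / 1e6


DISTIL_LARGE_V3_PARAMS = 756e6  # "756M" from the model table

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_mb(DISTIL_LARGE_V3_PARAMS, bits):.0f} MB")
```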

Batch Size Recommendations

| Model | Recommended batch_size | Memory usage |
|---|---|---|
| tiny/base | 24-32 | Low |
| small | 16-24 | Medium |
| medium | 8-12 | High |
| large/turbo | 4-8 | High |
| distil-large-v3 | 12-16 | Medium |

Higher batch sizes improve throughput but require more memory. Start with the recommended values and adjust based on your hardware.
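The recommendations can be encoded as a small lookup so scripts start from a sensible value. This is a sketch: `RECOMMENDED_BATCH_SIZE` and `pick_batch_size` are hypothetical names, and mapping the "large" row to the `large-v3` model name is an assumption.

```python
# Recommended (low, high) batch_size ranges, copied from the table above.
RECOMMENDED_BATCH_SIZE = {
    "tiny": (24, 32),
    "base": (24, 32),
    "small": (16, 24),
    "medium": (8, 12),
    "large-v3": (4, 8),   # assumed to be the "large" row
    "turbo": (4, 8),
    "distil-large-v3": (12, 16),
}


def pick_batch_size(model: str, low_memory: bool = False) -> int:
    """Return a starting batch_size for a model name.

    Hypothetical helper: uses the low end of the range when memory is tight,
    the high end otherwise, and falls back to 1 (no batching) for unknown models.
    """
    if model not in RECOMMENDED_BATCH_SIZE:
        return 1
    low, high = RECOMMENDED_BATCH_SIZE[model]
    return low if low_memory else high


print(pick_batch_size("medium"), pick_batch_size("medium", low_memory=True))
```

The returned value can be passed straight to the wrapper, for example `LightningWhisperMLX(model="medium", batch_size=pick_batch_size("medium"))`, and lowered if memory runs short.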

API Reference

transcribe()

def transcribe(
    audio: Union[str, np.ndarray, mx.array],
    *,
    path_or_hf_repo: str = "mlx-community/whisper-turbo",
    batch_size: int = 1,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    **decode_options,
) -> dict
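The `temperature` tuple and the two quality thresholds work together in OpenAI's reference Whisper: a window is decoded at the first temperature, and the next one is tried only if the output looks too repetitive or too unlikely. Assuming Vayu keeps that behaviour (it is not confirmed here), the logic can be sketched as follows. `Decoded`, `decode_with_fallback`, and `fake_decode` are hypothetical names, and the `no_speech_threshold` check is omitted for brevity.

```python
import zlib
from typing import Callable, NamedTuple, Optional, Sequence


class Decoded(NamedTuple):
    text: str
    avg_logprob: float


def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def decode_with_fallback(
    decode: Callable[[float], Decoded],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
) -> Decoded:
    """Try each temperature in order and return the first acceptable result."""
    if not temperatures:
        raise ValueError("temperatures must not be empty")
    result = None
    for temperature in temperatures:
        result = decode(temperature)
        too_repetitive = (
            compression_ratio_threshold is not None
            and compression_ratio(result.text) > compression_ratio_threshold
        )
        too_unlikely = (
            logprob_threshold is not None
            and result.avg_logprob < logprob_threshold
        )
        if not (too_repetitive or too_unlikely):
            break  # good enough, stop raising the temperature
    return result  # the last attempt is returned even if every check failed


def fake_decode(temperature: float) -> Decoded:
    # Stand-in decoder: greedy decoding gets stuck in a loop, sampling does not.
    if temperature == 0.0:
        return Decoded("la " * 50, avg_logprob=-0.2)
    return Decoded("The quick brown fox jumps over the lazy dog.", avg_logprob=-0.4)


print(decode_with_fallback(fake_decode).text)
```

Passing a single float for `temperature` disables the fallback, which is faster but gives the decoder no second chance on a looping window.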

LightningWhisperMLX

class LightningWhisperMLX:
    def __init__(
        self,
        model: str = "distil-large-v3",
        batch_size: int = 12,
        quant: Optional[str] = None,
    )

    def transcribe(
        self,
        audio_path: str,
        language: Optional[str] = None,
        task: str = "transcribe",
        verbose: bool = False,
        word_timestamps: bool = False,
        **kwargs,
    ) -> dict

License

MIT License - see LICENSE file for details.

Author

Behnam Ebrahimi - Unified implementation, security improvements, and maintenance

Credits

This project would not be possible without:

| Project | Author(s) | Contribution |
|---|---|---|
| mlx-examples/whisper | Apple Inc. | MLX framework, Whisper port, CLI, output writers |
| lightning-whisper-mlx | Mustafa Aljadery, Siddharth Sharma | Batched decoding for 3-5x speedup |
| Whisper | OpenAI | Original model architecture and weights |

Thank you to all contributors who make open source AI accessible and fast on Apple Silicon.
