Parakeet MLX

An implementation of Nvidia's Parakeet ASR (automatic speech recognition) models for Apple Silicon, built on MLX.

Currently, only Parakeet TDT models are supported. Support for additional Parakeet model variants is planned.

Installation

Using uv (recommended):

uv add parakeet-mlx -U

Or, for the CLI:

uv tool install parakeet-mlx -U

Using pip:

pip install parakeet-mlx -U

CLI Quick Start

parakeet-mlx <audio_files> [OPTIONS]

Arguments

  • audio_files: One or more audio files to transcribe (WAV, MP3, etc.)

Options

  • --model (default: senstella/parakeet-tdt-0.6b-v2-mlx)

    • Hugging Face repository of the model to use
  • --output-dir (default: current directory)

    • Directory to save transcription outputs
  • --output-format (default: srt)

    • Output format (txt/srt/vtt/json/all)
  • --output-template (default: {filename})

    • Template for output filenames; the placeholders {filename}, {index}, and {date} are supported
  • --highlight-words (default: False)

    • Enable word-level timestamps in SRT/VTT outputs
  • --verbose / -v (default: False)

    • Print detailed progress information
  • --fp32 / --bf16 (default: bf16)

    • Select the numeric precision to use
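
As a rough illustration of how an --output-template value might expand, here is a minimal sketch that fills the three documented placeholders with Python's str.format. The helper name expand_template, the date format, and the index base are assumptions made for this example; the CLI's actual behavior may differ.

```python
from datetime import datetime
from pathlib import Path


def expand_template(template: str, audio_path: str, index: int, when: datetime) -> str:
    """Fill {filename}, {index}, and {date} in an output-name template.

    Hypothetical helper: the date format and index base are assumptions,
    not taken from the parakeet-mlx CLI.
    """
    return template.format(
        filename=Path(audio_path).stem,  # file name without directory or extension
        index=index,
        date=when.strftime("%Y%m%d"),  # assumed date format
    )


name = expand_template("{index}_{filename}_{date}", "talks/intro.mp3", 1, datetime(2025, 1, 31))
print(name)  # 1_intro_20250131
```

The output extension is left out of the sketch because it depends on the format chosen with --output-format.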

Examples

# Basic transcription
parakeet-mlx audio.mp3

# Multiple files, with word-level timestamps in VTT subtitles
parakeet-mlx *.mp3 --output-format vtt --highlight-words

# Generate all output formats
parakeet-mlx audio.mp3 --output-format all

Python API Quick Start

Transcribe a file:

from parakeet_mlx import from_pretrained

model = from_pretrained("senstella/parakeet-tdt-0.6b-v2-mlx")

result = model.transcribe("audio_file.wav")

print(result.text)

Check timestamps:

from parakeet_mlx import from_pretrained

model = from_pretrained("senstella/parakeet-tdt-0.6b-v2-mlx")

result = model.transcribe("audio_file.wav")

print(result.sentences)
# [AlignedSentence(text="Hello World.", start=1.01, end=2.04, duration=1.03, tokens=[...])]

Timestamp Result

  • AlignedResult: Top-level result containing the full text and sentences
    • text: Full transcribed text
    • sentences: List of AlignedSentence
  • AlignedSentence: Sentence-level alignments with start/end times
    • text: Sentence text
    • start: Start time in seconds
    • end: End time in seconds
    • duration: Time between start and end, in seconds
    • tokens: List of AlignedToken
  • AlignedToken: Word/token-level alignments with precise timestamps
    • text: Token text
    • start: Start time in seconds
    • end: End time in seconds
    • duration: Time between start and end, in seconds
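
To show how these fields fit together, the sketch below turns sentence-level alignments into SRT cues. It uses small stand-in dataclasses (Token, Sentence) that only mirror the documented fields, so it runs without the library or a model; srt_time and to_srt are hypothetical helpers, not part of parakeet_mlx. With a real result, result.sentences should be usable in place of the stand-in list, assuming the fields behave as listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:  # stand-in mirroring the documented AlignedToken fields
    text: str
    start: float
    end: float
    duration: float


@dataclass
class Sentence:  # stand-in mirroring the documented AlignedSentence fields
    text: str
    start: float
    end: float
    duration: float
    tokens: List[Token] = field(default_factory=list)


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(sentences: List[Sentence]) -> str:
    """Render one numbered SRT cue per sentence, separated by blank lines."""
    cues = []
    for number, s in enumerate(sentences, start=1):
        cues.append(f"{number}\n{srt_time(s.start)} --> {srt_time(s.end)}\n{s.text.strip()}\n")
    return "\n".join(cues)


sentences = [Sentence("Hello World.", start=1.01, end=2.04, duration=1.03)]
print(to_srt(sentences))
```

The same loop could run over each sentence's tokens instead to produce finer-grained cues.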

Low-Level API

To transcribe a log-mel spectrogram directly, you can do the following:

import mlx.core as mx
from parakeet_mlx import from_pretrained
from parakeet_mlx.audio import get_logmel, load_audio

# Load the model so that its preprocessor config is available
model = from_pretrained("senstella/parakeet-tdt-0.6b-v2-mlx")

# Load and preprocess audio manually
audio = load_audio("audio.wav", model.preprocessor_config.sample_rate)
mel = get_logmel(audio, model.preprocessor_config)

# Generate transcription with alignments
# Accepts both [batch, sequence, feat] and [sequence, feat]
# `alignments` is a list of AlignedResult, whether or not the input has a batch dimension
alignments = model.generate(mel)

Todo

  • Add CLI for better usability
  • Streaming input (RTFx is currently much higher than 1, so the current implementation should already be fast enough for streaming)
  • Compiling for RNNT decoder
  • Add support for other Parakeet variants
  • Remove librosa dependency
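
For readers unfamiliar with the term, RTFx (inverse real-time factor) is the audio duration divided by the time spent processing it, so a value above 1 means transcription runs faster than real time. A minimal sketch of the calculation follows; the function name and the numbers are illustrative only and are not a benchmark of this library.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Illustrative numbers only: 60 s of audio transcribed in 2 s of compute
print(rtfx(60.0, 2.0))  # 30.0
```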

Acknowledgments

  • Thanks to Nvidia for training these awesome models, writing cool papers, and providing a nice implementation.
  • Thanks to the MLX project for providing the framework that made this implementation possible.
  • Thanks to audiofile, audresample, numpy, and librosa for audio processing.
  • Thanks to dacite for config management.

License

Apache 2.0
