An implementation of Nvidia's Parakeet models for Apple Silicon using MLX.
Parakeet MLX
An implementation of the Parakeet models, Nvidia's ASR (Automatic Speech Recognition) models, for Apple Silicon using MLX.
Installation
Using uv (recommended):
uv add parakeet-mlx -U
Or, for the CLI:
uv tool install parakeet-mlx -U
Using pip:
pip install parakeet-mlx -U
CLI Quick Start
parakeet-mlx <audio_files> [OPTIONS]
Arguments
audio_files: One or more audio files to transcribe (WAV, MP3, etc.)
Options
- --model (default: mlx-community/parakeet-tdt-0.6b-v2): Hugging Face repository of the model to use
- --output-dir (default: current directory): Directory to save transcription outputs
- --output-format (default: srt): Output format (txt/srt/vtt/json/all)
- --output-template (default: {filename}): Template for output filenames; {filename}, {index}, and {date} are supported
- --highlight-words (default: False): Enable word-level timestamps in SRT/VTT outputs
- --verbose / -v (default: False): Print detailed progress information
- --fp32 / --bf16 (default: bf16): Determines the precision to use
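The --output-template option builds each output filename from placeholders. The sketch below shows one plausible way such a template could be expanded with Python's str.format. The helper name render_output_name, the meaning of {index} (position of the file in the input list), and the YYYYMMDD rendering of {date} are assumptions for illustration, not the CLI's confirmed behavior.

```python
from datetime import date
from pathlib import Path


def render_output_name(template: str, audio_path: str, index: int, today: date, ext: str) -> str:
    """Hypothetical helper: expand {filename}, {index}, {date} and append the format's extension."""
    # Assumption: {filename} is the input file name without its directory or extension.
    stem = Path(audio_path).stem
    # Assumption: {date} is rendered as YYYYMMDD.
    name = template.format(filename=stem, index=index, date=today.strftime("%Y%m%d"))
    return f"{name}.{ext}"


print(render_output_name("{filename}", "talks/audio.mp3", 0, date(2025, 1, 2), "srt"))
print(render_output_name("{date}_{index}_{filename}", "audio.mp3", 3, date(2025, 1, 2), "vtt"))
```

Using str.format keeps unknown placeholders loud: a typo such as {fname} raises a KeyError instead of silently producing a wrong filename.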
Examples
# Basic transcription
parakeet-mlx audio.mp3
# Multiple files with word-level timestamps in VTT subtitles
parakeet-mlx *.mp3 --output-format vtt --highlight-words
# Generate all output formats
parakeet-mlx audio.mp3 --output-format all
Python API Quick Start
Transcribe a file:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v2")
result = model.transcribe("audio_file.wav")
print(result.text)
Check timestamps:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v2")
result = model.transcribe("audio_file.wav")
print(result.sentences)
# [AlignedSentence(text="Hello World.", start=1.01, end=2.04, duration=1.03, tokens=[...])]
Timestamp Result
- AlignedResult: Top-level result containing the full text and sentences
  - text: Full transcribed text
  - sentences: List of AlignedSentence
- AlignedSentence: Sentence-level alignments with start/end times
  - text: Sentence text
  - start: Start time in seconds
  - end: End time in seconds
  - duration: Time between start and end, in seconds
  - tokens: List of AlignedToken
- AlignedToken: Word/token-level alignments with precise timestamps
  - text: Token text
  - start: Start time in seconds
  - end: End time in seconds
  - duration: Time between start and end, in seconds
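To make the structure above concrete, the sketch below defines minimal stand-in dataclasses mirroring the documented fields and renders sentence-level SRT cues from them. The classes are local stand-ins (not imports from parakeet_mlx), duration is derived here as end - start, and the format_timestamp and to_srt helpers are hypothetical; the CLI's own SRT writer may format cues differently.

```python
from dataclasses import dataclass, field
from typing import List


# Local stand-ins mirroring the documented fields (NOT imported from parakeet_mlx).
@dataclass
class AlignedToken:
    text: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class AlignedSentence:
    text: str
    start: float
    end: float
    tokens: List[AlignedToken] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class AlignedResult:
    text: str
    sentences: List[AlignedSentence]


def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Render seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT, pass '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(result: AlignedResult) -> str:
    """Hypothetical writer: one numbered SRT cue per sentence."""
    cues = []
    for i, sentence in enumerate(result.sentences, start=1):
        start = format_timestamp(sentence.start)
        end = format_timestamp(sentence.end)
        cues.append(f"{i}\n{start} --> {end}\n{sentence.text.strip()}\n")
    return "\n".join(cues)


result = AlignedResult(
    text="Hello World.",
    sentences=[AlignedSentence(text="Hello World.", start=1.01, end=2.04)],
)
print(to_srt(result))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the printed fields.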
Low-Level API
To transcribe a log-mel spectrogram directly, you can do the following:
import mlx.core as mx
from parakeet_mlx import from_pretrained
from parakeet_mlx.audio import get_logmel, load_audio
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v2")
# Load and preprocess audio manually
audio = load_audio("audio.wav", model.preprocessor_config.sample_rate)
mel = get_logmel(audio, model.preprocessor_config)
# Generate transcription with alignments
# Accepts both [batch, sequence, feat] and [sequence, feat]
# `alignments` is a list of AlignedResult, whether or not you included a batch dimension
alignments = model.generate(mel)
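model.generate accepts log-mel input shaped either [sequence, feat] or [batch, sequence, feat], and always returns a list. The sketch below illustrates that shape convention with NumPy: a hypothetical ensure_batched helper adds a leading batch axis to 2-D input so downstream code can always iterate over a batch. It illustrates the convention only and is not the library's internal code; the 500 frames and 80 features in the example are arbitrary values.

```python
import numpy as np


def ensure_batched(mel: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return mel shaped [batch, sequence, feat]."""
    if mel.ndim == 2:
        # Single utterance: [sequence, feat] -> [1, sequence, feat]
        return mel[np.newaxis, ...]
    if mel.ndim == 3:
        # Already batched: leave unchanged.
        return mel
    raise ValueError(f"expected 2-D or 3-D input, got shape {mel.shape}")


single = np.zeros((500, 80), dtype=np.float32)    # [sequence, feat]
batch = np.zeros((4, 500, 80), dtype=np.float32)  # [batch, sequence, feat]

print(ensure_batched(single).shape)  # (1, 500, 80)
print(ensure_batched(batch).shape)   # (4, 500, 80)
```

Normalizing to a batch axis up front is why the result can be a list in both cases: a single utterance simply becomes a batch of one.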
Todo
- Add CLI for better usability
- Streaming input (RTFx is currently much higher than 1, so the current implementation should already be fast enough for streaming)
- Compiling for RNNT decoder
- Add support for other Parakeet variants
- Remove librosa dependency
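RTFx (inverse real-time factor) is commonly defined as the duration of the audio divided by the time taken to process it, so a value above 1 means transcription runs faster than real time. A minimal sketch of that arithmetic; the function name rtfx and the example numbers are illustrative, not measurements of this library.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# Illustrative numbers only: 60 s of audio transcribed in 0.5 s.
speed = rtfx(60.0, 0.5)
print(speed)  # 120.0

# For streaming, each chunk must be processed before the next one arrives,
# which holds whenever RTFx > 1.
print(speed > 1)  # True
```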
Acknowledgments
- Thanks to Nvidia for training these awesome models, writing great papers, and providing a nice implementation.
- Thanks to MLX project for providing the framework that made this implementation possible.
- Thanks to audiofile, audresample, numpy, and librosa for audio processing.
- Thanks to dacite for config management.
License
Apache 2.0
Download files
File details
Details for the file parakeet_mlx-0.2.2.tar.gz.
File metadata
- Download URL: parakeet_mlx-0.2.2.tar.gz
- Upload date:
- Size: 21.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3dee48b67f5790eca437ab579e1f961d0a9f4eeb80cea5cab233bae34220435c |
| MD5 | 1f743188ecd8297779e1d6bbc0f689df |
| BLAKE2b-256 | 5ac640ac89a73439c1d2ea5a24a0bc2668091a37d94d4c3d684a438d961805fb |
File details
Details for the file parakeet_mlx-0.2.2-py3-none-any.whl.
File metadata
- Download URL: parakeet_mlx-0.2.2-py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e8d263dc48f456a1ae1bafb9e0583ccd94abb56895667230faa2975b010ad1d |
| MD5 | 8b2fdb812bbcf930ed18e9a617f6656b |
| BLAKE2b-256 | e893f38b0e835860af38d538c10020eb3e97668ea4e96c52f74dbe52d19676e6 |