Fast local transcription for large lectures with NVIDIA Parakeet ONNX
Project description
fast-transcript
fast-transcript is a local lecture transcription CLI built to beat the usual Apple Silicon tradeoff: either fast but flaky, or accurate but painfully slow.
On the development machine, this project handled 30 minutes of audio in about 2 minutes* while staying around 2.51 GB RSS on the long run. In the same local test set, it beat mlx-whisper, insanely-fast-whisper, and parakeet-mlx.
* Benchmark run on a MacBook Pro M1. The exact long-run measurement was 29m47s of Portuguese lecture audio transcribed in about 2m14s (13.38x real-time).
The CLI binary is called fscript:
fscript lecture.mp3
That is the whole point of this project. One command. Large audio. No babysitting.
Why this exists
I wanted a tool for transcribing long classes and lectures quickly on a laptop while still using the computer for normal work.
The existing options I tested had clear problems for this use case:
- `insanely-fast-whisper` was far too slow on this Mac once it fell back to CPU
- `mlx-whisper` was solid, but slower than I wanted for long lecture workflows
- `parakeet-mlx` had excellent memory numbers, but drifted into English on longer Portuguese segments unless heavily tuned
fast-transcript packages the ONNX Parakeet path that held up best in practice.
What it does
- downloads the default Parakeet TDT 0.6B v3 int8 model automatically if it is missing
- stores the extracted model in a persistent per-user application data directory
- keeps the downloaded tarball in the user cache directory
- accepts `mp3`, `wav`, and other audio formats supported by `ffmpeg`
- accepts remote `http(s)` video/audio URLs supported by `yt-dlp`
- prefers platform-provided manual subtitles for remote URLs when available
- falls back to downloading remote audio and transcribing locally when only auto-captions exist or no captions exist
- auto-converts unsupported audio to 16 kHz mono PCM16 WAV
- uses 120s chunks with 2s overlap by default
- writes `<audio>.transcript.json` next to the input unless you choose a different output path
- stays quiet by default: concise progress in the terminal, transcript JSON on disk
- shows a spinner and chunk progress bar on interactive terminals
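The default chunking strategy (120 s windows sharing a 2 s overlap so speech cut at a boundary lands in both chunks) can be sketched as follows. This is an illustrative re-derivation of the documented behavior, not the tool's actual code, and `chunk_spans` is a hypothetical name:

```python
def chunk_spans(total_seconds, chunk_seconds=120.0, overlap_seconds=2.0):
    """Return (start, end) spans covering the audio.

    Consecutive spans share `overlap_seconds` of audio so words cut at a
    chunk boundary appear in both chunks and can be merged afterwards.
    """
    if chunk_seconds <= 0:
        # mirrors `--chunk-seconds 0`: chunking disabled, one span
        return [(0.0, total_seconds)]
    spans, start = [], 0.0
    step = chunk_seconds - overlap_seconds
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return spans
```

With the defaults, a 5-minute file becomes three overlapping spans; each span after the first starts 2 s before the previous one ended.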
Install
Requirements
- `ffmpeg`
- `ffprobe`
- `yt-dlp` for remote URLs, or `uvx yt-dlp`
Install with Homebrew
brew install brenorb/fast-transcript/fast-transcript
On Apple Silicon macOS, Homebrew now installs fast-transcript from a proper bottle.
On Linux x86_64, the formula still installs from the published release binary.
PyPI / uv
The PyPI package name for this project is `fscript`, so the target UX is:
uvx fscript lecture.mp3
uv tool install fscript
The repo already includes platform wheel builds for:
- macOS arm64
- Linux x86_64
PyPI publishing is currently enabled for:
- macOS arm64
See docs/pypi-publishing.md for the release workflow details.
Install a prebuilt binary directly
Download the archive for your platform from the GitHub Releases page, then put fscript on your PATH.
Build from source
cargo install --git https://github.com/brenorb/fast-transcript
Or from a local clone:
cargo install --path .
Quick start
fscript lecture.mp3
fscript https://www.youtube.com/watch?v=QSdh8Gj0mEg
This will:
- ensure the default model exists
- normalize the audio if needed
- transcribe with the default chunking strategy
- write
lecture.transcript.json
For remote URLs, the default flow is:
- inspect the URL with `yt-dlp`
- use manual subtitles directly when the platform provides them
- otherwise download the remote audio and run the normal local transcription pipeline
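The decision above fits in a few lines. This is a sketch of the documented flow only; `remote_strategy` and the return strings are illustrative names, not part of the actual CLI:

```python
def remote_strategy(has_manual_subs: bool, prefer_local: bool = False) -> str:
    """Mirror the documented remote-URL flow.

    Manual subtitles win by default; auto-captions are ignored entirely,
    and --prefer-local-for-remote forces the local transcription path.
    """
    if has_manual_subs and not prefer_local:
        return "use-manual-subtitles"
    return "download-audio-and-transcribe-locally"
```

Note that auto-captions never short-circuit the pipeline: only platform-provided manual subtitles are trusted enough to skip local transcription.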
Usage
fscript <audio-or-url> [output.json]
fscript <audio-or-url> --stdout
fscript <audio-or-url> -
fscript --version
Optional overrides:
fscript lecture.wav custom-output.json
fscript lecture.wav --stdout
fscript lecture.wav --chunk-seconds 180 --chunk-overlap-seconds 3
fscript lecture.wav --chunk-seconds 0
fscript lecture.wav --model-dir ./models/parakeet/custom-copy
fscript lecture.wav --model-package ./models/parakeet-v3-int8.tar.gz
fscript lecture.wav --model-url https://example.com/parakeet-v3-int8.tar.gz
fscript https://www.youtube.com/watch?v=QSdh8Gj0mEg
fscript https://www.youtube.com/watch?v=QSdh8Gj0mEg --prefer-local-for-remote
Environment overrides:
- `FSCRIPT_MODEL_DIR`
- `FSCRIPT_MODEL_PACKAGE`
- `FSCRIPT_MODEL_URL`
Defaults
- model dir:
  - macOS: `~/Library/Application Support/fast-transcript/models/parakeet-tdt-0.6b-v3-int8`
  - Linux: `~/.local/share/fast-transcript/models/parakeet-tdt-0.6b-v3-int8`
- model package cache:
  - macOS: `~/Library/Caches/fast-transcript/parakeet-v3-int8.tar.gz`
  - Linux: `~/.cache/fast-transcript/parakeet-v3-int8.tar.gz`
- model URL: `https://huggingface.co/brenorb/parakeet-tdt-0.6b-v3-int8-onnx-bundle/resolve/main/parakeet-v3-int8.tar.gz?download=1`
- chunk seconds: `120`
- chunk overlap seconds: `2`
- output path: `<audio>.transcript.json`
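A minimal sketch of how the per-platform model directory resolves, assuming the paths listed above. The real binary is Rust and may compute these differently; `default_model_dir` is a hypothetical helper for illustration:

```python
import sys
from pathlib import Path

def default_model_dir(platform: str = sys.platform) -> Path:
    # macOS keeps per-user app data under ~/Library/Application Support;
    # Linux follows the XDG default of ~/.local/share
    if platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        base = Path.home() / ".local" / "share"
    return base / "fast-transcript" / "models" / "parakeet-tdt-0.6b-v3-int8"
```

Setting `FSCRIPT_MODEL_DIR` (or passing `--model-dir`) bypasses this resolution entirely.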
Benchmarks
These are local development benchmarks, not universal claims. They were run on the same Apple Silicon Mac used during development, using a Portuguese lecture clip and the same broader workflow comparison.
2-minute lecture clip
| Engine | Setup | Speed | Peak RSS | Notes |
|---|---|---|---|---|
| fast-transcript | Parakeet ONNX | 13.06x real-time | 2.25 GB | Best balance of speed and reliability |
| mlx-whisper | whisper-large-v3-turbo | 5.25x | 1.70 GB | Good quality, slower |
| parakeet-mlx | tuned for quality | 4.92x | 1.29 GB | Needed substantial tuning |
| parakeet-mlx | raw greedy | 10.16x | 0.57 GB | Faster on short audio, drifted into English on longer PT-BR |
| insanely-fast-whisper | whisper-large-v3 CPU | 0.30x | 6.18 GB | Accurate, but too slow here |
| insanely-fast-whisper | MPS + fallback | 0.31x | 3.04 GB | Small gain, same general problem |
Long lecture run
| Engine | Audio | Speed | Peak RSS | Notes |
|---|---|---|---|---|
| fast-transcript | 29m47s lecture | 13.38x real-time | 2.51 GB | Stable long run with default chunking |
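The speed column is a real-time factor: audio duration divided by wall-clock transcription time. The long-run number checks out against the raw figures within rounding:

```python
# Long-run figures from the benchmark above
audio_seconds = 29 * 60 + 47   # 29m47s of lecture audio
wall_seconds = 2 * 60 + 14     # ~2m14s wall-clock time (rounded to the second)
speedup = audio_seconds / wall_seconds
assert 13.3 < speedup < 13.4   # consistent with the reported ~13.38x
```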
Practical reading
- `fast-transcript` was not the absolute fastest thing we saw in every synthetic case
- it was the best result once long Portuguese lecture audio, transcript quality, and unattended runs all mattered at the same time
- that is the target workload for this repo
Output format
The output is JSON and includes:
- merged transcript text
- model path
- original input path
- prepared WAV path
- whether a remote URL used manual subtitles or the local model
- whether `ffmpeg` normalization was used
- load time
- transcribe time
- chunk configuration
- per-chunk timing
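The list above maps onto a JSON document roughly like the following. Every field name here is an assumption made for illustration, not the tool's exact schema; check a real `<audio>.transcript.json` for the authoritative keys:

```python
import json

# Illustrative shape only: field names are guesses based on the list above.
example = {
    "text": "merged transcript text ...",
    "model_dir": "/path/to/parakeet-tdt-0.6b-v3-int8",
    "input": "lecture.mp3",
    "prepared_wav": "lecture.16k.wav",
    "remote_source": "local-model",   # or manual subtitles for remote URLs
    "ffmpeg_normalized": True,
    "load_seconds": 1.2,
    "transcribe_seconds": 134.0,
    "chunk_seconds": 120,
    "chunk_overlap_seconds": 2,
    "chunks": [{"start": 0.0, "end": 120.0, "seconds": 9.1}],
}
print(json.dumps(example, indent=2))
```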
Motivation
This project is optimized for large lectures and classes, including files in the 30-minute to 2-hour range, where:
- startup friction matters
- background CPU usage matters
- memory spikes matter
- brittle hand-tuned command lines become a tax
The design goal is not “highest benchmark on a cherry-picked GPU server”. The goal is “transcribe big local lecture audio fast enough that you actually keep using it”.
Inspiration
This project was heavily informed by Handy and GLaDOS. In particular, the ONNX Parakeet path here was shaped by the packaging and implementation ideas used in those two projects.
Default model bundle
The default auto-download bundle is published in our own Hugging Face model repository (`brenorb/parakeet-tdt-0.6b-v3-int8-onnx-bundle`). This keeps the default install path tied to the exact validated tarball instead of an app-specific blob host.
License
MIT
Project details
File details
Details for the file fscript-0.2.8-py3-none-macosx_10_13_universal2.whl.
File metadata
- Download URL: fscript-0.2.8-py3-none-macosx_10_13_universal2.whl
- Upload date:
- Size: 9.7 MB
- Tags: Python 3, macOS 10.13+ universal2 (ARM64, x86-64)
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8fe421e8df26f8ddf0b01b72784763b7541fed72a062ac007a98d936a6d97e34` |
| MD5 | `fbe354dc295e1da6f1c8760f7cf72fa6` |
| BLAKE2b-256 | `fc2776cbf8bd201c9a450d4fe4acaf896c5abc5e7fcd7e40c3b0db0a86f275ee` |
Provenance
The following attestation bundles were made for fscript-0.2.8-py3-none-macosx_10_13_universal2.whl:

Publisher: release.yml on brenorb/fast-transcript

- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fscript-0.2.8-py3-none-macosx_10_13_universal2.whl
  - Subject digest: 8fe421e8df26f8ddf0b01b72784763b7541fed72a062ac007a98d936a6d97e34
- Sigstore transparency entry: 1440048972
- Sigstore integration time:
- Permalink: brenorb/fast-transcript@9af884ce91c86dd429d82f478c19925d5dc1453f
- Branch / Tag: refs/tags/v0.2.8
- Owner: https://github.com/brenorb
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9af884ce91c86dd429d82f478c19925d5dc1453f
- Trigger Event: push