
Fast local transcription for large lectures with NVIDIA Parakeet ONNX


fast-transcript

fast-transcript is a local lecture transcription CLI built to beat the usual Apple Silicon tradeoff: either fast but flaky, or accurate but painfully slow.

On the development machine, this project handled roughly 30 minutes of lecture audio in about 2 minutes* while staying around 2.51 GB peak RSS on the long run. On the same local test set, it beat mlx-whisper, insanely-fast-whisper, and parakeet-mlx.

* Benchmark run on a MacBook Pro M1. The exact long-run measurement was 29m47s of Portuguese lecture audio transcribed in about 2m14s (13.38x real-time).
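
For context, the "x real-time" figures here are audio duration divided by wall-clock transcription time: 29m47s is 1787 seconds, so 13.38x real-time corresponds to roughly 134 seconds (about 2m14s) of processing.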

The CLI binary is called fscript:

fscript lecture.mp3

That is the whole point of this project. One command. Large audio. No babysitting.

Why this exists

I wanted a tool for transcribing long classes and lectures quickly on a laptop while still using the computer for normal work.

The existing options I tested had clear problems for this use case:

  • insanely-fast-whisper was far too slow on this Mac once it fell back to CPU
  • mlx-whisper was solid, but slower than I wanted for long lecture workflows
  • parakeet-mlx had excellent memory numbers, but drifted into English on longer Portuguese segments unless heavily tuned

fast-transcript packages the ONNX Parakeet path that held up best in practice.

What it does

  • downloads the default Parakeet TDT 0.6B v3 int8 model automatically if it is missing
  • stores the extracted model in a persistent per-user application data directory
  • keeps the downloaded tarball in the user cache directory
  • accepts mp3, wav, and other audio formats supported by ffmpeg
  • accepts remote http(s) video/audio URLs supported by yt-dlp
  • prefers platform-provided manual subtitles for remote URLs when available
  • falls back to downloading remote audio and transcribing locally when only auto-captions exist or no captions exist
  • auto-converts unsupported audio to 16 kHz mono PCM16 WAV
  • uses 120s chunks with 2s overlap by default (see the chunking sketch after this list)
  • writes <audio>.transcript.json next to the input unless you choose a different output path
  • stays quiet by default: concise progress in the terminal, transcript JSON on disk
  • shows a spinner and chunk progress bar on interactive terminals
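
As a rough illustration of the default chunking strategy (120 s windows, 2 s overlap), the sketch below shows one way the chunk boundaries could be computed. It is not the actual implementation; the function and variable names are made up for this example.

// Illustrative sketch only: fixed-size chunks with a small overlap,
// matching the default 120 s / 2 s configuration described above.
fn chunk_bounds(total_secs: f64, chunk_secs: f64, overlap_secs: f64) -> Vec<(f64, f64)> {
    if chunk_secs <= 0.0 {
        // --chunk-seconds 0 presumably disables chunking: one chunk for the whole file.
        return vec![(0.0, total_secs)];
    }
    // With the defaults, each new chunk starts 118 s after the previous one.
    let step = (chunk_secs - overlap_secs).max(1.0);
    let mut bounds = Vec::new();
    let mut start = 0.0;
    loop {
        let end = (start + chunk_secs).min(total_secs);
        bounds.push((start, end));
        if end >= total_secs {
            break;
        }
        start += step;
    }
    bounds
}

fn main() {
    // A 29m47s lecture (1787 s) with the defaults yields 16 overlapping chunks.
    let bounds = chunk_bounds(1787.0, 120.0, 2.0);
    println!("{} chunks, last = {:?}", bounds.len(), bounds.last().unwrap());
}

The overlap presumably exists so that speech falling on a chunk boundary is fully covered by at least one chunk before the per-chunk transcripts are merged.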

Install

Requirements

  • ffmpeg
  • ffprobe
  • yt-dlp for remote URLs (installed directly, or available via uvx yt-dlp)

Install with Homebrew

brew install brenorb/fast-transcript/fast-transcript

On Apple Silicon macOS, Homebrew now installs fast-transcript from a proper bottle. On Linux x86_64, the formula still installs from the published release binary.

PyPI / uv

The PyPI package name for this project is fscript, so the target UX is:

uvx fscript lecture.mp3
uv tool install fscript

The repo already includes platform wheel builds for:

  • macOS arm64
  • Linux x86_64

PyPI publishing is currently enabled for:

  • macOS arm64

See docs/pypi-publishing.md for the release workflow details.

Install a prebuilt binary directly

Download the archive for your platform from the GitHub Releases page, then put fscript on your PATH.

Build from source

cargo install --git https://github.com/brenorb/fast-transcript

Or from a local clone:

cargo install --path .

Quick start

fscript lecture.mp3
fscript https://www.youtube.com/watch?v=QSdh8Gj0mEg

This will:

  1. ensure the default model exists
  2. normalize the audio if needed (sketched after this list)
  3. transcribe with the default chunking strategy
  4. write lecture.transcript.json
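
Step 2 is the 16 kHz mono PCM16 WAV conversion mentioned under "What it does". A minimal sketch of what that normalization amounts to, shelling out to ffmpeg, is shown below; the exact arguments fscript uses may differ, and the function name is made up.

use std::process::Command;

// Illustrative sketch only: convert any ffmpeg-readable input into the
// 16 kHz mono PCM16 WAV format the transcription model consumes.
fn normalize_to_wav(input: &str, output: &str) -> std::io::Result<()> {
    let status = Command::new("ffmpeg")
        .args([
            "-y",                // overwrite an existing output file
            "-i", input,         // any container/codec ffmpeg understands
            "-ac", "1",          // downmix to mono
            "-ar", "16000",      // resample to 16 kHz
            "-c:a", "pcm_s16le", // 16-bit signed little-endian PCM
            output,
        ])
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "ffmpeg exited with an error",
        ));
    }
    Ok(())
}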

For remote URLs, the default flow is (a rough sketch follows the list):

  1. inspect the URL with yt-dlp
  2. use manual subtitles directly when the platform provides them
  3. otherwise download the remote audio and run the normal local transcription pipeline
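
The sketch below illustrates the manual-subtitle check in that flow. It assumes yt-dlp's JSON metadata output (yt-dlp -J), where manually authored subtitles appear under "subtitles" and auto-generated ones under "automatic_captions", plus the serde_json crate for parsing; the real logic in fscript may differ, and the function name is made up.

use std::process::Command;

// Illustrative sketch only: ask yt-dlp for the video's metadata and check
// whether the platform provides manually authored subtitles.
fn remote_has_manual_subtitles(url: &str) -> Result<bool, Box<dyn std::error::Error>> {
    let output = Command::new("yt-dlp").args(["-J", url]).output()?;
    if !output.status.success() {
        return Err("yt-dlp failed to inspect the URL".into());
    }
    let info: serde_json::Value = serde_json::from_slice(&output.stdout)?;
    // "subtitles" holds manual tracks; "automatic_captions" holds auto-generated ones.
    let has_manual = info
        .get("subtitles")
        .and_then(|s| s.as_object())
        .map(|tracks| !tracks.is_empty())
        .unwrap_or(false);
    Ok(has_manual)
}

When no manual subtitles are found, downloading the audio (the equivalent of yt-dlp -x) and feeding it to the normal local pipeline is the fallback, matching step 3.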

Usage

fscript <audio-or-url> [output.json]
fscript <audio-or-url> --stdout
fscript <audio-or-url> -
fscript --version

Optional overrides:

fscript lecture.wav custom-output.json
fscript lecture.wav --stdout
fscript lecture.wav --chunk-seconds 180 --chunk-overlap-seconds 3
fscript lecture.wav --chunk-seconds 0
fscript lecture.wav --model-dir ./models/parakeet/custom-copy
fscript lecture.wav --model-package ./models/parakeet-v3-int8.tar.gz
fscript lecture.wav --model-url https://example.com/parakeet-v3-int8.tar.gz
fscript https://www.youtube.com/watch?v=QSdh8Gj0mEg
fscript https://www.youtube.com/watch?v=QSdh8Gj0mEg --prefer-local-for-remote

Environment overrides:

  • FSCRIPT_MODEL_DIR
  • FSCRIPT_MODEL_PACKAGE
  • FSCRIPT_MODEL_URL

Defaults

  • model dir:
    • macOS: ~/Library/Application Support/fast-transcript/models/parakeet-tdt-0.6b-v3-int8
    • Linux: ~/.local/share/fast-transcript/models/parakeet-tdt-0.6b-v3-int8
  • model package cache:
    • macOS: ~/Library/Caches/fast-transcript/parakeet-v3-int8.tar.gz
    • Linux: ~/.cache/fast-transcript/parakeet-v3-int8.tar.gz
  • model URL: https://huggingface.co/brenorb/parakeet-tdt-0.6b-v3-int8-onnx-bundle/resolve/main/parakeet-v3-int8.tar.gz?download=1
  • chunk seconds: 120
  • chunk overlap seconds: 2
  • output path: <audio>.transcript.json
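
The per-platform defaults above are the standard user data and cache directories. The sketch below shows how such paths can be resolved, assuming the dirs crate and assuming that FSCRIPT_MODEL_DIR simply takes precedence over the platform default; the actual resolution logic may differ.

use std::path::PathBuf;

// Illustrative sketch only. dirs::data_dir() is ~/Library/Application Support
// on macOS and ~/.local/share on Linux; dirs::cache_dir() is ~/Library/Caches
// and ~/.cache respectively, matching the defaults listed above.
fn default_model_dir() -> Option<PathBuf> {
    // Assumption: an explicit FSCRIPT_MODEL_DIR wins over the platform default.
    if let Ok(dir) = std::env::var("FSCRIPT_MODEL_DIR") {
        return Some(PathBuf::from(dir));
    }
    dirs::data_dir().map(|d| {
        d.join("fast-transcript")
            .join("models")
            .join("parakeet-tdt-0.6b-v3-int8")
    })
}

fn default_package_cache() -> Option<PathBuf> {
    dirs::cache_dir().map(|d| d.join("fast-transcript").join("parakeet-v3-int8.tar.gz"))
}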

Benchmarks

These are local development benchmarks, not universal claims. They were run on the same Apple Silicon Mac used during development, using a Portuguese lecture clip as part of the same broader workflow comparison.

2-minute lecture clip

Engine | Setup | Speed (x real-time) | Peak RSS | Notes
fast-transcript | Parakeet ONNX | 13.06x | 2.25 GB | Best balance of speed and reliability
mlx-whisper | whisper-large-v3-turbo | 5.25x | 1.70 GB | Good quality, slower
parakeet-mlx | tuned for quality | 4.92x | 1.29 GB | Needed substantial tuning
parakeet-mlx | raw greedy | 10.16x | 0.57 GB | Faster on short audio, drifted into English on longer PT-BR
insanely-fast-whisper | whisper-large-v3 (CPU) | 0.30x | 6.18 GB | Accurate, but too slow here
insanely-fast-whisper | MPS + fallback | 0.31x | 3.04 GB | Small gain, same general problem

Long lecture run

Engine | Audio | Speed (x real-time) | Peak RSS | Notes
fast-transcript | 29m47s lecture | 13.38x | 2.51 GB | Stable long run with default chunking

Practical reading

  • fast-transcript was not the absolute fastest thing we saw in every synthetic case
  • it was the best result once long Portuguese lecture audio, transcript quality, and unattended runs all mattered at the same time
  • that is the target workload for this repo

Output format

The output is JSON and includes (see the illustrative sketch after this list):

  • merged transcript text
  • model path
  • original input path
  • prepared WAV path
  • whether a remote URL used manual subtitles or the local model
  • whether ffmpeg normalization was used
  • load time
  • transcribe time
  • chunk configuration
  • per-chunk timing
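
For a rough sense of the shape, a hypothetical serde definition covering those fields is sketched below. The field names are invented for this example and are not the actual schema; inspect a generated .transcript.json for the real keys.

use serde::Serialize;

// Hypothetical shape only; the real field names may differ.
#[derive(Serialize)]
struct TranscriptOutput {
    text: String,                // merged transcript text
    model_path: String,          // model directory used for inference
    input_path: String,          // original input (file path or URL)
    prepared_wav_path: String,   // normalized 16 kHz mono WAV
    used_remote_subtitles: bool, // manual platform subtitles vs. local model
    ffmpeg_normalized: bool,     // whether ffmpeg normalization ran
    load_seconds: f64,           // model load time
    transcribe_seconds: f64,     // total transcription time
    chunk_seconds: f64,          // chunk configuration
    chunk_overlap_seconds: f64,
    chunks: Vec<ChunkTiming>,    // per-chunk timing
}

#[derive(Serialize)]
struct ChunkTiming {
    start_seconds: f64,
    end_seconds: f64,
    transcribe_seconds: f64,
}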

Motivation

This project is optimized for large lectures and classes, including files in the 30-minute to 2-hour range, where:

  • startup friction matters
  • background CPU usage matters
  • memory spikes matter
  • brittle hand-tuned command lines become a tax

The design goal is not “highest benchmark on a cherry-picked GPU server”. The goal is “transcribe big local lecture audio fast enough that you actually keep using it”.

Inspiration

This project was heavily informed by prior open-source work. In particular, the ONNX Parakeet path here was shaped by the packaging and implementation ideas used in Handy and GLaDOS.

Default model bundle

The default auto-download bundle is published in our own Hugging Face model repository, brenorb/parakeet-tdt-0.6b-v3-int8-onnx-bundle (the same repository the default model URL above points to).

This keeps the default install path tied to the exact validated tarball instead of an app-specific blob host.

License

MIT



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


fscript-0.2.8-py3-none-macosx_10_13_universal2.whl (9.7 MB)

Uploaded for Python 3, macOS 10.13+ universal2 (ARM64, x86-64)

File details

Hashes for fscript-0.2.8-py3-none-macosx_10_13_universal2.whl
Algorithm | Hash digest
SHA256 | 8fe421e8df26f8ddf0b01b72784763b7541fed72a062ac007a98d936a6d97e34
MD5 | fbe354dc295e1da6f1c8760f7cf72fa6
BLAKE2b-256 | fc2776cbf8bd201c9a450d4fe4acaf896c5abc5e7fcd7e40c3b0db0a86f275ee


Provenance

The following attestation bundles were made for fscript-0.2.8-py3-none-macosx_10_13_universal2.whl:

Publisher: release.yml on brenorb/fast-transcript

