# localcaption

Paste a YouTube URL, get a transcript. A fully-local YouTube → transcript pipeline built on yt-dlp, ffmpeg, and whisper.cpp. No API keys.
localcaption is a tiny orchestrator over three battle-tested tools:
| Stage | Tool |
|---|---|
| Download bestaudio | yt-dlp |
| Re-encode to 16 kHz mono WAV | ffmpeg |
| Transcribe locally | whisper.cpp |
Nothing is uploaded to a third-party service. No OpenAI / Google / DeepL keys required. Runs happily on a laptop.
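The three stages map to three subprocess invocations. Below is a minimal sketch of the command lines involved, as a hypothetical helper rather than localcaption's actual code; the exact flags and file names are assumptions based on each tool's standard CLI:

```python
from pathlib import Path

def build_commands(url: str, work: Path, whisper_dir: Path, model: str = "base.en"):
    """Return the argv lists for the three pipeline stages."""
    audio = work / "audio.m4a"   # container depends on what bestaudio yields
    wav = work / "audio.wav"
    # Stage 1: grab the best audio-only stream
    download = ["yt-dlp", "-f", "bestaudio", "-o", str(audio), url]
    # Stage 2: whisper.cpp expects 16 kHz mono PCM WAV
    reencode = ["ffmpeg", "-y", "-i", str(audio), "-ar", "16000", "-ac", "1", str(wav)]
    # Stage 3: transcribe with a local ggml model
    transcribe = [
        str(whisper_dir / "build" / "bin" / "whisper-cli"),
        "-m", str(whisper_dir / "models" / f"ggml-{model}.bin"),
        "-f", str(wav),
    ]
    return download, reencode, transcribe
```

Each list would then be fed to `subprocess.run(..., check=True)` in order.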
## Install

### Prerequisites
- Python 3.10+
- `git`, `ffmpeg`, `cmake` on your `$PATH` (macOS: `brew install ffmpeg cmake`)
### Quick install (recommended for end users)
One command. Installs localcaption system-wide via pipx
and bootstraps whisper.cpp + a default model. After this you can run
localcaption <url> from any directory.
```bash
curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/localcaption/main/scripts/install.sh | bash
```
What it does:

- Verifies prerequisites (`python3`, `git`, `ffmpeg`, `cmake`) and installs `pipx` + `cmake` if missing (via `brew` or `apt`).
- `pipx install localcaption` — isolated venv, console script on `$PATH`.
- Clones & builds `whisper.cpp` into `~/.local/share/localcaption/whisper.cpp/` (XDG-compliant).
- Downloads the default `base.en` ggml model.

Override the default model with `WHISPER_MODEL=small.en bash install.sh`.
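For reference, whisper.cpp's own model-download script fetches ggml models from Hugging Face. The URL pattern below is an assumption based on that upstream convention, shown as a tiny hypothetical helper; verify against the current whisper.cpp repo before relying on it:

```python
def model_url(model: str = "base.en") -> str:
    # Assumed pattern used by whisper.cpp's download-ggml-model.sh
    # (ggerganov/whisper.cpp on Hugging Face) -- an assumption, not localcaption code.
    return ("https://huggingface.co/ggerganov/whisper.cpp"
            f"/resolve/main/ggml-{model}.bin")
```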
After install, verify everything is wired up:

```bash
localcaption doctor
```
Sample output:

```text
localcaption 0.1.0

System tools:
  ✅ python (3.12.3)
  ✅ ffmpeg (/opt/homebrew/bin/ffmpeg)
  ✅ git (/opt/homebrew/bin/git)

Python dependencies:
  ✅ yt-dlp (2025.10.14)

whisper.cpp:
  searching: /Users/you/.local/share/localcaption/whisper.cpp
  ✅ directory exists
  ✅ binary built (.../build/bin/whisper-cli)
  ✅ models present (ggml-base.en.bin)

All checks passed. You're good to go: localcaption <url>
```
### Dev install (contributors)
If you're hacking on localcaption itself, install editable from a clone:
```bash
git clone https://github.com/jatinkrmalik/localcaption
cd localcaption
./scripts/setup.sh          # creates .venv, pip install -e .[dev], clones+builds whisper.cpp HERE
source .venv/bin/activate
pytest                      # 14 tests, all should pass
```
The dev setup keeps whisper.cpp/ inside the repo (so you can poke at it),
and editable-installs the package so source edits take effect immediately.
## Usage

### CLI
```bash
localcaption "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
| flag | default | what it does |
|---|---|---|
| `-m, --model` | `base.en` | whisper model name (`tiny.en`, `base.en`, `small.en`, `medium.en`, `large-v3`, …) |
| `-o, --out` | `./transcripts` | output directory |
| `-l, --language` | `auto` | ISO language code, or `auto` to let whisper detect it |
| `--whisper-dir` | auto-detect¹ | path to a built whisper.cpp checkout |
| `--keep-audio` | off | keep the downloaded audio + intermediate WAV in `<out>/.work/` |
| `--no-print` | off | don't echo the transcript to stdout |

¹ `--whisper-dir` resolution order:

1. The explicit flag value, if given.
2. `$LOCALCAPTION_WHISPER_DIR` env var.
3. `./whisper.cpp` (dev checkout).
4. `~/.local/share/localcaption/whisper.cpp` (where `install.sh` puts it).
Outputs `<videoId>.txt`, `.srt`, `.vtt`, and `.json` in the chosen directory.

You can also invoke it as a module: `python -m localcaption <url>`.
### Subcommands

| Subcommand | What it does |
|---|---|
| (default) `localcaption <url>` | Transcribe a single URL. |
| `localcaption doctor` | Diagnose your install: prereqs, whisper.cpp, available models. Useful before filing a bug. |
### Python API

```python
from pathlib import Path

from localcaption.pipeline import transcribe_url

result = transcribe_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    out_dir=Path("transcripts"),
    whisper_dir=Path("whisper.cpp"),
    model="base.en",
)
print(result.transcripts.txt.read_text())
```
## Architecture
localcaption is intentionally tiny: an orchestrator (pipeline.py) drives
three single-responsibility stages, each wrapping one external tool. The
modules are split this way so that a contributor can swap, say, whisper.cpp
for faster-whisper without touching download.py or audio.py.
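That swap boils down to agreeing on one interface. Here is a sketch of what a pluggable backend contract could look like; `Transcriber` and `EchoBackend` are hypothetical names and not part of localcaption today (roadmap item #6 tracks the real design):

```python
from pathlib import Path
from typing import Protocol, runtime_checkable

@runtime_checkable
class Transcriber(Protocol):
    """Anything that turns a 16 kHz mono WAV into transcript text."""
    def transcribe(self, wav: Path, model: str) -> str: ...

class EchoBackend:
    """Toy stand-in; a real backend would shell out to whisper-cli
    or call faster-whisper's Python API instead."""
    def transcribe(self, wav: Path, model: str) -> str:
        return f"[{model}] transcript of {wav.name}"
```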
### Module map
| Layer | Files | Responsibility |
|---|---|---|
| Entry points | `cli.py`, `__main__.py` | argparse, exit codes, stdout formatting |
| Orchestration | `pipeline.py` | public Python API: `transcribe_url(...)` |
| Pipeline stages | `download.py`, `audio.py`, `whisper.py` | one external tool each |
| Support | `errors.py`, `_logging.py` | exception hierarchy, tiny logger |
### Runtime sequence
End-to-end call flow for a single localcaption <url> invocation, including
the subprocess hops to yt-dlp, ffmpeg, and whisper.cpp. The intermediate
.work/ directory is cleaned up at the end unless --keep-audio is passed.
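That cleanup behaviour can be sketched as a context manager. This is a hypothetical helper illustrating the documented lifecycle, not localcaption's actual cleanup code:

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def workdir(out_dir: Path, keep_audio: bool = False):
    """Create <out>/.work/ for intermediates; remove it on exit
    unless keep_audio (i.e. --keep-audio) is set."""
    work = out_dir / ".work"
    work.mkdir(parents=True, exist_ok=True)
    try:
        yield work
    finally:
        if not keep_audio:
            shutil.rmtree(work, ignore_errors=True)
```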
Diagrams live in `docs/diagrams/` as Mermaid `.mmd` source files alongside the rendered PNGs. Regenerate with:

```bash
mmdc -i docs/diagrams/<name>.mmd -o docs/diagrams/<name>.png \
  -t default -b transparent --width 1600 --scale 2
```
## Benchmarks
Wall-clock times for the complete pipeline (yt-dlp download → ffmpeg
re-encode → whisper.cpp transcription), measured with the default base.en
model. Numbers will vary with your network speed and CPU/GPU; treat them as
order-of-magnitude reference, not a competitive benchmark.
| Video | Length | Wall-clock | Speed vs. realtime | Hardware |
|---|---|---|---|---|
| TED-Ed — How does your immune system work? | 5:23 | 7.5 s | ~43× | MacBook Pro M4 Pro, 48 GB |
| 3Blue1Brown — But what is a Neural Network? | 18:40 | 19.3 s | ~58× | MacBook Pro M4 Pro, 48 GB |
| Hasan Minhaj × Neil deGrasse Tyson — Why AI is Overrated | 54:17 | 49.8 s | ~65× | MacBook Pro M4 Pro, 48 GB |
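The "Speed vs. realtime" column is just video length divided by wall-clock time; a quick check of the numbers above:

```python
def realtime_speedup(length: str, wall_seconds: float) -> float:
    """Speed vs. realtime = video duration (m:ss) / pipeline wall-clock seconds."""
    minutes, seconds = (int(part) for part in length.split(":"))
    return (minutes * 60 + seconds) / wall_seconds

# e.g. the TED-Ed row: 5:23 of video transcribed in 7.5 s
# realtime_speedup("5:23", 7.5) → ≈43
```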
### Reproduce
```bash
# Apple Silicon, macOS, whisper.cpp built with Metal,
# model: ggml-base.en, language: auto, no other heavy processes.
time localcaption --no-print -o /tmp/lc-bench-1 \
  "https://www.youtube.com/watch?v=PSRJfaAYkW4"
time localcaption --no-print -o /tmp/lc-bench-2 \
  "https://www.youtube.com/watch?v=aircAruvnKk"
time localcaption --no-print -o /tmp/lc-bench-3 \
  "https://www.youtube.com/watch?v=BYizgB2FcAQ"
```
If you'd like to contribute numbers from a different machine (Linux + CUDA, Windows + WSL, x86 macOS, etc.), open a PR adding a row above with your hardware in the Hardware column.
## Notes

- Bigger models = better quality but slower. `base.en` is a good default; try `small.en` if you have the patience and `tiny.en` for instant results.
- Apple Silicon: whisper.cpp's CMake build uses Metal automatically — you'll see `ggml_metal_init` in the logs.
- The pipeline accepts any URL `yt-dlp` supports (Vimeo, Twitch VODs, podcast pages, etc.), not just YouTube.
- If you hit `HTTP 403 Forbidden`, your `yt-dlp` is probably stale — `pip install -U yt-dlp` usually fixes it.
## Roadmap
The roadmap lives on GitHub Issues so it's easy to track, comment on, and contribute to:
A snapshot of what's planned (click through for full descriptions, acceptance criteria, and discussion):
| # | Item | Labels |
|---|---|---|
| #1 | Switch default model from `base.en` to `small.en` | good first issue |
| #2 | Batch mode (`--batch urls.txt`) | enhancement |
| #3 | Local auto-summary via Ollama (`--summary`) | enhancement |
| #4 | Speaker diarization with pyannote.audio (`--diarize`) | stretch, help wanted |
| #5 | YouTube chapters & grep-able search index | enhancement |
| #6 | Pluggable transcription backends (faster-whisper / MLX) | help wanted |
Have an idea? Open a feature request — or jump into Discussions if you want to chat about it first.
## Related projects
localcaption deliberately stays tiny. If you want more, check out:
- `whishper` — full web UI for local transcription with translation and editing.
- `transcribe-anything` — multi-backend, Mac-arm optimised, supports URLs.
- `WhisperX` — word-level timestamps and diarisation on top of openai-whisper.
## Contributing
Pull requests welcome — see CONTRIBUTING.md. By participating you agree to abide by our Code of Conduct.
## License
MIT.