
localcaption

Paste a YouTube URL, get a transcript. Fully local, no API keys.


localcaption is a tiny orchestrator over three battle-tested tools:

| Stage | Tool |
| --- | --- |
| Download bestaudio | yt-dlp |
| Re-encode to 16 kHz mono WAV | ffmpeg |
| Transcribe locally | whisper.cpp |

Nothing is uploaded to a third-party service. No OpenAI / Google / DeepL keys required. Runs happily on a laptop.
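The three stages can be approximated by hand with the same tools. A sketch of the equivalent manual commands (illustrative only; the exact flags localcaption passes internally may differ):

```shell
# 1. Download the best available audio stream and extract it to m4a (yt-dlp)
yt-dlp -f bestaudio -x --audio-format m4a -o "audio.%(ext)s" \
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# 2. Re-encode to 16 kHz mono 16-bit PCM WAV, the input whisper.cpp expects (ffmpeg)
ffmpeg -i audio.m4a -ar 16000 -ac 1 -c:a pcm_s16le audio.wav

# 3. Transcribe locally with whisper.cpp's CLI
./whisper.cpp/build/bin/whisper-cli -m ./whisper.cpp/models/ggml-base.en.bin \
  -f audio.wav -otxt
```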

Pipeline overview

Install

Prerequisites

  • Python 3.10+
  • git, ffmpeg, cmake on your $PATH (macOS: brew install ffmpeg cmake)

Quick install (recommended for end users)

One command. Installs localcaption into an isolated per-user environment via pipx and bootstraps whisper.cpp plus a default model. After this you can run localcaption <url> from any directory.

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/localcaption/main/scripts/install.sh | bash

What it does:

  1. Verifies prerequisites (python3, git, ffmpeg, cmake) and installs pipx + cmake if missing (via brew or apt).
  2. pipx install localcaption — isolated venv, console script on $PATH.
  3. Clones & builds whisper.cpp into ~/.local/share/localcaption/whisper.cpp/ (XDG-compliant).
  4. Downloads the default base.en ggml model.
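If you prefer not to pipe a script into bash, a hand-rolled equivalent of those steps might look like this (a sketch, not the script itself; DEST mirrors the script's default location, and the clone URL and model-download script are whisper.cpp's own):

```shell
pipx install localcaption

# Clone and build whisper.cpp with CMake (picks up Metal automatically on Apple Silicon)
DEST="$HOME/.local/share/localcaption/whisper.cpp"
git clone https://github.com/ggerganov/whisper.cpp "$DEST"
cmake -B "$DEST/build" -S "$DEST"
cmake --build "$DEST/build" --config Release

# Fetch the default ggml model using whisper.cpp's bundled helper
bash "$DEST/models/download-ggml-model.sh" base.en
```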

Override the default model with WHISPER_MODEL=small.en bash install.sh.

After install, verify everything is wired up:

localcaption doctor

Sample output:

localcaption 0.1.0

System tools:
  ✅ python  (3.12.3)
  ✅ ffmpeg  (/opt/homebrew/bin/ffmpeg)
  ✅ git     (/opt/homebrew/bin/git)

Python dependencies:
  ✅ yt-dlp  (2025.10.14)

whisper.cpp:
  searching: /Users/you/.local/share/localcaption/whisper.cpp
  ✅ directory exists
  ✅ binary built  (.../build/bin/whisper-cli)
  ✅ models present  (ggml-base.en.bin)

All checks passed. You're good to go: localcaption <url>

Dev install (contributors)

If you're hacking on localcaption itself, install editable from a clone:

git clone https://github.com/jatinkrmalik/localcaption
cd localcaption
./scripts/setup.sh           # creates .venv, pip install -e .[dev], clones+builds whisper.cpp HERE
source .venv/bin/activate
pytest                        # 14 tests, all should pass

The dev setup keeps whisper.cpp/ inside the repo (so you can poke at it), and editable-installs the package so source edits take effect immediately.

Usage

CLI

localcaption "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

| flag | default | what it does |
| --- | --- | --- |
| -m, --model | base.en | whisper model name (tiny.en, base.en, small.en, medium.en, large-v3, …) |
| -o, --out | ./transcripts | output directory |
| -l, --language | auto | ISO language code, or auto to let whisper detect it |
| --whisper-dir | auto-detect¹ | path to a built whisper.cpp checkout |
| --keep-audio | off | keep the downloaded audio + intermediate WAV in <out>/.work/ |
| --no-print | off | don't echo the transcript to stdout |

¹ --whisper-dir resolution order:

  1. The explicit flag value, if given.
  2. $LOCALCAPTION_WHISPER_DIR env var.
  3. ./whisper.cpp (dev checkout).
  4. ~/.local/share/localcaption/whisper.cpp (where install.sh puts it).

Outputs <videoId>.txt, .srt, .vtt, and .json in the chosen directory.

You can also invoke it as a module: python -m localcaption <url>.

Subcommands

| Subcommand | What it does |
| --- | --- |
| (default) localcaption <url> | Transcribe a single URL. |
| localcaption doctor | Diagnose your install: prereqs, whisper.cpp, available models. Useful before filing a bug. |

Python API

from pathlib import Path
from localcaption.pipeline import transcribe_url

result = transcribe_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    out_dir=Path("transcripts"),
    whisper_dir=Path("whisper.cpp"),
    model="base.en",
)
print(result.transcripts.txt.read_text())

Architecture

localcaption is intentionally tiny: an orchestrator (pipeline.py) drives three single-responsibility stages, each wrapping one external tool. The modules are split this way so that a contributor can swap, say, whisper.cpp for faster-whisper without touching download.py or audio.py.

Module map

Module architecture

| Layer | Files | Responsibility |
| --- | --- | --- |
| Entry points | cli.py, __main__.py | argparse, exit codes, stdout formatting |
| Orchestration | pipeline.py | public Python API: transcribe_url(...) |
| Pipeline stages | download.py, audio.py, whisper.py | one external tool each |
| Support | errors.py, _logging.py | exception hierarchy, tiny logger |
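The swap-ability claim comes down to the orchestrator depending only on stage signatures, never on the tools behind them. A hedged sketch of that shape (run_pipeline and the signatures are illustrative, not the real pipeline.py API):

```python
from pathlib import Path
from typing import Callable


def run_pipeline(
    url: str,
    workdir: Path,
    download: Callable[[str, Path], Path],    # url, workdir -> downloaded audio path
    reencode: Callable[[Path], Path],         # audio path   -> 16 kHz mono WAV path
    transcribe: Callable[[Path], str],        # wav path     -> transcript text
) -> str:
    """Drive the three stages in order; each stage sees only its own input."""
    audio = download(url, workdir)
    wav = reencode(audio)
    return transcribe(wav)
```

Swapping whisper.cpp for faster-whisper would then mean passing a different transcribe callable; download and reencode stay untouched.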

Runtime sequence

End-to-end call flow for a single localcaption <url> invocation, including the subprocess hops to yt-dlp, ffmpeg, and whisper.cpp. The intermediate .work/ directory is cleaned up at the end unless --keep-audio is passed.

Sequence diagram

Diagrams live in docs/diagrams/ as Mermaid .mmd source files alongside the rendered PNGs. Regenerate with:

mmdc -i docs/diagrams/<name>.mmd -o docs/diagrams/<name>.png \
  -t default -b transparent --width 1600 --scale 2

Benchmarks

Wall-clock times for the complete pipeline (yt-dlp download → ffmpeg re-encode → whisper.cpp transcription), measured with the default base.en model. Numbers will vary with your network speed and CPU/GPU; treat them as order-of-magnitude reference, not a competitive benchmark.

| Video | Length | Wall-clock | Speed vs. realtime | Hardware |
| --- | --- | --- | --- | --- |
| TED-Ed — How does your immune system work? | 5:23 | 7.5 s | ~43× | MacBook Pro M4 Pro, 48 GB |
| 3Blue1Brown — But what is a Neural Network? | 18:40 | 19.3 s | ~58× | MacBook Pro M4 Pro, 48 GB |
| Hasan Minhaj × Neil deGrasse Tyson — Why AI is Overrated | 54:17 | 49.8 s | ~65× | MacBook Pro M4 Pro, 48 GB |

Reproduce:

# Apple Silicon, macOS, whisper.cpp built with Metal,
# model: ggml-base.en, language: auto, no other heavy processes.

time localcaption --no-print -o /tmp/lc-bench-1 \
  "https://www.youtube.com/watch?v=PSRJfaAYkW4"

time localcaption --no-print -o /tmp/lc-bench-2 \
  "https://www.youtube.com/watch?v=aircAruvnKk"

time localcaption --no-print -o /tmp/lc-bench-3 \
  "https://www.youtube.com/watch?v=BYizgB2FcAQ"

If you'd like to contribute numbers from a different machine (Linux + CUDA, Windows + WSL, x86 macOS, etc.), open a PR adding a row above with your hardware in the Hardware column.

Notes

  • Bigger models = better quality but slower. base.en is a good default; try small.en if you have the patience and tiny.en for instant results.
  • Apple Silicon: whisper.cpp's CMake build uses Metal automatically — you'll see ggml_metal_init in the logs.
  • The pipeline accepts any URL yt-dlp supports (Vimeo, Twitch VODs, podcast pages, etc.), not just YouTube.
  • If you hit HTTP 403 Forbidden, your yt-dlp is probably stale — pip install -U yt-dlp usually fixes it.

Roadmap

The roadmap lives on GitHub Issues so it's easy to track, comment on, and contribute to:

👉 Open roadmap items

A snapshot of what's planned (click through for full descriptions, acceptance criteria, and discussion):

| # | Item | Labels |
| --- | --- | --- |
| #1 | Switch default model from base.en to small.en | good first issue |
| #2 | Batch mode (--batch urls.txt) | enhancement |
| #3 | Local auto-summary via Ollama (--summary) | enhancement |
| #4 | Speaker diarization with pyannote.audio (--diarize) | stretch, help wanted |
| #5 | YouTube chapters & grep-able search index | enhancement |
| #6 | Pluggable transcription backends (faster-whisper / MLX) | help wanted |

Have an idea? Open a feature request — or jump into Discussions if you want to chat about it first.

Related projects

localcaption deliberately stays tiny. If you want more, check out:

  • whishper — full web UI for local transcription with translation and editing.
  • transcribe-anything — multi-backend, optimized for Apple Silicon, supports URLs.
  • WhisperX — word-level timestamps and diarization on top of openai-whisper.

Contributing

Pull requests welcome — see CONTRIBUTING.md. By participating you agree to abide by our Code of Conduct.

License

MIT.
