
localcaption

Paste a YouTube URL, get a transcript. Fully local, no API keys.


localcaption is a tiny orchestrator over three battle-tested tools:

| Stage | Tool |
| --- | --- |
| Download bestaudio | yt-dlp |
| Re-encode to 16 kHz mono WAV | ffmpeg |
| Transcribe locally | whisper.cpp |

Nothing is uploaded to a third-party service. No OpenAI / Google / DeepL keys required. Runs happily on a laptop.
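The three stages can be approximated by hand with the same tools. A sketch of the equivalent manual commands (illustrative only; the exact flags localcaption passes internally may differ):

```shell
# 1. Download the best available audio stream and extract it to m4a (yt-dlp)
yt-dlp -f bestaudio -x --audio-format m4a -o "audio.%(ext)s" \
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# 2. Re-encode to 16 kHz mono 16-bit PCM WAV, the input whisper.cpp expects (ffmpeg)
ffmpeg -i audio.m4a -ar 16000 -ac 1 -c:a pcm_s16le audio.wav

# 3. Transcribe locally with whisper.cpp's CLI
./whisper.cpp/build/bin/whisper-cli -m ./whisper.cpp/models/ggml-base.en.bin \
  -f audio.wav -otxt
```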

Pipeline overview

Install

Prerequisites

  • Python 3.10+
  • git, ffmpeg, cmake on your $PATH (macOS: brew install ffmpeg cmake)

Quick install (recommended for end users)

One command. Installs localcaption into an isolated per-user environment via pipx and bootstraps whisper.cpp plus a default model. After this you can run localcaption <url> from any directory.

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/localcaption/main/scripts/install.sh | bash

What it does:

  1. Verifies prerequisites (python3, git, ffmpeg, cmake) and installs pipx + cmake if missing (via brew or apt).
  2. pipx install localcaption — isolated venv, console script on $PATH.
  3. Clones & builds whisper.cpp into ~/.local/share/localcaption/whisper.cpp/ (XDG-compliant).
  4. Downloads the default base.en ggml model.
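If you prefer not to pipe a script into bash, a hand-rolled equivalent of those steps might look like this (a sketch, not the script itself; DEST mirrors the script's default location, and the clone URL and model-download script are whisper.cpp's own):

```shell
pipx install localcaption

# Clone and build whisper.cpp with CMake (picks up Metal automatically on Apple Silicon)
DEST="$HOME/.local/share/localcaption/whisper.cpp"
git clone https://github.com/ggerganov/whisper.cpp "$DEST"
cmake -B "$DEST/build" -S "$DEST"
cmake --build "$DEST/build" --config Release

# Fetch the default ggml model using whisper.cpp's bundled helper
bash "$DEST/models/download-ggml-model.sh" base.en
```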

Override the default model with WHISPER_MODEL=small.en bash install.sh.

After install, verify everything is wired up:

localcaption doctor

Sample output:

localcaption 0.1.0

System tools:
  ✅ python  (3.12.3)
  ✅ ffmpeg  (/opt/homebrew/bin/ffmpeg)
  ✅ git     (/opt/homebrew/bin/git)

Python dependencies:
  ✅ yt-dlp  (2025.10.14)

whisper.cpp:
  searching: /Users/you/.local/share/localcaption/whisper.cpp
  ✅ directory exists
  ✅ binary built  (.../build/bin/whisper-cli)
  ✅ models present  (ggml-base.en.bin)

All checks passed. You're good to go: localcaption <url>

Dev install (contributors)

If you're hacking on localcaption itself, install editable from a clone:

git clone https://github.com/jatinkrmalik/localcaption
cd localcaption
./scripts/setup.sh           # creates .venv, pip install -e .[dev], clones+builds whisper.cpp HERE
source .venv/bin/activate
pytest                        # 14 tests, all should pass

The dev setup keeps whisper.cpp/ inside the repo (so you can poke at it), and editable-installs the package so source edits take effect immediately.

Usage

CLI

localcaption "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

| flag | default | what it does |
| --- | --- | --- |
| -m, --model | base.en | whisper model name (tiny.en, base.en, small.en, medium.en, large-v3, …) |
| -o, --out | ./transcripts | output directory |
| -l, --language | auto | ISO language code, or auto to let whisper detect it |
| --whisper-dir | auto-detect¹ | path to a built whisper.cpp checkout |
| --keep-audio | off | keep the downloaded audio + intermediate WAV in <out>/.work/ |
| --no-print | off | don't echo the transcript to stdout |

¹ --whisper-dir resolution order:

  1. The explicit flag value, if given.
  2. $LOCALCAPTION_WHISPER_DIR env var.
  3. ./whisper.cpp (dev checkout).
  4. ~/.local/share/localcaption/whisper.cpp (where install.sh puts it).

Outputs <videoId>.txt, .srt, .vtt, and .json in the chosen directory.

You can also invoke it as a module: python -m localcaption <url>.

Subcommands

| Subcommand | What it does |
| --- | --- |
| (default) localcaption <url> | Transcribe a single URL. |
| localcaption doctor | Diagnose your install: prereqs, whisper.cpp, available models. Useful before filing a bug. |

Python API

from pathlib import Path
from localcaption.pipeline import transcribe_url

result = transcribe_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    out_dir=Path("transcripts"),
    whisper_dir=Path("whisper.cpp"),
    model="base.en",
)
print(result.transcripts.txt.read_text())

Architecture

localcaption is intentionally tiny: an orchestrator (pipeline.py) drives three single-responsibility stages, each wrapping one external tool. The modules are split this way so that a contributor can swap, say, whisper.cpp for faster-whisper without touching download.py or audio.py.

Module map

Module architecture

| Layer | Files | Responsibility |
| --- | --- | --- |
| Entry points | cli.py, __main__.py | argparse, exit codes, stdout formatting |
| Orchestration | pipeline.py | public Python API: transcribe_url(...) |
| Pipeline stages | download.py, audio.py, whisper.py | one external tool each |
| Support | errors.py, _logging.py | exception hierarchy, tiny logger |
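The swap-ability claim comes down to the orchestrator depending only on stage signatures, never on the tools behind them. A hedged sketch of that shape (run_pipeline and the signatures are illustrative, not the real pipeline.py API):

```python
from pathlib import Path
from typing import Callable


def run_pipeline(
    url: str,
    workdir: Path,
    download: Callable[[str, Path], Path],    # url, workdir -> downloaded audio path
    reencode: Callable[[Path], Path],         # audio path   -> 16 kHz mono WAV path
    transcribe: Callable[[Path], str],        # wav path     -> transcript text
) -> str:
    """Drive the three stages in order; each stage sees only its own input."""
    audio = download(url, workdir)
    wav = reencode(audio)
    return transcribe(wav)
```

Swapping whisper.cpp for faster-whisper would then mean passing a different transcribe callable; download and reencode stay untouched.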

Runtime sequence

End-to-end call flow for a single localcaption <url> invocation, including the subprocess hops to yt-dlp, ffmpeg, and whisper.cpp. The intermediate .work/ directory is cleaned up at the end unless --keep-audio is passed.

Sequence diagram

Diagrams live in docs/diagrams/ as Mermaid .mmd source files alongside the rendered PNGs. Regenerate with:

mmdc -i docs/diagrams/<name>.mmd -o docs/diagrams/<name>.png \
  -t default -b transparent --width 1600 --scale 2

Benchmarks

Wall-clock times for the complete pipeline (yt-dlp download → ffmpeg re-encode → whisper.cpp transcription), measured with the default base.en model. Numbers will vary with your network speed and CPU/GPU; treat them as order-of-magnitude reference, not a competitive benchmark.

| Video | Length | Wall-clock | Speed vs. realtime | Hardware |
| --- | --- | --- | --- | --- |
| TED-Ed — How does your immune system work? | 5:23 | 7.5 s | ~43× | MacBook Pro M4 Pro, 48 GB |
| 3Blue1Brown — But what is a Neural Network? | 18:40 | 19.3 s | ~58× | MacBook Pro M4 Pro, 48 GB |
| Hasan Minhaj × Neil deGrasse Tyson — Why AI is Overrated | 54:17 | 49.8 s | ~65× | MacBook Pro M4 Pro, 48 GB |

Reproduce:

# Apple Silicon, macOS, whisper.cpp built with Metal,
# model: ggml-base.en, language: auto, no other heavy processes.

time localcaption --no-print -o /tmp/lc-bench-1 \
  "https://www.youtube.com/watch?v=PSRJfaAYkW4"

time localcaption --no-print -o /tmp/lc-bench-2 \
  "https://www.youtube.com/watch?v=aircAruvnKk"

time localcaption --no-print -o /tmp/lc-bench-3 \
  "https://www.youtube.com/watch?v=BYizgB2FcAQ"

If you'd like to contribute numbers from a different machine (Linux + CUDA, Windows + WSL, x86 macOS, etc.), open a PR adding a row above with your hardware in the Hardware column.

Notes

  • Bigger models = better quality but slower. base.en is a good default; try small.en if you have the patience and tiny.en for instant results.
  • Apple Silicon: whisper.cpp's CMake build uses Metal automatically — you'll see ggml_metal_init in the logs.
  • The pipeline accepts any URL yt-dlp supports (Vimeo, Twitch VODs, podcast pages, etc.), not just YouTube.
  • If you hit HTTP 403 Forbidden, your yt-dlp is probably stale — pip install -U yt-dlp usually fixes it.

Roadmap

The roadmap lives on GitHub Issues so it's easy to track, comment on, and contribute to:

👉 Open roadmap items

A snapshot of what's planned (click through for full descriptions, acceptance criteria, and discussion):

| # | Item | Labels |
| --- | --- | --- |
| #1 | Switch default model from base.en to small.en | good first issue |
| #2 | Batch mode (--batch urls.txt) | enhancement |
| #3 | Local auto-summary via Ollama (--summary) | enhancement |
| #4 | Speaker diarization with pyannote.audio (--diarize) | stretch, help wanted |
| #5 | YouTube chapters & grep-able search index | enhancement |
| #6 | Pluggable transcription backends (faster-whisper / MLX) | help wanted |

Have an idea? Open a feature request — or jump into Discussions if you want to chat about it first.

Related projects

localcaption deliberately stays tiny. If you want more, check out:

  • whishper — full web UI for local transcription with translation and editing.
  • transcribe-anything — multi-backend, optimized for Apple Silicon, supports URLs.
  • WhisperX — word-level timestamps and diarization on top of openai-whisper.

Contributing

Pull requests welcome — see CONTRIBUTING.md. By participating you agree to abide by our Code of Conduct.

License

MIT.
