Snap any video URL or local audio/video into a plaintext transcript. CPU-first, offline, single command.
yapsnap
Snap any video URL or audio file into plaintext. No GPU. No cloud. One command.
yapsnap "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
That's it. You get a .txt under ./transcripts/ in your current directory, transcribed on your CPU, in less time than the video takes to play.
Why yapsnap
- ⚡ Fast on CPU. Streaming Zipformer transducer (Kroko English) chews through audio at several times realtime on a laptop. No CUDA. No M-series-only tricks. Plain old cores.
- 🌐 Any video URL, plus local files. YouTube. X. TikTok. Instagram Reels. Direct .mp4/.mp3 links. Or just point it at a file on disk. yt-dlp handles the fetch, ffmpeg handles the decode, the rest is yours.
- 📴 Offline after first run. ~80 MB model downloads once to your cache and stays there. No API keys. No quotas. Your audio never leaves your machine.
- 🪶 One file, three deps. sherpa-onnx, numpy, yt-dlp. The whole tool is a single Python module.
- 🗣 Ten-plus languages. English out of the box; French, German, Spanish, Italian, Portuguese, Dutch, Swedish, Swiss German, Hebrew, and Turkish are a one-line --model swap away. See Other languages.
- ⏱ Sentence-level timestamps when you want them. --timestamps adds [MM:SS] per sentence using Kroko's built-in punctuation. Timing stays correct even when you transcribe at 2x.
Quickstart
# 1. ffmpeg on PATH (one-time, per OS — see below)
# 2. Install (from PyPI, or `pip install .` from a clone)
pip install yapsnap
# 3. Snap something
yapsnap https://www.tiktok.com/@user/video/7234567890123456789
yapsnap meeting.mp4 --timestamps
yapsnap podcast.mp3 -o ~/notes/episode.txt
The first run downloads the model (~80 MB). Every run after is offline.
What it handles
Any URL yt-dlp understands works. The big ones:
| Source | Example |
|---|---|
| YouTube | https://www.youtube.com/watch?v=... |
| YouTube Shorts | https://www.youtube.com/shorts/... |
| X / Twitter | https://x.com/user/status/.../video/1 |
| TikTok | https://www.tiktok.com/@user/video/... |
| Instagram Reels | https://www.instagram.com/reel/.../ |
| Direct media URL | https://example.com/clip.mp4 |
Plus any local file ffmpeg can decode: .mp3, .mp4, .m4a, .wav, .webm, .mov, .mkv, .aac, .opus, .ogg, .flac, and friends.
Install
1. ffmpeg
| OS | Command |
|---|---|
| macOS | brew install ffmpeg |
| Linux | sudo apt install ffmpeg or sudo dnf install ffmpeg |
| Windows | winget install ffmpeg or choco install ffmpeg |
2. yapsnap
From PyPI (recommended):
pip install yapsnap
From source:
git clone https://github.com/kouhxp/yapsnap
cd yapsnap
pip install .
Either route installs two equivalent commands on your PATH: yapsnap (canonical) and transcribe (alias, for when the name slips your mind).
Usage
# Local file
yapsnap path/to/audio.mp3
# Any video URL
yapsnap "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Sentence-level timestamps
yapsnap input.mp4 --timestamps
# Custom output path
yapsnap input.mp4 -o ./transcripts/talk.txt
# Don't speed audio up before transcribing (default is 1.5x, pitch preserved)
yapsnap input.mp4 --speed 1.0
# Keep the downloaded audio (URL inputs only)
yapsnap "https://..." --keep-audio
Output
Plaintext, UTF-8. Default location is ./transcripts/ (created if missing) under the current working directory; override with -o. For URL inputs the filename is derived from the video ID (dQw4w9WgXcQ_transcript.txt, etc.).
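The default naming rule above can be sketched in a few lines of Python. This is an illustration only, not yapsnap's actual code: the helper name default_output_path is hypothetical, and it assumes that <input> means the file stem for local files and the video ID for URLs.

```python
from pathlib import Path
from typing import Optional


def default_output_path(stem: str, cwd: Optional[Path] = None) -> Path:
    """Return ./transcripts/<stem>_transcript.txt under the working directory.

    Hypothetical helper: `stem` is assumed to be the local file's stem
    (e.g. "meeting" for meeting.mp4) or the video ID for URL inputs.
    """
    base = (cwd if cwd is not None else Path.cwd()) / "transcripts"
    return base / f"{stem}_transcript.txt"


# A caller would create the folder before writing, for example:
#   out = default_output_path(Path("meeting.mp4").stem)
#   out.parent.mkdir(parents=True, exist_ok=True)
```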
Without --timestamps — one paragraph of recognized text:
Welcome to the show. Today we're talking about transcription. Let's get started.
With --timestamps — one sentence per line, timed against the original audio:
[00:00] Welcome to the show.
[00:03] Today we're talking about transcription.
[00:08] Let's get started.
Timestamps stay in original-audio time even at --speed 1.5 or higher.
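One way to picture how the timing can stay correct: a moment at t seconds in audio sped up by a factor s corresponds to t × s seconds in the original. The sketch below groups tokens into sentences and rescales the start time of each. It is a guess at the general approach, not the project's implementation; the function names and the token format (text pieces that concatenate into the transcript) are assumptions.

```python
from typing import List, Sequence


def format_mmss(seconds: float) -> str:
    """Format a time in seconds as [MM:SS], truncating fractions."""
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"


def sentences_with_times(
    tokens: Sequence[str], times: Sequence[float], speed: float
) -> List[str]:
    """Group tokens into sentences ending in . ! or ? and timestamp each one.

    `times` are token start times measured in the sped-up audio, so each
    sentence start is multiplied by `speed` to recover original-audio time.
    """
    lines: List[str] = []
    current: List[str] = []
    start = None
    for tok, t in zip(tokens, times):
        if start is None:
            start = t  # first token of a new sentence
        current.append(tok)
        if tok.rstrip().endswith((".", "!", "?")):
            lines.append(f"{format_mmss(start * speed)} {''.join(current).strip()}")
            current, start = [], None
    if current:  # trailing text without closing punctuation
        lines.append(f"{format_mmss(start * speed)} {''.join(current).strip()}")
    return lines
```

With --speed 1.5, a sentence that starts 2.0 s into the sped-up audio is reported at 3.0 s, that is [00:03].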
Flags
| Flag | Description |
|---|---|
| -o, --output | Output .txt path. Default: ./transcripts/<input>_transcript.txt. |
| --timestamps | Emit one "[MM:SS] sentence." line per sentence instead of a single paragraph. |
| --speed | Pre-transcription speedup factor, pitch preserved. Default 1.5. |
| --keep-audio | Keep the downloaded audio (URL inputs only). |
| --model | Override the model directory. Also reads KROKO_MODEL env var. |
How it works
- Fetch. If the input is a URL, yt-dlp grabs the best audio-only stream to a temp directory. If it's a local path, this step is skipped.
- Decode. ffmpeg pipes the media into 16 kHz mono PCM. The optional atempo filter speeds it up without raising pitch.
- Recognize. A streaming Zipformer2 transducer (Kroko English, INT8 ONNX, ~80 MB) eats the PCM in chunks. CPU-only. Greedy decode.
- Format. Plain text by default. With --timestamps, token timestamps are grouped on . ! ? into sentences and scaled back to original-audio time.
No frame is sent anywhere. No state is kept between runs except the cached model.
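For a feel of the decode step, here is a minimal sketch of an ffmpeg command line that produces 16 kHz mono 16-bit PCM on stdout, with atempo applied only when a speedup is requested. The function name and the exact flag set are assumptions for illustration; yapsnap may invoke ffmpeg differently.

```python
from typing import List


def build_decode_command(src: str, speed: float = 1.5) -> List[str]:
    """Build an ffmpeg argv that decodes `src` to raw 16 kHz mono s16le PCM.

    Hypothetical helper. The flags are standard ffmpeg options:
    -vn drops video, -ac 1 downmixes to mono, -ar 16000 resamples,
    and atempo changes tempo without changing pitch.
    """
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "error", "-i", src,
           "-vn", "-ac", "1", "-ar", "16000"]
    if speed != 1.0:
        cmd += ["-filter:a", f"atempo={speed}"]
    cmd += ["-f", "s16le", "pipe:1"]  # raw PCM to stdout
    return cmd


# Possible use (requires ffmpeg on PATH):
#   import subprocess
#   pcm = subprocess.run(build_decode_command("talk.mp4"),
#                        check=True, capture_output=True).stdout
```

The raw bytes could then be viewed as int16 samples with numpy and fed to the recognizer in chunks.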
Model & cache
The default Kroko English model is downloaded on first run to:
- macOS: ~/Library/Caches/yapsnap/
- Linux: $XDG_CACHE_HOME/yapsnap/ (or ~/.cache/yapsnap/)
- Windows: %LOCALAPPDATA%\yapsnap\
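The per-OS locations listed above could be resolved along these lines. This is a sketch, not the tool's code: the function name cache_dir is hypothetical, and the Windows fallback to ~/AppData/Local when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def cache_dir(
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Return the yapsnap cache folder for the given platform."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    if platform == "darwin":
        return home / "Library" / "Caches" / "yapsnap"
    if platform.startswith("win"):
        local = env.get("LOCALAPPDATA")
        # Fallback path is an assumption, used only if the variable is unset.
        base = Path(local) if local else home / "AppData" / "Local"
        return base / "yapsnap"
    # Linux and other Unix-likes: honor XDG_CACHE_HOME, else ~/.cache
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else home / ".cache"
    return base / "yapsnap"
```

Deleting that folder should simply trigger a fresh model download on the next run.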
To use a different streaming transducer (other languages, larger Kroko variants, etc.), point --model at a directory containing encoder(.int8).onnx, decoder(.int8).onnx, joiner(.int8).onnx, and tokens.txt. Or set KROKO_MODEL in your environment.
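Before pointing --model at a folder, it can help to check that the expected files are there. The helper below is a hypothetical pre-flight check, not part of yapsnap. It assumes that either the .int8.onnx or the plain .onnx name is acceptable for each part and, as a further assumption, prefers the INT8 file when both exist.

```python
from pathlib import Path
from typing import Dict, Union


def find_model_files(model_dir: Union[str, Path]) -> Dict[str, Path]:
    """Locate encoder, decoder, joiner, and tokens.txt in `model_dir`.

    Raises FileNotFoundError naming the first missing piece.
    """
    root = Path(model_dir)
    found: Dict[str, Path] = {}
    for part in ("encoder", "decoder", "joiner"):
        candidates = [root / f"{part}.int8.onnx", root / f"{part}.onnx"]
        match = next((p for p in candidates if p.is_file()), None)
        if match is None:
            raise FileNotFoundError(f"no {part}(.int8).onnx in {root}")
        found[part] = match
    tokens = root / "tokens.txt"
    if not tokens.is_file():
        raise FileNotFoundError(f"no tokens.txt in {root}")
    found["tokens"] = tokens
    return found
```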
Other languages
The default model is English, but yapsnap isn't limited to it. To transcribe another language, just download the matching model and point yapsnap at it — no code changes, no reinstall.
Kroko publishes streaming models for a growing list of languages on Hugging Face: https://huggingface.co/Banafo/Kroko-ASR/tree/main. As of now that includes:
- Dutch
- French
- German
- Hebrew
- Italian
- Portuguese
- Spanish
- Swedish
- Swiss German
- Turkish
Download the one you need, unpack it into its own folder, and run:
# Per-run: pass the model folder explicitly
yapsnap interview.mp3 --model /path/to/kroko-french
# Or set it once as your default for the session
export KROKO_MODEL=/path/to/kroko-french
yapsnap interview.mp3
Each model is single-language, so to work across several languages keep them in separate folders and switch with --model (or re-export KROKO_MODEL) as you go. Any other sherpa-onnx streaming transducer with the standard encoder / decoder / joiner / tokens.txt layout works too, not just the Kroko ones.
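The two ways of selecting a model suggest a simple lookup order. The sketch below assumes that an explicit --model value wins over KROKO_MODEL, which in turn wins over the cached default; the README does not spell out this precedence, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_model_dir(
    cli_model: Optional[str],
    default: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the model directory: --model, then KROKO_MODEL, then default.

    The precedence shown here is an assumption, not documented behavior.
    """
    env = os.environ if env is None else env
    if cli_model:
        return cli_model
    return env.get("KROKO_MODEL") or default
```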
Notes & limits
- The default model is English. For other languages, download a matching model and pass it with --model; see Other languages for the current list and instructions.
- --speed 1.5 shaves about a third off transcription time with minimal accuracy cost. Try 2.0 if you want it even faster, or 1.0 for noisy, mumbled, or fast-speech sources.
- Some social-media URLs are geo-locked or login-walled; yt-dlp will say so explicitly.
- This is a streaming model, so timestamps come from token positions in the recognized stream. They're accurate enough for navigation, not for subtitling-grade alignment.
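The "about a third" figure for --speed 1.5 follows from simple arithmetic, assuming recognition time scales with the duration of the audio fed to the model (an assumption; fetch and decode overhead are ignored here). The function name is illustrative only.

```python
def time_saved_fraction(speed: float) -> float:
    """Fraction of audio duration removed by speeding up by `speed`.

    Audio of length T becomes T / speed, so the saving is 1 - 1/speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 1.0 - 1.0 / speed
```

At 1.5 this gives one third, at 2.0 one half, and at 1.0 nothing is saved.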
License
Apache-2.0 for this project. The Kroko model is distributed under its own license — see https://huggingface.co/Banafo/Kroko-ASR. Powered by sherpa-onnx and yt-dlp.