TubeScribe (ytx) — YouTube Transcriber (Whisper / Metal via whisper.cpp)
CLI to transcribe YouTube audio via Whisper (local) or Gemini (cloud). It downloads YouTube audio and produces transcripts and captions using:
- Local Whisper (faster-whisper / CTranslate2)
- Whisper.cpp (Metal acceleration on Apple Silicon)
Repository: https://github.com/prateekjain24/TubeScribe
Managed with venv+pip (recommended) or uv, using the src layout.
Features
- One command: URL → audio → normalized WAV → transcript JSON + SRT captions
- Engines: `whisper` (faster-whisper) and `whispercpp` (Metal via whisper.cpp)
- Rich progress for download + transcription
- Deterministic JSON (orjson) and SRT line wrapping
Requirements
- Python >= 3.11
- FFmpeg installed and on PATH
- Check: `ffmpeg -version`
- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt-get update && sudo apt-get install -y ffmpeg`
- Fedora: `sudo dnf install -y ffmpeg`
- Arch: `sudo pacman -S ffmpeg`
- Windows: `winget install Gyan.FFmpeg` or `choco install ffmpeg`
Install (dev)
- Option A: venv + pip (recommended)
cd ytx && python3.11 -m venv .venv && source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .
ytx --help
- Option B: uv
cd ytx && uv sync
uv run ytx --help
Running locally without installing
- From repo root:
export PYTHONPATH="$(pwd)/ytx/src"
cd ytx && python3 -m ytx.cli --help
- Example:
python3 -m ytx.cli summarize-file 0jpcFxY_38k.json --write
Note: Avoid running the ytx console script from inside the ytx/ folder; Python may shadow the installed package. Use the module form or run from repo root.
Usage (CLI)
- Whisper (CPU by default):
ytx transcribe <url> --engine whisper --model small
- Whisper (larger model):
ytx transcribe <url> --engine whisper --model large-v3-turbo
- Gemini (best‑effort timestamps):
ytx transcribe <url> --engine gemini --timestamps chunked --fallback
- Chapters + summaries:
ytx transcribe <url> --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize
- Engine options and timestamp policy:
ytx transcribe <url> --engine-opts '{"utterances":true}' --timestamps native
- Output dir:
ytx transcribe <url> --output-dir ./artifacts
- Verbose logging:
ytx --verbose transcribe <url> --engine whisper
- Health check (ffmpeg, API key presence, network):
ytx health
- Summarize an existing transcript JSON:
ytx summarize-file /path/to/<video_id>.json --write
Metal (Apple Silicon) via whisper.cpp
- Build whisper.cpp with Metal: `make -j METAL=1`
- Download a GGUF/GGML model (e.g., large-v3-turbo)
- Run with the whisper.cpp engine by passing a model file path:
uv run ytx transcribe <url> --engine whispercpp --model /path/to/gguf-large-v3-turbo.bin
- Auto-prefer whisper.cpp when `device=metal` (if the `whisper.cpp` binary is available): set env `YTX_WHISPERCPP_BIN` to the `main` binary path, and provide a model path as above
- Tuning (env or .env): `YTX_WHISPERCPP_NGL` (GPU layers, default 35), `YTX_WHISPERCPP_THREADS` (CPU threads)
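Putting the pieces together, a typical Metal run might look like the sketch below. The env variable names come from the tuning notes above; all paths are placeholders to adjust for your checkout and model locations:

```shell
# Sketch — example paths only; adjust to your whisper.cpp checkout and model dir
export YTX_WHISPERCPP_BIN="$HOME/src/whisper.cpp/main"   # the whisper.cpp "main" binary
export YTX_WHISPERCPP_NGL=35                             # GPU layers offloaded to Metal
export YTX_WHISPERCPP_THREADS=8                          # CPU threads
ytx transcribe <url> --engine whispercpp --model "$HOME/models/gguf-large-v3-turbo.bin"
```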
Outputs
- JSON (`<video_id>.json`): TranscriptDoc
  - keys: `video_id, source_url, title, duration, language, engine, model, created_at, segments[], chapters?, summary?`
  - segment: `{id, start, end, text, confidence?}` (times in seconds)
- SRT (`<video_id>.srt`): line-wrapped captions (2 lines max)
- Cache artifacts (under the XDG cache root): `meta.json`, `summary.json`, transcript and captions
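The JSON schema above is easy to consume from downstream scripts. A minimal sketch, using a hand-made sample in place of a real `<video_id>.json` and assuming only the documented keys (this is not ytx's own exporter code):

```python
import json

def fmt_srt_time(seconds: float) -> str:
    """Format a segment time (seconds) as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# Hand-made sample matching the documented TranscriptDoc shape
doc = json.loads('''{
  "video_id": "abc123", "engine": "whisper", "model": "small",
  "segments": [{"id": 0, "start": 0.0, "end": 2.5, "text": "Hello world"}]
}''')

for seg in doc["segments"]:
    # SRT cues are 1-indexed; segment ids in the JSON start at 0
    print(f'{seg["id"] + 1}')
    print(f'{fmt_srt_time(seg["start"])} --> {fmt_srt_time(seg["end"])}')
    print(seg["text"])
```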
Configuration (.env)
- Copy `.env.example` → `.env`, then adjust:
  - `GEMINI_API_KEY` (for Gemini)
  - `YTX_ENGINE` (default `whisper`), `WHISPER_MODEL` (e.g., `large-v3-turbo`)
  - `YTX_WHISPERCPP_BIN` and `YTX_WHISPERCPP_MODEL_PATH` for whisper.cpp
- Optional: `YTX_CACHE_DIR`, `YTX_OUTPUT_DIR`, `YTX_ENGINE_OPTS` (JSON), and timeouts (`YTX_NETWORK_TIMEOUT`, etc.)
Restricted videos & cookies
- Some videos are age/region restricted or private. The downloader supports cookies, but CLI flags are not yet wired.
- Workarounds: run yt-dlp manually, or use the Python API (pass `cookies_from_browser`/`cookies_file` to the downloader).
- Error messages suggest cookie usage when restrictions are detected.
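Until the CLI flags are wired, a manual yt-dlp run can fetch restricted audio. The browser name and file paths below are examples; the flags are standard yt-dlp options:

```shell
# Use cookies from a logged-in browser profile (example: chrome)
yt-dlp --cookies-from-browser chrome -x --audio-format wav -o "audio.%(ext)s" <url>
# Or point at an exported Netscape-format cookies file
yt-dlp --cookies /path/to/cookies.txt -x --audio-format wav -o "audio.%(ext)s" <url>
```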
Performance Tips
- faster-whisper: `compute_type=auto` resolves to `int8` on CPU, `float16` on CUDA.
- Model sizing: start with `small`/`medium`; use `large-v3(-turbo)` for best quality.
- Metal (whisper.cpp): tune `-ngl` (30–40 typical on M-series) and threads to maximize throughput.
Development
- Structure: code in `src/ytx/`, CLI in `src/ytx/cli.py`, engines in `src/ytx/engines/`, exporters in `src/ytx/exporters/`.
- Tests: `pytest -q` (add tests under `ytx/tests/`).
- Lint/format (if configured): `ruff check .` / `ruff format .`
Roadmap
- Add VTT/TXT exporters, format selection (`--formats json,srt,vtt,txt`)
- OpenAI/Deepgram/ElevenLabs engines via a shared cloud base
- More resilient chunking/alignment; diarization options where supported
- CI + tests; docs polish; performance tuning