
CLI to transcribe YouTube audio via Whisper (local) or Gemini (cloud)

Project description

TubeScribe (ytx) — YouTube Transcriber (Whisper / Metal via whisper.cpp)

CLI that downloads YouTube audio and produces transcripts and captions using:

  • Local Whisper (faster-whisper / CTranslate2)
  • Whisper.cpp (Metal acceleration on Apple Silicon)
  • Gemini (cloud; requires GEMINI_API_KEY)

Repository: https://github.com/prateekjain24/TubeScribe

Uses the src layout; manage the environment with venv + pip (recommended) or uv.

Features

  • One command: URL → audio → normalized WAV → transcript JSON + SRT captions
  • Engines: whisper (faster-whisper), whispercpp (Metal via whisper.cpp), and gemini (cloud, best-effort timestamps)
  • Rich progress bars for download and transcription
  • Deterministic JSON (orjson) and SRT line wrapping
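The URL → audio → normalized WAV step usually comes down to a single FFmpeg call. A minimal sketch, assuming the target is 16 kHz mono 16-bit PCM (the sample format Whisper models and whisper.cpp work with); the exact flags ytx passes are not documented here, and build_normalize_cmd / normalize are hypothetical helpers:

```python
import shlex
import subprocess
from pathlib import Path


def build_normalize_cmd(src: Path, dst: Path) -> list[str]:
    """Hypothetical helper: FFmpeg argv converting any audio to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", str(src),       # input: whatever the downloader produced (m4a, webm, ...)
        "-vn",                # drop any video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(dst),
    ]


def normalize(src: Path, dst: Path) -> Path:
    """Run FFmpeg (must be on PATH); raises CalledProcessError if it fails."""
    subprocess.run(build_normalize_cmd(src, dst), check=True, capture_output=True)
    return dst


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_normalize_cmd(Path("audio.m4a"), Path("audio.wav"))))
```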

Requirements

  • Python >= 3.11
  • FFmpeg installed and on PATH
    • Check: ffmpeg -version
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt-get update && sudo apt-get install -y ffmpeg
    • Fedora: sudo dnf install -y ffmpeg
    • Arch: sudo pacman -S ffmpeg
    • Windows: winget install Gyan.FFmpeg or choco install ffmpeg
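Once ytx is installed, ytx health covers the FFmpeg check. Before that, a stdlib-only preflight along these lines (a hypothetical helper, not part of ytx) verifies both requirements:

```python
import shutil
import subprocess
import sys


def check_requirements() -> list[str]:
    """Return a list of problems; an empty list means Python and FFmpeg look usable."""
    problems: list[str] = []
    if sys.version_info < (3, 11):
        problems.append(f"Python >= 3.11 required, found {sys.version.split()[0]}")
    ffmpeg = shutil.which("ffmpeg")  # same PATH lookup the shell performs
    if ffmpeg is None:
        problems.append("ffmpeg not found on PATH")
    else:
        # Equivalent of `ffmpeg -version`; a non-zero exit suggests a broken install.
        result = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
        if result.returncode != 0:
            problems.append(f"ffmpeg -version exited with {result.returncode}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```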

Install (dev)

  • Option A: venv + pip (recommended)
    • cd ytx && python3.11 -m venv .venv && source .venv/bin/activate
    • python -m pip install -U pip setuptools wheel
    • python -m pip install -e .
    • ytx --help
  • Option B: uv
    • cd ytx && uv sync
    • uv run ytx --help

Running locally without installing

  • From repo root:
    • export PYTHONPATH="$(pwd)/ytx/src"
    • cd ytx && python3 -m ytx.cli --help
    • Example: python3 -m ytx.cli summarize-file 0jpcFxY_38k.json --write

Note: Avoid running the ytx console script from inside the ytx/ folder; Python may shadow the installed package. Use the module form or run from repo root.
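To see which copy of ytx Python would import from the current directory, importlib can report a package's origin without importing it. If the printed path is not under ytx/src/ytx/ (editable install) or your environment's site-packages, something earlier on sys.path is taking precedence. locate is an illustrative helper:

```python
from __future__ import annotations

import importlib.util


def locate(package: str) -> str | None:
    """Report where `import <package>` would load from, or None if it is not importable."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin  # e.g. .../ytx/src/ytx/__init__.py
    # Namespace package (directory without __init__.py): report its search locations.
    return ", ".join(spec.submodule_search_locations or [])


if __name__ == "__main__":
    print(locate("ytx"))
```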

Usage (CLI)

  • Whisper (CPU by default):
    • ytx transcribe <url> --engine whisper --model small
  • Whisper (larger model):
    • ytx transcribe <url> --engine whisper --model large-v3-turbo
  • Gemini (best‑effort timestamps):
    • ytx transcribe <url> --engine gemini --timestamps chunked --fallback
  • Chapters + summaries:
    • ytx transcribe <url> --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize
  • Engine options and timestamp policy:
    • ytx transcribe <url> --engine-opts '{"utterances":true}' --timestamps native
  • Output dir:
    • ytx transcribe <url> --output-dir ./artifacts
  • Verbose logging:
    • ytx --verbose transcribe <url> --engine whisper
  • Health check:
    • ytx health (checks ffmpeg, API key presence, and network access)
  • Summarize an existing transcript JSON:
    • ytx summarize-file /path/to/<video_id>.json --write
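To run the CLI over several videos from Python, one option is to shell out to the installed ytx script. This sketch assumes ytx is on PATH and only uses flags from the Usage list; transcribe_cmd and transcribe_all are hypothetical helpers:

```python
from __future__ import annotations

import subprocess
from collections.abc import Iterable


def transcribe_cmd(url: str, engine: str = "whisper", model: str | None = None,
                   output_dir: str | None = None, verbose: bool = False) -> list[str]:
    """Build a `ytx transcribe` argv from flags documented in the Usage list."""
    cmd = ["ytx"]
    if verbose:
        cmd.append("--verbose")  # global option: goes before the subcommand
    cmd += ["transcribe", url, "--engine", engine]
    if model:
        cmd += ["--model", model]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    return cmd


def transcribe_all(urls: Iterable[str], **kwargs) -> dict[str, int]:
    """Run ytx once per URL, sequentially; map each URL to its exit code."""
    return {url: subprocess.run(transcribe_cmd(url, **kwargs)).returncode for url in urls}
```

Because each URL runs in its own process, one failed download does not abort the batch; non-zero exit codes identify the videos to retry.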

Metal (Apple Silicon) via whisper.cpp

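  • Build whisper.cpp with Metal (older Makefile builds: make -j METAL=1; recent releases use CMake, cmake -B build && cmake --build build -j --config Release, with Metal enabled by default on Apple Silicon)
  • Download a GGML model (e.g., ggml-large-v3-turbo.bin, fetched with whisper.cpp's models/download-ggml-model.sh large-v3-turbo)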
  • Run with whisper.cpp engine by passing a model file path:
    • uv run ytx transcribe <url> --engine whispercpp --model /path/to/ggml-large-v3-turbo.bin
  • ytx prefers whisper.cpp automatically when device=metal and a whisper.cpp binary is available:
    • Set YTX_WHISPERCPP_BIN to the whisper.cpp main binary (named whisper-cli in recent releases) and provide a model path as above
  • Tuning (env or .env):
    • YTX_WHISPERCPP_NGL (GPU layers, default 35), YTX_WHISPERCPP_THREADS (CPU threads)
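The README does not spell out how the Metal auto-preference decides, so the following is an assumption: a hypothetical pick_engine that selects whispercpp only when device is metal and both YTX_WHISPERCPP_BIN and YTX_WHISPERCPP_MODEL_PATH point at usable files, and otherwise falls back to faster-whisper. Passing env explicitly keeps it testable:

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping
from pathlib import Path


def pick_engine(device: str, env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical: prefer whisper.cpp on Metal when its binary and model are usable."""
    if device != "metal":
        return "whisper"
    binary = env.get("YTX_WHISPERCPP_BIN")
    model = env.get("YTX_WHISPERCPP_MODEL_PATH")
    if binary and model and os.access(binary, os.X_OK) and Path(model).is_file():
        return "whispercpp"
    return "whisper"  # assumed fallback: faster-whisper


if __name__ == "__main__":
    print(pick_engine(sys.argv[1] if len(sys.argv) > 1 else "metal"))
```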

Outputs

  • JSON (<video_id>.json): TranscriptDoc
    • keys: video_id, source_url, title, duration, language, engine, model, created_at, segments[], chapters?, summary?
    • segment: {id, start, end, text, confidence?} (start and end in seconds)
  • SRT (<video_id>.srt): line-wrapped captions (2 lines max)
  • Cache artifacts (under the XDG cache root): meta.json, summary.json, and the transcript and captions
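A sketch of rendering a TranscriptDoc JSON as SRT, assuming only the segment fields listed above. The 42-character line width and folding overflow into the second line are illustrative choices, not necessarily how ytx's exporter wraps or splits long segments:

```python
from __future__ import annotations

import json
import textwrap
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def wrap_caption(text: str, width: int = 42, max_lines: int = 2) -> str:
    """Wrap to at most max_lines lines; any overflow is folded into the last line."""
    lines = textwrap.wrap(text.strip(), width=width) or [""]
    if len(lines) > max_lines:
        lines = lines[: max_lines - 1] + [" ".join(lines[max_lines - 1:])]
    return "\n".join(lines)


def doc_to_srt(doc: dict) -> str:
    """Render the segments of a TranscriptDoc-shaped dict as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(doc["segments"], start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{wrap_caption(seg['text'])}\n"
        )
    return "\n".join(cues)  # blank line between cues


def json_to_srt(path: Path) -> str:
    return doc_to_srt(json.loads(path.read_text(encoding="utf-8")))
```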

Configuration (.env)

  • Copy .env.example to .env, then adjust:
    • GEMINI_API_KEY (for Gemini)
    • YTX_ENGINE (default whisper), WHISPER_MODEL (e.g., large-v3-turbo)
    • YTX_WHISPERCPP_BIN and YTX_WHISPERCPP_MODEL_PATH for whisper.cpp
    • Optional: YTX_CACHE_DIR, YTX_OUTPUT_DIR, YTX_ENGINE_OPTS (JSON), and timeouts (YTX_NETWORK_TIMEOUT, etc.)
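ytx's own settings loader is not described here. As a stand-in, this minimal sketch reads KEY=VALUE lines and decodes YTX_ENGINE_OPTS as a JSON object; parse_env_text and engine_opts are hypothetical helpers, and dedicated libraries such as python-dotenv cover much more of the .env syntax:

```python
from __future__ import annotations

import json


def parse_env_text(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips matching quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def engine_opts(values: dict[str, str]) -> dict:
    """Decode YTX_ENGINE_OPTS (a JSON object), or return {} when unset."""
    raw = values.get("YTX_ENGINE_OPTS", "").strip()
    if not raw:
        return {}
    opts = json.loads(raw)
    if not isinstance(opts, dict):
        raise ValueError("YTX_ENGINE_OPTS must be a JSON object")
    return opts
```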

Restricted videos & cookies

  • Some videos are age/region restricted or private. The downloader supports cookies, but this is not yet exposed through CLI flags.
  • Workarounds: run yt-dlp manually, or use the Python API (pass cookies_from_browser / cookies_file to the downloader).
  • Error messages suggest cookies usage when restrictions are detected.
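Until cookie flags reach the CLI, yt-dlp's own Python API can fetch a restricted video's audio. This sketch uses yt-dlp options (format, outtmpl, cookiesfrombrowser, cookiefile); it is not ytx's downloader, whose signature is not documented here, and build_ydl_opts / download_audio are hypothetical names:

```python
from __future__ import annotations


def build_ydl_opts(outdir: str = ".", cookies_from_browser: str | None = None,
                   cookies_file: str | None = None) -> dict:
    """yt-dlp options for an audio-only download, optionally authenticated with cookies."""
    opts = {
        "format": "bestaudio/best",             # prefer an audio-only stream
        "outtmpl": f"{outdir}/%(id)s.%(ext)s",  # name files by video id
        "quiet": True,
    }
    if cookies_from_browser:
        # yt-dlp expects a tuple (browser, profile, keyring, container); trailing items are optional.
        opts["cookiesfrombrowser"] = (cookies_from_browser,)
    if cookies_file:
        opts["cookiefile"] = cookies_file       # Netscape-format cookies.txt
    return opts


def download_audio(url: str, **kwargs) -> str:
    """Download with yt-dlp's Python API and return the path of the downloaded file."""
    import yt_dlp  # lazy import so build_ydl_opts works without yt-dlp installed

    with yt_dlp.YoutubeDL(build_ydl_opts(**kwargs)) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

For example, download_audio(url, cookies_from_browser="firefox") reuses the logged-in Firefox session; the command-line equivalent is yt-dlp --cookies-from-browser firefox -x <url>.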

Performance Tips

  • faster‑whisper: compute_type=auto resolves to int8 on CPU, float16 on CUDA.
  • Model sizing: start with small/medium; use large-v3(-turbo) for best quality.
  • Metal (whisper.cpp): tune -ngl (30–40 typical on M‑series) and threads to maximize throughput.
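To see the compute-type rule in code, this sketch calls faster-whisper directly and pins compute_type per device as the first tip describes. resolve_compute_type and transcribe_wav are hypothetical helpers; WhisperModel and its transcribe method are faster-whisper's public API:

```python
from __future__ import annotations


def resolve_compute_type(device: str) -> str:
    """Hypothetical helper mirroring the tip: float16 on CUDA, int8 otherwise (CPU)."""
    return "float16" if device == "cuda" else "int8"


def transcribe_wav(path: str, model_size: str = "small",
                   device: str = "cpu") -> list[tuple[float, float, str]]:
    """Transcribe a WAV with faster-whisper; returns (start, end, text) in seconds."""
    from faster_whisper import WhisperModel  # lazy import: heavy optional dependency

    model = WhisperModel(model_size, device=device, compute_type=resolve_compute_type(device))
    segments, _info = model.transcribe(path)
    # `segments` is a lazy generator; decoding actually happens while iterating it.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```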

Development

  • Structure: code in src/ytx/, CLI in src/ytx/cli.py, engines in src/ytx/engines/, exporters in src/ytx/exporters/.
  • Tests: pytest -q (add tests under ytx/tests/).
  • Lint/format (if configured): ruff check . and ruff format .

Roadmap

  • Add VTT/TXT exporters, format selection (--formats json,srt,vtt,txt)
  • OpenAI/Deepgram/ElevenLabs engines via shared cloud base
  • More resilient chunking/alignment; diarization options where supported
  • CI + tests; docs polish; performance tuning



Download files


Source Distribution

tubescribe-0.3.2.tar.gz (47.0 kB)

Uploaded Source

Built Distribution


tubescribe-0.3.2-py3-none-any.whl (63.5 kB)

Uploaded Python 3

File details

Details for the file tubescribe-0.3.2.tar.gz.

File metadata

  • Download URL: tubescribe-0.3.2.tar.gz
  • Size: 47.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tubescribe-0.3.2.tar.gz
Algorithm Hash digest
SHA256 c446f83c8bcc5fd7b660659b2465a111598dfb9b46ca0b9d3fda417531e12f52
MD5 390b4e058d59831f9ff5f2d61aea0a77
BLAKE2b-256 6048f658e7cca0edde77f098118dff23def1ea67170c9381d9b7b86db2a44abb


File details

Details for the file tubescribe-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: tubescribe-0.3.2-py3-none-any.whl
  • Size: 63.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tubescribe-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 bae168fabacba527f9e488b841751e07332a185e37416397a738321c2007702a
MD5 2c9e402e2aff5ce417e39b6da9271ffd
BLAKE2b-256 bd9f50107ede1d15e40de7e89428f74be1ccdb29d9f8f2c22e7f73b0bcced269
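To check a downloaded file against the SHA256 digests listed above, hashlib is enough. sha256_of and verify are illustrative helpers, and the demo assumes the wheel sits in the current directory:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    wheel = Path("tubescribe-0.3.2-py3-none-any.whl")
    if wheel.exists():
        print(verify(wheel, "bae168fabacba527f9e488b841751e07332a185e37416397a738321c2007702a"))
```

pip can enforce the same check at install time in hash-checking mode: list tubescribe==0.3.2 with --hash=sha256:<digest> in a requirements file and install with --require-hashes.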

