Production-grade YouTube transcript extractor: single videos, batches, playlists, and entire channels.

yt-transcript-pro

A production-grade YouTube transcript extractor for single videos, multi-video batches, full playlists, or entire channels, with concurrent fetching, retries, checkpointing, and six output formats. The project reports zero lint warnings, zero bandit security findings, and 100% test coverage.

✨ Why this one

Feature | yt-transcript-pro | Typical alternatives
Single video, playlist, and entire channel | Yes | Partial
Async concurrent fetching (configurable) | Yes | Rare
Exponential-backoff retries (tenacity) | Yes | Rare
Checkpointing / resume after interruption | Yes |
6 output formats (TXT/JSON/SRT/VTT/MD/CSV) + all | Yes | 1–2
Combined single-file output | Yes |
Optional timestamps in plain-text output | Yes | Sometimes
Rich CLI with live progress bar | Yes |
Strict typing (mypy strict, py.typed) | Yes |
100% branch coverage test suite | Yes |
Zero bandit / ruff / mypy findings | Yes |
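
The retry row above refers to exponential backoff via tenacity. As a rough illustration of the idea only, here is a stdlib-only stand-in (it is not tenacity and not this project's code). The name `retry_with_backoff`, its parameters, and the injectable `sleep` callable are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Waits of base_delay * 1, 2, 4, ... seconds between attempts.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; a production retry policy would typically also add jitter and restrict which exception types are retried.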

🚀 Install

pip install yt-transcript-pro
# or from source:
git clone https://github.com/Avinashvelu03/yt-transcript-pro
cd yt-transcript-pro
pip install -e ".[dev]"

⚡ Quickstart

# Single video
yttp extract dQw4w9WgXcQ -o ./out

# Multiple videos, combined into one text file
yttp extract VID1 VID2 VID3 -o ./out -f txt --combine

# Entire playlist → SRT subtitles
yttp extract "https://www.youtube.com/playlist?list=PLabc" -f srt -o ./out

# Entire channel → every output format (-f all), 10 parallel workers
yttp extract "https://www.youtube.com/@SomeHandle" -f all -c 10 -o ./out

# From a file of URLs/IDs (one per line; # for comments)
yttp extract urls.txt -o ./out

# Preview only (no download)
yttp resolve "https://www.youtube.com/c/SomeChannel" -n 50

Full options:

yttp extract --help
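
The file-input form above takes one URL or ID per line, with # marking comments. A minimal sketch of how such a file could be read follows. `parse_sources` is a hypothetical helper, not the CLI's actual parser, and it only skips whole-line comments, since a # inside a URL may be a legitimate fragment.

```python
from typing import Iterable, List


def parse_sources(lines: Iterable[str]) -> List[str]:
    """Return the non-empty, non-comment entries, stripped of whitespace."""
    sources: List[str] = []
    for raw in lines:
        entry = raw.strip()
        # Skip blank lines and whole-line comments.
        if not entry or entry.startswith("#"):
            continue
        sources.append(entry)
    return sources


example = """\
# launch videos
dQw4w9WgXcQ

https://www.youtube.com/playlist?list=PLabc
"""
print(parse_sources(example.splitlines()))
```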

🐍 Python API

import asyncio
from yt_transcript_pro import (
    Config, SourceResolver, TranscriptExtractor, FormatWriter,
)

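async def main() -> None:
    # Up to 8 concurrent fetches; output goes to one combined TXT file.
    cfg = Config(concurrency=8, output_format="txt", combine_into_single_file=True)
    # Expand the channel URL into its individual videos.
    videos = SourceResolver().resolve(["https://www.youtube.com/@SomeHandle"])
    # Fetch the transcripts for all resolved videos.
    results = await TranscriptExtractor(cfg).fetch_many(videos)
    # Write the combined output under the name "all_transcripts".
    FormatWriter(cfg).write_combined(results, "txt", "all_transcripts")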

asyncio.run(main())
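
To make the `concurrency=8` setting more concrete, the sketch below shows one common way to cap parallel async work with `asyncio.Semaphore`. This illustrates the general technique and is an assumption, not a description of how yt-transcript-pro is implemented. `fetch_one` and `fetch_bounded` are hypothetical stand-ins.

```python
import asyncio
from typing import List


async def fetch_one(video_id: str) -> str:
    """Hypothetical stand-in for a network call that fetches one transcript."""
    await asyncio.sleep(0.01)
    return f"transcript for {video_id}"


async def fetch_bounded(video_ids: List[str], concurrency: int) -> List[str]:
    """Fetch all IDs with at most `concurrency` requests in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(video_id: str) -> str:
        # Each worker waits here until one of the slots is free.
        async with semaphore:
            return await fetch_one(video_id)

    # gather returns results in the same order as the input IDs.
    return await asyncio.gather(*(worker(v) for v in video_ids))


results = asyncio.run(fetch_bounded(["VID1", "VID2", "VID3"], concurrency=2))
```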

🧪 Development

pip install -e ".[dev]"
pytest                     # run tests
pytest --cov               # with coverage (must be 100%)
ruff check src tests       # lint
mypy src                   # strict type-check
bandit -c pyproject.toml -r src   # security scan

🛡️ Working around YouTube IP bans

YouTube blocks transcript requests from most cloud-provider IPs (AWS, GCP, Azure, sandbox/CI environments). If you see IpBlocked / RequestBlocked errors, you need to route through a proxy. yt-transcript-pro supports two approaches:

Option 1: Generic HTTP(S) proxy

yttp extract "https://www.youtube.com/@SomeHandle" \
  --proxy http://user:pass@proxy.example.com:8080 \
  -o ./out --combine

Option 2: Webshare rotating-residential proxies (recommended)

Sign up at https://www.webshare.io/ and grab your credentials:

yttp extract "https://www.youtube.com/@SomeHandle" \
  --webshare-user YOUR_USER --webshare-pass YOUR_PASS \
  -o ./out --combine

Both options are also available from the Python API through the Config object: proxy= for a generic proxy, and webshare_proxy_username= / webshare_proxy_password= for Webshare.
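
Proxy URLs of the form shown in Option 1 embed the username and password, so characters such as @ or : in the credentials need percent-encoding. A small sketch using only the standard library follows. `build_proxy_url` is a hypothetical helper and not part of yt-transcript-pro.

```python
from urllib.parse import quote


def build_proxy_url(
    user: str, password: str, host: str, port: int, scheme: str = "http"
) -> str:
    """Build scheme://user:pass@host:port with credentials percent-encoded."""
    # safe="" also encodes "/", "@" and ":" so they cannot be read as URL delimiters.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


print(build_proxy_url("user", "p@ss:word", "proxy.example.com", 8080))
```

The resulting string is the kind of value that would be passed to --proxy on the command line or to proxy= on the Config object.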

🐳 Docker

docker build -t yt-transcript-pro .
docker run --rm -v "$PWD/out:/app/out" yt-transcript-pro \
    extract "https://www.youtube.com/@SomeHandle" -o /app/out -f txt --combine

📄 License

MIT © Avinashvelu03

🙏 Credits

Built on top of youtube-transcript-api and yt-dlp. Thank you to those maintainers.

Download files

Source Distribution

yt_transcript_pro-1.0.0.tar.gz (25.5 kB)

Uploaded Source

Built Distribution

yt_transcript_pro-1.0.0-py3-none-any.whl (18.3 kB)

Uploaded Python 3

File details

Details for the file yt_transcript_pro-1.0.0.tar.gz.

File metadata

  • Download URL: yt_transcript_pro-1.0.0.tar.gz
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for yt_transcript_pro-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b3e1924acc2cdac45cc75026d3b3f58e8839748ad3dfc22889a2c617da3f93c7
MD5 0ab2f2b8d6072b76151a129ec43d57ee
BLAKE2b-256 cb55ccaf16b2399634ce765c610861997cfc5e69679ffdf8b4a9295a809f9552
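
The published digests can be used to check a downloaded file before installing it. A minimal sketch with `hashlib` follows. `sha256_of` is a hypothetical helper, and the usage lines assume the source distribution has been saved to the current directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 value copied from the table above.
EXPECTED = "b3e1924acc2cdac45cc75026d3b3f58e8839748ad3dfc22889a2c617da3f93c7"
archive = Path("yt_transcript_pro-1.0.0.tar.gz")
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
```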

File details

Details for the file yt_transcript_pro-1.0.0-py3-none-any.whl.

File hashes

Hashes for yt_transcript_pro-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cf028bc18e02b9e3051f3966c7639e84c67d1ecacca0d7e4d5f8215e99de5c2d
MD5 8b06dcfb5d2d23edff3a61567b426415
BLAKE2b-256 41d3fe7595a78c5b0ee055011e09c831e331b5e65690291ff4d3b80e7cf13a05
