Production-grade YouTube transcript extractor: single videos, batches, playlists, and entire channels.
yt-transcript-pro
A production-grade YouTube transcript extractor for single videos, multi-video batches, full playlists, or entire channels, with concurrent fetching, retries, checkpointing, and six output formats. Zero ruff, mypy, or bandit findings, and 100% test coverage.
✨ Why this one
| Feature | yt-transcript-pro | Typical alternatives |
|---|---|---|
| Single video, playlist, and entire channel | ✅ | Partial |
| Async concurrent fetching (configurable) | ✅ | Rare |
| Exponential-backoff retries (tenacity) | ✅ | Rare |
| Checkpointing / resume after interruption | ✅ | ❌ |
| 6 output formats (TXT/JSON/SRT/VTT/MD/CSV) + all | ✅ | 1–2 |
| Combined single-file output | ✅ | ❌ |
| Optional timestamps in plain-text output | ✅ | Sometimes |
| Rich CLI with live progress bar | ✅ | ❌ |
| Strict typing (mypy strict, py.typed) | ✅ | ❌ |
| 100% branch coverage test suite | ✅ | ❌ |
| Zero bandit / ruff / mypy findings | ✅ | ❌ |
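The table lists checkpointing and resume, but the README does not describe how the package stores its progress. As a rough illustration of the general idea only, here is a minimal standard-library sketch. The `Checkpoint` class, its JSON layout, and the file name are hypothetical and are not taken from yt-transcript-pro.

```python
import json
import os
import tempfile
from pathlib import Path


class Checkpoint:
    """Tracks finished video IDs in a JSON file (hypothetical layout)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.done: set[str] = set()
        if path.exists():
            # Resume: reload the IDs completed by an earlier run.
            self.done = set(json.loads(path.read_text(encoding="utf-8")))

    def mark_done(self, video_id: str) -> None:
        self.done.add(video_id)
        # Write to a temp file, then rename over the old checkpoint, so an
        # interruption mid-write cannot leave a truncated file behind.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(self.done), fh)
        os.replace(tmp, self.path)

    def pending(self, video_ids: list[str]) -> list[str]:
        """Return the IDs that still need fetching, in input order."""
        return [v for v in video_ids if v not in self.done]
```

A run would call `pending()` on the resolved video list, fetch only those IDs, and call `mark_done()` after each transcript is written, so a restart skips work that already finished.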
🚀 Install
pip install yt-transcript-pro
# or from source:
git clone https://github.com/Avinashvelu03/yt-transcript-pro
cd yt-transcript-pro
pip install -e ".[dev]"
⚡ Quickstart
# Single video
yttp extract dQw4w9WgXcQ -o ./out
# Multiple videos, combined into one text file
yttp extract VID1 VID2 VID3 -o ./out -f txt --combine
# Entire playlist → SRT subtitles
yttp extract "https://www.youtube.com/playlist?list=PLabc" -f srt -o ./out
# Entire channel → markdown + JSON + TXT, 10 parallel workers
yttp extract "https://www.youtube.com/@SomeHandle" -f all -c 10 -o ./out
# From a file of URLs/IDs (one per line; # for comments)
yttp extract urls.txt -o ./out
# Preview only (no download)
yttp resolve "https://www.youtube.com/c/SomeChannel" -n 50
Full options:
yttp extract --help
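The quickstart mentions an input file with one URL or ID per line and `#` for comments. The sketch below shows one plausible way to read such a file. It assumes that only whole-line comments are supported (a `#` inside a URL is kept, since it may be a fragment), and `parse_sources` is a hypothetical helper, not a function exported by the package.

```python
def parse_sources(text: str) -> list[str]:
    """Return the non-empty, non-comment lines of a sources file."""
    sources: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines that are entirely a comment.
        if not line or line.startswith("#"):
            continue
        sources.append(line)
    return sources


example = """\
# my watch list
dQw4w9WgXcQ

https://www.youtube.com/playlist?list=PLabc
"""
print(parse_sources(example))
```

Treating only leading `#` as a comment marker is a deliberate simplification; whether the real CLI also strips trailing comments is not stated in the README.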
🐍 Python API
import asyncio

from yt_transcript_pro import (
    Config, SourceResolver, TranscriptExtractor, FormatWriter,
)


async def main() -> None:
    cfg = Config(concurrency=8, output_format="txt", combine_into_single_file=True)
    # Expand the channel URL into its individual videos.
    videos = SourceResolver().resolve(["https://www.youtube.com/@SomeHandle"])
    # Fetch the transcripts for all resolved videos.
    results = await TranscriptExtractor(cfg).fetch_many(videos)
    # Write every transcript into one combined text file.
    FormatWriter(cfg).write_combined(results, "txt", "all_transcripts")


asyncio.run(main())
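The example above sets `concurrency=8`, and the feature table mentions exponential-backoff retries. To make those two ideas concrete, here is a standard-library sketch of bounded concurrency with backoff. It is not the package's implementation (which the README says uses tenacity); `fetch_with_retry`, `fetch_all`, and their parameters are hypothetical names chosen for illustration.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Union


async def fetch_with_retry(
    fetch: Callable[[str], Awaitable[str]],
    video_id: str,
    attempts: int = 4,
    base_delay: float = 0.5,
) -> str:
    """Call fetch(video_id), backing off exponentially between failures."""
    for attempt in range(attempts):
        try:
            return await fetch(video_id)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt; jitter spreads retries out.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("attempts must be at least 1")


async def fetch_all(
    fetch: Callable[[str], Awaitable[str]],
    video_ids: List[str],
    concurrency: int = 8,
    base_delay: float = 0.5,
) -> List[Union[str, BaseException]]:
    """Fetch every ID with at most `concurrency` requests in flight."""
    gate = asyncio.Semaphore(concurrency)

    async def worker(video_id: str) -> str:
        async with gate:
            return await fetch_with_retry(fetch, video_id, base_delay=base_delay)

    # return_exceptions=True keeps one failed video from cancelling the rest.
    return await asyncio.gather(
        *(worker(v) for v in video_ids), return_exceptions=True
    )
```

Results come back in input order, with an exception object in place of any transcript that still failed after the final attempt, so a caller can report failures without losing the successful fetches.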
🧪 Development
pip install -e ".[dev]"
pytest # run tests
pytest --cov # with coverage (must be 100%)
ruff check src tests # lint
mypy src # strict type-check
bandit -c pyproject.toml -r src # security scan
🛡️ Working around YouTube IP bans
YouTube blocks transcript requests from most cloud-provider IPs (AWS, GCP, Azure,
sandbox/CI environments). If you see IpBlocked / RequestBlocked errors, you
need to route through a proxy. yt-transcript-pro supports two approaches:
Option 1: Generic HTTP(S) proxy
yttp extract "https://www.youtube.com/@SomeHandle" \
--proxy http://user:pass@proxy.example.com:8080 \
-o ./out --combine
Option 2: Webshare rotating-residential proxies (recommended)
Sign up at https://www.webshare.io/ and grab your credentials:
yttp extract "https://www.youtube.com/@SomeHandle" \
--webshare-user YOUR_USER --webshare-pass YOUR_PASS \
-o ./out --combine
Both options are also available in the Python API through the Config object
(proxy=, webshare_proxy_username=, webshare_proxy_password=).
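When using the Python API, it can be convenient to keep proxy credentials out of source code. The sketch below builds the keyword arguments named above from environment variables. The `YTTP_*` variable names and the `proxy_settings` helper are hypothetical; only the returned keys (`proxy`, `webshare_proxy_username`, `webshare_proxy_password`) come from this README.

```python
import os
from typing import Dict, Mapping


def proxy_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Build Config keyword arguments from environment variables.

    The YTTP_* variable names are assumptions made for this sketch.
    """
    user = env.get("YTTP_WEBSHARE_USER")
    password = env.get("YTTP_WEBSHARE_PASS")
    if user and password:
        # Prefer Webshare when both credentials are present.
        return {
            "webshare_proxy_username": user,
            "webshare_proxy_password": password,
        }
    proxy = env.get("YTTP_PROXY")
    if proxy:
        return {"proxy": proxy}
    return {}  # no proxy configured: connect directly


print(sorted(proxy_settings(os.environ)))
```

Assuming `Config` accepts these keywords as described above, the result could be passed along as `Config(concurrency=8, **proxy_settings(os.environ))`.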
🐳 Docker
docker build -t yt-transcript-pro .
docker run --rm -v "$PWD/out:/app/out" yt-transcript-pro \
extract "https://www.youtube.com/@SomeHandle" -o /app/out -f txt --combine
📄 License
MIT © Avinashvelu03
🙏 Credits
Built on top of youtube-transcript-api and yt-dlp. Thank you to those maintainers.
File details
Details for the file yt_transcript_pro-1.0.0.tar.gz.
File metadata
- Download URL: yt_transcript_pro-1.0.0.tar.gz
- Upload date:
- Size: 25.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b3e1924acc2cdac45cc75026d3b3f58e8839748ad3dfc22889a2c617da3f93c7 |
| MD5 | 0ab2f2b8d6072b76151a129ec43d57ee |
| BLAKE2b-256 | cb55ccaf16b2399634ce765c610861997cfc5e69679ffdf8b4a9295a809f9552 |
File details
Details for the file yt_transcript_pro-1.0.0-py3-none-any.whl.
File metadata
- Download URL: yt_transcript_pro-1.0.0-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf028bc18e02b9e3051f3966c7639e84c67d1ecacca0d7e4d5f8215e99de5c2d |
| MD5 | 8b06dcfb5d2d23edff3a61567b426415 |
| BLAKE2b-256 | 41d3fe7595a78c5b0ee055011e09c831e331b5e65690291ff4d3b80e7cf13a05 |
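A downloaded archive can be checked against the SHA256 digests listed above before installing it. The following is a small standard-library sketch; `sha256_of` and `matches` are hypothetical helper names, and the file path passed in is whatever location the file was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("yt_transcript_pro-1.0.0.tar.gz"), "<SHA256 from the table>")` should return `True` for an unmodified copy of the source distribution.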