
Production-grade YouTube transcript extractor: single videos, batches, playlists, and entire channels. v2 adds IP-block-resistant multi-backend cascade.


yt-transcript-pro v2.0

Python · License: MIT · Tests: 130 passing

The most advanced, production-grade YouTube transcript extractor. Single videos, multi-video batches, full playlists, or entire channels, with concurrency, retries, checkpointing, six output formats, and (new in v2) four interchangeable extraction backends with an automatic cascade fallback that bypasses IP blocks without needing proxies.


🚀 What's new in v2.0

The old v1 used youtube-transcript-api exclusively, which hits YouTube's /api/timedtext endpoint directly. That endpoint is the first thing YouTube rate-limits: after ~250 rapid requests the IP gets RequestBlocked / IpBlocked errors for 1–24 hours.

v2 ships four independent backends that use completely different endpoints, and an auto backend that cascades through them per video:

| Backend | Endpoint | Resilience | Speed |
|---|---|---|---|
| auto (default) | cascades watch → ytdlp → api | ⭐⭐⭐⭐⭐ | fast |
| watch | GET /watch?v=<id> HTML scrape | ⭐⭐⭐⭐ | fastest |
| ytdlp | yt-dlp player API with 7-client rotation | ⭐⭐⭐⭐ | fast |
| api (legacy v1) | youtube-transcript-api | ⭐⭐ | fast |

Why this works without proxies: each backend hits a different YouTube surface with different rate-limit heuristics. When one backend is throttled, the cascade automatically switches to the next, and each one has its own independent adaptive back-off (bursts of rapid requests trigger per-backend slowdowns, so the other backends keep flowing).

Empirically verified on cloud IPs where youtube-transcript-api gets blocked within seconds: v2 keeps producing transcripts.
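The cascade-plus-per-backend-back-off idea can be pictured in a few lines. This is an illustrative sketch only: `Backend`, `cascade_fetch`, and the back-off constants are hypothetical, not the library's internals.

```python
import time

class Backend:
    """Toy backend with its own adaptive back-off state (hypothetical)."""
    def __init__(self, name, fetch):
        self.name = name
        self.fetch = fetch        # callable: video_id -> transcript text
        self.delay = 0.0          # grows when throttled, shrinks on success

    def try_fetch(self, video_id):
        time.sleep(self.delay)    # cooperative slowdown for this backend only
        try:
            text = self.fetch(video_id)
        except Exception:
            self.delay = min(60.0, self.delay * 2 + 1)  # exponential back-off
            raise
        self.delay = max(0.0, self.delay * 0.5)         # recover gradually
        return text

def cascade_fetch(backends, video_id):
    """Try each backend in order; return the first transcript, else None."""
    for backend in backends:
        try:
            return backend.try_fetch(video_id)
        except Exception:
            continue              # this surface is throttled; try the next
    return None
```

The key property is that each `Backend` carries its own `delay`, so a throttled backend slows itself down without stalling the others.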

Other v2 improvements

  • ๐Ÿ›ก๏ธ Windows Unicode fix โ€“ no more cp1252 crashes on the โœ“/โœ— progress glyphs.
  • ๐Ÿ“ Incremental combined-file writes โ€“ partial runs produce durable output; Ctrl-C doesn't lose data.
  • ๐Ÿ”„ Per-backend adaptive throttling โ€“ cooperative sleeps slow just the backend being throttled, not the whole pool.
  • ๐ŸŽฏ 7-client rotation for ytdlp: android โ†’ android_vr โ†’ tv_simply โ†’ tv_embedded โ†’ mweb โ†’ web โ†’ ios. Each client has its own rate-limit pool and user-agent fingerprint.
  • ๐Ÿช Cookies.txt support for age-restricted / private videos (--cookies cookies.txt).
  • ๐Ÿ•ต๏ธ Rotating modern User-Agents (Chrome, Firefox, Safari, Edge, Android Chrome).
  • ๐Ÿ“Š Non-TTY / nohup logging โ€“ progress is still visible in log files.
  • ๐Ÿงช 130 unit tests, zero network access required to run the suite.
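The incremental combined-file writes amount to an append-and-flush per finished video, so an interrupted run keeps everything written so far. A sketch under assumed file layout, not the actual writers.py implementation:

```python
import os

def append_combined(path, video_id, text):
    """Append one finished transcript and make it durable immediately,
    so a Ctrl-C mid-run loses at most the video currently in flight."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"===== {video_id} =====\n{text}\n\n")
        f.flush()
        os.fsync(f.fileno())   # durable even on abrupt termination
```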

⚡ Install

# Clone / unzip and install
cd yt-transcript-pro
pip install -e ".[dev]"

Requires Python ≥ 3.9. Dependencies: yt-dlp, youtube-transcript-api, pydantic, tenacity, typer, rich.


📖 Quickstart

Entire channel → one combined text file

# This is the exact command that extracts all 665 InnerCircleTrader videos:
yttp extract "https://www.youtube.com/@InnerCircleTrader" \
  --output-dir ./channel_extraction/ICT \
  --format txt \
  --combine \
  --combined-name InnerCircleTrader_all_transcripts \
  --concurrency 5 \
  --retries 5 \
  --resume

  • --backend defaults to auto, which cascades watch → ytdlp → api.
  • --resume skips videos already completed (re-run without re-downloading).
  • --concurrency 5 is a safe default. Bump to 10 for residential IPs; drop to 2 for cloud IPs.

Single video

yttp extract dQw4w9WgXcQ -o ./out
yttp extract "https://youtu.be/dQw4w9WgXcQ" -o ./out

Playlist

yttp extract "https://www.youtube.com/playlist?list=PLxxxx" -o ./out -f srt

All channel playlists → one playlist-organized text file

python extract_playlists.py

The playlist runner writes one consolidated file plus an index, report, and failure checkpoint under channel_extraction/ICT_playlists/. It reuses transcript blocks already present in ../InnerCircleTrader_all_transcripts.txt and caches each newly fetched video under channel_extraction/ICT_playlists/transcripts/ for clean resume behavior.

Batch from a file of URLs/IDs

cat > urls.txt <<EOF
# one URL or ID per line; # for comments
https://www.youtube.com/watch?v=AAAA
dQw4w9WgXcQ
https://youtu.be/XXXX
EOF

yttp extract urls.txt -o ./out --combine
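For reference, the batch-file format above (one URL or ID per line, # for comment lines) can be parsed in a few lines. A sketch; the project's real parser may differ:

```python
def read_batch_file(path):
    """Return non-empty, non-comment lines from a urls.txt-style file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]
```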

🧰 Full CLI reference

yttp extract --help

Key flags:

| Flag | Default | Description |
|---|---|---|
| -o/--output-dir | output/ | Where to write files |
| -f/--format | txt | txt \| json \| srt \| vtt \| md \| csv \| all |
| -C/--combine | off | Combine all transcripts into one file |
| --combined-name | combined | Filename stem for combined output |
| -c/--concurrency | 5 | Parallel fetchers (1–64) |
| -n/--max-videos | unlimited | Cap total videos processed |
| -l/--languages | en,en-US,en-GB | Preferred language list |
| --timestamps | off | Prefix each line with [HH:MM:SS] |
| --allow-generated | on | Fall back to auto-captions |
| --resume | on | Skip already-completed videos |
| --checkpoint | <out>/.yttp-checkpoint.json | Checkpoint location |
| --retries | 4 | Max per-video retries on transient errors |
| -b/--backend | auto | auto \| watch \| ytdlp \| api |
| --player-clients | (built-in) | Override yt-dlp client order |
| --cookies | none | Netscape cookies.txt (age-restricted videos) |
| --user-agent | rotating | Fixed HTTP User-Agent |
| --proxy | none | http://user:pass@host:port |
| --webshare-user/-pass | none | Webshare rotating residential proxy |
| -v/--verbose | off | Debug logging |

Windows users

If you were hitting UnicodeEncodeError: 'charmap' codec can't encode character '\u2717' in v1, that's fixed in v2. The CLI now forces UTF-8 output on Windows.

If you still see it (exotic terminal setup), set:

$env:PYTHONIOENCODING="utf-8"
chcp 65001
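The underlying fix is easy to replicate in your own scripts: Python 3.7+ text streams can be switched to UTF-8 at start-up via reconfigure(). A sketch, not the CLI's exact code:

```python
import sys

def force_utf8(*streams):
    """Switch text streams to UTF-8 where supported (Python 3.7+)."""
    for stream in streams:
        if hasattr(stream, "reconfigure"):   # guard: some redirected streams lack it
            stream.reconfigure(encoding="utf-8", errors="replace")

force_utf8(sys.stdout, sys.stderr)  # call early, before printing any ✓/✗ glyphs
```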

๐Ÿ Python API

import asyncio
from pathlib import Path

from yt_transcript_pro import (
    Config,
    SourceResolver,
    AutoTranscriptExtractor,   # ← new in v2, recommended
    FormatWriter,
)

async def main() -> None:
    cfg = Config(
        concurrency=5,
        output_format="txt",
        combine_into_single_file=True,
        output_dir=Path("out"),
    )
    videos = SourceResolver().resolve(["https://www.youtube.com/@InnerCircleTrader"])
    ext = AutoTranscriptExtractor(cfg)
    results = await ext.fetch_many(videos)

    writer = FormatWriter(cfg)
    for r in results:
        if r.success:
            writer.append_combined(r, "txt", filename="all")

asyncio.run(main())

You can also use any backend directly:

from yt_transcript_pro import YtDlpTranscriptExtractor, WatchPageTranscriptExtractor

watch = WatchPageTranscriptExtractor(cfg)          # scrape /watch HTML
ydl = YtDlpTranscriptExtractor(cfg)                # yt-dlp player API, default client order
ydl_custom = YtDlpTranscriptExtractor(
    cfg,
    player_clients=["android_vr", "tv_simply"],    # custom client order
)

๐Ÿ›ก๏ธ Troubleshooting IP blocks

If you still get blocks (very rare with auto backend):

  1. Wait out the cool-down. YouTube typically unblocks IPs after 1–24 hours. Your progress is saved in .yttp-checkpoint.json; just re-run with --resume and it picks up where it left off.

  2. Drop concurrency to 2 or 1. The adaptive throttler will further slow down automatically, but a lower starting concurrency is gentler on heavily-flagged IPs.

  3. Export your browser cookies (Netscape format) and pass --cookies cookies.txt. Authenticated requests have a much higher rate-limit ceiling.

  4. Use a proxy as a last resort (--proxy or --webshare-user/-pass). Residential proxies work best; datacenter proxies are often pre-blocked.
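The --resume flow in step 1 boils down to a set-difference against the checkpoint. A sketch only: the actual .yttp-checkpoint.json schema is assumed here to hold a "completed" list of video IDs.

```python
import json
import os

def load_completed(checkpoint_path):
    """Read the set of finished video IDs; an absent file means a fresh run."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return set(json.load(f).get("completed", []))

def pending(all_ids, checkpoint_path):
    """Keep input order, skipping videos already completed."""
    done = load_completed(checkpoint_path)
    return [v for v in all_ids if v not in done]
```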


🧪 Running the tests

pip install -e ".[dev]"
pytest -v          # 130 tests, no network, < 1s

📦 Project layout

src/yt_transcript_pro/
├── __init__.py
├── auto_extractor.py        # ← new: cascade over all backends
├── checkpoint.py
├── cli.py                   # ← updated: --backend flag
├── config.py                # ← updated: cookies_file, user_agent
├── extractor.py             # legacy youtube-transcript-api backend
├── models.py
├── resolver.py              # channel/playlist/video URL resolution
├── watch_extractor.py       # ← new: /watch HTML scraper
├── writers.py               # ← updated: append_combined()
└── ytdlp_extractor.py       # ← new: yt-dlp backend w/ 7-client rotation
tests/                       # 130 unit tests

๐Ÿ“ License

MIT (same as v1). See LICENSE.


๐Ÿ™ Credits

Built on top of the excellent yt-dlp, youtube-transcript-api, typer, rich, and pydantic.
