yt-transcripts 🎼
A Python CLI tool for extracting transcripts from YouTube videos, playlists, and channels.
Installation
pip install -e .
Or install dependencies directly:
pip install youtube-transcript-api yt-dlp
With AI Summarization
To enable AI-powered summarization:
pip install -e ".[summarize]"
Usage
yt-transcripts [OPTIONS] SOURCE...
Sources
The tool accepts multiple source types:
- Video URL: https://www.youtube.com/watch?v=VIDEO_ID
- Video ID: dQw4w9WgXcQ
- Channel URL: https://www.youtube.com/@ChannelName
- Playlist URL: https://www.youtube.com/playlist?list=PLAYLIST_ID
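As an illustration of how the four source types can be told apart, here is a minimal sketch. The helper name `classify_source` and its rules are assumptions made for this example, not the tool's actual implementation, which may recognize more URL shapes.

```python
from urllib.parse import urlparse, parse_qs


def classify_source(source: str) -> str:
    """Return 'video', 'playlist', or 'channel' for a SOURCE argument.

    Hypothetical helper: only the URL shapes listed above are handled.
    """
    if "://" not in source:
        # Bare tokens such as dQw4w9WgXcQ are treated as video IDs.
        return "video"

    parsed = urlparse(source)
    query = parse_qs(parsed.query)

    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path.startswith("/@"):
        return "channel"
    if parsed.path == "/watch" and "v" in query:
        return "video"
    raise ValueError(f"Unrecognized source: {source}")
```

A bare ID and a watch URL both map to `video`, so mixed arguments can be passed in a single invocation.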
Options
| Option | Description |
|---|---|
| -f, --format | Output format: text, json, srt, vtt (default: text) |
| -l, --language | Preferred language code(s), can be specified multiple times (default: en) |
| -o, --output | Output file or directory (default: stdout) |
| --max-videos | Maximum number of videos to process from channel/playlist |
| --list-only | Only list videos without extracting transcripts |
| -v, --verbose | Verbose output |
| -h, --help | Show help message |
| -s, --summarize | Summarize transcripts using AI |
| --model | LiteLLM model string (default: ollama/llama3.2) |
| --api-key | API key for cloud providers |
| --ollama-host | Ollama server URL (default: http://localhost:11434) |
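For readers who want to script around the same option set, the table can be mirrored with the standard library's `argparse`. This is only a sketch under that assumption: the tool itself may be built with a different library, and `build_parser` is a name invented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="yt-transcripts")
    parser.add_argument("sources", nargs="+", metavar="SOURCE")
    parser.add_argument("-f", "--format", default="text",
                        choices=["text", "json", "srt", "vtt"])
    # action="append" lets -l be repeated. The default is applied after
    # parsing, because argparse would otherwise append to the default list.
    parser.add_argument("-l", "--language", action="append")
    parser.add_argument("-o", "--output")
    parser.add_argument("--max-videos", type=int)
    parser.add_argument("--list-only", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-s", "--summarize", action="store_true")
    parser.add_argument("--model", default="ollama/llama3.2")
    parser.add_argument("--api-key")
    parser.add_argument("--ollama-host", default="http://localhost:11434")
    return parser


args = build_parser().parse_args(["-l", "es", "-l", "en", "-f", "srt", "VIDEO_ID"])
languages = args.language or ["en"]  # fall back to the documented default
```

`-h, --help` is not declared explicitly because `argparse` adds it automatically.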
Examples
Single Video
# By URL
yt-transcripts "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# By video ID
yt-transcripts dQw4w9WgXcQ
Multiple Videos
yt-transcripts VIDEO_ID1 VIDEO_ID2 VIDEO_ID3
Output Formats
# Plain text (default)
yt-transcripts VIDEO_ID -f text
# JSON with timestamps and metadata
yt-transcripts VIDEO_ID -f json
# SRT subtitles
yt-transcripts VIDEO_ID -f srt
# WebVTT subtitles
yt-transcripts VIDEO_ID -f vtt
Save to File
# Single file
yt-transcripts VIDEO_ID -o transcript.txt
# Multiple videos to separate files in a directory
yt-transcripts VIDEO_ID1 VIDEO_ID2 -o ./transcripts/
Channels
# List all videos from a channel
yt-transcripts "https://www.youtube.com/@anthropic-ai" --list-only
# Extract transcripts from first 10 videos
yt-transcripts "https://www.youtube.com/@anthropic-ai" --max-videos 10
# Save channel transcripts to directory as JSON
yt-transcripts "https://www.youtube.com/@anthropic-ai" --max-videos 5 -f json -o ./transcripts/
Playlists
# List videos in a playlist
yt-transcripts "https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf" --list-only
# Extract all transcripts from playlist
yt-transcripts "https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf"
Language Selection
# Prefer Spanish, fall back to English
yt-transcripts VIDEO_ID -l es -l en
# Prefer French
yt-transcripts VIDEO_ID -l fr
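The fallback order implied by repeated `-l` flags can be pictured as a first-match search over the preferred codes. The sketch below is an assumption about that behavior; `pick_language` and the `available` set are hypothetical names used only for illustration.

```python
from typing import Iterable, Optional


def pick_language(preferred: Iterable[str], available: Iterable[str]) -> Optional[str]:
    """Return the first preferred code that is available, or None.

    Hypothetical helper: `available` stands in for whatever language
    codes a given video actually offers.
    """
    offered = set(available)
    for code in preferred:
        if code in offered:
            return code
    return None
```

With `-l es -l en`, a video that offers only English and French would resolve to `en` under this scheme.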
AI Summarization
Summarize transcripts using LLMs. Supports Ollama (local), OpenAI, Anthropic, Gemini, and OpenRouter.
# Using local Ollama (default)
yt-transcripts -s VIDEO_ID
# Specify a model
yt-transcripts -s --model openai/gpt-4o-mini VIDEO_ID
# With API key
yt-transcripts -s --model anthropic/claude-sonnet-4-20250514 --api-key sk-ant-... VIDEO_ID
# Summarize multiple videos to a directory
yt-transcripts -s -o ./summaries/ VIDEO_ID1 VIDEO_ID2
# Summarize a playlist
yt-transcripts -s --max-videos 5 "https://www.youtube.com/playlist?list=PLAYLIST_ID"
Environment Variables
| Variable | Description | Default |
|---|---|---|
| YT_SUMMARIZE_MODEL | Default LiteLLM model | ollama/llama3.2 |
| OLLAMA_HOST | Ollama server URL | http://localhost:11434 |
| OPENAI_API_KEY | OpenAI API key | - |
| ANTHROPIC_API_KEY | Anthropic API key | - |
| GEMINI_API_KEY | Google Gemini API key | - |
| OPENROUTER_API_KEY | OpenRouter API key | - |
You can also use a .env file in your project directory.
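One plausible way these settings combine with the `--model` flag is sketched below. The precedence shown (command-line value, then `YT_SUMMARIZE_MODEL`, then the documented default) is an assumption, and `resolve_model` is a hypothetical helper rather than part of the tool.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "ollama/llama3.2"  # documented default


def resolve_model(cli_model: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a model string: CLI value, then environment, then default.

    Assumed precedence; the tool's real lookup order may differ.
    """
    env = os.environ if env is None else env
    return cli_model or env.get("YT_SUMMARIZE_MODEL") or DEFAULT_MODEL
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.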
Supported Models
- Ollama (local): ollama/llama3.2, ollama/mistral, etc.
- OpenAI: openai/gpt-4o, openai/gpt-4o-mini
- Anthropic: anthropic/claude-sonnet-4-20250514, anthropic/claude-haiku
- Gemini: gemini/gemini-1.5-flash, gemini/gemini-1.5-pro
- OpenRouter: openrouter/meta-llama/llama-3-8b-instruct
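Each model string above follows a `provider/model` shape, where the model part may itself contain slashes. A small sketch of splitting one apart, using a hypothetical helper named `split_model`:

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split 'provider/model-name' at the first slash only.

    Hypothetical helper for illustration; it does not call LiteLLM.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected 'provider/model', got: {model!r}")
    return provider, name
```

Splitting only at the first slash keeps OpenRouter names such as `meta-llama/llama-3-8b-instruct` intact.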
Output Formats
Text
Plain text with all segments joined together:
We're no strangers to love You know the rules and so do I...
JSON
Structured data with metadata and timestamps:
{
"video_id": "dQw4w9WgXcQ",
"language": "en",
"is_generated": false,
"segments": [
{
"text": "We're no strangers to love",
"start": 18.64,
"duration": 3.24
}
]
}
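The JSON output is straightforward to post-process. The sketch below assumes only the fields shown in the sample above; `segments_to_text` and `end_time` are names made up for this example, and the single-space separator is an assumption based on the text sample.

```python
import json

# Sample document copied from the JSON example above.
raw = """
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "is_generated": false,
  "segments": [
    {"text": "We're no strangers to love", "start": 18.64, "duration": 3.24}
  ]
}
"""


def segments_to_text(data: dict) -> str:
    """Join all segment texts with single spaces."""
    return " ".join(segment["text"] for segment in data["segments"])


def end_time(data: dict) -> float:
    """Return the time in seconds at which the last segment ends."""
    return max(s["start"] + s["duration"] for s in data["segments"])


data = json.loads(raw)
print(segments_to_text(data))
```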
SRT
Standard subtitle format:
1
00:00:18,640 --> 00:00:21,880
We're no strangers to love
2
00:00:22,640 --> 00:00:26,960
You know the rules and so do I
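The SRT timestamps follow directly from each segment's `start` and `duration` in the JSON output: 18.64 s plus 3.24 s ends at 21.88 s, which is the `00:00:21,880` shown above. A minimal sketch of that conversion, with `srt_timestamp` and `srt_cue` as hypothetical helper names:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    # Round to whole milliseconds first to absorb floating-point noise.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, segment: dict) -> str:
    """Render one numbered SRT cue from a segment with text, start, duration."""
    start = segment["start"]
    end = start + segment["duration"]
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{segment['text']}\n"


print(srt_cue(1, {"text": "We're no strangers to love", "start": 18.64, "duration": 3.24}))
```

WebVTT timestamps have the same shape but use a dot instead of a comma before the milliseconds.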
VTT
WebVTT subtitle format:
WEBVTT
00:00:18.640 --> 00:00:21.880
We're no strangers to love
00:00:22.640 --> 00:00:26.960
You know the rules and so do I
Error Handling
The tool gracefully handles common errors:
- Transcripts disabled: Reports when a video has transcripts turned off
- Video unavailable: Reports when a video is private or deleted
- No transcript found: Reports when no transcript exists in the requested language
Errors are included in the output rather than stopping execution, so batch processing continues even if some videos fail.
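The continue-on-error behavior can be sketched as a loop that records each failure next to the successful results. The function `process_batch`, the `fetch` callable, and the `error` key are all hypothetical; the tool's real output structure may differ.

```python
from typing import Callable, Dict, Iterable, List


def process_batch(video_ids: Iterable[str],
                  fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Fetch each transcript, recording errors instead of aborting.

    `fetch` is a placeholder for whatever retrieves one transcript.
    """
    results = []
    for video_id in video_ids:
        try:
            results.append({"video_id": video_id, "transcript": fetch(video_id)})
        except Exception as exc:
            # Keep going: one failed video should not stop the batch.
            results.append({"video_id": video_id,
                            "error": f"{type(exc).__name__}: {exc}"})
    return results
```

Because failures are stored as ordinary entries, callers can filter on the presence of `error` after the whole batch has finished.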
Dependencies
Core:
- youtube-transcript-api - Transcript extraction
- yt-dlp - Channel and playlist video listing
Summarization (optional):
- litellm - Unified LLM interface
- python-dotenv - Environment file loading
License
MIT