Skip to main content

CLI tool for extracting transcripts from YouTube videos, playlists, and channels

Project description

yt-transcripts 🎼

A Python CLI tool for extracting transcripts from YouTube videos, playlists, and channels.

Installation

pip install -e .

Or install dependencies directly:

pip install youtube-transcript-api yt-dlp

With AI Summarization

To enable AI-powered summarization:

pip install -e ".[summarize]"

Usage

yt-transcripts [OPTIONS] SOURCE...

Sources

The tool accepts multiple source types:

  • Video URL: https://www.youtube.com/watch?v=VIDEO_ID
  • Video ID: dQw4w9WgXcQ
  • Channel URL: https://www.youtube.com/@ChannelName
  • Playlist URL: https://www.youtube.com/playlist?list=PLAYLIST_ID

Options

Option Description
-f, --format Output format: text, json, srt, vtt (default: text)
-l, --language Preferred language code(s), can be specified multiple times (default: en)
-o, --output Output file or directory (default: stdout)
--max-videos Maximum number of videos to process from channel/playlist
--list-only Only list videos without extracting transcripts
-v, --verbose Verbose output
-h, --help Show help message
-s, --summarize Summarize transcripts using AI
--model LiteLLM model string (default: ollama/llama3.2)
--api-key API key for cloud providers
--ollama-host Ollama server URL (default: http://localhost:11434)

Examples

Single Video

# By URL
yt-transcripts "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# By video ID
yt-transcripts dQw4w9WgXcQ

Multiple Videos

yt-transcripts VIDEO_ID1 VIDEO_ID2 VIDEO_ID3

Output Formats

# Plain text (default)
yt-transcripts VIDEO_ID -f text

# JSON with timestamps and metadata
yt-transcripts VIDEO_ID -f json

# SRT subtitles
yt-transcripts VIDEO_ID -f srt

# WebVTT subtitles
yt-transcripts VIDEO_ID -f vtt

Save to File

# Single file
yt-transcripts VIDEO_ID -o transcript.txt

# Multiple videos to separate files in a directory
yt-transcripts VIDEO_ID1 VIDEO_ID2 -o ./transcripts/

Channels

# List all videos from a channel
yt-transcripts "https://www.youtube.com/@anthropic-ai" --list-only

# Extract transcripts from first 10 videos
yt-transcripts "https://www.youtube.com/@anthropic-ai" --max-videos 10

# Save channel transcripts to directory as JSON
yt-transcripts "https://www.youtube.com/@anthropic-ai" --max-videos 5 -f json -o ./transcripts/

Playlists

# List videos in a playlist
yt-transcripts "https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf" --list-only

# Extract all transcripts from playlist
yt-transcripts "https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf"

Language Selection

# Prefer Spanish, fall back to English
yt-transcripts VIDEO_ID -l es -l en

# Prefer French
yt-transcripts VIDEO_ID -l fr

AI Summarization

Summarize transcripts using LLMs. Supports Ollama (local), OpenAI, Anthropic, Gemini, and OpenRouter.

# Using local Ollama (default)
yt-transcripts -s VIDEO_ID

# Specify a model
yt-transcripts -s --model openai/gpt-4o-mini VIDEO_ID

# With API key
yt-transcripts -s --model anthropic/claude-sonnet-4-20250514 --api-key sk-ant-... VIDEO_ID

# Summarize multiple videos to a directory
yt-transcripts -s -o ./summaries/ VIDEO_ID1 VIDEO_ID2

# Summarize a playlist
yt-transcripts -s --max-videos 5 "https://www.youtube.com/playlist?list=PLAYLIST_ID"

Environment Variables

Variable Description Default
YT_SUMMARIZE_MODEL Default LiteLLM model ollama/llama3.2
OLLAMA_HOST Ollama server URL http://localhost:11434
OPENAI_API_KEY OpenAI API key -
ANTHROPIC_API_KEY Anthropic API key -
GEMINI_API_KEY Google Gemini API key -
OPENROUTER_API_KEY OpenRouter API key -

You can also use a .env file in your project directory.

Supported Models

  • Ollama (local): ollama/llama3.2, ollama/mistral, etc.
  • OpenAI: openai/gpt-4o, openai/gpt-4o-mini
  • Anthropic: anthropic/claude-sonnet-4-20250514, anthropic/claude-haiku
  • Gemini: gemini/gemini-1.5-flash, gemini/gemini-1.5-pro
  • OpenRouter: openrouter/meta-llama/llama-3-8b-instruct

Output Formats

Text

Plain text with all segments joined together:

We're no strangers to love You know the rules and so do I...

JSON

Structured data with metadata and timestamps:

{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "is_generated": false,
  "segments": [
    {
      "text": "We're no strangers to love",
      "start": 18.64,
      "duration": 3.24
    }
  ]
}

SRT

Standard subtitle format:

1
00:00:18,640 --> 00:00:21,880
We're no strangers to love

2
00:00:22,640 --> 00:00:26,960
You know the rules and so do I

VTT

WebVTT subtitle format:

WEBVTT

00:00:18.640 --> 00:00:21.880
We're no strangers to love

00:00:22.640 --> 00:00:26.960
You know the rules and so do I

Error Handling

The tool gracefully handles common errors:

  • Transcripts disabled: Reports when a video has transcripts turned off
  • Video unavailable: Reports when a video is private or deleted
  • No transcript found: Reports when no transcript exists in the requested language

Errors are included in the output rather than stopping execution, so batch processing continues even if some videos fail.

Dependencies

Core:

Summarization (optional):

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yt_transcripts-0.2.0.tar.gz (10.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

yt_transcripts-0.2.0-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file yt_transcripts-0.2.0.tar.gz.

File metadata

  • Download URL: yt_transcripts-0.2.0.tar.gz
  • Upload date:
  • Size: 10.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yt_transcripts-0.2.0.tar.gz
Algorithm Hash digest
SHA256 5bafc1cd8846ef5bd5a751062ec69274d04ab86b5fe49f79e6c753eca2dfff04
MD5 3bae83907af0429f6bd18278843299d7
BLAKE2b-256 276bb4643e9292fa79cb2fe2cbbfea41ce8002f73a7224b65716fc0ea98eac78

See more details on using hashes here.

Provenance

The following attestation bundles were made for yt_transcripts-0.2.0.tar.gz:

Publisher: python-publish.yml on yanndebray/yt-transcripts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yt_transcripts-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: yt_transcripts-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yt_transcripts-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 97e78a3dd371df6889a1dd24f52dbf39d9f08ba9bb86d195078fdf8990e0582c
MD5 e1aa8988451551e5e0ac7ffbb1f7f131
BLAKE2b-256 7fbfa17b12457f67239fad757df7ba2680aca5d957e0618bb5b3bb8e496f67d1

See more details on using hashes here.

Provenance

The following attestation bundles were made for yt_transcripts-0.2.0-py3-none-any.whl:

Publisher: python-publish.yml on yanndebray/yt-transcripts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page