yt-transcripts 🎼

A Python CLI tool for extracting transcripts from YouTube videos, playlists, and channels.

Installation

# From the root of a source checkout (editable install)
pip install -e .

Or install dependencies directly:

pip install youtube-transcript-api yt-dlp

With AI Summarization

To enable AI-powered summarization:

pip install -e ".[summarize]"

Usage

yt-transcripts [OPTIONS] SOURCE...

Sources

The tool accepts multiple source types:

  • Video URL: https://www.youtube.com/watch?v=VIDEO_ID
  • Video ID: dQw4w9WgXcQ
  • Channel URL: https://www.youtube.com/@ChannelName
  • Playlist URL: https://www.youtube.com/playlist?list=PLAYLIST_ID
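To make the four source types concrete, here is a minimal Python sketch of how a source string could be classified. It is an illustration, not the tool's actual parser: `classify_source` and `VIDEO_ID_RE` are hypothetical names, and the sketch assumes that video IDs are 11 characters drawn from letters, digits, `_`, and `-`.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical helper, not part of yt-transcripts' documented API.
# Assumption: a bare video ID is 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def classify_source(source: str) -> tuple:
    """Return (kind, value), where kind is 'video', 'playlist', or 'channel'."""
    if VIDEO_ID_RE.match(source):
        return ("video", source)

    parsed = urlparse(source)
    query = parse_qs(parsed.query)

    if parsed.path == "/watch" and "v" in query:
        return ("video", query["v"][0])
    if parsed.path == "/playlist" and "list" in query:
        return ("playlist", query["list"][0])
    if parsed.path.startswith("/@"):
        # Keep only the handle, dropping any trailing path such as /videos.
        return ("channel", parsed.path[2:].split("/")[0])

    raise ValueError(f"Unrecognized source: {source!r}")


print(classify_source("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
print(classify_source("https://www.youtube.com/@ChannelName"))
```

A bare ID is checked first because it has no URL structure to parse; everything else is treated as a URL and matched on its path.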

Options

| Option | Description |
| --- | --- |
| -f, --format | Output format: text, json, srt, vtt (default: text) |
| -l, --language | Preferred language code(s), can be specified multiple times (default: en) |
| -o, --output | Output file or directory (default: stdout) |
| --max-videos | Maximum number of videos to process from channel/playlist |
| --list-only | Only list videos without extracting transcripts |
| -v, --verbose | Verbose output |
| -h, --help | Show help message |
| -s, --summarize | Summarize transcripts using AI |
| --model | LiteLLM model string (default: ollama/llama3.2) |
| --api-key | API key for cloud providers |
| --ollama-host | Ollama server URL (default: http://localhost:11434) |
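For readers who want to see how these options fit together, the sketch below rebuilds them with Python's standard `argparse` module. This is a reconstruction from the list above only: the real tool may use a different parsing library, and `build_parser` and the `languages` destination name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction from the documented options;
    # the tool's real parser may be built differently.
    parser = argparse.ArgumentParser(prog="yt-transcripts")
    parser.add_argument("sources", nargs="+", metavar="SOURCE")
    parser.add_argument("-f", "--format",
                        choices=["text", "json", "srt", "vtt"], default="text")
    # action="append" lets -l be repeated: -l es -l en
    parser.add_argument("-l", "--language", action="append", dest="languages")
    parser.add_argument("-o", "--output")
    parser.add_argument("--max-videos", type=int)
    parser.add_argument("--list-only", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-s", "--summarize", action="store_true")
    parser.add_argument("--model", default="ollama/llama3.2")
    parser.add_argument("--api-key")
    parser.add_argument("--ollama-host", default="http://localhost:11434")
    return parser


args = build_parser().parse_args(["-f", "json", "-l", "es", "-l", "en", "dQw4w9WgXcQ"])
# Apply the documented default only when -l was never given.
languages = args.languages or ["en"]
print(args.format, languages, args.sources)
```

The language default is applied after parsing because `argparse` appends to a list default rather than replacing it, which would otherwise leave `en` in front of every user-supplied code.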

Examples

Single Video

# By URL
yt-transcripts "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# By video ID
yt-transcripts dQw4w9WgXcQ

Multiple Videos

yt-transcripts VIDEO_ID1 VIDEO_ID2 VIDEO_ID3

Output Formats

# Plain text (default)
yt-transcripts VIDEO_ID -f text

# JSON with timestamps and metadata
yt-transcripts VIDEO_ID -f json

# SRT subtitles
yt-transcripts VIDEO_ID -f srt

# WebVTT subtitles
yt-transcripts VIDEO_ID -f vtt

Save to File

# Single file
yt-transcripts VIDEO_ID -o transcript.txt

# Multiple videos to separate files in a directory
yt-transcripts VIDEO_ID1 VIDEO_ID2 -o ./transcripts/

Channels

# List all videos from a channel
yt-transcripts "https://www.youtube.com/@anthropic-ai" --list-only

# Extract transcripts from first 10 videos
yt-transcripts "https://www.youtube.com/@anthropic-ai" --max-videos 10

# Save channel transcripts to directory as JSON
yt-transcripts "https://www.youtube.com/@anthropic-ai" --max-videos 5 -f json -o ./transcripts/

Playlists

# List videos in a playlist
yt-transcripts "https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf" --list-only

# Extract all transcripts from playlist
yt-transcripts "https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf"

Language Selection

# Prefer Spanish, fall back to English
yt-transcripts VIDEO_ID -l es -l en

# Prefer French
yt-transcripts VIDEO_ID -l fr
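The fallback behavior of repeated `-l` flags can be pictured as a first-match search over the preference list. The sketch below is an assumption about that logic, not the tool's code; `pick_language` is a hypothetical name.

```python
from typing import List, Optional


def pick_language(preferred: List[str], available: List[str]) -> Optional[str]:
    """Return the first preferred language code that is available, else None."""
    for code in preferred:
        if code in available:
            return code
    return None


# "-l es -l en" against a video that only has English and French transcripts
print(pick_language(["es", "en"], ["en", "fr"]))
```

Order matters: the first `-l` value wins whenever it is available, and later values are consulted only as fallbacks.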

AI Summarization

Summarize transcripts using LLMs. Supports Ollama (local), OpenAI, Anthropic, Gemini, and OpenRouter.

# Using local Ollama (default)
yt-transcripts -s VIDEO_ID

# Specify a model
yt-transcripts -s --model openai/gpt-4o-mini VIDEO_ID

# With API key
yt-transcripts -s --model anthropic/claude-sonnet-4-20250514 --api-key sk-ant-... VIDEO_ID

# Summarize multiple videos to a directory
yt-transcripts -s -o ./summaries/ VIDEO_ID1 VIDEO_ID2

# Summarize a playlist
yt-transcripts -s --max-videos 5 "https://www.youtube.com/playlist?list=PLAYLIST_ID"

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| YT_SUMMARIZE_MODEL | Default LiteLLM model | ollama/llama3.2 |
| OLLAMA_HOST | Ollama server URL | http://localhost:11434 |
| OPENAI_API_KEY | OpenAI API key | - |
| ANTHROPIC_API_KEY | Anthropic API key | - |
| GEMINI_API_KEY | Google Gemini API key | - |
| OPENROUTER_API_KEY | OpenRouter API key | - |

You can also set these variables in a .env file in your project directory.
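One plausible way these settings combine with the `--model` and `--api-key` options is sketched below. The precedence shown (command-line value, then environment variable, then default) is an assumption, and `resolve_model`, `resolve_api_key`, and `API_KEY_VARS` are hypothetical names.

```python
import os

DEFAULT_MODEL = "ollama/llama3.2"

# Provider prefix -> environment variable, taken from the variables listed above.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_model(cli_model=None, env=None):
    """Assumed precedence: --model, then YT_SUMMARIZE_MODEL, then the default."""
    env = os.environ if env is None else env
    return cli_model or env.get("YT_SUMMARIZE_MODEL") or DEFAULT_MODEL


def resolve_api_key(model, cli_key=None, env=None):
    """Assumed precedence: --api-key, then the provider's environment variable."""
    env = os.environ if env is None else env
    provider = model.split("/", 1)[0]
    var = API_KEY_VARS.get(provider)  # None for local providers such as ollama
    return cli_key or (env.get(var) if var else None)


model = resolve_model(env={"YT_SUMMARIZE_MODEL": "openai/gpt-4o-mini"})
print(model, resolve_api_key(model, env={"OPENAI_API_KEY": "example-key"}))
```

Passing `env` explicitly keeps the functions easy to test without touching the real process environment.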

Supported Models

  • Ollama (local): ollama/llama3.2, ollama/mistral, etc.
  • OpenAI: openai/gpt-4o, openai/gpt-4o-mini
  • Anthropic: anthropic/claude-sonnet-4-20250514, anthropic/claude-haiku
  • Gemini: gemini/gemini-1.5-flash, gemini/gemini-1.5-pro
  • OpenRouter: openrouter/meta-llama/llama-3-8b-instruct

Output Formats

Text

Plain text with all segments joined together:

We're no strangers to love You know the rules and so do I...

JSON

Structured data with metadata and timestamps:

{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "is_generated": false,
  "segments": [
    {
      "text": "We're no strangers to love",
      "start": 18.64,
      "duration": 3.24
    }
  ]
}
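As an illustration of consuming this output, the following sketch loads a document with the shape shown above and derives each segment's end time and the joined plain text. The helper names `segment_end` and `to_plain_text` are hypothetical, and joining with a single space is an assumption about how the text format is produced.

```python
import json

raw = '''
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "is_generated": false,
  "segments": [
    {"text": "We're no strangers to love", "start": 18.64, "duration": 3.24}
  ]
}
'''


def segment_end(segment: dict) -> float:
    """End time in seconds: start plus duration."""
    return segment["start"] + segment["duration"]


def to_plain_text(data: dict) -> str:
    """Join all segment texts with single spaces."""
    return " ".join(segment["text"] for segment in data["segments"])


data = json.loads(raw)
for segment in data["segments"]:
    print(f'{segment["start"]:.2f}-{segment_end(segment):.2f}: {segment["text"]}')
print(to_plain_text(data))
```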

SRT

Standard subtitle format:

1
00:00:18,640 --> 00:00:21,880
We're no strangers to love

2
00:00:22,640 --> 00:00:26,960
You know the rules and so do I
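The timestamps above follow from each segment's `start` and `duration` values. The sketch below shows one way to render segments as SRT cues; `srt_timestamp` and `to_srt` are hypothetical helpers, and the second segment's duration of 4.32 seconds is inferred from the sample timings rather than stated in the JSON example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)  # work in integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["start"] + seg["duration"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"


segments = [
    {"text": "We're no strangers to love", "start": 18.64, "duration": 3.24},
    {"text": "You know the rules and so do I", "start": 22.64, "duration": 4.32},
]
print(to_srt(segments), end="")
```

Rounding to whole milliseconds before splitting into fields avoids floating-point drift such as 21.880000000000003 leaking into the output.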

VTT

WebVTT subtitle format:

WEBVTT

00:00:18.640 --> 00:00:21.880
We're no strangers to love

00:00:22.640 --> 00:00:26.960
You know the rules and so do I
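Comparing the two samples, the VTT output differs from SRT in three ways: a `WEBVTT` header, no cue numbers, and a period instead of a comma before the milliseconds. The sketch below converts SRT text to VTT on that basis. It is a hypothetical helper (`srt_to_vtt`), not something the tool exposes, and it handles only simple cues like the ones shown here.

```python
import re

# Matches the HH:MM:SS,mmm timestamps used by SRT.
TIMING_RE = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT."""
    cues = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]  # drop the SRT cue number
        if not lines:
            continue
        # The first remaining line is the timing line: swap "," for ".".
        lines[0] = TIMING_RE.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


srt = (
    "1\n00:00:18,640 --> 00:00:21,880\nWe're no strangers to love\n\n"
    "2\n00:00:22,640 --> 00:00:26,960\nYou know the rules and so do I\n"
)
print(srt_to_vtt(srt), end="")
```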

Error Handling

The tool gracefully handles common errors:

  • Transcripts disabled: Reports when a video has transcripts turned off
  • Video unavailable: Reports when a video is private or deleted
  • No transcript found: Reports when no transcript exists in the requested language

Errors are included in the output rather than stopping execution, so batch processing continues even if some videos fail.
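The continue-on-failure behavior can be pictured as a loop that records an error entry for a failed video instead of raising. The sketch below is an assumption about that pattern: `TranscriptError`, `process_batch`, `fake_fetch`, and the shape of the error entries are all hypothetical and do not come from the tool or its libraries.

```python
class TranscriptError(Exception):
    """Hypothetical stand-in for transcript-related failures."""


def process_batch(video_ids, fetch):
    """Fetch each transcript, recording failures without stopping the batch."""
    results = []
    for video_id in video_ids:
        try:
            results.append({"video_id": video_id, "segments": fetch(video_id)})
        except TranscriptError as exc:
            # Assumed output shape: the error message replaces the segments.
            results.append({"video_id": video_id, "error": str(exc)})
    return results


def fake_fetch(video_id):
    # Stand-in for a real transcript lookup, used only for this demo.
    if video_id == "PRIVATE_VID":
        raise TranscriptError("Video unavailable")
    return [{"text": "hello", "start": 0.0, "duration": 1.0}]


for result in process_batch(["VIDEO_ID1", "PRIVATE_VID", "VIDEO_ID2"], fake_fetch):
    print(result["video_id"], result.get("error", "ok"))
```

Catching only the transcript-specific exception keeps unexpected bugs visible while still letting the batch continue past videos that are private, deleted, or missing a transcript.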

Dependencies

Core:

  • youtube-transcript-api
  • yt-dlp

Summarization (optional):

  • LiteLLM (installed with the summarize extra)

License

MIT
