yt-transcript-dl

Download YouTube video transcripts in multiple formats using yt-dlp.

Features

  • Multiple output formats: TXT, SRT, VTT, JSON
  • Download from videos, channels, and playlists
  • Incremental sync with state tracking
  • Configuration file support (TOML)
  • Batch processing
  • Custom filename patterns
  • Video metadata inclusion
  • Language selection
  • Retry logic and rate limiting

Installation

pip install yt-transcript-dl

Or install from source:

git clone https://github.com/rk/yt-transcript-dl.git
cd yt-transcript-dl
pip install -e .

Quick Start

Download a single video transcript

yt-transcript-dl "https://youtube.com/watch?v=VIDEO_ID"

Download in SRT format

yt-transcript-dl "https://youtube.com/watch?v=VIDEO_ID" --format srt

Download entire channel

yt-transcript-dl https://youtube.com/@channelname -o ./transcripts

Usage

yt-transcript-dl [OPTIONS] [URL]

Output Formats

Use --format (or -f) to specify output format:

  • txt - Plain text (default)
  • srt - SubRip subtitle format
  • vtt - WebVTT subtitle format
  • json - JSON with segments and metadata
  • all - Generate all formats

# SRT format (for video players)
yt-transcript-dl URL --format srt

# All formats at once
yt-transcript-dl URL --format all

Incremental Sync

Skip already downloaded videos using sync state:

# First download
yt-transcript-dl https://youtube.com/@channel -o ./channel

# Later: only download new videos
yt-transcript-dl https://youtube.com/@channel -o ./channel --sync

Sync options:

  • --sync - Only download videos newer than last sync
  • --overwrite - Force re-download existing files
  • --force-full - Ignore sync state and download all
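
This README does not say how sync state is stored. As a rough sketch only, assuming a hypothetical JSON state file that records the newest upload date seen so far, the --sync and --force-full flags could map to selection logic like the following. The names SyncState and select_videos, the file layout, and the upload_date field are illustrative assumptions, not the tool's actual implementation.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SyncState:
    """Hypothetical sync state: newest upload date seen so far (YYYYMMDD)."""

    last_upload_date: str = ""

    @classmethod
    def load(cls, path: Path) -> "SyncState":
        # A missing state file means nothing has been synced yet.
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text(encoding="utf-8")))

    def save(self, path: Path) -> None:
        payload = {"last_upload_date": self.last_upload_date}
        path.write_text(json.dumps(payload), encoding="utf-8")


def select_videos(videos, state, sync=False, force_full=False):
    """Return the videos to download under the assumed flag semantics."""
    if force_full or not sync:
        return list(videos)
    # YYYYMMDD strings sort chronologically, so string comparison is enough.
    return [v for v in videos if v["upload_date"] > state.last_upload_date]
```

In this sketch --force-full simply bypasses the filter, which matches the description "ignore sync state and download all".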

Configuration Files

Create a configuration file to set defaults:

# Generate sample config (global)
yt-transcript-dl --init-config ~/.config/yt-transcript-dl/config.toml

# Or create project-specific config
yt-transcript-dl --init-config .yt-transcript-dl.toml

Configuration locations (checked in order):

  1. ./.yt-transcript-dl.toml (project-specific, highest priority)
  2. ~/.config/yt-transcript-dl/config.toml (global user config)

CLI flags override config file settings.
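
The precedence described above can be pictured as layered dictionaries. The sketch below assumes that settings from both files are merged key by key (the README does not say whether the tool merges files or uses only the first one found) and that unset CLI flags arrive as None. The function names and the DEFAULTS dictionary are hypothetical; the default values themselves are taken from the option descriptions in this README.

```python
import sys
from pathlib import Path

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # backport listed under Requirements

# Defaults as documented in the Options section of this README.
DEFAULTS = {"lang": "en", "format": "txt", "retry": 3, "delay": 0.0}


def load_toml(path: Path) -> dict:
    """Return the parsed TOML file, or an empty dict if it does not exist."""
    if not path.is_file():
        return {}
    with path.open("rb") as fh:
        return tomllib.load(fh)


def resolve_settings(cli_args, project_path, global_path, no_config=False):
    """Merge defaults, config files, and CLI flags, lowest priority first."""
    settings = dict(DEFAULTS)
    if not no_config:
        settings.update(load_toml(global_path))   # global user config
        settings.update(load_toml(project_path))  # project config wins over global
    # Only flags the user actually passed (non-None) override the files.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings
```

For example, with format = "srt" in the global file and format = "vtt" in the project file, this sketch resolves format to "vtt" unless --format is passed on the command line.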

Example config:

lang = "en"
format = "srt"
output_dir = "./transcripts"
include_metadata = true
embed_description = true
filename_pattern = "{channel}_{date}_{title}"
retry = 5
delay = 1.0

See CONFIG_EXAMPLES.md for comprehensive configuration examples and use cases.

Configuration options:

  • --init-config PATH - Create sample configuration file at specified path
  • --no-config - Ignore all configuration files

Options

Basic Options

  -l, --lang TEXT          Language code for transcript (default: en)
  -o, --output-dir PATH    Output directory (default: current directory)
  -m, --include-metadata   Include video metadata in output file
  -d, --description        Save video description to separate file
  --embed-description      Include video description in transcript file (txt/json only)
  -p, --filename-pattern   Custom filename pattern (tokens: {title}, {channel}, {date}, {id})
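
As an illustration of how the four documented tokens might expand, here is a minimal sketch using Python's str.format. The function name render_filename, the shape of the video dictionary, and the character sanitization step are assumptions for the example; the tool's real behavior may differ.

```python
import re


def render_filename(pattern: str, video: dict, extension: str) -> str:
    """Expand {title}, {channel}, {date}, {id} and replace unsafe characters."""
    tokens = {key: str(video.get(key, "")) for key in ("title", "channel", "date", "id")}
    name = pattern.format(**tokens)
    # Assumed sanitization: replace characters invalid on common filesystems.
    name = re.sub(r'[\\/:*?"<>|]+', "_", name).strip()
    return f"{name}.{extension}"


video = {"title": "Intro: Part 1/2", "channel": "Demo", "date": "20240105", "id": "abc"}
print(render_filename("{channel}_{date}_{title}", video, "srt"))
# prints: Demo_20240105_Intro_ Part 1_2.srt
```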

Batch Processing

  -i, --input-file PATH    File containing list of URLs (one per line)
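
A URL list file of this kind is straightforward to read. The sketch below is one plausible reading of "one per line"; skipping blank lines and lines starting with "#" is an assumption, and read_urls is a hypothetical helper name.

```python
from pathlib import Path


def read_urls(path: Path) -> list[str]:
    """Return one URL per non-empty line, skipping '#' comment lines."""
    urls = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```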

Output Formats

  -f, --format [txt|srt|vtt|json|all]
                          Output format (default: txt)

Sync Options

  --overwrite             Force re-download of existing files
  --sync                  Only download videos newer than last sync
  --force-full            Ignore sync state and download all videos

Advanced Options

  -v, --verbose           Enable verbose logging
  --log-file PATH         Save logs to file
  --retry INTEGER         Number of retry attempts for failed downloads (default: 3)
  --delay FLOAT           Delay in seconds between requests (default: 0)
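
To make the interaction between --retry and --delay concrete, here is a small sketch of a download loop. It assumes that --retry counts attempts after the first failure and that --delay is applied between consecutive URLs rather than between retries; neither detail is confirmed by this README. The fetch_all function and its fetch callback are hypothetical.

```python
import time


def fetch_all(urls, fetch, retry=3, delay=0.0, sleep=time.sleep):
    """Fetch each URL, pausing between requests and retrying failures."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0 and delay > 0:
            sleep(delay)  # pause between requests (--delay)
        for attempt in range(retry + 1):  # first try plus `retry` retries
            try:
                results[url] = fetch(url)
                break
            except Exception as exc:  # a real tool would catch narrower errors
                if attempt == retry:
                    results[url] = exc  # give up and record the failure
    return results
```

Passing sleep as a parameter keeps the loop easy to test without real waiting.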

Configuration

  --init-config PATH      Create sample configuration file
  --no-config             Ignore configuration files

Utility

  -V, --version           Show version and exit
  --help                  Show help message and exit

Examples

See examples/EXAMPLES.md for comprehensive examples.

Basic Examples

# Download with Spanish subtitles
yt-transcript-dl "https://youtube.com/watch?v=xxxxx" --lang es

# Save to specific directory with metadata
yt-transcript-dl "https://youtube.com/watch?v=xxxxx" -o ./transcripts -m

# Download playlist in SRT format
yt-transcript-dl "https://youtube.com/playlist?list=PLxxx" --format srt

# Batch process URLs with custom naming
yt-transcript-dl --input-file urls.txt \
  --filename-pattern "{channel}_{date}_{title}" \
  -o ./batch

Advanced Examples

# Archive channel with all formats and metadata
yt-transcript-dl https://youtube.com/@channel \
  --format all \
  --include-metadata \
  --description \
  --delay 1 \
  -o ./archive

# Incremental channel sync
yt-transcript-dl https://youtube.com/@channel -o ./channel --sync

Output

Plain Text (TXT)

Clean transcript text, optionally with metadata header.

SubRip (SRT)

Standard subtitle format with timing:

1
00:00:00,000 --> 00:00:05,000
First subtitle segment

2
00:00:05,000 --> 00:00:10,000
Second subtitle segment
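
The SRT layout shown above can be produced from a list of timed segments. The sketch below assumes segments shaped like those in the JSON output (start and end in seconds, plus text); the function names are illustrative and not taken from the tool's formatters module.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{timing}\n{seg['text']}")
    return "\n\n".join(blocks) + "\n"
```

WebVTT timestamps differ only in using a period instead of a comma before the milliseconds, and the file starts with a WEBVTT header line.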

WebVTT (VTT)

Web Video Text Tracks format:

WEBVTT

00:00:00.000 --> 00:00:05.000
First subtitle segment

00:00:05.000 --> 00:00:10.000
Second subtitle segment

JSON

Structured format with segments and metadata:

{
  "segments": [
    {
      "start": 0.0,
      "end": 5.0,
      "text": "First subtitle segment"
    }
  ],
  "metadata": {
    "title": "Video Title",
    "channel": "Channel Name",
    "url": "https://youtube.com/watch?v=...",
    "language": "en",
    "is_auto_generated": false
  }
}
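
Because the JSON output is structured, it is easy to post-process. As a small sketch, assuming the field names shown in the example above, the following hypothetical helper reduces a transcript to its title, segment count, end time, and joined text.

```python
import json


def summarize_transcript(raw: str) -> dict:
    """Summarize a JSON transcript with 'segments' and 'metadata' keys."""
    data = json.loads(raw)
    segments = data.get("segments", [])
    return {
        "title": data.get("metadata", {}).get("title"),
        "segment_count": len(segments),
        # End time of the last-ending segment, or 0.0 for an empty transcript.
        "duration": max((s["end"] for s in segments), default=0.0),
        "text": " ".join(s["text"] for s in segments),
    }
```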

Requirements

  • Python 3.10+
  • yt-dlp
  • click
  • tomli (Python <3.11 only)

Troubleshooting

No subtitles available

Some videos don't have captions. Try:

  • Checking if the video has captions on YouTube
  • Using --lang auto for auto-generated subtitles (not available yet; planned for a future release)

Rate limiting

If downloading many videos, use --delay:

yt-transcript-dl --input-file urls.txt --delay 2

Failed downloads

Increase retry attempts:

yt-transcript-dl URL --retry 5

Enable verbose logging to see detailed errors:

yt-transcript-dl URL --verbose

Development

Running Tests

pip install -e ".[dev]"
pytest

Project Structure

yt_transcript_dl/
├── cli.py           # Command-line interface
├── downloader.py    # Core download logic
├── formatters.py    # Output format handlers
├── sync_state.py    # Incremental sync tracking
├── config.py        # Configuration file support
└── utils.py         # Utility functions

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Changelog

See CHANGELOG.md for version history.
