Extract video information and transcripts from YouTube videos, playlists, and channels

yt-dlp-transcripts

A powerful Python tool for extracting video information and transcripts from YouTube videos, playlists, channels, and channel playlists. Built on top of yt-dlp and youtube-transcript-api.

Features

  • 📹 Single Video Processing - Extract metadata and transcripts from individual YouTube videos
  • 📚 Playlist Support - Process entire playlists with progress tracking
  • 📺 Channel Videos - Download information from all videos on a channel
  • 🗂️ Channel Playlists - Process all playlists from a channel
  • 🔄 Resume Capability - Automatically skip already processed videos
  • 🎯 Auto-Detection - Automatically detect URL type (video/playlist/channel)
  • 📊 Rich Metadata - Extract title, description, upload date, duration, view count, and more
  • 📝 Transcript Extraction - Get full video transcripts when available
  • 💾 CSV Export - Save all data in easily accessible CSV format

Installation

Via pip

pip install yt-dlp-transcripts

# As a command-line tool (after pip install)
yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv

To run from source

git clone https://github.com/yourusername/yt-dlp-transcripts.git
cd yt-dlp-transcripts
poetry install
poetry shell

# With poetry (after poetry install and poetry shell)
python -m yt_dlp_transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv

Usage

yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLAYLIST_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/videos" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/playlists" -o output.csv

Options

Option     Short   Description                       Example
--url      -u      YouTube URL (auto-detects type)   https://youtube.com/...
--output   -o      Output CSV file path              output.csv
--help             Show help message                 (flag, no value)

Output Format

The tool exports data to CSV with the following fields:

Common Fields

  • video_id - YouTube video ID
  • title - Video title
  • url - Video URL
  • description - Video description
  • transcript - Full video transcript (when available)
  • upload_date - Upload date (YYYYMMDD format)
  • duration - Video duration in seconds
  • view_count - Number of views
  • channel - Channel name
  • channel_id - Channel ID

Additional Fields for Playlists

  • playlist_name - Name of the source playlist
  • playlist_url - URL of the source playlist

Additional Fields for Channel Videos

  • channel_source_url - URL of the channel page
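
As a rough sketch of how the exported file might be consumed, the snippet below reads rows with Python's csv module and converts upload_date, duration, and view_count into typed values. The helper name parse_row, the has_transcript key, and the sample row are illustrative assumptions and not part of the package; only the column names come from the field list above. Real exports may contain empty or non-integer cells, which this sketch does not handle.

```python
import csv
import io
from datetime import date, datetime, timedelta


def parse_row(row: dict) -> dict:
    """Convert string fields of one CSV row into typed Python values."""
    parsed = dict(row)
    # upload_date is documented as YYYYMMDD, e.g. "20240131".
    parsed["upload_date"] = datetime.strptime(row["upload_date"], "%Y%m%d").date()
    # duration is documented as a number of seconds.
    parsed["duration"] = timedelta(seconds=int(row["duration"]))
    parsed["view_count"] = int(row["view_count"])
    # Assumption: an empty transcript cell means no transcript was available.
    parsed["has_transcript"] = bool(row.get("transcript", "").strip())
    return parsed


# Illustrative in-memory CSV; a real run would instead use
# open("output.csv", newline="", encoding="utf-8").
SAMPLE = io.StringIO(
    "video_id,title,upload_date,duration,view_count,transcript\n"
    "abc123,Example talk,20240131,754,1200,hello world\n"
)

rows = [parse_row(r) for r in csv.DictReader(SAMPLE)]
print(rows[0]["upload_date"], rows[0]["duration"])
```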

Examples

Research and Analysis

# Analyze a conference talk playlist
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLconf2024" -o conference_talks.csv

# Extract all videos from an educational channel
yt-dlp-transcripts -u "https://www.youtube.com/@3blue1brown/videos" -o math_videos.csv

Content Creation

# Get transcripts from your competitor's channel
yt-dlp-transcripts -u "https://www.youtube.com/@competitor/videos" -o competitor_analysis.csv

# Archive your own channel's content
yt-dlp-transcripts -u "https://www.youtube.com/@yourchannel/videos" -o my_backup.csv

Academic Research

# Collect lecture series for analysis
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLlecture" -o lectures.csv

# Get transcripts from multiple related playlists
yt-dlp-transcripts -u "https://www.youtube.com/@university/playlists" -o all_courses.csv

Python API Usage

from yt_dlp_transcripts import (
    get_video_info,
    process_single_video,
    process_playlist,
    process_channel,
    detect_url_type
)

# Get video information as dictionary
video_data = get_video_info("https://www.youtube.com/watch?v=VIDEO_ID")
print(video_data['title'])
print(video_data['transcript'])
print(video_data['duration'])

# Process content and save to CSV
process_single_video("https://www.youtube.com/watch?v=VIDEO_ID", "output.csv")
process_playlist("https://www.youtube.com/playlist?list=PLAYLIST_ID", "output.csv")
process_channel("https://www.youtube.com/@channel/videos", "output.csv", mode='videos')

# Auto-detect URL type
url_type = detect_url_type("https://www.youtube.com/watch?v=VIDEO_ID")  # Returns: 'video'
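
To illustrate what URL auto-detection can look like, here is a minimal sketch built on urllib.parse. The function guess_url_type is a hypothetical stand-in written for this example; the package's own detect_url_type may apply different rules, and apart from 'video' the return values used here are assumptions.

```python
from urllib.parse import parse_qs, urlparse


def guess_url_type(url: str) -> str:
    """Classify a YouTube URL by its path and query string.

    Returns 'video', 'playlist', 'channel_playlists', 'channel' or 'unknown'.
    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    path = parsed.path.rstrip("/")

    # Watch pages carry the video ID in the "v" query parameter.
    if parsed.netloc.endswith("youtu.be") or (path == "/watch" and "v" in query):
        return "video"
    # Playlist pages carry the playlist ID in the "list" query parameter.
    if path == "/playlist" and "list" in query:
        return "playlist"
    # Channel pages start with a handle or one of the legacy prefixes.
    if path.startswith(("/@", "/channel/", "/c/", "/user/")):
        return "channel_playlists" if path.endswith("/playlists") else "channel"
    return "unknown"


print(guess_url_type("https://www.youtube.com/@channelname/playlists"))
```

Checking the path before falling back to 'unknown' keeps the rules easy to extend, for example if short links or other URL shapes need to be recognized.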

Features in Detail

Resume Capability

The tool automatically tracks processed videos and skips them on subsequent runs. This allows you to:

  • Interrupt and resume large downloads
  • Update your dataset with only new videos
  • Avoid redundant API calls
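
One way such a resume mechanism could work is to treat the output CSV itself as the record of processed videos. The sketch below assumes exactly that; the helper names load_processed_ids and append_new_rows are hypothetical, and the tool's actual bookkeeping may differ.

```python
import csv
import os
import tempfile


def load_processed_ids(csv_path: str) -> set:
    """Return the video_id values already present in csv_path (empty set if no file)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {row["video_id"] for row in csv.DictReader(fh) if row.get("video_id")}


def append_new_rows(csv_path: str, rows: list, fieldnames: list) -> int:
    """Append rows whose video_id is not yet in the file; return how many were written."""
    seen = load_processed_ids(csv_path)
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    written = 0
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        for row in rows:
            if row["video_id"] in seen:
                continue  # already processed on an earlier run
            writer.writerow(row)
            seen.add(row["video_id"])
            written += 1
    return written


# Demo on a temporary file: the second run only adds the new video.
path = os.path.join(tempfile.mkdtemp(), "output.csv")
fields = ["video_id", "title"]
first = append_new_rows(path, [{"video_id": "a1", "title": "One"}], fields)
second = append_new_rows(
    path,
    [{"video_id": "a1", "title": "One"}, {"video_id": "b2", "title": "Two"}],
    fields,
)
print(first, second)
```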

Progress Tracking

When processing multiple videos, the tool shows:

  • Current video number and total count
  • Video title being processed
  • Success/skip status for each video
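
The three items above could be combined into a single status line per video. This is only a sketch of one possible layout; the function progress_line and the status words are assumptions, not the tool's actual output format.

```python
def progress_line(index: int, total: int, title: str, status: str) -> str:
    """Format one progress line, right-aligning the counter to the width of total."""
    width = len(str(total))
    return f"[{index:>{width}}/{total}] {status}: {title}"


# Illustrative data: (title, status) pairs for two videos.
videos = [("Intro", "skipped"), ("Lecture 1", "processed")]
lines = [
    progress_line(i, len(videos), title, status)
    for i, (title, status) in enumerate(videos, start=1)
]
print("\n".join(lines))
```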

Error Handling

  • Gracefully handles missing transcripts
  • Continues processing even if individual videos fail
  • Provides clear error messages for troubleshooting
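
A per-video try/except loop is a common way to get this behavior. The sketch below assumes a caller-supplied fetch function and a hypothetical TranscriptUnavailable exception; neither is taken from the package, and the tool's real error types may differ.

```python
from typing import Callable, Iterable


class TranscriptUnavailable(Exception):
    """Hypothetical error for videos without a transcript."""


def process_all(video_ids: Iterable[str], fetch: Callable[[str], dict]) -> tuple:
    """Run fetch on every id, collecting results and failures instead of aborting."""
    results, failures = [], []
    for video_id in video_ids:
        try:
            results.append(fetch(video_id))
        except TranscriptUnavailable:
            # Missing transcript: keep the row with an empty transcript field.
            results.append({"video_id": video_id, "transcript": ""})
        except Exception as exc:
            # Any other per-video failure is recorded and the loop continues.
            failures.append((video_id, f"{type(exc).__name__}: {exc}"))
    return results, failures


def fake_fetch(video_id: str) -> dict:
    """Stand-in fetcher used only for this demo."""
    if video_id == "no_captions":
        raise TranscriptUnavailable(video_id)
    if video_id == "broken":
        raise ValueError("video unavailable")
    return {"video_id": video_id, "transcript": "hello"}


results, failures = process_all(["ok", "no_captions", "broken"], fake_fetch)
print(len(results), failures)
```

Returning the failures alongside the results lets the caller print one clear message per failed video after the run finishes.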

Rate Limiting

The tool respects YouTube's rate limits. If you encounter HTTP 429 (Too Many Requests) errors:

  • The tool will continue processing and get available metadata
  • Transcripts may be unavailable during rate limiting
  • Consider adding delays or processing in smaller batches
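
If you script around the tool, one way to add delays is exponential backoff: wait 1 s, then 2 s, then 4 s before retrying a rate-limited call. The sketch below is an assumption about how a caller might do this, not something the tool is documented to provide; RateLimited and with_backoff are hypothetical names.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def with_backoff(
    call: Callable[[], str],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry call on RateLimited, doubling the wait after each failed attempt."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimited:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")


# Demo: the first two calls are rate limited, the third succeeds.
attempts = {"n": 0}
waits = []  # records the requested delays instead of actually sleeping


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "transcript text"


result = with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Passing the sleep function as a parameter keeps the demo instant; in real use the default time.sleep applies the delays.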

Requirements

  • Python 3.9+
  • yt-dlp
  • youtube-transcript-api
  • click

Limitations

  • Transcript Availability: Not all videos have transcripts available
  • Rate Limiting: YouTube may rate limit requests with large datasets
  • Private Videos: Cannot access private or age-restricted content without authentication
  • API Changes: YouTube's API may change, affecting functionality

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Running Tests

pytest

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Shawn Anderson

Support

If you encounter any issues or have questions:

  • Open an issue on GitHub
  • Check existing issues for solutions
  • Provide detailed error messages and URLs (when possible) for debugging

Changelog

v0.1.0 (2025-08-28)

  • Initial release
  • Support for videos, playlists, channels, and channel playlists
  • Auto-detection of URL types
  • Resume capability for interrupted downloads
  • CSV export with comprehensive metadata
