Extract video information and transcripts from YouTube videos, playlists, and channels
yt-dlp-transcripts
A powerful Python tool for extracting video information and transcripts from YouTube videos, playlists, channels, and channel playlists. Built on top of yt-dlp and youtube-transcript-api.
Features
- 📹 Single Video Processing - Extract metadata and transcripts from individual YouTube videos
- 📚 Playlist Support - Process entire playlists with progress tracking
- 📺 Channel Videos - Download information from all videos on a channel
- 🗂️ Channel Playlists - Process all playlists from a channel
- 🔄 Resume Capability - Automatically skip already processed videos
- 🎯 Auto-Detection - Automatically detect URL type (video/playlist/channel)
- 📊 Rich Metadata - Extract title, description, upload date, duration, view count, and more
- 📝 Transcript Extraction - Get full video transcripts when available
- 💾 CSV Export - Save all data in easily accessible CSV format
Installation
Via pip
pip install yt-dlp-transcripts
# As a command-line tool (after pip install)
yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv
To run from source
git clone https://github.com/yourusername/yt-dlp-transcripts.git
cd yt-dlp-transcripts
poetry install
poetry shell
# With poetry (after poetry install and poetry shell)
python -m yt_dlp_transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv
Usage
yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLAYLIST_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/videos" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/playlists" -o output.csv
Options
| Option | Short | Description | Example |
|---|---|---|---|
| --url | -u | YouTube URL (auto-detects type) | https://youtube.com/... |
| --output | -o | Output CSV file path | output.csv |
| --help | | Show help message | (flag, no value) |
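To run the command-line tool over several URLs from a script, something like the following sketch may help. It uses only the --url and --output options from the table above. The helper names build_command and run_all are hypothetical, and treating a non-zero exit code as a failure is an assumption, since the tool's exit codes are not documented here.

```python
import subprocess


def build_command(url, output_path):
    """Build the argv list for one yt-dlp-transcripts run (long options)."""
    return ["yt-dlp-transcripts", "--url", url, "--output", output_path]


def run_all(jobs):
    """Run the CLI once per (url, output_path) pair.

    Returns the pairs whose process exited with a non-zero code.
    Assumption: the tool signals failure through its exit code.
    """
    failed = []
    for url, output_path in jobs:
        result = subprocess.run(build_command(url, output_path))
        if result.returncode != 0:
            failed.append((url, output_path))
    return failed
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.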
Output Format
The tool exports data to CSV with the following fields:
Common Fields
- video_id - YouTube video ID
- title - Video title
- url - Video URL
- description - Video description
- transcript - Full video transcript (when available)
- upload_date - Upload date (YYYYMMDD format)
- duration - Video duration in seconds
- view_count - Number of views
- channel - Channel name
- channel_id - Channel ID
Additional Fields for Playlists
- playlist_name - Name of the source playlist
- playlist_url - URL of the source playlist
Additional Fields for Channel Videos
- channel_source_url - URL of the channel page
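As a rough illustration of working with the exported file, the sketch below reads the CSV with the standard library and converts a few of the fields listed above. The helper name load_videos is hypothetical. The code assumes that duration and view_count are written as plain integers and that missing values appear as empty cells; neither is confirmed here.

```python
import csv
from datetime import datetime

# Transcripts can be long; raise the csv module's default per-field limit.
csv.field_size_limit(10_000_000)


def _to_int(value):
    """Convert a CSV cell to int, returning None for empty or non-numeric cells."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def load_videos(csv_path):
    """Read an exported CSV and return a list of dicts with typed fields.

    Hypothetical helper, not part of yt-dlp-transcripts.
    """
    videos = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                # upload_date is documented as YYYYMMDD
                uploaded = datetime.strptime(row.get("upload_date") or "", "%Y%m%d").date()
            except ValueError:
                uploaded = None
            videos.append({
                "video_id": row.get("video_id"),
                "title": row.get("title"),
                "upload_date": uploaded,
                "duration": _to_int(row.get("duration")),
                "view_count": _to_int(row.get("view_count")),
                "has_transcript": bool((row.get("transcript") or "").strip()),
            })
    return videos
```

A typical follow-up would be to filter on has_transcript before any text analysis, since not every video has a transcript.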
Examples
Research and Analysis
# Analyze a conference talk playlist
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLconf2024" -o conference_talks.csv
# Extract all videos from an educational channel
yt-dlp-transcripts -u "https://www.youtube.com/@3blue1brown/videos" -o math_videos.csv
Content Creation
# Get transcripts from your competitor's channel
yt-dlp-transcripts -u "https://www.youtube.com/@competitor/videos" -o competitor_analysis.csv
# Archive your own channel's content
yt-dlp-transcripts -u "https://www.youtube.com/@yourchannel/videos" -o my_backup.csv
Academic Research
# Collect lecture series for analysis
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLlecture" -o lectures.csv
# Get transcripts from multiple related playlists
yt-dlp-transcripts -u "https://www.youtube.com/@university/playlists" -o all_courses.csv
Python API Usage
from yt_dlp_transcripts import (
get_video_info,
process_single_video,
process_playlist,
process_channel,
detect_url_type
)
# Get video information as dictionary
video_data = get_video_info("https://www.youtube.com/watch?v=VIDEO_ID")
print(video_data['title'])
print(video_data['transcript'])
print(video_data['duration'])
# Process content and save to CSV
process_single_video("https://www.youtube.com/watch?v=VIDEO_ID", "output.csv")
process_playlist("https://www.youtube.com/playlist?list=PLAYLIST_ID", "output.csv")
process_channel("https://www.youtube.com/@channel/videos", "output.csv", mode='videos')
# Auto-detect URL type
url_type = detect_url_type("https://www.youtube.com/watch?v=VIDEO_ID") # Returns: 'video'
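How detect_url_type classifies a URL is not described here. As a rough sketch of one way such detection could work, the function below inspects the URL path and query string with the standard library. It is not the library's implementation: guess_url_type is a hypothetical name, and every label other than 'video' is an assumed value.

```python
from urllib.parse import parse_qs, urlparse


def guess_url_type(url):
    """Classify a YouTube URL by its path and query string.

    Hypothetical sketch, not the library's detect_url_type. Labels other
    than 'video' are assumed names.
    """
    parsed = urlparse(url)
    path = parsed.path.rstrip("/")
    query = parse_qs(parsed.query)

    if path == "/playlist" and "list" in query:
        return "playlist"
    if path == "/watch" and "v" in query:
        return "video"
    if parsed.netloc.endswith("youtu.be") and path:
        # Short links carry the video ID in the path
        return "video"
    if path.startswith(("/@", "/channel/", "/c/", "/user/")):
        if path.endswith("/playlists"):
            return "channel_playlists"
        return "channel_videos"
    return "unknown"


print(guess_url_type("https://www.youtube.com/@channelname/playlists"))
```

The sketch does not validate the host name beyond the youtu.be case, so it should only be fed URLs already known to point at YouTube.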
Features in Detail
Resume Capability
The tool automatically tracks processed videos and skips them on subsequent runs. This allows you to:
- Interrupt and resume large downloads
- Update your dataset with only new videos
- Avoid redundant API calls
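The mechanism behind the resume behaviour is not spelled out here. One plausible approach, sketched below under that assumption, is to read the video_id column of the existing output file and skip any ID already present. The helper names load_processed_ids and pending_videos are hypothetical.

```python
import csv
import os

# Transcripts can be long; raise the csv module's default per-field limit.
csv.field_size_limit(10_000_000)


def load_processed_ids(csv_path):
    """Return the set of video_id values already present in the CSV."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {
            row["video_id"]
            for row in csv.DictReader(handle)
            if row.get("video_id")
        }


def pending_videos(video_ids, csv_path):
    """Drop IDs that already have a row in the output file, preserving order."""
    done = load_processed_ids(csv_path)
    return [vid for vid in video_ids if vid not in done]
```

With this design the output file doubles as the progress record, so no separate state file is needed.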
Progress Tracking
When processing multiple videos, the tool shows:
- Current video number and total count
- Video title being processed
- Success/skip status for each video
Error Handling
- Gracefully handles missing transcripts
- Continues processing even if individual videos fail
- Provides clear error messages for troubleshooting
Rate Limiting
The tool respects YouTube's rate limits. If you encounter HTTP 429 (Too Many Requests) errors:
- The tool will continue processing and get available metadata
- Transcripts may be unavailable during rate limiting
- Consider adding delays or processing in smaller batches
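The advice about delays and smaller batches could be sketched as follows. The helper process_in_batches is hypothetical, and the batch size and pause length are arbitrary example values, not limits published by YouTube. The handler argument is any callable that processes one URL.

```python
import time


def process_in_batches(urls, handler, batch_size=20, pause_seconds=60.0, sleep=time.sleep):
    """Call handler(url) for each URL, pausing between batches.

    Failures are collected rather than raised, so one bad URL does not
    stop the run. Returns a list of (url, error message) pairs.
    """
    failures = []
    for start in range(0, len(urls), batch_size):
        if start > 0:
            # Wait before every batch except the first
            sleep(pause_seconds)
        for url in urls[start:start + batch_size]:
            try:
                handler(url)
            except Exception as exc:
                failures.append((url, str(exc)))
    return failures
```

With the Python API shown earlier, handler could be lambda url: process_single_video(url, "output.csv"). Whether that function raises on failure or handles errors internally is not documented here, so the failures list may stay empty even when transcripts are missing.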
Requirements
- Python 3.9+
- yt-dlp
- youtube-transcript-api
- click
Limitations
- Transcript Availability: Not all videos have transcripts available
- Rate Limiting: YouTube may rate-limit requests when many videos are processed in one run
- Private Videos: Cannot access private or age-restricted content without authentication
- API Changes: YouTube's API may change, affecting functionality
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Running Tests
pytest
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of yt-dlp
- Transcript extraction via youtube-transcript-api
- CLI interface powered by click
Author
Support
If you encounter any issues or have questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Provide detailed error messages and URLs (when possible) for debugging
Changelog
v0.1.0 (2025-08-28)
- Initial release
- Support for videos, playlists, channels, and channel playlists
- Auto-detection of URL types
- Resume capability for interrupted downloads
- CSV export with comprehensive metadata