YTScriber
Download YouTube transcripts and manage channel archives with a unified CLI.
Features
- Extract videos from any YouTube channel
- Download transcripts with metadata (title, author, duration, etc.)
- Save as Markdown files with YAML frontmatter for easy processing
- Track progress in CSV files to resume interrupted downloads
- Cross-platform data directories via platformdirs
- Unified CLI with subcommands (ytscriber download, extract, sync-all)
Installation
From PyPI
pip install ytscriber
From source (recommended for development)
git clone https://github.com/dparedesi/YTScribe.git
cd YTScribe
pip install -e .
Development installation
pip install -e ".[dev]"
Quick Start
# Extract videos from a conference channel
ytscriber extract https://www.youtube.com/@AWSEventsChannel/videos \
--count 100 \
--folder aws-reinvent-2025 \
--register-channel
# Download transcripts
ytscriber download --folder aws-reinvent-2025
Usage
Extract videos from a channel
ytscriber extract <channel_url> --count <number> --folder <folder>
Examples:
# AWS re:Invent 2025
ytscriber extract https://www.youtube.com/@AWSEventsChannel/videos \
--count 100 \
--folder aws-reinvent-2025 \
--register-channel
# PyCon US
ytscriber extract https://www.youtube.com/@PyConUS \
--count 50 \
--folder pycon-2024 \
--register-channel
# KubeCon
ytscriber extract https://www.youtube.com/@cncf/videos \
--count 75 \
--folder kubecon-2024 \
--register-channel
Options:
| Option | Description | Default |
|---|---|---|
| --count, -n | Number of latest videos to extract | 10 |
| --folder | Folder under data dir (shorthand for CSV) | - |
| --append-csv | Create or append to CSV file | - |
| --output, -o | Save video IDs to text file | - |
| --register-channel | Add channel to channels.yaml | False |
| --verbose, -v | Enable verbose output | False |
Download transcripts
ytscriber download --folder <folder>
Examples:
# Download transcripts for AWS re:Invent
ytscriber download --folder aws-reinvent-2025
# With faster processing (shorter delay)
ytscriber download --folder pycon-2024 --delay 30
Options:
| Option | Description | Default |
|---|---|---|
| --folder | Folder under data dir (shorthand for CSV/output) | - |
| --csv | Input CSV file with video URLs | - |
| --output-dir | Directory for transcript files | outputs |
| --delay | Seconds between requests | 60 |
| --languages, -l | Language codes to try | en en-US en-GB |
| --verbose, -v | Enable verbose output | False |
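The --languages option takes several codes to try. The sketch below shows one plausible reading of that behavior: walk the preference list in order and take the first code the video offers. The function name pick_language and the first-match rule are assumptions for illustration, not the package's confirmed logic.

```python
def pick_language(preferred, available):
    """Return the first preferred language code that is available, else None."""
    for code in preferred:
        if code in available:
            return code
    return None


# Default preference order taken from the options table above.
DEFAULT_LANGUAGES = ["en", "en-US", "en-GB"]

print(pick_language(DEFAULT_LANGUAGES, {"en-GB", "de"}))  # en-GB
print(pick_language(DEFAULT_LANGUAGES, {"fr"}))           # None
```

Under this reading, listing more specific codes after general ones only matters when the general ones are missing.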
Single video mode:
ytscriber download https://www.youtube.com/watch?v=VIDEO_ID --output transcript.md
Add a video to a collection
ytscriber add <youtube_url> --folder <collection>
Sync all channels
ytscriber sync-all
Download all transcripts
ytscriber download-all
View or edit config
ytscriber config
ytscriber config --set defaults.delay=45
Status
ytscriber status
Output Format
Transcript files (Markdown with YAML frontmatter)
Each transcript is saved as a Markdown file with rich metadata in the YAML frontmatter:
---
video_id: i_cskqmWA3U
video_url: https://www.youtube.com/watch?v=i_cskqmWA3U
title: Personalize ChatGPT with custom instructions
author: OpenAI
published_date: 2025-05-12
length_minutes: 2.17
views: 60882
description: "With custom instructions, ChatGPT becomes more relevant..."
is_generated: True
is_translatable: True
---
You can find memory and custom instructions by clicking on your profile
in the top right, clicking on settings, and then personalization...
Files are named with the publish date for easy sorting: 2025-05-12-i_cskqmWA3U.md
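Because the metadata sits in a delimited block at the top of each file, downstream scripts can separate it from the transcript text. The following is a minimal sketch using only the standard library; split_frontmatter is a hypothetical helper, it handles only flat key: value pairs, and it returns every value as a string. For full YAML support, a real parser such as PyYAML's yaml.safe_load would be the safer choice.

```python
def split_frontmatter(text):
    """Split a transcript file into (metadata dict, body text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: the rest is the transcript body.
            return meta, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat as plain text


sample = """---
video_id: i_cskqmWA3U
video_url: https://www.youtube.com/watch?v=i_cskqmWA3U
title: Personalize ChatGPT with custom instructions
length_minutes: 2.17
---
You can find memory and custom instructions by clicking on your profile"""

meta, body = split_frontmatter(sample)
print(meta["video_id"], meta["length_minutes"])
print(body)
```

Splitting on the first colon only keeps URLs such as video_url intact.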
AI Summarization Setup
To use the AI summarization features, you need an API key from OpenRouter.
- Get an API Key: Sign up at OpenRouter and create a key.
- Configure Environment: Create a .env file in the project root:
  cp .env.example .env
  Add your key:
  OPENROUTER_API_KEY=sk-or-your-key-here
- Recommended Model: By default, the tool uses xiaomi/mimo-v2-flash:free, which is free and fast. You can change this using the --model flag. You can also set defaults with ytscriber config --set summarization.model=....
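To check that the key is visible before running a summarization, something like the sketch below could be used. It is an illustration only: parse_env and get_api_key are hypothetical helpers, and the lookup order (process environment first, then the .env contents) is an assumption rather than a description of how ytscriber loads its configuration.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_text="", environ=os.environ):
    """Prefer the process environment, then fall back to .env contents."""
    return environ.get("OPENROUTER_API_KEY") or parse_env(env_text).get(
        "OPENROUTER_API_KEY"
    )


env_file = "# OpenRouter credentials\nOPENROUTER_API_KEY=sk-or-your-key-here\n"
print(get_api_key(env_file, environ={}))  # sk-or-your-key-here
```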
Summarize transcripts
ytscriber summarize <folder_name> [options]
Examples:
# Summarize the folder named "random"
ytscriber summarize random
# Summarize all folders
ytscriber summarize --all
# Dry run to preview changes
ytscriber summarize random --dry-run
Options:
| Option | Description | Default |
|---|---|---|
| --all | Process all folders | False |
| --dry-run | Preview changes without writing | False |
| --force | Overwrite existing summaries | False |
| --model | OpenRouter model to use | xiaomi/mimo-v2-flash:free |
CSV tracking format
url,title,duration_minutes,view_count,description,transcript_downloaded,summary_done
https://youtube.com/watch?v=...,Talk Title,45.5,1234,Description...,success,
| Column | Description |
|---|---|
| transcript_downloaded | Status: success, error: <reason>, or empty |
| summary_done | Track if you've processed the transcript |
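The tracking file is plain CSV, so it can be inspected with the standard library. As a rough sketch, the snippet below lists the videos whose transcript_downloaded column is still empty. The helper name pending_urls, the sample rows, and the choice to treat only empty values as pending (rows marked error: ... are not retried here) are assumptions for illustration.

```python
import csv
import io


def pending_urls(csv_text):
    """Return URLs whose transcript_downloaded column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["url"]
        for row in reader
        if not (row.get("transcript_downloaded") or "").strip()
    ]


# Made-up rows that follow the header shown above.
sample = (
    "url,title,duration_minutes,view_count,description,"
    "transcript_downloaded,summary_done\n"
    "https://youtube.com/watch?v=aaa,Talk A,45.5,1234,First talk,success,\n"
    "https://youtube.com/watch?v=bbb,Talk B,30.0,567,Second talk,,\n"
    "https://youtube.com/watch?v=ccc,Talk C,12.0,89,Third talk,error: no transcript,\n"
)

print(pending_urls(sample))  # ['https://youtube.com/watch?v=bbb']
```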
Project Structure
YTScribe/
├── src/
│   └── ytscriber/
│       ├── __init__.py          # Package exports
│       ├── cli.py               # Command-line interface
│       ├── downloader.py        # Transcript downloading
│       ├── extractor.py         # Channel video extraction
│       ├── csv_handler.py       # CSV operations
│       ├── metadata.py          # Video metadata extraction
│       ├── models.py            # Data models
│       ├── exceptions.py        # Custom exceptions
│       ├── logging_config.py    # Logging setup
│       └── utils.py             # Utility functions
├── scripts/                     # Automation scripts
├── tests/                       # Unit tests
├── prompts/                     # AI prompts for analysis
├── examples/                    # Example shell scripts
├── pyproject.toml               # Project configuration
└── README.md
Data Organization
By default, data is stored in:
- macOS/Windows: ~/Documents/YTScriber
- Linux: ~/ytscriber
Example structure:
YTScriber/
├── aws-reinvent-2025/
│   ├── videos.csv
│   └── transcripts/
│       ├── 2025-12-03-abc123xyz.md
│       └── 2025-12-03-def456uvw.md
├── pycon-2024/
│   ├── videos.csv
│   └── transcripts/
└── kubecon-eu-2024/
    ├── videos.csv
    └── transcripts/
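For scripts that need to locate a collection on disk, the default locations listed above can be mirrored in a few lines. This is only a sketch: default_data_dir and transcripts_dir are hypothetical helpers, the platform check is a simplification, and it does not reproduce how the package itself resolves paths through platformdirs or any configured override.

```python
import sys
from pathlib import Path


def default_data_dir(platform=sys.platform, home=None):
    """Return the documented default data directory for a platform string."""
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("linux"):
        return home / "ytscriber"
    # macOS reports "darwin" and Windows reports "win32".
    return home / "Documents" / "YTScriber"


def transcripts_dir(folder, data_dir):
    """Return the transcripts directory of one collection folder."""
    return Path(data_dir) / folder / "transcripts"


data_dir = default_data_dir()
print(transcripts_dir("aws-reinvent-2025", data_dir))
```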
Migration (1.x to 2.0)
If you used the old transcript-* commands and a repo-local data/ folder:
- Move your existing data/ folder into the new data directory above.
- Copy channels.yaml into the same data directory if you rely on sync-all.
- Update commands:
  - transcript-extract -> ytscriber extract
  - transcript-download -> ytscriber download
  - transcript-add -> ytscriber add
  - transcript-summarize -> ytscriber summarize
- Optionally set defaults with ytscriber config --set defaults.delay=....
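If older shell scripts call the transcript-* commands, the renames above could be applied mechanically. The sketch below rewrites only the command names. migrate_line is a hypothetical helper, and it assumes the flags carried over unchanged, which the migration notes do not confirm, so review the rewritten lines by hand.

```python
import re

# Old command name -> new subcommand, as listed in the migration steps.
COMMAND_MAP = {
    "transcript-extract": "ytscriber extract",
    "transcript-download": "ytscriber download",
    "transcript-add": "ytscriber add",
    "transcript-summarize": "ytscriber summarize",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COMMAND_MAP) + r")\b"
)


def migrate_line(line):
    """Replace old transcript-* command names; flags are left untouched."""
    return _PATTERN.sub(lambda match: COMMAND_MAP[match.group(1)], line)


print(migrate_line("transcript-download --folder pycon-2024 --delay 30"))
```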
Rate Limiting & Best Practices
YouTube may rate limit or block your IP if you make too many requests:
- Use reasonable delays: The default of 60 seconds between requests is a conservative choice
- Resume capability: Progress is tracked in the CSV file, so an interrupted run can be resumed by running the same command again
- Start small: Test with 10-20 videos before large batches
- Respect limits: If you get blocked, wait 30-60 minutes before retrying
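The delay and resume practices above can be sketched as a small loop: skip anything already recorded as done, pause between requests, and record a status per item. This illustrates the pattern only. process_with_delay, fetch, and the injectable sleep parameter are hypothetical names, not the package's implementation.

```python
import time


def process_with_delay(items, fetch, delay=60, done=None, sleep=time.sleep):
    """Call fetch(item) for each item not in done, pausing between requests."""
    done = set() if done is None else set(done)
    results = {}
    first = True
    for item in items:
        if item in done:
            continue  # already downloaded in an earlier run
        if not first:
            sleep(delay)  # wait before every request except the first
        first = False
        try:
            fetch(item)
            results[item] = "success"
        except Exception as exc:
            results[item] = f"error: {exc}"
    return results


# Demo with a no-op fetch and a fake sleep that records the pauses.
pauses = []
statuses = process_with_delay(
    ["a", "b", "c"], fetch=lambda item: None, done={"a"}, sleep=pauses.append
)
print(statuses, pauses)
```

Passing sleep as a parameter keeps the loop testable without waiting for real delays.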
Development
Setup
# Clone repository
git clone https://github.com/dparedesi/YTScribe.git
cd YTScribe
# Install with development dependencies
pip install -e ".[dev]"
# Install pre-commit hooks (optional)
pre-commit install
Running tests
# Run all tests
pytest
# Run with coverage
pytest --cov=ytscriber
# Run specific test file
pytest tests/test_utils.py
Code quality
# Format code
black src tests
# Lint code
ruff check src tests
# Type checking
mypy src
Programmatic Usage
from ytscriber import TranscriptDownloader, ChannelExtractor

# Extract videos from a channel
extractor = ChannelExtractor()
videos = extractor.extract_videos(
    "https://www.youtube.com/@AWSEventsChannel/videos",
    max_videos=10
)

# Download transcripts
downloader = TranscriptDownloader(
    languages=["en", "en-US"],
    delay=30,
    output_dir="transcripts"
)

for video in videos:
    result = downloader.download(video.video_id, video.url)
    if result.success:
        print(f"Downloaded: {video.title}")
    else:
        print(f"Failed: {result.error_message}")
Troubleshooting
"No transcript found"
- Video may not have captions/transcripts available
- Try with different language codes: --languages en en-US auto
"IP blocked" or rate limiting
- Wait 30-60 minutes before retrying
- Increase delay: --delay 120
- Use different network/IP if persistent
"Could not extract metadata"
- Transcript will still download, just without extra metadata
- Check if video is accessible and not private
Script interrupted
- Just run the same command again - it will skip already downloaded videos
- Progress is saved to CSV after each video
Requirements
- Python 3.9+
- youtube-transcript-api
- yt-dlp
- pytube
License
MIT License - see LICENSE for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run tests and linting (pytest && ruff check .)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request