Download YouTube transcripts and manage channel archives

YTScriber

Download YouTube transcripts and manage channel archives with a unified CLI.

Features

📹 Extract videos from any YouTube channel
📝 Download transcripts with metadata (title, author, duration, etc.)
📄 Save as markdown files with YAML frontmatter for easy processing
🔄 Track progress in CSV files to resume interrupted downloads
Cross-platform data directories via platformdirs
Unified CLI with subcommands (ytscriber download, extract, sync-all)

Installation

From PyPI

pip install ytscriber

From source (recommended for development)

git clone https://github.com/dparedesi/YTScribe.git
cd YTScribe
pip install -e .

Development installation

pip install -e ".[dev]"

Quick Start

# Extract videos from a conference channel
ytscriber extract https://www.youtube.com/@AWSEventsChannel/videos \
  --count 100 \
  --folder aws-reinvent-2025 \
  --register-channel

# Download transcripts
ytscriber download --folder aws-reinvent-2025

Usage

Extract videos from a channel

ytscriber extract <channel_url> --count <number> --folder <folder>

Examples:

# AWS re:Invent 2025
ytscriber extract https://www.youtube.com/@AWSEventsChannel/videos \
  --count 100 \
  --folder aws-reinvent-2025 \
  --register-channel

# PyCon US
ytscriber extract https://www.youtube.com/@PyConUS \
  --count 50 \
  --folder pycon-2024 \
  --register-channel

# KubeCon
ytscriber extract https://www.youtube.com/@cncf/videos \
  --count 75 \
  --folder kubecon-2024 \
  --register-channel

Options:

Option	Description	Default
`--count, -n`	Number of latest videos to extract	10
`--folder`	Folder under data dir (shorthand for CSV)	-
`--append-csv`	Create or append to CSV file	-
`--output, -o`	Save video IDs to text file	-
`--register-channel`	Add channel to channels.yaml	False
`--verbose, -v`	Enable verbose output	False

Download transcripts

ytscriber download --folder <folder>

Examples:

# Download transcripts for AWS re:Invent
ytscriber download --folder aws-reinvent-2025

# With faster processing (shorter delay)
ytscriber download --folder pycon-2024 --delay 30

Options:

Option	Description	Default
`--folder`	Folder under data dir (shorthand for CSV/output)	-
`--csv`	Input CSV file with video URLs	-
`--output-dir`	Directory for transcript files	outputs
`--delay`	Seconds between requests	60
`--languages, -l`	Language codes to try	en en-US en-GB
`--verbose, -v`	Enable verbose output	False

Single video mode:

ytscriber download "https://www.youtube.com/watch?v=VIDEO_ID" --output transcript.md

Add a video to a collection

ytscriber add <youtube_url> --folder <collection>

Sync all channels

ytscriber sync-all

Download all transcripts

ytscriber download-all

View or edit config

ytscriber config
ytscriber config --set defaults.delay=45

Status

ytscriber status

Output Format

Transcript files (Markdown with YAML frontmatter)

Each transcript is saved as a Markdown file with rich metadata in the YAML frontmatter:

---
video_id: i_cskqmWA3U
video_url: https://www.youtube.com/watch?v=i_cskqmWA3U
title: Personalize ChatGPT with custom instructions
author: OpenAI
published_date: 2025-05-12
length_minutes: 2.17
views: 60882
description: "With custom instructions, ChatGPT becomes more relevant..."
is_generated: True
is_translatable: True
---

You can find memory and custom instructions by clicking on your profile
in the top right, clicking on settings, and then personalization...

Files are named with the publish date for easy sorting: 2025-05-12-i_cskqmWA3U.md
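
For downstream processing, the frontmatter can be separated from the transcript body with a few lines of Python. The following is a minimal sketch and not part of YTScriber: `split_frontmatter` and `transcript_filename` are hypothetical helper names, and the parser handles only flat `key: value` lines like those shown above, not full YAML. All values are returned as strings.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a transcript file into (metadata, body).

    Hypothetical helper: handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block present
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter; treat as plain body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def transcript_filename(meta: Dict[str, str]) -> str:
    """Build the date-prefixed file name described above."""
    return f"{meta['published_date']}-{meta['video_id']}.md"


sample = """---
video_id: i_cskqmWA3U
video_url: https://www.youtube.com/watch?v=i_cskqmWA3U
published_date: 2025-05-12
length_minutes: 2.17
---

You can find memory and custom instructions...
"""

meta, body = split_frontmatter(sample)
print(transcript_filename(meta))
```

For real files with quoted or multi-line values, a YAML library would likely be the more robust choice.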

AI Summarization Setup

To use the AI summarization features, you need an API key from OpenRouter.

Get an API Key: Sign up at OpenRouter and create a key.
Configure Environment: Create a .env file in the project root:
```
cp .env.example .env
```
Add your key:
```
OPENROUTER_API_KEY=sk-or-your-key-here
```
Recommended Model: By default, the tool uses xiaomi/mimo-v2-flash:free, which is free and fast. You can change this using the --model flag. You can also set defaults with ytscriber config --set summarization.model=....
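
Before starting a long summarization run, it can help to confirm that the key is actually visible. The following is a minimal sketch, assuming the `.env` file contains plain `KEY=value` lines as in the example above. `parse_env` is a hypothetical helper and says nothing about how YTScriber itself loads the file.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Placeholder content matching the example above; a real script would
# read the text from the .env file instead.
env_text = "# OpenRouter credentials\nOPENROUTER_API_KEY=sk-or-your-key-here\n"
settings = parse_env(env_text)

# Prefer a key already exported in the shell, then fall back to the file.
api_key = os.environ.get("OPENROUTER_API_KEY") or settings.get("OPENROUTER_API_KEY")
print("key configured:", bool(api_key))
```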

Summarize transcripts

ytscriber summarize <folder_name> [options]

Examples:

# Summarize the folder named "random"
ytscriber summarize random

# Summarize all folders
ytscriber summarize --all

# Dry run to preview changes
ytscriber summarize random --dry-run

Options:

Option	Description	Default
`--all`	Process all folders	False
`--dry-run`	Preview changes without writing	False
`--force`	Overwrite existing summaries	False
`--model`	OpenRouter model to use	`xiaomi/mimo-v2-flash:free`

CSV tracking format

url,title,duration_minutes,view_count,description,transcript_downloaded,summary_done
https://youtube.com/watch?v=...,Talk Title,45.5,1234,Description...,success,

Column	Description
`transcript_downloaded`	Status: `success`, `error: <reason>`, or empty
`summary_done`	Track if you've processed the transcript
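
Because progress lives in a plain CSV, it is easy to inspect from a script. The sketch below lists rows whose transcript is still missing or previously failed, using the column names from the header above. `pending_rows` is a hypothetical helper, the sample rows are made up for illustration, and whether the CLI itself retries rows marked `error:` is not stated here.

```python
import csv
import io
from typing import Dict, List


def pending_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows whose transcript is missing or previously failed."""
    return [
        row for row in rows
        if (row.get("transcript_downloaded") or "").strip() != "success"
    ]


# Made-up sample data in the documented column layout.
sample_csv = io.StringIO(
    "url,title,duration_minutes,view_count,description,"
    "transcript_downloaded,summary_done\n"
    "https://youtube.com/watch?v=aaa,Talk A,45.5,1234,Desc,success,\n"
    "https://youtube.com/watch?v=bbb,Talk B,30.0,567,Desc,error: no transcript,\n"
    "https://youtube.com/watch?v=ccc,Talk C,12.0,89,Desc,,\n"
)

rows = list(csv.DictReader(sample_csv))
for row in pending_rows(rows):
    print(row["title"], "->", row["transcript_downloaded"] or "not attempted")
```

To run this against a real collection, replace the `io.StringIO` object with an open handle to the folder's `videos.csv`.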

Project Structure

YTScribe/
├── src/
│   └── ytscriber/
│       ├── __init__.py          # Package exports
│       ├── cli.py               # Command-line interface
│       ├── downloader.py        # Transcript downloading
│       ├── extractor.py         # Channel video extraction
│       ├── csv_handler.py       # CSV operations
│       ├── metadata.py          # Video metadata extraction
│       ├── models.py            # Data models
│       ├── exceptions.py        # Custom exceptions
│       ├── logging_config.py    # Logging setup
│       └── utils.py             # Utility functions
├── scripts/                     # Automation scripts
├── tests/                       # Unit tests
├── prompts/                     # AI prompts for analysis
├── examples/                    # Example shell scripts
├── pyproject.toml               # Project configuration
└── README.md

Data Organization

By default, data is stored in:

macOS/Windows: ~/Documents/YTScriber
Linux: ~/ytscriber

Example structure:

YTScriber/
├── aws-reinvent-2025/
│   ├── videos.csv
│   └── transcripts/
│       ├── 2025-12-03-abc123xyz.md
│       └── 2025-12-03-def456uvw.md
├── pycon-2024/
│   ├── videos.csv
│   └── transcripts/
└── kubecon-2024/
    ├── videos.csv
    └── transcripts/
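
Scripts that post-process a collection need to locate these paths. The following is a minimal sketch that mirrors the defaults and layout described above. `default_data_dir` and `collection_paths` are hypothetical helpers, not YTScriber functions, and the sketch ignores any override set through `ytscriber config`.

```python
import platform
from pathlib import Path
from typing import Tuple


def default_data_dir(system: str, home: Path) -> Path:
    """Mirror the documented defaults (hypothetical helper)."""
    if system in ("Darwin", "Windows"):
        return home / "Documents" / "YTScriber"
    return home / "ytscriber"


def collection_paths(data_dir: Path, folder: str) -> Tuple[Path, Path]:
    """Return (CSV path, transcripts directory) for one collection."""
    base = data_dir / folder
    return base / "videos.csv", base / "transcripts"


# platform.system() returns "Darwin" on macOS, "Windows", or "Linux".
data_dir = default_data_dir(platform.system(), Path.home())
csv_path, transcripts_dir = collection_paths(data_dir, "aws-reinvent-2025")
print(csv_path)
print(transcripts_dir)
```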

Migration (1.x to 2.0)

If you used the old transcript-* commands and a repo-local data/ folder:

Move your existing data/ folder into the new data directory above.
Copy channels.yaml into the same data directory if you rely on sync-all.
Update commands:
- transcript-extract -> ytscriber extract
- transcript-download -> ytscriber download
- transcript-add -> ytscriber add
- transcript-summarize -> ytscriber summarize
Optionally set defaults with ytscriber config --set defaults.delay=....
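
If the old commands are embedded in shell scripts, the renames listed above can be applied mechanically. This is a minimal sketch with a hypothetical `migrate_commands` helper. It only swaps the command names and assumes the arguments carry over unchanged, so review the result before running it.

```python
import re

# Old 1.x entry points and their replacements, as listed above.
RENAMES = {
    "transcript-extract": "ytscriber extract",
    "transcript-download": "ytscriber download",
    "transcript-add": "ytscriber add",
    "transcript-summarize": "ytscriber summarize",
}

_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_commands(script: str) -> str:
    """Replace old command names in shell-script text (hypothetical helper)."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], script)


old_script = "transcript-extract <channel_url>\ntranscript-add <youtube_url>\n"
print(migrate_commands(old_script), end="")
```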

Rate Limiting & Best Practices

YouTube may rate limit or block your IP if you make too many requests:

Use reasonable delays: The default of 60 seconds between requests is a conservative starting point
Resume capability: Progress is tracked in the CSV, so an interrupted run can be resumed
Start small: Test with 10-20 videos before large batches
Respect limits: If you get blocked, wait 30-60 minutes before retrying
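
The same pacing idea applies to custom scripts that loop over many videos. The sketch below shows a generic fixed-delay loop. `process_with_delay` is a hypothetical helper illustrating the pattern, not YTScriber's internal implementation, and the `sleep` parameter exists only so the pause can be swapped out in tests.

```python
import time
from typing import Callable, Iterable, List


def process_with_delay(
    items: Iterable[str],
    handler: Callable[[str], None],
    delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call handler on each item, pausing `delay` seconds between calls."""
    done: List[str] = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        handler(item)
        done.append(item)
    return done


# Demo with a zero delay so the example finishes immediately.
process_with_delay(["video-a", "video-b"], handler=print, delay=0)
```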

Development

Setup

# Clone repository
git clone https://github.com/dparedesi/YTScribe.git
cd YTScribe

# Install with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks (optional)
pre-commit install

Running tests

# Run all tests
pytest

# Run with coverage
pytest --cov=ytscriber

# Run specific test file
pytest tests/test_utils.py

Code quality

# Format code
black src tests

# Lint code
ruff check src tests

# Type checking
mypy src

Programmatic Usage

from ytscriber import TranscriptDownloader, ChannelExtractor

# Extract videos from a channel
extractor = ChannelExtractor()
videos = extractor.extract_videos(
    "https://www.youtube.com/@AWSEventsChannel/videos",
    max_videos=10
)

# Download transcripts
downloader = TranscriptDownloader(
    languages=["en", "en-US"],
    delay=30,
    output_dir="transcripts"
)

for video in videos:
    result = downloader.download(video.video_id, video.url)
    if result.success:
        print(f"Downloaded: {video.title}")
    else:
        print(f"Failed: {result.error_message}")

Troubleshooting

"No transcript found"

Video may not have captions/transcripts available
Try with different language codes: --languages en en-US auto

"IP blocked" or rate limiting

Wait 30-60 minutes before retrying
Increase delay: --delay 120
Use different network/IP if persistent

"Could not extract metadata"

Transcript will still download, just without extra metadata
Check if video is accessible and not private

Script interrupted

Just run the same command again - it will skip already downloaded videos
Progress is saved to CSV after each video

Requirements

Python 3.9+
youtube-transcript-api
yt-dlp
pytube

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Run tests and linting (pytest && ruff check .)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Release history

1.4.0 (May 10, 2026)
1.2.2 (Jan 11, 2026)
1.2.1 (Jan 11, 2026)
1.2.0 (Jan 11, 2026, this version)

Download files

Source Distribution

ytscriber-1.2.0.tar.gz (42.9 kB)

Uploaded Jan 11, 2026 Source

Built Distribution

ytscriber-1.2.0-py3-none-any.whl (36.2 kB)

Uploaded Jan 11, 2026 Python 3

File details

Details for the file ytscriber-1.2.0.tar.gz.

File metadata

Download URL: ytscriber-1.2.0.tar.gz
Upload date: Jan 11, 2026
Size: 42.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for ytscriber-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8f455bd0a3c38c2aa9ed44ae88b869b675eb601f580ef0ccaaecd59b74a7d187`
MD5	`feaf79ad29df4dd7a348e13ed30ce5ed`
BLAKE2b-256	`15760435b39aaa1d6c24d6783cf201cbefb6b09bc2f6c679aeef745d12610872`

File details

Details for the file ytscriber-1.2.0-py3-none-any.whl.

File metadata

Download URL: ytscriber-1.2.0-py3-none-any.whl
Upload date: Jan 11, 2026
Size: 36.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for ytscriber-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a9c19e0ed4a5e7d31c74fe414ccd7cee6ffd23b73099786391bd25ed2a975f3`
MD5	`539d0f9e1b159a9a7825e2270d72e293`
BLAKE2b-256	`19ac00ddeabc4956a34fdd5cc9e7c84580bcec7fe7552a917fa7176c22f021e4`
