A Python tool for downloading and transcribing videos from YouTube/Bilibili and local media files

Project description

ReadVideo

A modern Python-based video and audio transcription tool that extracts and transcribes content from YouTube, Bilibili, and local media files. This project is a complete rewrite of the original bash script with improved modularity, performance, and user experience.

🚀 Features

Multi-Platform Support

YouTube: Prioritizes existing subtitles, falls back to audio transcription
Bilibili: Automatically downloads and transcribes audio using BBDown
Local Files: Supports various audio and video file formats

Intelligent Processing

Subtitle Priority: YouTube videos prioritize youtube-transcript-api for existing subtitles
Multi-Language Support: Supports Chinese, English, and more with auto-detection or manual specification
Fallback Mechanism: Automatically falls back to audio transcription when subtitles are unavailable

High Performance

Tool Reuse: Directly calls installed whisper-cli for native performance
Model Reuse: Utilizes existing models in ~/.whisper-models/ directory
Efficient Processing: Smart temporary file management and cleanup

📦 Installation

Prerequisites

Python 3.11+
ffmpeg (system installation)
whisper-cli (from whisper.cpp)
yt-dlp (Python package, included)
BBDown (optional, for Bilibili support)

Install with uv

# Install dependencies
uv sync

# Or install the package directly
uv pip install -e .

System Dependencies

# macOS
brew install ffmpeg whisper-cpp

# Ubuntu/Debian
sudo apt install ffmpeg
# Install whisper.cpp from source: https://github.com/ggerganov/whisper.cpp

# Download Whisper model (if not already present)
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin to ~/.whisper-models/

🎯 Quick Start

Basic Usage

# YouTube video (prioritizes subtitles)
readvideo https://www.youtube.com/watch?v=abc123

# Auto language detection
readvideo --auto-detect https://www.youtube.com/watch?v=abc123

# Bilibili video
readvideo https://www.bilibili.com/video/BV1234567890

# Local audio file
readvideo ~/Music/podcast.mp3

# Local video file
readvideo ~/Videos/lecture.mp4

# Custom output directory
readvideo input.mp4 --output-dir ./transcripts

# Show information only
readvideo input.mp4 --info-only

Command Line Options

Options:
  --auto-detect              Enable automatic language detection (default: Chinese)
  --output-dir, -o PATH      Output directory (default: current directory or input file directory)
  --no-cleanup               Do not clean up temporary files
  --info-only                Show input information only, do not process
  --whisper-model PATH       Path to Whisper model file [default: ~/.whisper-models/ggml-large-v3.bin]
  --verbose, -v              Verbose output
  --proxy TEXT               HTTP proxy address (e.g., http://127.0.0.1:8080)
  --help                     Show this message and exit

🏗️ Architecture

Project Structure

readvideo/
├── pyproject.toml              # Project configuration
├── README.md                   # Project documentation
└── src/readvideo/
    ├── __init__.py             # Package initialization
    ├── cli.py                  # CLI entry point
    ├── core/                   # Core functionality modules
    │   ├── transcript_fetcher.py   # YouTube subtitle fetcher
    │   ├── whisper_wrapper.py      # whisper-cli wrapper
    │   └── audio_processor.py      # Audio processor
    └── platforms/              # Platform handlers
        ├── youtube.py          # YouTube handler
        ├── bilibili.py         # Bilibili handler
        └── local.py            # Local file handler

Core Dependencies

youtube-transcript-api: YouTube subtitle extraction
yt-dlp: YouTube video downloading
click: Command-line interface
rich: Beautiful console output
tenacity: Retry mechanisms
ffmpeg: Audio processing (system dependency)
whisper-cli: Speech transcription (system dependency)

🔧 How It Works

YouTube Processing

Subtitle Priority: Attempts to fetch existing subtitles using youtube-transcript-api
Language Preference: Prioritizes Chinese (zh, zh-Hans, zh-Hant), then English
Fallback: If no subtitles available, downloads audio with yt-dlp
Transcription: Converts audio to WAV and transcribes with whisper-cli

Bilibili Processing

Audio Download: Uses BBDown to extract audio from Bilibili videos
Format Conversion: Converts audio to WAV format using ffmpeg
Transcription: Processes audio with whisper-cli

Local File Processing

Format Detection: Automatically detects audio vs video files
Audio Extraction: Extracts audio tracks from video files using ffmpeg
Format Conversion: Converts to whisper-compatible WAV format
Transcription: Processes with whisper-cli

📋 Supported Formats

Audio Formats

MP3, M4A, WAV, FLAC, OGG, AAC, WMA

Video Formats

MP4, MKV, AVI, MOV, WMV, FLV, WEBM, M4V

🛠️ Configuration

Whisper Model Configuration

# Default model path
~/.whisper-models/ggml-large-v3.bin

# Custom model
readvideo input.mp4 --whisper-model /path/to/model.bin

Language Options

--auto-detect: Automatic language detection
Default: Chinese (zh)
YouTube subtitles support multi-language priority

🧪 Testing

Test Examples

# YouTube video with subtitles
readvideo "https://www.youtube.com/watch?v=JdKVJH3xmlU" --info-only

# Bilibili video
readvideo "https://www.bilibili.com/video/BV1Tjt9zJEdw" --info-only

# Test local file format support
echo "test" > test.txt
readvideo test.txt --info-only  # Should show format error

Debugging

# Verbose output
readvideo input.mp4 --verbose

# Keep temporary files
readvideo input.mp4 --no-cleanup --verbose

# Information only (no processing)
readvideo input.mp4 --info-only

⚡ Performance

Speed Comparison

Operation	Time	Notes
YouTube subtitle fetch	~3-5s	When subtitles available
YouTube audio download	~30s-2min	Depends on video length
Audio conversion	~5-15s	Depends on file size
Whisper transcription	~0.1-0.5x video length	Depends on model and audio length

Performance Features

Subtitle Priority: 10-100x faster than audio transcription for YouTube
Native Tools: Direct whisper-cli calls maintain original performance
Smart Caching: Reuses existing models and temporary files efficiently

🚨 Troubleshooting

Common Issues

1. whisper-cli not found

# Solution: Install whisper.cpp
brew install whisper-cpp  # macOS
# Or compile from source: https://github.com/ggerganov/whisper.cpp

2. ffmpeg not found

# Solution: Install ffmpeg
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu/Debian

3. Model file missing

# Solution: Download whisper model
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin from whisper.cpp releases

4. YouTube IP restrictions

The tool automatically falls back to audio download when subtitle API is blocked
Consider using a proxy with --proxy option if needed
Wait some time and retry

5. BBDown not found (Bilibili only)

Download from BBDown GitHub
Ensure it's in your PATH

Error Handling

Graceful Fallbacks: YouTube subtitle failures automatically retry with audio transcription
Intelligent Retries: Network issues are retried automatically, but IP blocks are not
Clear Error Messages: Descriptive error messages with suggested solutions
Cleanup on Failure: Temporary files are cleaned up even if processing fails

🔒 Security Notes

Cookie Usage

Browser cookies are used only for video downloads (yt-dlp), not for subtitle API calls
This follows security recommendations from the youtube-transcript-api maintainer
Cookies help bypass some YouTube download restrictions

Privacy

No data is sent to external services except for downloading content
All processing happens locally on your machine
Temporary files are automatically cleaned up

🤝 Contributing

This project replaces a bash script with a modern Python implementation. Key design principles:

Maintain Compatibility: Same functionality as the original bash script
Improve Performance: Leverage existing tools efficiently
Better UX: Rich console output and clear error messages
Extensible: Modular design for easy platform additions

Adding New Platforms

Create a new handler in platforms/
Implement validate_url(), process(), and get_info() methods
Add detection logic in CLI

Adding New Formats

Update format lists in AudioProcessor
Add corresponding ffmpeg parameters
Test with sample files

📄 License

This project maintains compatibility with the original bash script while providing a modern Python implementation focused on performance, reliability, and user experience.

🙏 Acknowledgments

whisper.cpp for high-performance speech recognition
yt-dlp for robust video downloading
youtube-transcript-api for subtitle extraction
BBDown for Bilibili support

Project details

Release history Release notifications | RSS feed

0.1.1

Aug 15, 2025

This version

0.1.0

Aug 15, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

readvideo-0.1.0.tar.gz (17.5 kB view details)

Uploaded Aug 15, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

readvideo-0.1.0-py3-none-any.whl (25.4 kB view details)

Uploaded Aug 15, 2025 Python 3

File details

Details for the file readvideo-0.1.0.tar.gz.

File metadata

Download URL: readvideo-0.1.0.tar.gz
Upload date: Aug 15, 2025
Size: 17.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.8

File hashes

Hashes for readvideo-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`01f7cd7a1546956037f5cca64b7cc1da65daece4854225a8509a56445cb1dcc8`
MD5	`cccd3c3302f1bfbc2bba94ce64e3c81a`
BLAKE2b-256	`d8eb41e7bd21602860282e9ed393fc588e940e7ed070b33edc5b419acaba58ee`

See more details on using hashes here.

File details

Details for the file readvideo-0.1.0-py3-none-any.whl.

File metadata

Download URL: readvideo-0.1.0-py3-none-any.whl
Upload date: Aug 15, 2025
Size: 25.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.8

File hashes

Hashes for readvideo-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`832a7673c5ee0d1eff2f2527ca6b8021bde23f9642c251961283a8c9ffb07f21`
MD5	`699b89ad1fab11e38140b383a75708cc`
BLAKE2b-256	`8004f10500a281798789dba639d2e689131acdf78fee07c45d271d2b8c150772`

See more details on using hashes here.

readvideo 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

ReadVideo

🚀 Features

Multi-Platform Support

Intelligent Processing

High Performance

📦 Installation

Prerequisites

Install with uv

System Dependencies

🎯 Quick Start

Basic Usage

Command Line Options

🏗️ Architecture

Project Structure

Core Dependencies

🔧 How It Works

YouTube Processing

Bilibili Processing

Local File Processing

📋 Supported Formats

Audio Formats

Video Formats

🛠️ Configuration

Whisper Model Configuration

Language Options

🧪 Testing

Test Examples

Debugging

⚡ Performance

Speed Comparison

Performance Features

🚨 Troubleshooting

Common Issues

1. whisper-cli not found

2. ffmpeg not found

3. Model file missing

4. YouTube IP restrictions

5. BBDown not found (Bilibili only)

Error Handling

🔒 Security Notes

Cookie Usage

Privacy

🤝 Contributing

Adding New Platforms

Adding New Formats

📄 License

🙏 Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes