A Python tool for downloading and transcribing videos from YouTube/Bilibili and local media files
ReadVideo
A modern Python-based video and audio transcription tool that extracts and transcribes content from YouTube, Bilibili, and local media files. This project is a complete rewrite of the original bash script with improved modularity, performance, and user experience.
Features
Multi-Platform Support
- YouTube: Prioritizes existing subtitles, falls back to audio transcription
- Bilibili: Automatically downloads and transcribes audio using BBDown
- Local Files: Supports various audio and video file formats
Intelligent Processing
- Subtitle Priority: YouTube videos prioritize youtube-transcript-api for existing subtitles
- Multi-Language Support: Supports Chinese, English, and more with auto-detection or manual specification
- Fallback Mechanism: Automatically falls back to audio transcription when subtitles are unavailable
High Performance
- Tool Reuse: Directly calls installed whisper-cli for native performance
- Model Reuse: Utilizes existing models in the ~/.whisper-models/ directory
- Efficient Processing: Smart temporary file management and cleanup
Installation
Prerequisites
- Python 3.11+
- ffmpeg (system installation)
- whisper-cli (from whisper.cpp)
- yt-dlp (Python package, included)
- BBDown (optional, for Bilibili support)
Install with uv
# Install dependencies
uv sync
# Or install the package directly
uv pip install -e .
System Dependencies
# macOS
brew install ffmpeg whisper-cpp
# Ubuntu/Debian
sudo apt install ffmpeg
# Install whisper.cpp from source: https://github.com/ggerganov/whisper.cpp
# Download Whisper model (if not already present)
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin to ~/.whisper-models/
Quick Start
Basic Usage
# YouTube video (prioritizes subtitles)
readvideo https://www.youtube.com/watch?v=abc123
# Auto language detection
readvideo --auto-detect https://www.youtube.com/watch?v=abc123
# Bilibili video
readvideo https://www.bilibili.com/video/BV1234567890
# Local audio file
readvideo ~/Music/podcast.mp3
# Local video file
readvideo ~/Videos/lecture.mp4
# Custom output directory
readvideo input.mp4 --output-dir ./transcripts
# Show information only
readvideo input.mp4 --info-only
Command Line Options
Options:
  --auto-detect          Enable automatic language detection (without this flag, Chinese is assumed)
  --output-dir, -o PATH  Output directory (default: current directory or input file directory)
  --no-cleanup           Do not clean up temporary files
  --info-only            Show input information only, do not process
  --whisper-model PATH   Path to Whisper model file [default: ~/.whisper-models/ggml-large-v3.bin]
  --verbose, -v          Verbose output
  --proxy TEXT           HTTP proxy address (e.g., http://127.0.0.1:8080)
  --help                 Show this message and exit
Architecture
Project Structure
readvideo/
├── pyproject.toml                  # Project configuration
├── README.md                       # Project documentation
└── src/readvideo/
    ├── __init__.py                 # Package initialization
    ├── cli.py                      # CLI entry point
    ├── core/                       # Core functionality modules
    │   ├── transcript_fetcher.py   # YouTube subtitle fetcher
    │   ├── whisper_wrapper.py      # whisper-cli wrapper
    │   └── audio_processor.py      # Audio processor
    └── platforms/                  # Platform handlers
        ├── youtube.py              # YouTube handler
        ├── bilibili.py             # Bilibili handler
        └── local.py                # Local file handler
Core Dependencies
- youtube-transcript-api: YouTube subtitle extraction
- yt-dlp: YouTube video downloading
- click: Command-line interface
- rich: Beautiful console output
- tenacity: Retry mechanisms
- ffmpeg: Audio processing (system dependency)
- whisper-cli: Speech transcription (system dependency)
How It Works
YouTube Processing
- Subtitle Priority: Attempts to fetch existing subtitles using youtube-transcript-api
- Language Preference: Prioritizes Chinese (zh, zh-Hans, zh-Hant), then English
- Fallback: If no subtitles are available, downloads audio with yt-dlp
- Transcription: Converts audio to WAV and transcribes with whisper-cli
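The subtitle-first flow above can be sketched as a small function. This is an illustration only, not ReadVideo's actual code: `get_transcript`, `fetch_subtitles`, and `transcribe_audio` are hypothetical names, and the two callables stand in for the youtube-transcript-api lookup and the yt-dlp plus whisper-cli path so the control flow can be shown without depending on either library.

```python
from typing import Callable, Optional, Sequence

# Language order taken from the steps above: Chinese variants first, then English.
PREFERRED_LANGUAGES = ("zh", "zh-Hans", "zh-Hant", "en")


def get_transcript(
    video_id: str,
    fetch_subtitles: Callable[[str, Sequence[str]], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, source), where source is "subtitles" or "whisper"."""
    try:
        text = fetch_subtitles(video_id, PREFERRED_LANGUAGES)
    except Exception:
        # Any subtitle failure (none published, API blocked) triggers the fallback.
        text = None
    if text:
        return text, "subtitles"
    # Hypothetical fallback: download the audio and run speech transcription.
    return transcribe_audio(video_id), "whisper"
```

Passing the two steps in as callables keeps the fallback decision in one place and makes it easy to exercise without network access.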
Bilibili Processing
- Audio Download: Uses BBDown to extract audio from Bilibili videos
- Format Conversion: Converts audio to WAV format using ffmpeg
- Transcription: Processes audio with whisper-cli
Local File Processing
- Format Detection: Automatically detects audio vs video files
- Audio Extraction: Extracts audio tracks from video files using ffmpeg
- Format Conversion: Converts to whisper-compatible WAV format
- Transcription: Processes with whisper-cli
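As a rough sketch of the extraction and conversion steps, the function below builds an ffmpeg command that produces 16 kHz mono 16-bit PCM WAV, the input format used in whisper.cpp's own examples. The function name is hypothetical, and the exact parameters ReadVideo passes to ffmpeg are an assumption here.

```python
from pathlib import Path


def build_ffmpeg_wav_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the target if it already exists
        "-i", str(source),    # input audio or video file
        "-vn",                # drop any video stream, keep audio only
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(target),
    ]
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Because `-vn` simply ignores video streams, the same command works for both audio and video inputs.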
Supported Formats
Audio Formats
- MP3, M4A, WAV, FLAC, OGG, AAC, WMA
Video Formats
- MP4, MKV, AVI, MOV, WMV, FLV, WEBM, M4V
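One simple way to tell audio from video inputs is to look at the file extension. The sketch below uses the two format lists above; the names are hypothetical, and whether ReadVideo detects formats by extension or by inspecting the file is an assumption.

```python
from pathlib import Path

# Extensions copied from the supported-format lists above.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm", ".m4v"}


def classify_media(path: str) -> str:
    """Return "audio" or "video" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported file format: {suffix or path}")
```

With this sketch, a file such as `test.txt` raises `ValueError`, which matches the format error described in the testing examples.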
Configuration
Whisper Model Configuration
# Default model path
~/.whisper-models/ggml-large-v3.bin
# Custom model
readvideo input.mp4 --whisper-model /path/to/model.bin
Language Options
- --auto-detect: Automatic language detection
- Default: Chinese (zh)
- YouTube subtitles support multi-language priority
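To illustrate how the model path and language options might translate into a whisper-cli call, here is a minimal sketch. The function name and argument layout are hypothetical; the flags (`-m`, `-f`, `-l`, `-otxt`, `-of`) are whisper.cpp CLI options, but which of them ReadVideo actually passes is an assumption.

```python
from pathlib import Path

# Default model location documented above.
DEFAULT_MODEL = Path.home() / ".whisper-models" / "ggml-large-v3.bin"


def build_whisper_command(
    wav_path: Path,
    output_stem: Path,
    model: Path = DEFAULT_MODEL,
    auto_detect: bool = False,
) -> list[str]:
    """Build a whisper-cli command that writes <output_stem>.txt."""
    # Chinese is the documented default; "auto" asks whisper.cpp to detect the language.
    language = "auto" if auto_detect else "zh"
    return [
        "whisper-cli",
        "-m", str(model),         # model file
        "-f", str(wav_path),      # input WAV file
        "-l", language,           # spoken language
        "-otxt",                  # write a plain-text transcript
        "-of", str(output_stem),  # output path without extension
    ]
```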
Testing
Test Examples
# YouTube video with subtitles
readvideo "https://www.youtube.com/watch?v=JdKVJH3xmlU" --info-only
# Bilibili video
readvideo "https://www.bilibili.com/video/BV1Tjt9zJEdw" --info-only
# Test local file format support
echo "test" > test.txt
readvideo test.txt --info-only # Should show format error
Debugging
# Verbose output
readvideo input.mp4 --verbose
# Keep temporary files
readvideo input.mp4 --no-cleanup --verbose
# Information only (no processing)
readvideo input.mp4 --info-only
Performance
Speed Comparison
| Operation | Time | Notes |
|---|---|---|
| YouTube subtitle fetch | ~3-5s | When subtitles available |
| YouTube audio download | ~30s-2min | Depends on video length |
| Audio conversion | ~5-15s | Depends on file size |
| Whisper transcription | ~0.1-0.5x video length | Depends on model and audio length |
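As a worked example of the last row, the helper below turns the 0.1x to 0.5x range into an estimate in minutes. The function is hypothetical and only restates the table's arithmetic; actual times depend on the model and hardware.

```python
def estimate_transcription_minutes(video_minutes: float) -> tuple[float, float]:
    """Return the (fastest, slowest) estimate using the 0.1x-0.5x range above."""
    return 0.1 * video_minutes, 0.5 * video_minutes
```

For a 60-minute video this gives roughly 6 to 30 minutes of transcription time.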
Performance Features
- Subtitle Priority: 10-100x faster than audio transcription for YouTube
- Native Tools: Direct whisper-cli calls maintain original performance
- Smart Caching: Reuses existing models and temporary files efficiently
Troubleshooting
Common Issues
1. whisper-cli not found
# Solution: Install whisper.cpp
brew install whisper-cpp # macOS
# Or compile from source: https://github.com/ggerganov/whisper.cpp
2. ffmpeg not found
# Solution: Install ffmpeg
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu/Debian
3. Model file missing
# Solution: Download whisper model
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin from whisper.cpp releases
4. YouTube IP restrictions
- The tool automatically falls back to audio download when subtitle API is blocked
- Consider using a proxy with the --proxy option if needed
- Wait some time and retry
5. BBDown not found (Bilibili only)
- Download from BBDown GitHub
- Ensure it's in your PATH
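Several of the issues above come down to an executable missing from PATH. A quick way to check is sketched below, assuming the executables are named `ffmpeg`, `whisper-cli`, and `BBDown`; the helper name is hypothetical and not part of ReadVideo.

```python
import shutil
from typing import Callable, Iterable, Optional

REQUIRED_TOOLS = ("ffmpeg", "whisper-cli")
OPTIONAL_TOOLS = ("BBDown",)  # only needed for Bilibili


def find_missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `find_missing_tools(REQUIRED_TOOLS + OPTIONAL_TOOLS)` returns an empty list when everything is installed.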
Error Handling
- Graceful Fallbacks: YouTube subtitle failures automatically retry with audio transcription
- Intelligent Retries: Network issues are retried automatically, but IP blocks are not
- Clear Error Messages: Descriptive error messages with suggested solutions
- Cleanup on Failure: Temporary files are cleaned up even if processing fails
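The retry behaviour described above (retry transient network errors, give up immediately on IP blocks) can be sketched in plain Python. This is a stand-in for illustration, not the project's tenacity configuration; `IPBlockedError` and `retry_transient` are hypothetical names.

```python
import time


class IPBlockedError(Exception):
    """Hypothetical error raised when the remote service blocks the client IP."""


def retry_transient(operation, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(), retrying network errors but never IP blocks."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except IPBlockedError:
            # Retrying from the same IP will not help, so fail immediately.
            raise
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            sleep(delay_seconds * attempt)  # linear backoff between attempts
```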
Security Notes
Cookie Usage
- Browser cookies are used only for video downloads (yt-dlp), not for subtitle API calls
- This follows security recommendations from the youtube-transcript-api maintainer
- Cookies help bypass some YouTube download restrictions
Privacy
- No data is sent to external services except for downloading content
- All processing happens locally on your machine
- Temporary files are automatically cleaned up
Contributing
This project replaces a bash script with a modern Python implementation. Key design principles:
- Maintain Compatibility: Same functionality as the original bash script
- Improve Performance: Leverage existing tools efficiently
- Better UX: Rich console output and clear error messages
- Extensible: Modular design for easy platform additions
Adding New Platforms
- Create a new handler in platforms/
- Implement validate_url(), process(), and get_info() methods
- Add detection logic in the CLI
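A new handler might look roughly like the skeleton below. Only the three method names come from the steps above; the base class, the method signatures, the `ExampleHandler` class, its host name, and `detect_handler` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional
from urllib.parse import urlparse


class PlatformHandler(ABC):
    """Hypothetical base class; the real project may not define one."""

    @abstractmethod
    def validate_url(self, url: str) -> bool: ...

    @abstractmethod
    def get_info(self, url: str) -> dict: ...

    @abstractmethod
    def process(self, url: str, output_dir: str) -> str: ...


class ExampleHandler(PlatformHandler):
    HOSTS = {"video.example.com"}  # placeholder host, not a real platform

    def validate_url(self, url: str) -> bool:
        return urlparse(url).hostname in self.HOSTS

    def get_info(self, url: str) -> dict:
        return {"platform": "example", "url": url}

    def process(self, url: str, output_dir: str) -> str:
        # Download audio, convert to WAV, and transcribe would go here.
        raise NotImplementedError


def detect_handler(
    url: str, handlers: Iterable[PlatformHandler]
) -> Optional[PlatformHandler]:
    """Return the first handler that accepts the URL, or None."""
    for handler in handlers:
        if handler.validate_url(url):
            return handler
    return None
```

The detection step is then a loop over registered handlers, so adding a platform only requires appending one more handler instance.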
Adding New Formats
- Update format lists in AudioProcessor
- Add corresponding ffmpeg parameters
- Test with sample files
License
This project maintains compatibility with the original bash script while providing a modern Python implementation focused on performance, reliability, and user experience.
Acknowledgments
- whisper.cpp for high-performance speech recognition
- yt-dlp for robust video downloading
- youtube-transcript-api for subtitle extraction
- BBDown for Bilibili support