
A Python tool for downloading and transcribing videos from YouTube/Bilibili and local media files


ReadVideo

A modern Python-based video and audio transcription tool that extracts and transcribes content from YouTube, Bilibili, and local media files. This project is a complete rewrite of the original bash script with improved modularity, performance, and user experience.

🚀 Features

Multi-Platform Support

  • YouTube: Prioritizes existing subtitles, falls back to audio transcription
  • Bilibili: Automatically downloads and transcribes audio using BBDown
  • Local Files: Supports various audio and video file formats

Intelligent Processing

  • Subtitle Priority: For YouTube videos, existing subtitles are fetched first via youtube-transcript-api
  • Multi-Language Support: Supports Chinese, English, and more with auto-detection or manual specification
  • Fallback Mechanism: Automatically falls back to audio transcription when subtitles are unavailable

High Performance

  • Tool Reuse: Directly calls installed whisper-cli for native performance
  • Model Reuse: Utilizes existing models in ~/.whisper-models/ directory
  • Efficient Processing: Smart temporary file management and cleanup

📦 Installation

Prerequisites

  • Python 3.11+
  • ffmpeg (system installation)
  • whisper-cli (from whisper.cpp)
  • yt-dlp (Python package, included)
  • BBDown (optional, for Bilibili support)
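
A quick way to confirm the external tools are on PATH before running anything is sketched below. It is not part of ReadVideo; the helper name `missing_tools` and the split into required and optional tuples are assumptions made for illustration, based on the list above.

```python
import shutil
from typing import Callable, Iterable, Optional

# Tool names taken from the prerequisites list; BBDown is only needed for Bilibili.
REQUIRED_TOOLS = ("ffmpeg", "whisper-cli")
OPTIONAL_TOOLS = ("BBDown",)


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool}")
```

The `which` parameter exists only so the lookup can be replaced in tests.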

Install with uv

# Install dependencies
uv sync

# Or install the package directly
uv pip install -e .

System Dependencies

# macOS
brew install ffmpeg whisper-cpp

# Ubuntu/Debian
sudo apt install ffmpeg
# Install whisper.cpp from source: https://github.com/ggerganov/whisper.cpp

# Download Whisper model (if not already present)
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin to ~/.whisper-models/

🎯 Quick Start

Basic Usage

# YouTube video (prioritizes subtitles)
readvideo https://www.youtube.com/watch?v=abc123

# Auto language detection
readvideo --auto-detect https://www.youtube.com/watch?v=abc123

# Bilibili video
readvideo https://www.bilibili.com/video/BV1234567890

# Local audio file
readvideo ~/Music/podcast.mp3

# Local video file
readvideo ~/Videos/lecture.mp4

# Custom output directory
readvideo input.mp4 --output-dir ./transcripts

# Show information only
readvideo input.mp4 --info-only
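
The commands above accept YouTube URLs, Bilibili URLs, and local paths interchangeably. A minimal sketch of how such an input could be routed is shown here; the function name `detect_input_type`, the returned labels, and the inclusion of the `youtu.be` short domain are assumptions, not ReadVideo's actual detection code.

```python
from urllib.parse import urlparse


def detect_input_type(source: str) -> str:
    """Classify an input as 'youtube', 'bilibili', 'unsupported-url', or 'local'."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        # youtu.be is an assumed extra; the README only shows youtube.com URLs.
        if host in ("youtube.com", "youtu.be") or host.endswith(".youtube.com"):
            return "youtube"
        if host == "bilibili.com" or host.endswith(".bilibili.com"):
            return "bilibili"
        return "unsupported-url"
    # Anything that is not an http(s) URL is treated as a local path.
    return "local"


if __name__ == "__main__":
    for example in (
        "https://www.youtube.com/watch?v=abc123",
        "https://www.bilibili.com/video/BV1234567890",
        "~/Music/podcast.mp3",
    ):
        print(example, "->", detect_input_type(example))
```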

Command Line Options

Options:
  --auto-detect              Enable automatic language detection (without this flag, Chinese is assumed)
  --output-dir, -o PATH      Output directory (default: current directory or input file directory)
  --no-cleanup               Do not clean up temporary files
  --info-only                Show input information only, do not process
  --whisper-model PATH       Path to Whisper model file [default: ~/.whisper-models/ggml-large-v3.bin]
  --verbose, -v              Verbose output
  --proxy TEXT               HTTP proxy address (e.g., http://127.0.0.1:8080)
  --help                     Show this message and exit

🏗️ Architecture

Project Structure

readvideo/
├── pyproject.toml              # Project configuration
├── README.md                   # Project documentation
└── src/readvideo/
    ├── __init__.py             # Package initialization
    ├── cli.py                  # CLI entry point
    ├── core/                   # Core functionality modules
    │   ├── transcript_fetcher.py   # YouTube subtitle fetcher
    │   ├── whisper_wrapper.py      # whisper-cli wrapper
    │   └── audio_processor.py      # Audio processor
    └── platforms/              # Platform handlers
        ├── youtube.py          # YouTube handler
        ├── bilibili.py         # Bilibili handler
        └── local.py            # Local file handler

Core Dependencies

  • youtube-transcript-api: YouTube subtitle extraction
  • yt-dlp: YouTube video downloading
  • click: Command-line interface
  • rich: Beautiful console output
  • tenacity: Retry mechanisms
  • ffmpeg: Audio processing (system dependency)
  • whisper-cli: Speech transcription (system dependency)

🔧 How It Works

YouTube Processing

  1. Subtitle Priority: Attempts to fetch existing subtitles using youtube-transcript-api
  2. Language Preference: Prioritizes Chinese (zh, zh-Hans, zh-Hant), then English
  3. Fallback: If no subtitles available, downloads audio with yt-dlp
  4. Transcription: Converts audio to WAV and transcribes with whisper-cli
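
The four steps above can be sketched as a subtitle-first pipeline with a fallback. To avoid guessing at library signatures, the subtitle and audio steps are passed in as plain callables; every name here (`pick_language`, `transcribe_youtube`, the callable parameters) is hypothetical, and `"en"` is assumed as the code for English.

```python
from typing import Callable, Optional, Sequence

# Preference order from step 2; "en" is an assumed code for "English".
PREFERRED_LANGUAGES = ("zh", "zh-Hans", "zh-Hant", "en")


def pick_language(
    available: Sequence[str],
    preferred: Sequence[str] = PREFERRED_LANGUAGES,
) -> Optional[str]:
    """Return the first preferred language that is available, else None."""
    for code in preferred:
        if code in available:
            return code
    return None


def transcribe_youtube(
    video_id: str,
    list_subtitle_languages: Callable[[str], Sequence[str]],
    fetch_subtitles: Callable[[str, str], str],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Return (method, text), where method is 'subtitles' or 'whisper'."""
    try:
        language = pick_language(list_subtitle_languages(video_id))
        if language is not None:
            return "subtitles", fetch_subtitles(video_id, language)
    except Exception:
        # Any subtitle failure (blocked API, no transcript) falls through
        # to the audio path instead of aborting.
        pass
    return "whisper", transcribe_audio(video_id)
```

In a real handler the three callables would wrap youtube-transcript-api, yt-dlp, and whisper-cli respectively.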

Bilibili Processing

  1. Audio Download: Uses BBDown to extract audio from Bilibili videos
  2. Format Conversion: Converts audio to WAV format using ffmpeg
  3. Transcription: Processes audio with whisper-cli
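
For the format conversion step, a sketch of the ffmpeg command line follows. The 16 kHz, mono, 16-bit PCM settings are an assumption based on the WAV input whisper.cpp expects; the README does not state which parameters ReadVideo passes, and `build_wav_command` is a hypothetical helper.

```python
import subprocess
from pathlib import Path


def build_wav_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg argument list that converts source to a WAV file."""
    return [
        "ffmpeg",
        "-y",                # overwrite the target if it already exists
        "-i", str(source),   # input file (audio or video)
        "-vn",               # ignore any video stream
        "-ar", "16000",      # 16 kHz sample rate (assumed)
        "-ac", "1",          # mono (assumed)
        "-c:a", "pcm_s16le", # 16-bit little-endian PCM (assumed)
        str(target),
    ]


def convert_to_wav(source: Path, target: Path) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_wav_command(source, target), check=True)
```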

Local File Processing

  1. Format Detection: Automatically detects audio vs video files
  2. Audio Extraction: Extracts audio tracks from video files using ffmpeg
  3. Format Conversion: Converts to whisper-compatible WAV format
  4. Transcription: Processes with whisper-cli
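
The final transcription step amounts to a single whisper-cli call. The sketch below builds that call with whisper.cpp's `-m`, `-f`, `-l`, `-otxt`, and `-of` options; which options ReadVideo actually passes is not stated in this README, so treat the selection and the helper name `build_whisper_command` as assumptions.

```python
from pathlib import Path
from typing import Optional


def build_whisper_command(
    wav: Path,
    model: Path,
    output_base: Path,
    language: Optional[str] = "zh",
) -> list[str]:
    """Build a whisper-cli argument list; language=None requests auto-detection."""
    return [
        "whisper-cli",
        "-m", str(model.expanduser()),  # model file, e.g. ggml-large-v3.bin
        "-f", str(wav),                 # WAV input
        "-l", language or "auto",       # "zh" mirrors the documented default
        "-otxt",                        # write a plain-text transcript
        "-of", str(output_base),        # output path without extension
    ]


if __name__ == "__main__":
    print(build_whisper_command(
        Path("lecture.wav"),
        Path("~/.whisper-models/ggml-large-v3.bin"),
        Path("lecture"),
    ))
```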

📋 Supported Formats

Audio Formats

  • MP3, M4A, WAV, FLAC, OGG, AAC, WMA

Video Formats

  • MP4, MKV, AVI, MOV, WMV, FLV, WEBM, M4V
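
The two lists above map naturally onto an extension lookup. This is a sketch only: the README says audio and video files are detected automatically but does not say how, so matching on the file extension and the name `classify_media` are assumptions.

```python
from pathlib import Path

# Extensions derived from the Audio Formats and Video Formats lists.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm", ".m4v"}


def classify_media(path: str) -> str:
    """Return 'audio' or 'video'; raise ValueError for anything else."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
```

A file such as test.txt would raise the ValueError, which corresponds to the format error mentioned in the testing examples.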

🛠️ Configuration

Whisper Model Configuration

# Default model path
~/.whisper-models/ggml-large-v3.bin

# Custom model
readvideo input.mp4 --whisper-model /path/to/model.bin
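
A sketch of how the default and custom model paths could be resolved, assuming the `~` in the default path is expanded to the user's home directory. `resolve_model_path` is a hypothetical helper, not ReadVideo's API.

```python
from pathlib import Path
from typing import Optional

# Default taken from the --whisper-model option description.
DEFAULT_MODEL = Path("~/.whisper-models/ggml-large-v3.bin")


def resolve_model_path(custom: Optional[str] = None) -> Path:
    """Return the custom model path if given, otherwise the expanded default."""
    chosen = Path(custom) if custom else DEFAULT_MODEL
    return chosen.expanduser()


if __name__ == "__main__":
    model = resolve_model_path()
    status = "found" if model.is_file() else "missing"
    print(f"{model}: {status}")
```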

Language Options

  • --auto-detect: Automatic language detection
  • Default: Chinese (zh)
  • YouTube subtitles support multi-language priority

🧪 Testing

Test Examples

# YouTube video with subtitles
readvideo "https://www.youtube.com/watch?v=JdKVJH3xmlU" --info-only

# Bilibili video
readvideo "https://www.bilibili.com/video/BV1Tjt9zJEdw" --info-only

# Test local file format support
echo "test" > test.txt
readvideo test.txt --info-only  # Should show format error

Debugging

# Verbose output
readvideo input.mp4 --verbose

# Keep temporary files
readvideo input.mp4 --no-cleanup --verbose

# Information only (no processing)
readvideo input.mp4 --info-only
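
The effect of --no-cleanup can be pictured as a working directory that is removed on exit unless told otherwise. The context manager below is an illustrative sketch; the name `working_directory` and the `readvideo-` prefix are assumptions, not the tool's real internals.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_directory(cleanup: bool = True) -> Iterator[Path]:
    """Yield a fresh temporary directory; delete it afterwards when cleanup is True."""
    path = Path(tempfile.mkdtemp(prefix="readvideo-"))
    try:
        yield path
    finally:
        # Runs on success and on failure, so files are removed even if
        # processing raises, unless the caller asked to keep them.
        if cleanup:
            shutil.rmtree(path, ignore_errors=True)
```

Passing `cleanup=False` leaves the directory in place so intermediate files can be inspected.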

⚡ Performance

Speed Comparison

Operation                 Time                      Notes
YouTube subtitle fetch    ~3-5s                     When subtitles available
YouTube audio download    ~30s-2min                 Depends on video length
Audio conversion          ~5-15s                    Depends on file size
Whisper transcription     ~0.1-0.5x video length    Depends on model and audio length
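
As a worked example of the Whisper transcription figure, the rough 0.1x to 0.5x range can be turned into a time estimate. The function below is only arithmetic on those two factors; `estimate_transcription_seconds` is a hypothetical name and actual times will vary with hardware and model.

```python
def estimate_transcription_seconds(media_seconds: float) -> tuple[float, float]:
    """Return (low, high) estimates using the ~0.1x and ~0.5x factors."""
    return (media_seconds / 10, media_seconds / 2)


if __name__ == "__main__":
    low, high = estimate_transcription_seconds(60 * 60)  # a one-hour video
    print(f"about {low / 60:.0f} to {high / 60:.0f} minutes")
```

For a one-hour video this gives roughly 6 to 30 minutes.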

Performance Features

  • Subtitle Priority: 10-100x faster than audio transcription for YouTube
  • Native Tools: Direct whisper-cli calls maintain original performance
  • Smart Caching: Reuses existing models and temporary files efficiently

🚨 Troubleshooting

Common Issues

1. whisper-cli not found

# Solution: Install whisper.cpp
brew install whisper-cpp  # macOS
# Or compile from source: https://github.com/ggerganov/whisper.cpp

2. ffmpeg not found

# Solution: Install ffmpeg
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu/Debian

3. Model file missing

# Solution: Download whisper model
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin from whisper.cpp releases

4. YouTube IP restrictions

  • The tool automatically falls back to audio download when the subtitle API is blocked
  • Consider using a proxy with the --proxy option if needed
  • Wait a while and retry

5. BBDown not found (Bilibili only)

Error Handling

  • Graceful Fallbacks: When YouTube subtitle fetching fails, the tool automatically falls back to audio transcription
  • Intelligent Retries: Network issues are retried automatically, but IP blocks are not
  • Clear Error Messages: Descriptive error messages with suggested solutions
  • Cleanup on Failure: Temporary files are cleaned up even if processing fails
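
The retry rule above (retry network errors, give up immediately on IP blocks) can be sketched with the standard library. ReadVideo lists tenacity as its retry dependency; this stand-in does not use it, and `IPBlockedError`, `retry_network`, and the exponential delay are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class IPBlockedError(Exception):
    """Hypothetical error meaning the remote side has blocked this IP."""


def retry_network(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient network errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except IPBlockedError:
            # Retrying will not lift a block, so fail immediately.
            raise
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff can be tested without waiting.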

🔒 Security Notes

Cookie Usage

  • Browser cookies are used only for video downloads (yt-dlp), not for subtitle API calls
  • This follows security recommendations from the youtube-transcript-api maintainer
  • Cookies help bypass some YouTube download restrictions

Privacy

  • No data is sent to external services except for downloading content
  • All processing happens locally on your machine
  • Temporary files are automatically cleaned up

🤝 Contributing

This project replaces a bash script with a modern Python implementation. Key design principles:

  1. Maintain Compatibility: Same functionality as the original bash script
  2. Improve Performance: Leverage existing tools efficiently
  3. Better UX: Rich console output and clear error messages
  4. Extensible: Modular design for easy platform additions

Adding New Platforms

  1. Create a new handler in platforms/
  2. Implement validate_url(), process(), and get_info() methods
  3. Add detection logic in CLI
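
A handler with the three methods named above might look like the sketch below. Only the method names come from this README; the argument lists, return types, the `PlatformHandler` base class, the `ExampleHandler` for example.com, and the `find_handler` helper are hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional, Sequence


class PlatformHandler(ABC):
    """Assumed shape of a platform handler; signatures are illustrative."""

    @abstractmethod
    def validate_url(self, url: str) -> bool:
        """Return True if this handler can process the URL."""

    @abstractmethod
    def get_info(self, url: str) -> dict[str, Any]:
        """Return metadata for --info-only style output."""

    @abstractmethod
    def process(self, url: str, output_dir: Path) -> Path:
        """Download, transcribe, and return the transcript path."""


class ExampleHandler(PlatformHandler):
    """Placeholder handler for a made-up site."""

    def validate_url(self, url: str) -> bool:
        return url.startswith("https://example.com/video/")

    def get_info(self, url: str) -> dict[str, Any]:
        return {"platform": "example", "url": url}

    def process(self, url: str, output_dir: Path) -> Path:
        raise NotImplementedError("download and transcription would go here")


def find_handler(
    url: str, handlers: Sequence[PlatformHandler]
) -> Optional[PlatformHandler]:
    """Detection logic: return the first handler that accepts the URL."""
    return next((h for h in handlers if h.validate_url(url)), None)
```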

Adding New Formats

  1. Update format lists in AudioProcessor
  2. Add corresponding ffmpeg parameters
  3. Test with sample files

📄 License

This project maintains compatibility with the original bash script while providing a modern Python implementation focused on performance, reliability, and user experience.

🙏 Acknowledgments
