A Python tool for downloading and transcribing videos from YouTube/Bilibili and local media files

ReadVideo

A modern Python-based video and audio transcription tool that extracts and transcribes content from YouTube, Bilibili, and local media files. This project is a complete rewrite of the original bash script with improved modularity, performance, and user experience.

🚀 Features

Multi-Platform Support

  • YouTube: Prioritizes existing subtitles, falls back to audio transcription
  • Bilibili: Automatically downloads and transcribes audio using BBDown
  • Local Files: Supports various audio and video file formats

Intelligent Processing

  • Subtitle Priority: For YouTube videos, existing subtitles are fetched first via youtube-transcript-api
  • Multi-Language Support: Supports Chinese, English, and more with auto-detection or manual specification
  • Fallback Mechanism: Automatically falls back to audio transcription when subtitles are unavailable
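
The subtitle-first behaviour with a fallback can be sketched as below. This is a minimal illustration, not readvideo's actual code: `get_transcript`, `SubtitlesUnavailable`, and the two callables passed in are hypothetical names standing in for the subtitle fetcher and the audio transcription path.

```python
from typing import Callable, Optional


class SubtitlesUnavailable(Exception):
    """Hypothetical error raised when a video has no usable subtitles."""


def get_transcript(
    url: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, source), trying subtitles before audio transcription."""
    try:
        text = fetch_subtitles(url)
    except SubtitlesUnavailable:
        text = None
    if text:
        return text, "subtitles"
    # No subtitles: take the slower path of downloading and transcribing audio.
    return transcribe_audio(url), "transcription"
```

Passing the two steps in as callables keeps the fallback decision separate from the network and whisper-cli details.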

High Performance

  • Tool Reuse: Directly calls installed whisper-cli for native performance
  • Model Reuse: Utilizes existing models in ~/.whisper-models/ directory
  • Efficient Processing: Smart temporary file management and cleanup

📦 Installation

Prerequisites

  • Python 3.11+
  • ffmpeg (system installation)
  • whisper-cli (from whisper.cpp)
  • yt-dlp (Python package, installed automatically as a dependency)
  • BBDown (optional, for Bilibili support)
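
A quick way to check that the external tools are on PATH before running readvideo is sketched below. The executable names are assumptions based on the tool names listed above, and `missing_tools` is a hypothetical helper, not part of readvideo.

```python
import shutil
from typing import Callable, Optional

# Executable names are assumptions: the list above names the tools, not the binaries.
REQUIRED_TOOLS = ("ffmpeg", "whisper-cli")
OPTIONAL_TOOLS = ("BBDown",)


def missing_tools(
    names: tuple[str, ...],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool}")
```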

Install from PyPI

Option 1: Install as a global tool (Recommended)

# Using uv (recommended - fast and isolated)
uv tool install readvideo

# Using pipx (alternative tool installer)
pipx install readvideo

# Using pip globally
pip install readvideo

Option 2: Development Installation

# Clone and install for development
git clone https://github.com/learnerLj/readvideo.git
cd readvideo
uv sync

# Or with pip
pip install -e .

System Dependencies

# macOS
brew install ffmpeg whisper-cpp

# Ubuntu/Debian
sudo apt install ffmpeg
# Install whisper.cpp from source: https://github.com/ggerganov/whisper.cpp

# Download Whisper model (if not already present)
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin to ~/.whisper-models/

🎯 Quick Start

Basic Usage

# YouTube video (prioritizes subtitles)
readvideo https://www.youtube.com/watch?v=abc123

# Auto language detection
readvideo --auto-detect https://www.youtube.com/watch?v=abc123

# Bilibili video
readvideo https://www.bilibili.com/video/BV1234567890

# Local audio file
readvideo ~/Music/podcast.mp3

# Local video file
readvideo ~/Videos/lecture.mp4

# Custom output directory
readvideo input.mp4 --output-dir ./transcripts

# Show information only
readvideo input.mp4 --info-only

Command Line Options

Options:
  --auto-detect              Enable automatic language detection (without it, Chinese is assumed)
  --output-dir, -o PATH      Output directory (default: current directory or input file directory)
  --no-cleanup               Do not clean up temporary files
  --info-only                Show input information only, do not process
  --whisper-model PATH       Path to Whisper model file [default: ~/.whisper-models/ggml-large-v3.bin]
  --verbose, -v              Verbose output
  --proxy TEXT               HTTP proxy address (e.g., http://127.0.0.1:8080)
  --help                     Show this message and exit

🏗️ Architecture

Project Structure

readvideo/
├── pyproject.toml              # Project configuration
├── README.md                   # Project documentation
└── src/readvideo/
    ├── __init__.py             # Package initialization
    ├── cli.py                  # CLI entry point
    ├── core/                   # Core functionality modules
    │   ├── transcript_fetcher.py   # YouTube subtitle fetcher
    │   ├── whisper_wrapper.py      # whisper-cli wrapper
    │   └── audio_processor.py      # Audio processor
    └── platforms/              # Platform handlers
        ├── youtube.py          # YouTube handler
        ├── bilibili.py         # Bilibili handler
        └── local.py            # Local file handler

Core Dependencies

  • youtube-transcript-api: YouTube subtitle extraction
  • yt-dlp: YouTube video downloading
  • click: Command-line interface
  • rich: Beautiful console output
  • tenacity: Retry mechanisms
  • ffmpeg: Audio processing (system dependency)
  • whisper-cli: Speech transcription (system dependency)

🔧 How It Works

YouTube Processing

  1. Subtitle Priority: Attempts to fetch existing subtitles using youtube-transcript-api
  2. Language Preference: Prioritizes Chinese (zh, zh-Hans, zh-Hant), then English
  3. Fallback: If no subtitles available, downloads audio with yt-dlp
  4. Transcription: Converts audio to WAV and transcribes with whisper-cli
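
The language preference in step 2 amounts to picking the first available code from an ordered list. A minimal sketch, assuming the available subtitle languages arrive as plain language codes; `pick_subtitle_language` is a hypothetical name:

```python
from typing import Iterable, Optional

# Order taken from the list above: Chinese variants first, then English.
PREFERRED_LANGUAGES = ("zh", "zh-Hans", "zh-Hant", "en")


def pick_subtitle_language(
    available: Iterable[str],
    preferred: tuple[str, ...] = PREFERRED_LANGUAGES,
) -> Optional[str]:
    """Return the highest-priority language code that is available, else None."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    return None  # the caller would fall back to audio download and transcription
```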

Bilibili Processing

  1. Audio Download: Uses BBDown to extract audio from Bilibili videos
  2. Format Conversion: Converts audio to WAV format using ffmpeg
  3. Transcription: Processes audio with whisper-cli

Local File Processing

  1. Format Detection: Automatically detects audio vs video files
  2. Audio Extraction: Extracts audio tracks from video files using ffmpeg
  3. Format Conversion: Converts to whisper-compatible WAV format
  4. Transcription: Processes with whisper-cli
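
Steps 2 and 3 can be done with a single ffmpeg call. The sketch below targets 16 kHz mono 16-bit PCM, the input format that whisper.cpp documents; the exact parameters readvideo passes are not stated in this README, so treat them as an assumption. `build_wav_command` and `convert_to_wav` are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_wav_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the target if it already exists
        "-i", str(source),
        "-vn",                # ignore any video stream
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(target),
    ]


def convert_to_wav(source: Path, target: Path) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_wav_command(source, target), check=True, capture_output=True)
    return target
```

Because `-vn` drops the video stream, the same command works for both audio and video inputs.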

📋 Supported Formats

Audio Formats

  • MP3, M4A, WAV, FLAC, OGG, AAC, WMA

Video Formats

  • MP4, MKV, AVI, MOV, WMV, FLV, WEBM, M4V
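
Telling audio from video by file extension could look like the sketch below. The extension sets mirror the two lists above; `classify_media` is a hypothetical helper, and the real detection in AudioProcessor may work differently.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm", ".m4v"}


def classify_media(path: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```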

🛠️ Configuration

Whisper Model Configuration

# Default model path
~/.whisper-models/ggml-large-v3.bin

# Custom model
readvideo input.mp4 --whisper-model /path/to/model.bin

Language Options

  • --auto-detect: Automatic language detection
  • Default: Chinese (zh)
  • YouTube subtitles support multi-language priority
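
One plausible way the language setting reaches whisper-cli is through its `-l` option, which accepts a language code or `auto`. The sketch below is an assumption about that mapping, not the actual wrapper; `build_whisper_command` is a hypothetical name, while `-m`, `-f`, `-l`, `-otxt`, and `-of` are whisper-cli options.

```python
from pathlib import Path

DEFAULT_MODEL = Path.home() / ".whisper-models" / "ggml-large-v3.bin"


def build_whisper_command(
    wav: Path,
    output_prefix: Path,
    model: Path = DEFAULT_MODEL,
    auto_detect: bool = False,
) -> list[str]:
    """Build a whisper-cli invocation that writes <output_prefix>.txt."""
    language = "auto" if auto_detect else "zh"  # zh is the documented default
    return [
        "whisper-cli",
        "-m", str(model),           # model file
        "-f", str(wav),             # input WAV
        "-l", language,             # spoken language, or "auto"
        "-otxt",                    # write a plain-text transcript
        "-of", str(output_prefix),  # output path without extension
    ]
```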

🧪 Testing

Test Examples

# YouTube video with subtitles
readvideo "https://www.youtube.com/watch?v=JdKVJH3xmlU" --info-only

# Bilibili video
readvideo "https://www.bilibili.com/video/BV1Tjt9zJEdw" --info-only

# Test local file format support
echo "test" > test.txt
readvideo test.txt --info-only  # Should show format error

Debugging

# Verbose output
readvideo input.mp4 --verbose

# Keep temporary files
readvideo input.mp4 --no-cleanup --verbose

# Information only (no processing)
readvideo input.mp4 --info-only
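
The effect of --no-cleanup can be pictured as a temporary working directory that is removed on exit unless cleanup is switched off. A minimal sketch under that assumption; `working_directory` and the `readvideo_` prefix are hypothetical:

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_directory(cleanup: bool = True) -> Iterator[Path]:
    """Yield a fresh temporary directory; delete it on exit unless cleanup is False."""
    path = Path(tempfile.mkdtemp(prefix="readvideo_"))
    try:
        yield path
    finally:
        # The finally clause runs on success and on failure alike.
        if cleanup:
            shutil.rmtree(path, ignore_errors=True)
```

With `cleanup=False` the directory survives, so intermediate audio files can be inspected after a failed run.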

⚡ Performance

Speed Comparison

Operation                 Time                      Notes
YouTube subtitle fetch    ~3-5s                     When subtitles available
YouTube audio download    ~30s-2min                 Depends on video length
Audio conversion          ~5-15s                    Depends on file size
Whisper transcription     ~0.1-0.5x video length    Depends on model and audio length

Performance Features

  • Subtitle Priority: 10-100x faster than audio transcription for YouTube
  • Native Tools: Direct whisper-cli calls maintain original performance
  • Smart Caching: Reuses existing models and temporary files efficiently

🚨 Troubleshooting

Common Issues

1. whisper-cli not found

# Solution: Install whisper.cpp
brew install whisper-cpp  # macOS
# Or compile from source: https://github.com/ggerganov/whisper.cpp

2. ffmpeg not found

# Solution: Install ffmpeg
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu/Debian

3. Model file missing

# Solution: Download whisper model
mkdir -p ~/.whisper-models
# Download ggml-large-v3.bin into ~/.whisper-models/ (whisper.cpp ships a models/download-ggml-model.sh helper for fetching ggml models)

4. YouTube IP restrictions

  • The tool automatically falls back to audio download when the subtitle API is blocked
  • Consider using a proxy with the --proxy option if needed
  • Otherwise, wait a while and retry

5. BBDown not found (Bilibili only)

Error Handling

  • Graceful Fallbacks: YouTube subtitle failures automatically retry with audio transcription
  • Intelligent Retries: Network issues are retried automatically, but IP blocks are not
  • Clear Error Messages: Descriptive error messages with suggested solutions
  • Cleanup on Failure: Temporary files are cleaned up even if processing fails
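
The retry rule above (retry transient network errors, give up immediately on IP blocks) is sketched below in plain Python. The project lists tenacity for retries; this stand-in only illustrates the policy and is not the tenacity-based code. `IPBlockedError` and `retry_network` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class IPBlockedError(Exception):
    """Hypothetical marker for an IP block; retrying right away will not help."""


def retry_network(
    operation: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient network errors with a growing delay; let IP blocks propagate."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except IPBlockedError:
            raise  # not retryable
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            sleep(delay_seconds * attempt)
    raise ValueError("attempts must be at least 1")
```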

🔒 Security Notes

Cookie Usage

  • Browser cookies are used only for video downloads (yt-dlp), not for subtitle API calls
  • This follows security recommendations from the youtube-transcript-api maintainer
  • Cookies help bypass some YouTube download restrictions

Privacy

  • No data is sent to external services except for downloading content
  • All processing happens locally on your machine
  • Temporary files are automatically cleaned up

🤝 Contributing

This project replaces a bash script with a modern Python implementation. Key design principles:

  1. Maintain Compatibility: Same functionality as the original bash script
  2. Improve Performance: Leverage existing tools efficiently
  3. Better UX: Rich console output and clear error messages
  4. Extensible: Modular design for easy platform additions

Adding New Platforms

  1. Create a new handler in platforms/
  2. Implement validate_url(), process(), and get_info() methods
  3. Add detection logic in CLI
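
A new handler might take the shape below. Only the three method names come from the steps above; the base class, the method signatures, the return types, `find_handler`, and the made-up site video.example.com are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence
from urllib.parse import urlparse


class PlatformHandler(ABC):
    """Hypothetical base class; the real handlers may not share one."""

    @abstractmethod
    def validate_url(self, url: str) -> bool: ...

    @abstractmethod
    def get_info(self, url: str) -> dict[str, Any]: ...

    @abstractmethod
    def process(self, url: str, output_dir: str) -> dict[str, Any]: ...


class ExampleHandler(PlatformHandler):
    """Illustrative handler for a made-up site, video.example.com."""

    def validate_url(self, url: str) -> bool:
        parsed = urlparse(url)
        return parsed.scheme in ("http", "https") and parsed.hostname == "video.example.com"

    def get_info(self, url: str) -> dict[str, Any]:
        return {"platform": "example", "url": url}

    def process(self, url: str, output_dir: str) -> dict[str, Any]:
        if not self.validate_url(url):
            raise ValueError(f"not an example URL: {url}")
        # A real handler would download audio, convert it, and transcribe it here.
        return {"platform": "example", "url": url, "output_dir": output_dir}


def find_handler(url: str, handlers: Sequence[PlatformHandler]) -> Optional[PlatformHandler]:
    """Detection logic: return the first handler that accepts the URL."""
    for handler in handlers:
        if handler.validate_url(url):
            return handler
    return None
```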

Adding New Formats

  1. Update format lists in AudioProcessor
  2. Add corresponding ffmpeg parameters
  3. Test with sample files

📄 License

This project maintains compatibility with the original bash script while providing a modern Python implementation focused on performance, reliability, and user experience.

🙏 Acknowledgments
