YouTube Video Information Extractor

A robust Python library for extracting YouTube video metadata including title, description, channel name, publication date, and view count with multiple extraction strategies.

Features

  • Multiple Extraction Strategies: YouTube Data API v3 (official), yt-dlp, pytubefix
  • Automatic Fallback: Seamlessly switches between methods if one fails
  • Flexible Input: Support for video IDs and various YouTube URL formats
  • Batch Processing: Extract information from multiple videos efficiently
  • Multiple Output Formats: Text, JSON, CSV
  • Command Line Interface: Easy-to-use CLI for quick extractions
  • Python Library: Full programmatic access for integration
  • Robust Error Handling: Graceful handling of failures with retry logic
  • Rate Limiting: Built-in delays to respect YouTube's servers

Installation

pip install yt-info-extract

Optional Dependencies

For the best experience, install all extraction backends:

# For YouTube Data API v3 support (recommended)
pip install google-api-python-client

# For yt-dlp support (most robust fallback)
pip install yt-dlp

# For pytubefix support (lightweight fallback)
pip install pytubefix

Quick Start

Python Library Usage

from yt_info_extract import get_video_info

# Extract video information
info = get_video_info("jNQXAC9IVRw")

# Extraction can fail, so check the result before reading fields
if info:
    print(f"Title: {info['title']}")
    print(f"Channel: {info['channel_name']}")
    print(f"Views: {info['views']:,}")
    print(f"Published: {info['publication_date']}")

Command Line Usage

# Extract video information
yt-info jNQXAC9IVRw

# Export to JSON
yt-info -f json -o video.json jNQXAC9IVRw

# Process multiple videos
yt-info --batch video_ids.txt --output-dir results/

# Use specific extraction strategy
yt-info -s api --api-key YOUR_KEY jNQXAC9IVRw

YouTube Data API v3 Setup (Recommended)

The YouTube Data API v3 is the official, most reliable method. It requires a free API key:

  1. Go to Google Cloud Console
  2. Create a new project or select an existing one
  3. Enable the YouTube Data API v3
  4. Create credentials (API Key)
  5. Restrict the API key to YouTube Data API v3

# Set your API key as an environment variable
export YOUTUBE_API_KEY="your_api_key_here"

# Now run the CLI; the key is read from the environment
yt-info jNQXAC9IVRw

Usage Examples

Basic Usage

from yt_info_extract import YouTubeVideoInfoExtractor

# Initialize extractor
extractor = YouTubeVideoInfoExtractor(api_key="your_key")

# Extract video info
info = extractor.get_video_info("jNQXAC9IVRw")

if info:
    print(f"Title: {info['title']}")
    print(f"Channel: {info['channel_name']}")
    print(f"Views: {info['views']:,}")
    print(f"Published: {info['publication_date']}")
    print(f"Description: {info['description'][:100]}...")

Batch Processing

from yt_info_extract import get_video_info_batch

video_ids = ["jNQXAC9IVRw", "dQw4w9WgXcQ", "_OBlgSz8sSM"]
results = get_video_info_batch(video_ids)

for result in results:
    if not result.get('error'):
        print(f"{result['title']} - {result['views']:,} views")

Export Data

from yt_info_extract import get_video_info, get_video_info_batch, export_video_info

# Get video info
info = get_video_info("jNQXAC9IVRw")

# Export to JSON
export_video_info(info, "video.json")

# Export batch results to CSV
batch_results = get_video_info_batch(["jNQXAC9IVRw", "dQw4w9WgXcQ"])
export_video_info(batch_results, "videos.csv", format_type="csv")
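
For readers who want to see what a CSV export of batch results involves, here is a minimal standard-library sketch. It is not the library's implementation: `records_to_csv` and the `FIELDS` column order are hypothetical, and the description column is left out on the assumption that multi-line text is awkward in a spreadsheet.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column order, taken from the keys of the info dictionary.
FIELDS = ["title", "channel_name", "publication_date", "views", "extraction_method"]


def records_to_csv(records: Iterable[Dict]) -> str:
    """Serialize successful records to CSV text, skipping failed items."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in FIELDS (e.g. description).
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        if not record.get("error"):
            writer.writerow(record)
    return buffer.getvalue()


csv_text = records_to_csv([
    {"title": "A, B", "views": 5, "description": "ignored"},
    {"error": "extraction failed"},
])
```

Missing keys are written as empty cells, and values containing commas are quoted automatically by the `csv` module.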

Different URL Formats

from yt_info_extract import get_video_info

# All of these work:
formats = [
    "jNQXAC9IVRw",  # Video ID
    "https://www.youtube.com/watch?v=jNQXAC9IVRw",  # Standard URL
    "https://youtu.be/jNQXAC9IVRw",  # Short URL
    "https://www.youtube.com/embed/jNQXAC9IVRw",  # Embed URL
]

for fmt in formats:
    info = get_video_info(fmt)
    if info:  # extraction can fail, so guard before reading fields
        print(f"✓ {fmt} -> {info['title']}")
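
The formats listed above all reduce to the same 11-character video ID. The sketch below shows one way such normalization could be done with the standard library. `extract_video_id` is a hypothetical helper, not part of this package's documented API, and the 11-character pattern reflects how YouTube IDs look today rather than a guarantee.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from this alphabet.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Hypothetical helper: return the video ID found in `value`, or None."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value  # already a bare video ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short URL: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard URL: the ID is the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/embed/"):
            # Embed URL: the ID follows "/embed/".
            candidate = parsed.path.split("/")[2]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

Validating the candidate against the ID pattern keeps URLs from other hosts, or malformed paths, from being passed on to an extraction backend.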

Command Line Interface

Basic Commands

# Extract video information
yt-info jNQXAC9IVRw

# Use different output formats
yt-info -f compact jNQXAC9IVRw
yt-info -f stats jNQXAC9IVRw
yt-info -f json jNQXAC9IVRw

# Export to file
yt-info -f json -o video.json jNQXAC9IVRw
yt-info -f csv -o video.csv jNQXAC9IVRw

Batch Processing

# Create a file with video IDs (one per line)
echo "jNQXAC9IVRw" > video_ids.txt
echo "dQw4w9WgXcQ" >> video_ids.txt

# Process all videos
yt-info --batch video_ids.txt --output-dir results/

# With summary report
yt-info --batch video_ids.txt --summary --output-dir results/

API Key Usage

# Method 1: Environment variable
export YOUTUBE_API_KEY="your_api_key_here"
yt-info jNQXAC9IVRw

# Method 2: Command line argument
yt-info --api-key "your_api_key_here" jNQXAC9IVRw

# Force specific strategy
yt-info -s api --api-key "your_key" jNQXAC9IVRw
yt-info -s yt_dlp jNQXAC9IVRw

Utility Commands

# Test API key
yt-info --test-api

# List available strategies
yt-info --list-strategies

# Verbose output
yt-info -v jNQXAC9IVRw

Extraction Strategies

1. YouTube Data API v3 (Recommended)

  • Pros: Official, reliable, comprehensive data, compliant with ToS
  • Cons: Requires API key, has quota limits (10,000 units/day free)
  • Best for: Production applications, commercial use, reliable automation

extractor = YouTubeVideoInfoExtractor(api_key="your_key", strategy="api")
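
The daily quota mentioned above can be turned into a rough budget. According to the public YouTube Data API documentation, a `videos.list` call costs 1 unit and accepts up to 50 comma-separated IDs. Whether this library groups IDs that way is not stated here, so the sketch below is an estimate under that assumption; `chunk_ids` and `estimate_quota_units` are hypothetical helpers.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_REQUEST = 50  # videos.list accepts up to 50 IDs per call
UNITS_PER_REQUEST = 1     # videos.list costs 1 quota unit per call


def chunk_ids(video_ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive groups of at most `size` video IDs."""
    for start in range(0, len(video_ids), size):
        yield list(video_ids[start:start + size])


def estimate_quota_units(n_videos: int) -> int:
    """Estimate quota units if IDs are sent in full groups (an assumption)."""
    requests = -(-n_videos // MAX_IDS_PER_REQUEST)  # ceiling division
    return requests * UNITS_PER_REQUEST


units_for_120 = estimate_quota_units(120)
```

If a client instead issues one request per video, the cost is simply one unit per video, which still leaves room for thousands of lookups per day under the default quota.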

2. yt-dlp (Most Robust Fallback)

  • Pros: No API key needed, very robust, actively maintained
  • Cons: Violates YouTube ToS, can break with YouTube updates
  • Best for: Personal projects, research, when API quotas are insufficient

extractor = YouTubeVideoInfoExtractor(strategy="yt_dlp")

3. pytubefix (Lightweight Fallback)

  • Pros: No API key needed, simple, lightweight
  • Cons: Violates YouTube ToS, less robust than yt-dlp
  • Best for: Simple scripts, minimal dependencies

extractor = YouTubeVideoInfoExtractor(strategy="pytubefix")

4. Auto Strategy (Default)

Automatically tries strategies in order of preference:

  1. YouTube Data API v3 (if API key available)
  2. yt-dlp (if installed)
  3. pytubefix (if installed)

extractor = YouTubeVideoInfoExtractor(strategy="auto")  # Default
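
The ordered fallback described above follows a common pattern: try each backend in turn and return the first usable result. The sketch below illustrates that pattern with stand-in functions. It is not the library's internal code, and `extract_with_fallback`, `failing_api`, and `working_scraper` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Extractor = Callable[[str], Optional[Dict]]


def extract_with_fallback(
    video_id: str, strategies: List[Tuple[str, Extractor]]
) -> Optional[Dict]:
    """Try each (name, function) pair in order and return the first result."""
    for name, extract in strategies:
        try:
            info = extract(video_id)
        except Exception:
            continue  # this strategy failed; fall through to the next one
        if info:
            # Record which strategy produced the data.
            return {**info, "extraction_method": name}
    return None


# Stand-in strategies for illustration only.
def failing_api(video_id: str) -> Optional[Dict]:
    raise RuntimeError("quota exceeded")


def working_scraper(video_id: str) -> Optional[Dict]:
    return {"title": f"Video {video_id}"}


result = extract_with_fallback(
    "jNQXAC9IVRw",
    [("youtube_api", failing_api), ("yt_dlp", working_scraper)],
)
```

Here the first strategy raises, so the result comes from the second one and is tagged accordingly.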

Data Structure

Each video information dictionary contains:

{
    "title": "Video title",
    "description": "Full video description",
    "channel_name": "Channel name",
    "publication_date": "2005-04-23T00:00:00Z",  # ISO format
    "views": 123456789,  # Integer view count
    "extraction_method": "youtube_api"  # Method used
}
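
Because `publication_date` is an ISO 8601 string, it usually needs parsing before it can be compared or sorted. A minimal sketch, assuming the value always has the `Z`-suffixed shape shown above (other backends may format dates differently); `parse_publication_date` is a hypothetical helper:

```python
from datetime import datetime


def parse_publication_date(value: str) -> datetime:
    """Parse a 'Z'-suffixed ISO 8601 timestamp into an aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so replace it with an explicit UTC offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


record = {"publication_date": "2005-04-23T00:00:00Z", "views": 123456789}
published = parse_publication_date(record["publication_date"])
```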

Configuration Options

extractor = YouTubeVideoInfoExtractor(
    api_key="your_key",           # YouTube Data API key
    strategy="auto",              # Extraction strategy
    timeout=30,                   # Request timeout (seconds)
    max_retries=3,                # Maximum retry attempts
    backoff_factor=0.75,          # Exponential backoff factor
    rate_limit_delay=0.1,         # Delay between requests
)
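
To get a feel for how `max_retries` and `backoff_factor` interact, the sketch below computes a retry schedule using the common formula `backoff_factor * 2 ** attempt`. The exact formula this library uses is not documented here, so the numbers are illustrative only; `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(max_retries: int = 3, backoff_factor: float = 0.75) -> List[float]:
    """Return the wait (in seconds) before each retry, doubling every time."""
    # Assumed formula: delay = backoff_factor * 2 ** attempt, attempt = 0, 1, 2, ...
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


delays = backoff_delays()  # [0.75, 1.5, 3.0] with the defaults shown above
```

Under this assumption the three retries add at most 5.25 seconds of waiting on top of the per-request timeout.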

Error Handling

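As the checks below show, get_video_info returns a falsy value when extraction fails, and each failed item in a batch result carries an 'error' key:

from yt_info_extract import get_video_info, get_video_info_batch

info = get_video_info("invalid_video_id")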
if info:
    print(f"Success: {info['title']}")
else:
    print("Failed to extract video information")

# For batch processing, check individual results
results = get_video_info_batch(["valid_id", "invalid_id"])
for result in results:
    if result.get('error'):
        print(f"Error: {result['error']}")
    else:
        print(f"Success: {result['title']}")
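
For larger batches it can help to separate successes from failures once, rather than branching inside every loop. A small sketch that relies only on the 'error' key convention shown above; `split_results` is a hypothetical helper and the sample records are made up:

```python
from typing import Dict, List, Tuple


def split_results(results: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition batch results into (succeeded, failed) by the 'error' key."""
    succeeded = [r for r in results if not r.get("error")]
    failed = [r for r in results if r.get("error")]
    return succeeded, failed


# Made-up sample data in the shape of batch results.
sample = [
    {"title": "Example video", "views": 1000},
    {"error": "Video unavailable"},
]
succeeded, failed = split_results(sample)
```

The failed list can then be logged or retried with a different strategy without touching the successful records.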

Legal and Compliance Notes

  • YouTube Data API v3: Fully compliant with YouTube's Terms of Service
  • yt-dlp and pytubefix: Violate YouTube's Terms of Service by scraping data

For commercial applications or production use, always use the YouTube Data API v3.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Related Projects

  • yt-ts-extract - YouTube transcript extraction
  • yt-dlp - YouTube downloader (used as fallback)
  • pytubefix - YouTube library (used as fallback)
