Skip to main content

Intelligent video keyframe extraction for VLMs

Project description

KeyFrame Scout

[Python Version](https://www.python.org/downloads/) [License](LICENSE) [Version](https://github.com/yourusername/keyframe-scout)

An intelligent video keyframe extraction tool optimized for Vision Language Models (VLMs) and video analysis. Extract meaningful frames from videos using adaptive algorithms, with direct support for Azure OpenAI GPT and other VLMs.

✨ Key Features

  • 🎯 Intelligent Frame Selection: Three extraction modes (adaptive, interval, fixed) to suit different use cases
  • 🤖 VLM-Ready: Direct integration with Azure OpenAI GPT and other vision language models
  • 📦 Base64 Support: Return frames as base64 strings for immediate API usage
  • ⚡ Batch Processing: Process multiple videos efficiently with parallel execution
  • 🎨 Flexible Output: Save as files, return as base64, or both
  • 📊 Smart Analysis: Automatically identifies scene changes and important moments
  • 🔧 Easy Integration: Simple Python API and command-line interface

🚀 What's New in v0.2.1

  • Base64 Encoding: Direct base64 output for VLM integration
  • Azure OpenAI Support: Built-in integration for GPT
  • VLM Utilities: Helper functions for preparing frames for various VLMs
  • Batch Processing: Process entire directories of videos
  • Enhanced API: More flexible configuration options

📦 Installation

Using pip

pip install keyframe-scout

From source

git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout
pip install -e .

Dependencies

  • Python 3.7+
  • OpenCV (cv2)
  • NumPy
  • Pillow
  • FFmpeg (system dependency)

Install FFmpeg:

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

🎯 Quick Start

Basic Usage

import keyframe_scout as ks

# Extract keyframes from a video
result = ks.extract_video_keyframes({
    'video': 'path/to/video.mp4',
    'output_dir': 'output/frames',
    'nframes': 10
})

print(f"Extracted {result['extracted_frames']} frames")

VLM Integration (New!)

import keyframe_scout as ks
from openai import AzureOpenAI

# Extract frames for GPT
frames = ks.extract_frames_for_vlm(
    'video.mp4',
    max_frames=8,
    max_size=1024
)

# Prepare messages for Azure OpenAI
messages = ks.create_video_messages(
    'video.mp4',
    prompt="What's happening in this video?",
    max_frames=8
)

# Use with Azure OpenAI
client = AzureOpenAI(
    azure_endpoint="your-endpoint",
    api_key="your-key",
    api_version="2024-02-15-preview"
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=messages,
    max_tokens=500
)

print(response.choices[0].message.content)

Base64 Output (New!)

# Get frames as base64 strings (no files saved)
result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'nframes': 5,
    'return_base64': True,
    'max_size': 1024
})

# Access base64 data
for frame in result['frames']:
    print(f"Frame at {frame['timestamp']}s")
    base64_data = frame['base64']
    # Use base64_data with your VLM API

📖 Detailed Usage

Extraction Modes

1. Adaptive Mode (Default)

Intelligently selects the most representative frames based on content analysis.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'adaptive',
    'nframes': 10
})

2. Interval Mode

Extracts frames at fixed time intervals.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'interval',
    'interval': 5.0,  # Every 5 seconds
    'frames_per_interval': 1
})

3. Fixed Mode

Extracts a fixed number of evenly distributed frames.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'fixed',
    'frames_per_interval': 20  # Total 20 frames
})

VLM Integration Examples

Using the VideoAnalyzer Class

# Initialize analyzer
analyzer = ks.VideoAnalyzer(
    azure_endpoint="your-endpoint",
    api_key="your-key"
)

# Analyze video
result = analyzer.analyze_video(
    'video.mp4',
    prompt="Describe the main events in this video",
    max_frames=10
)

print(result)

Batch Video Analysis

# Analyze multiple videos
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
prompts = ['What happens?', 'Who appears?', 'Where is this?']

results = analyzer.batch_analyze(videos, prompts, max_frames=8)

Custom VLM Integration

# Get frames for any VLM
frames = ks.extract_frames_for_vlm('video.mp4', max_frames=6)

# Prepare for your VLM API
for i, frame in enumerate(frames):
    image_data = {
        'base64': frame['base64'],
        'timestamp': frame['timestamp'],
        'description': f'Frame {i+1}'
    }
    # Send to your VLM API

Batch Processing

# Process all videos in a directory
results = ks.process_video_directory(
    directory='videos/',
    output_dir='output/',
    extensions=['.mp4', '.avi'],
    recursive=True,
    config_template={
        'mode': 'adaptive',
        'nframes': 10,
        'return_base64': True
    }
)

# Or process a list of videos
video_list = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = ks.extract_keyframes_batch(
    video_list,
    output_base_dir='batch_output/',
    max_workers=4
)

Advanced Configuration

config = {
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'adaptive',
    'nframes': 10,
    
    # Resolution options
    'resolution': '720p',  # '360p', '480p', '720p', '1080p', 'original'
    
    # Image options
    'image_format': 'jpg',  # 'jpg' or 'png'
    'image_quality': 95,    # 1-100 for JPEG
    
    # Base64 options (new)
    'return_base64': True,
    'include_files': False,  # Don't save files when using base64
    'max_size': 1024,       # Max dimension for base64 images
    
    # Analysis parameters
    'sample_rate': 30,      # Analyze every Nth frame
    'min_frames': 5,        # Minimum frames to extract
    'max_frames': 20        # Maximum frames to extract
}

result = ks.extract_video_keyframes(config)

🔧 Command Line Interface

Basic usage

# Extract 10 keyframes
keyframe-scout video.mp4 -o output_frames --nframes 10

# Use specific mode
keyframe-scout video.mp4 -o output_frames --mode interval --interval 5

# Set resolution and quality
keyframe-scout video.mp4 -o output_frames --resolution 720p --quality 90

Batch processing

# Process directory
keyframe-scout-batch videos/ -o batch_output/ --recursive

# With custom settings
keyframe-scout-batch videos/ -o batch_output/ --nframes 8 --resolution 480p

📊 API Reference

Core Functions

extract_video_keyframes(config)

Main extraction function with full configuration options.

extract_frames_for_vlm(video_path, max_frames, max_size, mode)

Extract frames optimized for VLM usage, returns base64 encoded frames.

create_video_messages(video_path, prompt, max_frames, system_prompt)

Create messages formatted for Azure OpenAI GPT.

get_video_info(video_path)

Get video metadata (duration, resolution, fps, etc).

VLM Utilities

prepare_for_azure_openai(video_path, max_frames, detail)

Prepare frames in Azure OpenAI format with detail level control.

estimate_token_usage(frames, detail)

Estimate token usage for GPT API calls.

save_base64_frames(frames, output_dir, prefix)

Save base64 encoded frames to files.

🎨 Examples

Video Summary for Blog

import keyframe_scout as ks

# Extract key moments from a video
frames = ks.extract_frames_for_vlm('tutorial.mp4', max_frames=6)

# Generate descriptions using GPT
analyzer = ks.VideoAnalyzer()
for i, frame in enumerate(frames):
    description = analyzer.analyze_video(
        'tutorial.mp4',
        f"Describe what's shown at {frame['timestamp']} seconds",
        max_frames=1
    )
    print(f"Time {frame['timestamp']}s: {description}")

Video Content Moderation

# Check video content
messages = ks.create_video_messages(
    'uploaded_video.mp4',
    prompt="Does this video contain any inappropriate content? List any concerns.",
    max_frames=10,
    system_prompt="You are a content moderation assistant."
)

# Send to your moderation API

Creating Video Thumbnails

# Extract best frames for thumbnails
result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'thumbnails',
    'mode': 'adaptive',
    'nframes': 5,
    'resolution': '720p',
    'image_quality': 95
})

# The frames are automatically selected for maximum visual interest

🐛 Troubleshooting

FFmpeg not found

# Install FFmpeg
sudo apt install ffmpeg  # Ubuntu/Debian
brew install ffmpeg      # macOS

Import errors

# Install all dependencies
pip install keyframe-scout[all]

GPU acceleration

# OpenCV will automatically use GPU if available
# Check GPU availability
import cv2
print(cv2.cuda.getCudaEnabledDeviceCount())

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

# Development setup
git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout
pip install -e ".[dev]"

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenCV community for the excellent computer vision library
  • FFmpeg project for video processing capabilities
  • Inspired by video analysis needs in the VLM era

📮 Contact


Made with ❤️ for the VLM community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

keyframe_scout-0.2.1.tar.gz (24.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

keyframe_scout-0.2.1-py3-none-any.whl (24.3 kB view details)

Uploaded Python 3

File details

Details for the file keyframe_scout-0.2.1.tar.gz.

File metadata

  • Download URL: keyframe_scout-0.2.1.tar.gz
  • Upload date:
  • Size: 24.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for keyframe_scout-0.2.1.tar.gz
Algorithm Hash digest
SHA256 9ce09e5ae829620a91384284325b4979bbdbaf667e656614b12c24f2850756f8
MD5 5e5db41b5c62f07c3b554084403cb3c8
BLAKE2b-256 cacc101540169d3ec71a8e2617704e8b75e65afc8254a59ac435d6fee3eb186c

See more details on using hashes here.

File details

Details for the file keyframe_scout-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: keyframe_scout-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 24.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for keyframe_scout-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 68093a60a46e7cd85f3ad6e16e252c16a6c477327632b64aaa4a1e89fb9ded67
MD5 7b44b18d0ea32fe883008568580a0b85
BLAKE2b-256 e1f0256b0798aa948e84a4b8389f331df336861bd3183846276949536294a0a4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page