Skip to main content

Intelligent video keyframe extraction for VLMs

Project description

KeyFrame Scout

[Python Version](https://www.python.org/downloads/) [License](LICENSE) [Version](https://github.com/yourusername/keyframe-scout)

An intelligent video keyframe extraction tool optimized for Vision Language Models (VLMs) and video analysis. Extract meaningful frames from videos using adaptive algorithms, with direct support for Azure OpenAI GPT and other VLMs.

✨ Key Features

  • 🎯 Intelligent Frame Selection: Three extraction modes (adaptive, interval, fixed) to suit different use cases
  • 🤖 VLM-Ready: Direct integration with Azure OpenAI GPT and other vision language models
  • 📦 Base64 Support: Return frames as base64 strings for immediate API usage
  • ⚡ Batch Processing: Process multiple videos efficiently with parallel execution
  • 🎨 Flexible Output: Save as files, return as base64, or both
  • 📊 Smart Analysis: Automatically identifies scene changes and important moments
  • 🔧 Easy Integration: Simple Python API and command-line interface

🚀 What's New in v0.2.2

  • Base64 Encoding: Direct base64 output for VLM integration
  • Azure OpenAI Support: Built-in integration for GPT
  • VLM Utilities: Helper functions for preparing frames for various VLMs
  • Batch Processing: Process entire directories of videos
  • Enhanced API: More flexible configuration options

📦 Installation

Using pip

pip install keyframe-scout

From source

git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout
pip install -e .

Dependencies

  • Python 3.7+
  • OpenCV (cv2)
  • NumPy
  • Pillow
  • FFmpeg (system dependency)

Install FFmpeg:

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

🎯 Quick Start

Basic Usage

import keyframe_scout as ks

# Extract keyframes from a video
result = ks.extract_video_keyframes({
    'video': 'path/to/video.mp4',
    'output_dir': 'output/frames',
    'nframes': 10
})

print(f"Extracted {result['extracted_frames']} frames")

VLM Integration (New!)

import keyframe_scout as ks
from openai import AzureOpenAI

# Extract frames for GPT
frames = ks.extract_frames_for_vlm(
    'video.mp4',
    max_frames=8,
    max_size=1024
)

# Prepare messages for Azure OpenAI
messages = ks.create_video_messages(
    'video.mp4',
    prompt="What's happening in this video?",
    max_frames=8
)

# Use with Azure OpenAI
client = AzureOpenAI(
    azure_endpoint="your-endpoint",
    api_key="your-key",
    api_version="2024-02-15-preview"
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=messages,
    max_tokens=500
)

print(response.choices[0].message.content)

Base64 Output (New!)

# Get frames as base64 strings (no files saved)
result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'nframes': 5,
    'return_base64': True,
    'max_size': 1024
})

# Access base64 data
for frame in result['frames']:
    print(f"Frame at {frame['timestamp']}s")
    base64_data = frame['base64']
    # Use base64_data with your VLM API

📖 Detailed Usage

Extraction Modes

1. Adaptive Mode (Default)

Intelligently selects the most representative frames based on content analysis.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'adaptive',
    'nframes': 10
})

2. Interval Mode

Extracts frames at fixed time intervals.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'interval',
    'interval': 5.0,  # Every 5 seconds
    'frames_per_interval': 1
})

3. Fixed Mode

Extracts a fixed number of evenly distributed frames.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'fixed',
    'frames_per_interval': 20  # Total 20 frames
})

VLM Integration Examples

Using the VideoAnalyzer Class

# Initialize analyzer
analyzer = ks.VideoAnalyzer(
    azure_endpoint="your-endpoint",
    api_key="your-key"
)

# Analyze video
result = analyzer.analyze_video(
    'video.mp4',
    prompt="Describe the main events in this video",
    max_frames=10
)

print(result)

Batch Video Analysis

# Analyze multiple videos
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
prompts = ['What happens?', 'Who appears?', 'Where is this?']

results = analyzer.batch_analyze(videos, prompts, max_frames=8)

Custom VLM Integration

# Get frames for any VLM
frames = ks.extract_frames_for_vlm('video.mp4', max_frames=6)

# Prepare for your VLM API
for i, frame in enumerate(frames):
    image_data = {
        'base64': frame['base64'],
        'timestamp': frame['timestamp'],
        'description': f'Frame {i+1}'
    }
    # Send to your VLM API

Batch Processing

# Process all videos in a directory
results = ks.process_video_directory(
    directory='videos/',
    output_dir='output/',
    extensions=['.mp4', '.avi'],
    recursive=True,
    config_template={
        'mode': 'adaptive',
        'nframes': 10,
        'return_base64': True
    }
)

# Or process a list of videos
video_list = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = ks.extract_keyframes_batch(
    video_list,
    output_base_dir='batch_output/',
    max_workers=4
)

Advanced Configuration

config = {
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'adaptive',
    'nframes': 10,
    
    # Resolution options
    'resolution': '720p',  # '360p', '480p', '720p', '1080p', 'original'
    
    # Image options
    'image_format': 'jpg',  # 'jpg' or 'png'
    'image_quality': 95,    # 1-100 for JPEG
    
    # Base64 options (new)
    'return_base64': True,
    'include_files': False,  # Don't save files when using base64
    'max_size': 1024,       # Max dimension for base64 images
    
    # Analysis parameters
    'sample_rate': 30,      # Analyze every Nth frame
    'min_frames': 5,        # Minimum frames to extract
    'max_frames': 20        # Maximum frames to extract
}

result = ks.extract_video_keyframes(config)

🔧 Command Line Interface

Basic usage

# Extract 10 keyframes
keyframe-scout video.mp4 -o output_frames --nframes 10

# Use specific mode
keyframe-scout video.mp4 -o output_frames --mode interval --interval 5

# Set resolution and quality
keyframe-scout video.mp4 -o output_frames --resolution 720p --quality 90

Batch processing

# Process directory
keyframe-scout-batch videos/ -o batch_output/ --recursive

# With custom settings
keyframe-scout-batch videos/ -o batch_output/ --nframes 8 --resolution 480p

📊 API Reference

Core Functions

extract_video_keyframes(config)

Main extraction function with full configuration options.

extract_frames_for_vlm(video_path, max_frames, max_size, mode)

Extract frames optimized for VLM usage, returns base64 encoded frames.

create_video_messages(video_path, prompt, max_frames, system_prompt)

Create messages formatted for Azure OpenAI GPT.

get_video_info(video_path)

Get video metadata (duration, resolution, fps, etc).

VLM Utilities

prepare_for_azure_openai(video_path, max_frames, detail)

Prepare frames in Azure OpenAI format with detail level control.

estimate_token_usage(frames, detail)

Estimate token usage for GPT API calls.

save_base64_frames(frames, output_dir, prefix)

Save base64 encoded frames to files.

🎨 Examples

Video Summary for Blog

import keyframe_scout as ks

# Extract key moments from a video
frames = ks.extract_frames_for_vlm('tutorial.mp4', max_frames=6)

# Generate descriptions using GPT
analyzer = ks.VideoAnalyzer()
for i, frame in enumerate(frames):
    description = analyzer.analyze_video(
        'tutorial.mp4',
        f"Describe what's shown at {frame['timestamp']} seconds",
        max_frames=1
    )
    print(f"Time {frame['timestamp']}s: {description}")

Video Content Moderation

# Check video content
messages = ks.create_video_messages(
    'uploaded_video.mp4',
    prompt="Does this video contain any inappropriate content? List any concerns.",
    max_frames=10,
    system_prompt="You are a content moderation assistant."
)

# Send to your moderation API

Creating Video Thumbnails

# Extract best frames for thumbnails
result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'thumbnails',
    'mode': 'adaptive',
    'nframes': 5,
    'resolution': '720p',
    'image_quality': 95
})

# The frames are automatically selected for maximum visual interest

🐛 Troubleshooting

FFmpeg not found

# Install FFmpeg
sudo apt install ffmpeg  # Ubuntu/Debian
brew install ffmpeg      # macOS

Import errors

# Install all dependencies
pip install keyframe-scout[all]

GPU acceleration

# OpenCV will automatically use GPU if available
# Check GPU availability
import cv2
print(cv2.cuda.getCudaEnabledDeviceCount())

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

# Development setup
git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout
pip install -e ".[dev]"

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenCV community for the excellent computer vision library
  • FFmpeg project for video processing capabilities
  • Inspired by video analysis needs in the VLM era

📮 Contact


Made with ❤️ for the VLM community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

keyframe_scout-0.2.3.tar.gz (24.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

keyframe_scout-0.2.3-py3-none-any.whl (24.7 kB view details)

Uploaded Python 3

File details

Details for the file keyframe_scout-0.2.3.tar.gz.

File metadata

  • Download URL: keyframe_scout-0.2.3.tar.gz
  • Upload date:
  • Size: 24.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for keyframe_scout-0.2.3.tar.gz
Algorithm Hash digest
SHA256 f7020afdbf6eb1fdb7a0046a72df570a300bb981e0764a4f5652b93a9987d706
MD5 cb52914d175af52bcd6c645ef7ee85a4
BLAKE2b-256 c1085455de59a6a9b203df8e38ed8c60062d999a83e94b18e09eb107a9e23bbc

See more details on using hashes here.

File details

Details for the file keyframe_scout-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: keyframe_scout-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for keyframe_scout-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 900eb47a1135807e0eb4c481b2505d06ac842acd6793e029ac1fd71f7af43a03
MD5 32feaed1e97f45056f909ea7cbab2ea3
BLAKE2b-256 c38019d2de63447d9fd7e435afd0f035492574d827cc03b445b53c1532ddde3d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page