
Intelligent video keyframe extraction for VLMs

KeyFrame Scout

[Python Version](https://www.python.org/downloads/) [License](LICENSE) [Version](https://github.com/yourusername/keyframe-scout)

An intelligent video keyframe extraction tool optimized for Vision Language Models (VLMs) and video analysis. Extract meaningful frames from videos using adaptive algorithms, with direct support for Azure OpenAI GPT and other VLMs.

✨ Key Features

  • 🎯 Intelligent Frame Selection: Three extraction modes (adaptive, interval, fixed) to suit different use cases
  • 🤖 VLM-Ready: Direct integration with Azure OpenAI GPT and other vision language models
  • 📦 Base64 Support: Return frames as base64 strings for immediate API usage
  • ⚡ Batch Processing: Process multiple videos efficiently with parallel execution
  • 🎨 Flexible Output: Save as files, return as base64, or both
  • 📊 Smart Analysis: Automatically identifies scene changes and important moments
  • 🔧 Easy Integration: Simple Python API and command-line interface

🚀 What's New in v0.2.4

  • 🧪 Full Test Coverage: Comprehensive test suite with 15 test cases covering all major features
  • 🛡️ Enhanced Error Handling: Robust error handling for edge cases and invalid inputs
  • ⚡ Optimized Performance: Improved video processing speed and memory efficiency
  • 📖 Updated Documentation: Complete API documentation with real-world examples
  • 🔧 Refined API: Better parameter validation and more intuitive function signatures

📦 Installation

Using pip (Recommended)

pip install keyframe-scout

From source

git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout
pip install -e .

Development installation

git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout
pip install -e ".[dev]"

Dependencies

Core Requirements:

  • Python 3.7+ (tested with Python 3.11)
  • OpenCV (opencv-python >= 4.5.0)
  • NumPy (>= 1.19.0)
  • Pillow (>= 8.0.0)
  • scikit-image (>= 0.18.0)
  • tqdm (>= 4.50.0)

System Dependencies:

  • FFmpeg (for video processing)

Optional Dependencies:

  • openai (>= 1.0.0) - for Azure OpenAI integration

Install FFmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS (with Homebrew)
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
# Or use: winget install FFmpeg

Verify Installation

import keyframe_scout as ks
ks.print_info()  # Check dependencies and features

🚀 Performance

KeyFrame Scout is optimized for speed and efficiency:

  • Fast Processing: 11.7-second video processed in < 4 seconds
  • Memory Efficient: Processes videos without loading entire video into memory
  • Smart Sampling: Analyzes every 30th frame by default for optimal speed/accuracy balance
  • Parallel Processing: Batch processing with configurable worker threads
  • Adaptive Algorithms: Automatically adjusts processing based on video characteristics

Benchmark Results (MacBook Pro M1):

  • 1080p video, 12 seconds → 8 keyframes extracted in 3.66 seconds
  • Base64 encoding included, total memory usage < 100MB
  • Supports videos up to 4K resolution
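The streaming design described above (score only every Nth frame, never buffer the whole video) can be sketched independently of the library. This is an illustrative reimplementation of the idea, not KeyFrame Scout's actual code; `score_sampled_frames` is a hypothetical helper that works on any iterable of NumPy frames, e.g. a generator wrapped around `cv2.VideoCapture.read()`:

```python
from typing import Iterable, List, Tuple
import numpy as np

def score_sampled_frames(frames: Iterable[np.ndarray],
                         sample_rate: int = 30) -> List[Tuple[int, float]]:
    """Score every Nth frame by mean absolute difference from the
    previously sampled frame, keeping only the scores in memory."""
    scores = []
    prev = None
    for idx, frame in enumerate(frames):
        if idx % sample_rate:
            continue  # skip frames between samples
        gray = frame.mean(axis=2) if frame.ndim == 3 else frame
        gray = gray.astype(np.int16)  # signed type so the diff can't wrap
        if prev is not None:
            scores.append((idx, float(np.abs(gray - prev).mean())))
        prev = gray
    return scores
```

Frames with the highest scores are the ones most likely to sit at scene changes, which is the intuition behind the adaptive mode.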

🎯 Quick Start

Basic Usage

import keyframe_scout as ks

# Extract keyframes from a video
result = ks.extract_video_keyframes({
    'video': 'path/to/video.mp4',
    'output_dir': 'output/frames',
    'nframes': 10
})

print(f"Extracted {result['extracted_frames']} frames in {result['processing_time']:.2f}s")

VLM Integration (GPT-4V Ready!)

import keyframe_scout as ks

# Extract frames optimized for VLM
frames = ks.extract_frames_for_vlm(
    'video.mp4',
    max_frames=8,
    max_size=1024
)

# Create messages for Azure OpenAI GPT-4V
messages = ks.create_video_messages(
    'video.mp4',
    prompt="Analyze this video and describe the key events",
    max_frames=8,
    system_prompt="You are a professional video analyst."
)

# Estimate API costs before calling
token_estimate = ks.estimate_token_usage(frames, detail="auto")
print(f"Estimated tokens: {token_estimate['total_image_tokens']}")
print(f"Estimated cost: ${token_estimate['estimated_cost_usd']:.4f}")

# Use with Azure OpenAI (example)
# client = AzureOpenAI(...)
# response = client.chat.completions.create(model="gpt-4-vision-preview", messages=messages)

Base64 Output for APIs

# Get frames as base64 (perfect for API calls)
result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'nframes': 5,
    'return_base64': True,
    'max_size': 1024,
    'include_files': False  # Don't save files, just return base64
})

# Access base64 data for your API
for i, frame in enumerate(result['frames']):
    print(f"Frame {i+1} at {frame['timestamp']:.1f}s")
    # frame['base64'] contains the image data
    # Ready to send to any VLM API!

📖 Detailed Usage

Extraction Modes

1. Adaptive Mode (Default)

Intelligently selects the most representative frames based on content analysis.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'adaptive',
    'nframes': 10
})

2. Interval Mode

Extracts frames at fixed time intervals.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'interval',
    'interval': 5.0,  # Every 5 seconds
    'frames_per_interval': 1
})

3. Fixed Mode

Extracts a fixed number of evenly distributed frames.

result = ks.extract_video_keyframes({
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'fixed',
    'frames_per_interval': 20  # Total 20 frames
})

VLM Integration Examples

Complete Azure OpenAI Workflow

import keyframe_scout as ks
from openai import AzureOpenAI

# Initialize Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint="your-endpoint",
    api_key="your-key",
    api_version="2024-02-15-preview"
)

# Extract and prepare video for analysis
messages = ks.create_video_messages(
    'video.mp4',
    prompt="Analyze this video and provide a detailed summary of the main events",
    max_frames=8,
    system_prompt="You are a professional video analyst. Provide structured analysis.",
    detail="high"  # Use high detail for better analysis
)

# Estimate costs before API call
frames = ks.extract_frames_for_vlm('video.mp4', max_frames=8)
cost_estimate = ks.estimate_token_usage(frames, detail="high")
print(f"Estimated cost: ${cost_estimate['estimated_cost_usd']:.4f}")

# Call Azure OpenAI
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=messages,
    max_tokens=1000
)

print(response.choices[0].message.content)

Using the VideoAnalyzer Class (Simplified)

# For users who prefer a simpler interface
analyzer = ks.VideoAnalyzer(
    azure_endpoint="your-endpoint",
    api_key="your-key"
)

# One-line video analysis
result = analyzer.analyze_video(
    'video.mp4',
    prompt="What are the main subjects and activities in this video?",
    max_frames=10
)

print(result)

Batch Video Analysis

# Analyze multiple videos efficiently
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
prompts = [
    'Describe the setting and atmosphere',
    'Identify the main characters or subjects',
    'Summarize the key actions or events'
]

results = analyzer.batch_analyze(videos, prompts, max_frames=6)
for video, result in zip(videos, results):
    print(f"{video}: {result}")

Custom VLM Integration (Any Provider)

# Extract frames for any VLM service
frames = ks.extract_frames_for_vlm('video.mp4', max_frames=6)

# Prepare data for your preferred VLM API
for i, frame in enumerate(frames):
    image_data = {
        'base64': frame['base64'],
        'timestamp': frame['timestamp'],
        'description': f'Frame {i+1} at {frame["timestamp"]:.1f}s'
    }
    # Send to OpenAI, Claude, Gemini, or any other VLM API
    # your_vlm_api.analyze(image_data)

Batch Processing

# Process all videos in a directory
results = ks.process_video_directory(
    directory='videos/',
    output_dir='output/',
    extensions=['.mp4', '.avi'],
    recursive=True,
    config_template={
        'mode': 'adaptive',
        'nframes': 10,
        'return_base64': True
    }
)

# Or process a list of videos
video_list = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = ks.extract_keyframes_batch(
    video_list,
    output_base_dir='batch_output/',
    max_workers=4
)
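If you need batching behavior the library doesn't expose, the same fan-out pattern is easy to build on the standard library. A minimal sketch (the `run_batch` helper is hypothetical, not part of keyframe-scout):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(process_one, videos, max_workers=4):
    """Run process_one over each video in parallel threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, videos))
```

Here `process_one` would be something like `lambda v: ks.extract_video_keyframes({'video': v, 'nframes': 10})`. Threads are appropriate because the heavy lifting happens in OpenCV/FFmpeg, outside the GIL.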

Advanced Configuration

config = {
    'video': 'video.mp4',
    'output_dir': 'output',
    'mode': 'adaptive',
    'nframes': 10,
    
    # Resolution options
    'resolution': '720p',  # '360p', '480p', '720p', '1080p', 'original'
    
    # Image options
    'image_format': 'jpg',  # 'jpg' or 'png'
    'image_quality': 95,    # 1-100 for JPEG
    
    # Base64 options (new)
    'return_base64': True,
    'include_files': False,  # Don't save files when using base64
    'max_size': 1024,       # Max dimension for base64 images
    
    # Analysis parameters
    'sample_rate': 30,      # Analyze every Nth frame
    'min_frames': 5,        # Minimum frames to extract
    'max_frames': 20        # Maximum frames to extract
}

result = ks.extract_video_keyframes(config)

🔧 Command Line Interface

Basic usage

# Extract 10 keyframes
keyframe-scout video.mp4 -o output_frames --nframes 10

# Use specific mode
keyframe-scout video.mp4 -o output_frames --mode interval --interval 5

# Set resolution and quality
keyframe-scout video.mp4 -o output_frames --resolution 720p --quality 90

Batch processing

# Process directory
keyframe-scout-batch videos/ -o batch_output/ --recursive

# With custom settings
keyframe-scout-batch videos/ -o batch_output/ --nframes 8 --resolution 480p

📊 Complete API Reference

Core Functions

extract_video_keyframes(config: dict) -> dict

Main extraction function with comprehensive configuration options.

Parameters:

  • video (str): Path to video file (required)
  • output_dir (str, optional): Output directory for saved frames
  • mode (str): Extraction mode - 'adaptive', 'interval', 'fixed' (default: 'adaptive')
  • nframes (int, optional): Number of frames to extract
  • resolution (str): Output resolution - 'original', '360p', '480p', '720p', '1080p'
  • image_format (str): Output format - 'jpg', 'png' (default: 'jpg')
  • image_quality (int): JPEG quality 1-100 (default: 95)
  • return_base64 (bool): Return base64 encoded frames (default: False)
  • max_size (int): Max dimension for base64 images (default: 1024)
  • include_files (bool): Save files when using base64 (default: True)

Returns: Dictionary with extraction results and metadata

extract_frames_for_vlm(video_path, max_frames=10, max_size=1024, mode="adaptive") -> List[dict]

Extract frames optimized for Vision Language Models.

Returns: List of dictionaries with 'base64', 'timestamp', and 'index' keys

create_video_messages(video_path, prompt, max_frames=8, system_prompt=None, detail="auto") -> List[dict]

Create messages formatted for Azure OpenAI GPT-4V.

Parameters:

  • detail (str): Image detail level - 'low', 'high', 'auto'

get_video_info(video_path: str) -> dict

Get comprehensive video metadata.

Returns: Dictionary with duration, fps, width, height, total_frames

VLM Utilities

prepare_for_azure_openai(video_path, max_frames=8, detail="auto") -> List[dict]

Prepare frames in Azure OpenAI format with automatic image sizing.

estimate_token_usage(frames, detail="auto") -> dict

Estimate GPT-4V token usage and costs.

Returns:

{
    "num_images": int,
    "tokens_per_image": int, 
    "total_image_tokens": int,
    "estimated_cost_usd": float
}
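Figures like these can be reproduced from OpenAI's published GPT-4V accounting: a flat 85 tokens in low detail, or 85 base tokens plus 170 per 512-px tile after rescaling in high detail. A sketch of that arithmetic, independent of the library's own estimator (the rules and prices may change on OpenAI's side):

```python
import math

def gpt4v_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Image token count per OpenAI's published GPT-4V rules."""
    if detail == "low":
        return 85
    # High detail: fit within 2048x2048, then scale the shortest side down to 768.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Count 512x512 tiles covering the scaled image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

For example, a 1024x1024 image in high detail scales to 768x768, covers four tiles, and costs 85 + 4 × 170 = 765 tokens.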

save_base64_frames(frames, output_dir, prefix="frame") -> List[str]

Save base64 encoded frames to files.
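The round trip is plain stdlib work. A rough equivalent sketch (`save_frames` is hypothetical), assuming each frame dict carries a base64-encoded JPEG under the `'base64'` key as described above:

```python
import base64
import os

def save_frames(frames, output_dir, prefix="frame"):
    """Decode each frame's base64 payload and write it out as a .jpg file."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i, frame in enumerate(frames):
        path = os.path.join(output_dir, f"{prefix}_{i:03d}.jpg")
        with open(path, "wb") as f:
            f.write(base64.b64decode(frame["base64"]))
        paths.append(path)
    return paths
```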

frames_to_base64_urls(frames, detail="auto") -> List[dict]

Convert frame data to Azure OpenAI image URL format.

Advanced Functions

extract_keyframes_batch(video_list, output_base_dir, max_workers=4, **kwargs)

Process multiple videos in parallel with configurable worker threads.

create_batch_messages(video_paths, prompts, max_frames_per_video=5, system_prompt=None)

Create messages for analyzing multiple videos in a single API call.

Utility Functions

check_dependencies() -> dict

Check availability of all required and optional dependencies.

ensure_output_dir(directory: str)

Create output directory if it doesn't exist.

smart_frame_count(duration, mode, min_frames=3, max_frames=30, **kwargs) -> int

Automatically determine optimal number of frames based on video duration and mode.
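The exact heuristic isn't documented here, but the clamped, duration-based idea can be sketched. `smart_frame_count_sketch` is a hypothetical stand-in assuming roughly one frame per few seconds of footage, not the library's actual formula:

```python
def smart_frame_count_sketch(duration: float, seconds_per_frame: float = 5.0,
                             min_frames: int = 3, max_frames: int = 30) -> int:
    """Roughly one frame per few seconds, clamped to [min_frames, max_frames]."""
    return max(min_frames, min(max_frames, round(duration / seconds_per_frame)))
```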

🎨 Real-World Examples

Video Content Analysis for Social Media

import keyframe_scout as ks

# Analyze social media video content
frames = ks.extract_frames_for_vlm('social_video.mp4', max_frames=8)

messages = ks.create_video_messages(
    'social_video.mp4',
    prompt="""Analyze this video for:
    1. Main subjects (people, objects, scenes)
    2. Activities and actions
    3. Emotional tone and atmosphere
    4. Any text or captions visible
    5. Potential engagement factors""",
    max_frames=8,
    system_prompt="You are a social media content analyst."
)

# Use with your preferred AI service

Educational Video Summarization

# Extract key learning moments from educational content
result = ks.extract_video_keyframes({
    'video': 'lecture.mp4',
    'mode': 'adaptive',
    'nframes': 12,
    'return_base64': True
})

# Create chapter summaries
for i, frame in enumerate(result['frames']):
    timestamp = frame['timestamp']
    # Analyze each key frame for educational content
    messages = ks.create_video_messages(
        'lecture.mp4',
        prompt=f"What key concept is being taught at {timestamp:.1f} seconds?",
        max_frames=1
    )
    # Send to AI for analysis

Video Thumbnail Generation

# Generate optimized thumbnails for video platforms
result = ks.extract_video_keyframes({
    'video': 'content_video.mp4',
    'output_dir': 'thumbnails',
    'mode': 'adaptive',
    'nframes': 5,
    'resolution': '720p',  # 1280x720, standard YouTube thumbnail size
    'image_quality': 95
})

print(f"Generated {result['extracted_frames']} thumbnail candidates")
# Frames are automatically selected for maximum visual interest

Content Moderation Pipeline

# Automated content screening
frames = ks.extract_frames_for_vlm('user_upload.mp4', max_frames=10)

moderation_prompt = """Review this video content for:
1. Inappropriate content
2. Violence or harmful activities  
3. Copyright violations (branded content)
4. Age-appropriate rating
Provide a safety score (1-10) and explanation."""

messages = ks.create_video_messages(
    'user_upload.mp4',
    prompt=moderation_prompt,
    max_frames=10,
    system_prompt="You are a content moderation specialist.",
    detail="high"
)

# Integrate with moderation service

Video Analytics Dashboard Data

# Extract frames for analytics dashboard
def analyze_video_metrics(video_path):
    # Get basic info
    info = ks.get_video_info(video_path)
    
    # Extract representative frames
    frames = ks.extract_frames_for_vlm(video_path, max_frames=6)
    
    # Estimate AI analysis costs
    cost = ks.estimate_token_usage(frames, detail="auto")
    
    return {
        'duration': info['duration'],
        'resolution': f"{info['width']}x{info['height']}",
        'frames_extracted': len(frames),
        'analysis_cost': cost['estimated_cost_usd'],
        'ready_for_ai': True
    }

# Use in your video management system
metrics = analyze_video_metrics('video.mp4')

🐛 Troubleshooting

Common Issues and Solutions

FFmpeg not found

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS (with Homebrew)
brew install ffmpeg

# Windows (with winget)
winget install FFmpeg

# Or download from: https://ffmpeg.org/download.html

Import errors

# Install with all dependencies
pip install "keyframe-scout[all]"

# Or install core dependencies manually
pip install opencv-python numpy pillow scikit-image tqdm

"No module named 'cv2'" error

# Reinstall OpenCV
pip uninstall opencv-python opencv-contrib-python
pip install "opencv-python>=4.5.0"

Memory issues with large videos

# Use smaller sampling rate and image size
config = {
    'video': 'large_video.mp4',
    'sample_rate': 60,  # Analyze every 60th frame (default: 30)
    'max_size': 512,    # Smaller base64 images
    'resolution': '720p'  # Lower output resolution
}

Performance optimization

# For faster processing
config = {
    'video': 'video.mp4',
    'mode': 'fixed',      # Fastest mode
    'sample_rate': 90,    # Less analysis
    'return_base64': True,
    'include_files': False  # Skip file I/O
}

GPU acceleration

# Check GPU availability
import cv2
print(f"CUDA devices: {cv2.cuda.getCudaEnabledDeviceCount()}")

# KeyFrame Scout will automatically use GPU if available

Azure OpenAI API errors

# Check your configuration
try:
    messages = ks.create_video_messages('video.mp4', 'test prompt')
    print("Messages created successfully")
except Exception as e:
    print(f"Error: {e}")
    
# Verify token estimation
frames = ks.extract_frames_for_vlm('video.mp4', max_frames=1)
cost = ks.estimate_token_usage(frames)
print(f"Estimated tokens: {cost['total_image_tokens']}")

Debugging Tips

# Enable detailed logging
import logging
logging.basicConfig(level=logging.INFO)

# Check system status
import keyframe_scout as ks
ks.print_info()  # Shows all dependencies and features

# Test with minimal config
result = ks.extract_video_keyframes({
    'video': 'test_video.mp4',
    'nframes': 1,
    'return_base64': True
})
print("Basic extraction works:", result['success'])

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

# Clone the repository
git clone https://github.com/yourusername/keyframe-scout.git
cd keyframe-scout

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
python test_keyframe_scout.py
python test_vlm_features.py

Running Tests

# Core functionality tests
python test_keyframe_scout.py

# VLM integration tests  
python test_vlm_features.py

# Unit tests with pytest
pytest tests/

Areas for Contribution

  • Algorithm improvements: Better frame selection algorithms
  • Performance optimization: Faster video processing
  • New VLM integrations: Support for more AI services
  • Documentation: Examples, tutorials, API docs
  • Testing: More comprehensive test coverage

Pull Request Guidelines

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with tests
  4. Ensure all tests pass
  5. Submit a pull request with clear description

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenCV Community - Excellent computer vision library
  • FFmpeg Project - Powerful video processing capabilities
  • VLM Community - Inspiration for vision-language model integration
  • Contributors - Thanks to all who help improve this project

📮 Contact & Support

Made with ❤️ for the Vision Language Model community

Empowering developers to build amazing video AI applications
