Understand and describe your videos with SmolVLM2, Whisper and Qwen. Fully local.

🎬 Video Understanding Local

AI-powered video analysis that combines audio transcription and visual understanding - running entirely on your machine!

Perfect for organizing raw footage, preparing video montages, or extracting insights from your video content - all while keeping your data completely private on your own machine.

✨ Key Features

  • 👁️ Visual Scene Analysis: Frame-by-frame understanding with SmolVLM2
  • 🎙️ Audio Understanding: Speech-to-text using the Whisper model
  • 🔒 Fully Offline: All models run locally once downloaded - entirely under your control

🚀 Installation

Simply install from PyPI:

pip install video_understanding

💡 Quick Start

from video_understanding.video_understanding import analyze_video

# Analyze any video file
summary = analyze_video("path/to/video.mp4")
print(summary)

That's it! The AI will watch your video and tell you what's happening.

📸 Example: Auto-Organizing Raw Footage

Here's a practical example that can save you hours - automatically renaming raw video files based on their content:

import os
from video_understanding.video_understanding import analyze_video

# Define what you want the AI to focus on
system_prompt = """
Analyze this video and generate a concise filename that describes 
the main action and subject. Use lowercase with underscores.
Focus on key visual elements and dialogue.
"""

# Process videos in your folder
video_folder = "raw_footage"

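for video_file in sorted(os.listdir(video_folder)):
    video_path = os.path.join(video_folder, video_file)
    # Skip subfolders and anything else that is not a regular file
    if not os.path.isfile(video_path):
        continue
    print(f"🎬 Processing: {video_file}")

    # Get AI-generated descriptive name; strip whitespace the model may add
    new_name = str(analyze_video(video_path, system_prompt=system_prompt)).strip()
    # splitext keeps the leading dot and copes with names that have no extension
    _, ext = os.path.splitext(video_file)
    new_filename = f"{new_name}{ext}"

    # Rename file
    os.rename(video_path, os.path.join(video_folder, new_filename))
    print(f"✅ Renamed to: {new_filename}\n")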

This transforms generic filenames like "VIDEO_001.mp4" into descriptive ones like "team_assembling_robot_arm.mp4", making your footage instantly searchable and organized! 🎉
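A language model does not always follow formatting instructions exactly, so the text returned for the renaming loop above may contain spaces, quotes, slashes, or an extra line of commentary. The helper below is a minimal sketch of one way to clean that text before using it as a filename. The name `to_safe_stem` and its parameters are hypothetical and not part of the package.

```python
import re


def to_safe_stem(raw: str, max_len: int = 80, fallback: str = "untitled") -> str:
    """Turn free-form model output into a lowercase, underscore-separated file stem.

    Hypothetical helper for illustration; not part of video_understanding.
    """
    # Use only the first non-empty line, in case the model adds commentary.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    text = lines[0] if lines else ""
    # Drop a trailing extension such as ".mp4" if the model included one.
    text = re.sub(r"\.[A-Za-z0-9]{2,4}$", "", text)
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    # Note that this also discards non-ASCII letters.
    stem = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    # Keep the name reasonably short and avoid a dangling underscore.
    stem = stem[:max_len].rstrip("_")
    return stem or fallback
```

To use it, pass the model output through `to_safe_stem` before building the new filename. The sketch does not guard against two videos receiving the same name, so a real script may also want to check whether the target path already exists before renaming.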

📝 Description

This package analyzes videos using AI models to understand both audio and visual content. It intelligently splits long videos into manageable chunks, transcribes speech with Whisper, analyzes visual scenes with SmolVLM2, and generates comprehensive summaries using Qwen2.5.
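To make the chunking step concrete, here is a rough sketch of how a video's duration could be divided into consecutive time spans. It illustrates the general idea only: the package's actual chunk length and splitting strategy are not documented here, and both the function name `chunk_spans` and the 60-second default are assumptions.

```python
def chunk_spans(duration_s: float, chunk_s: float = 60.0) -> list[tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) spans of at most chunk_s seconds.

    Illustrative only; the 60-second default is an assumption, not the package's setting.
    """
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    spans = []
    start = 0.0
    while start < duration_s:
        # The last span is shortened so it never runs past the end of the video.
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans
```

Under this scheme, each span would be transcribed and visually analyzed on its own, and the per-chunk results would then be passed to the summarizing model.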

💻 Requirements

Hardware

  • GPU: CUDA-compatible GPU strongly recommended (NVIDIA)
  • VRAM: Minimum 8GB for smooth operation
  • Disk Space: ~25GB for model storage

Software

  • Python 3.10+
  • CUDA toolkit
  • FFmpeg (if on Windows, install it with winget install ffmpeg --version 7.1.1)
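Before the first run, it can help to confirm that the software requirements above are in place. The following is a small sketch that checks the Python version, looks for `ffmpeg` on the PATH, and optionally asks PyTorch whether a CUDA device is visible. The function `check_environment` is hypothetical, and the CUDA check assumes the models run on PyTorch, which this page does not state explicitly.

```python
import shutil
import sys


def check_environment(min_python=(3, 10), check_cuda=True) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; not part of video_understanding.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    if check_cuda:
        try:
            import torch  # assumption: PyTorch is the backend used by the models
        except ImportError:
            problems.append("PyTorch not installed, CUDA availability not checked")
        else:
            if not torch.cuda.is_available():
                problems.append("No CUDA device visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(f"⚠️ {problem}")
```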

Models (Downloaded Automatically)

On first run, these models will be downloaded:

  • 🎙️ Whisper Base (~140MB) - audio transcription
  • 👁️ SmolVLM2-2.2B-Instruct (~9GB) - visual understanding
  • 🧠 Qwen2.5-7B-Instruct (~14GB) - final summary generation
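The three downloads listed above add up to roughly 23 GB, which is consistent with the ~25GB disk space figure. The sketch below adds up the listed sizes and checks free space with the standard library. Where the models are stored is not stated on this page, so the `path` argument is a placeholder you would point at your own model cache location, and the function names are hypothetical.

```python
import shutil

# Approximate download sizes in GB, as listed in this README.
MODEL_SIZES_GB = {
    "Whisper Base": 0.14,
    "SmolVLM2-2.2B-Instruct": 9.0,
    "Qwen2.5-7B-Instruct": 14.0,
}


def total_model_gb() -> float:
    """Sum of the listed model download sizes in GB."""
    return sum(MODEL_SIZES_GB.values())


def has_room_for_models(path: str = ".", required_gb: float = 25.0) -> bool:
    """Check whether the drive containing `path` has at least `required_gb` free.

    `path` is a placeholder: the actual model cache location is an assumption
    you need to fill in for your own setup.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    print(f"Models need about {total_model_gb():.1f} GB")
    print("Enough free space:", has_room_for_models("."))
```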

More examples

# General analysis
summary = analyze_video("video.mp4")

# Custom analysis with specific instructions
custom_prompt = "List all technical topics discussed in this video."
topics = analyze_video("video.mp4", system_prompt=custom_prompt)

# Extract specific information
prompt = "Describe what tools and materials are being used in this tutorial."
tools_list = analyze_video("tutorial.mp4", system_prompt=prompt)

🎉 Happy Video Understanding!
