
Multimodal video understanding for Claude Code — extract frames, transcribe audio, build timelines from any video


vidclaude — Multimodal Video Understanding

A Python CLI tool that extracts structured evidence from videos (frames, audio transcript, OCR, temporal timeline) for analysis by Claude.

Quick Start

Option 1: npm (easiest)

npm install -g vidclaude
vidclaude your_video.mp4 --extract --mode standard --verbose

Option 2: pip

pip install vidclaude
vidclaude your_video.mp4 --extract --mode standard --verbose

Option 3: From source

git clone <repo-url> && cd claudevid
setup.bat          # Windows
bash setup.sh      # macOS / Linux
python video_understand.py your_video.mp4 --extract --mode standard --verbose

No API key is needed in --extract mode. With a Claude Max or Pro plan, everything runs through Claude Code: vidclaude extracts the evidence, and Claude in your conversation reasons over it.

How It Works

Video File → ffmpeg extraction → Frames + Audio + Metadata
                                        ↓
                              faster-whisper large-v3 → Transcript (Hindi, English, 90+ languages)
                              pytesseract → OCR text (optional)
                              Shot detection → Scene boundaries
                                        ↓
                              Timeline builder → Unified event list
                                        ↓
                              evidence.md + cached frames → Claude reasons over it
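The extraction stage in the diagram comes down to ffmpeg invocations. The sketch below shows one plausible way to build them. The helper names, the `frame_%05d.jpg` naming pattern and the 16 kHz mono WAV target are assumptions for illustration; vidclaude may not run exactly these commands internally.

```python
from pathlib import Path


def frame_extraction_cmd(video: Path, out_dir: Path, fps: float) -> list[str]:
    """ffmpeg command that samples `fps` frames per second into numbered JPEGs."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", str(video),
        "-vf", f"fps={fps}",      # resample the video stream to a fixed frame rate
        "-q:v", "2",              # JPEG quality scale: lower is better
        str(out_dir / "frame_%05d.jpg"),
    ]


def audio_extraction_cmd(video: Path, out_wav: Path) -> list[str]:
    """ffmpeg command that writes 16 kHz mono 16-bit PCM audio for speech recognition."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", str(video),
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono
        "-ar", "16000",           # 16 kHz, the sample rate Whisper models operate at
        "-c:a", "pcm_s16le",      # uncompressed 16-bit WAV
        str(out_wav),
    ]
```

Each function only returns an argument list. To actually run ffmpeg, pass the list to `subprocess.run(cmd, check=True)`.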


Prerequisites

Requirement   How to install
Python 3.10+  python.org
ffmpeg        Windows: winget install ffmpeg
              macOS:   brew install ffmpeg
              Linux:   sudo apt install ffmpeg
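You can confirm these prerequisites before a first run with a small preflight check like the one below. It is a hypothetical helper, not part of vidclaude. It also checks for ffprobe, on the assumption that the ingestion layer's ffprobe metadata step needs it; ffprobe normally ships alongside ffmpeg.

```python
import shutil
import sys


def preflight(min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return human-readable problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # ffprobe is used for metadata and is installed together with ffmpeg.
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```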

Installation

Option A: One-line setup (recommended)

# Windows
setup.bat

# macOS / Linux
bash setup.sh

Option B: Manual

pip install -r requirements.txt

Optional extras

pip install pytesseract    # OCR (also needs Tesseract binary)
pip install anthropic      # Only for standalone --api mode
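Both extras are optional, so a script can detect them up front. The hypothetical `optional_features()` helper below checks for the Python packages with `importlib.util.find_spec`. For OCR, it also checks for a `tesseract` executable on PATH, because pytesseract only wraps that binary.

```python
import importlib.util
import shutil


def optional_features() -> dict[str, bool]:
    """Report which optional capabilities are usable in this environment."""
    has_pytesseract = importlib.util.find_spec("pytesseract") is not None
    return {
        # pytesseract is only a wrapper; the Tesseract binary must also be on PATH.
        "ocr": has_pytesseract and shutil.which("tesseract") is not None,
        # The anthropic SDK is only needed for the standalone --api mode.
        "api_mode": importlib.util.find_spec("anthropic") is not None,
    }


if __name__ == "__main__":
    print(optional_features())
```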

Usage

Inside Claude Code (recommended)

  1. Copy SKILL.md into your project (or run Claude Code from this repository, where it already lives)
  2. Ask Claude: "analyze the video at D:/path/to/video.mp4"
  3. Claude runs the extraction, reads the evidence, and answers
  4. Ask follow-up questions; the cached extraction is reused, so the video is not processed again

From the command line

# Standard analysis (recommended for most videos)
python video_understand.py video.mp4 --extract --mode standard --verbose

# Quick analysis (fast, fewer frames)
python video_understand.py video.mp4 --extract --mode quick

# Deep analysis (dense frames, full OCR)
python video_understand.py video.mp4 --extract --mode deep --verbose

# Process a folder of videos
python video_understand.py ./videos/ --extract --verbose

# Skip audio / OCR
python video_understand.py video.mp4 --extract --no-audio --no-ocr

# Force fresh extraction (ignore cache)
python video_understand.py video.mp4 --extract --no-cache --verbose
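To script these runs (for example, from a notebook or a scheduled job), you can assemble the flags above into an argument list. `vidclaude_cmd` is a hypothetical helper. It assumes you are working in the source checkout, so that `video_understand.py` is in the current directory.

```python
import sys

MODES = ("quick", "standard", "deep")


def vidclaude_cmd(target: str, mode: str = "standard", *, audio: bool = True,
                  ocr: bool = True, cache: bool = True, verbose: bool = False) -> list[str]:
    """Assemble a video_understand.py invocation from the documented flags."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    cmd = [sys.executable, "video_understand.py", target, "--extract", "--mode", mode]
    # Each documented opt-out flag is appended only when it is needed.
    if not audio:
        cmd.append("--no-audio")
    if not ocr:
        cmd.append("--no-ocr")
    if not cache:
        cmd.append("--no-cache")
    if verbose:
        cmd.append("--verbose")
    return cmd
```

Pass the result to `subprocess.run(cmd, check=True)`. If you installed with pip or npm, replace the first two elements with `vidclaude`.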

Processing Modes

Mode      Frames                 Audio model       OCR         Best for
quick     ~20, uniform sampling  whisper base      skipped     Fast overview, short clips
standard  ~60, shot-aware        whisper large-v3  keyframes   General analysis
deep      ~150, burst sampling   whisper large-v3  all frames  Detailed review, long videos
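The table can be read as a set of presets. The sketch below encodes it as a lookup table and shows one simple form of uniform sampling: one frame at the centre of each equal time slice. The names `ModePreset`, `PRESETS` and `uniform_timestamps` are hypothetical. The centre-of-slice rule is an assumption; quick mode may place its samples differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    target_frames: int   # approximate frame budget from the table above
    whisper_model: str   # faster-whisper model size name
    ocr: str             # "skip", "keyframes" or "all"


PRESETS = {
    "quick": ModePreset(20, "base", "skip"),
    "standard": ModePreset(60, "large-v3", "keyframes"),
    "deep": ModePreset(150, "large-v3", "all"),
}


def uniform_timestamps(duration_s: float, n: int) -> list[float]:
    """Place n sample points at the centres of n equal slices of the video."""
    if duration_s <= 0 or n <= 0:
        return []
    step = duration_s / n
    return [round((i + 0.5) * step, 3) for i in range(n)]


print(uniform_timestamps(10.0, 4))  # [1.25, 3.75, 6.25, 8.75]
```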

Caching

First run extracts everything to .vidcache/<hash>/:

.vidcache/a3f7b2c1/
  meta.json          # Video metadata
  frames/            # Extracted JPEG frames
  transcript.json    # Timestamped transcript
  ocr.json           # OCR results
  timeline.json      # Merged timeline
  evidence.md        # Human-readable report

Follow-up questions reuse the cache, so nothing is re-extracted. Delete .vidcache/ to free disk space; the next run then extracts from scratch.
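The README does not say how the `<hash>` directory name is derived. One plausible scheme, shown below purely as an assumption, hashes the file size plus the first and last MiB of content with SHA-256. It keeps 8 hex characters, which matches the length of `a3f7b2c1` above.

```python
import hashlib
from pathlib import Path

CHUNK = 1 << 20  # 1 MiB


def cache_key(video: Path, length: int = 8) -> str:
    """Short, stable key from the file size plus its first and last MiB."""
    size = video.stat().st_size
    h = hashlib.sha256(str(size).encode())
    with video.open("rb") as f:
        h.update(f.read(CHUNK))                 # head of the file
        if size > CHUNK:
            f.seek(max(size - CHUNK, CHUNK))    # tail, skipping bytes already hashed
            h.update(f.read(CHUNK))
    return h.hexdigest()[:length]


def cache_dir(video: Path, root: Path = Path(".vidcache")) -> Path:
    """Directory where this video's frames, transcript and evidence would live."""
    return root / cache_key(video)
```

Hashing only the head and tail keeps keys cheap to compute for multi-gigabyte files. The trade-off is that it misses edits confined to the middle of a file whose size is unchanged. Hashing the whole file avoids that, but it reads every byte.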

CLI Reference

Flag             Default   Description
input            required  Video file or folder path
--extract        off       Extract only (skill mode, no API key)
-q "..."         none      Question (for --api mode)
--mode           standard  quick / standard / deep
-f N             auto      FPS override
-m N             auto      Max frames override
--no-audio       off       Skip transcription
--no-ocr         off       Skip OCR
--no-cache       off       Force re-extraction
--verbose        off       Detailed progress
-o file          stdout    Output file
--batch-summary  off       Cross-video summary for folders
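If you want to wrap or extend the CLI, the table maps naturally onto `argparse`. The sketch below is reconstructed from the table only; it is not the contents of `vidclaude/cli.py`. The `dest` names are assumptions, and `--api` is left out because the table does not list it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags (a sketch, not the shipped CLI)."""
    p = argparse.ArgumentParser(prog="vidclaude")
    p.add_argument("input", help="Video file or folder path")
    p.add_argument("--extract", action="store_true", help="Extract only (skill mode, no API key)")
    p.add_argument("-q", dest="question", default=None, help="Question (for --api mode)")
    p.add_argument("--mode", choices=["quick", "standard", "deep"], default="standard")
    # None stands for "auto": the chosen mode decides FPS and frame count.
    p.add_argument("-f", dest="fps", type=float, default=None, help="FPS override")
    p.add_argument("-m", dest="max_frames", type=int, default=None, help="Max frames override")
    p.add_argument("--no-audio", action="store_true", help="Skip transcription")
    p.add_argument("--no-ocr", action="store_true", help="Skip OCR")
    p.add_argument("--no-cache", action="store_true", help="Force re-extraction")
    p.add_argument("--verbose", action="store_true", help="Detailed progress")
    p.add_argument("-o", dest="output", default=None, help="Output file (default: stdout)")
    p.add_argument("--batch-summary", action="store_true", help="Cross-video summary for folders")
    return p
```

For example, `build_parser().parse_args(["clip.mp4", "--extract", "--mode", "deep"])` yields a namespace with `mode == "deep"` and every other option at its default.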

Project Structure

video_understand.py          # CLI entry point
SKILL.md                     # Claude Code skill definition
setup.bat / setup.sh         # One-click setup scripts
requirements.txt             # Python dependencies
vidclaude/
  cli.py                     # Argument parsing, orchestration
  models.py                  # Data model (VideoMeta, Frame, Shot, etc.)
  ingest.py                  # Layer A: Video validation + metadata
  segment.py                 # Layer B+C: Shot detection + adaptive sampling
  audio.py                   # Layer D: faster-whisper transcription
  ocr.py                     # Layer E: Text extraction from frames
  intent.py                  # Intent classification (adjusts pipeline)
  timeline.py                # Layer G: Temporal event merging
  memory.py                  # Layer I: Hierarchical summaries
  reason.py                  # Layer J: Evidence assembly
  util.py                    # Shared helpers
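`models.py` is described only as holding `VideoMeta`, `Frame`, `Shot` and similar types. The dataclasses below are a guess at what such a model could look like. Every field name and the `to_json` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VideoMeta:
    """Container-level facts, e.g. as read from ffprobe (written to meta.json)."""
    path: str
    duration_s: float
    width: int
    height: int
    fps: float


@dataclass
class Shot:
    """A span between two detected scene cuts."""
    index: int
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass
class Frame:
    """One extracted JPEG and what later layers learned about it."""
    timestamp_s: float
    path: str                          # e.g. frames/frame_00001.jpg in the cache
    shot_index: Optional[int] = None   # filled in once shots are known
    ocr_text: str = ""                 # filled in by the OCR layer, if enabled


def to_json(obj) -> str:
    """Serialise any of the dataclasses above for the cache."""
    return json.dumps(asdict(obj), indent=2)
```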

Architecture

Based on claude_video_understanding_architecture.md, this tool implements a multi-layer video understanding pipeline:

  • Layer A (Ingestion): Format validation, ffprobe metadata
  • Layer B (Segmentation): Shot boundary detection via scene filter
  • Layer C (Adaptive Sampling): Content-aware frame selection
  • Layer D (Audio): faster-whisper large-v3 ASR with timestamps (90+ languages)
  • Layer E (OCR): Text extraction from key frames
  • Layer G (Timeline): Unified temporal event list
  • Layer I (Memory): Hierarchical summaries for long videos
  • Layer J (Reasoning): Evidence assembly for Claude
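Layers B and C can be approximated with ffmpeg's `scene` score and a simple allocation rule. The sketch below does four things:

- It builds a `select='gt(scene,T)',showinfo` command.
- It parses cut times from the `pts_time:` fields that `showinfo` logs to stderr.
- It turns the cuts into shots.
- It spreads a frame budget across the shots in proportion to their length, with at least one frame per shot.

The 0.3 threshold and the proportional rule are assumptions. They stand in for the tool's content-aware sampling, which the README does not specify.

```python
import re

SCENE_THRESHOLD = 0.3  # assumption: a common starting point; tune per video
_PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def scene_detect_cmd(video: str, threshold: float = SCENE_THRESHOLD) -> list[str]:
    """ffmpeg command that logs one showinfo line per frame whose scene score exceeds threshold."""
    return [
        "ffmpeg", "-hide_banner", "-i", video,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-f", "null", "-",  # decode only, write no output file
    ]


def parse_cut_times(stderr_text: str) -> list[float]:
    """Extract cut timestamps (seconds) from showinfo's stderr log."""
    return [float(m.group(1)) for m in _PTS_TIME.finditer(stderr_text)]


def shots_from_cuts(cuts: list[float], duration: float) -> list[tuple[float, float]]:
    """Turn cut points into (start, end) shot intervals covering the whole video."""
    inner = sorted(c for c in cuts if 0.0 < c < duration)
    edges = [0.0, *inner, duration]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


def allocate_frames(shots: list[tuple[float, float]], budget: int) -> list[int]:
    """Give each shot frames in proportion to its length, with at least one each."""
    total = sum(b - a for a, b in shots)
    if total <= 0:
        return [0] * len(shots)
    return [max(1, round(budget * (b - a) / total)) for a, b in shots]


def sample_times(shots: list[tuple[float, float]], counts: list[int]) -> list[float]:
    """Place each shot's frames at the centres of equal slices of that shot."""
    times: list[float] = []
    for (a, b), n in zip(shots, counts):
        step = (b - a) / n if n else 0.0
        times.extend(a + (i + 0.5) * step for i in range(n))
    return times
```

Rounding and the one-frame minimum can make the total differ slightly from the budget. A largest-remainder allocation would hit the budget exactly.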

Claude serves as the reasoning brain — the tool provides structured, time-grounded evidence for Claude to analyze.
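Layers G and J amount to merging time-stamped events from each layer and rendering them as text that Claude can read. The `Event` type, the `[mm:ss]` format and the Markdown layout below are assumptions; the real `evidence.md` format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float     # seconds from the start of the video
    kind: str    # e.g. "shot", "speech", "ocr"
    text: str


def fmt_ts(t: float) -> str:
    """Format seconds as mm:ss, or h:mm:ss once the video passes one hour."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"


def build_timeline(*streams: list[Event]) -> list[Event]:
    """Merge per-layer event lists into one list ordered by time."""
    merged = [e for stream in streams for e in stream]
    # sorted() is stable, so simultaneous events keep the order of the streams passed in.
    return sorted(merged, key=lambda e: e.t)


def render_evidence(events: list[Event]) -> str:
    """Render the merged timeline as a Markdown bullet list."""
    lines = ["## Timeline", ""]
    lines += [f"- [{fmt_ts(e.t)}] {e.kind}: {e.text}" for e in events]
    return "\n".join(lines) + "\n"
```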

Language Support

Transcription uses faster-whisper with the large-v3 model, which supports 90+ languages, including Hindi, English, Spanish, French, German, Chinese, Japanese and Arabic. The spoken language is auto-detected and reported with a confidence score.
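The faster-whisper call behind this is short. `WhisperModel(...).transcribe(...)` returns a lazy segment generator plus an info object exposing `language` and `language_probability`. The `transcribe` and `to_segments` wrappers around it are hypothetical. The first call downloads the model weights, and large-v3 can be slow on CPU.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_segments(raw) -> list[Segment]:
    """Copy faster-whisper segment objects into plain records for transcript.json."""
    return [Segment(round(s.start, 2), round(s.end, 2), s.text.strip()) for s in raw]


def transcribe(audio_path: str, model_size: str = "large-v3"):
    """Return (language, language_probability, segments) for one audio file."""
    # Imported lazily so the rest of the module works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    # No language argument: faster-whisper detects the language itself.
    segments, info = model.transcribe(audio_path)
    return info.language, info.language_probability, to_segments(segments)
```

Because `segments` is a generator, the actual decoding happens while `to_segments` iterates over it.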

Download files


Source Distribution

vidclaude-0.2.0.tar.gz (26.3 kB)

Built Distribution


vidclaude-0.2.0-py3-none-any.whl (28.7 kB)

File details

Details for the file vidclaude-0.2.0.tar.gz.

File metadata

  • Download URL: vidclaude-0.2.0.tar.gz
  • Upload date:
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for vidclaude-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       c5618b92c7d3db88ece7da0cff202ae6d147d32454a9e673fe019c434f9b2a13
MD5          c95695a8260bfdf2926395024988af52
BLAKE2b-256  64216fab2d9139c27f6c92d7fd14237d64b748d8080f635555b46b821024f8c5


File details

Details for the file vidclaude-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: vidclaude-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 28.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for vidclaude-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       52085c710c71cca79b37300b96c2d68c12a0a513cd912b128ec19a8715fc27d2
MD5          3e39784bcb3da951754b82b15dda3d48
BLAKE2b-256  28b1b5c18692ae1d300c05f2db259454a61c164d4a014d98cc1d5bb643fd2f84

