
Multimodal video understanding for Claude Code: extract frames, transcribe audio, and build timelines from any video


vidclaude — Multimodal Video Understanding

A Python CLI tool that extracts structured evidence from videos (frames, audio transcript, OCR, temporal timeline) for analysis by Claude.

Quick Start

Option 1: npm (easiest)

npm install -g vidclaude
vidclaude your_video.mp4 --extract --mode standard --verbose

Option 2: pip

pip install vidclaude
vidclaude your_video.mp4 --extract --mode standard --verbose

Option 3: From source

git clone <repo-url> && cd claudevid
setup.bat          # Windows
bash setup.sh      # macOS / Linux
python video_understand.py your_video.mp4 --extract --mode standard --verbose

No API key is needed in --extract (skill) mode. If you have a Claude Max or Pro plan, the tool works entirely through Claude Code: it extracts the evidence, and Claude in your conversation does the reasoning.

How It Works

Video File → ffmpeg extraction → Frames + Audio + Metadata
                                        ↓
                              faster-whisper large-v3 → Transcript (Hindi, English, 90+ languages)
                              pytesseract → OCR text (optional)
                              Shot detection → Scene boundaries
                                        ↓
                              Timeline builder → Unified event list
                                        ↓
                              evidence.md + cached frames → Claude reasons over it
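The "Timeline builder" step merges what each layer produces into one time-ordered list. A minimal sketch of that idea follows; the `Event` record, its fields, and `build_timeline` / `render` are hypothetical names chosen for illustration, not the actual API of vidclaude's timeline.py.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """One time-stamped piece of evidence (hypothetical shape)."""
    start: float  # seconds from the start of the video
    end: float    # seconds; equal to start for instantaneous events
    kind: str     # e.g. "shot", "speech", "ocr"
    text: str     # transcript text, OCR text, or a short label


def build_timeline(*sources: Iterable[Event]) -> List[Event]:
    """Merge events from every layer into one list ordered by time.

    Ties on start time are broken by end time, then by kind, so the
    output order is deterministic.
    """
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: (e.start, e.end, e.kind))


def render(timeline: List[Event]) -> str:
    """Format the merged timeline as one readable line per event."""
    return "\n".join(
        f"[{e.start:7.2f}s - {e.end:7.2f}s] {e.kind:<6} {e.text}" for e in timeline
    )


if __name__ == "__main__":
    shots = [Event(0.0, 0.0, "shot", "scene 1"), Event(12.5, 12.5, "shot", "scene 2")]
    speech = [Event(1.2, 4.8, "speech", "Welcome to the demo.")]
    ocr = [Event(13.0, 13.0, "ocr", "Quarterly results")]
    print(render(build_timeline(shots, speech, ocr)))
```

Sorting on (start, end, kind) keeps the merge stable when, say, a shot cut and an OCR hit share a timestamp.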


Prerequisites

| Requirement | How to install |
|---|---|
| Python 3.10+ | python.org |
| ffmpeg | Windows: winget install ffmpeg / macOS: brew install ffmpeg / Linux: sudo apt install ffmpeg |
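Before running setup, you can confirm both prerequisites from Python. The script below is a hypothetical helper, not part of vidclaude; it also checks for ffprobe, which is normally installed alongside ffmpeg and which the ingestion layer uses for metadata.

```python
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites table


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each prerequisite is satisfied on this machine."""
    return {
        "python>=3.10": sys.version_info[:2] >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        status = "ok" if ok else "MISSING"
        print(f"{status:<8}{name}")
```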

Installation

Option A: One-line setup (recommended)

# Windows
setup.bat

# macOS / Linux
bash setup.sh

Option B: Manual

pip install -r requirements.txt

Optional extras

pip install pytesseract    # OCR (also needs Tesseract binary)
pip install anthropic      # Only for standalone --api mode

Usage

Inside Claude Code (recommended)

  1. Copy SKILL.md into your project (or run Claude Code from this repository, where it already lives)
  2. Ask Claude: "analyze the video at D:/path/to/video.mp4"
  3. Claude runs the extraction, reads the evidence, and answers
  4. Ask follow-up questions; the cached extraction is reused, so answers come back without re-processing the video

From the command line

# Standard analysis (recommended for most videos)
python video_understand.py video.mp4 --extract --mode standard --verbose

# Quick analysis (fast, fewer frames)
python video_understand.py video.mp4 --extract --mode quick

# Deep analysis (dense frames, full OCR)
python video_understand.py video.mp4 --extract --mode deep --verbose

# Process a folder of videos
python video_understand.py ./videos/ --extract --verbose

# Skip audio / OCR
python video_understand.py video.mp4 --extract --no-audio --no-ocr

# Force fresh extraction (ignore cache)
python video_understand.py video.mp4 --extract --no-cache --verbose
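To drive the same commands from a Python script (for example, in a batch job), a small wrapper can assemble the argument list. This is a hypothetical helper that uses only the flags shown above and assumes it runs from the source checkout where video_understand.py lives; with the pip or npm install you would call `vidclaude` instead.

```python
import subprocess
import sys
from typing import List

MODES = ("quick", "standard", "deep")


def build_command(
    video: str,
    mode: str = "standard",
    verbose: bool = False,
    no_audio: bool = False,
    no_ocr: bool = False,
    no_cache: bool = False,
) -> List[str]:
    """Assemble an extraction command from the documented flags."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    cmd = [sys.executable, "video_understand.py", video, "--extract", "--mode", mode]
    if verbose:
        cmd.append("--verbose")
    if no_audio:
        cmd.append("--no-audio")
    if no_ocr:
        cmd.append("--no-ocr")
    if no_cache:
        cmd.append("--no-cache")
    return cmd


def run_extraction(video: str, **options) -> int:
    """Run the extraction and return its exit code."""
    return subprocess.run(build_command(video, **options), check=False).returncode
```

For example, `run_extraction("video.mp4", mode="quick", no_ocr=True)` mirrors the quick-analysis command with OCR skipped.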

Processing Modes

| Mode | Frames | Audio model | OCR | Best for |
|---|---|---|---|---|
| quick | ~20, uniform sampling | whisper base | skipped | Fast overview, short clips |
| standard | ~60, shot-aware | whisper large-v3 | keyframes | General analysis |
| deep | ~150, burst sampling | whisper large-v3 | all frames | Detailed review, long videos |
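To make "uniform sampling" concrete, here is a minimal sketch that picks N evenly spaced timestamps, one at the centre of each equal slice of the video, and builds an ffmpeg command to grab a single frame at one of them. The frame budgets come from the table; `MODE_TARGET_FRAMES`, `uniform_timestamps`, `ffmpeg_grab_command`, and the centre-of-slice rule are assumptions, not vidclaude's actual segment.py logic.

```python
from typing import List

# Approximate frame budgets from the Processing Modes table.
MODE_TARGET_FRAMES = {"quick": 20, "standard": 60, "deep": 150}


def uniform_timestamps(duration: float, count: int) -> List[float]:
    """Return `count` timestamps (seconds), one at the centre of each
    equal slice of [0, duration). Using slice centres keeps samples
    away from the very first and last frame."""
    if duration <= 0 or count <= 0:
        return []
    step = duration / count
    return [round((i + 0.5) * step, 3) for i in range(count)]


def ffmpeg_grab_command(video: str, t: float, out_path: str) -> List[str]:
    """ffmpeg arguments that seek to time t and write one JPEG frame."""
    return ["ffmpeg", "-ss", f"{t:.3f}", "-i", video, "-frames:v", "1", "-q:v", "2", out_path]


if __name__ == "__main__":
    print(uniform_timestamps(10.0, 5))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Placing `-ss` before `-i` asks ffmpeg to seek in the input, which is much faster than decoding every frame up to t.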

Caching

The first run extracts everything to .vidcache/<hash>/:

.vidcache/a3f7b2c1/
  meta.json          # Video metadata
  frames/            # Extracted JPEG frames
  transcript.json    # Timestamped transcript
  ocr.json           # OCR results
  timeline.json      # Merged timeline
  evidence.md        # Human-readable report

Follow-up questions reuse the cache, so the video is not re-extracted. Pass --no-cache to force a fresh extraction, or delete .vidcache/ to free disk space.
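The README does not say how the `<hash>` directory name is derived. One plausible scheme, shown here purely as an assumption, is a truncated SHA-256 of the file's bytes, read in chunks so large videos are never loaded into memory at once; hashing content rather than the path means a renamed file still hits the cache. `file_digest`, `cache_dir_for`, and `is_cached` are hypothetical helpers.

```python
import hashlib
from pathlib import Path

CACHE_ROOT = Path(".vidcache")
# Artifacts treated as proof of a complete extraction (assumption).
# transcript.json and ocr.json are left out because --no-audio and
# --no-ocr may skip them.
REQUIRED = ("meta.json", "timeline.json", "evidence.md")


def file_digest(path: Path, length: int = 8, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of the file contents, truncated to `length` characters."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()[:length]


def cache_dir_for(video: Path, root: Path = CACHE_ROOT) -> Path:
    """Hypothetical cache location: <root>/<truncated content hash>/."""
    return root / file_digest(video)


def is_cached(video: Path, root: Path = CACHE_ROOT) -> bool:
    """True if every required artifact already exists for this video."""
    d = cache_dir_for(video, root)
    return all((d / name).exists() for name in REQUIRED)
```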

CLI Reference

| Flag | Default | Description |
|---|---|---|
| input | required | Video file or folder path |
| --extract | off | Extract only (skill mode, no API key) |
| -q "..." | none | Question (for --api mode) |
| --mode | standard | quick / standard / deep |
| -f N | auto | FPS override |
| -m N | auto | Max frames override |
| --no-audio | off | Skip transcription |
| --no-ocr | off | Skip OCR |
| --no-cache | off | Force re-extraction |
| --verbose | off | Detailed progress |
| -o file | stdout | Output file |
| --batch-summary | off | Cross-video summary for folders |
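For readers extending the CLI, the flag table maps naturally onto `argparse`. The parser below is a hypothetical reconstruction from the table, not vidclaude's cli.py; the float type for `-f` and the `dest` names are assumptions, and `--api` is omitted because the table does not list it.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI reference table."""
    p = argparse.ArgumentParser(prog="vidclaude", description="Multimodal video understanding")
    p.add_argument("input", help="video file or folder path")
    p.add_argument("--extract", action="store_true", help="extract only (skill mode, no API key)")
    p.add_argument("-q", dest="question", default=None, help="question (for --api mode)")
    p.add_argument("--mode", choices=["quick", "standard", "deep"], default="standard")
    p.add_argument("-f", dest="fps", type=float, default=None, help="FPS override (default: auto)")
    p.add_argument("-m", dest="max_frames", type=int, default=None,
                   help="max frames override (default: auto)")
    p.add_argument("--no-audio", action="store_true", help="skip transcription")
    p.add_argument("--no-ocr", action="store_true", help="skip OCR")
    p.add_argument("--no-cache", action="store_true", help="force re-extraction")
    p.add_argument("--verbose", action="store_true", help="detailed progress")
    p.add_argument("-o", dest="output", default=None, help="output file (default: stdout)")
    p.add_argument("--batch-summary", action="store_true", help="cross-video summary for folders")
    return p


if __name__ == "__main__":
    print(make_parser().parse_args())
```

Leaving `fps` and `max_frames` as `None` is one way to represent "auto": the pipeline can then pick values from the selected mode.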

Project Structure

video_understand.py          # CLI entry point
SKILL.md                     # Claude Code skill definition
setup.bat / setup.sh         # One-click setup scripts
requirements.txt             # Python dependencies
vidclaude/
  cli.py                     # Argument parsing, orchestration
  models.py                  # Data model (VideoMeta, Frame, Shot, etc.)
  ingest.py                  # Layer A: Video validation + metadata
  segment.py                 # Layer B+C: Shot detection + adaptive sampling
  audio.py                   # Layer D: faster-whisper transcription
  ocr.py                     # Layer E: Text extraction from frames
  intent.py                  # Intent classification (adjusts pipeline)
  timeline.py                # Layer G: Temporal event merging
  memory.py                  # Layer I: Hierarchical summaries
  reason.py                  # Layer J: Evidence assembly
  util.py                    # Shared helpers

Architecture

Based on claude_video_understanding_architecture.md, this tool implements a multi-layer video understanding pipeline:

  • Layer A (Ingestion): Format validation, ffprobe metadata
  • Layer B (Segmentation): Shot boundary detection via scene filter
  • Layer C (Adaptive Sampling): Content-aware frame selection
  • Layer D (Audio): faster-whisper large-v3 ASR with timestamps (90+ languages)
  • Layer E (OCR): Text extraction from key frames
  • Layer G (Timeline): Unified temporal event list
  • Layer I (Memory): Hierarchical summaries for long videos
  • Layer J (Reasoning): Evidence assembly for Claude
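Layer B's "scene filter" most likely refers to ffmpeg's `select` filter with its `scene` score (0 to 1, how much a frame differs from the previous one). The sketch below builds such a command and parses the `pts_time` values that the `showinfo` filter logs to stderr for each selected frame. The 0.3 threshold and the function names are assumptions, not vidclaude's segment.py settings.

```python
import re
import subprocess
from typing import List

PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def scene_command(video: str, threshold: float = 0.3) -> List[str]:
    """ffmpeg arguments that keep only frames whose scene score exceeds threshold.

    The single quotes stop ffmpeg from treating the comma inside gt()
    as a filter separator.
    """
    vf = f"select='gt(scene,{threshold})',showinfo"
    return ["ffmpeg", "-hide_banner", "-i", video, "-vf", vf, "-f", "null", "-"]


def parse_cut_times(ffmpeg_log: str) -> List[float]:
    """Extract shot-boundary timestamps (seconds) from showinfo log lines."""
    times = []
    for line in ffmpeg_log.splitlines():
        if "Parsed_showinfo" not in line:
            continue
        match = PTS_TIME.search(line)
        if match:  # config and side-data lines carry no pts_time
            times.append(float(match.group(1)))
    return times


def detect_shots(video: str, threshold: float = 0.3) -> List[float]:
    """Run ffmpeg (must be on PATH) and return cut times in seconds."""
    proc = subprocess.run(scene_command(video, threshold),
                          capture_output=True, text=True, check=True)
    return parse_cut_times(proc.stderr)  # showinfo writes to stderr
```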

Claude serves as the reasoning engine: the tool supplies structured, time-grounded evidence, and Claude analyzes it.

Language Support

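vidclaude transcribes audio with faster-whisper. The large-v3 model used in standard and deep modes supports 90+ languages, including Hindi, English, Spanish, French, German, Chinese, Japanese, and Arabic; quick mode uses the smaller base model. The spoken language is auto-detected and reported with a confidence score.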
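A minimal sketch of the transcription step using faster-whisper's `WhisperModel.transcribe`, which returns a lazy generator of segments plus an info object carrying the detected language and its probability. The record shape, rounding, and helper names are assumptions; vidclaude's audio.py may differ. Loading large-v3 downloads the model weights on first use.

```python
from typing import Any, Dict, Iterable, List


def segments_to_records(segments: Iterable[Any]) -> List[Dict[str, Any]]:
    """Convert objects with .start/.end/.text (as faster-whisper yields)
    into JSON-ready dicts, e.g. for a transcript.json file."""
    return [
        {"start": round(s.start, 2), "end": round(s.end, 2), "text": s.text.strip()}
        for s in segments
    ]


def transcribe(audio_path: str, model_size: str = "large-v3") -> Dict[str, Any]:
    """Transcribe an audio file, letting faster-whisper detect the language."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    segments, info = model.transcribe(audio_path)  # language=None means auto-detect
    return {
        "language": info.language,
        "language_probability": info.language_probability,
        # Iterating the generator is what actually runs the transcription.
        "segments": segments_to_records(segments),
    }
```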

Download files

Source Distribution

vidclaude-0.1.0.tar.gz (24.1 kB)

Built Distribution

vidclaude-0.1.0-py3-none-any.whl (25.8 kB)

File details

Details for the file vidclaude-0.1.0.tar.gz.

File metadata

  • Download URL: vidclaude-0.1.0.tar.gz
  • Size: 24.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for vidclaude-0.1.0.tar.gz
Algorithm Hash digest
SHA256 488e8afc339bebc43d121eafad9b1f36c31bf9ae7dd2029216b81a332b653f8d
MD5 df1ed71669722fd6f6ba803482568707
BLAKE2b-256 c34774d2775f366b9b7070af522827bb637ab0ed09934000d00aa1e6f416b497


File details

Details for the file vidclaude-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: vidclaude-0.1.0-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for vidclaude-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 acb9797f11abf63b9939fa5a36aac96b142566e9febba09e951d8624aceeabea
MD5 ef3e1669bbb30c5bc0a008780ee5df26
BLAKE2b-256 9b0526ec07acf0a55bcd33ff7877ad778d7def3035bd4c050a8a7e6f5a9bbd8b

