Multimodal video understanding for Claude Code — extract frames, transcribe audio, build timelines from any video

vidclaude — Multimodal Video Understanding

A Python CLI tool that extracts structured evidence from videos (frames, audio transcript, OCR, temporal timeline) for analysis by Claude.

Quick Start

Option 1: npm (easiest)

npm install -g vidclaude
vidclaude your_video.mp4 --extract --mode standard --verbose

Option 2: pip

pip install vidclaude
vidclaude your_video.mp4 --extract --mode standard --verbose

Option 3: From source

git clone <repo-url> && cd claudevid
setup.bat          # Windows
bash setup.sh      # macOS / Linux
python video_understand.py your_video.mp4 --extract --mode standard --verbose

No API key needed. If you have a Claude Max/Pro plan, the tool works entirely through Claude Code — Claude in your conversation does the reasoning.

How It Works

Video File → ffmpeg extraction → Frames + Audio + Metadata
                                        ↓
                              faster-whisper large-v3 → Transcript (Hindi, English, 90+ languages)
                              pytesseract → OCR text (optional)
                              Shot detection → Scene boundaries
                                        ↓
                              Timeline builder → Unified event list
                                        ↓
                              evidence.md + cached frames → Claude reasons over it

Extraction needs no API key: the tool only writes the evidence, and Claude in your Claude Code conversation (Max/Pro plan) reasons over it.
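The extraction stage above can be sketched as a handful of ffmpeg invocations. This is a minimal illustration, not the tool's actual code: the function names, the output file pattern, the 16 kHz mono audio format, and the 0.3 scene threshold are all assumptions. Only standard ffmpeg options (`-vf fps=`, `-vn`, `-ac`, `-ar`, and the `select`/`showinfo` filters) are used.

```python
import re
from pathlib import Path


def frame_cmd(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that writes JPEG frames at a fixed rate."""
    pattern = str(Path(out_dir) / "frame_%05d.jpg")  # hypothetical naming scheme
    return ["ffmpeg", "-y", "-i", video, "-vf", f"fps={fps}", "-q:v", "2", pattern]


def audio_cmd(video: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts mono 16 kHz audio (assumed format)."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav_path]


def scene_cmd(video: str, threshold: float = 0.3) -> list[str]:
    """Build an ffmpeg command that logs frames whose scene score exceeds threshold."""
    vf = f"select='gt(scene,{threshold})',showinfo"
    return ["ffmpeg", "-i", video, "-vf", vf, "-f", "null", "-"]


def parse_scene_times(ffmpeg_stderr: str) -> list[float]:
    """Pull the pts_time values that showinfo prints for each selected frame."""
    return [float(t) for t in re.findall(r"pts_time:\s*([0-9.]+)", ffmpeg_stderr)]
```

Each command list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`; for `scene_cmd`, the shot boundaries are then recovered by feeding the captured `stderr` to `parse_scene_times`.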

Prerequisites

| Requirement | How to install |
| --- | --- |
| Python 3.10+ | python.org |
| ffmpeg | Windows: `winget install ffmpeg` / macOS: `brew install ffmpeg` / Linux: `sudo apt install ffmpeg` |

Installation

Option A: One-line setup (recommended)

# Windows
setup.bat

# macOS / Linux
bash setup.sh

Option B: Manual

pip install -r requirements.txt

Optional extras

pip install pytesseract    # OCR (also needs Tesseract binary)
pip install anthropic      # Only for standalone --api mode
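Because OCR needs both the Python package and the Tesseract binary, a pipeline can check for both before attempting it. The sketch below shows one way to do that; `ocr_available` and `ocr_frame` are hypothetical helper names, not functions from vidclaude. `pytesseract.image_to_string` is the library's standard entry point.

```python
import importlib.util
import shutil
from typing import Optional


def ocr_available() -> bool:
    """Return True if both the pytesseract package and the tesseract binary exist."""
    has_package = importlib.util.find_spec("pytesseract") is not None
    has_binary = shutil.which("tesseract") is not None
    return has_package and has_binary


def ocr_frame(image_path: str) -> Optional[str]:
    """Return the text found in one frame, or None when OCR is not installed."""
    if not ocr_available():
        return None
    # Imported lazily so the rest of the pipeline runs without the optional extra.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img).strip()
```

Returning `None` rather than raising lets callers treat OCR as optional, which matches how the extras are described here.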

Usage

Inside Claude Code (recommended)

1. Copy SKILL.md into your project (or keep it here).
2. Ask Claude: "analyze the video at D:/path/to/video.mp4".
3. Claude runs the extraction, reads the evidence, and answers.
4. Ask follow-up questions; the cache is reused instantly.

From the command line

# Standard analysis (recommended for most videos)
python video_understand.py video.mp4 --extract --mode standard --verbose

# Quick analysis (fast, fewer frames)
python video_understand.py video.mp4 --extract --mode quick

# Deep analysis (dense frames, full OCR)
python video_understand.py video.mp4 --extract --mode deep --verbose

# Process a folder of videos
python video_understand.py ./videos/ --extract --verbose

# Skip audio / OCR
python video_understand.py video.mp4 --extract --no-audio --no-ocr

# Force fresh extraction (ignore cache)
python video_understand.py video.mp4 --extract --no-cache --verbose

Processing Modes

| Mode | Frames | Audio model | OCR | Best for |
| --- | --- | --- | --- | --- |
| `quick` | ~20, uniform sampling | whisper base | skip | Fast overview, short clips |
| `standard` | ~60, shot-aware | whisper large-v3 | keyframes | General analysis |
| `deep` | ~150, burst sampling | whisper large-v3 | all frames | Detailed review, long videos |
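To make the difference between uniform and shot-aware sampling concrete, here is a small sketch of both strategies. It is an illustration under assumptions: the function names and the "one midpoint per shot, longest shots first" rule are hypothetical and may differ from what segment.py actually does.

```python
def uniform_timestamps(duration: float, n: int) -> list[float]:
    """Return n timestamps at the centres of n equal slices of the video."""
    if duration <= 0 or n <= 0:
        return []
    step = duration / n
    return [round(step * (i + 0.5), 3) for i in range(n)]


def shot_aware_timestamps(shots: list[tuple[float, float]], budget: int) -> list[float]:
    """Pick one midpoint per shot, keeping only the longest shots if over budget."""
    if budget <= 0:
        return []
    longest = sorted(shots, key=lambda s: s[1] - s[0], reverse=True)[:budget]
    return sorted(round((start + end) / 2, 3) for start, end in longest)


# A 10-second clip sampled 4 times, then a 3-shot clip with a budget of 2 frames.
print(uniform_timestamps(10, 4))                                 # [1.25, 3.75, 6.25, 8.75]
print(shot_aware_timestamps([(0, 2), (2, 10), (10, 11)], 2))     # [1.0, 6.0]
```

Uniform sampling ignores content, so it is cheap but can miss short scenes; the shot-aware variant spends its frame budget where the cuts are.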

Caching

First run extracts everything to .vidcache/<hash>/:

.vidcache/a3f7b2c1/
  meta.json          # Video metadata
  frames/            # Extracted JPEG frames
  transcript.json    # Timestamped transcript
  ocr.json           # OCR results
  timeline.json      # Merged timeline
  evidence.md        # Human-readable report

Follow-up questions reuse the cache — no re-extraction needed. Delete .vidcache/ to free disk space.
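A cache like this is typically keyed on a fingerprint of the input file. The sketch below assumes the key is the first 8 hex characters of a SHA-256 over the file's path, size, and modification time; the README does not say how `<hash>` is actually computed, and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def cache_key(video: Path) -> str:
    """Derive a short key from the file's path, size, and modification time."""
    stat = video.stat()
    fingerprint = f"{video.resolve()}|{stat.st_size}|{stat.st_mtime_ns}"
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:8]


def load_cached_timeline(video: Path, cache_root: Path = Path(".vidcache")):
    """Return the parsed timeline.json for this video, or None on a cache miss."""
    timeline = cache_root / cache_key(video) / "timeline.json"
    if not timeline.is_file():
        return None
    return json.loads(timeline.read_text(encoding="utf-8"))
```

Including size and modification time in the fingerprint means an edited video gets a new cache directory instead of silently reusing stale evidence.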

CLI Reference

| Flag | Default | Description |
| --- | --- | --- |
| `input` | required | Video file or folder path |
| `--extract` | - | Extract only (skill mode, no API key) |
| `-q "..."` | none | Question (for --api mode) |
| `--mode` | standard | `quick` / `standard` / `deep` |
| `-f N` | auto | FPS override |
| `-m N` | auto | Max frames override |
| `--no-audio` | - | Skip transcription |
| `--no-ocr` | - | Skip OCR |
| `--no-cache` | - | Force re-extraction |
| `--verbose` | - | Detailed progress |
| `-o file` | stdout | Output file |
| `--batch-summary` | - | Cross-video summary for folders |
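For readers who want to wrap or script the tool, the flag table maps naturally onto an `argparse` parser. The following is a sketch of such a parser, not a copy of cli.py: the `dest` names (`question`, `fps`, `max_frames`, `output`) are assumptions, and only the flags listed in the table are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags; attribute names are illustrative."""
    parser = argparse.ArgumentParser(prog="vidclaude")
    parser.add_argument("input", help="Video file or folder path")
    parser.add_argument("--extract", action="store_true", help="Extract only")
    parser.add_argument("-q", dest="question", default=None, help="Question")
    parser.add_argument("--mode", choices=["quick", "standard", "deep"],
                        default="standard", help="Processing mode")
    parser.add_argument("-f", dest="fps", type=float, default=None, help="FPS override")
    parser.add_argument("-m", dest="max_frames", type=int, default=None,
                        help="Max frames override")
    parser.add_argument("--no-audio", action="store_true", help="Skip transcription")
    parser.add_argument("--no-ocr", action="store_true", help="Skip OCR")
    parser.add_argument("--no-cache", action="store_true", help="Force re-extraction")
    parser.add_argument("--verbose", action="store_true", help="Detailed progress")
    parser.add_argument("-o", dest="output", default=None, help="Output file")
    parser.add_argument("--batch-summary", action="store_true",
                        help="Cross-video summary for folders")
    return parser


args = build_parser().parse_args(["video.mp4", "--extract", "--mode", "quick", "--no-ocr"])
print(args.mode, args.no_ocr, args.fps)  # quick True None
```

Here `None` stands in for the "auto" default of `-f` and `-m`, leaving the chosen mode to decide the actual values.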

Project Structure

video_understand.py          # CLI entry point
SKILL.md                     # Claude Code skill definition
setup.bat / setup.sh         # One-click setup scripts
requirements.txt             # Python dependencies
vidclaude/
  cli.py                     # Argument parsing, orchestration
  models.py                  # Data model (VideoMeta, Frame, Shot, etc.)
  ingest.py                  # Layer A: Video validation + metadata
  segment.py                 # Layer B+C: Shot detection + adaptive sampling
  audio.py                   # Layer D: faster-whisper transcription
  ocr.py                     # Layer E: Text extraction from frames
  intent.py                  # Intent classification (adjusts pipeline)
  timeline.py                # Layer G: Temporal event merging
  memory.py                  # Layer I: Hierarchical summaries
  reason.py                  # Layer J: Evidence assembly
  util.py                    # Shared helpers
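As an illustration of how the ingestion layer and the data model might fit together, the sketch below parses `ffprobe` JSON output into a metadata record. The `VideoMeta` name comes from models.py, but its fields here are guesses, and `ffprobe_cmd` and `parse_ffprobe` are hypothetical helpers. The ffprobe options and JSON keys used (`streams`, `codec_type`, `avg_frame_rate`, `format.duration`) are standard.

```python
import json
from dataclasses import dataclass


@dataclass
class VideoMeta:
    # Field names are assumptions; the real VideoMeta may differ.
    duration: float
    width: int
    height: int
    fps: float
    has_audio: bool


def ffprobe_cmd(video: str) -> list[str]:
    """Build an ffprobe command that prints container and stream info as JSON."""
    return ["ffprobe", "-v", "error", "-print_format", "json",
            "-show_format", "-show_streams", video]


def parse_ffprobe(output: str) -> VideoMeta:
    """Turn ffprobe's JSON output into a VideoMeta record."""
    data = json.loads(output)
    streams = data.get("streams", [])
    # Raises StopIteration if the file has no video stream.
    video = next(s for s in streams if s.get("codec_type") == "video")
    num, den = video["avg_frame_rate"].split("/")  # e.g. "30000/1001"
    fps = float(num) / float(den) if float(den) else 0.0
    return VideoMeta(
        duration=float(data["format"]["duration"]),
        width=int(video["width"]),
        height=int(video["height"]),
        fps=fps,
        has_audio=any(s.get("codec_type") == "audio" for s in streams),
    )
```

Splitting command construction from parsing keeps the parser testable against a saved JSON sample without running ffprobe.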

Architecture

Based on claude_video_understanding_architecture.md, this tool implements a multi-layer video understanding pipeline:

Layer A (Ingestion): Format validation, ffprobe metadata
Layer B (Segmentation): Shot boundary detection via scene filter
Layer C (Adaptive Sampling): Content-aware frame selection
Layer D (Audio): faster-whisper large-v3 ASR with timestamps (90+ languages)
Layer E (OCR): Text extraction from key frames
Layer G (Timeline): Unified temporal event list
Layer I (Memory): Hierarchical summaries for long videos
Layer J (Reasoning): Evidence assembly for Claude

Claude serves as the reasoning brain — the tool provides structured, time-grounded evidence for Claude to analyze.
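The "unified temporal event list" of Layer G can be pictured as a merge of per-layer outputs sorted by timestamp. The sketch below is one simple way to do it; the `Event` type, the `build_timeline` signature, and the `[mm:ss]` line format are assumptions, not the schema used by timeline.py or evidence.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float   # seconds from the start of the video
    kind: str     # "shot", "speech", or "ocr"
    text: str


def build_timeline(shots, transcript, ocr):
    """Merge shot boundaries, transcript segments, and OCR hits by time."""
    events = [Event(t, "shot", "scene change") for t in shots]
    events += [Event(start, "speech", text) for start, text in transcript]
    events += [Event(t, "ocr", text) for t, text in ocr]
    # sorted() is stable, so events at the same time keep shot < speech < ocr order.
    return sorted(events, key=lambda e: e.time)


def fmt(event: Event) -> str:
    """Render one event as a time-stamped line for a text report."""
    minutes, seconds = divmod(int(event.time), 60)
    return f"[{minutes:02d}:{seconds:02d}] {event.kind}: {event.text}"


timeline = build_timeline([0.0, 12.0], [(3.5, "hello")], [(12.0, "TITLE")])
for event in timeline:
    print(fmt(event))
```

Giving every modality the same time-stamped shape is what makes the evidence "time-grounded": a question about a moment in the video can be answered from neighbouring lines of one list.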

Language Support

Transcription in `standard` and `deep` modes uses faster-whisper with the large-v3 model, which supports 90+ languages, including Hindi, English, Spanish, French, German, Chinese, Japanese, and Arabic. The language is auto-detected and reported with a confidence score.
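For reference, language auto-detection with faster-whisper looks roughly like the sketch below. `WhisperModel`, `transcribe`, `info.language`, and `info.language_probability` are part of the faster-whisper API; the wrapper functions and the record layout are assumptions and need not match audio.py or transcript.json.

```python
def segments_to_records(segments):
    """Convert faster-whisper segments into plain timestamped dicts."""
    return [
        {"start": round(s.start, 2), "end": round(s.end, 2), "text": s.text.strip()}
        for s in segments
    ]


def transcribe(wav_path: str, model_size: str = "large-v3"):
    """Return (language, probability, records) for one audio file."""
    # Imported lazily: the package is heavy and the model downloads on first use.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    # With no language argument, the language is detected from the audio.
    segments, info = model.transcribe(wav_path)
    return info.language, info.language_probability, segments_to_records(segments)
```

Note that `segments` is a generator, so the actual decoding happens while `segments_to_records` iterates over it.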

Release history

0.2.2 (Apr 5, 2026)
0.2.0 (Apr 5, 2026), this version
0.1.0 (Apr 5, 2026)

Download files

Source Distribution

vidclaude-0.2.0.tar.gz (26.3 kB)

Uploaded Apr 5, 2026 Source

Built Distribution

vidclaude-0.2.0-py3-none-any.whl (28.7 kB)

Uploaded Apr 5, 2026 Python 3

File details

Details for the file vidclaude-0.2.0.tar.gz.

File metadata

Download URL: vidclaude-0.2.0.tar.gz
Upload date: Apr 5, 2026
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for vidclaude-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c5618b92c7d3db88ece7da0cff202ae6d147d32454a9e673fe019c434f9b2a13`
MD5	`c95695a8260bfdf2926395024988af52`
BLAKE2b-256	`64216fab2d9139c27f6c92d7fd14237d64b748d8080f635555b46b821024f8c5`

File details

Details for the file vidclaude-0.2.0-py3-none-any.whl.

File metadata

Download URL: vidclaude-0.2.0-py3-none-any.whl
Upload date: Apr 5, 2026
Size: 28.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for vidclaude-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`52085c710c71cca79b37300b96c2d68c12a0a513cd912b128ec19a8715fc27d2`
MD5	`3e39784bcb3da951754b82b15dda3d48`
BLAKE2b-256	`28b1b5c18692ae1d300c05f2db259454a61c164d4a014d98cc1d5bb643fd2f84`
