LLMs can't watch videos. vidwise gives them eyes.
Videos are the biggest blind spot for AI. A 5-minute Loom bug report, a 30-minute tutorial, a conference talk — all completely opaque to your LLM. You either watch the whole thing yourself or lose the knowledge.
vidwise extracts the visual and audio knowledge from any video into structured, LLM-consumable markdown. Feed the output to any LLM and it instantly "understands" the video.
Video ─→ vidwise ─→ Transcript + Key Frames + Visual Guide ─→ LLM Context
What can you do with it?
| Scenario | What happens |
|---|---|
| Debug a Loom bug report | Feed the output to Claude → it "sees" the bug, the UI state, the error messages |
| Absorb a tutorial | 30-min coding video → structured knowledge your LLM can answer questions about |
| Process a meeting | Extract decisions, action items, and what was on screen |
| Learn from a talk | Turn any conference presentation into searchable, queryable knowledge |
| Onboard faster | Training videos become AI-queryable — new hires get instant answers |
Why vidwise?
| Feature | Details |
|---|---|
| See the whole picture | Most tools only extract audio. vidwise captures both what was said and what was shown — UI states, error messages, slides, code, diagrams. |
| Process once, query forever | The output is a self-contained artifact. Feed it to any LLM, any number of times, at zero additional cost. No re-uploading, no re-processing. |
| Works with any LLM | Standard markdown + images. Claude, GPT, Gemini, Llama, Mistral — whatever you use. No vendor lock-in. |
| Your video stays local | Whisper and ffmpeg run on your machine. Nothing leaves your computer unless you opt into AI guide generation. |
| Smart, not brute-force | Pixel-difference analysis keeps only frames where the visual content actually changed. Less noise, better LLM understanding. |
| Human-readable AND machine-readable | The output isn't just for LLMs — guide.md is a visual walkthrough you can read, share, and bookmark. One command, two audiences. |
| One command | vidwise recording.mp4 → transcript, key frames, and visual guide in a single portable directory. |
Not just for LLMs. The visual guide vidwise generates is a fully readable document with embedded screenshots — open it in VS Code, Obsidian, or GitHub and you have a skimmable walkthrough of the entire video. Share it with your team, bookmark it for later, or feed it to any LLM. One artifact, two audiences.
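Programmatically, "feeding the output to an LLM" can be as simple as concatenating the transcript and key-frame list into one prompt. A minimal sketch — the helper name is illustrative, not part of vidwise's API:

```python
def build_llm_context(transcript: str, frame_names: list[str]) -> str:
    """Concatenate a vidwise transcript and key-frame list into one prompt.

    In practice you would load these from the output directory, e.g.
    read transcript.txt and glob frames/*.png, then attach the frame
    images themselves when calling a multimodal model.
    """
    parts = [
        "# Video transcript",
        transcript,
        "# Key frames (attach as images for multimodal models)",
        *frame_names,
    ]
    return "\n\n".join(parts)
```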
Quick Start
# Install
pip install vidwise
# Process a local video
vidwise recording.mp4
# Process a YouTube video
vidwise https://youtube.com/watch?v=abc
# With AI-powered visual guide
export ANTHROPIC_API_KEY=sk-... # or OPENAI_API_KEY
vidwise recording.mp4 --provider claude
Prerequisites
- Python 3.10+
- ffmpeg — brew install ffmpeg (macOS) or apt install ffmpeg (Linux)
Lighter install?
pip install "vidwise[fast]" uses faster-whisper (~200MB) instead of openai-whisper (~2GB). 3-4x faster transcription, but without Apple Metal GPU support. vidwise auto-detects which backend is installed.
Usage
vidwise <source> [options]
| Option | Default | Description |
|---|---|---|
| --model, -m | medium | Whisper model: tiny, base, small, medium, large |
| --output-dir, -o | auto | Output directory path |
| --no-guide | off | Skip AI guide generation |
| --provider, -p | auto | AI provider: auto, claude, openai |
| --frame-interval | 2 | Seconds between frame captures |
| --frame-threshold | 0.05 | Pixel diff threshold for key frame selection |
Examples
# Fast transcription of a short video
vidwise demo.mp4 --model tiny --no-guide
# YouTube tutorial with Claude-powered guide
vidwise https://youtube.com/watch?v=abc --model small --provider claude
# Loom bug report — default settings
vidwise https://loom.com/share/abc123def
Output
vidwise creates a single self-contained directory:
vidwise-abc123-2026-02-26/
├── video.mp4 # Source video
├── audio.wav # Extracted audio (16kHz mono)
├── transcript.txt # Plain text transcript
├── transcript.srt # Timestamped subtitles
├── transcript.json # Full Whisper output with segments
├── frames/ # Key frames every 2 seconds
│ ├── frame_0m00s.png
│ ├── frame_0m02s.png
│ ├── frame_0m04s.png
│ └── ...
└── guide.md # Visual guide with embedded frames (if AI enabled)
The guide.md uses relative image paths — open it in any markdown viewer (VS Code, GitHub, Obsidian) and the images render inline.
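The audio and frame extraction behind this layout boils down to two ffmpeg invocations. A sketch of the commands, mirroring the output spec above (16kHz mono WAV, one frame every 2 seconds); the exact flags vidwise uses may differ, and vidwise renames frames to timestamped filenames rather than ffmpeg's default numbering:

```python
def ffmpeg_audio_cmd(video: str, out_wav: str) -> list[str]:
    """Extract 16 kHz mono WAV, the format Whisper expects."""
    return ["ffmpeg", "-i", video, "-ar", "16000", "-ac", "1", "-vn", out_wav]

def ffmpeg_frames_cmd(video: str, out_dir: str, interval: int = 2) -> list[str]:
    """Capture one frame every `interval` seconds via the fps filter."""
    return ["ffmpeg", "-i", video, "-vf", f"fps=1/{interval}",
            f"{out_dir}/frame_%04d.png"]
```

Run either with subprocess.run(cmd, check=True).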
How It Works
┌─────────────┐
│  Video URL  │──→ yt-dlp download
│  or local   │
└──────┬──────┘
       │
       ▼
┌──────────────┐      ┌──────────────────┐
│    ffmpeg    │────→ │    audio.wav     │──→ Whisper ──→ transcript.*
│  (parallel)  │      │   (16kHz mono)   │
│              │────→ │     frames/      │──→ Key frame selection
│              │      │  (every 2 sec)   │    (pixel diff filtering)
└──────────────┘      └──────────────────┘
                               │
                               ▼
                      ┌──────────────────┐
                      │   AI Analysis    │  Claude API, OpenAI API,
                      │   (optional)     │  or Claude Code (free)
                      └────────┬─────────┘
                               │
                               ▼
                      ┌──────────────────┐
                      │     guide.md     │  Structured markdown with
                      │                  │  embedded frame images
                      └──────────────────┘
Smart frame selection: Not every frame matters. vidwise compares consecutive frames using pixel-difference analysis and only keeps frames where the visual content actually changed. A 10-minute video might have 300 raw frames but only ~40 meaningful ones.
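In code, this filtering amounts to measuring the fraction of pixels that changed between a candidate frame and the last kept frame, and dropping frames below the threshold (the default --frame-threshold of 0.05 maps to "5% of pixels changed"). A minimal numpy sketch of the idea, not vidwise's exact algorithm:

```python
import numpy as np

def frame_changed(prev: np.ndarray, curr: np.ndarray,
                  threshold: float = 0.05) -> bool:
    """True if enough pixels differ between two grayscale frames.

    threshold is the minimum fraction of pixels whose intensity moved
    by more than a small tolerance.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    changed_fraction = np.mean(diff > 10)  # tolerance of 10 intensity levels
    return changed_fraction > threshold

def select_key_frames(frames: list[np.ndarray],
                      threshold: float = 0.05) -> list[int]:
    """Indices of frames to keep: the first, plus any that changed
    relative to the last kept frame."""
    keep = [0]
    for i in range(1, len(frames)):
        if frame_changed(frames[keep[-1]], frames[i], threshold):
            keep.append(i)
    return keep
```

Comparing against the last kept frame (rather than the immediately previous raw frame) prevents slow, gradual changes from slipping under the threshold forever.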
Claude Code Plugin
If you use Claude Code, install vidwise as a plugin for AI-powered guide generation without needing an API key — Claude Code's native multimodal AI handles the analysis:
# Add the vidwise marketplace and install the plugin
/plugin marketplace add jpdjere/vidwise
/plugin install vidwise@vidwise
# Then use it:
/vidwise:vidwise recording.mp4
/vidwise:vidwise https://loom.com/share/abc123
For local development or testing, you can also load directly:
claude --plugin-dir /path/to/vidwise/plugin
The plugin runs vidwise --no-guide for extraction, then uses Claude Code's built-in vision capabilities to analyze frames in parallel — completely free, no API key needed.
Whisper Model Sizes
| Model | Speed | Quality | Best For |
|---|---|---|---|
| tiny | ~1 min/min | Basic | Quick tests, long videos |
| base | ~2 min/min | Good | Short videos |
| small | ~4 min/min | Better | Videos >30 min |
| medium | ~8 min/min | Recommended | Default for most content |
| large | ~16 min/min | Best | When accuracy is critical |
Speed estimates on Apple M-series. First run downloads model weights (one-time).
Contributing
Contributions are welcome! Please open an issue first to discuss what you'd like to change.
# Development setup
git clone https://github.com/jpdjere/vidwise
cd vidwise
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# Run tests
pytest
# Lint
ruff check src/
License