vidwise

LLMs can't watch videos. vidwise gives them eyes.

Maintainers

jpdjere

Project description

Videos are the biggest blind spot for AI. A 5-minute Loom bug report, a 30-minute tutorial, a conference talk — all completely opaque to your LLM. You either watch the whole thing yourself or lose the knowledge.

vidwise extracts the visual and audio knowledge from any video into structured, LLM-consumable markdown. Feed the output to any LLM and it instantly "understands" the video.

Video ─→ vidwise ─→ Transcript + Key Frames + Visual Guide ─→ LLM Context

What can you do with it?

| Scenario | What happens |
| --- | --- |
| Debug a Loom bug report | Feed the output to Claude → it "sees" the bug, the UI state, the error messages |
| Absorb a tutorial | 30-min coding video → structured knowledge your LLM can answer questions about |
| Process a meeting | Extract decisions, action items, and what was on screen |
| Learn from a talk | Turn any conference presentation into searchable, queryable knowledge |
| Onboard faster | Training videos become AI-queryable, so new hires get instant answers |

Why vidwise?


| Benefit | What it means |
| --- | --- |
| See the whole picture | Most tools only extract audio. vidwise captures both what was said and what was shown: UI states, error messages, slides, code, diagrams. |
| Process once, query forever | The output is a self-contained artifact. Feed it to any LLM, any number of times, at zero additional cost. No re-uploading, no re-processing. |
| Works with any LLM | Standard markdown + images. Claude, GPT, Gemini, Llama, Mistral, or whatever else you use. No vendor lock-in. |
| Your video stays local | Whisper and ffmpeg run on your machine. Nothing leaves your computer unless you opt into AI guide generation. |
| Smart, not brute-force | Pixel-difference analysis keeps only frames where the visual content actually changed. Less noise, better LLM understanding. |
| Human-readable AND machine-readable | The output isn't just for LLMs: `guide.md` is a visual walkthrough you can read, share, and bookmark. One command, two audiences. |
| One command | `vidwise recording.mp4` → transcript, key frames, and visual guide in a single portable directory. |

Not just for LLMs. The visual guide vidwise generates is a fully readable document with embedded screenshots — open it in VS Code, Obsidian, or GitHub and you have a skimmable walkthrough of the entire video. Share it with your team, bookmark it for later, or feed it to any LLM. One artifact, two audiences.

Quick Start

# Install
pip install vidwise

# Process a local video
vidwise recording.mp4

# Process a YouTube video
vidwise https://youtube.com/watch?v=abc

# With AI-powered visual guide
export ANTHROPIC_API_KEY=sk-...   # or OPENAI_API_KEY
vidwise recording.mp4 --provider claude
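
The commands above export an API key and then pick a provider explicitly. The sketch below illustrates one way a provider setting of `auto` could be resolved from those environment variables. It is an illustration only: the function name `resolve_provider` is hypothetical, and the precedence (Anthropic key first, then OpenAI) is an assumption, not documented vidwise behavior.

```python
import os
from typing import Mapping, Optional


def resolve_provider(provider: str = "auto",
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick an AI provider for guide generation (hypothetical helper).

    Returns "claude", "openai", or None when no API key is available.
    """
    if provider in ("claude", "openai"):
        return provider  # an explicit choice always wins
    if provider != "auto":
        raise ValueError(f"unknown provider: {provider!r}")
    # Assumed precedence: Anthropic key first, then OpenAI.
    if env.get("ANTHROPIC_API_KEY"):
        return "claude"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None  # no key found: the caller could skip guide generation
```

Passing the environment as a parameter keeps the function easy to test without touching real credentials.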

Prerequisites

- Python 3.10+
- ffmpeg: `brew install ffmpeg` (macOS) or `apt install ffmpeg` (Linux)

Lighter install? `pip install "vidwise[fast]"` uses faster-whisper (~200MB) instead of openai-whisper (~2GB). Transcription is 3-4x faster, but Apple Metal GPU acceleration is not supported. vidwise auto-detects which backend is installed.
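
To make the backend auto-detection concrete, here is a minimal sketch of how such a check could be written. It relies only on the import names of the two packages (`faster_whisper` and `whisper`). The function name `detect_backend` and the preference for faster-whisper when both are installed are assumptions, not a description of vidwise's internals.

```python
import importlib.util
from typing import Callable, Optional


def detect_backend(
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> str:
    """Report which Whisper backend is importable (hypothetical helper).

    Assumption: faster-whisper is preferred when both are installed.
    """
    if find_spec("faster_whisper") is not None:  # import name of faster-whisper
        return "faster-whisper"
    if find_spec("whisper") is not None:  # import name of openai-whisper
        return "openai-whisper"
    raise RuntimeError("no Whisper backend is importable")
```

Calling `detect_backend()` with no arguments checks the current environment. The `find_spec` parameter exists so the logic can be exercised without installing either package.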

Usage

vidwise <source> [options]

| Option | Default | Description |
| --- | --- | --- |
| `--model`, `-m` | `medium` | Whisper model: `tiny`, `base`, `small`, `medium`, `large` |
| `--output-dir`, `-o` | auto | Output directory path |
| `--no-guide` | off | Skip AI guide generation |
| `--provider`, `-p` | `auto` | AI provider: `auto`, `claude`, `openai` |
| `--frame-interval` | `2` | Seconds between frame captures |
| `--frame-threshold` | `0.05` | Pixel diff threshold for key frame selection |
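
If you want to drive the CLI from a script, the options above map naturally onto an argument list. The following sketch builds that list using only the flags documented in the table. The helper names `build_vidwise_command` and `run_vidwise` are hypothetical, and vidwise is not known to ship a Python API like this.

```python
import subprocess
from typing import List, Optional

MODELS = {"tiny", "base", "small", "medium", "large"}
PROVIDERS = {"auto", "claude", "openai"}


def build_vidwise_command(
    source: str,
    model: str = "medium",
    output_dir: Optional[str] = None,
    no_guide: bool = False,
    provider: str = "auto",
    frame_interval: int = 2,
    frame_threshold: float = 0.05,
) -> List[str]:
    """Translate keyword arguments into a vidwise argv list (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    cmd = ["vidwise", source, "--model", model]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if no_guide:
        cmd.append("--no-guide")  # the provider is irrelevant without a guide
    else:
        cmd += ["--provider", provider]
    cmd += ["--frame-interval", str(frame_interval),
            "--frame-threshold", str(frame_threshold)]
    return cmd


def run_vidwise(**kwargs) -> None:
    """Run vidwise and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_vidwise_command(**kwargs), check=True)
```

For example, `build_vidwise_command("demo.mp4", model="tiny", no_guide=True)` produces the same invocation as the first entry under Examples, with the frame defaults spelled out.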

Examples

# Fast transcription of a short video
vidwise demo.mp4 --model tiny --no-guide

# YouTube tutorial with Claude-powered guide
vidwise https://youtube.com/watch?v=abc --model small --provider claude

# Loom bug report — default settings
vidwise https://loom.com/share/abc123def

Output

vidwise creates a single self-contained directory:

vidwise-abc123-2026-02-26/
├── video.mp4              # Source video
├── audio.wav              # Extracted audio (16kHz mono)
├── transcript.txt         # Plain text transcript
├── transcript.srt         # Timestamped subtitles
├── transcript.json        # Full Whisper output with segments
├── frames/                # Key frames (sampled every 2 seconds, then filtered)
│   ├── frame_0m00s.png
│   ├── frame_0m02s.png
│   ├── frame_0m04s.png
│   └── ...
└── guide.md               # Visual guide with embedded frames (if AI enabled)

The guide.md uses relative image paths — open it in any markdown viewer (VS Code, GitHub, Obsidian) and the images render inline.
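
Because the output directory is plain files, it is easy to post-process. As a rough sketch, the code below pairs each key frame with the transcript segment spoken at that moment. It assumes the frame naming pattern shown in the tree above (`frame_<minutes>m<seconds>s.png`) and that `transcript.json` follows Whisper's usual layout, with a `segments` list whose entries carry `start`, `end`, and `text`. The function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Optional

FRAME_NAME = re.compile(r"frame_(\d+)m(\d+)s\.png$")


def frame_seconds(filename: str) -> Optional[int]:
    """Parse 'frame_0m02s.png' into 2 seconds; return None for other names."""
    match = FRAME_NAME.search(filename)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))


def align_frames(output_dir: Path) -> List[Dict[str, object]]:
    """Pair every key frame with the transcript text spoken at its timestamp."""
    data = json.loads((output_dir / "transcript.json").read_text(encoding="utf-8"))
    segments = data.get("segments", [])
    rows: List[Dict[str, object]] = []
    for frame in (output_dir / "frames").glob("frame_*.png"):
        t = frame_seconds(frame.name)
        if t is None:
            continue
        # First segment whose time span covers the frame, or "" during silence.
        text = next(
            (s["text"].strip() for s in segments if s["start"] <= t < s["end"]),
            "",
        )
        rows.append({"frame": frame.name, "seconds": t, "text": text})
    # Sort numerically: file names alone would put 10m00s before 1m02s.
    rows.sort(key=lambda row: row["seconds"])
    return rows
```

The resulting rows can be rendered into a prompt or a table, for instance one line per frame with its timestamp and the sentence being spoken.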

How It Works

┌─────────────┐
│  Video URL   │──→ yt-dlp download
│  or local    │
└──────┬───────┘
       │
       ▼
┌──────────────┐     ┌──────────────────┐
│   ffmpeg     │────→│  audio.wav       │──→ Whisper ──→ transcript.*
│  (parallel)  │     │  (16kHz mono)    │
│              │────→│  frames/         │──→ Key frame selection
│              │     │  (every 2 sec)   │    (pixel diff filtering)
└──────────────┘     └──────────────────┘
                              │
                              ▼
                     ┌──────────────────┐
                     │  AI Analysis     │  Claude API, OpenAI API,
                     │  (optional)      │  or Claude Code (free)
                     └────────┬─────────┘
                              │
                              ▼
                     ┌──────────────────┐
                     │   guide.md       │  Structured markdown with
                     │                  │  embedded frame images
                     └──────────────────┘

Smart frame selection: Not every frame matters. vidwise compares consecutive frames using pixel-difference analysis and only keeps frames where the visual content actually changed. A 10-minute video might have 300 raw frames but only ~40 meaningful ones.
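
The idea behind this filtering can be sketched in a few lines. The example below is a simplified illustration, not vidwise's actual implementation: it represents each frame as a flat list of 8-bit grayscale pixel values, uses a normalized mean absolute difference as the metric, and reuses the default `0.05` from `--frame-threshold`. The metric and all function names are assumptions.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened 8-bit grayscale pixels (assumed representation)


def pixel_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference, normalized to the range 0.0 to 1.0."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def select_key_frames(frames: List[Frame], threshold: float = 0.05) -> List[int]:
    """Return indices of frames that differ enough from the frame before them."""
    if not frames:
        return []
    kept = [0]  # the first frame has nothing to compare against, so keep it
    for i in range(1, len(frames)):
        if pixel_diff(frames[i - 1], frames[i]) > threshold:
            kept.append(i)
    return kept


def raw_frame_count(duration_s: int, interval_s: int = 2) -> int:
    """Number of frames captured when sampling once every interval_s seconds."""
    return duration_s // interval_s
```

With the default 2-second interval, `raw_frame_count(600)` gives the 300 raw frames quoted above for a 10-minute video. How many survive filtering depends on how often the picture changes.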

Claude Code Plugin

If you use Claude Code, install vidwise as a plugin for AI-powered guide generation without needing an API key — Claude Code's native multimodal AI handles the analysis:

# Add the vidwise marketplace and install the plugin
/plugin marketplace add jpdjere/vidwise
/plugin install vidwise@vidwise

# Then use it:
/vidwise:vidwise recording.mp4
/vidwise:vidwise https://loom.com/share/abc123

For local development or testing, you can also load directly:

claude --plugin-dir /path/to/vidwise/plugin

The plugin runs vidwise --no-guide for extraction, then uses Claude Code's built-in vision capabilities to analyze frames in parallel — completely free, no API key needed.

Whisper Model Sizes

| Model | Speed | Quality | Best For |
| --- | --- | --- | --- |
| `tiny` | ~1 min/min | Basic | Quick tests, long videos |
| `base` | ~2 min/min | Good | Short videos |
| `small` | ~4 min/min | Better | Videos >30 min |
| `medium` | ~8 min/min | Recommended | Default for most content |
| `large` | ~16 min/min | Best | When accuracy is critical |

Speed estimates on Apple M-series. First run downloads model weights (one-time).

Contributing

Contributions are welcome! Please open an issue first to discuss what you'd like to change.

# Development setup
git clone https://github.com/jpdjere/vidwise
cd vidwise
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/

License

MIT

Release history

- 0.2.2: Feb 27, 2026
- 0.2.1: Feb 27, 2026
- 0.2.0 (this version): Feb 27, 2026
- 0.1.0: Feb 27, 2026

Download files

Source Distribution

vidwise-0.2.0.tar.gz (77.8 kB), uploaded Feb 27, 2026, Source

Built Distribution

vidwise-0.2.0-py3-none-any.whl (19.9 kB), uploaded Feb 27, 2026, Python 3

File details

Details for the file vidwise-0.2.0.tar.gz.

File metadata

Download URL: vidwise-0.2.0.tar.gz
Upload date: Feb 27, 2026
Size: 77.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vidwise-0.2.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `8689673e550bbef7073ed8d4e32ab5c3dbecf511ed760eb81e43cb72197b48b7` |
| MD5 | `cc27b3a061ecbb731a3dcd80f073a74e` |
| BLAKE2b-256 | `10d922790ae3af4c3b966ee6cc52a16259f23489bfc5a5def50941d3b0d9162c` |

Provenance

The following attestation bundles were made for vidwise-0.2.0.tar.gz:

Publisher: publish.yml on jpdjere/vidwise

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vidwise-0.2.0.tar.gz
- Subject digest: 8689673e550bbef7073ed8d4e32ab5c3dbecf511ed760eb81e43cb72197b48b7
- Sigstore transparency entry: 1004026099
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: jpdjere/vidwise@2a1a9097a9dfd303e4b7b0ba9c1eee2cf6c065e7
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/jpdjere
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2a1a9097a9dfd303e4b7b0ba9c1eee2cf6c065e7
- Trigger Event: push

File details

Details for the file vidwise-0.2.0-py3-none-any.whl.

File metadata

Download URL: vidwise-0.2.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 19.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vidwise-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `28ef9c5d771cc145daef6b1f8ed144510f93dce3ceade94826a04bfcdee80f0e` |
| MD5 | `84ec02f3dfe427d8bec3c309c1b8136c` |
| BLAKE2b-256 | `1dcd2043f7a2e0771518d2d694ce0cbffbbbf059d71c3b068d926ed92ede44e5` |

Provenance

The following attestation bundles were made for vidwise-0.2.0-py3-none-any.whl:

Publisher: publish.yml on jpdjere/vidwise

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vidwise-0.2.0-py3-none-any.whl
- Subject digest: 28ef9c5d771cc145daef6b1f8ed144510f93dce3ceade94826a04bfcdee80f0e
- Sigstore transparency entry: 1004026131
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: jpdjere/vidwise@2a1a9097a9dfd303e4b7b0ba9c1eee2cf6c065e7
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/jpdjere
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2a1a9097a9dfd303e4b7b0ba9c1eee2cf6c065e7
- Trigger Event: push
