Let Claude (or any LLM) actually watch a video — scene-aware, deduplicated frames + transcript, from a URL or local file.

These details have not been verified by PyPI

Project links

Project description

claude-real-video

Let Claude — or any LLM — actually watch a video.

Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it reads the transcript, not the picture. Claude won't take a video file at all. Even Gemini, which can read video natively, has to send it up to Google and samples frames at a fixed interval (1 fps by default), so fast cuts slip past.

claude-real-video does it differently, and locally: point it at a URL or a file, and it pulls the frames that actually matter (every scene change, not a fixed quota), throws away the near-duplicates, transcribes the audio, and hands you a clean folder any LLM can read. All the processing happens on your own machine — what gets sent anywhere is only the frames/text you choose to paste into an LLM afterwards.

crv "https://www.youtube.com/watch?v=..."
# → crv-out/frames/*.jpg  +  crv-out/transcript.txt  +  crv-out/MANIFEST.txt

Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.

New in 0.3.0 — tell it why you're watching, and keep what it finds:

crv "https://youtu.be/..." --why "find the pricing strategy" --kb ~/notes

--why makes the analysis focus on what you care about instead of a generic summary; --kb saves the result as a dated note in your own notes folder, so it doesn't die in crv-out.

Why not just sample frames?

Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames at a fixed interval — e.g. one per second. That over-samples a static screencast and under-samples a fast-cut reel. claude-real-video is smarter:

	fixed-interval sampling	claude-real-video
Frame selection	every N seconds	scene-change detection + density floor
Repeated shots (A-B-A cuts)	sent again every time	sliding-window dedup sends each shot once
Static slide (10 min)	~600 near-identical frames	collapses to 1 (dedup)
Fast-cut reel	misses frames between samples	catches each visual change
Audio	often ignored	Whisper transcript w/ language detect
Where the processing happens	often in someone's cloud	on your machine (you choose what to share with an LLM afterwards)
Input	usually local file only	URL (yt-dlp) or local file

You feed the model fewer, more meaningful frames — cheaper context, better understanding.

Install

pip install claude-real-video              # core (frames + dedup)
pip install "claude-real-video[whisper]"   # + audio transcription

System requirement: ffmpeg

ffmpeg / ffprobe are used for frame extraction and audio, and aren't pip-installable. Install them once:

OS	command
macOS	`brew install ffmpeg`
Linux	`sudo apt install ffmpeg` (or your distro's package manager)
Windows	`winget install Gyan.FFmpeg` — or `choco install ffmpeg` — or download a build and add its `bin\` folder to your `PATH`

Verify it's on your PATH:

ffmpeg -version

Transcription uses the whisper CLI (installed by the [whisper] extra, or pip install openai-whisper). Whisper also relies on ffmpeg.

Works on macOS, Windows, and Linux — Python 3.10+.

Usage

# A YouTube / Instagram / TikTok / ... link
crv "https://www.instagram.com/reel/XXXX/"

# A local file, English transcript, output to ./out
crv lecture.mp4 -o out --lang en

# Frames only, no transcription
crv clip.mp4 --no-transcribe

# A login-gated video (your own / authorised use): pass a Netscape cookie file
crv "https://..." --cookies cookies.txt

python -m claude_real_video ... works as an alias for crv too.

Options

flag	default	meaning
`-o, --out`	`crv-out`	output directory
`--scene`	`0.30`	scene-change sensitivity (lower = more frames)
`--fps-floor`	`1.0`	at least one frame every N seconds
`--max-frames`	`150`	hard cap on total frames
`--lang`	`auto`	Whisper language (`en`, `zh`, `auto`, ...)
`--dedup-threshold`	`8`	% of pixels that must change for a frame to count as new; higher = fewer frames
`--dedup-window`	`4`	compare against the last N kept frames — a shot the model already saw doesn't come back after a cutaway (`1` = consecutive-only)
`--report`	off	keep dropped frames in `./dropped` + write `report.html` visualising every keep/drop decision
`--no-transcribe`	off	skip audio
`--keep-audio`	off	also save the full soundtrack (`audio.m4a`) so audio models can hear it
`--why`	–	why you're watching, e.g. `--why "find the pricing strategy"` — written into `MANIFEST.txt` so the model analyses with that lens instead of a generic summary
`--kb`	–	also save the analysis as a dated markdown note into this folder (your Obsidian vault, notes dir, ...) — so it joins your knowledge base instead of dying in `crv-out`
`--cookies`	–	Netscape cookie file for login-gated sources

Use it from Python

from claude_real_video import process

r = process("https://youtu.be/...", "out", lang="en")
print(r.frame_count, r.transcript_path)

How it works

Fetch — yt-dlp for URLs (optional cookies), or copy a local file.
Extract — one chronological ffmpeg select pass grabs every scene change plus a density floor (at least one frame every --fps-floor seconds), so fast cuts and slow screencasts are both covered.
Dedup — real pixel difference (downscaled RGB, not a perceptual hash — hashes go blind on flat colours and equal-luma hue changes) against a sliding window of the last --dedup-window kept frames, so an A-B-A cutaway doesn't re-send a shot the model has already seen. --report writes report.html showing every keep/drop decision with its diff %, for tuning.
Text — if the video already has subtitles (a sidecar .srt/.vtt next to a local file, or an embedded subtitle track), those are used as the transcript — faster and more accurate than re-transcribing. Only when there are no subtitles does it fall back to Whisper on the audio (skipped cleanly if there's no audio).
Audio (optional, --keep-audio) — save the full original soundtrack (audio.m4a: music + speech + effects, copied losslessly when possible). The transcript only has the words; the audio file lets a model that can listen (Gemini, GPT-4o, …) actually hear the music and tone.
Manifest — MANIFEST.txt summarises everything for the model.

So the model can see (key frames), read (transcript) and — with --keep-audio — hear (full soundtrack) the video. The transcript is plain text any model can read; the tool doesn't burn subtitles into the video — burning is a presentation choice, not something needed to make a video AI-readable.

Notes

Only download content you have the right to. The --cookies option is for your own, authorised access — don't ship credentials in a repo.
Re-running overwrites the output directory.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.4.0

Jul 3, 2026

0.3.0

Jul 3, 2026

0.2.0

Jul 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_real_video-0.4.0.tar.gz (16.4 kB view details)

Uploaded Jul 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

claude_real_video-0.4.0-py3-none-any.whl (15.1 kB view details)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file claude_real_video-0.4.0.tar.gz.

File metadata

Download URL: claude_real_video-0.4.0.tar.gz
Upload date: Jul 3, 2026
Size: 16.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for claude_real_video-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`ccf0184b38a52a2839c4b4e66d3a507fe4d51571a7462c9a22e6dca46b7a7a9c`
MD5	`ed15c7c24c5fca9af12bd2fc8f900d2e`
BLAKE2b-256	`359985f4828ed000aa047ea9fc570374d2a20dc3d888f89104968f190780b422`

See more details on using hashes here.

File details

Details for the file claude_real_video-0.4.0-py3-none-any.whl.

File metadata

Download URL: claude_real_video-0.4.0-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 15.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for claude_real_video-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5c19c7eb8f7c4f892607baf3f4004d9dbb008fff2cebdd8c30401abbeb25cbb5`
MD5	`b7747bb8d18744aefa0ab69a50d85ad3`
BLAKE2b-256	`832bf4b628935a403385c0c85c9f45b898db2317a9453bbe8b5f1339ca95adc7`

See more details on using hashes here.

claude-real-video 0.4.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

claude-real-video

Why not just sample frames?

Install

System requirement: ffmpeg

Usage

Options

Use it from Python

How it works

Notes

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes