Skip to main content

Let Claude (or any LLM) actually watch a video — scene-aware, deduplicated frames + transcript, from a URL or local file.

Project description

claude-real-video

Let Claude — or any LLM — actually watch a video.

Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it reads the transcript, not the picture. Claude won't take a video file at all. Even Gemini, which can read video natively, has to send it up to Google and samples frames at a fixed interval (1 fps by default), so fast cuts slip past.

claude-real-video does it differently, and locally: point it at a URL or a file, and it pulls the frames that actually matter (every scene change, not a fixed quota), throws away the near-duplicates, transcribes the audio, and hands you a clean folder any LLM can read — on your own machine, nothing uploaded.

crv "https://www.youtube.com/watch?v=..."
# → crv-out/frames/*.jpg  +  crv-out/transcript.txt  +  crv-out/MANIFEST.txt

Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.


Why not just sample frames?

Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames at a fixed interval — e.g. one per second. That over-samples a static screencast and under-samples a fast-cut reel. claude-real-video is smarter:

fixed-interval sampling claude-real-video
Frame selection every N seconds scene-change detection + density floor
Repeated shots (A-B-A cuts) sent again every time sliding-window dedup sends each shot once
Static slide (10 min) ~600 near-identical frames collapses to 1 (dedup)
Fast-cut reel misses frames between samples catches each visual change
Audio often ignored Whisper transcript w/ language detect
Where the video goes often uploaded to a cloud stays on your machine
Input usually local file only URL (yt-dlp) or local file

You feed the model fewer, more meaningful frames — cheaper context, better understanding.


Install

pip install claude-real-video              # core (frames + dedup)
pip install "claude-real-video[whisper]"   # + audio transcription

System requirement: ffmpeg

ffmpeg / ffprobe are used for frame extraction and audio, and aren't pip-installable. Install them once:

OS command
macOS brew install ffmpeg
Linux sudo apt install ffmpeg (or your distro's package manager)
Windows winget install Gyan.FFmpeg — or choco install ffmpeg — or download a build and add its bin\ folder to your PATH

Verify it's on your PATH:

ffmpeg -version

Transcription uses the whisper CLI (installed by the [whisper] extra, or pip install openai-whisper). Whisper also relies on ffmpeg.

Works on macOS, Windows, and Linux — Python 3.10+.


Usage

# A YouTube / Instagram / TikTok / ... link
crv "https://www.instagram.com/reel/XXXX/"

# A local file, English transcript, output to ./out
crv lecture.mp4 -o out --lang en

# Frames only, no transcription
crv clip.mp4 --no-transcribe

# A login-gated video (your own / authorised use): pass a Netscape cookie file
crv "https://..." --cookies cookies.txt

python -m claude_real_video ... works as an alias for crv too.

Options

flag default meaning
-o, --out crv-out output directory
--scene 0.30 scene-change sensitivity (lower = more frames)
--fps-floor 1.0 at least one frame every N seconds
--max-frames 150 hard cap on total frames
--lang auto Whisper language (en, zh, auto, ...)
--dedup-threshold 8 % of pixels that must change for a frame to count as new; higher = fewer frames
--dedup-window 4 compare against the last N kept frames — a shot the model already saw doesn't come back after a cutaway (1 = consecutive-only)
--report off keep dropped frames in ./dropped + write report.html visualising every keep/drop decision
--no-transcribe off skip audio
--keep-audio off also save the full soundtrack (audio.m4a) so audio models can hear it
--cookies Netscape cookie file for login-gated sources

Use it from Python

from claude_real_video import process

r = process("https://youtu.be/...", "out", lang="en")
print(r.frame_count, r.transcript_path)

How it works

  1. Fetchyt-dlp for URLs (optional cookies), or copy a local file.
  2. Extract — one chronological ffmpeg select pass grabs every scene change plus a density floor (at least one frame every --fps-floor seconds), so fast cuts and slow screencasts are both covered.
  3. Dedup — real pixel difference (downscaled RGB, not a perceptual hash — hashes go blind on flat colours and equal-luma hue changes) against a sliding window of the last --dedup-window kept frames, so an A-B-A cutaway doesn't re-send a shot the model has already seen. --report writes report.html showing every keep/drop decision with its diff %, for tuning.
  4. Text — if the video already has subtitles (a sidecar .srt/.vtt next to a local file, or an embedded subtitle track), those are used as the transcript — faster and more accurate than re-transcribing. Only when there are no subtitles does it fall back to Whisper on the audio (skipped cleanly if there's no audio).
  5. Audio (optional, --keep-audio) — save the full original soundtrack (audio.m4a: music + speech + effects, copied losslessly when possible). The transcript only has the words; the audio file lets a model that can listen (Gemini, GPT-4o, …) actually hear the music and tone.
  6. ManifestMANIFEST.txt summarises everything for the model.

So the model can see (key frames), read (transcript) and — with --keep-audiohear (full soundtrack) the video. The transcript is plain text any model can read; the tool doesn't burn subtitles into the video — burning is a presentation choice, not something needed to make a video AI-readable.


Notes

  • Only download content you have the right to. The --cookies option is for your own, authorised access — don't ship credentials in a repo.
  • Re-running overwrites the output directory.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_real_video-0.2.0.tar.gz (14.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

claude_real_video-0.2.0-py3-none-any.whl (13.4 kB view details)

Uploaded Python 3

File details

Details for the file claude_real_video-0.2.0.tar.gz.

File metadata

  • Download URL: claude_real_video-0.2.0.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for claude_real_video-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3bb22537cdea40ad550b2cfc468d35c95c2f5d884b2a4541de75bb62dd6d44f2
MD5 244ea0faf9dd61579807e66990e8fdd3
BLAKE2b-256 eefed2a95f8fb205418bdb4136aa628343a0f4c34a589550cfd18d1c08869878

See more details on using hashes here.

File details

Details for the file claude_real_video-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for claude_real_video-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2af30ab8ab8f1e184ed9af41cbbe4515fa2ec17605be48dfdc4c4d6b3482283c
MD5 94439eb51bc5243d80b16240fea93d9a
BLAKE2b-256 6fbac962ed9e5d4e607679d3a7451591b4f61bf74441c7ac2e96593b4e65c06c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page