CPU-only video → grid summarizer for vision LLMs (Claude Code, Gemini CLI, Codex, Cursor)
clipsheet
Turn any video into images your AI agent can read.
Playwright, browser-use, multimodal LLM video analysis, and native video APIs are slow — often minutes per run, expensive, and overkill when you just need to see what happened on screen. clipsheet converts any video into a handful of annotated grid images that any vision-capable model can read in one pass. Record a screen, drop in a clip, hand it a product demo — if it's a video, clipsheet can process it.
Any video → 2-4 grid images → one model call. Process multiple videos at once. CPU-only, no GPU, no audio, no API keys. Best for videos under 5 minutes — beyond that, consider Gemini native video or Twelve Labs.
Why clipsheet?
| Approach | Time for a 2-min recording | Cost | What the agent sees |
|---|---|---|---|
| Playwright / browser-use | 2+ minutes (real-time) | Compute + browser | Screenshots you scripted |
| Gemini native video | 30-60s upload + processing | ~$0.02-0.10/min video | Every frame (slow, expensive) |
| Any video + clipsheet | 1-2 seconds processing | Free (CPU-only) | Deduplicated keyframes with timestamps |
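The "deduplicated keyframes" idea in the last row can be pictured with a small sketch. This page does not describe how clipsheet decides that two frames are duplicates, so the code below is only an illustration under stated assumptions: frames are flat lists of grayscale pixel values, and a frame is kept when its mean absolute difference from the last kept frame exceeds a threshold. The names `mean_abs_diff` and `dedupe_frames` and the threshold value are hypothetical, not part of clipsheet.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # flat grayscale pixels, 0-255 (illustrative representation)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dedupe_frames(
    frames: List[Tuple[float, Frame]], threshold: float = 8.0
) -> List[Tuple[float, Frame]]:
    """Keep a (timestamp, frame) pair only if it differs enough from the last kept one.

    Assumption: this is a stand-in for keyframe selection, not clipsheet's algorithm.
    """
    kept: List[Tuple[float, Frame]] = []
    for timestamp, frame in frames:
        # Always keep the first frame; afterwards compare against the last kept frame.
        if not kept or mean_abs_diff(kept[-1][1], frame) > threshold:
            kept.append((timestamp, frame))
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow gradual changes from slipping through unnoticed in this sketch.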
30-second start
pip install clipsheet
clipsheet recording.mp4
Output lands in recording_clips/ next to your video:
recording_clips/
grid_01.jpg 3×3 mosaic, cells labeled A1..C3, timestamps burned in
grid_02.jpg next 9 frames in time order
manifest.json maps each cell back to its source timestamp
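To make the A1..C3 labeling concrete, here is a minimal sketch that maps a frame's position in time order to a grid file and a cell label. It assumes letters name rows and digits name columns, that cells fill left to right and then top to bottom, and that grid files are numbered from grid_01.jpg. The listing above shows the label range and file names but not the fill order, and `locate_cell` is a hypothetical helper, not part of clipsheet.

```python
import string
from typing import Tuple


def locate_cell(frame_index: int, rows: int = 3, cols: int = 3) -> Tuple[str, str]:
    """Return (grid file name, cell label) for the N-th kept frame (0-based).

    Assumption: row-major fill with lettered rows, e.g. A1, A2, A3, B1, ...
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    if not 1 <= rows <= 26 or cols < 1:
        raise ValueError("rows must be 1-26 and cols must be at least 1")
    per_grid = rows * cols
    # Which grid image the frame lands in, and its position inside that grid.
    grid_number, offset = divmod(frame_index, per_grid)
    row, col = divmod(offset, cols)
    label = f"{string.ascii_uppercase[row]}{col + 1}"
    return f"grid_{grid_number + 1:02d}.jpg", label
```

Under these assumptions the tenth kept frame (index 9) is the first cell of the second grid, which matches the "next 9 frames in time order" description.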
Use it from your coding agent
Install the skill (one time)
clipsheet init
This auto-detects every coding agent on your machine — Claude Code, Cursor, Codex, Gemini CLI, Copilot, Windsurf, Aider, Goose — and writes the skill into each one's directory.
Then just talk to your agent
Claude Code:
> /clipsheet review this video for bugs: ~/Downloads/bug-repro.mp4
> /clipsheet what errors do you see in these two recordings: flow1.mp4 flow2.mp4
Cursor:
> /clipsheet debug this flow: recording.mp4
Codex CLI:
> $clipsheet what UI states appear in this recording: session.mp4
Any agent with shell access (no skill needed):
> run clipsheet on recording.mp4 and tell me what went wrong
Real-world examples
Debug agentic applications — see how users interact with your agent's UI:
> /clipsheet ~/Downloads/agent-session.mp4
> the chat layer is clashing with the sidebar — what's happening at each step?
Review short-form content — get feedback on hooks, pacing, and visual elements:
> /clipsheet ~/Desktop/reel-draft.mp4
> rate the hook, suggest a better opening, and write out the on-screen text
Debug web animations and 3D components:
> /clipsheet ~/Desktop/animation-bug.mov
> the CSS transition is janky between 0:03 and 0:05 — what's the state at each frame?
Compare working vs broken flows:
> /clipsheet ~/Desktop/checkout-working.mov ~/Desktop/checkout-broken.mov
> what's different between these two?
Batch-review multiple recordings:
> /clipsheet bug1.mp4 bug2.mp4 bug3.mp4
> list every issue you see across all three
CLI reference
Process videos
clipsheet <video> [video2 ...] [options]
| Option | Default | Description |
|---|---|---|
| -o, --output <dir> | <video>_clips/ | Output directory. Auto-created next to each input. Override with -o or CLIPSHEET_OUTPUT_DIR env var. |
| --grid <RxC> | 3x3 | Cell layout. 2x3 for larger/more readable cells, 4x4 for dense recordings. |
| --max-grids <n> | 4 | Cap on grid images. Bump for videos > 8 minutes. |
| --fps <n> | 4 | Sample rate in fps. Higher values catch more transitions but take longer. |
| --keep-intermediate | false | Keep _raw/ and _cells/ for debugging. |
| --json | false | Emit a JSON summary on stdout (for piping to jq). |
| --pretty | false | Pretty-print JSON (only with --json). |
| -v, --verbose | false | Show sampling details and frame counts. |
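The --grid, --max-grids, and --fps options together bound how many frames can reach the model. The sketch below works through that arithmetic. It assumes each cell holds exactly one frame and that sampling yields roughly duration × fps frames before any deduplication; neither detail is confirmed here, and `parse_grid` and `frame_budget` are illustrative helpers, not part of clipsheet.

```python
from typing import Dict, Tuple


def parse_grid(spec: str) -> Tuple[int, int]:
    """Parse a grid spec such as '3x3' into (rows, cols)."""
    rows_text, sep, cols_text = spec.lower().partition("x")
    if not sep:
        raise ValueError(f"expected RxC, got {spec!r}")
    rows, cols = int(rows_text), int(cols_text)
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    return rows, cols


def frame_budget(
    duration_s: float, grid: str = "3x3", max_grids: int = 4, fps: float = 4.0
) -> Dict[str, float]:
    """Estimate sampled frames versus available cells (assumed: one frame per cell)."""
    rows, cols = parse_grid(grid)
    sampled = int(duration_s * fps)        # frames examined before deduplication
    capacity = rows * cols * max_grids     # most cells the output can hold
    return {
        "sampled": sampled,
        "capacity": capacity,
        "sampled_per_cell": sampled / capacity,
    }
```

With the defaults, a 2-minute recording gives about 480 sampled frames competing for 36 cells. Switching to --grid 2x3 drops the capacity to 24 larger cells.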
Examples:
clipsheet recording.mp4 # output → recording_clips/
clipsheet bug1.mp4 bug2.mp4 bug3.mp4 # process multiple videos
clipsheet bug1.mp4 bug2.mp4 -o ./all-bugs # all outputs into one directory
clipsheet recording.mp4 --grid 2x3 # larger cells for readable text
clipsheet animation-bug.mp4 --fps 8 # catch fast UI transitions
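For scripting outside an agent, the same invocations can be assembled programmatically. The sketch below builds an argument list from the flags documented above and runs it with subprocess. It assumes clipsheet is on PATH and that --json prints a single JSON document on stdout. The shape of that JSON is not documented on this page, so the result is returned as parsed data without assuming any keys. `build_command` and `run_clipsheet` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, List, Optional, Sequence


def build_command(
    videos: Sequence[str],
    output_dir: Optional[str] = None,
    grid: Optional[str] = None,
    fps: Optional[int] = None,
    as_json: bool = False,
) -> List[str]:
    """Build a clipsheet argument list using only the documented flags."""
    if not videos:
        raise ValueError("at least one video path is required")
    cmd = ["clipsheet", *videos]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if grid is not None:
        cmd += ["--grid", grid]
    if fps is not None:
        cmd += ["--fps", str(fps)]
    if as_json:
        cmd.append("--json")
    return cmd


def run_clipsheet(videos: Sequence[str], **options: Any) -> Any:
    """Run clipsheet with --json and return the parsed summary.

    Assumption: stdout holds one JSON document; its schema is not assumed.
    """
    options["as_json"] = True
    completed = subprocess.run(
        build_command(videos, **options),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.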
Other commands
clipsheet init # install skill into detected agents
clipsheet init --agent <name> # scope to specific agents (repeatable)
clipsheet init --force # overwrite existing skill installs
clipsheet --status # version, ffmpeg, agents, recent runs
clipsheet --version # short version string
clipsheet --help # full help
Install
pip install clipsheet
# or: uv tool install clipsheet
# or: pipx install clipsheet
ffmpeg is bundled. No separate install needed.
What it does NOT do
- No audio transcription. Use Whisper if you need the soundtrack.
- No video editing, trimming, or transcoding. Different tool category.
- No GPU. CPU-only by design, for portability.
Works on any video format ffmpeg can read: MP4, MOV, WebM, MKV, AVI, and more, including HEVC-encoded files.
When not to use clipsheet: if you need frame-by-frame motion analysis, audio understanding, or real-time video streaming, use Gemini 2.5 native video or Twelve Labs.
Performance
clipsheet processing times on a 2024 M-series Mac (CPU only):
| Video | Duration | Grids | clipsheet |
|---|---|---|---|
| Agent UI screen recording | 21s | 2 | <1s |
| Product demo | 41s | 4 | ~2s |
| Product demo | 58s | 4 | ~1s |
| YouTube video (1080p) | 69s | 4 | ~1s |
| Presentation | 2 min | 2 | ~2s |
| Presentation | 3.3 min | 4 | ~11s |
| Screen recording (HEVC) | 4.9 min | 4 | ~14s |
Where does the time go? clipsheet itself is fast — under 2 seconds for most videos under 2 minutes. When using it through an agent (Claude, Gemini, etc.), most of the wait is the model reading the grid images and generating a response, not clipsheet processing. A typical loop: ~1s clipsheet + ~5s image reading + ~10s response = ~15-20s total.
Requires Python 3.10+ on macOS 10.15+, Linux, or Windows 10+.
License
MIT. See LICENSE.
File details
Details for the file clipsheet-0.1.2.tar.gz.
File metadata
- Download URL: clipsheet-0.1.2.tar.gz
- Upload date:
- Size: 28.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45ce2e474392a9d8222a48f004558e10d088ace20f13e05c6d139cc475c598f3 |
| MD5 | 497364fd15ea181e4ea093635870ff9d |
| BLAKE2b-256 | 96aca2a9b889d4f6dbfea372c931afdead2f07512f5264e86bf4e40a66a6e983 |
Provenance
The following attestation bundles were made for clipsheet-0.1.2.tar.gz:
Publisher: release.yml on poonamsnair/clipsheet
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: clipsheet-0.1.2.tar.gz
  - Subject digest: 45ce2e474392a9d8222a48f004558e10d088ace20f13e05c6d139cc475c598f3
  - Sigstore transparency entry: 1396667401
  - Sigstore integration time:
  - Permalink: poonamsnair/clipsheet@e4dd576268dee85737f4f0d4bc7a3b2bd2f00d04
  - Branch / Tag: refs/tags/v0.1.2
  - Owner: https://github.com/poonamsnair
  - Access: private
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@e4dd576268dee85737f4f0d4bc7a3b2bd2f00d04
  - Trigger Event: push
File details
Details for the file clipsheet-0.1.2-py3-none-any.whl.
File metadata
- Download URL: clipsheet-0.1.2-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 338ad218b3c08ab4030ad020141a3115583fede6ec2fab331b746d7090e23bbf |
| MD5 | 6cdd91c9a9f7785ac711d500dfd8c341 |
| BLAKE2b-256 | 80773abf22ee54742757f1f50727f5b82dd5f1ba6665081a96d47ef07a90a0ba |
Provenance
The following attestation bundles were made for clipsheet-0.1.2-py3-none-any.whl:
Publisher: release.yml on poonamsnair/clipsheet
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: clipsheet-0.1.2-py3-none-any.whl
  - Subject digest: 338ad218b3c08ab4030ad020141a3115583fede6ec2fab331b746d7090e23bbf
  - Sigstore transparency entry: 1396667412
  - Sigstore integration time:
  - Permalink: poonamsnair/clipsheet@e4dd576268dee85737f4f0d4bc7a3b2bd2f00d04
  - Branch / Tag: refs/tags/v0.1.2
  - Owner: https://github.com/poonamsnair
  - Access: private
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@e4dd576268dee85737f4f0d4bc7a3b2bd2f00d04
  - Trigger Event: push