Percept Vision: store videos in object storage + their CV understanding in Redis, then search and ask questions about them. MCP plugin.
Percept Vision 🎥: store videos in Redis, then ask questions about them (MCP)
The MP4 bytes live in object storage. The understanding lives in Redis, and that's what you query.
pip install percept-vision-plugin gives any agent the ability to ingest a video, run
computer vision over it (OpenCV frame sampling + YOLO object detection + CLIP
embeddings), and store the result in Redis, so you can find the exact moment
something happens, list what objects appear, and ask natural-language questions
about your video library. Exposed over the Model Context Protocol (MCP).
It's the multimodal layer of Percept Context: video becomes first-class, searchable nodes alongside your context graph, all in one Redis.
Why this exists
Redis's vector stack is text-only: every shipping RedisVL vectorizer is a …Text…
class; there's no image/video vectorizer, no frame-level search, no media model. And per
Redis's own guidance, you shouldn't put MP4 bytes in Redis. Percept Vision does it the
right way:
| Concern | How |
|---|---|
| Video bytes | Supabase Storage → a public URL (Redis never holds the file) |
| Frame understanding | OpenCV samples frames; YOLO detects objects; CLIP embeds them |
| Searchable index | RedisVL vector index of CLIP frame vectors (percept_frames) |
| "Find the moment" | CLIP text→image search returns timestamped, deep-linkable moments |
| "What's in it" | YOLO detections aggregated per video |
| "Ask about it" | retrieved frames + detections → optional LLM answer with timestamps |
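The "Ask about it" row can be pictured as a small formatting step: retrieved frames and their detections are turned into timestamped lines that an LLM (or a human) can cite. The sketch below is only an illustration of that idea. The function names, the frame dictionary layout (`t`, `objects`), and the prompt wording are assumptions, not the plugin's actual internals.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS so an answer can cite a moment."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_context(question, frames):
    """Turn retrieved frames into a plain-text prompt, one cited line per frame.

    `frames` is a hypothetical list of dicts with a timestamp `t` (seconds)
    and a list of detected object labels `objects`.
    """
    lines = [
        f"[{format_timestamp(f['t'])}] objects: {', '.join(f['objects']) or 'none'}"
        for f in sorted(frames, key=lambda f: f["t"])
    ]
    return "Frames:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


retrieved = [
    {"t": 75.0, "objects": ["person", "phone"]},
    {"t": 3.0, "objects": []},
]
prompt = build_context("what happens at the end?", retrieved)
print(prompt)
```

Sorting by timestamp keeps the context in playback order, which makes questions such as "what happens at the end?" easier to answer from the last lines.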
Pipeline
ingest_video(path|url)
   │
   ├─► Supabase Storage ──► public MP4 URL ──────────────┐
   │                                                      │
   └─► OpenCV sample frames                               │
          ├─► YOLO ──► objects per frame                  ▼
          └─► CLIP ──► 512-d vector ──► R E D I S (percept_frames index)
                                        pv:video:{id} + pv:frame:{id}
─────────────────────────────────────────────────────────────────────────
search_moments("a person holding a phone") → CLIP text vector → top frames
ask_video("what happens at the end?")      → frames + detections → answer
list_video_objects(id)                     → {person: 12, laptop: 4, ...}
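To make the search_moments step concrete, here is a minimal pure-Python sketch of the ranking idea: compare a query vector against stored frame vectors by cosine similarity and return the top timestamped matches. In the plugin this ranking is done by the Redis vector index, not a Python loop. The helper names, the frame dictionary layout, the toy 3-d vectors (standing in for 512-d CLIP embeddings), and the `#t=` deep-link format are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_moments(query_vec, frames, k=3):
    """Rank frames by similarity to the query vector and attach deep links."""
    ranked = sorted(frames, key=lambda f: cosine(query_vec, f["vector"]), reverse=True)
    return [
        {
            "video_id": f["video_id"],
            "t": f["t"],
            "score": round(cosine(query_vec, f["vector"]), 3),
            # Assumption: the deep link uses a media-fragment "#t=<seconds>" suffix.
            "link": f'{f["url"]}#t={f["t"]:g}',
        }
        for f in ranked[:k]
    ]


url = "https://example.com/demo.mp4"  # placeholder public URL
frames = [
    {"video_id": "demo", "t": 0.0, "url": url, "vector": [1.0, 0.0, 0.0]},
    {"video_id": "demo", "t": 1.0, "url": url, "vector": [0.0, 1.0, 0.0]},
    {"video_id": "demo", "t": 2.0, "url": url, "vector": [0.6, 0.8, 0.0]},
]
moments = top_moments([0.0, 1.0, 0.0], frames, k=2)
for m in moments:
    print(m["t"], m["score"], m["link"])
```

The query vector would come from the CLIP text encoder and the frame vectors from the CLIP image encoder; because both live in the same embedding space, a text phrase can be matched directly against frames.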
Install
Requires Python ≥ 3.10, a Redis with the Search module (Redis Stack / Redis 8 / Redis Cloud), and a Supabase project. First run downloads YOLOv8n (~6 MB) and CLIP ViT-B-32 (~600 MB).
pip install percept-vision-plugin
Configure (.env or env vars):
SUPABASE_URL=https://YOUR_REF.supabase.co
SUPABASE_SERVICE_KEY=your_service_role_key # bypasses RLS for uploads
SUPABASE_BUCKET=percept-videos # public bucket
REDIS_URL=redis://localhost:6379
REDIS_PROTOCOL=2
Try it:
python examples/quickstart.py /path/to/video.mp4
Register with Claude Code
claude mcp add percept-vision --scope local \
--env SUPABASE_URL=... --env SUPABASE_SERVICE_KEY=... --env SUPABASE_BUCKET=percept-videos \
--env REDIS_URL=redis://localhost:6379 --env REDIS_PROTOCOL=2 \
-- percept-vision
Tools
| Tool | What it does |
|---|---|
| ingest_video(source, video_id?) | Upload + CV-analyze a video (path or URL) into Redis. |
| search_moments(query, k?, video_id?) | Find moments matching a phrase → timestamps + deep links + objects. |
| ask_video(question, video_id?, k?) | Q&A over retrieved frames (LLM answer if ANTHROPIC_API_KEY set). |
| list_video_objects(video_id) | YOLO object counts for a video. |
| list_videos() | All ingested videos. |
| vision_stats() | Redis health + video count. |
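The object counts returned by list_video_objects can be thought of as a simple aggregation over per-frame detections. The sketch below shows one way to do that with the standard library. The `count_objects` helper and its input shape (one list of labels per sampled frame) are hypothetical, and it assumes every detection counts once; the plugin may count differently (for example, once per frame).

```python
from collections import Counter


def count_objects(frame_detections):
    """Sum per-frame label lists into one label -> count mapping for a video."""
    counts = Counter()
    for labels in frame_detections:
        counts.update(labels)
    # most_common() orders labels by descending count.
    return dict(counts.most_common())


detections = [
    ["person", "laptop"],
    ["person"],
    ["person", "laptop", "cup"],
]
summary = count_objects(detections)
print(summary)
```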
Configuration reference
| Env var | Default | Purpose |
|---|---|---|
| SUPABASE_URL / SUPABASE_SERVICE_KEY | none (required) | Object storage for MP4s. |
| SUPABASE_BUCKET | percept-videos | Public bucket name. |
| REDIS_URL / REDIS_PROTOCOL | redis://localhost:6379 / 2 | Redis with Search. |
| PERCEPT_YOLO_MODEL | yolov8n.pt | Ultralytics model. |
| PERCEPT_CLIP_MODEL | clip-ViT-B-32 | CLIP model (image+text). |
| PERCEPT_FRAME_INTERVAL | 1.0 | Seconds between sampled frames. |
| PERCEPT_MAX_FRAMES | 60 | Cap on frames per video. |
| ANTHROPIC_API_KEY | none | Enables written answers in ask_video. |
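As a rough illustration of how PERCEPT_FRAME_INTERVAL and PERCEPT_MAX_FRAMES interact, the sketch below reads both settings with the defaults from the table and computes which timestamps would be sampled. The class and function names are hypothetical, and the cap is assumed to truncate sampling after the first max_frames frames; the plugin might instead spread frames across the whole video.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingSettings:
    frame_interval: float  # seconds between sampled frames
    max_frames: int        # cap on frames per video


def load_sampling_settings(env=None):
    """Read sampling settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return SamplingSettings(
        frame_interval=float(env.get("PERCEPT_FRAME_INTERVAL", "1.0")),
        max_frames=int(env.get("PERCEPT_MAX_FRAMES", "60")),
    )


def sample_timestamps(duration_s, settings):
    """Timestamps (seconds) spaced by frame_interval, capped at max_frames."""
    if settings.frame_interval <= 0:
        raise ValueError("frame interval must be positive")
    stamps = []
    i = 0
    # Assumption: stop at the cap rather than widening the interval.
    while i < settings.max_frames and i * settings.frame_interval < duration_s:
        stamps.append(round(i * settings.frame_interval, 3))
        i += 1
    return stamps


settings = load_sampling_settings({})  # empty mapping -> documented defaults
short_clip = sample_timestamps(5.0, settings)
long_clip = sample_timestamps(300.0, settings)
print(short_clip, len(long_clip))
```

Under these assumptions a 5-second clip yields five frames, while a 5-minute video stops at the 60-frame cap, so only its first minute would be indexed unless the interval is raised.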
License
MIT