
Percept Vision: store videos in object storage and their CV understanding in Redis, then search and ask questions about them. MCP plugin.


Percept Vision 🎥: store video understanding in Redis, then ask questions about your videos (MCP)

The MP4 bytes live in object storage. The understanding lives in Redis, and that's what you query.

pip install percept-vision-plugin gives any agent the ability to ingest a video, run computer vision over it (OpenCV frame sampling, YOLO object detection, and CLIP embeddings), and store the result in Redis, so you can find the exact moment something happens, list which objects appear, and ask natural-language questions about your video library. The tools are exposed over the Model Context Protocol (MCP).

It's the multimodal layer of Percept Context: videos become first-class, searchable nodes alongside your context graph, all in one Redis instance.


Why this exists

Redis's vector stack is text-only: every shipping RedisVL vectorizer is a …Text… class; there's no image/video vectorizer, no frame-level search, no media model. And per Redis's own guidance, you shouldn't put MP4 bytes in Redis. Percept Vision does it the right way:

| Concern | How |
| --- | --- |
| Video bytes | Supabase Storage → a public URL (Redis never holds the file) |
| Frame understanding | OpenCV samples frames; YOLO detects objects; CLIP embeds them |
| Searchable index | RedisVL vector index of CLIP frame vectors (percept_frames) |
| "Find the moment" | CLIP text→image search returns timestamped, deep-linkable moments |
| "What's in it" | YOLO detections aggregated per video |
| "Ask about it" | retrieved frames + detections → optional LLM answer with timestamps |
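The "Find the moment" row comes down to nearest-neighbour search over frame vectors. The following is a minimal pure-Python sketch of that idea, not the plugin's code: the toy 3-dimensional vectors stand in for 512-d CLIP embeddings, and `cosine`, `top_moments`, and the frame tuples are hypothetical names chosen for illustration. In the plugin, ranking is handled by the RedisVL index rather than in Python.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_moments(query_vec, frames, k=3):
    """Return the k (timestamp, score) pairs most similar to query_vec.

    frames is a list of (timestamp_seconds, vector) pairs (assumed shape).
    """
    scored = [(ts, cosine(query_vec, vec)) for ts, vec in frames]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy data: three sampled frames with made-up embeddings.
frames = [
    (0.0, [1.0, 0.0, 0.0]),
    (1.0, [0.0, 1.0, 0.0]),
    (2.0, [0.9, 0.1, 0.0]),
]
query = [1.0, 0.0, 0.0]  # stands in for a CLIP text vector
best = top_moments(query, frames, k=2)
print(best)
```

Because CLIP places text and images in the same vector space, the same similarity measure works whether the query vector came from a phrase or from another frame.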

Pipeline

  ingest_video(path|url)
        │
        ├─►  Supabase Storage  ──►  public MP4 URL ─────────────┐
        │                                                       │
        └─►  OpenCV sample frames                               │
                 ├─►  YOLO  ──►  objects per frame              ▼
                 └─►  CLIP  ──►  512-d vector ──►  R E D I S  (percept_frames index)
                                                   pv:video:{id}  +  pv:frame:{id}
        ┌───────────────────────────────────────────────────────┐
   search_moments("a person holding a phone")  → CLIP text vector → top frames
   ask_video("what happens at the end?")       → frames + detections → answer
   list_video_objects(id)                      → {person: 12, laptop: 4, ...}
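The last line of the diagram shows list_video_objects returning a label-to-count mapping. One plausible way to produce it is sketched below, under assumptions: each sampled frame carries a list of YOLO labels, and counts are summed across frames. The record shape and the `aggregate_objects` helper are hypothetical; the source does not say whether the plugin counts individual detections or frames containing a label.

```python
from collections import Counter

def aggregate_objects(frame_records):
    """Sum detected labels across all frames of one video."""
    counts = Counter()
    for record in frame_records:
        counts.update(record["objects"])
    return dict(counts)

# Hypothetical per-frame records: a timestamp plus the labels YOLO reported.
frame_records = [
    {"ts": 0.0, "objects": ["person", "laptop"]},
    {"ts": 1.0, "objects": ["person", "person"]},
    {"ts": 2.0, "objects": []},
]
totals = aggregate_objects(frame_records)
print(totals)
```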

Install

Requires Python ≥ 3.10, a Redis instance with the Search module (Redis Stack, Redis 8, or Redis Cloud), and a Supabase project. The first run downloads YOLOv8n (~6 MB) and CLIP ViT-B-32 (~600 MB).

pip install percept-vision-plugin

Configure (.env or env vars):

SUPABASE_URL=https://YOUR_REF.supabase.co
SUPABASE_SERVICE_KEY=your_service_role_key   # bypasses RLS for uploads
SUPABASE_BUCKET=percept-videos               # public bucket
REDIS_URL=redis://localhost:6379
REDIS_PROTOCOL=2
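To make the required and optional settings concrete, here is a small sketch of reading these variables with the documented defaults. It is an illustration only: the `Settings` class and `load_settings` function are hypothetical names, and the plugin's own loader (including its .env handling) may differ.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_service_key: str
    supabase_bucket: str
    redis_url: str
    redis_protocol: int

def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    required = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_service_key=env["SUPABASE_SERVICE_KEY"],
        # Defaults below are the ones listed in the configuration reference.
        supabase_bucket=env.get("SUPABASE_BUCKET", "percept-videos"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        redis_protocol=int(env.get("REDIS_PROTOCOL", "2")),
    )

# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "SUPABASE_URL": "https://YOUR_REF.supabase.co",
    "SUPABASE_SERVICE_KEY": "your_service_role_key",
})
print(settings.supabase_bucket, settings.redis_url, settings.redis_protocol)
```

Failing fast on the two Supabase variables surfaces a misconfiguration before any video is uploaded.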

Try it:

python examples/quickstart.py /path/to/video.mp4

Register with Claude Code

claude mcp add percept-vision --scope local \
  --env SUPABASE_URL=... --env SUPABASE_SERVICE_KEY=... --env SUPABASE_BUCKET=percept-videos \
  --env REDIS_URL=redis://localhost:6379 --env REDIS_PROTOCOL=2 \
  -- percept-vision

Tools

| Tool | What it does |
| --- | --- |
| ingest_video(source, video_id?) | Upload + CV-analyze a video (path or URL) into Redis. |
| search_moments(query, k?, video_id?) | Find moments matching a phrase → timestamps + deep links + objects. |
| ask_video(question, video_id?, k?) | Q&A over retrieved frames (LLM answer if ANTHROPIC_API_KEY is set). |
| list_video_objects(video_id) | YOLO object counts for a video. |
| list_videos() | All ingested videos. |
| vision_stats() | Redis health + video count. |
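search_moments returns deep links alongside timestamps, but the source does not say how those links are built. One common approach, shown here purely as an assumption, is to append a Media Fragments time offset (`#t=<seconds>`) to the public MP4 URL, which most browsers honour when opening a video. The `deep_link` helper and the example URL are hypothetical.

```python
def deep_link(video_url, seconds):
    """Append a Media Fragments time offset (#t=<seconds>) to a video URL."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    base = video_url.split("#", 1)[0]  # drop any existing fragment
    # Two decimals at most, with trailing zeros trimmed (12.50 -> 12.5, 3.00 -> 3).
    stamp = f"{seconds:.2f}".rstrip("0").rstrip(".")
    return f"{base}#t={stamp}"

link = deep_link("https://example.com/videos/demo.mp4", 12.5)
print(link)
```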

Configuration reference

| Env var | Default | Purpose |
| --- | --- | --- |
| SUPABASE_URL / SUPABASE_SERVICE_KEY | none (required) | Object storage for MP4s. |
| SUPABASE_BUCKET | percept-videos | Public bucket name. |
| REDIS_URL / REDIS_PROTOCOL | redis://localhost:6379 / 2 | Redis with Search. |
| PERCEPT_YOLO_MODEL | yolov8n.pt | Ultralytics model. |
| PERCEPT_CLIP_MODEL | clip-ViT-B-32 | CLIP model (image+text). |
| PERCEPT_FRAME_INTERVAL | 1.0 | Seconds between sampled frames. |
| PERCEPT_MAX_FRAMES | 60 | Cap on frames per video. |
| ANTHROPIC_API_KEY | none | Enables written answers in ask_video. |
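PERCEPT_FRAME_INTERVAL and PERCEPT_MAX_FRAMES interact, and a short sketch helps show how. It assumes sampling starts at 0 s, steps by the interval, and stops once the cap is reached; the source does not state whether the plugin truncates like this or spreads frames across the whole video. `sample_timestamps` is a hypothetical helper.

```python
def sample_timestamps(duration_s, interval_s=1.0, max_frames=60):
    """Timestamps (seconds) to sample, stopping at the frame cap."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    stamps = []
    index = 0
    # Assumed behaviour: stop at the end of the video or at the cap,
    # whichever comes first.
    while index * interval_s < duration_s and len(stamps) < max_frames:
        stamps.append(index * interval_s)
        index += 1
    return stamps

short = sample_timestamps(5)     # 5-second clip, defaults
long = sample_timestamps(300)    # 5-minute clip, defaults
print(short, len(long), long[-1])
```

Under this truncating assumption, the defaults cover only the first 60 seconds of a longer video, so raising the interval or the cap would be the way to reach later moments.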

License

MIT
