Skip to main content

Index once, search many — agentic video memory CLI by Pixel ML

Project description

av — Agentic Video Intelligence

Index. Search. Detect. Video intelligence toolkit for AI agents by Pixel ML.

pip install pixelml-av

What av Does

Video Memory — Ingest videos, search by natural language, ask questions with RAG citations.

Surveillance Intelligence — Detect falls, long queues, crowd gathering, and wheelchair compliance in CCTV footage using temporal reasoning.

Quick Start

Video Search

# 1. Set up your provider
av config setup

# 2. Ingest a video
av ingest video.mp4

# 3. Search
av search "person with red bag"

# 4. Ask questions
av ask "what happened at 2:30?"

Surveillance Detection

# Cloud (quick start — Gemini free tier)
export AV_API_KEY=your-gemini-key
av sentinel video.mp4

# Local (free, private — runs on your Mac/GPU)
ollama pull mistral-small3.2
av sentinel video.mp4 --provider ollama

# Specific alerts
av sentinel video.mp4 --alerts FALL,LONG_QUEUE

# Batch a directory
av sentinel videos/ --camera cam_lobby

All Commands

# Video memory
av ingest video.mp4             # Index video content
av search "what was discussed"  # Semantic search
av ask "key decisions?"         # RAG Q&A with citations
av list                         # List indexed videos
av transcript <id> --format vtt # Get transcript
av export --format jsonl        # Export all data
av export --format jsonl

# Surveillance intelligence
av sentinel video.mp4              # Detect events (all 4 alert types)
av sentinel video.mp4 --alerts FALL # Fall detection only
av sentinel video.mp4 -p ollama    # Self-hosted (free)
av sentinel videos/ -c cam_lobby   # Batch with camera tracking

Sentinel — Surveillance Event Detection

Detects 4 event types using temporal reasoning over VLM observations:

Alert Detection How It Works
FALL Position tracking standing→lying transition across frames (F1=0.944)
LONG_QUEUE Temporal persistence Queue detected in 3+ consecutive chunks (90s)
CROWD_GATHERING Density + growth Sustained crowd or rapid person count increase
WHEELCHAIR_COMPLIANCE Service timing Wheelchair user unattended > threshold

Providers for Sentinel

Provider Setup Cost Speed
Gemini (cloud) export AV_API_KEY=key Free tier available ~5s/chunk
OpenRouter export OPENROUTER_API_KEY=key $0.04-0.14/1M tokens ~10s/chunk
Ollama (local) ollama pull mistral-small3.2 Free ~25s/chunk
OpenAI export AV_API_KEY=key $$$ ~5s/chunk

Auto-detection: if no provider specified, av tries Gemini → OpenRouter → ollama → OpenAI.

How It Works

Video → 30s chunks (5s overlap)
  → 8 frames per chunk
  → VLM perception (positions, queue, crowd, wheelchair)
  → Temporal agent (state across chunks)
  → Alert rules (transition detection, persistence, growth)
  → JSON output

Built on 107 experiments across 21 vision models. Key insight: structural extraction + temporal rules beats generic "detect anomalies" prompts.

Configuration

Interactive Setup (Recommended)

av config setup

Choose from four providers:

# Provider Auth Transcription Embeddings
1 OpenAI (Codex OAuth) Auto-detected Whisper text-embedding-3-small
2 OpenAI (API key) sk-... key Whisper text-embedding-3-small
3 Anthropic (Claude) API key Not supported Not supported
4 Google (Gemini) API key Not supported text-embedding-004

Config is saved to ~/.config/av/config.json and persists across sessions.

Note: Anthropic and Gemini don't support Whisper transcription. With these providers, use av ingest --captions for frame-based captioning, or set AV_OPENAI_API_KEY for transcription fallback.

Environment Variables

Env vars always override config.json:

export AV_API_KEY="sk-..."
export AV_API_BASE_URL="https://api.openai.com/v1"  # or any OpenAI-compatible endpoint
export AV_TRANSCRIBE_MODEL="whisper"
export AV_VISION_MODEL="gpt-4-1"
export AV_EMBED_MODEL="text-embedding-3-small"
export AV_CHAT_MODEL="gpt-4-1"

Requirements

  • Python 3.11+
  • FFmpeg (brew install ffmpeg)
  • An API key from OpenAI, Anthropic, or Google — or Codex CLI OAuth

Commands

Command Description
av config setup Interactive provider setup wizard
av config show Show current configuration
av ingest <path> Ingest video file(s) into the index
av search <query> Full-text + semantic search
av ask <question> RAG Q&A with citations
av list List all indexed videos
av info <video_id> Detailed video metadata
av transcript <id> Output transcript (VTT/SRT/text)
av export Export as JSONL/VTT/SRT
av open <id> --at <sec> Open video at timestamp
av version Print version JSON

License

Apache License 2.0 — see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pixelml_av-0.1.0.tar.gz (156.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pixelml_av-0.1.0-py3-none-any.whl (60.0 kB view details)

Uploaded Python 3

File details

Details for the file pixelml_av-0.1.0.tar.gz.

File metadata

  • Download URL: pixelml_av-0.1.0.tar.gz
  • Upload date:
  • Size: 156.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pixelml_av-0.1.0.tar.gz
Algorithm Hash digest
SHA256 235b55a79b0a3494f3532ec673c2aa8e004ae4f52ef0c686af2696f8f41b0862
MD5 d61bd8b2c53396cbc32514b12b74052c
BLAKE2b-256 67e9f95a0d76889df076058fa383205493ccbd889043255b3f5e511392e80425

See more details on using hashes here.

Provenance

The following attestation bundles were made for pixelml_av-0.1.0.tar.gz:

Publisher: publish.yml on PixelML/av

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pixelml_av-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pixelml_av-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 60.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pixelml_av-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ee1e5dff4c99c5122803517071e557806583af1b82ca8bf59587d9e89d3f8c99
MD5 fb6fd60a0414212161508f67ec1ee827
BLAKE2b-256 b87616a9d17f7454814b28402146e0aef9cd1214bdf3bcee2f1f18e65a9b622f

See more details on using hashes here.

Provenance

The following attestation bundles were made for pixelml_av-0.1.0-py3-none-any.whl:

Publisher: publish.yml on PixelML/av

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page