# VidChain

**High-Fidelity Multimodal RAG Framework for Forensic Video Intelligence**
VidChain is a local-first multimodal RAG framework powered by the IRIS Engine (Intelligent Retrieval & Insight System). It parses video through a modular sensory matrix — fusing visual, auditory, OCR, and temporal signals into a queryable intelligence layer — designed for forensic analysis, security auditing, and automated video summarization with strict on-device privacy.
## Features
- 4-Route Agentic Router — Classifies queries into Narrative Summarization, Local Forensic Search, Global Master Intelligence, and Conversational Dialogue.
- Global Master Intelligence — Cross-video entity tracking via a macro-graph, enabling pattern recognition across isolated sessions.
- Temporal Persistence — Chronological reasoning that bridges frame gaps and maintains state continuity between sensor logs.
- Recursive Map-Reduce Summarizer — Collapses hours of video into coherent reports without hitting LLM context limits.
- Neural Concurrency Locking — Prevents state corruption during simultaneous ingestion and query operations.
- 100% Local Execution — All inference runs on host hardware; no data leaves the machine.
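The recursive map-reduce idea behind the summarizer can be illustrated with a minimal stand-alone sketch. This is not VidChain's actual implementation: `summarize_chunk` is a placeholder for a real LLM call (here it just keeps the first sentence), but the batching-and-recursion shape is what keeps any single call under the context limit:

```python
def summarize_chunk(text: str) -> str:
    # Stand-in for an LLM summarization call: keep the first sentence.
    return text.split(". ")[0].rstrip(".") + "."

def map_reduce_summarize(chunks: list[str], max_batch: int = 3) -> str:
    # Base case: few enough chunks to combine in one pass.
    if len(chunks) <= max_batch:
        return summarize_chunk(" ".join(chunks))
    # Map: summarize each fixed-size batch. Reduce: recurse on the partials.
    batches = [chunks[i:i + max_batch] for i in range(0, len(chunks), max_batch)]
    partial = [summarize_chunk(" ".join(b)) for b in batches]
    return map_reduce_summarize(partial, max_batch)

chunks = [f"Segment {i} shows activity. More detail follows." for i in range(10)]
print(map_reduce_summarize(chunks))  # → Segment 0 shows activity.
```

Each level of recursion shrinks the input by roughly a factor of `max_batch`, so even hours of transcript converge to a single summary in a handful of passes.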
## Installation

### Prerequisites
| Requirement | Version |
|---|---|
| Python | 3.11+ |
| CUDA | 12.1+ |
| Ollama | Latest (running) |
| Node.js | v18+ (for web portal) |
### Steps

1. Install PyTorch with CUDA support:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```

2. Clone and install VidChain:

   ```bash
   git clone https://github.com/rahulsiiitm/videochain-python
   cd videochain-python
   pip install -e .
   ```

3. Pull model weights:

   ```bash
   ollama pull moondream   # Vision Language Model
   ollama pull llama3      # Language Model for reasoning & routing
   ```

**CPU Fallback:** If no CUDA device is detected, VidChain automatically degrades to CPU mode — no code changes required.
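The fallback amounts to a capability probe at startup. This stand-alone sketch (not VidChain's actual code) uses the presence of `nvidia-smi` as a stdlib-only proxy; a real implementation would typically call `torch.cuda.is_available()` instead:

```python
import shutil

def pick_device() -> str:
    # Proxy check: a visible `nvidia-smi` binary suggests a CUDA-capable host.
    # In practice torch.cuda.is_available() is the authoritative test.
    return "cuda" if shutil.which("nvidia-smi") else "cpu"

print(f"Running inference on: {pick_device()}")
```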
## Quick Start

```python
from vidchain import VidChain

vc = VidChain(db_path="./forensic_vault")

# Ingest video (runs full default pipeline)
video_id = vc.ingest(video_source="interview_01.mp4")

# Query
response = vc.ask("What is the main topic of discussion?", video_id=video_id)
print(response["text"])

# Summarize
summary = vc.summarize_video(video_id=video_id, mode="concise")
print(summary)
```
## CLI Reference

### vidchain-serve

Launches the FastAPI backend and Next.js dashboard.

```bash
vidchain-serve
```

- API available at http://localhost:8000
- Dashboard opens at http://localhost:3000
- Includes a 7-second neural warmup before accepting requests
### vidchain-analyze

Headless video ingestion from the terminal.

```bash
vidchain-analyze path/to/video.mp4 --vlm moondream
```

| Flag | Description |
|---|---|
| `--vlm <model>` | Vision model to use (default: `moondream`) |
| `--llm <model>` | Reasoning model to use (default: `gemini/gemini-2.5-flash`) |
| `--fast` | Replaces the VLM with YOLO for high-speed detection (ideal for long CCTV footage) |
| `--emotion` | Injects a DeepFace emotion analysis node |
| `--action` | Injects a MobileNetV3 action classification node |
**Swapping models** — VidChain uses LiteLLM, so any compatible model can be hot-swapped:

```bash
# Local
vidchain-analyze video.mp4 --llm "ollama/llama3"

# Cloud (requires API key export)
export GEMINI_API_KEY="your_api_key"
vidchain-analyze video.mp4 --llm "gemini/gemini-2.5-flash"

# Custom VLM
vidchain-analyze video.mp4 --vlm "llava:7b"
```
## SDK: Modular Sensor Matrix
VidChain uses a LangChain-inspired composable pipeline. Each Node handles one sensing modality; chains are assembled per use case.
### Available Nodes

| Node | Modality | Description |
|---|---|---|
| `AdaptiveKeyframeNode` | Logic | Gaussian-differential sampling — drops redundant frames to reduce compute load |
| `LlavaNode` | Visual | Scene semantics, descriptive captions, and situational context |
| `YoloNode` | Visual | High-speed discrete object detection (lightweight fallback for `LlavaNode`) |
| `WhisperNode` | Audio | Speech transcription and acoustic anomaly detection (e.g., shouts) |
| `OcrNode` | Text | Digital trace extraction — license plates, screens, documents |
| `TrackerNode` | Motion | Persistent object tracking (IoU) and camera motion estimation (optical flow) |
| `EmotionNode` | Behavioral | Facial sentiment analysis |
| `ActionNode` | Behavioral | Human activity classification via MobileNetV3 |
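The adaptive keyframe idea can be approximated with a plain frame-difference filter. This is a simplified sketch, not `AdaptiveKeyframeNode`'s actual Gaussian-differential algorithm: frames here are flat lists of pixel intensities, and the threshold plays the role of `change_threshold`:

```python
def select_keyframes(frames: list[list[float]], change_threshold: float = 1.5) -> list[int]:
    """Keep a frame only when it differs enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        prev = frames[kept[-1]]
        # Mean absolute pixel difference as a crude change score.
        score = sum(abs(a - b) for a, b in zip(frames[i], prev)) / len(prev)
        if score >= change_threshold:
            kept.append(i)
    return kept

static = [[10.0] * 4] * 5        # five identical frames
burst = static + [[20.0] * 4]    # sudden change in the last frame
print(select_keyframes(burst))   # → [0, 5]
```

Comparing against the last *kept* frame, rather than the immediately previous one, is what lets slow drifts accumulate until they eventually cross the threshold.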
### Custom Pipeline Example

```python
from vidchain import VidChain
from vidchain.pipeline import VideoChain
from vidchain.nodes import AdaptiveKeyframeNode, LlavaNode, OcrNode, TrackerNode

vc = VidChain(db_path="./forensic_vault")

surveillance_chain = VideoChain(nodes=[
    AdaptiveKeyframeNode(change_threshold=1.5),  # High sensitivity
    LlavaNode(model="moondream"),
    OcrNode(),
    TrackerNode()
])

video_id = vc.ingest(
    video_source="gate_camera_04.mp4",
    chain=surveillance_chain
)

response = vc.ask(
    "Were there any vehicles with visible license plates after 14:00?",
    video_id=video_id
)
print(response)
```
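The IoU matching that `TrackerNode` relies on reduces to a box-overlap score. A minimal sketch (standard formula, not VidChain-specific code) with boxes as `(x1, y1, x2, y2)`:

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # → 1.0 (identical boxes)
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # → 0.0 (disjoint boxes)
```

A tracker typically matches a detection to an existing track when their IoU exceeds a fixed threshold, otherwise it spawns a new track ID.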
## REST API

Exposed when running `vidchain-serve`.

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | System status and list of ingested video IDs |
| POST | `/api/sessions` | Create a new isolated neural session |
| POST | `/api/ingest` | Submit a video file path for background processing |
| POST | `/api/query` | Run a natural language query through the Agentic Router |
| GET | `/api/media-stream` | Serve local video securely for frontend playback |
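A `/api/query` call can be assembled with the standard library alone. The payload field names (`question`, `video_id`) are illustrative assumptions, not a documented schema — check the running server for the actual contract:

```python
import json
import urllib.request

def build_query_request(question: str, video_id: str) -> urllib.request.Request:
    # Assumed JSON body; field names are hypothetical, not VidChain's spec.
    body = json.dumps({"question": question, "video_id": video_id}).encode()
    return urllib.request.Request(
        "http://localhost:8000/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("Who entered after 14:00?", "vid_001")
print(req.full_url, req.method)  # → http://localhost:8000/api/query POST
# urllib.request.urlopen(req) would send it to a running vidchain-serve.
```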
## Architecture

### Isolated GraphRAG

Each ingested video generates a dedicated Temporal Knowledge Graph (`.pkl`). The RAG engine retrieves semantically relevant chunks from ChromaDB and fuses them with structured graph data (co-occurrences, tracking IDs, timestamps). Memory boundaries are strictly enforced — no cross-video context bleed.
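The memory boundary can be pictured as a store keyed strictly by video ID. This toy sketch (not VidChain's storage layer) rejects lookups against unknown sessions rather than silently borrowing context from another video:

```python
class IsolatedGraphStore:
    """Toy per-video store: each video ID owns a disjoint context dict."""

    def __init__(self):
        self._graphs: dict[str, dict] = {}

    def put(self, video_id: str, key: str, value) -> None:
        self._graphs.setdefault(video_id, {})[key] = value

    def get(self, video_id: str, key: str):
        if video_id not in self._graphs:
            raise KeyError(f"unknown video session: {video_id}")
        # Lookups never fall through to another video's graph.
        return self._graphs[video_id].get(key)

store = IsolatedGraphStore()
store.put("vid_a", "entity:car", {"first_seen": 12.5})
print(store.get("vid_a", "entity:car"))      # → {'first_seen': 12.5}
print(store.get("vid_a", "entity:person"))   # → None (absent, never borrowed)
```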
### The Neural Lens

Every query response is paired with a Base64-encoded visual snapshot extracted directly from the referenced timestamp, providing visual proof for AI-generated claims.
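Decoding such a snapshot needs only the standard library. The `snapshot_b64` field name below is an illustrative assumption, not a documented key; any Base64-encoded image payload decodes the same way:

```python
import base64

# Fabricated stand-in for a query response; real snapshots would carry
# an actual frame from the referenced timestamp.
fake_response = {
    "text": "A red car passes the gate.",
    "snapshot_b64": base64.b64encode(b"\xff\xd8fake-jpeg-bytes").decode(),
}

image_bytes = base64.b64decode(fake_response["snapshot_b64"])
# image_bytes could now be written to a .jpg file or rendered in a UI.
print(image_bytes[:2])  # → b'\xff\xd8' (JPEG magic bytes)
```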
## License

MIT — See LICENSE for details.

**Author:** Rahul Sharma — IIIT Manipur
## File details

### vidchain-1.0.1.tar.gz (source distribution)

- Size: 3.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ea8409de26eddef03efc61fb853575d956c103dcd83f70a953cc335e56c4a936` |
| MD5 | `e79ef690f09a9acc6c8726afdf10697f` |
| BLAKE2b-256 | `e2e12591a9c92f33e05d382dd9b0f61119091c9ea675c349b922f53833892a2d` |

### vidchain-1.0.1-py3-none-any.whl (built distribution)

- Size: 3.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b03e2f9c30a482cacac7b7ef04bffeb9dc563f069ab8cc1696f48487818398cf` |
| MD5 | `92e10c5b985ca272a9f9b4b84426eb45` |
| BLAKE2b-256 | `f659f0ab2850fdc4428182a38bccabf33dca4b3d0821f16f15691dcb06ea826d` |