VidChain: High-Fidelity Multimodal RAG Framework
v1.0.0-Stable — A local-first multimodal Retrieval-Augmented Generation (RAG) framework for forensic video intelligence.
System Overview
VidChain is powered by the IRIS Engine (Intelligent Retrieval & Insight System). The framework parses video files through a modular sensory matrix, fusing visual, auditory, digital (OCR), and temporal data into a queryable intelligence layer. It is designed for forensic analysis, security auditing, and automated video summarization with strict local-hardware privacy constraints.
```mermaid
graph TD
    %% --- Ingestion Stage ---
    subgraph "1. Ingestion & Optimization Layer"
        VS[Video Source] --> AK[Adaptive Gaussian Filter]
        AK -- "Delta > Threshold" --> PK[Promote to Keyframe]
        AK -- "Redundant" --> DROP{{Neural Compute Firewall}}
    end

    %% --- Inference Stage ---
    subgraph "2. Sensory Node Matrix (Late Fusion)"
        PK --> VLM[LlavaNode: Scene Semantics]
        PK --> ASR[WhisperNode: Audio Trace]
        PK --> OCR[OcrNode: Digital Trace]
        PK --> TRK[TrackerNode: Motion Flow]
    end

    %% --- Intelligence Logic ---
    subgraph "3. IRIS Cognitive Engine"
        VLM & ASR & OCR & TRK --> FUSE[Spatio-Temporal Fusion]
        FUSE --> RDN[Recursive Map-Reduce Summarizer]
    end

    %% --- Persistence ---
    subgraph "4. VidChain Memory Vault"
        FUSE --> KV[(ChromaDB Vector Store)]
        FUSE --> KG[[Hybrid Global/Local GraphRAG]]
    end

    %% --- Interaction Stage ---
    subgraph "5. IRIS Intelligence Agent"
        USER[User Query] --> IR{4-Route Intent Router}
        IR -- "1. Narrative" --> SUM[Map-Reduce Summarizer]
        IR -- "2. Forensic" --> RAG[Local GraphRAG]
        IR -- "3. Master Intel" --> GLOBAL[Global Cross-Video RAG]
        IR -- "4. Dialogue" --> CHAT[Conversational Logic]
        SUM & RAG & GLOBAL & CHAT --> DISCOVERY([VidChain Insight Canvas])
    end

    style VS fill:#1e1e2e,stroke:#74c7ec,stroke-width:2px;
    style DISCOVERY fill:#11111b,stroke:#e8192c,stroke-width:3px;
```
Key Capabilities
- 4-Route Agentic Router: Optimized intent classification dividing queries into Narrative Summarization, Local Forensic Search, Global Master Intelligence, and Conversational Dialogue.
- Global Master Intelligence: Cross-video entity tracking. IRIS builds a macro-graph of entities across isolated sessions, enabling broad pattern recognition and historical lookups.
- Temporal Persistence: Sophisticated chronological reasoning. IRIS bridges gaps between frames, recognizing that states persist between active sensor logs.
- Recursive Map-Reduce Summarizer: High-density narrative synthesis capable of collapsing hours of video data into coherent, chronological reports without hitting LLM context limits (see the sketch after this list).
- Neural Concurrency Locking: Production-hardened safety mechanisms preventing state corruption during simultaneous ingestion and querying tasks.
- Local-First Privacy: 100% air-gapped reasoning. Data remains entirely on the host hardware.
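The summarizer's context-limit guarantee comes from the classic map-reduce pattern: summarize chunks independently, then recursively summarize the summaries until everything fits one context window. A minimal sketch of that pattern (illustrative only; `llm_summarize` is a placeholder for a real LLM call, not VidChain's internal API):

```python
def llm_summarize(text: str) -> str:
    """Stand-in for an LLM call that compresses `text` (assumption)."""
    return text[:500]  # placeholder compression

def map_reduce_summarize(chunks: list[str], max_chars: int = 2000) -> str:
    # Map: compress each chunk independently, so no single LLM call
    # exceeds the model's context window.
    partials = [llm_summarize(c) for c in chunks]
    combined = "\n".join(partials)
    # Reduce: if the joined partial summaries are still too long,
    # re-group them into small batches and recurse.
    if len(combined) > max_chars and len(partials) > 1:
        batches = [partials[i:i + 4] for i in range(0, len(partials), 4)]
        return map_reduce_summarize(["\n".join(b) for b in batches], max_chars)
    # Base case: a single final pass produces the chronological report.
    return llm_summarize(combined)
```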
Installation and Deployment
Prerequisites
- Python: 3.11 or higher
- CUDA: 12.1 or higher (Required for hardware acceleration)
- Ollama: Must be installed and running for local LLM/VLM execution.
- Node.js: v18+ (Required for the Next.js Web Portal)
Installation Steps
- Install Core Dependencies

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

- Install VidChain

```bash
git clone https://github.com/rahulsiiitm/videochain-python
cd videochain-python
pip install -e .
```

- Pull Required Neural Weights

```bash
ollama pull moondream   # Vision Language Model for scene semantics
ollama pull llama3      # Large Language Model for reasoning and routing
```
Hardware-Agnostic Engine
VidChain automatically audits hardware during initialization:
- CUDA Available: Activates high-fidelity GPU pipelines for real-time analysis.
- CPU Fallback: Gracefully degrades to CPU mode with zero code modifications required.
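In practice, this audit reduces to a standard PyTorch device probe; a minimal sketch of the pattern (VidChain's actual initialization code may differ):

```python
import torch

def select_device() -> torch.device:
    # Prefer CUDA when a compatible GPU and driver are present;
    # otherwise fall back to CPU with no code changes required.
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = select_device()
print(f"VidChain pipelines will run on: {device}")
```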
Command Line Interface (CLI)
VidChain provides several CLI tools for operational flexibility.
1. vidchain-serve
Launches the FastAPI backend and hosts the Next.js React frontend dashboard.
```bash
vidchain-serve
```

- Hosts the REST API at `http://localhost:8000`.
- Automatically opens the Spider-Net web dashboard in the default browser at `http://localhost:3000`.
- Implements a 7-second "Neural Warmup" to stabilize models before accepting requests.
2. vidchain-analyze
Executes headless video ingestion and analysis directly from the terminal.
```bash
vidchain-analyze path/to/video.mp4 --vlm moondream
```

- `--fast`: Bypasses the VLM and uses YOLO for high-speed object detection (ideal for long-form CCTV).
- `--emotion`: Injects the DeepFace emotion analysis node.
- `--action`: Injects the MobileNetV3 situational action node.
Changing the Neural Engines (LLM/VLM)
VidChain uses LiteLLM under the hood, meaning you can hot-swap the underlying AI models directly from the command line depending on your local hardware capabilities:
- Change Reasoning Engine (LLM):

```bash
vidchain-analyze path/to/video.mp4 --llm "ollama/llama3"
vidchain-analyze path/to/video.mp4 --llm "gemini/gemini-2.5-flash"
```

  (Default is `gemini/gemini-2.5-flash`. Note: to use Gemini or other cloud models, you must export your API key as an environment variable, e.g., `export GEMINI_API_KEY="your_api_key"`.)

- Change Vision Engine (VLM):

```bash
vidchain-analyze path/to/video.mp4 --vlm "llava:7b"
```

  (Default is `moondream` via Ollama.)
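Because model selection is delegated to LiteLLM, the `provider/model` strings above follow LiteLLM's standard routing convention. A minimal standalone illustration of that convention, independent of VidChain:

```python
from litellm import completion

# LiteLLM routes on the "provider/model" prefix: "ollama/..." targets a
# local Ollama server, while "gemini/..." targets Google's hosted API
# (which requires GEMINI_API_KEY in the environment).
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Summarize the last scene."}],
)
print(response.choices[0].message.content)
```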
Developer SDK: The Modular Sensor Matrix
VidChain utilizes a LangChain-inspired composable architecture. Developers can assemble custom pipelines by chaining specific sensory nodes.
Core Sensory Nodes
| Node Class | Modality | Primary Application |
|---|---|---|
| `AdaptiveKeyframeNode` | Logic | Gaussian-differential sampling to drop redundant frames and reduce compute load. |
| `LlavaNode` | Visual | High-fidelity scene semantics, descriptive captions, and situational context. |
| `YoloNode` | Visual | High-speed, discrete object detection (fallback for `LlavaNode`). |
| `WhisperNode` | Audio | Speech-to-text transcription and acoustic anomaly detection (e.g., shouts). |
| `OcrNode` | Text | Digital trace extraction (license plates, computer screens, documents). |
| `TrackerNode` | Motion | Persistent object tracking (IoU) and camera motion estimation (Optical Flow); see the IoU sketch below the table. |
| `EmotionNode` | Behavioral | Facial sentiment analysis (requires visible faces). |
| `ActionNode` | Behavioral | High-speed classification of human activities. |
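For readers unfamiliar with IoU-based association, the core of persistent tracking is a box-overlap score compared against a threshold. A minimal, self-contained sketch of the idea (not `TrackerNode`'s actual implementation):

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def assign_track(detection: tuple, tracks: dict, threshold: float = 0.5) -> int:
    # Associate the new detection with the best-overlapping existing
    # track; if nothing clears the threshold, open a new track ID.
    best_id, best_score = None, threshold
    for track_id, last_box in tracks.items():
        score = iou(detection, last_box)
        if score > best_score:
            best_id, best_score = track_id, score
    if best_id is None:
        best_id = max(tracks, default=0) + 1
    tracks[best_id] = detection
    return best_id
```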
Basic Usage (Default Pipeline)
The simplest way to integrate VidChain into an existing Python application is to use the default high-fidelity VLM pipeline.
```python
from vidchain import VidChain

# 1. Initialize the IRIS Intelligence Vault
vc = VidChain(db_path="./forensic_vault")

# 2. Ingest Video (automatically uses AdaptiveKeyframe, Llava, Whisper, etc.)
video_id = vc.ingest(video_source="interview_01.mp4")

# 3. Query the Engine
response = vc.ask("What is the main topic of discussion?", video_id=video_id)
print(response["text"])

# 4. Generate an Executive Summary
summary = vc.summarize_video(video_id=video_id, mode="concise")
print(summary)
```
SDK Example: Custom Forensic Pipeline
```python
from vidchain import VidChain
from vidchain.pipeline import VideoChain
from vidchain.nodes import (
    AdaptiveKeyframeNode,
    LlavaNode,
    OcrNode,
    TrackerNode,
)

# 1. Initialize the Orchestrator
vc = VidChain(db_path="./forensic_vault")

# 2. Assemble a High-Sensitivity Custom Chain
surveillance_chain = VideoChain(nodes=[
    AdaptiveKeyframeNode(change_threshold=1.5),  # High sensitivity for subtle movements
    LlavaNode(model="moondream"),                # Scene semantics
    OcrNode(),                                   # Digital trace extraction
    TrackerNode(),                               # Spatio-temporal motion mapping
])

# 3. Execute Ingestion
video_id = vc.ingest(
    video_source="gate_camera_04.mp4",
    chain=surveillance_chain,
)

# 4. Perform an Agentic Query
query = "Were there any vehicles with visible license plates after 14:00?"
response = vc.ask(query, video_id=video_id)
print(response)
```
REST API Reference
When running `vidchain-serve`, the system exposes a FastAPI backend for external integrations.
- `GET /api/health`: Returns system status and the list of ingested video IDs.
- `POST /api/sessions`: Creates a new isolated neural session.
- `POST /api/ingest`: Accepts a video file path and initializes background processing.
- `POST /api/query`: Submits a natural language query against a specific session, triggering the Agentic Router.
- `GET /api/media-stream`: Serves absolute local video paths securely for frontend playback.
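A minimal client round-trip might look like the following. The JSON field names (`path`, `session_id`, `query`) are illustrative assumptions, as the request schemas are not documented here:

```python
import requests

BASE = "http://localhost:8000"

# Confirm the engine is up before issuing work.
print(requests.get(f"{BASE}/api/health").json())

# Create an isolated session, ingest a video, then query it.
# (Payload field names below are assumptions for illustration.)
session = requests.post(f"{BASE}/api/sessions").json()
requests.post(f"{BASE}/api/ingest",
              json={"path": "interview_01.mp4",
                    "session_id": session.get("session_id")})
answer = requests.post(f"{BASE}/api/query",
                       json={"query": "Who entered after 14:00?",
                             "session_id": session.get("session_id")})
print(answer.json())
```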
Architectural Details
Isolated GraphRAG Intelligence
Each ingested video generates a dedicated, persistent Temporal Knowledge Graph (.pkl). The RAG engine retrieves semantically relevant chunks from ChromaDB and fuses them with structured factual data (co-occurrences, tracking IDs, timestamps) from the graph. Memory boundaries are strictly enforced to prevent cross-video hallucinations.
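Conceptually, answering a query is a two-source retrieval: semantically relevant chunks from ChromaDB, plus structured facts from the per-video graph. A simplified sketch of that fusion, assuming an illustrative collection name, `.pkl` path, and graph layout (none of which are documented here):

```python
import pickle
import chromadb

client = chromadb.PersistentClient(path="./forensic_vault")
collection = client.get_or_create_collection("video_interview_01")

# 1. Semantic retrieval: top-k transcript/caption chunks.
hits = collection.query(
    query_texts=["vehicles with visible license plates after 14:00"],
    n_results=5,
)

# 2. Structured retrieval: facts from this video's dedicated graph.
#    (The file path and graph schema are illustrative assumptions.)
with open("./forensic_vault/interview_01_graph.pkl", "rb") as f:
    graph = pickle.load(f)

# 3. Fuse both sources into one grounded prompt context: append
#    co-occurrences, tracking IDs, and timestamps from `graph`
#    to the retrieved chunks before handing them to the LLM.
context = "\n".join(hits["documents"][0])
```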
The Neural Lens
The system automatically pairs textual answers with Base64-encoded visual snapshots extracted directly from the referenced timestamp. This provides immediate, undeniable visual proof for any AI-generated claim.
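Extracting such a snapshot is straightforward with OpenCV; a minimal sketch of the seek-and-encode step (not necessarily VidChain's internal code):

```python
import base64
import cv2

def snapshot_b64(video_path: str, timestamp_s: float) -> str:
    """Grab the frame at `timestamp_s` and return it as a Base64 JPEG."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, timestamp_s * 1000)  # seek by time
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise ValueError(f"No frame at {timestamp_s}s in {video_path}")
    ok, jpeg = cv2.imencode(".jpg", frame)
    return base64.b64encode(jpeg.tobytes()).decode("ascii")
```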
Author: Rahul Sharma — IIIT Manipur
License: MIT
Status: Production / v1.0.0-Stable