
VidChain

High-Fidelity Multimodal RAG Framework for Forensic Video Intelligence


VidChain is a local-first multimodal RAG framework powered by the IRIS Engine (Intelligent Retrieval & Insight System). It parses video through a modular sensory matrix — fusing visual, auditory, OCR, and temporal signals into a queryable intelligence layer — designed for forensic analysis, security auditing, and automated video summarization with strict on-device privacy.

VidChain v1.0 Dashboard


Features

  • 4-Route Agentic Router — Classifies queries into Narrative Summarization, Local Forensic Search, Global Master Intelligence, and Conversational Dialogue.
  • Global Master Intelligence — Cross-video entity tracking via a macro-graph, enabling pattern recognition across isolated sessions.
  • Temporal Persistence — Chronological reasoning that bridges frame gaps and maintains state continuity between sensor logs.
  • Recursive Map-Reduce Summarizer — Collapses hours of video into coherent reports without hitting LLM context limits.
  • Neural Concurrency Locking — Prevents state corruption during simultaneous ingestion and query operations.
  • 100% Local Execution — All inference runs on host hardware; no data leaves the machine.
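The Neural Concurrency Locking idea above can be illustrated with a minimal sketch; the `IngestStore` class and its methods are hypothetical stand-ins, not VidChain internals:

```python
import threading

class IngestStore:
    """Toy store guarding shared state with a lock, mirroring the idea of
    preventing corruption when ingestion and queries run concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frames = []

    def ingest(self, frame):
        with self._lock:  # writers take the lock before mutating state
            self._frames.append(frame)

    def query(self):
        with self._lock:  # readers see a consistent snapshot
            return list(self._frames)

store = IngestStore()
threads = [threading.Thread(target=store.ingest, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store.query()))  # 100
```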

Installation

Prerequisites

| Requirement | Version |
| --- | --- |
| Python | 3.11+ |
| CUDA | 12.1+ |
| Ollama | Latest (running) |
| Node.js | v18+ (for web portal) |

Steps

1. Install PyTorch with CUDA support

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

2. Clone and install VidChain

git clone https://github.com/rahulsiiitm/videochain-python
cd videochain-python
pip install -e .

3. Pull model weights

ollama pull moondream   # Vision Language Model
ollama pull llama3      # Language Model for reasoning & routing

CPU Fallback: If no CUDA device is detected, VidChain automatically degrades to CPU mode — no code changes required.


Quick Start

from vidchain import VidChain

vc = VidChain(db_path="./forensic_vault")

# Ingest video (runs full default pipeline)
video_id = vc.ingest(video_source="interview_01.mp4")

# Query
response = vc.ask("What is the main topic of discussion?", video_id=video_id)
print(response["text"])

# Summarize
summary = vc.summarize_video(video_id=video_id, mode="concise")
print(summary)
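The Recursive Map-Reduce Summarizer behind `summarize_video` can be sketched roughly as follows; the stub `summarize` function and the tiny context limit are illustrative assumptions, not VidChain's actual implementation:

```python
def summarize(text: str, limit: int) -> str:
    """Stub summarizer: in VidChain this would be an LLM call."""
    return text[:limit]

def map_reduce_summary(chunks: list[str], limit: int = 50) -> str:
    # Map: summarize each chunk independently, so no single call
    # ever exceeds the context window.
    summaries = [summarize(c, limit) for c in chunks]
    # Reduce: repeatedly merge adjacent pairs and re-summarize
    # until a single report remains.
    while len(summaries) > 1:
        summaries = [summarize(" ".join(summaries[i:i + 2]), limit)
                     for i in range(0, len(summaries), 2)]
    return summaries[0]

report = map_reduce_summary(["scene one " * 20, "scene two " * 20, "scene three " * 20])
print(len(report) <= 50)  # True
```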

CLI Reference

vidchain-serve

Launches the FastAPI backend and Next.js dashboard.

vidchain-serve
  • API available at http://localhost:8000
  • Dashboard opens at http://localhost:3000
  • Includes a 7-second neural warmup before accepting requests

vidchain-analyze

Headless video ingestion from the terminal.

vidchain-analyze path/to/video.mp4 --vlm moondream

| Flag | Description |
| --- | --- |
| `--vlm <model>` | Vision model to use (default: moondream) |
| `--llm <model>` | Reasoning model to use (default: gemini/gemini-2.5-flash) |
| `--fast` | Replaces VLM with YOLO for high-speed detection (ideal for long CCTV footage) |
| `--emotion` | Injects DeepFace emotion analysis node |
| `--action` | Injects MobileNetV3 action classification node |

Swapping models — VidChain uses LiteLLM, so any compatible model can be hot-swapped:

# Local
vidchain-analyze video.mp4 --llm "ollama/llama3"

# Cloud (requires API key export)
export GEMINI_API_KEY="your_api_key"
vidchain-analyze video.mp4 --llm "gemini/gemini-2.5-flash"

# Custom VLM
vidchain-analyze video.mp4 --vlm "llava:7b"

SDK: Modular Sensor Matrix

VidChain uses a LangChain-inspired composable pipeline. Each Node handles one sensing modality; chains are assembled per use case.

Available Nodes

| Node | Modality | Description |
| --- | --- | --- |
| AdaptiveKeyframeNode | Logic | Gaussian-differential sampling — drops redundant frames to reduce compute load |
| LlavaNode | Visual | Scene semantics, descriptive captions, and situational context |
| YoloNode | Visual | High-speed discrete object detection (lightweight fallback for LlavaNode) |
| WhisperNode | Audio | Speech transcription and acoustic anomaly detection (e.g., shouts) |
| OcrNode | Text | Digital trace extraction — license plates, screens, documents |
| TrackerNode | Motion | Persistent object tracking (IoU) and camera motion estimation (Optical Flow) |
| EmotionNode | Behavioral | Facial sentiment analysis |
| ActionNode | Behavioral | Human activity classification via MobileNetV3 |
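The Gaussian-differential sampling in AdaptiveKeyframeNode might look roughly like the sketch below, which keeps a frame only when its change score is an outlier against a running Gaussian model of recent inter-frame differences; the statistics, interface, and threshold semantics here are assumptions for illustration:

```python
import statistics

def adaptive_keyframes(frame_scores, change_threshold=1.5):
    """frame_scores: per-frame difference scores vs. the previous frame
    (e.g. mean absolute pixel delta). Returns indices of kept frames."""
    kept = [0]          # always keep the first frame
    history = []
    for i, score in enumerate(frame_scores[1:], start=1):
        history.append(score)
        if len(history) < 2:
            continue    # need at least two samples for a stdev
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        # Keep the frame if its change exceeds the mean by
        # `change_threshold` standard deviations.
        if sigma > 0 and score > mu + change_threshold * sigma:
            kept.append(i)
    return kept

# A burst of change at frame 6 amid near-static footage:
scores = [0, 1, 1, 1, 1, 1, 20, 1, 1, 1]
print(adaptive_keyframes(scores))  # [0, 6]
```

Lowering `change_threshold` (as in the surveillance example below with `1.5`) keeps more frames, trading compute for temporal resolution.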

Custom Pipeline Example

from vidchain import VidChain
from vidchain.pipeline import VideoChain
from vidchain.nodes import AdaptiveKeyframeNode, LlavaNode, OcrNode, TrackerNode

vc = VidChain(db_path="./forensic_vault")

surveillance_chain = VideoChain(nodes=[
    AdaptiveKeyframeNode(change_threshold=1.5),  # High sensitivity
    LlavaNode(model="moondream"),
    OcrNode(),
    TrackerNode()
])

video_id = vc.ingest(
    video_source="gate_camera_04.mp4",
    chain=surveillance_chain
)

response = vc.ask(
    "Were there any vehicles with visible license plates after 14:00?",
    video_id=video_id
)
print(response)

REST API

Exposed when running vidchain-serve.

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/health | System status and list of ingested video IDs |
| POST | /api/sessions | Create a new isolated neural session |
| POST | /api/ingest | Submit a video file path for background processing |
| POST | /api/query | Run a natural language query through the Agentic Router |
| GET | /api/media-stream | Serve local video securely for frontend playback |
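As a sketch of how a client might call POST /api/query, the payload could be built and posted as below; the field names are assumptions, since the request schema is not documented here, and the HTTP call is shown commented out to keep the sketch dependency-free:

```python
import json
# import requests  # would perform the actual HTTP call

payload = {
    "question": "What is the main topic of discussion?",  # natural-language query
    "video_id": "interview_01",                           # hypothetical ingested ID
}
body = json.dumps(payload)

# response = requests.post("http://localhost:8000/api/query",
#                          data=body,
#                          headers={"Content-Type": "application/json"})
# print(response.json()["text"])

print(json.loads(body)["video_id"])  # interview_01
```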

Architecture

Isolated GraphRAG

Each ingested video generates a dedicated Temporal Knowledge Graph (.pkl). The RAG engine retrieves semantically relevant chunks from ChromaDB and fuses them with structured graph data (co-occurrences, tracking IDs, timestamps). Memory boundaries are strictly enforced — no cross-video context bleed.
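Conceptually, the fusion step could be sketched as follows: take semantically similar chunks from the vector store and enrich each with the graph facts sharing its timestamp. All data structures and field names here are illustrative assumptions, not VidChain's storage schema:

```python
def fuse(vector_hits, graph_facts):
    """vector_hits: [{'t': ts, 'text': ...}] chunks retrieved from ChromaDB.
    graph_facts: {ts: [facts]} structured entries from the temporal graph.
    Returns chunks enriched with co-occurring graph facts."""
    return [
        {**hit, "facts": graph_facts.get(hit["t"], [])}
        for hit in vector_hits
    ]

hits = [{"t": 12, "text": "white van at gate"}]
graph = {12: ["track_id=7", "plate=KA01AB1234"]}  # toy graph data
print(fuse(hits, graph)[0]["facts"])  # ['track_id=7', 'plate=KA01AB1234']
```

Because each video's graph lives in its own file, a query can only ever be fused with facts from its own session, which is what enforces the no-bleed boundary.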

The Neural Lens

Every query response is paired with a Base64-encoded visual snapshot extracted directly from the referenced timestamp, providing visual proof for AI-generated claims.
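The snapshot mechanic can be sketched without any video library: given the raw frame bytes at the referenced timestamp, they are Base64-encoded for transport inside the JSON response. The frame lookup below is a toy stand-in, not how VidChain indexes frames:

```python
import base64

def neural_lens_snapshot(frames_by_ts: dict[int, bytes], timestamp: int) -> str:
    """Return a Base64 string for the frame at `timestamp`, suitable for
    embedding in a JSON response as visual proof of a claim."""
    raw = frames_by_ts[timestamp]
    return base64.b64encode(raw).decode("ascii")

frames = {42: b"\x89PNG-toy-frame"}  # pretend frame bytes keyed by second
snapshot = neural_lens_snapshot(frames, 42)
print(snapshot.isascii())  # True
```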


License

MIT — See LICENSE for details.

Author: Rahul Sharma — IIIT Manipur

Download files

Download the file for your platform.

Source Distribution

vidchain-1.0.1.tar.gz (3.6 MB, source)

Built Distribution


vidchain-1.0.1-py3-none-any.whl (3.6 MB, Python 3)

File details

Details for the file vidchain-1.0.1.tar.gz.

File metadata

  • Download URL: vidchain-1.0.1.tar.gz
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for vidchain-1.0.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ea8409de26eddef03efc61fb853575d956c103dcd83f70a953cc335e56c4a936 |
| MD5 | e79ef690f09a9acc6c8726afdf10697f |
| BLAKE2b-256 | e2e12591a9c92f33e05d382dd9b0f61119091c9ea675c349b922f53833892a2d |


File details

Details for the file vidchain-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: vidchain-1.0.1-py3-none-any.whl
  • Size: 3.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for vidchain-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b03e2f9c30a482cacac7b7ef04bffeb9dc563f069ab8cc1696f48487818398cf |
| MD5 | 92e10c5b985ca272a9f9b4b84426eb45 |
| BLAKE2b-256 | f659f0ab2850fdc4428182a38bccabf33dca4b3d0821f16f15691dcb06ea826d |

