VidChain: High-Fidelity Multimodal RAG Framework
v1.0.0-Stable — A local-first multimodal Retrieval-Augmented Generation (RAG) framework for forensic video intelligence.
System Overview
VidChain is powered by the IRIS Engine (Intelligent Retrieval & Insight System). The framework parses video files through a modular sensory matrix, fusing visual, auditory, digital (OCR), and temporal data into a queryable intelligence layer. It is designed for forensic analysis, security auditing, and automated video summarization with strict local-hardware privacy constraints.
```mermaid
graph TD
    %% --- Ingestion Stage ---
    subgraph "1. Ingestion & Optimization Layer"
        VS[Video Source] --> AK[Adaptive Gaussian Filter]
        AK -- "Delta > Threshold" --> PK[Promote to Keyframe]
        AK -- "Redundant" --> DROP{{Neural Compute Firewall}}
    end

    %% --- Inference Stage ---
    subgraph "2. Sensory Node Matrix (Late Fusion)"
        PK --> VLM[LlavaNode: Scene Semantics]
        PK --> ASR[WhisperNode: Audio Trace]
        PK --> OCR[OcrNode: Digital Trace]
        PK --> TRK[TrackerNode: Motion Flow]
    end

    %% --- Intelligence Logic ---
    subgraph "3. IRIS Cognitive Engine"
        VLM & ASR & OCR & TRK --> FUSE[Spatio-Temporal Fusion]
        FUSE --> RDN[Recursive Map-Reduce Summarizer]
    end

    %% --- Persistence ---
    subgraph "4. VidChain Memory Vault"
        FUSE --> KV[(ChromaDB Vector Store)]
        FUSE --> KG[[Hybrid Global/Local GraphRAG]]
    end

    %% --- Interaction Stage ---
    subgraph "5. IRIS Intelligence Agent"
        USER[User Query] --> IR{4-Route Intent Router}
        IR -- "1. Narrative" --> SUM[Map-Reduce Summarizer]
        IR -- "2. Forensic" --> RAG[Local GraphRAG]
        IR -- "3. Master Intel" --> GLOBAL[Global Cross-Video RAG]
        IR -- "4. Dialogue" --> CHAT[Conversational Logic]
        SUM & RAG & GLOBAL & CHAT --> DISCOVERY([VidChain Insight Canvas])
    end

    style VS fill:#1e1e2e,stroke:#74c7ec,stroke-width:2px;
    style DISCOVERY fill:#11111b,stroke:#e8192c,stroke-width:3px;
```
Key Capabilities
- 4-Route Agentic Router: Optimized intent classification dividing queries into Narrative Summarization, Local Forensic Search, Global Master Intelligence, and Conversational Dialogue.
- Global Master Intelligence: Cross-video entity tracking. IRIS builds a macro-graph of entities across isolated sessions, enabling broad pattern recognition and historical lookups.
- Temporal Persistence: Sophisticated chronological reasoning. IRIS bridges gaps between frames, recognizing that states persist between active sensor logs.
- Recursive Map-Reduce Summarizer: High-density narrative synthesis capable of collapsing hours of video data into coherent, chronological reports without hitting LLM context limits (see the sketch after this list).
- Neural Concurrency Locking: Production-hardened safety mechanisms preventing state corruption during simultaneous ingestion and querying tasks.
- Local-First Privacy: 100% air-gapped reasoning. Data remains entirely on the host hardware.
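The summarizer's context-limit guarantee comes from the classic map-reduce pattern: summarize chunks independently, then recursively summarize the summaries until everything fits one context window. A minimal sketch of that pattern (illustrative only; `llm_summarize` is a placeholder for a real LLM call, not VidChain's internal API):

```python
def llm_summarize(text: str) -> str:
    """Stand-in for an LLM call that compresses `text` (assumption)."""
    return text[:500]  # placeholder compression

def map_reduce_summarize(chunks: list[str], max_chars: int = 2000) -> str:
    # Map: compress each chunk independently, so no single LLM call
    # exceeds the model's context window.
    partials = [llm_summarize(c) for c in chunks]
    combined = "\n".join(partials)
    # Reduce: if the joined partial summaries are still too long,
    # re-group them into small batches and recurse.
    if len(combined) > max_chars and len(partials) > 1:
        batches = [partials[i:i + 4] for i in range(0, len(partials), 4)]
        return map_reduce_summarize(["\n".join(b) for b in batches], max_chars)
    # Base case: a single final pass produces the chronological report.
    return llm_summarize(combined)
```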
Installation and Deployment
Prerequisites
- Python: 3.11 or higher
- CUDA: 12.1 or higher (Required for hardware acceleration)
- Ollama: Must be installed and running for local LLM/VLM execution.
- Node.js: v18+ (Required for the Next.js Web Portal)
Installation Steps
- Install Core Dependencies

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

- Install VidChain

```bash
git clone https://github.com/rahulsiiitm/videochain-python
cd videochain-python
pip install -e .
```

- Pull Required Neural Weights

```bash
ollama pull moondream   # Vision Language Model for scene semantics
ollama pull llama3      # Large Language Model for reasoning and routing
```
Hardware-Agnostic Engine
VidChain automatically audits hardware during initialization:
- CUDA Available: Activates high-fidelity GPU pipelines for real-time analysis.
- CPU Fallback: Gracefully degrades to CPU mode with zero code modifications required.
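In practice, this audit reduces to a standard PyTorch device probe; a minimal sketch of the pattern (VidChain's actual initialization code may differ):

```python
import torch

def select_device() -> torch.device:
    # Prefer CUDA when a compatible GPU and driver are present;
    # otherwise fall back to CPU with no code changes required.
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = select_device()
print(f"VidChain pipelines will run on: {device}")
```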
Command Line Interface (CLI)
VidChain provides several CLI tools for operational flexibility.
1. vidchain-serve
Launches the FastAPI backend and hosts the Next.js React frontend dashboard.
```bash
vidchain-serve
```

- Hosts the REST API at `http://localhost:8000`.
- Automatically opens the Spider-Net web dashboard in the default browser at `http://localhost:3000`.
- Implements a 7-second "Neural Warmup" to stabilize models before accepting requests.
2. vidchain-analyze
Executes headless video ingestion and analysis directly from the terminal.
```bash
vidchain-analyze path/to/video.mp4 --vlm moondream
```

- `--fast`: Bypasses the VLM and uses YOLO for high-speed object detection (ideal for long-form CCTV).
- `--emotion`: Injects the DeepFace emotion analysis node.
- `--action`: Injects the MobileNetV3 situational action node.
Changing the Neural Engines (LLM/VLM)
VidChain uses LiteLLM under the hood, meaning you can hot-swap the underlying AI models directly from the command line depending on your local hardware capabilities:
- Change Reasoning Engine (LLM):

```bash
vidchain-analyze path/to/video.mp4 --llm "ollama/llama3"
vidchain-analyze path/to/video.mp4 --llm "gemini/gemini-2.5-flash"
```

  (Default is `gemini/gemini-2.5-flash`. Note: to use Gemini or other cloud models, you must export your API key as an environment variable, e.g., `export GEMINI_API_KEY="your_api_key"`.)

- Change Vision Engine (VLM):

```bash
vidchain-analyze path/to/video.mp4 --vlm "llava:7b"
```

  (Default is `moondream` via Ollama.)
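Because model selection is delegated to LiteLLM, the `provider/model` strings above follow LiteLLM's standard routing convention. A minimal standalone illustration of that convention, independent of VidChain:

```python
from litellm import completion

# LiteLLM routes on the "provider/model" prefix: "ollama/..." targets a
# local Ollama server, while "gemini/..." targets Google's hosted API
# (which requires GEMINI_API_KEY in the environment).
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Summarize the last scene."}],
)
print(response.choices[0].message.content)
```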
Developer SDK: The Modular Sensor Matrix
VidChain utilizes a LangChain-inspired composable architecture. Developers can assemble custom pipelines by chaining specific sensory nodes.
Core Sensory Nodes
| Node Class | Modality | Primary Application |
|---|---|---|
| `AdaptiveKeyframeNode` | Logic | Gaussian-differential sampling to drop redundant frames and reduce compute load. |
| `LlavaNode` | Visual | High-fidelity scene semantics, descriptive captions, and situational context. |
| `YoloNode` | Visual | High-speed, discrete object detection (fallback for `LlavaNode`). |
| `WhisperNode` | Audio | Speech-to-text transcription and acoustic anomaly detection (e.g., shouts). |
| `OcrNode` | Text | Digital trace extraction (license plates, computer screens, documents). |
| `TrackerNode` | Motion | Persistent object tracking (IoU) and camera motion estimation (Optical Flow); see the IoU sketch below the table. |
| `EmotionNode` | Behavioral | Facial sentiment analysis (requires visible faces). |
| `ActionNode` | Behavioral | High-speed classification of human activities. |
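For readers unfamiliar with IoU-based association, the core of persistent tracking is a box-overlap score compared against a threshold. A minimal, self-contained sketch of the idea (not `TrackerNode`'s actual implementation):

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def assign_track(detection: tuple, tracks: dict, threshold: float = 0.5) -> int:
    # Associate the new detection with the best-overlapping existing
    # track; if nothing clears the threshold, open a new track ID.
    best_id, best_score = None, threshold
    for track_id, last_box in tracks.items():
        score = iou(detection, last_box)
        if score > best_score:
            best_id, best_score = track_id, score
    if best_id is None:
        best_id = max(tracks, default=0) + 1
    tracks[best_id] = detection
    return best_id
```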
Basic Usage (Default Pipeline)
The simplest way to integrate VidChain into an existing Python application is to use the default high-fidelity VLM pipeline.
```python
from vidchain import VidChain

# 1. Initialize the IRIS Intelligence Vault
vc = VidChain(db_path="./forensic_vault")

# 2. Ingest Video (automatically uses AdaptiveKeyframe, Llava, Whisper, etc.)
video_id = vc.ingest(video_source="interview_01.mp4")

# 3. Query the Engine
response = vc.ask("What is the main topic of discussion?", video_id=video_id)
print(response["text"])

# 4. Generate an Executive Summary
summary = vc.summarize_video(video_id=video_id, mode="concise")
print(summary)
```
SDK Example: Custom Forensic Pipeline
```python
from vidchain import VidChain
from vidchain.pipeline import VideoChain
from vidchain.nodes import (
    AdaptiveKeyframeNode,
    LlavaNode,
    OcrNode,
    TrackerNode,
)

# 1. Initialize the Orchestrator
vc = VidChain(db_path="./forensic_vault")

# 2. Assemble a High-Sensitivity Custom Chain
surveillance_chain = VideoChain(nodes=[
    AdaptiveKeyframeNode(change_threshold=1.5),  # High sensitivity for subtle movements
    LlavaNode(model="moondream"),                # Scene semantics
    OcrNode(),                                   # Digital trace extraction
    TrackerNode(),                               # Spatio-temporal motion mapping
])

# 3. Execute Ingestion
video_id = vc.ingest(
    video_source="gate_camera_04.mp4",
    chain=surveillance_chain,
)

# 4. Perform an Agentic Query
query = "Were there any vehicles with visible license plates after 14:00?"
response = vc.ask(query, video_id=video_id)
print(response)
```
REST API Reference
When running `vidchain-serve`, the system exposes a FastAPI backend for external integrations.
- `GET /api/health`: Returns system status and the list of ingested video IDs.
- `POST /api/sessions`: Creates a new isolated neural session.
- `POST /api/ingest`: Accepts a video file path and initializes background processing.
- `POST /api/query`: Submits a natural language query against a specific session, triggering the Agentic Router.
- `GET /api/media-stream`: Serves absolute local video paths securely for frontend playback.
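A minimal client round-trip might look like the following. The JSON field names (`path`, `session_id`, `query`) are illustrative assumptions, as the request schemas are not documented here:

```python
import requests

BASE = "http://localhost:8000"

# Confirm the engine is up before issuing work.
print(requests.get(f"{BASE}/api/health").json())

# Create an isolated session, ingest a video, then query it.
# (Payload field names below are assumptions for illustration.)
session = requests.post(f"{BASE}/api/sessions").json()
requests.post(f"{BASE}/api/ingest",
              json={"path": "interview_01.mp4",
                    "session_id": session.get("session_id")})
answer = requests.post(f"{BASE}/api/query",
                       json={"query": "Who entered after 14:00?",
                             "session_id": session.get("session_id")})
print(answer.json())
```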
Architectural Details
Isolated GraphRAG Intelligence
Each ingested video generates a dedicated, persistent Temporal Knowledge Graph (.pkl). The RAG engine retrieves semantically relevant chunks from ChromaDB and fuses them with structured factual data (co-occurrences, tracking IDs, timestamps) from the graph. Memory boundaries are strictly enforced to prevent cross-video hallucinations.
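Conceptually, answering a query is a two-source retrieval: semantically relevant chunks from ChromaDB, plus structured facts from the per-video graph. A simplified sketch of that fusion, assuming an illustrative collection name, `.pkl` path, and graph layout (none of which are documented here):

```python
import pickle
import chromadb

client = chromadb.PersistentClient(path="./forensic_vault")
collection = client.get_or_create_collection("video_interview_01")

# 1. Semantic retrieval: top-k transcript/caption chunks.
hits = collection.query(
    query_texts=["vehicles with visible license plates after 14:00"],
    n_results=5,
)

# 2. Structured retrieval: facts from this video's dedicated graph.
#    (The file path and graph schema are illustrative assumptions.)
with open("./forensic_vault/interview_01_graph.pkl", "rb") as f:
    graph = pickle.load(f)

# 3. Fuse both sources into one grounded prompt context: append
#    co-occurrences, tracking IDs, and timestamps from `graph`
#    to the retrieved chunks before handing them to the LLM.
context = "\n".join(hits["documents"][0])
```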
The Neural Lens
The system automatically pairs textual answers with Base64-encoded visual snapshots extracted directly from the referenced timestamp. This provides immediate, undeniable visual proof for any AI-generated claim.
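Extracting such a snapshot is straightforward with OpenCV; a minimal sketch of the seek-and-encode step (not necessarily VidChain's internal code):

```python
import base64
import cv2

def snapshot_b64(video_path: str, timestamp_s: float) -> str:
    """Grab the frame at `timestamp_s` and return it as a Base64 JPEG."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, timestamp_s * 1000)  # seek by time
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise ValueError(f"No frame at {timestamp_s}s in {video_path}")
    ok, jpeg = cv2.imencode(".jpg", frame)
    return base64.b64encode(jpeg.tobytes()).decode("ascii")
```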
Author: Rahul Sharma — IIIT Manipur
License: MIT
Status: Production / v1.0.0-Stable