Edge-optimized multimodal RAG framework for video understanding

VidChain: The "LangChain for Videos"

v0.8.0-Stable — Edge-optimized, local-first multimodal RAG framework for forensic video intelligence. Compose modular sensory nodes into custom pipelines, deploy as a microservice, or query via the Spider-Net Intelligence Portal.


Advanced Forensic Architecture

VidChain v0.8.0-Stable is powered by the B.A.B.U.R.A.O. Engine (Behavioral Analysis & Broadcasting Unit for Real-time Artificial Observation). It uses a modular "Nodes & Chains" framework to turn raw video frames into serialized, searchable forensic logs.

graph TD
    %% --- Ingestion Stage ---
    subgraph "1. Ingestion & Optimization Layer"
        VS[Video Source] --> AK[Adaptive Gaussian Filter]
        AK -- "Delta > Threshold" --> PK[Promote to Keyframe]
        AK -- "Redundant" --> DROP{{GPU Compute Firewall}}
    end

    %% --- Inference Stage ---
    subgraph "2. Sensory Node Matrix (Late Fusion)"
        PK --> VLM[LlavaNode: Scene Semantics]
        PK --> ASR[WhisperNode: Audio Trace]
        PK --> OCR[OcrNode: Digital Trace]
        PK --> TRK[TrackerNode: Motion Flow]
        
        %% Optional Sensors
        PK -.-> ACT[ActionNode: Situational Verbs]
        PK -.-> EMT[EmotionNode: Sentiment]
    end

    %% --- Intelligence Logic ---
    subgraph "3. B.A.B.U.R.A.O. Cognitive Engine"
        VLM & ASR & OCR & TRK & ACT & EMT --> FUSE[Semantic Fusion Pipeline]
        FUSE --> RDN[Recursive Map-Reduce Summarizer]
    end

    %% --- Persistence ---
    subgraph "4. Forensic Memory Vault"
        FUSE --> KV[(ChromaDB Vector Store)]
        FUSE --> KG[[Temporal Knowledge Graph]]
    end

    %% --- Interaction Stage ---
    subgraph "5. Spider-Net Intelligence Portal"
        USER[User Query] --> IR{Intent Router}
        IR -- "Forensic Search" --> RAG[RAG Retrieval Loop]
        IR -- "Executive Overview" --> RDN
        RAG <--> KV
        RAG <--> KG
        RDN --> REPORT([Intelligence Report])
        RAG --> DISCOVERY([Discovery Hub])
    end

    %% --- Hardware Loop ---
    HM[NVML Hardware Monitor] -.-> AK
    HM -.-> VLM
    HM -.-> DISCOVERY

    style VS fill:#1e1e2e,stroke:#74c7ec,stroke-width:2px;
    style DISCOVERY fill:#11111b,stroke:#a6e3a1,stroke-width:3px;
    style REPORT fill:#11111b,stroke:#a6e3a1,stroke-width:3px;
    style DROP fill:#313244,stroke-dasharray: 5 5;
    style AK fill:#1e1e2e,stroke:#fab387;
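
The Intent Router in stage 5 of the diagram sends a query either to the RAG retrieval loop ("Forensic Search") or to the summarizer ("Executive Overview"). The sketch below shows one way such a router could work. The keyword heuristic, the `Intent` enum, and the `route` function are illustrative assumptions, not VidChain's actual routing logic.

```python
from enum import Enum


class Intent(Enum):
    """The two routes shown in the architecture diagram."""
    FORENSIC_SEARCH = "forensic_search"
    EXECUTIVE_OVERVIEW = "executive_overview"


# Hypothetical cue words; a real router might use an LLM or a classifier instead.
OVERVIEW_CUES = ("summarize", "summary", "overview", "recap")


def route(query: str) -> Intent:
    """Pick a route for the query using a simple keyword check."""
    lowered = query.lower()
    if any(cue in lowered for cue in OVERVIEW_CUES):
        return Intent.EXECUTIVE_OVERVIEW
    # Anything that does not ask for an overview is treated as a targeted search.
    return Intent.FORENSIC_SEARCH


print(route("Give me a summary of the video"))
print(route("When does the laptop appear?"))
```

Defaulting to the search route keeps specific questions on the retrieval path, where answers can be tied back to timestamps.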

Key Features (v0.8.0 Evolution)

Composable Sensory Chains

Snap together modular nodes to build custom forensic pipelines. The system is hardware-aware: it scales its inference depth based on live GPU/VRAM telemetry.

  • Adaptive Keyframe Firewall: Gaussian-blur differential filtering drops frames that barely differ from the last keyframe, saving a reported 70% of GPU compute in static scenes.
  • VLM-First Captions: Replaces bare object tags with dense semantic descriptions (for example, "Subject is hiding a silver object in their left pocket").
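
To make the keyframe firewall concrete, here is a minimal pure-Python sketch of Gaussian-blur differential filtering. It is not the `AdaptiveKeyframeNode` implementation: the 3x3 kernel, the mean-absolute-difference metric, and the helper names are assumptions chosen for illustration. Only the `change_threshold` parameter name comes from the recipes in this README.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale image: rows of pixel intensities (0-255)

# 3x3 Gaussian approximation; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur(frame: Frame) -> Frame:
    """Blur a frame with the 3x3 kernel, clamping coordinates at the edges."""
    height, width = len(frame), len(frame[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += KERNEL[dy + 1][dx + 1] * frame[yy][xx]
            row.append(acc / 16.0)
        blurred.append(row)
    return blurred


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def select_keyframes(frames: Sequence[Frame], change_threshold: float = 5.0) -> List[int]:
    """Return indices of frames whose blurred delta from the last keyframe exceeds the threshold."""
    keyframes: List[int] = []
    reference = None  # blurred copy of the most recent keyframe
    for index, frame in enumerate(frames):
        blurred = gaussian_blur(frame)
        if reference is None or mean_abs_diff(blurred, reference) > change_threshold:
            keyframes.append(index)
            reference = blurred
    return keyframes


static = [[0.0] * 4 for _ in range(4)]
bright = [[100.0] * 4 for _ in range(4)]
print(select_keyframes([static, static, static, bright, bright]))  # → [0, 3]
```

Blurring before differencing suppresses sensor noise, so small pixel flicker in a static scene does not promote redundant frames to the expensive VLM stage.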

Spider-Net Intelligence Portal

A professional-grade forensic command center served natively via vidchain-serve.

  • Evidence Vault: Surgical frame-by-frame seeking with 33 ms precision (one frame at 30 FPS).
  • Neural HUD: Real-time visualization of sensor activity and hardware stress.
  • Semantic Heatmap: Intelligence density mapping across the video timeline.
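
Two of the portal features above reduce to simple timeline arithmetic, sketched below. The function names are hypothetical and the code is not taken from the portal: `frame_index_to_ms` shows where the 33 ms step comes from at 30 FPS, and `density_heatmap` shows one plausible way to bin detected events along the timeline.

```python
from typing import List, Sequence


def frame_index_to_ms(frame_index: int, fps: float = 30.0) -> float:
    """Timestamp in milliseconds of a frame; one step at 30 FPS is about 33.3 ms."""
    return 1000.0 * frame_index / fps


def density_heatmap(event_times: Sequence[float], duration: float, bins: int = 10) -> List[int]:
    """Count events (timestamps in seconds) per equal-width slice of the timeline."""
    if duration <= 0 or bins <= 0:
        raise ValueError("duration and bins must be positive")
    counts = [0] * bins
    for t in event_times:
        if 0 <= t <= duration:
            # An event exactly at the end of the video falls into the last bin.
            index = min(int(t / duration * bins), bins - 1)
            counts[index] += 1
    return counts


print(round(frame_index_to_ms(1), 1))                              # → 33.3
print(density_heatmap([0.5, 1.0, 1.5, 9.9, 10.0], 10.0, bins=5))   # → [3, 0, 0, 0, 2]
```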

Automated Intelligence Reporting

The built-in Recursive Map-Reduce engine automatically iterates over forensic logs to generate high-fidelity executive summaries, complete with verified timestamps and entity relationship discovery.
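
The general shape of a recursive map-reduce summarizer can be sketched as follows. This is an assumption about the technique, not VidChain's engine: `recursive_summarize`, the `fan_in` parameter, and the stand-in `summarize` callable are hypothetical, and a real pipeline would call an LLM where the stub merely joins strings.

```python
from typing import Callable, List, Sequence


def recursive_summarize(
    entries: Sequence[str],
    summarize: Callable[[Sequence[str]], str],
    fan_in: int = 4,
) -> str:
    """Summarize groups of `fan_in` entries, then summarize the summaries, until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not entries:
        return ""
    if len(entries) <= fan_in:
        # Reduce step: few enough entries to summarize in a single call.
        return summarize(entries)
    # Map step: summarize each group independently.
    partials: List[str] = [
        summarize(entries[i:i + fan_in]) for i in range(0, len(entries), fan_in)
    ]
    return recursive_summarize(partials, summarize, fan_in)


def stub(group: Sequence[str]) -> str:
    """Stand-in for an LLM call; it only shows how entries get grouped."""
    return "(" + "+".join(group) + ")"


print(recursive_summarize(["a", "b", "c", "d", "e"], stub, fan_in=2))
# → (((a+b)+(c+d))+((e)))
```

Bounding each call to `fan_in` entries keeps every prompt within a small local model's context window, at the cost of extra summarization passes on long videos.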


Installation

# Core installation
pip install VidChain

# Setup local AI backends (Ollama)
ollama pull moondream   # Optimized Edge VLM (1.7GB)
ollama pull llama3      # Local Reasoning Hub (4.7GB)

# Verify Hardware Readiness (Bundled utility)
python -m vidchain.scripts.check_gpu

Developer API Recipes (Python)

VidChain is designed to be deeply extensible. Here are the core "Intelligence Recipes" for v0.8.0-Stable.

1. High-Fidelity Forensic Scan (Default)

Best for evidence reconstruction where detail matters more than speed.

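from vidchain import VidChain, VideoChain
from vidchain.nodes import AdaptiveKeyframeNode, LlavaNode, WhisperNode, OcrNode

# Build the chain: the keyframe filter runs first, then the sensory nodes
chain = VideoChain(nodes=[
    AdaptiveKeyframeNode(change_threshold=5.0),  # Promote frames whose delta exceeds the threshold
    LlavaNode(model_name="moondream"),           # Scene captions from the local VLM
    WhisperNode(),                               # Speech-to-text
    OcrNode(),                                   # On-screen text
])

# Ingest the video through the chain, then summarize the result
vc = VidChain()
vid = vc.ingest("evidence.mp4", chain=chain)
print(vc.summarize_video(vid))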

2. "CCTV Ultra-Fast" Scan (Low Latency)

Prioritize object detection speed over descriptive captioning.

from vidchain import VideoChain
from vidchain.nodes import YoloNode, TrackerNode

# Swap the VLM for a fast YOLOv8 detector plus tracker
fast_chain = VideoChain(nodes=[
    YoloNode(confidence=0.5),  # Ultra-fast detection
    TrackerNode(),             # Subject persistence
], frame_skip=30)              # Roughly 1 FPS on 30 FPS footage, for a large speedup

# `vc` is the VidChain instance created in recipe 1
vc.ingest("cctv_feed.mp4", chain=fast_chain)

3. Behavioral Sentiment Investigation

Combine Kinetic and Emotional sensors for psychological profiling.

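from vidchain import VideoChain
from vidchain.nodes import EmotionNode, ActionNode, LlavaNode

profile_chain = VideoChain(nodes=[
    ActionNode(),   # Situational "verbs"
    EmotionNode(),  # Facial sentiment
    LlavaNode(),    # Visual context
])

# `vc` is the VidChain instance created in recipe 1
vc.ingest("interview.mp4", chain=profile_chain)
print(vc.ask("Does the subject appear agitated when talking about the incident?"))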
print(vc.ask("Does the subject appear agitated when talking about the incident?"))

4. Direct Knowledge Graph Inquiry (No LLM)

Query entities directly from the Temporal Knowledge Graph without using LLM tokens.

# Access the structured GraphRAG facts directly
graph_facts = vc.graph_query("Laptop")
print(f"Appearances: {graph_facts['timestamps']}")
print(f"Co-occurrences: {graph_facts['entities_seen_together']}")
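
For readers curious how such a lookup can avoid the LLM entirely, here is a toy in-memory temporal graph. It is a sketch under assumptions, not VidChain's storage layer: the `TemporalGraph` class and its `add_observation` method are hypothetical. Only the result keys `timestamps` and `entities_seen_together` mirror the recipe above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class TemporalGraph:
    """Toy index of when each entity was seen and which entities shared a frame with it."""

    def __init__(self) -> None:
        self._timestamps: Dict[str, List[float]] = defaultdict(list)
        self._together: Dict[str, Set[str]] = defaultdict(set)

    def add_observation(self, timestamp: float, entities: Iterable[str]) -> None:
        """Record that all `entities` were visible at `timestamp` (in seconds)."""
        unique = sorted(set(entities))
        for entity in unique:
            self._timestamps[entity].append(timestamp)
            self._together[entity].update(o for o in unique if o != entity)

    def query(self, entity: str) -> Dict[str, list]:
        """Plain dictionary lookup: no model call, no tokens."""
        return {
            "timestamps": sorted(self._timestamps.get(entity, [])),
            "entities_seen_together": sorted(self._together.get(entity, set())),
        }


graph = TemporalGraph()
graph.add_observation(12.0, ["Laptop", "Person"])
graph.add_observation(47.5, ["Laptop", "Backpack"])
graph.add_observation(60.0, ["Person"])
print(graph.query("Laptop"))
# → {'timestamps': [12.0, 47.5], 'entities_seen_together': ['Backpack', 'Person']}
```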

Forensic CLI Mastery

Command | Mode | Intelligence Depth
python -m vidchain.cli report.mp4 | VLM-Standard | Adaptive Keyframes + VLM + Summary.
python -m vidchain.cli report.mp4 --fast | YOLO-Scan | High-speed object detection for long CCTV.
python -m vidchain.cli report.mp4 --emotion | Behavioral | Injects EmotionNode for sentiment analysis.
python -m vidchain.cli report.mp4 --query "..." | Direct | Instant query without interactive chat.

Sensory Node Suite (The Matrix)

Node | Type | Purpose
LlavaNode | VLM | Dense contextual scene captioning (Moondream/LLaVA).
WhisperNode | Audio | Time-aligned speech-to-text forensics.
OcrNode | Text | Screen text and digital trace extraction.
TrackerNode | Motion | Optical flow subject tracking & persistence.
ActionNode | Verb | Situational classification (Emergency, Violation).
EmotionNode | Sentiment | Behavioral sentiment analysis (DeepFace).
YoloNode | Fast-Detect | Ultra-fast object detection (fallback for VLM).

Research Position & Uniqueness

VidChain treats video as serialized sensor logs and performs retrieval over structured multimodal narratives rather than raw pixel tokens. According to the project, this significantly reduces hallucinations and enables multi-video GraphRAG reasoning.
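
As an illustration of the "serialized sensor log" idea, the sketch below merges per-node observations into time-ordered text lines and then searches those lines instead of pixels. Everything here is a simplified assumption: the `Observation` type, the line format, and the keyword scoring are hypothetical, and the keyword match merely stands in for the ChromaDB vector search shown in the architecture diagram.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Observation:
    timestamp: float  # seconds from the start of the video
    node: str         # which sensory node produced the text
    text: str


def serialize(observations: Iterable[Observation]) -> List[str]:
    """Late fusion: merge all node outputs into one time-ordered text log."""
    ordered = sorted(observations, key=lambda o: (o.timestamp, o.node))
    return [f"[{o.timestamp:07.2f}s] {o.node}: {o.text}" for o in ordered]


def keyword_search(log_lines: List[str], query: str) -> List[str]:
    """Rank log lines by how many query terms they contain; drop non-matches."""
    terms = query.lower().split()
    scored = [(sum(term in line.lower() for term in terms), line) for line in log_lines]
    # sorted() is stable, so equally scored lines stay in time order.
    return [line for score, line in sorted(scored, key=lambda pair: -pair[0]) if score > 0]


log = serialize([
    Observation(3.5, "WhisperNode", "where is the laptop"),
    Observation(1.0, "LlavaNode", "a person enters the room"),
    Observation(3.5, "OcrNode", "EXIT"),
])
print(log)
print(keyword_search(log, "laptop"))
```

Because every retrieved line carries its own timestamp and source node, an answer built from these lines can be traced back to a specific moment in the footage.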

See RESEARCH_COMPARISON.md for detailed SOTA benchmarks.


📜 Changelog (The v0.8.0 Milestone)

  • v0.8.0: The Modular Revolution. Deprecated monolithic processors in favor of a fully composable Node framework. Added internal hardware diagnostics, automatic reporting, and fresh Next.js UI bundling.
  • v0.7.2: Integrated the Spider-Net Portal as a native microservice. Added Neural HUD and Evidence Vault.
  • v0.6.0: Introduced GraphRAG and Temporal Knowledge Graphs for entity tracking.

Author

Rahul Sharma — IIIT Manipur
SEM Project Version 0.8.0-Stable
