Edge-optimized multimodal RAG framework for video understanding
VidChain: The "LangChain for Videos"
v0.8.0-Stable — Edge-optimized, local-first multimodal RAG framework for forensic video intelligence. Compose modular sensory nodes into custom pipelines, deploy as a microservice, or query via the Spider-Net Intelligence Portal.
Advanced Forensic Architecture
VidChain v0.8.0-Stable is powered by the B.A.B.U.R.A.O. Engine (Behavioral Analysis & Broadcasting Unit for Real-time Artificial Observation). It utilizes a modular "Nodes & Chains" framework to transform raw pixels into serialized forensic intelligence.
graph TD
%% --- Ingestion Stage ---
subgraph "1. Ingestion & Optimization Layer"
VS[Video Source] --> AK[Adaptive Gaussian Filter]
AK -- "Delta > Threshold" --> PK[Promote to Keyframe]
AK -- "Redundant" --> DROP{{GPU Compute Firewall}}
end
%% --- Inference Stage ---
subgraph "2. Sensory Node Matrix (Late Fusion)"
PK --> VLM[LlavaNode: Scene Semantics]
PK --> ASR[WhisperNode: Audio Trace]
PK --> OCR[OcrNode: Digital Trace]
PK --> TRK[TrackerNode: Motion Flow]
%% Optional Sensors
PK -.-> ACT[ActionNode: Situational Verbs]
PK -.-> EMT[EmotionNode: Sentiment]
end
%% --- Intelligence Logic ---
subgraph "3. B.A.B.U.R.A.O. Cognitive Engine"
VLM & ASR & OCR & TRK & ACT & EMT --> FUSE[Semantic Fusion Pipeline]
FUSE --> RDN[Recursive Map-Reduce Summarizer]
end
%% --- Persistence ---
subgraph "4. Forensic Memory Vault"
FUSE --> KV[(ChromaDB Vector Store)]
FUSE --> KG[[Temporal Knowledge Graph]]
end
%% --- Interaction Stage ---
subgraph "5. Spider-Net Intelligence Portal"
USER[User Query] --> IR{Intent Router}
IR -- "Forensic Search" --> RAG[RAG Retrieval Loop]
IR -- "Executive Overview" --> RDN
RAG <--> KV
RAG <--> KG
RDN --> REPORT([Intelligence Report])
RAG --> DISCOVERY([Discovery Hub])
end
%% --- Hardware Loop ---
HM[NVML Hardware Monitor] -.-> AK
HM -.-> VLM
HM -.-> DISCOVERY
style VS fill:#1e1e2e,stroke:#74c7ec,stroke-width:2px;
style DISCOVERY fill:#11111b,stroke:#a6e3a1,stroke-width:3px;
style REPORT fill:#11111b,stroke:#a6e3a1,stroke-width:3px;
style DROP fill:#313244,stroke-dasharray: 5 5;
style AK fill:#1e1e2e,stroke:#fab387;
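The Intent Router in stage 5 of the diagram splits queries between the RAG retrieval loop and the summarizer. The sketch below shows one very simple way such a split could work, using keyword matching. The function name `route_intent`, the hint list, and the returned labels are illustrative assumptions, not VidChain's actual routing logic.

```python
# Hypothetical keyword hints; a real router might use an LLM or a classifier.
OVERVIEW_HINTS = ("summary", "summarize", "overview", "report")


def route_intent(query: str) -> str:
    """Return 'executive_overview' for summary-style queries, else 'forensic_search'."""
    lowered = query.lower()
    if any(hint in lowered for hint in OVERVIEW_HINTS):
        return "executive_overview"  # would go to the map-reduce summarizer
    return "forensic_search"         # would go to the RAG retrieval loop


print(route_intent("Give me an overview of the video"))
print(route_intent("When does the laptop appear?"))
```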
Key Features (v0.8.0 Evolution)
Composable Sensory Chains
Snap together modular nodes to build custom forensic pipelines. The system is hardware-aware: it scales its inference depth based on live GPU/VRAM telemetry.
- Adaptive Keyframe Firewall: Gaussian-blur differential filtering drops near-identical frames before inference, saving a reported 70% of GPU compute in static scenes.
- VLM-First Captions: Replaces blind tags with dense semantic descriptions ("Subject is hiding a silver object in their left pocket").
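To make the keyframe filtering idea concrete, here is a minimal pure-Python sketch of Gaussian-blur differential filtering. It is not VidChain's implementation: the function names, the choice to compare each frame against the last kept keyframe, and the use of mean absolute difference as the delta are all assumptions made for illustration. Only the `change_threshold` name and its `5.0` value are borrowed from the recipes in this README.

```python
import math
from typing import List, Optional

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities (0-255)


def gaussian_kernel(radius: int = 1, sigma: float = 1.0) -> List[float]:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(frame: Frame, kernel: List[float]) -> Frame:
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in frame:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + j - radius, 0), n - 1)] for j, k in enumerate(kernel))
            for x in range(n)
        ])
    return out


def gaussian_blur(frame: Frame, radius: int = 1, sigma: float = 1.0) -> Frame:
    """Separable blur: blur rows, transpose, blur rows again, transpose back."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(frame, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]


def mean_abs_delta(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def select_keyframes(frames: List[Frame], change_threshold: float = 5.0) -> List[int]:
    """Return indices of frames whose blurred delta exceeds the threshold."""
    keyframes: List[int] = []
    last_kept: Optional[Frame] = None
    for index, frame in enumerate(frames):
        blurred = gaussian_blur(frame)
        if last_kept is None or mean_abs_delta(blurred, last_kept) > change_threshold:
            keyframes.append(index)  # promote to keyframe
            last_kept = blurred
        # otherwise the frame is treated as redundant and skipped
    return keyframes


static = [[10.0] * 8 for _ in range(8)]
noisy = [[10.0 + ((x + y) % 2) for x in range(8)] for y in range(8)]  # small sensor noise
changed = [[200.0] * 8 for _ in range(8)]
print(select_keyframes([static, noisy, static, changed]))  # [0, 3]
```

Blurring before differencing suppresses pixel-level noise, so only the first frame and the genuinely different fourth frame are promoted in this toy input.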
Spider-Net Intelligence Portal
A professional-grade forensic command center served natively via vidchain-serve.
- Evidence Vault: Surgical frame-by-frame seeking with 33 ms precision (about one frame at 30 FPS).
- Neural HUD: Real-time visualization of sensor activity and hardware stress.
- Semantic Heatmap: Intelligence density mapping across the video timeline.
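The arithmetic behind two of these features can be sketched briefly. The 33 ms figure matches the duration of a single frame at 30 FPS, and a timeline heatmap can be built by counting detected events per time bucket. The functions below are illustrative assumptions about how such values might be computed and are not taken from the Spider-Net Portal's code.

```python
from typing import List


def frame_duration_ms(fps: float = 30.0) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def timestamp_to_frame(seconds: float, fps: float = 30.0) -> int:
    """Snap a timestamp to the nearest frame index."""
    return round(seconds * fps)


def density_heatmap(event_times: List[float], duration: float, buckets: int) -> List[int]:
    """Count events per equal-width time bucket across the video timeline."""
    counts = [0] * buckets
    width = duration / buckets
    for t in event_times:
        if 0 <= t <= duration:
            # Clamp so an event exactly at the end lands in the last bucket.
            counts[min(int(t / width), buckets - 1)] += 1
    return counts


print(round(frame_duration_ms(30.0), 1))  # 33.3
print(timestamp_to_frame(12.5))           # 375
print(density_heatmap([1.0, 2.5, 3.0, 41.0, 59.9], duration=60.0, buckets=6))
```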
Automated Intelligence Reporting
The built-in Recursive Map-Reduce engine automatically iterates over forensic logs to generate high-fidelity executive summaries, complete with verified timestamps and entity relationship discovery.
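A recursive map-reduce summarizer can be sketched as follows: split the log into chunks, summarize each chunk (map), then summarize the partial summaries again (reduce) until a single summary remains. This is a generic illustration of the technique, not VidChain's engine. `recursive_summarize`, `chunk_size`, and the `toy_summarize` stand-in (which only wraps its inputs so the call tree is visible) are hypothetical; in practice the `summarize` callable would invoke an LLM.

```python
from typing import Callable, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def recursive_summarize(
    entries: List[str],
    summarize: Callable[[List[str]], str],
    chunk_size: int = 4,
) -> str:
    """Reduce log entries to one summary by summarizing chunks level by level."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each level shrinks")
    if not entries:
        return ""
    if len(entries) <= chunk_size:
        return summarize(entries)  # base case: fits in a single call
    partials = [summarize(group) for group in chunk(entries, chunk_size)]  # map
    return recursive_summarize(partials, summarize, chunk_size)            # reduce


def toy_summarize(lines: List[str]) -> str:
    """Stand-in for an LLM call: wraps inputs so the recursion is visible."""
    return "S(" + ", ".join(lines) + ")"


log = [f"e{i}" for i in range(1, 8)]
print(recursive_summarize(log, toy_summarize, chunk_size=3))
# S(S(e1, e2, e3), S(e4, e5, e6), S(e7))
```

Chunking keeps every call within a bounded input size, which matters when the underlying model has a limited context window.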
Installation
# Core installation
pip install VidChain
# Setup local AI backends (Ollama)
ollama pull moondream # Optimized Edge VLM (1.7GB)
ollama pull llama3 # Local Reasoning Hub (4.7GB)
# Verify Hardware Readiness (Bundled utility)
python -m vidchain.scripts.check_gpu
Developer API Recipes (Python)
VidChain is designed to be deeply extensible. Here are the core "Intelligence Recipes" for v0.8.0-Stable.
1. High-Fidelity Forensic Scan (Default)
Best for evidence reconstruction where detail matters more than speed.
from vidchain import VidChain, VideoChain
from vidchain.nodes import AdaptiveKeyframeNode, LlavaNode, WhisperNode, OcrNode
# Build the chain
chain = VideoChain(nodes=[
    AdaptiveKeyframeNode(change_threshold=5.0),
    LlavaNode(model_name="moondream"),
    WhisperNode(),
    OcrNode()
])
vc = VidChain()
vid = vc.ingest("evidence.mp4", chain=chain)
print(vc.summarize_video(vid))
2. "CCTV Ultra-Fast" Scan (Low Latency)
Prioritize object detection speed over descriptive captioning.
from vidchain import VidChain, VideoChain
from vidchain.nodes import YoloNode, TrackerNode
# Swap the VLM for a fast YOLOv8 tracker
fast_chain = VideoChain(nodes=[
    YoloNode(confidence=0.5),  # Ultra-fast detection
    TrackerNode()              # Subject persistence
], frame_skip=30)  # Roughly 1 processed frame per second on 30 FPS footage
vc = VidChain()
vc.ingest("cctv_feed.mp4", chain=fast_chain)
3. Behavioral Sentiment Investigation
Combine Kinetic and Emotional sensors for psychological profiling.
from vidchain import VidChain, VideoChain
from vidchain.nodes import EmotionNode, ActionNode, LlavaNode
profile_chain = VideoChain(nodes=[
    ActionNode(),   # Situational "Verbs"
    EmotionNode(),  # Facial Sentiment
    LlavaNode()     # Visual Context
])
vc = VidChain()
vc.ingest("interview.mp4", chain=profile_chain)
print(vc.ask("Does the subject appear agitated when talking about the incident?"))
4. Direct Knowledge Graph Inquiry (No LLM)
Query entities directly from the Temporal Knowledge Graph without using LLM tokens.
# Access the structured GraphRAG facts directly
graph_facts = vc.graph_query("Laptop")
print(f"Appearances: {graph_facts['timestamps']}")
print(f"Co-occurrences: {graph_facts['entities_seen_together']}")
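To illustrate why such a lookup needs no LLM, here is a minimal sketch of a temporal entity graph that records when each entity was observed and which entities appeared alongside it. The `TemporalGraph` class and its methods are hypothetical and do not reflect VidChain's internal storage. Only the result keys `timestamps` and `entities_seen_together` mirror the recipe above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class TemporalGraph:
    """Toy temporal knowledge graph keyed by entity name (case-sensitive)."""

    def __init__(self) -> None:
        self._timestamps: Dict[str, List[float]] = defaultdict(list)
        self._neighbors: Dict[str, Set[str]] = defaultdict(set)

    def add_observation(self, timestamp: float, entities: Iterable[str]) -> None:
        """Record that all given entities were seen together at `timestamp`."""
        seen = set(entities)
        for entity in seen:
            self._timestamps[entity].append(timestamp)
            self._neighbors[entity].update(seen - {entity})

    def query(self, entity: str) -> Dict[str, list]:
        """Return appearance times and co-occurring entities via plain lookups."""
        return {
            "timestamps": sorted(self._timestamps.get(entity, [])),
            "entities_seen_together": sorted(self._neighbors.get(entity, set())),
        }


graph = TemporalGraph()
graph.add_observation(3.0, ["Laptop", "Person"])
graph.add_observation(7.5, ["Laptop", "Phone"])
graph.add_observation(9.0, ["Person"])

facts = graph.query("Laptop")
print(facts["timestamps"])              # [3.0, 7.5]
print(facts["entities_seen_together"])  # ['Person', 'Phone']
```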
Forensic CLI Mastery
| Command | Mode | Intelligence Depth |
|---|---|---|
| python -m vidchain.cli report.mp4 | VLM-Standard | Adaptive Keyframes + VLM + Summary. |
| python -m vidchain.cli report.mp4 --fast | YOLO-Scan | High-speed object detection for long CCTV. |
| python -m vidchain.cli report.mp4 --emotion | Behavioral | Injects EmotionNode for sentiment analysis. |
| python -m vidchain.cli report.mp4 --query "..." | Direct | Instant query without interactive chat. |
Sensory Node Suite (The Matrix)
| Node | Type | Purpose |
|---|---|---|
| LlavaNode | VLM | Dense Contextual Scene Captioning (Moondream/LLaVA). |
| WhisperNode | Audio | Time-aligned speech-to-text forensics. |
| OcrNode | Text | Screen text and digital trace extraction. |
| TrackerNode | Motion | Optical flow subject tracking & persistence. |
| ActionNode | Verb | Situational classification (Emergency, Violation). |
| EmotionNode | Sentiment | Behavioral sentiment analysis (DeepFace). |
| YoloNode | Fast-Detect | Ultra-fast object detection (Fallback for VLM). |
Research Position & Uniqueness
VidChain treats video as Serialized Sensor Logs, performing retrieval over structured multimodal narratives rather than raw pixel tokens. This significantly reduces hallucinations and enables multi-video GraphRAG reasoning.
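As a rough illustration of the "serialized sensor log" idea, the sketch below merges observations from several nodes into one time-ordered list of text records, which is the kind of structured narrative a text retriever could index. The `Observation` dataclass, the node labels, and the line format are assumptions for illustration. The actual log schema is not described in this README.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    timestamp: float  # seconds from the start of the video
    node: str         # which sensory node produced the text (hypothetical labels)
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def serialize(observations: List[Observation]) -> List[str]:
    """Late fusion: interleave all node outputs into one time-ordered text log."""
    ordered = sorted(observations, key=lambda o: (o.timestamp, o.node))
    return [f"[{format_timestamp(o.timestamp)}] {o.node}: {o.text}" for o in ordered]


log_lines = serialize([
    Observation(65.0, "ocr", "EXIT"),
    Observation(4.2, "vlm", "A person enters the room"),
    Observation(4.2, "asr", "hello"),
])
for line in log_lines:
    print(line)
# [00:04] asr: hello
# [00:04] vlm: A person enters the room
# [01:05] ocr: EXIT
```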
See RESEARCH_COMPARISON.md for detailed SOTA benchmarks.
📜 Changelog (The v0.8.0 Milestone)
- v0.8.0: The Modular Revolution. Deprecated monolithic processors in favor of a fully composable Node framework. Added internal hardware diagnostics, automatic reporting, and a freshly bundled Next.js UI.
- v0.7.2: Integrated the Spider-Net Portal as a native microservice. Added Neural HUD and Evidence Vault.
- v0.6.0: Introduced GraphRAG and Temporal Knowledge Graphs for entity tracking.
Author
Rahul Sharma — IIIT Manipur
SEM Project Version 0.8.0-Stable