VidChain: Video Intelligence RAG Framework
Edge-optimized multimodal RAG framework for video understanding — transforms raw footage into a structured, queryable knowledge base.
Overview
VidChain is a lightweight, modular framework that combines computer vision, OCR, speech recognition, emotion analysis, and LLM reasoning into a unified late-fusion pipeline. Designed to run on consumer-grade GPUs (tested on NVIDIA RTX 3050 4GB), it makes on-device video intelligence practical without cloud dependency.
At the heart is B.A.B.U.R.A.O. (Behavioral Analysis & Broadcasting Unit for Real-time Artificial Observation) — a conversational AI copilot that translates raw sensor logs into human-readable narratives using abductive reasoning.
Core Pipeline
Video → WAV Extraction → Whisper ASR → Frame Loop →
├── YOLO (Objects)
├── MobileNetV3 (Action)
├── EasyOCR (Screen Text)
├── DeepFace (Emotion, threaded)
└── TemporalTracker (Object Persistence + Camera Motion)
→ Semantic Fusion → ChromaDB → B.A.B.U.R.A.O. RAG
Key Capabilities
🧠 Dual-Brain Vision Engine
- YOLO (Nouns): Detects objects with bounding boxes — "1 person, 1 laptop"
- MobileNetV3 (Verbs): Classifies scene intent — NORMAL / SUSPICIOUS / VIOLENCE / EMERGENCY
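A minimal per-frame sketch of how the two engines could be paired; the checkpoint path, label order, and the describe_frame helper are illustrative, not VidChain internals:

# Dual-brain pass: YOLO names the nouns, MobileNetV3 names the verb.
from collections import Counter

import cv2
import torch
from torchvision import models, transforms
from ultralytics import YOLO

detector = YOLO("yolov8s.pt")                       # nouns: objects + boxes
classifier = models.mobilenet_v3_small(num_classes=4)
classifier.load_state_dict(torch.load("action_engine.pt", map_location="cpu"))  # assumed checkpoint
classifier.eval()
ACTIONS = ["NORMAL", "SUSPICIOUS", "VIOLENCE", "EMERGENCY"]

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def describe_frame(frame_bgr):
    # YOLO: count detected classes -> "1 person, 1 laptop"
    det = detector(frame_bgr, verbose=False)[0]
    counts = Counter(det.names[int(c)] for c in det.boxes.cls)
    objects = ", ".join(f"{n} {name}" for name, n in counts.items())
    # MobileNetV3: classify overall scene intent for the whole frame
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        action = ACTIONS[classifier(preprocess(rgb).unsqueeze(0)).argmax(1).item()]
    return objects, action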
🔤 Context-Aware OCR
EasyOCR runs only when YOLO detects readable surfaces (laptop, monitor, whiteboard) — saves compute while capturing ground-truth text.
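A sketch of that gating logic, assuming a plain EasyOCR reader; the trigger-class set and the maybe_ocr helper are illustrative:

# OCR runs only when the detector reports a surface likely to carry text.
import easyocr

READABLE = {"laptop", "tv", "book", "cell phone"}   # illustrative set of readable surfaces
reader = easyocr.Reader(["en"], gpu=False)          # CPU keeps VRAM free for YOLO

def maybe_ocr(frame_bgr, detected_classes):
    if READABLE.isdisjoint(detected_classes):
        return ""                                   # skip the expensive OCR pass
    results = reader.readtext(frame_bgr, detail=0)  # detail=0 -> plain strings
    return " ".join(results)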
😶 Threaded Emotion Analysis
DeepFace runs on CPU in a background thread so it never competes with YOLO/MobileNet for VRAM.
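A sketch of the worker-thread pattern this describes; the queue sizes, sentinel handling, and emotion_worker name are illustrative rather than VidChain's actual implementation:

# DeepFace stays on a CPU thread so the GPU pipeline never blocks on it.
import queue
import threading

from deepface import DeepFace

frames_in, emotions_out = queue.Queue(maxsize=4), queue.Queue()

def emotion_worker():
    while True:
        ts, frame = frames_in.get()
        if frame is None:                            # sentinel -> shut down
            break
        try:
            result = DeepFace.analyze(frame, actions=["emotion"],
                                      detector_backend="opencv",
                                      enforce_detection=False)
            emotions_out.put((ts, result[0]["dominant_emotion"]))
        except Exception:
            emotions_out.put((ts, "unknown"))

threading.Thread(target=emotion_worker, daemon=True).start()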
📡 Temporal Tracking
- Object Persistence: IoU tracker assigns persistent IDs across frames (person #1 present 12s, moving left)
- Camera Motion: Lucas-Kanade optical flow detects pan, tilt, zoom, static
- Scene Cut Detection: HSV histogram correlation resets trackers on hard cuts
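A sketch of two primitives behind this: IoU overlap for track continuity and HSV-histogram correlation for hard-cut detection. Thresholds and helper names are illustrative:

import cv2

def iou(a, b):
    # boxes as (x1, y1, x2, y2); high IoU means a detection continues an existing track
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def is_scene_cut(prev_bgr, curr_bgr, threshold=0.5):
    hists = []
    for img in (prev_bgr, curr_bgr):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        hists.append(cv2.normalize(h, h).flatten())
    correlation = cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)
    return correlation < threshold        # low correlation -> hard cut, reset trackers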
🗣️ B.A.B.U.R.A.O. RAG Engine
- BGE embedder (BAAI/bge-base-en-v1.5) for domain-specific retrieval
- Cross-encoder reranker for precision before LLM call
- Intent routing — distinguishes video search from conversational follow-ups
- Chat memory — maintains context across multi-turn conversations
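A sketch of the retrieve-then-rerank step using those two models; the collection name, k values, and retrieve helper are illustrative, not B.A.B.U.R.A.O.'s internal API:

# Wide recall from ChromaDB with BGE embeddings, then cross-encoder re-scoring
# so only the best few chunks reach the LLM prompt.
import chromadb
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
collection = chromadb.PersistentClient("./vidchain_storage").get_or_create_collection("timeline")

def retrieve(query, recall_k=20, final_k=5):
    query_vec = embedder.encode(query).tolist()
    hits = collection.query(query_embeddings=[query_vec], n_results=recall_k)
    docs = hits["documents"][0]
    scores = reranker.predict([(query, d) for d in docs])
    ranked = sorted(zip(scores, docs), reverse=True)
    return [d for _, d in ranked[:final_k]]           # context passed to the LLM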
Installation
pip install vidchain
# GPU-accelerated PyTorch (recommended)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
Run python scripts/check_gpu.py to verify CUDA is detected.
Quick Start
Python API (Library)
from vidchain import VidChain
# Initialize
vc = VidChain(config={
"llm_provider": "gemini/gemini-2.5-flash", # or "ollama/llama3" for offline
"db_path": "./vidchain_storage" # omit for in-memory (no persistence)
})
# Ingest a video
video_id = vc.ingest("surveillance.mp4")
# Query
print(vc.ask("what happened in the video?"))
print(vc.ask("was anyone acting suspiciously?"))
# Multi-video: scope query to a specific video
vc.ingest("cam1.mp4", video_id="cam1")
vc.ingest("cam2.mp4", video_id="cam2")
print(vc.ask("did anyone enter the room?", video_id="cam1"))
CLI
# Analyze and chat
vidchain-analyze video.mp4
# Single-shot query
vidchain-analyze video.mp4 --query "what happened at the desk?"
# Offline with Ollama
vidchain-analyze video.mp4 --llm ollama/llama3
# Multilingual OCR
vidchain-analyze video.mp4 --ocr-lang en fr
Train Custom Action Engine
# Place labeled images in data/train/<class>/
vidchain-train
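Under the hood this is standard transfer learning on the folder layout above. A rough, hypothetical equivalent with torchvision; the hyperparameters, epoch count, and output filename are illustrative, not what vidchain-train actually does:

# Swap the MobileNetV3 head for the action classes found in data/train/ and train briefly.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/train", transform=tf)
loader = DataLoader(train_set, batch_size=16, shuffle=True)

model = models.mobilenet_v3_small(weights="IMAGENET1K_V1")
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, len(train_set.classes))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "action_engine.pt")   # illustrative output path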
Knowledge Base Schema
Each fused timeline entry contains all modalities at that moment:
{
"time": 5.8,
"duration": 3.2,
"objects": "1 person, 1 laptop",
"action": "SUSPICIOUS",
"emotion": "visibly agitated",
"ocr": "ASUS Vivobook",
"audio": "I told you this would happen",
"camera": "static",
"tracking": ["person #1 (present 4.8s), moving left", "laptop #2 (present 5.8s)"],
"audio_anomaly": "NORMAL"
}
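Before indexing, an entry like this has to be flattened into a text chunk that the BGE embedder can encode. One plausible template; the exact wording VidChain uses may differ:

def entry_to_chunk(e):
    # Flatten one fused timeline entry into a retrievable sentence for ChromaDB.
    return (
        f"[{e['time']:.1f}s for {e['duration']:.1f}s] "
        f"Objects: {e['objects']}. Action: {e['action']}. "
        f"Emotion: {e['emotion']}. On-screen text: {e['ocr'] or 'none'}. "
        f"Audio: \"{e['audio']}\". Camera: {e['camera']}. "
        f"Tracking: {'; '.join(e['tracking'])}."
    )

# -> "[5.8s for 3.2s] Objects: 1 person, 1 laptop. Action: SUSPICIOUS. ..."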
Tech Stack
| Component | Technology |
|---|---|
| Object Detection | YOLOv8s (Ultralytics) |
| Action Classification | MobileNetV3 (custom fine-tuned) |
| Speech Recognition | OpenAI Whisper (base) |
| OCR | EasyOCR |
| Emotion Analysis | DeepFace (opencv backend) |
| Temporal Tracking | IoU tracker + Lucas-Kanade optical flow |
| Embedder | BAAI/bge-base-en-v1.5 |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Vector Store | ChromaDB (persistent) |
| LLM Routing | LiteLLM (gemini-2.5-flash default, Ollama supported) |
| Scene Understanding | CLIP (openai/clip-vit-base-patch32) |
| GPU Runtime | CUDA 12.1 (4GB+ VRAM, RTX 30-series tested) |
Developer Utilities
# List all indexed videos
vc.list_indexed_videos()
# Generate a narrative summary
vc.summarize_video(video_id, depth="concise") # or "detailed"
# Hot-swap LLM
vc.set_llm("ollama/llama3")
# Purge a specific video
vc.purge_storage(video_id="cam1")
# Purge everything
vc.purge_storage()
Roadmap
- CLIP scene understanding — zero-shot environment classification (v0.3.0)
- Adaptive audio filtering — energy gating, anomaly detection, segment merging (v0.3.0)
- Multi-video scoped queries — vc.ask(query, video_id="cam1") (v0.3.0)
- Graceful degradation — each engine can fail independently without halting the pipeline (v0.3.0)
- Real-time streaming — live camera ingestion with low-latency indexing
- Cross-video subject tracking — link the same person across multiple camera feeds
- Export to CSV — structured timeline export for downstream analysis
Contributing
Contributions, issues, and feature requests are welcome. Open a GitHub issue or submit a pull request.
Author
Rahul Sharma — B.Tech CSE, IIIT Manipur
License
Distributed under the MIT License.