
An Intelligent Open-Source Video Understanding System A different path from traditional Large Video Language Models (LVLMs). Built for modularity, openness, and real-world usability.

Project description


gUrrT (derived from the Surveilens research paper) is an optimized framework designed to bypass the heavy computational requirements of Large Video Language Models (LVLMs). Standard LVLMs often require high-end enterprise GPUs. gUrrT is engineered to deliver high-accuracy video understanding on consumer-grade hardware (e.g., 4 GB of VRAM) by decomposing video into its core sensory components.

The Philosophy: Pragmatic Decomposition

With gUrrT, the goal isn't to reinvent the wheel or solve the complex "temporal dimension" problem that plagues modern AI. Instead, the project explores a critical question: Can we achieve "video understanding" simply by treating a video as a searchable collection of moments?

By bypassing the temporal modeling used in expensive LVLMs, gUrrT enables you to "talk to a video" by transforming it into a structured, queryable index. It gets the job done without the hefty compute tax.

The "Temporal Dimension" of video is computationally expensive to process directly. gUrrT shifts the paradigm from Video Modeling to Contextual Retrieval:

  • Vision Models (The Eyes): Describe discrete scenes and frames.
  • Transcription Models (The Ears): Process audio via Faster-Whisper.
  • Advanced Sampling: Intelligently reduces the frame-load to only what is relevant.
  • RAG (The Brain): Compiles these sensory inputs into a vector-based context for a Large Language Model (LLM).
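The "RAG (The Brain)" step can be made concrete with a minimal sketch. It shows how retrieved vision captions and transcript segments might be merged into a single time-ordered context for the LLM. The `Moment` class, the `build_context` function, and the prompt wording are hypothetical illustrations and are not gUrrT's actual code.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    """One retrieved piece of evidence from the video (hypothetical structure)."""
    start: float  # seconds from the start of the video
    source: str   # "frame" for a vision caption, "audio" for a transcript segment
    text: str     # the caption or the transcribed speech


def build_context(moments: list[Moment], question: str) -> str:
    """Merge captions ("eyes") and transcripts ("ears") into one time-ordered prompt."""
    lines = []
    for m in sorted(moments, key=lambda m: m.start):
        minutes, seconds = divmod(int(m.start), 60)
        label = "SEEN" if m.source == "frame" else "HEARD"
        lines.append(f"[{minutes:02d}:{seconds:02d}] {label}: {m.text}")
    evidence = "\n".join(lines)
    return (
        "Answer the question using only the video evidence below.\n\n"
        f"{evidence}\n\nQuestion: {question}"
    )


moments = [
    Moment(75.0, "audio", "Let's start the demo."),
    Moment(12.4, "frame", "a man standing at a whiteboard"),
]
print(build_context(moments, "What happens first?"))
```

Sorting by timestamp gives the LLM a rough sense of order. This does not require any explicit temporal modeling.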

The Technical Pipeline

  1. Dual-Stage Frame Sampling:
      • Scene Detection: The primary method, segmenting video into distinct events. For each scene, the pipeline captures the start, middle, and end frames.
      • Uniform Sampling: Acts as a robust fallback if no distinct scene transitions are detected.
      • Note: SSIM (Structural Similarity Index) was tested but discarded to prioritize processing speed.
  2. Multimodal Embedding:
      • Visuals are embedded using CLIP, and captions are generated via BLIP (though experimentation shows BLIP's limitations in context density).
      • Audio is processed via Faster-Whisper and stored in a separate vector collection.
  3. Inference & LLM Integration:
      • The system supports local execution via Ollama (Gemma 3 performs exceptionally well) and cloud-based inference via Groq (utilizing Llama 3-70B for high-reasoning tasks).
  4. Supermemory:
      • To prevent context "noise," the system utilizes a Supermemory feature that maintains a clean, video-specific context. It refreshes upon new video uploads to ensure response quality remains high and relevant to the current file.
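The dual-stage sampling rule can be sketched as a small function. This is an illustration built on assumptions:

* It takes scene boundaries as `(start_frame, end_frame)` pairs from some scene detector. Both the detector and the input format are assumed and are not taken from gUrrT.
* It keeps the start, middle, and end frame of each scene.
* It falls back to evenly spaced frames when fewer than two scenes (that is, no transitions) are found.

`sample_frames` and `uniform_count` are hypothetical names.

```python
def sample_frames(
    scenes: list[tuple[int, int]],
    total_frames: int,
    uniform_count: int = 8,
) -> list[int]:
    """Return frame indices to caption for one video.

    `scenes` holds (start_frame, end_frame) pairs with an exclusive end,
    an assumed output format for a scene detector.
    """
    # Drop empty or malformed scenes before deciding which stage to use.
    valid = [(start, end) for start, end in scenes if end > start]

    # Stage 1: two or more scenes means at least one transition was found,
    # so keep the start, middle, and end frame of every scene.
    if len(valid) >= 2:
        picks: set[int] = set()
        for start, end in valid:
            last = end - 1
            picks.update((start, (start + last) // 2, last))
        return sorted(picks)

    # Stage 2 (fallback): no transitions, so take the centre frame of each
    # of `uniform_count` equal-width slots across the whole video.
    if total_frames <= 0 or uniform_count <= 0:
        return []
    count = min(uniform_count, total_frames)
    step = total_frames / count
    return [int(i * step + step / 2) for i in range(count)]
```

For example:

* `sample_frames([(0, 10), (10, 30)], 30)` returns `[0, 4, 9, 10, 19, 29]`.
* A video with no detected cuts and 100 frames, sampled with `uniform_count=4`, yields `[12, 37, 62, 87]`.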

Key Insights & Experimental Inferences

  • The "Captioning Bottleneck": The quality of the LLM's response depends directly on the quality of the image-to-text descriptions. Upgrading from BLIP to more descriptive captioning models remains a primary goal for improving context.
  • Model Scaling: Moving from Llama 3.1-8B to Llama 3-70B produced a large improvement in answer quality. The 8B model struggled with simple queries when given BLIP captions. The 70B model (and Gemma 3) showed the "reasoning" needed to synthesize poor context into accurate answers.
  • The Summary Challenge: While RAG excels at specific "needle-in-a-haystack" queries, generating holistic video summaries remains a challenge for vanilla RAG architectures.

Future Roadmap

I am looking into transitioning from Vanilla RAG to a Graph-based RAG or a Hierarchical RAG architecture. This would allow the system to understand the relationship between scenes over time, rather than treating them as isolated data points.

🌿 Quick Start Guide for the PyPI Package

1. Installation

Set up gurrt using uv. Note: This project requires Python 3.12.

# 1. Install uv, pin the Python version, then create the environment
pip install uv
uv python pin 3.12
uv venv

# 2. Activate environment
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install gurrt (Standard/CPU)
uv pip install gurrt

# 4. OR Install with GPU Support
# Quote the extra so shells such as zsh do not expand the brackets
uv pip install "gurrt[cuda]" --extra-index-url https://download.pytorch.org/whl/cu121

2. Commands

Command                              Description
gurrt init                           Configure API keys (Groq, Supermemory, Ollama).
gurrt models-download                Download and cache AI models locally.
gurrt index <path>                   Extract frames and audio for search.
gurrt index-ollama <path> <model>    Index using a specific Ollama model.
gurrt ask "<query>"                  Query your indexed video content.

The tool suppresses unnecessary logging and disables tokenizer parallelism to keep the CLI output clean. Some MoviePy logs still appear. This will be resolved in a future iteration.
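Here is a minimal sketch of the kind of log suppression described above, assuming a Python entry point.

* `TOKENIZERS_PARALLELISM` is the environment variable read by the Hugging Face `tokenizers` library. It must be set before any tokenizer is loaded.
* The list of logger names is an assumption about which libraries are noisy. It is not gUrrT's actual configuration.
* `quiet_runtime` is a hypothetical helper.

```python
import logging
import os


def quiet_runtime() -> None:
    """Reduce console noise before heavy libraries are imported (illustrative)."""
    # Read by Hugging Face tokenizers; "false" disables its parallelism and
    # the fork-related warning it otherwise prints.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    # Raise the threshold of chatty third-party loggers to ERROR.
    # These logger names are assumptions, not gUrrT's actual list.
    for name in ("transformers", "httpx", "faster_whisper"):
        logging.getLogger(name).setLevel(logging.ERROR)


quiet_runtime()
```

MoviePy reports progress through its own `logger` argument rather than through the standard `logging` module. This may be why some of its output survives this kind of filtering. In recent MoviePy versions, passing `logger=None` to calls such as `write_audiofile` turns those progress bars off.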


Architecture Overview

Video
  │
  ├── Smart Frame Extraction
  │     └── Captioning + Embeddings
  │
  ├── Audio Extraction
  │     └── Speech-to-Text + Embeddings
  │
  ├── Vector Memory Store
  │
  ├── Supermemory (Persistent Conversation Layer)
  │
  └── LLM Reasoning Engine

Project Setup (using uv)

# Install uv if you haven't already
pip install uv

# Sync dependencies
uv sync

# Activate environment
source .venv/bin/activate  # Windows: .venv\Scripts\activate

File Structure

gurrt/
├── src/
│   └── videorag/                      # Core Video-RAG application package
│       │
│       ├── api/
│       │   └── server.py              # API server (exposes endpoints for querying, ingestion, etc.)
│       │
│       ├── cli/
│       │   └── main.py                # CLI entry point (init, ingest, query commands)
│       │
│       ├── config/
│       │   └── config.py              # Configuration management (API keys, paths, environment setup)
│       │
│       ├── core/                      # Core intelligence pipeline
│       │   ├── __init__.py
│       │   ├── asr.py                 # Audio extraction + speech-to-text processing
│       │   ├── embedding.py           # Embedding generation for captions & transcripts
│       │   ├── llm.py                 # LLM interaction and reasoning logic
│       │   ├── models.py              # Model loading and management utilities
│       │   ├── pipeline.py            # End-to-end ingestion + query pipeline orchestration
│       │   ├── prompts.py             # Prompt templates and structured context injection
│       │   ├── search.py              # Retrieval logic (semantic search over stored embeddings)
│       │   └── vectordb.py            # Vector database interface and storage abstraction
│       │
│       └── utils/
│           └── utils.py               # Shared utility functions and helpers
│
└── README.md                          # Project documentation


Download files


Source Distribution

gurrt-1.0.4.tar.gz (20.4 kB)

Uploaded Source

Built Distribution


gurrt-1.0.4-py3-none-any.whl (19.8 kB)

Uploaded Python 3

File details

Details for the file gurrt-1.0.4.tar.gz.

File metadata

  • Download URL: gurrt-1.0.4.tar.gz
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.4

File hashes

Hashes for gurrt-1.0.4.tar.gz
Algorithm Hash digest
SHA256 88cb22763e6814f77020228dcf969e007b256ec6124ab179a06c203a093d38ca
MD5 0541dc1c74d748b5d5a17ac60210eb97
BLAKE2b-256 bc6f7534b090a040bd7b0c2286006c64ba7d14a11040f63908d91a34b1e6082f


File details

Details for the file gurrt-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: gurrt-1.0.4-py3-none-any.whl
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.4

File hashes

Hashes for gurrt-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b56e2fef65875151cfe747d40308e702b85f0f6208f57859c5cc9437f7c64569
MD5 cf84e771a65a8b90ee8972ce797963f1
BLAKE2b-256 98efb1f9ab70a18caa02507fd2414e25ad191bd6a9c77a798f7ca468cd7eeb46

