
Mini VideoRAG

Multi-modal video analysis with AI-powered RAG

A multi-modal mini video-analysis framework, so you don't spend too many tokens on your video analysis tasks 🤗

Installation

git clone https://github.com/bdallard/mini-videorag
cd mini-videorag

# Install everything (recommended - unified installation)
pip install -e ".[all]"
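
Since the package is published on PyPI, a released version can also be installed directly (assuming the same "all" extra as the source install):

pip install "mini-videorag[all]"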

Environment Setup

cp .env.example .env

Required variables:

  • OPENAI_API_KEY - For transcription and VLM queries
  • STORAGE_TYPE=minio - Object storage backend (default minio)
  • STORAGE_ENDPOINT_URL=http://localhost:9000
  • LLM_MODEL=openai/gpt-4o-mini - LLM provider for RAG
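
Taken together, a minimal .env might look like this (the API key value is a placeholder):

# .env
OPENAI_API_KEY=sk-...
STORAGE_TYPE=minio
STORAGE_ENDPOINT_URL=http://localhost:9000
LLM_MODEL=openai/gpt-4o-mini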

Quickstart

# Start the stack with Docker
make docker-up

Then you can query videos in natural language, with 100+ LLM providers supported via LiteLLM.

from mini_videorag.utils.config_loader import load_video_rag_config, create_pipeline_from_config
from mini_videorag.core.video_rag import VideoRAG
from mini_videorag.core import VideoRAGSession
from pydantic import BaseModel, Field

class DetailedAnswer(BaseModel):
    answer: str
    confidence_score: float = Field(ge=0.0, le=1.0)
    reasoning: str
    sources: list[str] = Field(default_factory=list)

config = load_video_rag_config()
pipeline = create_pipeline_from_config(config)

# Option 1: Context manager (automatic cleanup) for scripts
with VideoRAG(pipeline, output_schema=DetailedAnswer) as rag:
    rag.init("video.mp4", num_frames=10)
    answer = rag.ask("Does this have subtitles?")
    print(f"{answer.answer} (confidence: {answer.confidence_score})")

# Option 2: Session (manual control) for services/APIs
session = VideoRAGSession(pipeline, output_schema=DetailedAnswer)
session.initialize("video.mp4", num_frames=10)
answer = session.ask("Does this have subtitles?")
print(f"{answer.answer}\nReasoning: {answer.reasoning}")
session.cleanup()
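
When embedding the session variant in a service, it can help to guard cleanup with try/finally so resources are released even if a query raises (a sketch using only the calls shown above):

session = VideoRAGSession(pipeline, output_schema=DetailedAnswer)
try:
    session.initialize("video.mp4", num_frames=10)
    answer = session.ask("Summarize the key scenes.")
    print(answer.answer)
finally:
    session.cleanup()  # always release session resources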

Features

  • Multi-modal processors: Transcription, OCR, object/person detection, brand detection, NSFW detection, music recognition
  • DAG workflow engine: Configure processor dependencies and execution order
  • Type-safe config: Pydantic models with validation
  • Pluggable architecture: Custom processors via factory pattern, entry points, or YAML config
  • 100+ LLM providers: OpenAI, Anthropic, Ollama via LiteLLM (example after this list)
  • Production-ready: Thread-safe, resource cleanup, error handling
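
Because routing goes through LiteLLM, switching providers is just a change to the LLM_MODEL string, assuming the matching API key or local server is configured:

# Examples of LiteLLM model strings
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022   # needs ANTHROPIC_API_KEY
LLM_MODEL=ollama/llama3                          # needs a local Ollama server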

Extending with Custom Processors

Three methods to add custom processors without forking:

1. Runtime Registration

from mini_videorag.processors import ProcessorFactory

factory = ProcessorFactory()
factory.register_processor(
    processor_type="scene_detection",
    provider="opencv",
    class_path="my_package.processors:SceneDetector",
    set_as_default=True
)

2. Plugin System (Entry Points)

# pyproject.toml
[project.entry-points."mini_videorag.processors"]
scene_detection.opencv = "my_package.processors:SceneDetector"

3. YAML Configuration

# config/video_rag_config.yml
custom_processors:
  - processor_type: scene_detection
    provider: opencv
    class_path: "my_package.processors:SceneDetector"
    default: true

See CONTRIBUTING.md for a detailed plugin development guide.
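
For orientation, here is a minimal, self-contained sketch of the scene-detection logic such a processor could wrap. It uses plain OpenCV and deliberately avoids mini_videorag's processor base class (see CONTRIBUTING.md for the actual interface); the threshold is illustrative:

import cv2
import numpy as np

def detect_scene_cuts(video_path: str, threshold: float = 30.0) -> list[float]:
    """Return timestamps (in seconds) where the mean frame difference spikes."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    cuts: list[float] = []
    prev_gray = None
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Flag a cut when the mean absolute difference between consecutive
        # grayscale frames exceeds the threshold
        if prev_gray is not None and float(np.mean(cv2.absdiff(gray, prev_gray))) > threshold:
            cuts.append(frame_idx / fps)
        prev_gray = gray
        frame_idx += 1
    cap.release()
    return cuts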

Workflow Configuration

Edit config/video_rag_config.yml to control processor execution order and dependencies; a sketch follows the table below.

Processor         Purpose                     Dependencies
----------------  --------------------------  ------------------
transcription     Speech-to-text (Whisper)    OPENAI_API_KEY
frame_extraction  Extract frames (OpenCV)     -
music_detection   Music recognition (Shazam)  -
ocr               Text extraction (EasyOCR)   frame_extraction*
person_detection  Detect people (YOLO)        frame_extraction*
object_detection  Detect objects (YOLO)       frame_extraction*
brand_detection   Detect brands (HF)          frame_extraction*
content_safety    NSFW detection (HF)         frame_extraction*
subtitle_check    Subtitle presence (N-gram)  ocr, transcription

* Optional dependency - processors can run independently but benefit from shared frame extraction
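
For example, a dependency entry in the workflow section might look like the following sketch (the field names are illustrative assumptions; check the shipped config/video_rag_config.yml for the exact schema):

workflow:
  processors:
    - name: frame_extraction
    - name: transcription
    - name: ocr
      depends_on: [frame_extraction]   # reuses shared frames when available
    - name: subtitle_check
      depends_on: [ocr, transcription]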

Temporal Workflows

Distributed video processing with fault tolerance, retries, and progress tracking via Temporal.

Prerequisites

  • Temporal server running (via Docker or cloud)
  • MinIO/S3 for video storage (optional, supports local paths too)

# Start Temporal + MinIO
docker-compose -f docker-compose.prod.yml up

Or you can start the worker manually with your own settings:

python -m mini_videorag.temporal.run_worker

# With custom settings
TEMPORAL_URL=localhost:7233 \
TASK_QUEUE=video-processing \
MAX_CONCURRENT_ACTIVITIES=10 \
python -m mini_videorag.temporal.run_worker

Then start the API for triggering workflows and querying results:

python -m mini_videorag --host 0.0.0.0 --port 8000

Swagger UI available at http://localhost:8000/docs.

API Endpoints

Endpoint                  Method  Purpose
------------------------  ------  -----------------------------
/storage/upload           POST    Upload video to MinIO/S3
/storage/objects          GET     List stored objects
/storage/url/{key}        GET     Get presigned download URL
/workflows/analyze        POST    Start video analysis workflow
/workflows/{id}/progress  GET     Query workflow progress
/workflows/{id}/result    GET     Get final result
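
A typical flow against these endpoints could look like this (the request payload fields are assumptions, not the exact API schema):

# Upload a video, start analysis, then poll for progress
curl -X POST -F "file=@video.mp4" http://localhost:8000/storage/upload
curl -X POST http://localhost:8000/workflows/analyze \
     -H "Content-Type: application/json" \
     -d '{"video_key": "video.mp4"}'
curl http://localhost:8000/workflows/<workflow_id>/progress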

Configuration

Temporal settings in config/video_rag_config.yml:

temporal:
  activity:
    start_to_close_timeout_minutes: 30
    heartbeat_timeout_minutes: 10
    processor_overrides:
      ocr:
        heartbeat_timeout_minutes: 15
    retry:
      maximum_attempts: 3
      non_retryable_errors: [ModelInitError, VideoLoadError]
  workflow:
    execution_timeout_minutes: 120

The same DAG configuration drives both the local ProcessorPipeline and Temporal execution.

Monitoring

Enable Prometheus metrics on the worker:

PROMETHEUS_ENABLED=true PROMETHEUS_PORT=9091 python -m mini_videorag.temporal.run_worker

Access metrics at http://localhost:9091/metrics or Prometheus UI at http://localhost:9090 (when using docker-compose).


Testing

pip install -e ".[dev]"

# Fast mode (skip model downloads and heavy tasks) - recommended for dev & CI
make test-ci
pytest -m "not requires_download"

# Full test suite
pytest

Markers: unit, integration, slow, requires_download
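
Markers combine with pytest's standard -m expressions, for example:

pytest -m unit                        # only unit tests
pytest -m "integration and not slow"  # integration tests, skipping slow ones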

