Mini VideoRAG
A multi-modal mini video-analysis framework, so you don't spend too many tokens on your video analysis tasks 🤗
Installation
```shell
git clone https://github.com/bdallard/mini-videorag
cd mini-videorag

# Install everything (recommended: unified installation)
pip install -e ".[all]"
```
Environment Setup
```shell
cp .env.example .env
```
Required variables:
- `OPENAI_API_KEY` - for transcription and VLM queries
- `STORAGE_TYPE=minio` - object storage backend (default: minio)
- `STORAGE_ENDPOINT_URL=http://localhost:9000` - object storage endpoint
- `LLM_MODEL=openai/gpt-4o-mini` - LLM provider for RAG
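A minimal `.env` for local development, filling in only the variables above (the API key value is a placeholder), might look like:

```shell
OPENAI_API_KEY=sk-...        # your OpenAI key
STORAGE_TYPE=minio
STORAGE_ENDPOINT_URL=http://localhost:9000
LLM_MODEL=openai/gpt-4o-mini
```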
Quickstart
```shell
# Start the stack using docker
make docker-up
```
Then you can query videos in natural language, with 100+ LLM providers supported via LiteLLM.
```python
from mini_videorag.utils.config_loader import load_video_rag_config, create_pipeline_from_config
from mini_videorag.core.video_rag import VideoRAG
from mini_videorag.core import VideoRAGSession
from pydantic import BaseModel, Field

class DetailedAnswer(BaseModel):
    answer: str
    confidence_score: float = Field(ge=0.0, le=1.0)
    reasoning: str
    sources: list[str] = Field(default_factory=list)

config = load_video_rag_config()
pipeline = create_pipeline_from_config(config)

# Option 1: context manager (automatic cleanup), for scripts
with VideoRAG(pipeline, output_schema=DetailedAnswer) as rag:
    rag.init("video.mp4", num_frames=10)
    answer = rag.ask("Does this have subtitles?")
    print(f"{answer.answer} (confidence: {answer.confidence_score})")

# Option 2: session (manual control), for services/APIs
session = VideoRAGSession(pipeline, output_schema=DetailedAnswer)
session.initialize("video.mp4", num_frames=10)
answer = session.ask("Does this have subtitles?")
print(f"{answer.answer}\nReasoning: {answer.reasoning}")
session.cleanup()
```
Features
- Multi-modal processors: Transcription, OCR, object/person detection, brand detection, NSFW detection, music recognition
- DAG workflow engine: Configure processor dependencies and execution order
- Type-safe config: Pydantic models with validation
- Pluggable architecture: Custom processors via factory pattern, entry points, or YAML config
- 100+ LLM providers: OpenAI, Anthropic, Ollama via LiteLLM
- Production-ready: Thread-safe, resource cleanup, error handling
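The DAG workflow engine resolves processor dependencies into an execution order before running anything. As an illustration only (not the framework's actual scheduler), a topological sort over a hypothetical dependency map does exactly this:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map mirroring the processor table in this README;
# the real engine reads these edges from config/video_rag_config.yml.
deps = {
    "frame_extraction": set(),
    "transcription": set(),
    "ocr": {"frame_extraction"},
    "subtitle_check": {"ocr", "transcription"},
}

# static_order() yields each processor only after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
assert order.index("frame_extraction") < order.index("ocr")
assert order.index("ocr") < order.index("subtitle_check")
```

Independent processors (here `frame_extraction` and `transcription`) have no ordering constraint between them, which is what allows them to run concurrently.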
Extending with Custom Processors
Three methods to add custom processors without forking:
1. Runtime Registration
```python
from mini_videorag.processors import ProcessorFactory

factory = ProcessorFactory()
factory.register_processor(
    processor_type="scene_detection",
    provider="opencv",
    class_path="my_package.processors:SceneDetector",
    set_as_default=True,
)
```
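For illustration, the hypothetical `my_package.processors:SceneDetector` referenced above could be sketched as below. The plain `process` method is an assumed interface, not the actual mini_videorag processor base class (see CONTRIBUTING.md for the real contract):

```python
class SceneDetector:
    """Hypothetical processor: flags a scene cut when consecutive
    frames differ by more than `threshold` (normalized to 0-1)."""

    def __init__(self, threshold: float = 0.4):
        self.threshold = threshold

    def process(self, frames: list[list[int]]) -> dict:
        # frames: flat grayscale pixel lists with values in 0-255
        cuts = []
        for i in range(1, len(frames)):
            prev, cur = frames[i - 1], frames[i]
            # mean absolute pixel difference, normalized by the max value
            diff = sum(abs(a - b) for a, b in zip(prev, cur)) / (255 * len(cur))
            if diff > self.threshold:
                cuts.append(i)
        return {"scene_cuts": cuts}
```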
2. Plugin System (Entry Points)
```toml
# pyproject.toml
[project.entry-points."mini_videorag.processors"]
scene_detection.opencv = "my_package.processors:SceneDetector"
```
3. YAML Configuration
```yaml
# config/video_rag_config.yml
custom_processors:
  - processor_type: scene_detection
    provider: opencv
    class_path: "my_package.processors:SceneDetector"
    default: true
```
See CONTRIBUTING.md for the detailed plugin development guide.
Workflow Configuration
Edit config/video_rag_config.yml to control processor execution dependencies.
| Processor | Purpose | Dependencies |
|---|---|---|
| transcription | Speech-to-text (Whisper) | OPENAI_API_KEY |
| frame_extraction | Extract frames (OpenCV) | - |
| music_detection | Music recognition (Shazam) | - |
| ocr | Text extraction (EasyOCR) | frame_extraction* |
| person_detection | Detect people (YOLO) | frame_extraction* |
| object_detection | Detect objects (YOLO) | frame_extraction* |
| brand_detection | Detect brands (HF) | frame_extraction* |
| content_safety | NSFW detection (HF) | frame_extraction* |
| subtitle_check | Subtitle presence (N-gram) | ocr, transcription |
\*Optional dependency: these processors can run independently, but benefit from shared frame extraction.
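The dependency column above maps onto the workflow section of the config. A sketch of what that might look like (the key names here are illustrative; check the shipped config/video_rag_config.yml for the actual schema):

```yaml
# Illustrative sketch - key names may differ from the shipped config
workflow:
  processors:
    frame_extraction:
      depends_on: []
    ocr:
      depends_on: [frame_extraction]   # optional, shares extracted frames
    subtitle_check:
      depends_on: [ocr, transcription]
```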
Temporal Workflows
Distributed video processing with fault tolerance, retries, and progress tracking via Temporal.
Prerequisites
- Temporal server running (via Docker or cloud)
- MinIO/S3 for video storage (optional, supports local paths too)
```shell
# Start Temporal + MinIO
docker-compose -f docker-compose.prod.yml up
```
Or start the worker manually with your own settings:

```shell
python -m mini_videorag.temporal.run_worker

# With custom settings
TEMPORAL_URL=localhost:7233 \
TASK_QUEUE=video-processing \
MAX_CONCURRENT_ACTIVITIES=10 \
python -m mini_videorag.temporal.run_worker
```
Then start the API for triggering workflows and querying results:

```shell
python -m mini_videorag --host 0.0.0.0 --port 8000
```

The Swagger UI is available at http://localhost:8000/docs.
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /storage/upload | POST | Upload video to MinIO/S3 |
| /storage/objects | GET | List stored objects |
| /storage/url/{key} | GET | Get presigned download URL |
| /workflows/analyze | POST | Start video analysis workflow |
| /workflows/{id}/progress | GET | Query workflow progress |
| /workflows/{id}/result | GET | Get final result |
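A minimal client sketch for these endpoints using only the standard library. The request field `object_key` and response field `workflow_id` are assumptions, so check the Swagger UI at /docs for the real request/response schema:

```python
import json
from urllib import request

API = "http://localhost:8000"  # the API started above

def build_analyze_request(object_key: str) -> request.Request:
    # POST /workflows/analyze - the body fields are assumed, see /docs
    body = json.dumps({"object_key": object_key}).encode()
    return request.Request(
        f"{API}/workflows/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def start_analysis(object_key: str) -> str:
    """Start a workflow and return its id (assumed response field)."""
    with request.urlopen(build_analyze_request(object_key)) as resp:
        return json.load(resp)["workflow_id"]
```

Splitting request construction from sending keeps the sketch inspectable without a running server.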
Configuration
Temporal settings in config/video_rag_config.yml:
```yaml
temporal:
  activity:
    start_to_close_timeout_minutes: 30
    heartbeat_timeout_minutes: 10
    processor_overrides:
      ocr:
        heartbeat_timeout_minutes: 15
  retry:
    maximum_attempts: 3
    non_retryable_errors: [ModelInitError, VideoLoadError]
  workflow:
    execution_timeout_minutes: 120
```
The same workflow DAG configuration drives both the local ProcessorPipeline and Temporal execution.
Monitoring
Enable Prometheus metrics on the worker:
```shell
PROMETHEUS_ENABLED=true PROMETHEUS_PORT=9091 python -m mini_videorag.temporal.run_worker
```
Access metrics at http://localhost:9091/metrics or Prometheus UI at http://localhost:9090 (when using docker-compose).
Testing
```shell
pip install -e ".[dev]"

# Fast mode (skip model downloads and heavy tasks) - recommended for dev & CI
make test-ci
pytest -m "not requires_download"

# Full test suite
pytest
```
Markers: `unit`, `integration`, `slow`, `requires_download`
Documentation
- CLAUDE.md - AI/Developer guide and architecture
- CONTRIBUTING.md - development guide
- demo_video_rag.ipynb - notebook example
- tests/TESTING_GUIDE.md - testing guide