End-to-end visual document retrieval with ColPali, featuring two-stage pooling for scalable search

Visual RAG Toolkit

PyPI version | CI | License: MIT | Python 3.9+

End-to-end visual document retrieval toolkit featuring fast multi-stage retrieval (prefetch with pooled vectors + exact MaxSim reranking).

This repo contains:

  • a Python package (visual_rag)
  • a Streamlit demo app (demo/)
  • benchmark & evaluation scripts for ViDoRe v2 (benchmarks/)

🎯 Key Features

  • Modular: PDF → images, embedding, Qdrant indexing, and retrieval can each be used independently.
  • Multi-stage retrieval: two-stage and three-stage retrieval modes built for Qdrant named vectors.
  • Model-aware embedding: ColSmol + ColPali support behind a single VisualEmbedder interface.
  • Token hygiene: query special-token filtering by default for more stable MaxSim behavior.
  • Practical pipelines: robust indexing, retries, optional Cloudinary image URLs, evaluation reporting.
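
The token-hygiene point can be illustrated with a small sketch. MaxSim adds one term per query token, so special tokens (padding, begin-of-sequence, and similar) contribute terms unrelated to the query's content. The helper below is hypothetical and is not the toolkit's API: the name `filter_special_tokens`, the `token_ids` input, and the `special_ids` set are assumptions for illustration.

```python
import numpy as np

def filter_special_tokens(embeddings, token_ids, special_ids):
    """Drop the embedding rows whose token id is in `special_ids`.

    embeddings: array of shape (num_tokens, dim), one row per query token.
    token_ids:  sequence of num_tokens integer ids aligned with the rows.
    special_ids: ids to discard (hypothetical values, tokenizer-dependent).
    """
    keep = ~np.isin(np.asarray(token_ids), list(special_ids))
    return embeddings[keep]

# Toy example: 4 tokens with 2-dimensional embeddings; ids 0 and 2 are "special".
query_embeddings = np.arange(8, dtype=float).reshape(4, 2)
filtered = filter_special_tokens(query_embeddings, [0, 5, 6, 2], {0, 2})
print(filtered.shape)
```

Only the rows for ids 5 and 6 remain, so only content tokens would take part in scoring.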

📦 Installation

# Core package (minimal dependencies)
pip install visual-rag-toolkit

# With specific features
pip install visual-rag-toolkit[embedding]    # ColSmol/ColPali embedding support
pip install visual-rag-toolkit[pdf]          # PDF processing
pip install visual-rag-toolkit[qdrant]       # Vector database
pip install visual-rag-toolkit[cloudinary]   # Image CDN
pip install visual-rag-toolkit[ui]           # Streamlit demo dependencies

# All dependencies
pip install visual-rag-toolkit[all]

System dependencies (PDF)

pdf2image requires Poppler.

  • macOS: brew install poppler
  • Ubuntu/Debian: sudo apt-get update && sudo apt-get install -y poppler-utils

🚀 Quick Start

Minimal: embed a query and run two-stage search (server-side)

from qdrant_client import QdrantClient
from visual_rag import VisualEmbedder, TwoStageRetriever

client = QdrantClient(url="https://YOUR_QDRANT", api_key="YOUR_KEY")
collection_name = "your_collection"

# Embed query tokens
embedder = VisualEmbedder(model_name="vidore/colpali-v1.3")
q = embedder.embed_query("What is the budget allocation?")

# Fast path: all stages computed in Qdrant (prefetch + exact rerank)
retriever = TwoStageRetriever(client, collection_name)
results = retriever.search_server_side(
    query_embedding=q,
    top_k=10,
    prefetch_k=256,
    stage1_mode="tokens_vs_experimental",  # or: tokens_vs_tiles / pooled_query_vs_tiles / pooled_query_vs_global
)

for r in results[:3]:
    print(r["id"], r["score_final"])

Process a PDF into images (no embedding, no vector DB)

from pathlib import Path
from visual_rag import PDFProcessor

processor = PDFProcessor(dpi=140)
images, texts = processor.process_pdf(Path("report.pdf"))
print(len(images), "pages")

🔬 Multi-stage Retrieval (Two-stage / Three-stage)

Traditional ColBERT-style MaxSim scoring compares every query token against every document token, so exhaustive scoring becomes expensive as the collection grows.

Our approach:

Stage 1: Fast prefetch with tile-level pooled vectors
         ├── Pool each tile (64 patches) → num_tiles vectors
         ├── Use HNSW index for O(log N) retrieval
         └── Retrieve top-K candidates (e.g., 200)

Stage 2: Exact MaxSim reranking on candidates
         ├── Load full multi-vector embeddings
         ├── Compute exact ColBERT MaxSim scores
         └── Return top-k results (e.g., 10)

Three-stage extends this with an additional “cheap prefetch” stage before stage 2.
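
The two stages can be sketched in plain NumPy. This is an illustration, not the toolkit's implementation: the function names (`pool_tiles`, `maxsim`, `two_stage_search`) and the use of mean pooling are assumptions. Stage 1 is a brute-force scan standing in for the HNSW lookup that Qdrant would perform. It scores query tokens against pooled tile vectors, which is one plausible reading of the `tokens_vs_tiles` mode.

```python
import numpy as np

def pool_tiles(patches: np.ndarray, patches_per_tile: int = 64) -> np.ndarray:
    """Mean-pool consecutive groups of patch vectors into one vector per tile.

    Assumes the page has at least one full tile; leftover patches are dropped.
    """
    num_patches, dim = patches.shape
    num_tiles = num_patches // patches_per_tile
    usable = patches[: num_tiles * patches_per_tile]
    return usable.reshape(num_tiles, patches_per_tile, dim).mean(axis=1)

def maxsim(query: np.ndarray, doc: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token, take the best-matching
    document vector, then sum those maxima over all query tokens."""
    sims = query @ doc.T              # shape: (num_query_tokens, num_doc_vectors)
    return float(sims.max(axis=1).sum())

def two_stage_search(query, docs, prefetch_k=200, top_k=10):
    """Return [(doc_index, exact_score), ...] for the top_k documents."""
    # Stage 1: cheap scores against the pooled tile vectors of every document.
    stage1 = np.array([maxsim(query, pool_tiles(d)) for d in docs])
    candidates = np.argsort(-stage1)[:prefetch_k]
    # Stage 2: exact MaxSim on full patch embeddings, candidates only.
    reranked = [(int(i), maxsim(query, docs[i])) for i in candidates]
    reranked.sort(key=lambda pair: -pair[1])
    return reranked[:top_k]

# Toy data: 5 pages of 128 patches each, 8-dimensional embeddings.
rng = np.random.default_rng(0)
pages = [rng.normal(size=(128, 8)) for _ in range(5)]
query_tokens = rng.normal(size=(4, 8))
for doc_id, score in two_stage_search(query_tokens, pages, prefetch_k=3, top_k=2):
    print(doc_id, round(score, 3))
```

Stage 1 compares the query against `num_tiles` vectors per page rather than every patch, so it is far cheaper. The exact scoring in stage 2 then runs only on the `prefetch_k` candidates that survive.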

๐Ÿ“ Package Structure

visual-rag-toolkit/
├── visual_rag/              # Import as: from visual_rag import ...
│   ├── embedding/           # VisualEmbedder, pooling functions
│   ├── indexing/            # PDFProcessor, QdrantIndexer, CloudinaryUploader
│   ├── retrieval/           # TwoStageRetriever
│   ├── visualization/       # Saliency maps
│   ├── cli/                 # Command-line: visual-rag process/search
│   └── config.py            # load_config, get, get_section
│
├── benchmarks/              # ViDoRe evaluation scripts
└── examples/                # Usage examples

โš™๏ธ Configuration

Configure via environment variables or YAML:

# Qdrant credentials (preferred names used by the demo + scripts)
export SIGIR_QDRANT_URL="https://your-cluster.qdrant.io"
export SIGIR_QDRANT_KEY="your-api-key"

# Backwards-compatible fallbacks (also supported)
export QDRANT_URL="https://your-cluster.qdrant.io"
export QDRANT_API_KEY="your-api-key"

export VISUALRAG_MODEL="vidore/colSmol-500M"

# Special token handling: special tokens are filtered out by default
export VISUALRAG_INCLUDE_SPECIAL_TOKENS=true  # set to true to keep them

Or use a config file (visual_rag.yaml):

model:
  name: "vidore/colSmol-500M"
  batch_size: 4
  
qdrant:
  url: "https://your-cluster.qdrant.io"
  collection: "my_documents"
  
search:
  strategy: "two_stage"  # or "multi_vector", "pooled"
  prefetch_k: 200
  top_k: 10
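
The environment variable names listed above suggest a simple precedence: preferred names first, then the backwards-compatible fallbacks. A minimal sketch of that lookup follows. It is not the toolkit's `config.py`: the helper names (`first_set`, `resolve_qdrant_credentials`, `include_special_tokens`) and the accepted truthy strings are assumptions for illustration.

```python
import os
from typing import Mapping, Optional, Tuple

def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None

def resolve_qdrant_credentials(
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Prefer the SIGIR_* names, then fall back to the older names."""
    url = first_set(env, "SIGIR_QDRANT_URL", "QDRANT_URL")
    key = first_set(env, "SIGIR_QDRANT_KEY", "QDRANT_API_KEY")
    return url, key

def include_special_tokens(env: Mapping[str, str] = os.environ) -> bool:
    """Special tokens are filtered unless the flag is explicitly truthy.

    The set of accepted truthy strings here is an assumption.
    """
    raw = env.get("VISUALRAG_INCLUDE_SPECIAL_TOKENS", "false")
    return raw.strip().lower() in {"1", "true", "yes"}

url, key = resolve_qdrant_credentials()
print("Qdrant URL configured:", url is not None)
print("Include special tokens:", include_special_tokens())
```

Passing the environment as a mapping keeps the lookup easy to test with a plain dictionary instead of the real process environment.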

๐Ÿ–ฅ๏ธ Demo (Streamlit)

pip install "visual-rag-toolkit[ui,qdrant,embedding,pdf]"

# Option A: from Python
python -c "import visual_rag; visual_rag.demo()"

# Option B: CLI launcher
visual-rag-demo

📊 Benchmark Evaluation

Run ViDoRe benchmark evaluation:

# Example: evaluate a collection against ViDoRe BEIR datasets in Qdrant
python -m benchmarks.vidore_beir_qdrant.run_qdrant_beir \
  --datasets vidore/esg_reports_v2 vidore/biomedical_lectures_v2 \
  --collection YOUR_COLLECTION \
  --mode two_stage \
  --stage1-mode tokens_vs_experimental \
  --prefetch-k 256 \
  --top-k 100 \
  --evaluation-scope union

More commands (including multi-stage variants and cropping configs) live in:

  • benchmarks/vidore_tatdqa_test/COMMANDS.md

🔧 Development

git clone https://github.com/Ara-Yeroyan/visual-rag-toolkit
cd visual-rag-toolkit
pip install -e ".[dev]"
pytest tests/ -v

📄 Citation

If you use this toolkit in your research, please cite:

@software{visual_rag_toolkit,
  title = {Visual RAG Toolkit: Scalable Visual Document Retrieval with Two-Stage Pooling},
  author = {Ara Yeroyan},
  year = {2026},
  url = {https://github.com/Ara-Yeroyan/visual-rag-toolkit}
}

๐Ÿ“ License

MIT License - see LICENSE for details.

๐Ÿ™ Acknowledgments

  • Qdrant - Vector database with multi-vector support
  • ColPali - Visual document retrieval models
  • ViDoRe - Benchmark dataset
