Skip to main content

The code used to train and run inference with the ColPali architecture.

Project description

ColPali: Efficient Document Retrieval with Vision Language Models 👀

arXiv GitHub Hugging Face

Test Version Downloads


[Model card] [ViDoRe Leaderboard] [Demo] [Blog Post]

[!TIP] For production usage in your RAG pipelines, we recommend using the byaldi package, which is a lightweight wrapper around the colpali-engine package developed by the author of the popular RAGatouille repostiory. 🐭

Associated Paper

This repository contains the code used for training the vision retrievers in the ColPali: Efficient Document Retrieval with Vision Language Models paper. In particular, it contains the code for training the ColPali model, which is a vision retriever based on the ColBERT architecture and the PaliGemma model.

Introduction

With our new model ColPali, we propose to leverage VLMs to construct efficient multi-vector embeddings in the visual space for document retrieval. By feeding the ViT output patches from PaliGemma-3B to a linear projection, we create a multi-vector representation of documents. We train the model to maximize the similarity between these document embeddings and the query embeddings, following the ColBERT method.

Using ColPali removes the need for potentially complex and brittle layout recognition and OCR pipelines with a single model that can take into account both the textual and visual content (layout, charts, ...) of a document.

ColPali Architecture

Setup

We used Python 3.11.6 and PyTorch 2.2.2 to train and test our models, but the codebase is compatible with Python >=3.9 and recent PyTorch versions. To install the package, run:

pip install colpali-engine

[!WARNING] For ColPali versions above v1.0, make sure to install the colpali-engine package from source or with a version above v0.2.0.

Usage

Quick start

import torch
from PIL import Image

from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"

model = ColPali.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
)

processor = ColPaliProcessor.from_pretrained(model_name)

# Your inputs
images = [
    Image.new("RGB", (32, 32), color="white"),
    Image.new("RGB", (16, 16), color="black"),
]
queries = [
    "Is attention really all you need?",
    "Are Benjamin, Antoine, Merve, and Jo best friends?",
]

# Process the inputs
batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

# Forward pass
with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

scores = processor.score_multi_vector(query_embeddings, image_embeddings)

Inference

You can find an example here. If you need an indexing system, we recommend using byaldi - RAGatouille's little sister 🐭 - which share a similar API and leverages our colpali-engine package.

Benchmarking

To benchmark ColPali to reproduce the results on the ViDoRe leaderboard, it is recommended to use the vidore-benchmark package.

Training

To keep a lightweight repository, only the essential packages were installed. In particular, you must specify the dependencies to use the training script for ColPali. You can do this using the following command:

pip install "colpali-engine[train]"

All the model configs used can be found in scripts/configs/ and rely on the configue package for straightforward configuration. They should be used with the train_colbert.py script.

Example 1: Local training

USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml

or using accelerate:

accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml

Example 2: Training on a SLURM cluster

sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1  -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"

sbatch --nodes=1  --time=5:00:00 -A cad15443 --gres=gpu:8  --constraint=MI250 --job-name=colpali --wrap="python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"

Paper result reproduction

To reproduce the results from the paper, you should checkout to the v0.1.1 tag or install the corresponding colpali-engine package release using:

pip install colpali-engine==0.1.1

Citation

ColPali: Efficient Document Retrieval with Vision Language Models

Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (* denotes equal contribution)

@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449}, 
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

colpali_engine-0.3.1.tar.gz (117.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

colpali_engine-0.3.1-py3-none-any.whl (32.2 kB view details)

Uploaded Python 3

File details

Details for the file colpali_engine-0.3.1.tar.gz.

File metadata

  • Download URL: colpali_engine-0.3.1.tar.gz
  • Upload date:
  • Size: 117.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for colpali_engine-0.3.1.tar.gz
Algorithm Hash digest
SHA256 d8460c1afa4132a9369e0ee51b27b81b3a52caf8fad8980c0225250dadbb0def
MD5 69a5428bcd598f29ea428ce93f05a766
BLAKE2b-256 d4ba3250821d21e1b018f4bfe70b5d65e3a1da76d573e43da696aad296066e7b

See more details on using hashes here.

File details

Details for the file colpali_engine-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: colpali_engine-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 32.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for colpali_engine-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c11f8537236dc8b113842785147b0be5f5646a2683a0f08980ffdbfeed690e8b
MD5 483f54cf58d21cb05eefbeb9f75c773b
BLAKE2b-256 3de05116dc298ea7776bc5819f6cac0d4f66aa9c1a4588753cc511c956e98731

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page