The code used to train and run inference with the ColPali architecture.

ColPali: Efficient Document Retrieval with Vision Language Models 👀

[Model card] [ViDoRe Benchmark] [ViDoRe Leaderboard] [Demo] [Blog Post]

[!TIP] For production usage in your RAG pipelines, we recommend the byaldi package, a lightweight wrapper around the colpali-engine package. byaldi is developed by the author of the popular RAGatouille repository. 🐭

Associated Paper

This repository contains the code used for training the vision retrievers in the ColPali: Efficient Document Retrieval with Vision Language Models paper. In particular, it contains the code for training the ColPali model, which is a vision retriever based on the ColBERT architecture.
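To make "based on the ColBERT architecture" concrete, here is a minimal sketch of ColBERT-style late-interaction scoring (often called MaxSim): each query vector is matched to its most similar page vector, and the per-query maxima are summed. The function names and the toy 2-dimensional embeddings below are invented for illustration; this is not the colpali-engine implementation, which works on batched model outputs.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query: Sequence[Vector], page: Sequence[Vector]) -> float:
    """ColBERT-style MaxSim: for each query vector, keep its best match
    among the page vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in page) for q in query)


# Toy embeddings (hypothetical values, one vector per token or image patch).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5], [0.5, 0.5]]

# Score every candidate page and pick the index of the best one.
scores = [late_interaction_score(query, page) for page in (page_a, page_b)]
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

In this toy case `page_a` wins because it contains an exact match for each query vector, while `page_b` only offers partial matches.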

Setup

We used Python 3.11.6 and PyTorch 2.2.2 to train and test our models, but the codebase is compatible with Python >=3.9 and recent PyTorch versions.

The evaluation codebase depends on a few Python packages, which can be installed with the following command:

pip install colpali-engine

[!WARNING] For ColPali versions above v1.0, make sure to install the colpali-engine package from source or with a version above v0.2.0.
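One way to check which colpali-engine version is installed before loading a newer ColPali checkpoint is sketched below. It uses only the standard library. The helper names are hypothetical, the parser only looks at the leading numeric components of the version string, and "above v0.2.0" is read as strictly greater, which is an assumption.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. 'v0.2.0' -> (0, 2, 0).
    Suffixes such as '.dev0' are ignored (a simplification)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_above(installed: str, minimum: str = "0.2.0") -> bool:
    """True if `installed` is strictly greater than `minimum`."""
    return parse_version(installed) > parse_version(minimum)


def installed_version(package: str = "colpali-engine") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("colpali-engine is not installed")
else:
    print(f"colpali-engine {version}; above 0.2.0: {is_above(version)}")
```

For anything beyond a quick check, a full version parser such as `packaging.version.Version` handles pre-release and development suffixes properly.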

Usage

Inference

This repository doesn't contain the code to run optimized retrieval for RAG pipelines. For this, we recommend using byaldi - RAGatouille's little sister 🐭 - which shares a similar API and leverages our colpali-engine package.

Benchmarking

To benchmark ColPali and reproduce the results on the ViDoRe leaderboard, we recommend using the vidore-benchmark package.

Training

To keep the repository lightweight, only the essential packages are installed by default. To use the training script for ColPali, you must install the training dependencies explicitly. You can do this using the following command:

pip install "colpali-engine[train]"

All the model configs used can be found in scripts/configs/ and rely on the configue package for straightforward configuration. They should be used with the train_colbert.py script.

Example 1: Local training

USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml

or using accelerate:

accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml

Example 2: Training on a SLURM cluster

sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1  -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"

sbatch --nodes=1  --time=5:00:00 -A cad15443 --gres=gpu:8  --constraint=MI250 --job-name=colpali --wrap="python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"
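The one-line sbatch calls above are long and easy to mistype. As a rough sketch, the same kind of command can be assembled programmatically and printed for inspection before submitting. The helper `build_sbatch_command` is hypothetical, and the option values are copied from the first example; partition, account and constraint flags are cluster-specific and left out here.

```python
import shlex
from typing import Dict, List

TRAIN_SCRIPT = "scripts/train/train_colbert.py"
CONFIG = "scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"


def build_sbatch_command(options: Dict[str, str], wrapped: List[str]) -> List[str]:
    """Return an sbatch argument list that runs `wrapped` through --wrap."""
    args = ["sbatch"]
    args += [f"--{name}={value}" for name, value in options.items()]
    # --wrap takes the whole training command as a single string.
    args.append(f"--wrap={shlex.join(wrapped)}")
    return args


command = build_sbatch_command(
    {
        "nodes": "1",
        "cpus-per-task": "16",
        "mem-per-cpu": "32GB",
        "time": "20:00:00",
        "gres": "gpu:1",
        "job-name": "colidefics",
    },
    ["accelerate", "launch", TRAIN_SCRIPT, CONFIG],
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(command))
```

Keeping the command as a list also makes it straightforward to pass to `subprocess.run(command, check=True)` on a machine where sbatch is available.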

Paper result reproduction

To reproduce the results from the paper, check out the v0.1.1 tag or install the corresponding colpali-engine package release using:

pip install colpali-engine==0.1.1

Citation

ColPali: Efficient Document Retrieval with Vision Language Models

Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo

(* Denotes Equal Contribution)

@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449}, 
}
