The code used to train and run inference with the ColPali architecture.
ColPali: Efficient Document Retrieval with Vision Language Models 👀
[Model card] [ViDoRe Benchmark] [ViDoRe Leaderboard] [Demo] [Blog Post]
[!TIP] For production usage in your RAG pipelines, we recommend using the byaldi package, a lightweight wrapper around the colpali-engine package developed by the author of the popular RAGatouille repository. 🐭
Associated Paper
This repository contains the code used for training the vision retrievers in the ColPali: Efficient Document Retrieval with Vision Language Models paper. In particular, it contains the code for training the ColPali model, which is a vision retriever based on the ColBERT architecture.
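Like ColBERT, ColPali scores a query against a document page with a late-interaction (MaxSim) operation: each query-token embedding is matched against its most similar document-patch embedding, and the per-token maxima are summed. A minimal plain-Python sketch of this scoring rule, using toy 2-D embeddings (an illustration of the principle, not the actual implementation):

```python
def maxsim_score(query_embeddings, doc_embeddings):
    """Late-interaction (MaxSim) score: for each query-token vector,
    take the maximum dot product over all document-patch vectors,
    then sum those maxima over the query tokens."""
    score = 0.0
    for q in query_embeddings:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_embeddings)
        score += best
    return score

# Toy example: two query-token embeddings, three page-patch embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
page = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim_score(query, page))  # ≈ 0.9 + 0.8 = 1.7
```

In the real model, the query embeddings come from text tokens and the document embeddings from image patches, but the interaction step is this same sum-of-max operation.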
Setup
We used Python 3.11.6 and PyTorch 2.2.2 to train and test our models, but the codebase is compatible with Python >=3.9 and recent PyTorch versions.
The eval codebase depends on a few Python packages, which can be installed with the following command:
pip install colpali-engine
[!WARNING] For ColPali versions above v1.0, make sure to install the colpali-engine package from source or with a version above v0.2.0.
Usage
Inference
This repository doesn't contain the code to run optimized retrieval for RAG pipelines. For this, we recommend using byaldi - RAGatouille's little sister 🐭 - which shares a similar API and leverages our colpali-engine package.
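Whichever wrapper you use, retrieval at query time reduces to ranking pre-computed page embeddings by their late-interaction score against the query. The self-contained sketch below uses made-up toy embeddings (in practice the VLM produces them); the function names and corpus layout are illustrative assumptions, not the packages' API:

```python
def maxsim(query_emb, page_emb):
    # Sum over query tokens of the max dot product with any page patch.
    return sum(
        max(sum(qi * pi for qi, pi in zip(q, p)) for p in page_emb)
        for q in query_emb
    )

def retrieve(query_emb, corpus, k=2):
    """Rank pages by MaxSim score and return the top-k page ids."""
    ranked = sorted(corpus.items(), key=lambda kv: maxsim(query_emb, kv[1]), reverse=True)
    return [page_id for page_id, _ in ranked[:k]]

# Toy corpus: page id -> list of patch embeddings (2-D for readability).
corpus = {
    "page_1": [[1.0, 0.0], [0.0, 0.2]],
    "page_2": [[0.0, 1.0], [0.3, 0.3]],
}
query = [[0.0, 1.0]]  # a single query-token embedding
print(retrieve(query, corpus, k=1))  # ['page_2']
```

Because page embeddings are computed offline and scoring needs only dot products and maxima, this step stays cheap at query time.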
Benchmarking
To benchmark ColPali and reproduce the results on the ViDoRe leaderboard, we recommend using the vidore-benchmark package.
Training
To keep the repository lightweight, only the essential packages are installed by default. In particular, you must install the optional training dependencies to use the training script for ColPali. You can do this using the following command:
pip install "colpali-engine[train]"
All the model configs used can be found in scripts/configs/ and rely on the configue package for straightforward configuration. They should be used with the train_colbert.py script.
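Training optimizes a contrastive objective over in-batch late-interaction scores, pairing each query with its own page as the positive and the other pages in the batch as negatives. The sketch below uses a standard InfoNCE-style cross-entropy as an illustration of that idea; it is an assumption for exposition, not the exact objective implemented in train_colbert.py:

```python
import math

def in_batch_contrastive_loss(scores):
    """scores[i][j] = late-interaction score of query i against page j;
    the matching page for query i sits on the diagonal (j == i).
    Returns the mean negative log-softmax of each row at its positive."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row max to stabilize the softmax
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]  # -log softmax at the positive entry
    return total / len(scores)

# A well-separated batch: each positive dominates its row, so the loss is small.
scores = [
    [5.0, 0.1, 0.2],
    [0.0, 4.0, 0.3],
    [0.2, 0.1, 6.0],
]
print(in_batch_contrastive_loss(scores))
```

Minimizing this loss pushes each query's score with its own page above its scores with the other pages in the batch.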
Example 1: Local training
USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml
or using accelerate:
accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml
Example 2: Training on a SLURM cluster
sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1 -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"
sbatch --nodes=1 --time=5:00:00 -A cad15443 --gres=gpu:8 --constraint=MI250 --job-name=colpali --wrap="python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"
Paper result reproduction
To reproduce the results from the paper, you should check out the v0.1.1 tag or install the corresponding colpali-engine package release using:
pip install colpali-engine==0.1.1
Citation
ColPali: Efficient Document Retrieval with Vision Language Models
Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo
(* Denotes Equal Contribution)
@misc{faysse2024colpaliefficientdocumentretrieval,
title={ColPali: Efficient Document Retrieval with Vision Language Models},
author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
year={2024},
eprint={2407.01449},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2407.01449},
}