Code used to train ColPali

ColPali: Efficient Document Retrieval with Vision Language Models 👀

[Model card] [ViDoRe Benchmark] [ViDoRe Leaderboard] [Demo] [Blog Post]

[!TIP] If you want to try the pre-trained ColPali on your own documents, you can use the vidore-benchmark repository. It comes with a Python package and a CLI tool for convenient evaluation. You can also use code provided in the model cards on the hub.

Associated Paper

ColPali: Efficient Document Retrieval with Vision Language Models

Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (*Equal Contribution)

This repository contains the code used for training the vision retrievers in the paper. In particular, it contains the code for training the ColPali model, which is a vision retriever based on the ColBERT architecture.
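For readers unfamiliar with ColBERT, its scoring rule is late interaction: the query and the document are each encoded as a set of vectors, and the relevance score is the sum, over query vectors, of the maximum similarity to any document vector. The snippet below is a minimal pure-Python sketch of that rule. The toy 2-dimensional vectors stand in for real query-token and page-patch embeddings, and the function names are illustrative assumptions, not the repository's implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_embeddings, page_embeddings):
    """ColBERT-style MaxSim: for each query vector, keep the best-matching
    page vector's similarity, then sum these maxima over the query."""
    return sum(
        max(dot(q, d) for d in page_embeddings)
        for q in query_embeddings
    )


# Toy data (hypothetical): two query vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5]]

print(late_interaction_score(query, page_a))  # 2.0
print(late_interaction_score(query, page_b))  # 1.0
```

Here page_a ranks above page_b because each query vector finds a close match among its vectors.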

Setup

We used Python 3.11.6 and PyTorch 2.2.2 to train and test our models, but the codebase is expected to be compatible with Python >=3.9 and recent PyTorch versions.

The evaluation codebase depends on a few Python packages, which can be installed with the following command:

pip install colpali-engine

To keep the package lightweight, the default installation includes only the essential dependencies. To use the training script for ColPali, you must also install the training dependencies, which you can do with the following command:

pip install "colpali-engine[train]"

[!TIP] For ColPali versions above v1.0, make sure to install the colpali-engine package from source or with a version above v0.2.0.
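To check which colpali-engine version is installed before loading a newer ColPali checkpoint, something like the following sketch could be used. It relies only on the standard library's importlib.metadata; the helper names are hypothetical, and the version parsing is deliberately simple (it ignores pre-release suffixes such as "rc1").

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '0.2.2' into (0, 2, 2). Parsing stops at the first
    non-numeric component, and short versions are padded with zeros."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_newer_than(installed, minimum):
    """True if `installed` is strictly above `minimum`."""
    return parse_release(installed) > parse_release(minimum)


def check_colpali_engine(minimum="0.2.0"):
    """Return None if colpali-engine is not installed, otherwise
    whether the installed version is strictly above `minimum`."""
    try:
        installed = version("colpali-engine")
    except PackageNotFoundError:
        return None
    return is_newer_than(installed, minimum)


if __name__ == "__main__":
    print(check_colpali_engine())
```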

Usage

The scripts/ directory contains scripts to run training and inference.

Inference

While there is an inference script in this repository, it's recommended to run inference using the vidore-benchmark package.

Training

All the model configs used can be found in scripts/configs/ and rely on the configue package for straightforward configuration. They should be used with the train_colbert.py script.

Example 1: Local training

USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/siglip/train_siglip_model_debug.yaml

or using accelerate:

accelerate launch scripts/train/train_colbert.py scripts/configs/train_colidefics_model.yaml

Example 2: Training on a SLURM cluster

sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1 -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/train_colidefics_model.yaml"

sbatch --nodes=1 --time=5:00:00 -A cad15443 --gres=gpu:8 --constraint=MI250 --job-name=colpali --wrap="python scripts/train/train_colbert.py scripts/configs/train_colpali_model.yaml"
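When submitting several configs, it can be convenient to assemble these sbatch commands programmatically. The helper below is a hypothetical sketch, not part of the repository: it builds the argument list for a single-node job and leaves out cluster-specific options such as the partition, account, and constraint shown above, which you would add for your own cluster.

```python
import shlex


def build_sbatch_command(config_path, job_name, gpus=1,
                         time_limit="20:00:00", use_accelerate=True):
    """Assemble an sbatch argument list that wraps the training command.

    Hypothetical helper: only options that appear in the examples above
    are emitted; partition, account and constraint are left to the caller.
    """
    launcher = ["accelerate", "launch"] if use_accelerate else ["python"]
    train_cmd = launcher + ["scripts/train/train_colbert.py", config_path]
    return [
        "sbatch",
        "--nodes=1",
        f"--gres=gpu:{gpus}",
        f"--time={time_limit}",
        f"--job-name={job_name}",
        # sbatch receives the whole training command as one --wrap value.
        f"--wrap={shlex.join(train_cmd)}",
    ]


if __name__ == "__main__":
    cmd = build_sbatch_command(
        "scripts/configs/train_colpali_model.yaml",
        job_name="colpali",
        gpus=8,
        time_limit="5:00:00",
        use_accelerate=False,
    )
    # Print a shell-ready command; pass `cmd` to subprocess.run to submit it.
    print(shlex.join(cmd))
```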

Paper result reproduction

To reproduce the results from the paper, check out the v0.1.1 tag or install the corresponding colpali-engine package release using:

pip install colpali-engine==0.1.1

Citation

ColPali: Efficient Document Retrieval with Vision Language Models

  • First authors: Manuel Faysse*, Hugues Sibille*, Tony Wu* (*Equal Contribution)
  • Contributors: Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo

@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449}, 
}
