
[CVPR24] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Establishing an automatic evaluation metric that closely aligns with human judgements is essential for the effective development of image captioning models. Data-driven metrics have recently gained prominence in this field, demonstrating a stronger correlation with human judgements than classic metrics such as CIDEr and SPICE. However, these approaches pose challenges; for instance, they lack sufficient capabilities to handle hallucinations and to generalize across various types of images and texts. This limitation is partly attributed to the fact that existing approaches compute scalar similarities merely using embeddings learned from tasks that are not directly related to image captioning evaluation. In this study, we propose Polos, a supervised automatic evaluation metric tailored for image captioning models. To enhance robustness and practicality, we also present Multimodal Metric Learning from Human Feedback (M²LHF), a novel framework for developing metrics based on human feedback. In line with the principles of M²LHF, Polos is trained directly from human feedback and computes evaluation scores using multimodal inputs, employing a parallel feature extraction mechanism that leverages SimCSE and CLIP. This mechanism enables our metric to effectively model intricate relationships within the vector space of text-image pairs as well as text-text pairs. In addition, we have constructed a large-scale dataset for M²LHF, which comprises 131K human judgements collected from 550 evaluators. Our dataset further distinguishes itself from existing datasets in terms of the inclusion of diverse captions, which are collected from humans and generated from ten image captioning models, including modern models. Our approach has achieved state-of-the-art performance on various image captioning benchmarks, including Composite, Flickr8K-Expert, Flickr8K-CF, FOIL, and our dataset, demonstrating its effectiveness and robustness.
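For intuition, the parallel feature extraction described above can be sketched as follows. This is a minimal pure-Python illustration that assumes CLIP and SimCSE embeddings have already been computed; it is not the released implementation, and `parallel_features` is a hypothetical helper name:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def parallel_features(img_clip, cand_clip, ref_clip, cand_simcse, ref_simcse):
    """Illustrative parallel feature vector combining image-text (CLIP space)
    and text-text (SimCSE space) relations. The actual metric feeds richer
    features into a regression head trained on human judgements; that
    learned head is omitted here."""
    return [
        cosine(img_clip, cand_clip),      # image vs. candidate caption (CLIP)
        cosine(img_clip, ref_clip),       # image vs. reference caption (CLIP)
        cosine(cand_clip, ref_clip),      # candidate vs. reference (CLIP)
        cosine(cand_simcse, ref_simcse),  # candidate vs. reference (SimCSE)
    ]
```

The point of the two branches is that the CLIP similarities capture image grounding (and hence hallucination), while the SimCSE similarity captures purely textual agreement with the references.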

Instructions

We assume the following environment for our experiments:

  • Python 3.10.0 (pyenv is strongly recommended)
  • Poetry for dependency management (refer to Poetry documentation)
  • PyTorch version 2.1.0 with CUDA 11.8 support
  • PyTorch Lightning for model training

Clone & Install

git clone git@github.com:keio-smilab24/Polos.git
cd Polos
pyenv virtualenv 3.10.0 polos
pyenv local polos
sh install.sh # cuda=11.8

Datasets

  • Polaris
    • The Polaris dataset can be downloaded at this link.
    • Unzip and extract the contents into the data_en directory.
  • Flickr8k
    • We evaluate Flickr8K following the PAC-S pre-processing.
    • Download the dataset from this link provided by the PAC-S authors.
    • Once you have downloaded the dataset, place the files under the data_en/flickr8k folder.
  • Composite / PASCAL-50S / FOIL
    • For the Composite, PASCAL-50S, and FOIL datasets, download them from the following links:
    • Composite
    • PASCAL-50S
    • FOIL
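The two locations named above (`data_en` and `data_en/flickr8k`) can be sanity-checked before training. The snippet below is an illustrative helper, not part of the repository:

```python
from pathlib import Path

# Dataset directories named in the instructions above.
EXPECTED_DIRS = ["data_en", "data_en/flickr8k"]

def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected dataset directories that are missing under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs(".")
    print("missing:", missing or "none")
```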

Checkpoint

The best checkpoint can be downloaded at this link. Unzip and extract the checkpoints.

Train

sh train.sh

Evaluation

Evaluating PAC-S requires its checkpoints.

Download them following the instructions in the authors' GitHub repository and place them in the specified locations.

sh validate.sh
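Benchmarks such as Flickr8K-Expert are reported as rank correlation between metric scores and human judgements. For reference, a minimal pure-Python Kendall's tau (tau-a, no tie correction); the evaluation scripts use their own, properly tie-corrected implementation:

```python
from itertools import combinations

def kendall_tau(scores, ratings):
    """Kendall's tau-a between two equal-length sequences:
    (concordant - discordant) / total pairs. Tied pairs count as neither."""
    assert len(scores) == len(ratings)
    concordant = discordant = 0
    for (s1, r1), (s2, r2) in combinations(zip(scores, ratings), 2):
        prod = (s1 - s2) * (r1 - r2)
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n = len(scores)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Metric scores perfectly aligned with human ratings -> tau = 1.0
print(kendall_tau([0.1, 0.4, 0.9], [1, 3, 5]))  # 1.0
```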
