A wrapper package for CLIP, providing explanations for how the model compares images and captions.

Project description

exCLIP

This repository contains the code for the TMLR'25 paper Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions.

We are still cleaning up the code to make it easily accessible and will be updating this repo over the next couple of days. If you want to stay tuned, we would be glad if you left a star! 🤩

Contribution

Our method makes it possible to see which parts of a caption and an image CLIP matches. We can make arbitrary selections over spans in captions and see which image regions correspond to them, or vice versa. This is demonstrated in the following plot.

[Figure: example explanations]

In the top row, we select spans in the captions (yellow) and see what they correspond to in the image above. In the bottom row, we select bounding boxes in the image (yellow) and see what they correspond to in the caption below. Heatmaps in both images and captions are red for positive and blue for negative values.
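Conceptually, both directions shown in the plot can be thought of as aggregating a token-by-patch interaction matrix: summing the rows of a selected caption span gives one score per image patch, and summing the patches inside a bounding box gives one score per caption token. The following NumPy sketch illustrates this under stated assumptions: the explanation is taken to be a 2D array of shape (num_tokens, num_patches) with patches in row-major order, spans and boxes are aggregated by plain summation, and the patch size is 16 pixels as in ViT-B/16 (a 224×224 input then gives a 14×14 patch grid). `span_to_heatmap`, `pixel_box_to_patch_box` and `box_to_token_scores` are hypothetical helpers, not part of the exclip API, and the actual output of `explainer.explain` may be laid out differently.

```python
import numpy as np


def span_to_heatmap(interactions, start, end, grid):
    """Caption span -> image: sum the rows of tokens start..end-1 and return
    a (grid, grid) heatmap with one score per patch.

    Assumption (hypothetical layout): interactions has shape
    (num_tokens, grid * grid) with patches in row-major order."""
    if interactions.shape[1] != grid * grid:
        raise ValueError("number of patches does not match grid * grid")
    return interactions[start:end].sum(axis=0).reshape(grid, grid)


def pixel_box_to_patch_box(x0, y0, x1, y1, patch_size=16):
    """Convert an end-exclusive pixel box on the preprocessed image into
    end-exclusive patch coordinates (row0, col0, row1, col1), keeping every
    patch the box overlaps."""
    return (
        y0 // patch_size,
        x0 // patch_size,
        -(-y1 // patch_size),  # ceiling division
        -(-x1 // patch_size),
    )


def box_to_token_scores(interactions, patch_box, grid):
    """Image box -> caption: for every token, sum the interactions of all
    patches inside patch_box. Same layout assumption as above."""
    row0, col0, row1, col1 = patch_box
    per_patch = interactions.reshape(interactions.shape[0], grid, grid)
    return per_patch[:, row0:row1, col0:col1].sum(axis=(1, 2))


# Toy data standing in for real explanations: 12 tokens, 14 x 14 patches.
rng = np.random.default_rng(0)
toy = rng.normal(size=(12, 14 * 14))

heatmap = span_to_heatmap(toy, start=2, end=4, grid=14)
print(heatmap.shape)  # (14, 14)

box = pixel_box_to_patch_box(40, 60, 120, 200)  # hypothetical box: x0, y0, x1, y1
token_scores = box_to_token_scores(toy, box, grid=14)
# Following the colour convention above: positive -> red, negative -> blue.
print(["red" if s > 0 else "blue" if s < 0 else "zero" for s in token_scores])
```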

For all details, check out the paper!

Installation

To use our exclip package, install it with pip (note that the PyPI version currently lags behind this repository and will be updated soon):

$ pip install exclip

You also need to install OpenAI's clip package with the following command (since it is not available on PyPI):

$ pip install git+https://github.com/openai/CLIP.git

Alternatively, you can directly install this repository:

$ pip install git+https://github.com/lucasmllr/exCLIP

or clone it and run $ pip install . inside the cloned directory. Both of these options also install the clip package automatically.
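Whichever installation route you choose, a quick way to confirm that both the `clip` and `exclip` modules can be found is Python's standard `importlib.util.find_spec`. This is a small convenience sketch, not part of the package; `check_installed` is a hypothetical helper. Note that `find_spec` only locates a top-level module without running it, so a package that is installed but fails on import (for example because of a missing dependency) would still be reported as found.

```python
import importlib.util


def check_installed(module_names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # 'clip' is OpenAI's package, 'exclip' is this package.
    for name, found in check_installed(["clip", "exclip"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```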

Getting started

The following minimal example initializes a CLIP model, wraps it in our Explainer, and computes interaction explanations for a given image-caption pair.

import clip
from PIL import Image
from exclip import Explainer
from exclip.models.tokenization import ClipTokenizer

device = 'cuda'  # or pick a specific GPU, e.g. 'cuda:1'
model, prep = clip.load('ViT-B/16', device=device)
tokenizer = ClipTokenizer()
explainer = Explainer(model, device=device)

image = Image.open("examples/dogs.jpg")
caption = 'A white husky and a black dog running in a snow covered forest.'
txt_inpt = tokenizer.tokenize(caption).to(device)
img_inpt = prep(image).unsqueeze(0).to(device)

# computing explanations for all token-patch interactions between the image and caption
interactions = explainer.explain(txt_inpt, img_inpt)

This example is also included in minimal_example_openai.py, and minimap_example_open_clip.py contains an equivalent example for an OpenCLIP model.

The demo.ipynb notebook also shows how to visualize the resulting explanations for different token ranges in the caption.
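If you want to render image heatmaps outside the notebook, the red/blue convention described above can be reproduced with a simple colour mapping. The following NumPy sketch is an assumption about how such a visualization might look, not the code used in demo.ipynb: it upsamples a (grid, grid) patch heatmap to pixel resolution by nearest-neighbour repetition, scales it symmetrically by its largest absolute value, and maps positive values to red, negative values to blue and zero to white. `heatmap_to_rgb` is a hypothetical helper; matplotlib's built-in `bwr` colormap with symmetric `vmin`/`vmax` would give a similar result.

```python
import numpy as np


def heatmap_to_rgb(heatmap, patch_size=16):
    """Turn a (grid, grid) patch heatmap into an RGB image in [0, 1]:
    positive -> red, negative -> blue, zero -> white."""
    # Nearest-neighbour upsampling: each patch becomes a patch_size x patch_size block.
    pixels = np.kron(heatmap, np.ones((patch_size, patch_size)))
    # Symmetric normalisation keeps 0 at white and uses one scale for both signs.
    limit = np.abs(pixels).max()
    norm = pixels / limit if limit > 0 else np.zeros_like(pixels)
    pos = np.clip(norm, 0.0, 1.0)
    neg = np.clip(-norm, 0.0, 1.0)
    rgb = np.ones(pixels.shape + (3,))
    rgb[..., 0] -= neg        # red channel fades for negative values
    rgb[..., 1] -= pos + neg  # green channel fades for both signs
    rgb[..., 2] -= pos        # blue channel fades for positive values
    return rgb


toy = np.array([[1.0, -2.0], [0.0, 0.5]])
rgb = heatmap_to_rgb(toy, patch_size=2)
print(rgb.shape)  # (4, 4, 3)
```

The resulting array can be displayed with `matplotlib.pyplot.imshow(rgb)` or alpha-blended with the preprocessed image.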

Download files

Source Distribution

exclip-1.1.tar.gz (10.2 kB)

Built Distribution

exclip-1.1-py3-none-any.whl (9.0 kB)

File details

Details for the file exclip-1.1.tar.gz.

File metadata

  • Download URL: exclip-1.1.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for exclip-1.1.tar.gz
Algorithm Hash digest
SHA256 a9d6f815bf232eb08936481821ebccd993af7fabb8fe94ddbe1c6916b740b8bf
MD5 ff8d3436d05a2b7db911eae3dc799acc
BLAKE2b-256 f4acf9f42c5b004a1ed46ea7fa761e43c36ac528ab82631c33f814a3a0d92674

File details

Details for the file exclip-1.1-py3-none-any.whl.

File metadata

  • Download URL: exclip-1.1-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for exclip-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c200b71ef5aa34e4245b620a1f9731095b4b780eb46657733b166021655ecdd2
MD5 7fe9b67bda22b3a862db8d3ef7f8fd83
BLAKE2b-256 554d7d06b108aaea2acd09d2b8c1febfb1ec5cb635d07ca5645cf5dedfbcdedf
