A wrapper package for CLIP, providing explanations for how the model compares images and captions.
exCLIP
This repository contains the code for the TMLR'25 paper Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions.
We are still cleaning up the code to make it easily accessible and will be updating this repo over the next couple of days. If you would like to follow along, we would be glad if you left a star! 🤩
Contribution
Our method makes it possible to inspect which parts of a caption and an image CLIP matches. We can make arbitrary selections over spans in captions and see which image regions correspond to them, or vice versa. This is demonstrated in the following plot.
In the top row, we select spans in captions (yellow) and see what they correspond to in the image above. In the bottom row, we select bounding boxes in the image (yellow) and see what they correspond to in the caption below. Heatmaps in both images and captions are red for positive and blue for negative values.
For all details, check out the paper!
Installation
To use our exclip package, install it with pip (the version on PyPI is currently behind this repository and will be updated soon):
$ pip install exclip
You also need to install OpenAI's clip package with the following command (since it is not available on PyPI):
$ pip install git+https://github.com/openai/CLIP.git
Alternatively, you can directly install this repository:
$ pip install git+https://github.com/lucasmllr/exCLIP
or clone it and run $ pip install . inside the cloned directory.
Both of these options also install the clip package, so no separate step is needed.
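To confirm that both packages are importable after installation, a small check like the following can help. This is a minimal sketch, not part of exclip: the helper name `missing_packages` is hypothetical, and the module names `exclip` and `clip` are taken from the import statements in the example under Getting started.

```python
from importlib.util import find_spec


def missing_packages(names=("exclip", "clip")):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
print("missing packages:", ", ".join(missing) if missing else "none")
```

If `clip` is reported as missing after installing from PyPI, the separate OpenAI CLIP install command above is the step that was skipped.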
Getting started
The following minimal example initializes a clip model, wraps it into our Explainer and computes interaction explanations for a given image-caption pair.
import clip
from PIL import Image
from exclip import Explainer
from exclip.models.tokenization import ClipTokenizer
device = 'cuda:1'  # adjust to a device available on your machine, e.g. 'cuda:0'
model, prep = clip.load('ViT-B/16', device=device)
tokenizer = ClipTokenizer()
explainer = Explainer(model, device=device)
image = Image.open("examples/dogs.jpg")
caption = 'A white husky and a black dog running in a snow covered forest.'
txt_inpt = tokenizer.tokenize(caption).to(device)
img_inpt = prep(image).unsqueeze(0).to(device)
# computing explanations for all token-patch interactions between the image and caption
interactions = explainer.explain(txt_inpt, img_inpt)
This example is also included in minimal_example_openai.py, and minimap_example_open_clip.py contains an equivalent for an OpenClip model.
The demo.ipynb notebook also shows how to visualize the resulting explanations for different token ranges in the caption.
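To illustrate the idea of selecting a caption span or an image region, the sketch below aggregates a token-patch interaction array by summation. It rests on assumptions that are not confirmed by this README: that the interactions can be arranged as an array of shape (tokens, patch rows, patch columns), and that summing over a selection is an appropriate aggregation. The function names are hypothetical and the data is random, so check demo.ipynb for the actual output format and visualization code.

```python
import numpy as np


def span_to_heatmap(interactions, start, end):
    """Sum interactions over caption tokens [start, end) to get one value per image patch."""
    return interactions[start:end].sum(axis=0)


def box_to_token_scores(interactions, top, bottom, left, right):
    """Sum interactions over patch rows [top, bottom) and columns [left, right)
    to get one value per caption token."""
    return interactions[:, top:bottom, left:right].sum(axis=(1, 2))


# Toy stand-in for real explanations (ASSUMED layout): 6 tokens on a 14x14 patch grid,
# which is the grid a ViT-B/16 model produces for a 224x224 input.
rng = np.random.default_rng(0)
toy_interactions = rng.normal(size=(6, 14, 14))

heatmap = span_to_heatmap(toy_interactions, start=1, end=3)   # image heatmap for tokens 1-2
scores = box_to_token_scores(toy_interactions, 0, 7, 0, 7)    # token scores for top-left box

print(heatmap.shape, scores.shape)
```

Positive entries would correspond to the red regions described above and negative entries to the blue ones.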