Relational Visual Similarity - A new visual similarity notion that captures the internal relational logic of a scene

relsim

We introduce a new visual similarity notion — relational visual similarity (relsim) — which captures the internal relational logic of a scene rather than its surface appearance.

Relational Visual Similarity (arXiv 2025)
Thao Nguyen¹, Sicheng Mo³, Krishna Kumar Singh², Yilin Wang², Jing Shi², Nicholas Kolkin², Eli Shechtman², Yong Jae Lee¹,²,★, Yuheng Li¹,★
(★ Equal advising)
¹ University of Wisconsin–Madison | ² Adobe Research | ³ UCLA

TL;DR: We introduce a new visual similarity notion: relational visual similarity, which complements traditional attribute-based perceptual similarity (e.g., LPIPS, CLIP, DINO).


🔗 Jump to: Installation | 🛠️ Usage | 🫥 Anonymous Captioning Model | 📁 Data | BibTeX |


📦 Installation

Option 1: Install from GitHub (recommended)

pip install git+https://github.com/thaoshibe/relsim.git

Option 2: Install from local directory

git clone https://github.com/thaoshibe/relsim.git
cd relsim
pip install -e .  # Install in editable mode

Option 3: Install from PyPI (when published)

pip install relsim
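
Whichever option you use, it can help to confirm that the package is visible to the interpreter before loading any model weights. The snippet below is a minimal sketch that uses only the Python standard library; `describe_install` is a hypothetical helper name, not part of relsim.

```python
import importlib.util
from importlib import metadata


def describe_install(package: str = "relsim") -> str:
    """Return a one-line status string for an installed (or missing) package."""
    # find_spec locates a top-level module without importing it.
    if importlib.util.find_spec(package) is None:
        return f"{package}: not installed"
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable (e.g. found on sys.path) but without distribution metadata.
        version = "unknown version"
    return f"{package}: installed ({version})"


print(describe_install("relsim"))
```

An editable install (Option 2) normally still reports a version, since pip records distribution metadata for it.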

🛠️ Usage

Given two images, you can compute their relational visual similarity like this:

from relsim.relsim_score import relsim
from PIL import Image

# Load model
model, preprocess = relsim(pretrained=True, checkpoint_dir="thaoshibe/relsim-qwenvl25-lora")

img1 = preprocess(Image.open("./anonymous_caption/bo.jpg"))
img2 = preprocess(Image.open("./anonymous_caption/mam.jpg"))
similarity = model(img1, img2)  # Returns similarity score (higher = more similar)
print(f"✅ Similarity score: {similarity:.4f}")

For example, you should see results like these:

| reference image | test image 1 | test image 2 | test image 3 | test image 4 | test image 5 | test image 6 |
|---|---|---|---|---|---|---|
| (to itself: 1.000) | 0.981 | 0.830 | 0.808 | 0.767 | 0.465 | 0.223 |
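
To produce a ranking like the one above for your own images, you can score each candidate against one reference and sort by score. The following is a minimal sketch, not code from the repository: `rank_by_similarity` and `toy_score` are hypothetical names, and the scorer is passed in as a plain callable so the example runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_by_similarity(
    score_fn: Callable[[T, T], float],
    reference: T,
    candidates: Sequence[T],
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs sorted from most to least similar."""
    scored = [(i, float(score_fn(reference, c))) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(a: float, b: float) -> float:
    """Stand-in scorer for illustration only: closer numbers score higher."""
    return 1.0 - abs(a - b)


for idx, score in rank_by_similarity(toy_score, 0.5, [0.1, 0.45, 0.8]):
    print(f"candidate {idx}: {score:.3f}")
```

With the real model, you would pass `model` as `score_fn` and preprocessed images as `reference` and `candidates`, assuming `model(img1, img2)` returns a value that `float()` accepts, as the Usage snippet suggests.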

🤗 You're welcome to improve the current relsim model! The training code is provided in the ./relsim/ folder. For a quick jump to the training script (reminder: you need to download the data here to run this code successfully):

cd relsim
# pip install -r requirements_train.txt
bash train.sh # this assumes you already have the dataset

# you might want to export WANDB_API_KEY and HF_TOKEN
# export WANDB_API_KEY='your_wandb_api_key'
# export HF_TOKEN='your_hf_token'
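
Before launching a long training run, you may want to check that these variables are actually set. This is a small sketch under the assumption that `WANDB_API_KEY` and `HF_TOKEN` are the only variables of interest; `missing_env_vars` is a hypothetical helper, and the training script itself may not require either variable.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    names: Iterable[str] = ("WANDB_API_KEY", "HF_TOKEN"),
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All expected variables are set.")
```
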

If you use wandb to log the results, your wandb dashboard should look like this:

🫥 Anonymous Caption Model

Anonymous captions are image captions that do not refer to specific visible objects but instead capture the relational logic conveyed by the image.

The pretrained anonymous caption model (Qwen2.5-VL 7B) is provided in ./anonymous_caption. It was trained on a limited number of seed groups and their corresponding generated captions (you can see the training data here).


# run on default test image (mam.jpg)
python anonymous_caption/anonymous_caption.py

# run on your own images
python anonymous_caption/anonymous_caption.py --image_path $PATH_TO_IMAGE_OR_IMAGE_FOLDER

# if you need to see all arguments (e.g., batch size)
python anonymous_caption/anonymous_caption.py --help

Here are examples of captions generated across different runs.

Input image Generated captions (Different run)
Example: python anonymous_caption/anonymous_caption.py --image_path anonymous_caption/mam.jpg
Run 1: "Curious {Animal} peering out from behind a {Object}."
Run 2: "Curious {Animal} peeking out from behind the {Object} in an unexpected and playful way."
Run 3: "Curious {Cat} looking through a {Doorway} into the {Room}."
Run 4: "A curious {Animal} peeking from behind a {Barrier}."
Run 5: "A {Cat} peeking out from behind a {Door} with curious eyes."
...
Example: python anonymous_caption/anonymous_caption.py --image_path anonymous_caption/bo.jpg
Run 1: "Animals with {Leaf} artfully placed on their {Head}."
Run 2: "A {Dog} with a {Leaf} delicately placed on its head."
Run 3: "A {Dog} with a {Leaf} artfully placed on its head."
Run 4: "A {Dog} with a {Leaf} delicately placed on their head, representing the beauty of {Season}."
Run 5: "Animals adorned with {Leaf} in a {Seasonal} setting."
...
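
The captions above mark their abstracted slots with curly braces, such as `{Animal}` or `{Leaf}`. If you post-process captions, a small parser for these slots can be handy. The sketch below assumes the model emits the braces literally, as in the examples; `extract_placeholders` and `fill_caption` are hypothetical helpers, not part of the repository.

```python
import re
from typing import Dict, List

# Matches slots written as {Name}, as in the example captions above.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def extract_placeholders(caption: str) -> List[str]:
    """Return the slot names of a caption in order of appearance."""
    return PLACEHOLDER.findall(caption)


def fill_caption(caption: str, values: Dict[str, str]) -> str:
    """Substitute slots that have a value; leave unknown slots untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), caption)


caption = "Curious {Animal} peering out from behind a {Object}."
print(extract_placeholders(caption))
print(fill_caption(caption, {"Animal": "kitten", "Object": "curtain"}))
```

A caption with no slots at all returns an empty list from `extract_placeholders`, which may be a rough signal that it is not "anonymous enough".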

You are more than welcome to help improve the anonymous caption model! The current model may hallucinate or produce incorrect results, and sometimes it may generate captions that are not "anonymous enough"...

The training script for the anonymous caption model is shown below. Please check config.yaml for config details.

#########################################
#
#     train anonymous caption model 
#
#########################################

# (optional) install git lfs if you don't have it
sudo apt update
sudo apt install git-lfs
git lfs install

# clone the repo if you haven't done so already
git clone https://github.com/thaoshibe/relsim.git
cd relsim

# download the training data
cd anonymous_caption
git clone https://huggingface.co/datasets/thaoshibe/seed-groups
pip install -r requirements.txt
# run train
python anonymous_caption_train.py

*If you choose to log to wandb, your wandb dashboard should look like the image below. Checkpoints will be saved in `./anonymous_caption/ckpt`.*

And your console should look like this:

📁 Data

🔍 You can see a snapshot of the data on this live website: 🔍🔍🔍 relsim: data viewer

| Dataset name | Short description | JSON file | 🔍 Data viewer |
|---|---|---|---|
| seed-groups (HuggingFace Dataset) | Used to train the anonymous captioning model | seed_group.json | See Seed Groups Dataset |
| anonymous-captions-114k (HuggingFace Dataset) | Used to train the relational similarity model | anonymous_captions_train.jsonl, anonymous_captions_test.jsonl | See Anonymous Captions Dataset |

Each image is referenced by its corresponding image URL. Please see the JSON files in ./data.
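
To inspect the `.jsonl` files before downloading anything, you can read them one record at a time. This is a minimal sketch of a generic JSON Lines reader; `iter_jsonl` is a hypothetical helper, and the record fields are not assumed here, so print the keys of the first record to see the actual schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `next(iter_jsonl("data/anonymous_captions_train.jsonl")).keys()` would list the fields of the first record, assuming the file sits at that path in your checkout.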

(Optional) Depending on your internet speed, downloading all images should take under 0.5 hours with the default MAX_WORKER = 64. Increase MAX_WORKER to speed up the download, or reduce it to suit your machine (see data/download_data.sh).

To download the data, run data/download_data.sh:

#########################################
#
#            download data
#
#########################################

git clone https://github.com/thaoshibe/relsim.git
cd relsim
bash data/download_data.sh # this script will download all dataset
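
For intuition about what `MAX_WORKER` controls, the sketch below shows a generic thread-pool download pattern in Python. It is not the contents of `data/download_data.sh`, whose implementation may differ; `fetch_bytes` and `download_all` are hypothetical helpers, and only the name `MAX_WORKER` and its default of 64 come from the text above.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

MAX_WORKER = 64  # mirrors the default mentioned above


def fetch_bytes(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Download one URL; return None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return None


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[bytes]] = fetch_bytes,
    max_workers: int = MAX_WORKER,
) -> Dict[str, Optional[bytes]]:
    """Fetch all URLs concurrently and map each URL to its bytes (or None)."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with url_list.
        results = list(pool.map(fetch, url_list))
    return dict(zip(url_list, results))
```

Because the work is I/O-bound, more workers usually help until bandwidth or the remote hosts become the bottleneck. Entries that map to `None` are failed downloads, which is expected for some URLs in web-scraped datasets.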

Disclaimer

All images are extracted from the LAION dataset. We do NOT own any of the images, and we acknowledge the rights and contributions of the original creators. Please respect the authors of all images. These images are used for research purposes only.


BibTeX

@article{nguyen2025relsim,
  title={Relational Visual Similarity},
  author={Nguyen, Thao and Mo, Sicheng and Singh, Krishna Kumar and Wang, Yilin and Shi, Jing and Kolkin, Nicholas and Shechtman, Eli and Lee, Yong Jae and Li, Yuheng},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}

---
The end; Thank you!
