Relational Visual Similarity - A new visual similarity notion that captures the internal relational logic of a scene
relsim
We introduce a new visual similarity notion, relational visual similarity (relsim), which captures the internal relational logic of a scene rather than its surface appearance.
Relational Visual Similarity (arXiv 2025)
Thao Nguyen1, Sicheng Mo3, Krishna Kumar Singh2, Yilin Wang2, Jing Shi2, Nicholas Kolkin2, Eli Shechtman2, Yong Jae Lee1,2, ★, Yuheng Li1, ★
(★ Equal advising)
1- University of Wisconsin–Madison | 2- Adobe Research | 3- UCLA
TL;DR: We introduce a new visual similarity notion: relational visual similarity, which complements traditional attribute-based perceptual similarity (e.g., LPIPS, CLIP, DINO).
🔗 Jump to: Installation | 🛠️ Usage | 🫥 Anonymous Captioning Model | 📁 Data | BibTeX |
📦 Installation
Option 1: Install from GitHub (recommended)
pip install git+https://github.com/thaoshibe/relsim.git
Option 2: Install from local directory
git clone https://github.com/thaoshibe/relsim.git
cd relsim
pip install -e . # Install in editable mode
Option 3: Install from PyPI (when published)
pip install relsim
🛠️ Usage
Given two images, you can compute their relational visual similarity like this:
from relsim.relsim_score import relsim
from PIL import Image
# Load model
model, preprocess = relsim(pretrained=True, checkpoint_dir="thaoshibe/relsim-qwenvl25-lora")
img1 = preprocess(Image.open("./anonymous_caption/bo.jpg"))
img2 = preprocess(Image.open("./anonymous_caption/mam.jpg"))
similarity = model(img1, img2) # Returns similarity score (higher = more similar)
print(f"✅ Similarity score: {similarity:.4f}")
For example, you should see results like the following:
| reference image | test image 1 | test image 2 | test image 3 | test image 4 | test image 5 | test image 6 |
|---|---|---|---|---|---|---|
| (to itself: 1.000) | 0.981 | 0.830 | 0.808 | 0.767 | 0.465 | 0.223 |
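The scores in the table suggest a simple retrieval pattern: score each candidate against a reference image and sort by score. Below is a minimal sketch of that pattern. It assumes only that the scorer is a callable taking two preprocessed inputs and returning a number, as `model(img1, img2)` does in the usage snippet. `rank_by_relsim` and the toy scorer are hypothetical names for illustration and are not part of the relsim package.

```python
from typing import Any, Callable, Dict, List, Tuple


def rank_by_relsim(
    score_fn: Callable[[Any, Any], float],
    reference: Any,
    candidates: Dict[str, Any],
) -> List[Tuple[str, float]]:
    """Score every candidate against the reference, highest score first."""
    scored = [
        (name, float(score_fn(reference, item)))
        for name, item in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Stand-in scorer so the sketch runs without downloading the model.
    # With relsim installed, pass the `model` from the usage snippet instead
    # and use preprocessed images as `reference` and as the candidate values.
    def toy_score(a: float, b: float) -> float:
        return 1.0 - abs(a - b)

    ranking = rank_by_relsim(toy_score, 0.5, {"near": 0.55, "far": 0.9})
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

Passing the scorer in as an argument keeps the ranking logic independent of how the model is loaded, so the same helper could be reused with other similarity functions for comparison.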
🤗 You're welcome to improve the current relsim model! The training code is provided in the ./relsim/ folder. To jump straight to the training script (reminder: you need to download the data first to run this code successfully; see the Data section):
cd relsim
# pip install -r requirements_train.txt
bash train.sh # this assumes you already have the dataset
# you might want to export WANDB_API_KEY and HF_TOKEN
# export WANDB_API_KEY='your_wandb_api_key'
# export HF_TOKEN='your_hf_token'
If you use wandb to log the result, your wandb should look like this
🫥 Anonymous Caption Model
Anonymous captions are image captions that do not refer to specific visible objects but instead capture the relational logic conveyed by the image.
The pretrained anonymous caption model (Qwen-VL-2.5 7B) is provided in ./anonymous_caption. This model is trained on a limited number of seed groups and their corresponding generated captions (you can see the training data here).
# run on default test image (mam.jpg)
python anonymous_caption/anonymous_caption.py
# run on your own images
python anonymous_caption/anonymous_caption.py --image_path $PATH_TO_IMAGE_OR_IMAGE_FOLDER
# if you need to see all arguments (e.g., batch size)
python anonymous_caption/anonymous_caption.py --help
Here are examples of the captions generated across different runs.
| Input image | Generated captions (different runs) |
|---|---|
| mam.jpg | Example: python anonymous_caption/anonymous_caption.py --image_path anonymous_caption/mam.jpg<br>Run 1: "Curious {Animal} peering out from behind a {Object}."<br>Run 2: "Curious {Animal} peeking out from behind the {Object} in an unexpected and playful way."<br>Run 3: "Curious {Cat} looking through a {Doorway} into the {Room}."<br>Run 4: "A curious {Animal} peeking from behind a {Barrier}."<br>Run 5: "A {Cat} peeking out from behind a {Door} with curious eyes."<br>... |
| bo.jpg | Example: python anonymous_caption/anonymous_caption.py --image_path anonymous_caption/bo.jpg<br>Run 1: "Animals with {Leaf} artfully placed on their {Head}."<br>Run 2: "A {Dog} with a {Leaf} delicately placed on its head."<br>Run 3: "A {Dog} with a {Leaf} artfully placed on its head."<br>Run 4: "A {Dog} with a {Leaf} delicately placed on their head, representing the beauty of {Season}."<br>Run 5: "Animals adorned with {Leaf} in a {Seasonal} setting."<br>... |
You are more than welcome to help improve the anonymous caption model! The current model may hallucinate or produce incorrect results, and sometimes it may generate captions that are not "anonymous enough"...
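In the example captions above, the anonymized slots appear in curly braces (for instance `{Animal}` or `{Object}`). Here is a minimal sketch of how such captions could be inspected or filled in. `extract_placeholders` and `fill_placeholders` are hypothetical helpers, not part of the repository, and the brace convention is inferred only from the sample outputs shown here.

```python
import re
from typing import Dict, List

# Matches one brace-delimited slot such as {Animal}; nested braces are not handled.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def extract_placeholders(caption: str) -> List[str]:
    """Return slot names in order of appearance."""
    return PLACEHOLDER.findall(caption)


def fill_placeholders(caption: str, values: Dict[str, str]) -> str:
    """Substitute known slots; leave unknown ones untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), caption)


caption = "Curious {Animal} peering out from behind a {Object}."
print(extract_placeholders(caption))  # ['Animal', 'Object']
print(fill_placeholders(caption, {"Animal": "cat", "Object": "door"}))
```

Comparing the slot names across runs (for example `{Animal}` versus `{Cat}`) is one rough way to spot captions that are not "anonymous enough", though what counts as sufficiently anonymous is a judgment call this sketch does not encode.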
The training script for the anonymous caption model is shown below. Please check config.yaml for config details.
#########################################
#
# train anonymous caption model
#
#########################################
# (optional) install git lfs if you don't have it
sudo apt update
sudo apt install git-lfs
git lfs install
# clone the repo if you haven't done so already
git clone https://github.com/thaoshibe/relsim.git
cd relsim
# download the training data
cd anonymous_caption
git clone https://huggingface.co/datasets/thaoshibe/seed-groups
pip install -r requirements.txt
# run train
python anonymous_caption_train.py
*If you choose to log to wandb, your wandb should look like image below. Checkpoints will be saved in `./anonymous_caption/ckpt`.*
And your console should look like this:
📁 Data
🔍 You can see the snapshot of the data on this live website: 🔍🔍🔍 relsim: data viewer
| Dataset name | Short description | JSON file | 🔍 Data viewer |
|---|---|---|---|
| seed-groups | Used to train the anonymous captioning model | seed_group.json | See Seed Groups Dataset |
| anonymous-captions-114k | Used to train the relational similarity model | anonymous_captions_train.jsonl, anonymous_captions_test.jsonl | See Anonymous Captions Dataset |
Each image is referenced by its image URL. Please see the JSON files in ./data.
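The field names inside these files are not documented here, so a cautious first step is to load a few records and look at their keys. The sketch below assumes only that the `.jsonl` files hold one JSON object per line (the usual JSON Lines convention). `read_jsonl` is a hypothetical helper, and the path in the commented usage is a guess based on the file names in the table.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # Inline sample so the sketch runs anywhere; these keys are made up.
    sample = ['{"id": 1, "text": "a"}', "", '{"id": 2, "text": "b"}']
    for record in read_jsonl(sample):
        print(sorted(record.keys()))
    # With the repository checked out, something like this should work
    # (path assumed, not verified):
    # with open("data/anonymous_captions_train.jsonl", encoding="utf-8") as f:
    #     first = next(read_jsonl(f))
    #     print(first.keys())
```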
(Optional) Depending on your internet speed, downloading all images should take under 0.5 hours with the default MAX_WORKER = 64.
You can increase MAX_WORKER to speed up the download, or reduce it, depending on your machine (see data/download_data.sh).
To download, run data/download_data.sh:
#########################################
#
# download data
#
#########################################
git clone https://github.com/thaoshibe/relsim.git
cd relsim
bash data/download_data.sh # this script downloads all datasets
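To illustrate what a worker-count setting such as `MAX_WORKER` typically controls, here is a minimal thread-pool sketch. It is not the logic of `data/download_data.sh`, which is not reproduced here. `download_all`, `fetch`, and the example URLs are hypothetical, and the fetch function is injected so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 64,
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Fetch URLs concurrently; return (successes, failures) keyed by URL."""
    ok: Dict[str, bytes] = {}
    failed: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                ok[url] = future.result()
            except Exception as exc:  # one bad URL should not abort the batch
                failed[url] = exc
    return ok, failed


if __name__ == "__main__":
    # Fake fetcher standing in for a real HTTP request.
    def fake_fetch(url: str) -> bytes:
        if url.endswith("missing.jpg"):
            raise IOError("not found")
        return url.encode("utf-8")

    done, errors = download_all(
        ["https://example.com/a.jpg", "https://example.com/missing.jpg"],
        fake_fetch,
        max_workers=4,
    )
    print(len(done), "downloaded;", len(errors), "failed")
```

More workers mean more requests in flight at once, which helps on a fast connection but can overload a slow machine or network; that is the trade-off behind tuning the worker count.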
Disclaimer
All images are extracted from the LAION dataset. We do NOT own any of the images, and we acknowledge the rights and contributions of the original creators. Please respect the authors of all images. These images are used for research purposes only.
BibTeX
@article{nguyen2025relsim,
title={Relational Visual Similarity},
author={Nguyen, Thao and Mo, Sicheng and Singh, Krishna Kumar and Wang, Yilin and Shi, Jing and Kolkin, Nicholas and Shechtman, Eli and Lee, Yong Jae and Li, Yuheng},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2025}
}
---
The end; Thank you!