DreamSim similarity metric

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

Project Page | Paper | Bibtex

Stephanie Fu*$^1$, Netanel Tamir*$^2$, Shobhita Sundaram*$^1$, Lucy Chai$^1$, Richard Zhang$^3$, Tali Dekel$^2$, Phillip Isola$^1$.
(*equal contribution, order decided by random seed)
$^1$ MIT, $^2$ Weizmann Institute of Science, $^3$ Adobe Research.

Summary

Current metrics for perceptual image similarity operate at the level of pixels and patches. These metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level differences in layout, pose, semantic content, etc. Models that use image-level embeddings such as DINO and CLIP capture high-level and semantic judgements, but may not be aligned with human perception of more fine-grained attributes.

DreamSim is a new metric for perceptual image similarity that bridges the gap between "low-level" metrics (e.g. LPIPS, PSNR, SSIM) and "high-level" measures (e.g. CLIP). Our model was trained by concatenating CLIP, OpenCLIP, and DINO embeddings, and then finetuning on human perceptual judgements. We gathered these judgements on a dataset of ~20k image triplets, generated by diffusion models. Our model achieves better alignment with human similarity judgements than existing metrics, and can be used for downstream applications such as image retrieval.

🚀 Newest Updates

10/14/24: We released 4 new variants of DreamSim! These new checkpoints are:

- DINOv2 B/14 and SynCLR B/16 as backbones
- Models trained with the original contrastive loss on both CLS and dense features

These models (and the originals) are further evaluated in our new NeurIPS 2024 paper, When Does Perceptual Alignment Benefit Vision Representations?

We find that our perceptually-aligned models outperform the baseline models on a variety of standard computer vision tasks, including semantic segmentation, depth estimation, object counting, instance retrieval, and retrieval-augmented generation. These results point towards perceptual alignment being a useful task for learning general-purpose vision representations. See the paper and our blog post for more details.

Here's how they perform on NIGHTS:

Model	NIGHTS - Val	NIGHTS - Test
`ensemble`	96.9%	96.2%
`dino_vitb16`	95.6%	94.8%
`open_clip_vitb32`	95.6%	95.3%
`clip_vitb32`	94.9%	93.6%
`dinov2_vitb14`	94.9%	95.0%
`synclr_vitb16`	96.0%	95.9%
`dino_vitb16 (patch)`	94.9%	94.8%
`dinov2_vitb14 (patch)`	95.5%	95.1%
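
For context, the human judgements were gathered on image triplets, so a score like those above can be read as the fraction of triplets on which a metric agrees with the human choice. The sketch below assumes that reading (a reference image, two candidates A and B, and a vote for the candidate judged closer to the reference). The function `two_afc_agreement` and the record layout are hypothetical, written for illustration only, and are not part of the dreamsim package.

```python
from typing import Sequence, Tuple

# Each record: (distance(ref, A), distance(ref, B), human_vote), where
# human_vote is 0 if annotators chose A as closer to the reference, 1 for B.
# This layout is an assumption made for illustration only.
Record = Tuple[float, float, int]


def two_afc_agreement(records: Sequence[Record]) -> float:
    """Fraction of triplets where the smaller distance matches the human vote."""
    if not records:
        raise ValueError("need at least one triplet")
    hits = 0
    for dist_a, dist_b, vote in records:
        predicted = 0 if dist_a < dist_b else 1  # the metric picks the closer image
        hits += int(predicted == vote)
    return hits / len(records)


# Toy distances, not real NIGHTS data.
toy = [(0.21, 0.47, 0), (0.52, 0.30, 1), (0.18, 0.25, 1), (0.40, 0.10, 1)]
score = two_afc_agreement(toy)
print(f"agreement: {score:.2%}")
```

In practice the two distances in each record would come from `model(ref, img_a)` and `model(ref, img_b)`.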

9/14/24: We released new versions of the ensemble and single-branch DreamSim models compatible with peft>=0.2.0.

We also released the entire 100k (unfiltered) NIGHTS dataset and the JND (Just-Noticeable Difference) votes.

Table of Contents

- Requirements
- Setup
- Usage
- NIGHTS Dataset
- Experiments
- Citation

Requirements

- Linux
- Python 3

Setup

Option 1: Install using pip:

pip install dreamsim

The pip package is all you need to import and use the DreamSim model.

Option 2: Clone our repo and install dependencies. This is necessary for running our training/evaluation scripts.

python3 -m venv ds
source ds/bin/activate
pip install -r requirements.txt
export PYTHONPATH="$PYTHONPATH:$(realpath ./dreamsim)"

To install with conda:

conda create -n ds
conda activate ds
conda install pip # verify with the `which pip` command
pip install -r requirements.txt
export PYTHONPATH="$PYTHONPATH:$(realpath ./dreamsim)"
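
After either option, it can be useful to confirm that Python resolves the package in the active environment before loading any weights. The following is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper name, not something dreamsim provides.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# Run this from the same shell/environment used for the install steps above.
print("dreamsim importable:", is_importable("dreamsim"))
```

If this prints `False` after Option 2, the `PYTHONPATH` export was most likely made in a different shell session.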

Usage

For walk-through examples of the below use-cases, check out our Colab demo.

Quickstart: Perceptual similarity metric

The basic use case is to measure the perceptual distance between two images. A higher score means the images are more different; a lower score means they are more similar.

The following code snippet is all you need. The first time that you run dreamsim it will automatically download the model weights. The default model settings are specified in ./dreamsim/config.py.

from dreamsim import dreamsim
from PIL import Image

device = "cuda"
model, preprocess = dreamsim(pretrained=True, device=device)

img1 = preprocess(Image.open("img1_path")).to(device)
img2 = preprocess(Image.open("img2_path")).to(device)
# The model expects RGB tensors with values in [0, 1] and shape (batch_size, 3, 224, 224)
distance = model(img1, img2)

To run on example images, run demo.py. The script should produce distances (0.4453, 0.2756).

Single-branch models

By default, DreamSim uses an ensemble of CLIP, DINO, and OpenCLIP (all ViT-B/16). If you need a lighter-weight model you can use single-branch versions of DreamSim where only a single backbone is finetuned. The single-branch models provide a ~3x speedup over the ensemble.

The available options are OpenCLIP-ViTB/32, DINO-ViTB/16, and CLIP-ViTB/32, listed from best- to worst-performing. To load a single-branch model, use the dreamsim_type argument. For example:

dreamsim_dino_model, preprocess = dreamsim(pretrained=True, dreamsim_type="dino_vitb16")

Feature extraction

To extract a single image embedding using dreamsim, use the embed method as shown in the following snippet:

img1 = preprocess(Image.open("img1_path")).to("cuda")
embedding = model.embed(img1)

The perceptual distance between two images is the cosine distance between their embeddings. If the embeddings are normalized (true by default), L2 distance can also be used.
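
The reason L2 distance works for normalized embeddings is that, for unit-length vectors a and b, ||a - b||^2 = 2 (1 - cos(a, b)), so both distances produce the same ranking. The sketch below checks this identity in plain Python on toy 3-d vectors that stand in for real DreamSim embeddings; the helper names are illustrative.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; the inputs need not be normalized."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors standing in for DreamSim embeddings.
a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 0.0, 1.0])

cos_d = cosine_distance(a, b)
l2_d = l2_distance(a, b)

# For unit vectors, the squared L2 distance equals twice the cosine distance.
print(math.isclose(l2_d ** 2, 2 * cos_d))
```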

Image retrieval

Our model can be used for image retrieval and plugged into existing retrieval pipelines. The code below ranks a dataset of images based on their similarity to a given query image.

To speed things up, instead of directly calling model(query, image) for each pair, we use the model.embed(image) method to pre-compute single-image embeddings, and then take the cosine distance between embedding pairs.

import pandas as pd
import torch
import torch.nn.functional as F
from tqdm import tqdm

# rank_by_similarity is an illustrative wrapper name, not part of the dreamsim API.
def rank_by_similarity(model, preprocess, query, images, device="cuda"):
    """Rank `images` (a list of PIL images) by distance to the `query` image."""
    dists = {}
    with torch.no_grad():  # inference only, so gradients are not needed
        # Compute the query image embedding
        query_embed = model.embed(preprocess(query).to(device))

        # Compute the (cosine) distance between the query and each search image
        for i, im in tqdm(enumerate(images), total=len(images)):
            img_embed = model.embed(preprocess(im).to(device))
            dists[i] = (1 - F.cosine_similarity(query_embed, img_embed, dim=-1)).item()

    # Return results sorted by distance (most similar first)
    df = pd.DataFrame({"ids": list(dists.keys()), "dists": list(dists.values())})
    return df.sort_values(by="dists")
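
Because an embedding depends only on its image, the search-set embeddings can be computed once, cached, and reused across many queries. The sketch below illustrates that pattern in plain Python, with short toy vectors in place of real `model.embed` outputs; the `top_k` helper, the index layout, and the file names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k entries of `index` closest to `query`, nearest first."""
    scored = ((cosine_distance(query, emb), name) for name, emb in index.items())
    return [(name, dist) for dist, name in heapq.nsmallest(k, scored)]


# Cached embeddings keyed by image id (toy values, computed once up front).
index = {
    "cat.png": [1.0, 0.0, 0.0],
    "dog.png": [0.9, 0.1, 0.0],
    "car.png": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for name, dist in results:
    print(f"{name}\t{dist:.4f}")
```

`heapq.nsmallest` avoids sorting the full search set when only a few neighbors are needed.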

Perceptual loss function

Our model can be used as a loss function for iterative optimization (similarly to the LPIPS metric). These are the key lines; for the full example, refer to the Colab.

for i in range(n_iters):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    dist = model(predicted_image, reference_image)
    dist.backward()
    optimizer.step()
Citation

If you find our work or any of our materials useful, please cite our paper:

@misc{fu2023dreamsim,
      title={DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data}, 
      author={Stephanie Fu and Netanel Tamir and Shobhita Sundaram and Lucy Chai and Richard Zhang and Tali Dekel and Phillip Isola},
      year={2023},
      eprint={2306.09344},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

Our code borrows from the "Deep ViT Features as Dense Visual Descriptors" repository for ViT feature extraction, and takes inspiration from the UniverSeg repository for code structure.

Release history

- 0.2.1 (Oct 15, 2024), this version
- 0.2.0 (Sep 14, 2024)
- 0.1.3 (Jul 27, 2023)
- 0.1.2 (Jul 20, 2023)
- 0.1.1 (Jun 16, 2023)
- 0.1.0 (Jun 16, 2023)

Download files

Source Distribution

dreamsim-0.2.1.tar.gz (25.0 kB)

Uploaded Oct 15, 2024 Source

File details

Details for the file dreamsim-0.2.1.tar.gz.

File metadata

Download URL: dreamsim-0.2.1.tar.gz
Upload date: Oct 15, 2024
Size: 25.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for dreamsim-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`36c655ee1bb5dbbf1730f03a59ac0d0180922f2151dfbdfb4f058b1756903d14`
MD5	`bfecf5f60a4902831ee2894644f3ec34`
BLAKE2b-256	`24a8808a6ed5435fe42db80f095f24ff025094239beeb2037a8fb6d8d8e828a8`
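
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library's `hashlib`; the helper names and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 digest published above for dreamsim-0.2.1.tar.gz.
EXPECTED_SHA256 = "36c655ee1bb5dbbf1730f03a59ac0d0180922f2151dfbdfb4f058b1756903d14"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    """Return True if the file at `path` has the published SHA256 digest."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

For example, `matches_published_hash("dreamsim-0.2.1.tar.gz")` should return `True` for an intact download saved under that (assumed) local name.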
