Image-text alignment metric trained on cycle consistency preferences
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Paper | Project Page | Dataset (I2T) | Dataset (T2I) | Dataset Viewer
Hyojin Bahng*, Caroline Chan*, Fredo Durand, Phillip Isola.
(*Equal contribution, alphabetical order.)
MIT CSAIL.
CycleReward is a reward model trained on preferences derived from cycle consistency. Given a forward mapping $F: X \rightarrow Y$ and a backward mapping $G: Y \rightarrow X$, we define the cycle consistency score as the similarity between the original input $x$ and its reconstruction $G(F(x))$. This score serves as a proxy for preference: an output with higher cycle consistency is preferred over one with lower cycle consistency. This gives a cheaper and more scalable signal for learning image-text alignment than human supervision. We construct CyclePrefDB, a large-scale preference dataset of 866K comparison pairs spanning image-to-text and text-to-image generation, with an emphasis on dense captions and prompts. Trained on this dataset, CycleReward matches or surpasses models trained on human or AI feedback.
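To make the scoring rule concrete, here is a minimal sketch of how cycle consistency can turn candidate outputs into preference pairs. It is a toy illustration, not the pipeline used to build CyclePrefDB. The candidates stand in for outputs $F(x)$ from several forward models, and the names `cycle_scores`, `preference_pairs`, `toy_backward`, and `jaccard` are hypothetical. The backward mapping and the word-overlap similarity are stand-ins for a real generative model and a real similarity measure.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def cycle_scores(
    x: str,
    candidates: Sequence[str],
    backward: Callable[[str], str],
    similarity: Callable[[str, str], float],
) -> List[float]:
    """Score each candidate y = F(x) by similarity(x, G(y))."""
    return [similarity(x, backward(y)) for y in candidates]


def preference_pairs(
    candidates: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, str]]:
    """Return (preferred, rejected) pairs; ties produce no pair."""
    pairs = []
    for i, j in combinations(range(len(candidates)), 2):
        if scores[i] == scores[j]:
            continue
        win, lose = (i, j) if scores[i] > scores[j] else (j, i)
        pairs.append((candidates[win], candidates[lose]))
    return pairs


def toy_backward(y: str) -> str:
    """Assumption: a trivial stand-in for the backward mapping G."""
    return y.lower()


def jaccard(a: str, b: str) -> float:
    """Assumption: word-set overlap as a stand-in similarity measure."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


x = "a black cat sleeping on a red sofa"
candidates = [
    "a cat",
    "a black cat on a sofa",
    "a black cat sleeping on a red sofa",
]
scores = cycle_scores(x, candidates, toy_backward, jaccard)
pairs = preference_pairs(candidates, scores)
for preferred, rejected in pairs:
    print(f"preferred: {preferred!r}  rejected: {rejected!r}")
```

In this toy setup the more detailed candidate reconstructs the input more faithfully, so it wins each comparison. This mirrors the emphasis on dense captions described above.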
Quick Start
Install the package with pip install cyclereward. The following Python code is all you need.
The basic use case is to measure the alignment between an image and a caption. A higher score indicates better alignment between the image and the caption; a lower score indicates worse alignment. We release three model variants: CycleReward-Combo, CycleReward-I2T, and CycleReward-T2I.
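from cyclereward import cyclereward
from PIL import Image
import torch

# Use a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load a model variant together with its image preprocessing function.
model, preprocess = cyclereward(device=device, model_type="CycleReward-Combo")

caption = "a photo of a cat"
# Replace "image_path" with the path to your image file.
image = preprocess(Image.open("image_path")).unsqueeze(0).to(device)
# Higher scores indicate better image-caption alignment.
score = model.score(image, caption)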
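A common follow-up is to compare several captions for the same image. The sketch below ranks captions with any scoring callable that takes `(image, caption)`. The helper `rank_captions` and the stub scorer are hypothetical and not part of the package. The sketch also assumes the score can be converted with `float()`, which holds for Python numbers and single-element tensors but has not been verified against the return type of `model.score`.

```python
from typing import Any, Callable, List, Sequence, Tuple


def rank_captions(
    score_fn: Callable[[Any, str], Any],
    image: Any,
    captions: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (caption, score) pairs sorted from best to worst aligned."""
    scored = [(caption, float(score_fn(image, caption))) for caption in captions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def stub_score(image_tags: set, caption: str) -> int:
    """Stand-in scorer for demonstration: counts caption words found in the tags."""
    return len(image_tags & set(caption.split()))


image_tags = {"cat", "sofa"}
captions = ["a dog in a park", "a cat on a sofa", "a cat"]
ranked = rank_captions(stub_score, image_tags, captions)
for caption, value in ranked:
    print(f"{value:.1f}  {caption}")
```

With the real model, the call would look like `rank_captions(model.score, image, captions)`, using the preprocessed `image` tensor from the snippet above.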
CyclePrefDB Dataset
CycleReward is trained on CyclePrefDB, a large-scale preference dataset based on cycle consistency. We provide comparison pairs for both image-to-text (I2T) and text-to-image (T2I) generation, with a focus on dense captions and prompts.
| Dataset | Number of Pairs |
|---|---|
| CyclePrefDB-I2T | 398K |
| CyclePrefDB-T2I | 468K |
You can use the Hugging Face Datasets library to load the datasets:
from datasets import load_dataset
# Load dataset
dataset = load_dataset("carolineec/CyclePrefDB-I2T", split='train')
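The field names of each record are not listed here, so it helps to inspect one record before writing training code. The sketch below is one way to do that without downloading the full dataset. The helpers `summarize_record` and `peek_first_record` are hypothetical, and the demo record uses made-up keys purely for illustration. Only `load_dataset(..., streaming=True)` is a real Datasets API call.

```python
from typing import Any, Dict


def summarize_record(record: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to a short description of its value's type."""
    summary = {}
    for key, value in record.items():
        if isinstance(value, (list, tuple)):
            summary[key] = f"{type(value).__name__}[{len(value)}]"
        else:
            summary[key] = type(value).__name__
    return summary


def peek_first_record(
    name: str = "carolineec/CyclePrefDB-I2T", split: str = "train"
) -> Dict[str, Any]:
    """Stream the dataset and return only its first record (needs network access)."""
    from datasets import load_dataset  # imported lazily; requires the datasets package

    stream = load_dataset(name, split=split, streaming=True)
    return next(iter(stream))


# Demo on a made-up record; these keys are NOT the real CyclePrefDB schema.
fake_record = {"field_a": ["x", "y"], "field_b": "text", "field_c": 0.5}
demo_summary = summarize_record(fake_record)
print(demo_summary)
```

To see the actual schema, call `summarize_record(peek_first_record())`.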
Citation
If you find our work or any of our materials useful, please cite our paper:
@article{bahng2025cycle,
  title={Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences},
  author={Bahng, Hyojin and Chan, Caroline and Durand, Fredo and Isola, Phillip},
  journal={arXiv preprint arXiv:2506.02095},
  year={2025}
}
Source Distribution
File details
Details for the file cyclereward-0.1.5.tar.gz.
File metadata
- Download URL: cyclereward-0.1.5.tar.gz
- Upload date:
- Size: 21.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5adfb5e8b43d0cab87dc569c2b8b9f6b938f39d88aff6f983b7f40ee99a5cc3 |
| MD5 | 9e0772435fc85874040603aeef870d69 |
| BLAKE2b-256 | 2d768e67785e1a8811a0dddb845be42c10e860c6cf3bf7baf97ab517e15dc4b5 |