Image-text alignment metric trained on cycle consistency preferences

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

Paper | Project Page | Dataset (I2T) | Dataset (T2I) | Dataset Viewer

Hyojin Bahng*, Caroline Chan*, Fredo Durand, Phillip Isola.
(*Equal contribution, alphabetical order.)
MIT CSAIL.

CycleReward is a reward model trained on preferences derived from cycle consistency. Given a forward mapping $$F: X \rightarrow Y$$ and a backward mapping $$G: Y \rightarrow X$$, we define the cycle consistency score as the similarity between the original input $$x$$ and its reconstruction $$G(F(x))$$. This score serves as a proxy for preference: higher cycle consistency indicates a preferred output. It provides a cheaper and more scalable signal for learning image-text alignment than human supervision. We construct CyclePrefDB, a large-scale preference dataset comprising 866K comparison pairs spanning image-to-text and text-to-image generation, with an emphasis on dense captions and prompts. Trained on this dataset, CycleReward matches or surpasses models trained on human or AI feedback.
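
As a minimal sketch of the idea, the toy example below scores candidate outputs by cycle consistency and picks the preferred one. The backward mapping `toy_backward`, the string-based `toy_similarity`, and the toy data are hypothetical stand-ins chosen for illustration; they are not the models or similarity metrics used to build CyclePrefDB.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def cycle_consistency_score(x, y, backward: Callable, similarity: Callable) -> float:
    """Similarity between the input x and its reconstruction G(y), where y = F(x)."""
    return similarity(x, backward(y))


def preferred_output(x, candidates: Sequence, backward: Callable, similarity: Callable):
    """Return the candidate whose reconstruction is most similar to x."""
    return max(
        candidates,
        key=lambda y: cycle_consistency_score(x, y, backward, similarity),
    )


# Toy stand-ins (assumptions): inputs are lowercase strings, candidate outputs
# are uppercase strings, and the backward mapping G simply lowercases them.
def toy_backward(y: str) -> str:
    return y.lower()


def toy_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


x = "a cat on a red sofa"
# Outputs of two hypothetical forward models F_1 and F_2 for the same input.
candidates = ["A CAT", "A CAT ON A RED SOFA"]
best = preferred_output(x, candidates, toy_backward, toy_similarity)
```

Here the second candidate reconstructs the input exactly, so it receives the higher cycle consistency score and would be labeled as preferred in a comparison pair.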

Quick Start

Run pip install cyclereward. The following Python code is all you need.

The basic use case is to measure the alignment between an image and a caption. A higher score indicates better alignment; a lower score indicates worse alignment. We release three model variants: CycleReward-Combo, CycleReward-I2T, and CycleReward-T2I.

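from cyclereward import cyclereward
from PIL import Image
import torch

# Use a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the model and its matching image preprocessing function
model, preprocess = cyclereward(device=device, model_type="CycleReward-Combo")

caption = "a photo of a cat"
# Replace "image_path" with the path to your image file
image = preprocess(Image.open("image_path")).unsqueeze(0).to(device)
# A higher score means better image-caption alignment
score = model.score(image, caption)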
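
Because higher scores mean better alignment, a natural follow-up is ranking several captions for the same image. The helper below is a sketch, not part of the `cyclereward` package: `rank_captions` and `stub_score` are hypothetical names, and the stub scorer exists only so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple


def rank_captions(
    score_fn: Callable[[object, str], float],
    image: object,
    captions: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each caption against the same image and sort best-first."""
    scored = [(caption, float(score_fn(image, caption))) for caption in captions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in scorer (hypothetical): counts caption words found in a set of tags.
def stub_score(image_tags, caption):
    return sum(word in image_tags for word in caption.split())


ranking = rank_captions(stub_score, {"cat", "sofa"}, ["a dog", "a cat on a sofa"])
```

With the model loaded above, `model.score` could be passed as `score_fn`, assuming it returns a single scalar per call that `float()` can convert; check the return type before relying on this.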

CyclePrefDB Dataset

CycleReward is trained on CyclePrefDB, a large-scale preference dataset based on cycle consistency. We provide comparison pairs for both image-to-text (I2T) and text-to-image (T2I) generation, with a focus on dense captions and prompts.

Dataset            Number of Pairs
CyclePrefDB-I2T    398K
CyclePrefDB-T2I    468K

You can use the Hugging Face Datasets library to load the datasets:

from datasets import load_dataset

# Load dataset
dataset = load_dataset("carolineec/CyclePrefDB-I2T", split='train')
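
To inspect what a record contains before writing training code, a small helper like the one below can help. It is a sketch that works on any iterable of dictionary records; `peek` is a hypothetical helper, and the field names in `toy_rows` are placeholders, not the actual CyclePrefDB columns.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek(records: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n records without iterating over the rest."""
    return list(islice(records, n))


# Placeholder rows (assumption): real records will have different fields.
toy_rows = [
    {"field_a": 1, "field_b": "x"},
    {"field_a": 2, "field_b": "y"},
]
first = peek(toy_rows, n=1)
columns = sorted(first[0].keys())
```

The `dataset` object loaded above yields one dictionary per row when iterated, so `peek(dataset)` should work on it directly. Passing `streaming=True` to `load_dataset` lets you look at the first rows without downloading the full split.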

Citation

If you find our work or any of our materials useful, please cite our paper:

@article{bahng2025cycle,
    title={Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences},
    author={Bahng, Hyojin and Chan, Caroline and Durand, Fredo and Isola, Phillip},
    journal={arXiv preprint arXiv:2506.02095},
    year={2025}
}
