Image-text alignment metric trained on cycle consistency preferences

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

Paper | Project Page | Dataset (I2T) | Dataset (T2I) | Dataset Viewer

Hyojin Bahng*, Caroline Chan*, Fredo Durand, Phillip Isola.
(*Equal contribution, alphabetical order.)
MIT CSAIL.

CycleReward is a reward model trained on preferences derived from cycle consistency. Given a forward mapping $$F: X \rightarrow Y$$ and a backward mapping $$G: Y \rightarrow X$$, we define the cycle consistency score as the similarity between the original input $$x$$ and its reconstruction $$G(F(x))$$. This score serves as a proxy for preference: an output $$F(x)$$ with higher cycle consistency is preferred over one with lower cycle consistency. Compared to human supervision, this is a cheaper and more scalable signal for learning image-text alignment. We construct CyclePrefDB, a large-scale preference dataset comprising 866K comparison pairs spanning image-to-text and text-to-image generation, with an emphasis on dense captions and prompts. Trained on this dataset, CycleReward matches or surpasses models trained on human or AI feedback.
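To make the definition concrete, here is a minimal toy sketch of how cycle consistency scores can be turned into comparison pairs. It is an illustration only, not the pipeline used to build CyclePrefDB: the "image" is a set of tags, the backward mapping `toy_backward` just extracts content words from a caption, and `jaccard` stands in for the similarity function. All of these names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def cycle_consistency(
    x: X, y: Y, backward: Callable[[Y], X], similarity: Callable[[X, X], float]
) -> float:
    """Similarity between the input x and its reconstruction backward(y)."""
    return similarity(x, backward(y))


def preference_pairs(
    x: X,
    candidates: Sequence[Y],
    backward: Callable[[Y], X],
    similarity: Callable[[X, X], float],
) -> List[Tuple[Y, Y]]:
    """Return (preferred, rejected) pairs: higher cycle consistency wins."""
    scored = [(cycle_consistency(x, y, backward, similarity), y) for y in candidates]
    pairs = []
    for (score_a, y_a), (score_b, y_b) in combinations(scored, 2):
        if score_a == score_b:
            continue  # a tie carries no preference signal
        pairs.append((y_a, y_b) if score_a > score_b else (y_b, y_a))
    return pairs


# --- Toy stand-ins (assumptions for illustration, not the real models) ---
STOPWORDS = {"a", "on", "the", "of"}


def toy_backward(caption: str) -> frozenset:
    """Hypothetical G: 'reconstruct' the image as the caption's content words."""
    return frozenset(w for w in caption.lower().split() if w not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap used here as a stand-in similarity function."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


x = frozenset({"red", "cat", "sofa"})  # toy "image"
candidates = ["a cat", "a red cat on a sofa", "a dog"]  # outputs of different F
pairs = preference_pairs(x, candidates, toy_backward, jaccard)
for preferred, rejected in pairs:
    print(f"{preferred!r} > {rejected!r}")
```

The denser caption reconstructs the input best, so it is preferred over both alternatives, which is the kind of signal the comparison pairs encode.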

Quick Start

Run pip install cyclereward. The following Python code is all you need.

The basic use case is to measure the alignment between an image and a caption. A higher score indicates that the image and caption are better aligned; a lower score indicates worse alignment. We release three model variants: CycleReward-Combo, CycleReward-I2T, and CycleReward-T2I.

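from cyclereward import cyclereward
from PIL import Image
import torch

# Use the GPU when available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# CycleReward-Combo is one of the three released variants.
model, preprocess = cyclereward(device=device, model_type="CycleReward-Combo")

caption = "a photo of a cat"
# Replace "image_path" with the path to your image file.
image = preprocess(Image.open("image_path")).unsqueeze(0).to(device)
# Higher score = better image-caption alignment.
score = model.score(image, caption)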
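Since the score is meant for comparison, a common follow-up is to rank several candidate captions for one image. The sketch below shows one way to do that. `rank_captions` is a hypothetical helper, not part of the cyclereward package, and it assumes the scoring function returns something `float()` can convert (a Python number or a single-element tensor). A toy scorer stands in for the model so the example runs without downloading weights.

```python
from typing import Any, Callable, List, Sequence, Tuple


def rank_captions(
    score_fn: Callable[[Any, str], Any],
    image: Any,
    captions: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every caption against one image and sort best-first."""
    scored = [(caption, float(score_fn(image, caption))) for caption in captions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in scorer so the sketch runs without model weights (NOT CycleReward):
# it counts caption words that appear in a set of tags describing the "image".
def toy_score(image_tags: set, caption: str) -> float:
    return float(sum(word in image_tags for word in caption.lower().split()))


ranking = rank_captions(
    toy_score,
    {"cat", "sofa", "red"},
    ["a photo of a dog", "a cat", "a red cat on a sofa"],
)
best_caption, best_score = ranking[0]
print(best_caption, best_score)
```

With the real model, the call would presumably look like `rank_captions(model.score, image, captions)`, using the preprocessed `image` tensor from the Quick Start code, provided `model.score` returns a value that `float()` accepts.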

CyclePrefDB Dataset

CycleReward is trained on CyclePrefDB, a large-scale preference dataset based on cycle consistency. We provide comparison pairs for both image-to-text (I2T) and text-to-image (T2I) generation, with a focus on dense captions and prompts.

Dataset          Number of Pairs
CyclePrefDB-I2T  398K
CyclePrefDB-T2I  468K

You can use the Hugging Face Datasets library to load the datasets:

from datasets import load_dataset

# Load dataset
dataset = load_dataset("carolineec/CyclePrefDB-I2T", split='train')

Citation

If you find our work or any of our materials useful, please cite our paper:

@article{bahng2025cycle,
    title={Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences},
    author={Bahng, Hyojin and Chan, Caroline and Durand, Fredo and Isola, Phillip},
    journal={arXiv preprint arXiv:2506.02095},
    year={2025}
}

Source Distribution

cyclereward-0.1.4.tar.gz

  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.13

Hashes for cyclereward-0.1.4.tar.gz

Algorithm    Hash digest
SHA256       3bf46a8d4c22ca027b6ae5721ed8b2b45a4548dbe27705e66c8ed7c47120d3dd
MD5          56d904c152e54127136b3a149c3a650a
BLAKE2b-256  27e92acf84671e570dad30c54f3d004205447d5c3b42e1a6260db2d36c633338
