GenArena Arena Evaluation - VLM-based pairwise image generation evaluation

These details have not been verified by PyPI

Project description

GenArena

A unified evaluation framework for visual generation tasks using VLM-based pairwise comparison and Elo ranking.

Abstract

The rapid advancement of visual generation models has outpaced traditional evaluation approaches, necessitating the adoption of Vision-Language Models as surrogate judges. In this work, we systematically investigate the reliability of the prevailing absolute pointwise scoring standard across a wide spectrum of visual generation tasks. Our analysis reveals that this paradigm is limited by stochastic inconsistency and poor alignment with human perception. To resolve these limitations, we introduce GenArena, a unified evaluation framework that leverages a pairwise comparison paradigm to ensure stable and human-aligned evaluation. Crucially, our experiments show that simply adopting this pairwise protocol enables off-the-shelf open-source models to outperform top-tier proprietary models. Notably, our method boosts evaluation accuracy by over 20% and achieves a Spearman correlation of 0.86 with the authoritative LMArena leaderboard, far above the 0.36 correlation of pointwise methods. Based on GenArena, we benchmark state-of-the-art visual generation models across diverse tasks, providing the community with a rigorous and automated evaluation standard for visual generation.
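
The Spearman correlation quoted above measures how closely two rankings agree. As a minimal sketch, assuming rankings without ties, it can be computed with the textbook formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The model names and ranks below are made up for illustration and are not GenArena or LMArena data.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same models")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two models")
    # Sum of squared rank differences across all models.
    d2 = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical ranks: 1 is best.
human = {"model_a": 1, "model_b": 2, "model_c": 3, "model_d": 4}
judge = {"model_a": 1, "model_b": 3, "model_c": 2, "model_d": 4}
rho = spearman_rho(human, judge)
```

A value of 1 means the two rankings are identical and -1 means they are exactly reversed.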

Quick Start

Installation

pip install genarena

Or install from source:

git clone https://github.com/ruihanglix/genarena.git
cd genarena
pip install -e .

Initialize Arena

Download benchmark data and official arena data with one command:

genarena init --arena_dir ./arena --data_dir ./data

This downloads:

- Benchmark Parquet data from rhli/genarena (HuggingFace)
- Official arena data (model outputs + battle logs) from rhli/genarena-battlefield

Environment Setup

Set your VLM API credentials:

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.example.com/v1"

For multi-endpoint support (load balancing and failover), use comma-separated values:

export OPENAI_BASE_URLS="https://api1.example.com/v1,https://api2.example.com/v1"
export OPENAI_API_KEYS="key1,key2,key3"
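
To illustrate what comma-separated values like these make possible, here is a rough sketch that parses them and cycles through the results. This README does not say how GenArena pairs keys with URLs or handles failover, so the round-robin policy and the helper names `split_csv`, `load_endpoints`, and `round_robin` are assumptions for illustration only.

```python
import itertools
import os


def split_csv(value):
    """Split a comma-separated string, dropping blanks and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_endpoints(env=None):
    """Read the plural variables shown above from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    urls = split_csv(env.get("OPENAI_BASE_URLS", ""))
    keys = split_csv(env.get("OPENAI_API_KEYS", ""))
    return urls, keys


def round_robin(urls, keys):
    """Yield (url, key) pairs forever, cycling each list independently (assumed policy)."""
    if not urls or not keys:
        raise ValueError("at least one URL and one key are required")
    return zip(itertools.cycle(urls), itertools.cycle(keys))


example_env = {
    "OPENAI_BASE_URLS": "https://api1.example.com/v1,https://api2.example.com/v1",
    "OPENAI_API_KEYS": "key1,key2,key3",
}
urls, keys = load_endpoints(example_env)
first_four = list(itertools.islice(round_robin(urls, keys), 4))
```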

Run Evaluation

genarena run --arena_dir ./arena --data_dir ./data

View Leaderboard

genarena leaderboard --arena_dir ./arena --subset basic
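
The project description says rankings are Elo-based. As a rough sketch of the idea, the classic sequential Elo update turns pairwise battle outcomes into ratings as shown below. The K-factor of 32, the initial rating of 1000, and the model names are illustrative assumptions; GenArena's actual rating procedure is not described in this README and may differ.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_elo(ratings, model_a, model_b, score_a, k=32.0, initial=1000.0):
    """Update ratings in place. score_a is 1.0 (A wins), 0.5 (tie) or 0.0 (B wins)."""
    r_a = ratings.get(model_a, initial)
    r_b = ratings.get(model_b, initial)
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    ratings[model_a] = r_a + delta
    ratings[model_b] = r_b - delta


# Hypothetical battle log: (model A, model B, score for A).
battles = [
    ("model_a", "model_b", 1.0),
    ("model_b", "model_c", 0.5),
    ("model_a", "model_c", 1.0),
]
ratings = {}
for a, b, score in battles:
    update_elo(ratings, a, b, score)

leaderboard = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```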

Check Status

genarena status --arena_dir ./arena --data_dir ./data

Running Your Own Experiments

Directory Structure

To add your own model for evaluation, organize outputs in the following structure:

arena_dir/
└── <subset>/
    └── models/
        └── <GithubID>_<modelName>_<yyyymmdd>/
            └── <model_name>/
                ├── 000000.png
                ├── 000001.png
                └── ...

For example:

arena/basic/models/johndoe_MyNewModel_20260205/MyNewModel/
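
The layout above can be built programmatically. The sketch below uses hypothetical helpers (`model_output_dir` and `image_name` are not part of GenArena), and the six-digit zero-padded file names are inferred from the example listing rather than from a documented rule.

```python
from datetime import date
from pathlib import Path


def model_output_dir(arena_dir, subset, github_id, model_name, day):
    """Build <arena_dir>/<subset>/models/<GithubID>_<modelName>_<yyyymmdd>/<model_name>."""
    exp_name = f"{github_id}_{model_name}_{day:%Y%m%d}"
    return Path(arena_dir) / subset / "models" / exp_name / model_name


def image_name(index):
    """Zero-padded file name such as 000001.png (padding width is an assumption)."""
    return f"{index:06d}.png"


out_dir = model_output_dir("arena", "basic", "johndoe", "MyNewModel", date(2026, 2, 5))
first_image = out_dir / image_name(0)
```

Calling `out_dir.mkdir(parents=True, exist_ok=True)` would then create the directory before images are written.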

Generate Images with Diffgentor

Use Diffgentor to batch generate images for evaluation:

# Download benchmark data
hf download rhli/genarena --repo-type dataset --local-dir ./data

# Generate images with your model
diffgentor edit --backend diffusers \
    --model_name YourModel \
    --input ./data/basic/ \
    --output_dir ./arena/basic/models/yourname_YourModel_20260205/YourModel/

Run Battles for New Models

genarena run --arena_dir ./arena --data_dir ./data \
    --subset basic \
    --exp_name yourname_YourModel_20260205

GenArena automatically detects new models and schedules battles against existing models.
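
One way to picture this scheduling step is to pair each new model with every existing model and skip the pairs that have already been fought. The sketch below works at the model level only; the function name `schedule_battles` and the `completed` argument are assumptions, and GenArena's real scheduler is not described here.

```python
def schedule_battles(existing_models, new_models, completed=()):
    """Return (new, existing) pairs that still need a battle.

    `completed` holds pairs already fought, in either order.
    """
    done = {frozenset(pair) for pair in completed}
    return [
        (new, old)
        for new in new_models
        for old in existing_models
        if frozenset((new, old)) not in done
    ]


# Hypothetical model names.
pending = schedule_battles(
    existing_models=["model_a", "model_b"],
    new_models=["MyNewModel"],
    completed=[("model_a", "MyNewModel")],
)
```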

Submit to Official Leaderboard

Coming Soon: The genarena submit command will allow you to submit your evaluation results to the official GenArena leaderboard via GitHub PR.

The workflow will be:

- Run evaluation locally with genarena run
- Upload results to your HuggingFace repository
- Submit via genarena submit, which creates a PR for review

Documentation

Document	Description
Quick Start	Installation and basic usage guide
Architecture	System design and key concepts
CLI Reference	Complete command-line interface documentation
Experiment Management	How to organize and manage experiments
FAQ	Frequently asked questions

Citation

TBD

License

Apache License 2.0

Project details

Release history

0.1.2

Feb 5, 2026

This version

0.1.1

Feb 5, 2026

0.1.0

Feb 5, 2026

0.0.1

Jan 24, 2026

Download files

Source Distribution

genarena-0.1.1.tar.gz (177.8 kB view details)

Uploaded Feb 5, 2026 Source

Built Distribution

genarena-0.1.1-py3-none-any.whl (177.1 kB view details)

Uploaded Feb 5, 2026 Python 3

File details

Details for the file genarena-0.1.1.tar.gz.

File metadata

Download URL: genarena-0.1.1.tar.gz
Upload date: Feb 5, 2026
Size: 177.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for genarena-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`882bc2a996c7b27d0f833fe7c9358485c49103e36c59e97cce382215a4e81a5c`
MD5	`704c7acbe3ec00ff7f1119ece519a005`
BLAKE2b-256	`52f6a634887cec861a38ce6fea969a7b01eb0306dc6036f616f52cd3b13a7bbf`
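
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names are illustrative, and the file is assumed to sit in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest copied from the table above for genarena-0.1.1.tar.gz.
EXPECTED_SHA256 = "882bc2a996c7b27d0f833fe7c9358485c49103e36c59e97cce382215a4e81a5c"


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("genarena-0.1.1.tar.gz")` should return True for an intact download of that file.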

Provenance

The following attestation bundles were made for genarena-0.1.1.tar.gz:

Publisher: publish.yml on ruihanglix/genarena

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genarena-0.1.1.tar.gz
- Subject digest: 882bc2a996c7b27d0f833fe7c9358485c49103e36c59e97cce382215a4e81a5c
- Sigstore transparency entry: 919515706
- Sigstore integration time: Feb 5, 2026
Source repository:
- Permalink: ruihanglix/genarena@935de8b09e298dd994b5f4048ab64732c6612d31
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ruihanglix
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@935de8b09e298dd994b5f4048ab64732c6612d31
- Trigger Event: push

File details

Details for the file genarena-0.1.1-py3-none-any.whl.

File metadata

Download URL: genarena-0.1.1-py3-none-any.whl
Upload date: Feb 5, 2026
Size: 177.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for genarena-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d44c668c40f40c0d5ae72c7bc9aebd4298626be2ca2ef98e60e197b560c73548`
MD5	`2cdf4a8373346e0df943fcda213cbcc1`
BLAKE2b-256	`16fd47eb97af0cf9e4adaa9f89e6c20c4e050b47dbc1c056c65728ec31649147`

Provenance

The following attestation bundles were made for genarena-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ruihanglix/genarena

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genarena-0.1.1-py3-none-any.whl
- Subject digest: d44c668c40f40c0d5ae72c7bc9aebd4298626be2ca2ef98e60e197b560c73548
- Sigstore transparency entry: 919515729
- Sigstore integration time: Feb 5, 2026
Source repository:
- Permalink: ruihanglix/genarena@935de8b09e298dd994b5f4048ab64732c6612d31
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ruihanglix
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@935de8b09e298dd994b5f4048ab64732c6612d31
- Trigger Event: push
