Ranking methodology powering the LMArena leaderboard.

Arena-Rank: The ranking methodology powering the LMArena leaderboard.

Installation

From pip:

pip install arena-rank

From source:

git clone https://github.com/lmarena/arena-rank && cd arena-rank
uv sync

Examples

Below is a minimal example using Arena-Rank to produce a leaderboard on LMArena data:

import pandas as pd
import datasets
from arena_rank.utils.data_utils import PairDataset
from arena_rank.models.bradley_terry import BradleyTerry

df = datasets.load_dataset(
    "lmarena-ai/arena-human-preference-140k",
    columns=["model_a", "model_b", "winner"]
)["train"].to_pandas()

dataset = PairDataset.from_pandas(df)
model = BradleyTerry(n_competitors=len(dataset.competitors))

# compute ratings and 95% confidence intervals
results = model.compute_ratings_and_cis(dataset, significance_level=0.05)

# print top 10 competitors with ratings and confidence intervals
leaderboard = pd.DataFrame(results).sort_values("ratings", ascending=False).head(10)
print(leaderboard.to_markdown(index=False))

Output:

| competitors                         |   ratings |   rating_lower |   rating_upper |   variances |
|:------------------------------------|----------:|---------------:|---------------:|------------:|
| gemini-2.5-pro                      |   1124.07 |        1117.61 |        1130.53 |    10.8542  |
| gemini-2.5-pro-preview-03-25        |   1097.88 |        1082    |        1113.77 |    65.6717  |
| grok-4-0709                         |   1093.34 |        1078.44 |        1108.25 |    57.8409  |
| o3-2025-04-16                       |   1079.39 |        1072.86 |        1085.92 |    11.0919  |
| chatgpt-4o-latest-20250326          |   1078.14 |        1071.33 |        1084.94 |    12.0447  |
| gemini-2.5-pro-preview-05-06        |   1074.8  |        1064.55 |        1085.05 |    27.3722  |
| deepseek-r1-0528                    |   1074.48 |        1067.19 |        1081.78 |    13.8388  |
| grok-3-preview-02-24                |   1071.28 |        1063.7  |        1078.85 |    14.9286  |
| llama-4-maverick-03-26-experimental |   1067.21 |        1059.38 |        1075.04 |    15.953   |
| gemini-2.5-flash                    |   1061.26 |        1055.31 |        1067.22 |    9.21695  |
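
The interval columns in the table above look consistent with a normal approximation, rating ± z·sqrt(variance), where z is the two-sided normal quantile for the chosen significance level. The sketch below checks that reading against the first row of the table. `normal_ci` is a hypothetical helper written for this illustration; it is not part of the Arena-Rank API, and the library may compute its intervals differently.

```python
import math
from statistics import NormalDist


def normal_ci(rating, variance, significance_level=0.05):
    """Two-sided normal-approximation interval around a rating.

    Hypothetical helper for illustration only (not an Arena-Rank function).
    """
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - significance_level / 2)
    half_width = z * math.sqrt(variance)
    return rating - half_width, rating + half_width


# First row of the table above: gemini-2.5-pro.
lower, upper = normal_ci(rating=1124.07, variance=10.8542)
print(f"{lower:.2f} .. {upper:.2f}")
```

Under this assumption, the result can be compared directly with the `rating_lower` and `rating_upper` columns; small differences in the last digit are expected because the table values are rounded.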

See the examples folder for notebooks with more advanced examples, including the style-controlled leaderboard on LMArena, an analysis of voter patterns on the PRISM dataset, and analyses of sports and video game competitions using the Bradley-Terry methodology.
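
For readers new to the Bradley-Terry model, here is a minimal, dependency-free sketch of the idea: each competitor i gets a positive strength p_i, and the probability that i beats j is modeled as p_i / (p_i + p_j). The fit below uses the classic minorization-maximization (Zermelo) update on a toy win matrix. This is not Arena-Rank's implementation; the function name, the toy data, and the Elo-like display scale (1000 + 400·log10 p) are assumptions made for illustration, and ties are ignored.

```python
import math


def fit_bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry strengths with the MM (Zermelo) update.

    wins[i][j] is the number of times competitor i beat competitor j.
    Illustrative sketch only: no ties, and it assumes every competitor
    has at least one win and at least one loss.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (games played against j) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        # Strengths are only identified up to scale, so fix the
        # geometric mean at 1 after every sweep.
        log_mean = sum(math.log(x) for x in new_p) / n
        p = [x / math.exp(log_mean) for x in new_p]
    return p


# Toy data (made up): A beat B 7-3, A beat C 8-2, B beat C 6-4.
TOY_NAMES = ["A", "B", "C"]
TOY_WINS = [
    [0, 7, 8],
    [3, 0, 6],
    [2, 4, 0],
]

strengths = fit_bradley_terry(TOY_WINS)
for name, p in zip(TOY_NAMES, strengths):
    # Elo-like display scale, chosen here for illustration only.
    print(name, round(1000 + 400 * math.log10(p), 1))
```

At convergence, each competitor's expected number of wins under the fitted strengths matches its observed number of wins, which is the defining property of the maximum-likelihood fit.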

Contributing

We welcome and encourage contributions. To develop Arena-Rank, install the development dependencies and the Git pre-commit hooks:

uv sync --group dev
pre-commit install

Download files

Source Distribution

arena_rank-0.1.1.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

arena_rank-0.1.1-py3-none-any.whl (43.2 kB)

Uploaded Python 3

File details

Details for the file arena_rank-0.1.1.tar.gz.

File metadata

  • Download URL: arena_rank-0.1.1.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for arena_rank-0.1.1.tar.gz:

| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | 4e569e2534403f826031456eb1bc0056af16fbbc7840a7cead4ea35aec2d8f93 |
| MD5         | ca466e937fbd1004cabbc3dc0fd2a5e3                                 |
| BLAKE2b-256 | 0aa7a33f8e8fd2b2b53d74da0d08554f29463522fde101662c99263e3cadd827 |
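
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. The sketch below is one way to do that; `sha256_of` and `verify` are hypothetical helper names, and the local file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for arena_rank-0.1.1.tar.gz.
EXPECTED_SHA256 = "4e569e2534403f826031456eb1bc0056af16fbbc7840a7cead4ea35aec2d8f93"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("arena_rank-0.1.1.tar.gz")` would return `True` only if the file in the current directory is byte-for-byte identical to the published source distribution.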

Provenance

The following attestation bundles were made for arena_rank-0.1.1.tar.gz:

Publisher: publish.yaml on lmarena/arena-rank

File details

Details for the file arena_rank-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: arena_rank-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 43.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for arena_rank-0.1.1-py3-none-any.whl:

| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | e0f64e3c01a9bc141a6bd22d6548d49064ed3688286f826ce24d85a3b3dad3ce |
| MD5         | a071bbbc1dfac0f8f86a601740448da8                                 |
| BLAKE2b-256 | 52197e9cdcd56e5f1e3a746e23312cf54bf12966eba305f4b100aa0014989a83 |

Provenance

The following attestation bundles were made for arena_rank-0.1.1-py3-none-any.whl:

Publisher: publish.yaml on lmarena/arena-rank
