Arena-Rank: The ranking methodology powering the LMArena leaderboard.

Installation

From pip: pip install arena-rank

From source:

git clone https://github.com/lmarena/arena-rank && cd arena-rank
uv sync

Examples

Below is a minimal example using Arena-Rank to produce a leaderboard on LMArena data:

import pandas as pd
import datasets
from arena_rank.utils.data_utils import PairDataset
from arena_rank.models.bradley_terry import BradleyTerry

df = datasets.load_dataset(
    "lmarena-ai/arena-human-preference-140k",
    columns=["model_a", "model_b", "winner"]
)["train"].to_pandas()

dataset = PairDataset.from_pandas(df)
model = BradleyTerry(n_competitors=len(dataset.competitors))

# compute ratings and 95% confidence intervals
results = model.compute_ratings_and_cis(dataset, significance_level=0.05)

Print the top 10 models on the leaderboard, with their ratings and confidence intervals:

# print top 10 competitors with ratings and confidence intervals
# (DataFrame.to_markdown requires the optional `tabulate` package)
leaderboard = pd.DataFrame(results).sort_values("ratings", ascending=False).head(10)
print(leaderboard.to_markdown(index=False))
| competitors                         |   ratings |   rating_lower |   rating_upper |   variances |
|:------------------------------------|----------:|---------------:|---------------:|------------:|
| gemini-2.5-pro                      |   1124.07 |        1117.61 |        1130.53 |    10.8542  |
| gemini-2.5-pro-preview-03-25        |   1097.88 |        1082    |        1113.77 |    65.6717  |
| grok-4-0709                         |   1093.34 |        1078.44 |        1108.25 |    57.8409  |
| o3-2025-04-16                       |   1079.39 |        1072.86 |        1085.92 |    11.0919  |
| chatgpt-4o-latest-20250326          |   1078.14 |        1071.33 |        1084.94 |    12.0447  |
| gemini-2.5-pro-preview-05-06        |   1074.8  |        1064.55 |        1085.05 |    27.3722  |
| deepseek-r1-0528                    |   1074.48 |        1067.19 |        1081.78 |    13.8388  |
| grok-3-preview-02-24                |   1071.28 |        1063.7  |        1078.85 |    14.9286  |
| llama-4-maverick-03-26-experimental |   1067.21 |        1059.38 |        1075.04 |    15.953   |
| gemini-2.5-flash                    |   1061.26 |        1055.31 |        1067.22 |    9.21695  |
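
The printed bounds look consistent with a normal-approximation interval, rating ± z·sqrt(variance), where z is the two-sided normal quantile for the chosen significance level. This is an observation from the numbers in the table, not a documented guarantee of how Arena-Rank computes its intervals. A minimal sketch that reproduces the first row's bounds under that assumption, using only the standard library:

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the first row of the table above (gemini-2.5-pro).
rating, variance = 1124.07, 10.8542
significance_level = 0.05  # same value passed to compute_ratings_and_cis

# Two-sided normal quantile: roughly 1.96 for a 95% interval.
z = NormalDist().inv_cdf(1 - significance_level / 2)
half_width = z * sqrt(variance)

# Assumed interval form: rating +/- z * sqrt(variance).
lower, upper = rating - half_width, rating + half_width
print(f"{lower:.2f} .. {upper:.2f}")
```

Under this reading, the bounds agree with the rating_lower and rating_upper columns up to rounding, and two models whose intervals do not overlap (for example the first two rows) are separated at roughly the 95% level.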

The examples folder contains notebooks with more advanced use cases, including the style-controlled leaderboard on LMArena, analysis of voter patterns on the PRISM dataset, and analysis of sports and video game competitions using the Bradley-Terry methodology.
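
For readers new to the Bradley-Terry methodology, the sketch below fits a textbook Bradley-Terry model to a handful of made-up match results. It is not Arena-Rank's implementation: the function `fit_bradley_terry`, the toy data, and the classic iterative (minorization-maximization) update are all illustrative assumptions, and the raw strengths it returns are not on the rating scale shown in the table above.

```python
from collections import defaultdict


def fit_bradley_terry(matches, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration only. Returns a dict mapping each
    competitor to a positive strength; strengths are normalized to sum to 1.
    """
    wins = defaultdict(float)         # total wins per competitor
    pair_counts = defaultdict(float)  # games played per unordered pair
    players = set()
    for winner, loser in matches:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        players.update((winner, loser))
    players = sorted(players)

    strength = {p: 1.0 / len(players) for p in players}
    for _ in range(n_iter):
        updated = {}
        for p in players:
            # MM update: wins divided by a weighted count of games played.
            denom = sum(
                pair_counts[frozenset((p, q))] / (strength[p] + strength[q])
                for q in players
                if q != p
            )
            updated[p] = wins[p] / denom
        total = sum(updated.values())
        strength = {p: s / total for p, s in updated.items()}
    return strength


# Toy data: each pair plays 10 games.
matches = (
    [("A", "B")] * 7 + [("B", "A")] * 3
    + [("B", "C")] * 6 + [("C", "B")] * 4
    + [("A", "C")] * 8 + [("C", "A")] * 2
)
strengths = fit_bradley_terry(matches)

# Under Bradley-Terry, P(A beats B) = s_A / (s_A + s_B).
p_a_beats_b = strengths["A"] / (strengths["A"] + strengths["B"])
print(sorted(strengths, key=strengths.get, reverse=True), round(p_a_beats_b, 3))
```

The fitted strengths rank the competitors and give a win probability for any pairing. This simple update assumes every competitor has at least one win and that there are no ties.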

Contributing

We welcome and encourage contributions. To develop Arena-Rank, install the development dependencies and the git pre-commit hooks:

uv sync --group dev
pre-commit install

Download files

Source Distribution

arena_rank-0.1.0.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

arena_rank-0.1.0-py3-none-any.whl (43.2 kB view details)

Uploaded Python 3

File details

Details for the file arena_rank-0.1.0.tar.gz.

File metadata

  • Download URL: arena_rank-0.1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for arena_rank-0.1.0.tar.gz

| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | fa349a499ecb3e1566a2771eef5e7caca6c083299d9c85e3c5b3065ba2f325bc |
| MD5         | 4b2f1189939495660b2a24c046c34ecd                                 |
| BLAKE2b-256 | e578e24c81249ef21e794cbf7c835ee93a0607fa06442056bf62c532f93f6066 |

Provenance

The following attestation bundles were made for arena_rank-0.1.0.tar.gz:

Publisher: publish.yaml on lmarena/arena-rank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file arena_rank-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: arena_rank-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 43.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for arena_rank-0.1.0-py3-none-any.whl

| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | 2f105a7648f543bc22890e187d5db299cc9a412484df821e71d9ba4e6a8e8f42 |
| MD5         | 8722e04b8ce1714a29811050c99c9d7b                                 |
| BLAKE2b-256 | c8a710db1c388cab224c31feb20c9dad7e7158b0b90739b88fd4102ae5ab6558 |

Provenance

The following attestation bundles were made for arena_rank-0.1.0-py3-none-any.whl:

Publisher: publish.yaml on lmarena/arena-rank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
