Arena-Rank: The ranking methodology powering the LMArena leaderboard.
Installation
From pip:

```bash
pip install arena-rank
```

From source:

```bash
git clone https://github.com/lmarena/arena-rank && cd arena-rank
uv sync
```
Examples
Below is a minimal example using Arena-Rank to produce a leaderboard on LMArena data:
```python
import pandas as pd
import datasets
from arena_rank.utils.data_utils import PairDataset
from arena_rank.models.bradley_terry import BradleyTerry

df = datasets.load_dataset(
    "lmarena-ai/arena-human-preference-140k",
    columns=["model_a", "model_b", "winner"]
)["train"].to_pandas()

dataset = PairDataset.from_pandas(df)
model = BradleyTerry(n_competitors=len(dataset.competitors))

# compute ratings and 95% confidence intervals
results = model.compute_ratings_and_cis(dataset, significance_level=0.05)

# print top 10 competitors with ratings and confidence intervals
leaderboard = pd.DataFrame(results).sort_values("ratings", ascending=False).head(10)
print(leaderboard.to_markdown(index=False))
```
| competitors | ratings | rating_lower | rating_upper | variances |
|:------------------------------------|----------:|---------------:|---------------:|------------:|
| gemini-2.5-pro | 1124.07 | 1117.61 | 1130.53 | 10.8542 |
| gemini-2.5-pro-preview-03-25 | 1097.88 | 1082 | 1113.77 | 65.6717 |
| grok-4-0709 | 1093.34 | 1078.44 | 1108.25 | 57.8409 |
| o3-2025-04-16 | 1079.39 | 1072.86 | 1085.92 | 11.0919 |
| chatgpt-4o-latest-20250326 | 1078.14 | 1071.33 | 1084.94 | 12.0447 |
| gemini-2.5-pro-preview-05-06 | 1074.8 | 1064.55 | 1085.05 | 27.3722 |
| deepseek-r1-0528 | 1074.48 | 1067.19 | 1081.78 | 13.8388 |
| grok-3-preview-02-24 | 1071.28 | 1063.7 | 1078.85 | 14.9286 |
| llama-4-maverick-03-26-experimental | 1067.21 | 1059.38 | 1075.04 | 15.953 |
| gemini-2.5-flash | 1061.26 | 1055.31 | 1067.22 | 9.21695 |
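
The interval columns in the table above appear consistent with a normal approximation, `rating ± z * sqrt(variance)`, where `z` is the two-sided quantile for the chosen significance level. This is an observation from the printed numbers, not a statement about how Arena-Rank computes intervals internally. The helper `normal_ci` below is hypothetical and is not part of the package; it is a minimal sketch for checking one row by hand.

```python
from statistics import NormalDist


def normal_ci(rating, variance, significance_level=0.05):
    """Two-sided normal-approximation interval around a rating.

    Hypothetical helper for illustration; not part of arena_rank.
    """
    # z is about 1.96 when significance_level is 0.05
    z = NormalDist().inv_cdf(1 - significance_level / 2)
    half_width = z * variance ** 0.5
    return rating - half_width, rating + half_width


# First row of the table above: gemini-2.5-pro
lower, upper = normal_ci(1124.07, 10.8542)
print(f"{lower:.2f} {upper:.2f}")
```

For that row the result should round to the `rating_lower` and `rating_upper` values shown (1117.61 and 1130.53). Other rows may differ in the last digit because the printed ratings and variances are themselves rounded.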
See the examples folder for notebooks with more advanced examples, covering techniques such as the style-controlled leaderboard on LMArena, analysis of voter patterns on the PRISM dataset, and analysis of sports and video game competitions using the Bradley-Terry methodology.
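
For readers new to the Bradley-Terry model, the sketch below fits strengths to a small, made-up win matrix using the classic minorization-maximization update. It is a toy illustration of the general technique under stated assumptions: the function `fit_bradley_terry`, the competitor names, and the win counts are all hypothetical, and this is not how Arena-Rank is implemented (the package's optimizer, rating scale, and handling of ties are not described here).

```python
def fit_bradley_terry(wins, n_iter=1000, tol=1e-12):
    """Fit Bradley-Terry strengths with the MM update.

    wins[i][j] is the number of times competitor i beat competitor j.
    Returns strengths that sum to 1; under the model, i beats j with
    probability p[i] / (p[i] + p[j]).
    Assumes every competitor has at least one win and one loss.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (games played) / (p_i + p_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        new_p = [x / norm for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical toy data: three competitors, ten games per pair.
competitors = ["model_x", "model_y", "model_z"]
wins = [
    [0, 7, 9],  # model_x beat model_y 7 times and model_z 9 times
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
for name, s in sorted(zip(competitors, strengths), key=lambda t: -t[1]):
    print(f"{name}: {s:.3f}")
```

At the fitted strengths, each competitor's expected number of wins under the model matches its observed number of wins, which is a handy sanity check on any Bradley-Terry fit.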
Contributing
We welcome and encourage contributions. To develop Arena-Rank, make sure to install the development dependencies and the git pre-commit hooks.
```bash
uv sync --group dev
pre-commit install
```
File details
Details for the file arena_rank-0.1.1.tar.gz.
File metadata
- Download URL: arena_rank-0.1.1.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e569e2534403f826031456eb1bc0056af16fbbc7840a7cead4ea35aec2d8f93 |
| MD5 | ca466e937fbd1004cabbc3dc0fd2a5e3 |
| BLAKE2b-256 | 0aa7a33f8e8fd2b2b53d74da0d08554f29463522fde101662c99263e3cadd827 |
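
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper name `sha256_of_file` and the local path in the commented usage line are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for arena_rank-0.1.1.tar.gz
EXPECTED_SHA256 = "4e569e2534403f826031456eb1bc0056af16fbbc7840a7cead4ea35aec2d8f93"

# Usage, assuming the sdist was saved to the current directory:
# assert sha256_of_file("arena_rank-0.1.1.tar.gz") == EXPECTED_SHA256
```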
Provenance
The following attestation bundles were made for arena_rank-0.1.1.tar.gz:
Publisher: publish.yaml on lmarena/arena-rank
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arena_rank-0.1.1.tar.gz
- Subject digest: 4e569e2534403f826031456eb1bc0056af16fbbc7840a7cead4ea35aec2d8f93
- Sigstore transparency entry: 776122219
- Sigstore integration time:
- Permalink: lmarena/arena-rank@84703babe02740ec42bf6fa6174d03df243dc7c0
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lmarena
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@84703babe02740ec42bf6fa6174d03df243dc7c0
- Trigger Event: workflow_dispatch
File details
Details for the file arena_rank-0.1.1-py3-none-any.whl.
File metadata
- Download URL: arena_rank-0.1.1-py3-none-any.whl
- Upload date:
- Size: 43.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0f64e3c01a9bc141a6bd22d6548d49064ed3688286f826ce24d85a3b3dad3ce |
| MD5 | a071bbbc1dfac0f8f86a601740448da8 |
| BLAKE2b-256 | 52197e9cdcd56e5f1e3a746e23312cf54bf12966eba305f4b100aa0014989a83 |
Provenance
The following attestation bundles were made for arena_rank-0.1.1-py3-none-any.whl:
Publisher: publish.yaml on lmarena/arena-rank
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arena_rank-0.1.1-py3-none-any.whl
- Subject digest: e0f64e3c01a9bc141a6bd22d6548d49064ed3688286f826ce24d85a3b3dad3ce
- Sigstore transparency entry: 776122220
- Sigstore integration time:
- Permalink: lmarena/arena-rank@84703babe02740ec42bf6fa6174d03df243dc7c0
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lmarena
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@84703babe02740ec42bf6fa6174d03df243dc7c0
- Trigger Event: workflow_dispatch