
A library for game-theoretic evaluation and ratings.


polarix


Overview

The name polarix draws from the Polaris star system, a guiding star, and ends in 'x' to reflect its ties to the JAX ecosystem.

polarix is an accelerated equilibrium solving and evaluation library for computing interpretable ratings at game-theoretic equilibria.

The game-theoretic approach dynamically adjusts the relevance of each action (e.g. an evaluation task, a candidate model, an agent) based on how the actions interact with one another. The selected rating equilibrium continually adapts to each player's capability frontier according to an overarching evaluation objective that you define.

What is polarix for?

  • Evaluation: polarix is designed for dynamic evaluation systems where new candidates and tasks are continually introduced and where one may wish to know the value of each candidate and each task.
  • Training: polarix can be used to identify frontier candidates and frontier tasks, making training more robust and efficient.
  • Research: polarix implements accelerated equilibrium solvers for n-player general-sum games, which can also serve as baselines for game-theory research in equilibrium solving and selection.
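As a toy illustration of the correlated-equilibrium concept these solvers target (this is a self-contained sketch, not polarix's solver or API), the snippet below checks that the uniform joint distribution over action pairs is a correlated equilibrium of matching pennies: no player can gain by deviating from any recommended action.

```python
import numpy as np

# Matching pennies: payoffs for the row player; the column player gets the negative.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

# Candidate joint (correlated) strategy: uniform over all four action pairs.
joint = np.full((2, 2), 0.25)

def ce_regrets(payoff, joint, player):
    """Gain from the best deviation after each recommendation.

    All regrets <= 0 means the joint distribution satisfies the
    correlated-equilibrium constraints for `player`.
    """
    u = payoff if player == 0 else -payoff.T  # own action indexes the first axis
    j = joint if player == 0 else joint.T
    regrets = []
    for a in range(u.shape[0]):           # recommended action
        base = j[a] @ u[a]                # expected payoff when obeying
        best_dev = max(j[a] @ u[b] for b in range(u.shape[0]))
        regrets.append(best_dev - base)
    return regrets

for p in (0, 1):
    assert all(r <= 1e-9 for r in ce_regrets(A, joint, p))
print("uniform joint is a correlated equilibrium")
```

Here every deviation regret is exactly zero, since each row of the payoff matrix sums to zero under the uniform opponent distribution.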

Installation

You can install polarix from PyPI:

pip install -U polarix

or from source, with no stability guarantees:

pip install git+https://github.com/google-deepmind/polarix.git

Quick Start

Here's a simple example of how to use polarix to rate agents based on their performance on a set of tasks.

import numpy as np
import polarix as plx

agents = np.array(['skew_a', 'skew_b', 'skew_c', 'weak', 'strong'])
tasks = np.array(['task_a', 'task_b', 'task_c'])
scores = np.asarray([
    [6.0, 4.0, 3.0],  # skew_a
    [3.0, 5.0, 2.0],  # skew_b
    [1.0, 3.0, 7.0],  # skew_c
    [3.0, 4.0, 3.0],  # weak
    [5.0, 4.0, 5.0],  # strong
])
scores_stddev = np.full_like(scores, fill_value=0.1)

# 1. Define the evaluation game from an agent-vs-task score matrix.
# From this agent-vs-task score matrix, we construct a 3-player game between a
#  'task' player and two 'agent' players.
#
# Each agent player chooses an agent and is rewarded for outperforming
#  competition on the task selected by the task player. The task player is
#  rewarded by the agent players' score difference, i.e. separating the agents.
#
# The `plx.agent_vs_task_game` helper function constructs such a 3-player game
#  from an agent-vs-task score matrix. Instances of `plx.Game` can be
#  constructed directly from payoff tensors as well.
game = plx.agent_vs_task_game(
    agents=agents, tasks=tasks, agent_vs_task=scores, normalizer='winrate'
)

# 2. Solve for the max-entropy correlated equilibrium strategy and ratings.
res = plx.solve(game, plx.ce_maxent)

# 3. Analyze agent ratings in terms of comparative strengths and weaknesses.
chart = plx.plot_rating_contribution(
    game,
    joint=res.joint,
    rating_player=1,
    contrib_player=0,
    use_categorical_contrib=True,
)

Executing `chart.display()` shows agent ratings, broken down by task.

(Figure: quick-start rating-contribution chart.)

Each agent's total score (red diamond) is the sum of its comparative strengths (positive bars) and weaknesses (negative bars), all measured relative to an equilibrium strategy. By definition of these ratings, the maximum possible rating is zero, achieved here by the strong generalist agent. The blue dashed line shows the probability that each agent is played at the equilibrium. Note that the specialist agents all receive significant probability mass at the equilibrium, showing that the top-ranked agent does not dominate competing agents on every task.
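The exact payoff construction behind `plx.agent_vs_task_game` and its `'winrate'` normalizer is internal to the library; as a rough, hypothetical sketch of the 3-player game described in the quick-start comments, one could form payoff tensors from the score matrix like this (the `u_*` names and the sign/absolute-difference payoffs here are illustrative assumptions, not polarix internals):

```python
import numpy as np

# Score matrix from the quick start: rows = agents, cols = tasks.
scores = np.asarray([
    [6.0, 4.0, 3.0],  # skew_a
    [3.0, 5.0, 2.0],  # skew_b
    [1.0, 3.0, 7.0],  # skew_c
    [3.0, 4.0, 3.0],  # weak
    [5.0, 4.0, 5.0],  # strong
])

# Hypothetical payoff tensors indexed [agent_i, agent_j, task_k]:
#  - agent player 1 earns +1/0/-1 for a win/tie/loss against agent j on task k,
#  - agent player 2 gets the symmetric negative,
#  - the task player is rewarded by how far apart it separates the two agents.
diff = scores[:, None, :] - scores[None, :, :]  # score_i - score_j on task k
u_agent1 = np.sign(diff)                        # "winrate"-style payoff
u_agent2 = -u_agent1
u_task = np.abs(diff)                           # separation reward

# The strong generalist never loses to the weak generalist on any task:
strong, weak = 4, 3
print(u_agent1[strong, weak])  # → [1. 0. 1.]
```

Under this reading, the task player's incentive to separate agents is what drives probability mass toward tasks that discriminate between the current frontier candidates.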

References

If you find this library useful, please consider citing it:

@inproceedings{liu2025reevaluating,
  title={Re-evaluating Open-ended Evaluation of Large Language Models},
  author={Siqi Liu and Ian Gemp and Luke Marris and Georgios Piliouras and Nicolas Heess and Marc Lanctot},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=kbOAIXKWgx}
}

This project also builds on these published works:

  • Balduzzi, David, et al. "Re-evaluating evaluation." Advances in Neural Information Processing Systems 31 (2018).
  • Gemp, Ian, Luke Marris, and Georgios Piliouras. "Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization." The Twelfth International Conference on Learning Representations. 2024.
  • Marris, Luke, et al. "Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers." International Conference on Machine Learning. 2021.
  • Gemp, Ian, et al. "Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent." Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. 2022.

Disclaimer

This is not an officially supported Google product.
