Cross-paradigm simulation engine for game-theoretic agent research

PolicyArena


A simulation engine for game-theoretic agent research. PolicyArena lets you pit rule-based strategies, reinforcement learning, and LLM-powered agents against each other in classic game theory scenarios — all within the same run. Define experiments in YAML, run them from Python or the CLI, and compare how different decision-making paradigms perform under identical conditions: same game, same seed, same metrics.

The engine ships with a growing library of games — from Prisoner's Dilemma to SIR Epidemic — and a plug-in system that makes adding new ones straightforward. All built-in games are deployed to policyarena.dev, and new games added to the repo will appear there automatically.

Built for anyone running game-theory simulations — researchers, students, RL practitioners, and multi-agent systems developers. Works great without LLMs too; the core engine runs rule-based and RL experiments with zero external API dependencies.

Schelling Segregation on policyarena.dev — spatial grid simulation where agents self-organize into clusters despite mild preferences

Prisoner's Dilemma agent detail view — per-opponent stats, round-by-round matchups, and cumulative payoff

Features

  • Growing game library — pairwise, N-player, and spatial/network games covering classic game theory (see full list)
  • Three agent paradigms — rule-based strategies, tabular RL (Q-learning, bandit, best response), and LLM-powered agents (Claude, GPT, Gemini, Ollama)
  • Unified Brain interface — all paradigms implement decide() / update() / reset(), making them directly comparable
  • YAML-driven experiments — define games, agents, and parameters in config; built-in scenarios included for every game
  • Python API + CLI — pa.run() for notebooks, policy-arena run for the terminal
  • Pluggable game system — add games as self-contained packages with auto-discovery; third-party games register via entry points
  • Built on Mesa 3 — leverages Mesa's scheduling, topologies, and data collection
  • LLM integration via LangChain — structured output with Pydantic, batch decisions, conversation history, configurable personas
  • Reproducible by default — all runs are seeded; configs are snapshot-able
  • Lightweight core — installs without LLM dependencies; [llm] extra adds provider SDKs only when needed

Installation

pip install policy-arena

This installs the core package (rule-based + RL agents). For LLM-powered agents:

pip install policy-arena[llm]

Or install everything:

pip install policy-arena[all]

With uv:

uv add policy-arena            # core only
uv add policy-arena[llm]       # + LLM support
uv add policy-arena[all]       # everything

Requires Python 3.12+

Quick Start

Run a Built-in Example (No Config Needed)

# List built-in scenarios
policy-arena examples

# Run one instantly
policy-arena run --example pd_rl_vs_rulebased --no-save

Python API

import policy_arena as pa

# Run a built-in scenario
results = pa.run(pa.get_scenario_path("pd_rl_vs_rulebased"))

# Access results as pandas DataFrames
print(results.model_metrics.tail())
print(results.agent_metrics.tail())

# Override parameters
results = pa.run(pa.get_scenario_path("pd_rl_vs_rulebased"), seed=123, rounds=500)

# List available games
pa.list_games()
# ['battle_of_sexes', 'chicken', 'commons', 'cournot', 'el_farol',
#  'auction', 'hawk_dove', 'info_cascade', 'lobbying',
#  'minority_game', 'network_formation', 'prisoners_dilemma',
#  'public_goods', 'schelling', 'sir', 'stag_hunt', 'trust_game',
#  'ultimatum', 'voting']

# Inspect a game's available strategies
registry = pa.get_registry()
reg = registry.get("prisoners_dilemma")
print(sorted(reg.brain_factories.keys()))
# ['always_cooperate', 'always_defect', 'bandit', 'best_response',
#  'llm', 'pavlov', 'q_learning', 'random', 'tit_for_tat']

# List built-in scenarios
pa.list_scenarios()
# ['battle_of_sexes_coordination', 'chicken_brinkmanship', ...]

Example Output

Running the built-in Prisoner's Dilemma scenario produces two DataFrames — model-level and agent-level metrics per round:

Model metrics (aggregate per round):

     cooperation_rate  nash_eq_distance  social_welfare  strategy_entropy
195          0.333333          0.466667        0.600000          0.918296
196          0.366667          0.533333        0.633333          0.948078
197          0.333333          0.466667        0.600000          0.918296
198          0.366667          0.533333        0.633333          0.948078
199          0.333333          0.466667        0.600000          0.918296

Agent metrics (per agent per round):

               cumulative_payoff  round_payoff  cooperation_rate                  brain_name             label
Step  AgentID
200.0 1                   1816.0           9.0               0.4                 tit_for_tat               tft
      2                   2232.0           9.0               0.0               always_defect     always_defect
      3                   1230.0           6.0               1.0            always_cooperate  always_cooperate
      4                   1516.0           8.0               0.6                      pavlov            pavlov
      5                   2190.0           9.0               0.0  q_learning(lr=0.15,e=0.01)         q_learner
      6                   2224.0          13.0               0.0               best_response         best_resp

Both are standard pandas DataFrames — filter, plot, or export however you like.
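For instance, strategies can be ranked by final payoff with ordinary groupby/sort operations. A small sketch using rows shaped like the sample above (the data here is illustrative, not output from a real run):

```python
import pandas as pd

# Illustrative rows shaped like the agent_metrics sample above
agent_metrics = pd.DataFrame({
    "brain_name": ["tit_for_tat", "always_defect", "always_cooperate"],
    "cumulative_payoff": [1816.0, 2232.0, 1230.0],
})

# Rank strategies by their final cumulative payoff
ranking = (agent_metrics.groupby("brain_name")["cumulative_payoff"]
           .max()
           .sort_values(ascending=False))
print(ranking.index[0])  # always_defect
```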

CLI

# List all games and their strategies
policy-arena games

# Show detailed info about a game
policy-arena info prisoners_dilemma

# Run from a YAML config
policy-arena run scenarios/pd_rl_vs_rulebased.yaml

# Run with overrides
policy-arena run scenarios/pd_rl_vs_rulebased.yaml --seed 42 --no-save

# Run a built-in example (no file needed)
policy-arena run --example pd_rl_vs_rulebased

# Validate a config without running
policy-arena validate scenarios/pd_rl_vs_rulebased.yaml

# Export results as JSON and YAML
policy-arena run scenarios/pd_rl_vs_rulebased.yaml --export-json --export-yaml

# Show version
policy-arena version

YAML Config

name: "PD — RL vs Rule-Based"
game: prisoners_dilemma
rounds: 200
seed: 42
agents:
  - name: tft
    strategy: tit_for_tat
    count: 3
  - name: always_defect
    strategy: always_defect
    count: 3
  - name: q_learner
    type: rl
    strategy: q_learning
    count: 2
    parameters:
      learning_rate: 0.15
      epsilon: 0.2
game_params:
  payoff_matrix:
    cc: [3, 3]
    cd: [0, 5]
    dc: [5, 0]
    dd: [1, 1]

Every game has a built-in scenario. See them with policy-arena examples.

How It Works

Every agent is controlled by a Brain — the same interface regardless of paradigm:

from abc import ABC

class Brain(ABC):
    def decide(self, observation): ...     # Choose an action
    def update(self, result) -> None: ...  # Learn from outcome
    def reset(self) -> None: ...           # Reset for new game

A Tit-for-Tat brain is 4 lines. A Q-learning brain maintains a Q-table. An LLM brain makes an API call to Claude/GPT/Gemini. The engine doesn't care — same interface, same metrics, same run loop.
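As a concrete illustration, here is a minimal Tit-for-Tat sketch against that interface (the "C"/"D" action encoding and the assumption that update() receives the opponent's last action are choices made for this example, not PolicyArena's actual types):

```python
class TitForTat:
    """Copies the opponent's previous move; cooperates first."""
    def __init__(self):
        self.opponent_last = "C"
    def decide(self, observation):
        return self.opponent_last      # mirror the opponent's last action
    def update(self, result):
        self.opponent_last = result    # result = opponent's action (assumption)
    def reset(self):
        self.opponent_last = "C"       # start fresh: cooperate again

brain = TitForTat()
print(brain.decide(None))  # C
brain.update("D")
print(brain.decide(None))  # D
```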

YAML Config  →  Scenario  →  Mesa Model  →  RunResults
(or Python)     (dataclass)   (step loop)    (DataFrames)

Games are Mesa 3 models. Each step: agents decide simultaneously, the model resolves outcomes, brains learn. Mesa handles scheduling, topologies, and data collection.
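That round structure can be sketched in a few lines (simplified and hypothetical; real games are Mesa 3 models, and none of these names are PolicyArena's actual API):

```python
# Prisoner's Dilemma payoffs for (row, col) action pairs
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class Stub:
    """Fixed-action brain used only to exercise the loop."""
    def __init__(self, action):
        self.action, self.last_result = action, None
    def decide(self, observation):
        return self.action
    def update(self, result):
        self.last_result = result

def resolve_pd(actions):
    # Model step: map the joint action profile to per-agent payoffs
    p1, p2 = PAYOFFS[(actions["a"], actions["b"])]
    return {"a": p1, "b": p2}

def run_round(brains, resolve):
    # 1. All agents decide simultaneously
    actions = {name: b.decide(None) for name, b in brains.items()}
    # 2. The model resolves outcomes
    results = resolve(actions)
    # 3. Each brain learns from its own result
    for name, b in brains.items():
        b.update(results[name])
    return results

print(run_round({"a": Stub("C"), "b": Stub("D")}, resolve_pd))  # {'a': 0, 'b': 5}
```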

See the architecture docs for the full design with code examples.

Games

Pairwise (Round-Robin)

Game                  Description
Prisoner's Dilemma    Classic cooperation vs defection dilemma
Stag Hunt             Risky cooperation (stag) vs safe defection (hare)
Hawk-Dove             Aggression vs sharing over a resource
Chicken               Anti-coordination — swerve or crash
Battle of the Sexes   Coordination with conflicting preferences
Trust Game            Sender sends money (multiplied), receiver returns a share
Ultimatum             Proposer offers a split, responder accepts or rejects

N-Player (Collective)

Game                     Description
Public Goods             Contribute to a shared pool, multiplied and split equally
Cournot Oligopoly        Firms choose production quantities; market price falls with total output
El Farol Bar             Attend only if the crowd is below a threshold
Tragedy of the Commons   Extract from a shared renewable resource
Minority Game            Choose between two options — the minority wins
Voting & Election        N voters elect candidates under plurality, approval, or Borda rules
Sealed-Bid Auction       First-price or second-price (Vickrey) sealed-bid auction with private values
Information Cascade      Sequential binary decisions with private signals — herding dynamics
Lobbying Contest         Tullock rent-seeking contest — spend to win a prize; the highest spender is most likely to win

Spatial / Network

Game                    Description
Schelling Segregation   Agents on a grid relocate based on neighbor similarity
SIR Epidemic            Disease spread on a network with strategic isolation
Network Formation       Agents form links; payoffs depend on network position and link costs

All pairwise and collective games support rule-based, RL, and LLM agents. Spatial/network games support rule-based and RL.

Try these games interactively at policyarena.dev.

Agent Types

Rule-based (brains/rule_based/) — Fixed strategies: Tit-for-Tat, Always Cooperate, Always Defect, Pavlov, Random, plus game-specific heuristics. Deterministic, fast, interpretable.

Reinforcement Learning (brains/rl/) — Tabular Q-learning with epsilon-greedy exploration, best response (tracks opponent frequencies), and multi-armed bandit. Configurable learning_rate, epsilon, epsilon_decay, discount, seed.
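The tabular Q-learning update behind those parameters can be sketched as follows (a generic textbook implementation, not PolicyArena's internal code; parameter names mirror the config keys above):

```python
import random

def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.15, discount=0.95):
    """One tabular Q-learning step: Q(s,a) += lr * (r + g*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + learning_rate * (reward + discount * best_next - old)

def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

q = {}
q_update(q, "opponent_cooperated", "D", 5.0, "opponent_cooperated", ["C", "D"])
print(q[("opponent_cooperated", "D")])  # 0.75
```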

LLM-powered (brains/llm/) — Language model agents via LangChain. Uses Pydantic schemas with with_structured_output() for reliable action parsing. Supports configurable personas, conversation history, batch decisions (one LLM call per round), and fallback actions on failure.

LLM Setup

Requires pip install policy-arena[llm]

Set API keys as environment variables:

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

Or use a .env file. For local models, run Ollama and use provider: ollama in your config.

Provider         Package                  Example Model
Anthropic        langchain-anthropic      claude-sonnet-4-6
OpenAI           langchain-openai         gpt-5.4
Google           langchain-google-genai   gemini-3.1-flash
Ollama (local)   langchain-ollama         llama4

Optional Langfuse tracing is supported for LLM observability.

Extending with New Games

Games self-register via the GameRegistration system. Create a new package under policy_arena/games/:

# policy_arena/games/my_game/__init__.py
from policy_arena.registration import GameRegistration
from .model import MyGameModel
from .brains import StrategyA, StrategyB

REGISTRATION = GameRegistration(
    id="my_game",
    model_class=MyGameModel,
    brain_factories={
        "strategy_a": lambda **_: StrategyA(),
        "strategy_b": lambda **kw: StrategyB(param=kw.get("param", 1.0)),
    },
)

The game is auto-discovered on next import — no need to edit any central file. Third-party packages can also register via entry points:

# In your package's pyproject.toml
[project.entry-points."policy_arena.games"]
my_game = "my_package.games.my_game"

See the architecture docs for the full game package structure and extending guide.

Error Handling

All domain errors inherit from PolicyArenaError and carry machine-readable code, message, and details fields:

from policy_arena.errors import GameNotFoundError, StrategyNotFoundError

try:
    pa.run(config)
except GameNotFoundError as e:
    print(e.code)     # "GAME_NOT_FOUND"
    print(e.details)  # {"game_id": "...", "available": [...]}
except StrategyNotFoundError as e:
    print(e.code)     # "STRATEGY_NOT_FOUND"

Error                   Code                      When
GameNotFoundError       GAME_NOT_FOUND            Game ID not in registry
StrategyNotFoundError   STRATEGY_NOT_FOUND        Strategy not registered for a game
ConfigValidationError   CONFIG_VALIDATION_ERROR   Scenario config fails validation
SimulationError         SIMULATION_ERROR          Simulation fails during execution
LLMProviderError        LLM_PROVIDER_ERROR        LLM provider call fails irrecoverably
LLMNotInstalledError    LLM_NOT_INSTALLED         LLM dependencies missing
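Because every domain error shares the PolicyArenaError base with code and details fields, a single handler can cover them all uniformly. A sketch that mirrors the error shape described above (these class bodies are illustrative, not PolicyArena's actual implementation):

```python
class PolicyArenaError(Exception):
    """Illustrative mirror of the base error shape described above."""
    def __init__(self, code, message, details=None):
        super().__init__(message)
        self.code = code
        self.details = details or {}

class GameNotFoundError(PolicyArenaError):
    def __init__(self, game_id, available):
        super().__init__("GAME_NOT_FOUND", f"Unknown game: {game_id}",
                         {"game_id": game_id, "available": available})

try:
    raise GameNotFoundError("chess", ["prisoners_dilemma", "stag_hunt"])
except PolicyArenaError as e:   # one handler covers all domain errors
    print(e.code)               # GAME_NOT_FOUND
    print(sorted(e.details))    # ['available', 'game_id']
```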

Development

git clone https://github.com/BaklazhenkoNikita/policyarena.git
cd policyarena
uv sync --all-extras          # install all optional deps
uv run pre-commit install     # set up ruff check + format hooks
uv run pytest tests/ -x       # run tests
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
uv run mypy src/policy_arena/

CI runs on Python 3.12 and 3.13 with lint, format check, type check, and tests (65% coverage threshold).

Contributing

See CONTRIBUTING.md for setup, code style, and how to add new games.

Short version: fork, create a feature branch, open a PR targeting main.

Built With

  • Mesa 3 — Agent-based modeling (scheduling, topologies, data collection)
  • LangChain — Provider-agnostic LLM integration
  • Pydantic — Config validation and LLM structured output
  • Polars — Parquet output for results
  • Typer — CLI
  • Langfuse — Optional LLM tracing

License

MIT — Copyright 2026 Nikita Baklazhenko


See CHANGELOG.md for release history.
