haive-games
LLM-powered game agents for chess, Go, poker, social deduction, and 20+ other games.
A curated suite of game environments where LLM agents play against each other (or themselves). Each game is a complete implementation: state manager, role-based player agents, prompt templates, and end-to-end workflows. Use it for agent evaluation, reinforcement learning research, multi-agent benchmarks, or just fun.
Why haive-games?
LLM agents need environments to be evaluated in. Traditional benchmarks (MMLU, HumanEval) test knowledge and code, but not strategic reasoning, social deduction, or long-horizon planning. Games are an ideal test bed:
- Strategic depth — Chess, Go, Risk test long-term planning
- Social reasoning — Mafia, Among Us, Clue test deception, trust, and inference
- Hidden information — Poker, Battleship test reasoning under uncertainty
- Cooperation — Debate, Hold'em test negotiation and coordination
- Bounded scope — Each game has clear rules, a defined state space, and measurable outcomes
haive-games provides 23 working game environments with the framework already built. You configure the LLMs, run a game, and get a complete trace.
Framework Architecture
Every game follows the same pattern. Three core abstractions:
GameStateManager[T]
Generic state transition manager. Each game implements four methods:
    import chess

    class ChessStateManager(GameStateManager[ChessState]):
        def initialize_state(self) -> ChessState:
            """Create starting state."""
            return ChessState(board=chess.Board())

        def apply_move(self, state: ChessState, move: Move) -> ChessState:
            """Apply a move and return new state."""
            new_board = state.board.copy()
            new_board.push(move.to_chess_move())
            return state.copy(update={"board": new_board})

        def get_legal_moves(self, state: ChessState) -> list[Move]:
            """Return all legal moves for current player."""
            return [Move.from_chess_move(m) for m in state.board.legal_moves]

        def check_game_status(self, state: ChessState) -> GameStatus:
            """Check if game is over and who won."""
            if state.board.is_checkmate():
                # The side to move is the side that has been checkmated.
                return GameStatus(over=True, winner="white" if state.board.turn == chess.BLACK else "black")
            if state.board.is_game_over():
                # Stalemate, insufficient material, and similar draws: over, no winner.
                return GameStatus(over=True)
            return GameStatus(over=False)
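The same four-method shape carries over to much simpler games. The sketch below is self-contained and does not import haive-games: `StateManagerSketch` is a hypothetical stand-in for `GameStateManager[T]` (the real signatures may differ), and `NimState` is an illustrative dataclass rather than the package's own Nim implementation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class StateManagerSketch(ABC, Generic[T]):
    """Hypothetical stand-in for GameStateManager[T]; not the real base class."""

    @abstractmethod
    def initialize_state(self) -> T: ...

    @abstractmethod
    def apply_move(self, state: T, move: int) -> T: ...

    @abstractmethod
    def get_legal_moves(self, state: T) -> list[int]: ...

    @abstractmethod
    def check_game_status(self, state: T) -> Optional[str]: ...


@dataclass(frozen=True)
class NimState:
    stones: int = 7          # stones left in the single pile
    to_move: str = "first"   # whose turn it is


class NimStateManager(StateManagerSketch[NimState]):
    """Single-pile Nim: take 1 to 3 stones; whoever takes the last stone wins."""

    def initialize_state(self) -> NimState:
        return NimState()

    def apply_move(self, state: NimState, move: int) -> NimState:
        if move not in self.get_legal_moves(state):
            raise ValueError(f"illegal move: {move}")
        next_player = "second" if state.to_move == "first" else "first"
        # Return a new state instead of mutating the old one.
        return replace(state, stones=state.stones - move, to_move=next_player)

    def get_legal_moves(self, state: NimState) -> list[int]:
        return [n for n in (1, 2, 3) if n <= state.stones]

    def check_game_status(self, state: NimState) -> Optional[str]:
        """Return the winner's name, or None while the game is still running."""
        if state.stones > 0:
            return None
        # The player who just moved took the last stone; to_move is the other side.
        return "second" if state.to_move == "first" else "first"
```

Returning a fresh state from `apply_move`, as the chess example also does, keeps every intermediate position available for the game trace.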
GameAgent
Base class for game agents. Implements the standard workflow:
initialize → loop:
current_player.move(state) → apply_move → analyze → check_status → break_if_over
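The workflow above reads as an ordinary loop. Here is a minimal sketch, assuming a toy counting game in place of a real state manager and random players in place of LLM calls. `play_game`, `random_player`, and the game rules are illustrative only and are not part of haive-games.

```python
import random
from typing import Callable

# Toy game: players alternately add 1 or 2 to a counter; whoever reaches TARGET wins.
TARGET = 10

Player = Callable[[int, list[int]], int]  # (state, legal_moves) -> chosen move


def legal_moves(state: int) -> list[int]:
    return [step for step in (1, 2) if state + step <= TARGET]


def random_player(state: int, moves: list[int]) -> int:
    """Stand-in for an LLM call: pick any legal move."""
    return random.choice(moves)


def play_game(players: dict[str, Player], order: list[str]) -> dict:
    state = 0                      # initialize
    trace = []                     # (player, move, resulting state) per turn
    turn = 0
    while True:
        name = order[turn % len(order)]
        move = players[name](state, legal_moves(state))   # current_player.move(state)
        state += move                                     # apply_move
        trace.append((name, move, state))                 # analyze (here: just record)
        if state == TARGET:                               # check_status
            return {"winner": name, "trace": trace}       # break_if_over
        turn += 1


result = play_game({"a": random_player, "b": random_player}, order=["a", "b"])
print(result["winner"], len(result["trace"]))
```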
MultiPlayerGameAgent
Extension for multi-player games with role-based player configs. Each role gets its own AugLLMConfig (so you can have GPT-4 play white and Claude play black, for example).
Game Catalog (23 Working Demos)
🏁 Two-Player Board Games
Chess — Full chess with python-chess. White vs Black, legal move generation, checkmate detection.
poetry run python demos/games/14_chess.py
Go — 9x9 or 19x19 board with sgfmill. Territory scoring, ko rule, capture detection.
poetry run python demos/games/15_go.py
Other: Tic Tac Toe, Connect4, Checkers, Reversi (Othello), Mancala, Nim, Fox and Geese, Battleship, Mastermind.
🕵️ Multi-Player Social Deduction
Among Us — Players have roles (crewmate, impostor). Tasks, voting, sabotage, discussion. Tests deception and inference.
Mafia — Classic werewolf-style game. Day/night cycles, voting, role abilities.
Clue — Murder mystery deduction. Players make accusations and eliminate possibilities.
Debate — Structured argumentation. Tests persuasion and counter-argument generation.
🃏 Multi-Player Card/Strategy
Hold'em — Texas Hold'em poker. Betting rounds, hand evaluation, bluffing.
Poker — General poker variants.
Dominoes — Tile-based game with chains.
Risk — Strategy game with territory control and battles.
🟢 Single-Player Puzzles
Flow Free — Connect colored dots without crossing paths.
Wordle — Word guessing with feedback.
Rubik's Cube — Cube solving (uses real cube state).
2048 — Sliding tile puzzle.
Towers of Hanoi — Classic disk-moving puzzle.
Quick Start
Run a Demo
# Install
pip install haive-games
# For Chess support (quotes keep shells such as zsh from globbing the brackets):
pip install "haive-games[games-chess]"
# For Go support:
pip install "haive-games[go]"
# Run any demo from the parent repo
poetry run python demos/games/14_chess.py
poetry run python demos/games/28_among_us.py
poetry run python demos/games/41_mafia.py
Programmatic Usage
    from haive.games.chess.agent import ChessAgent
    from haive.games.chess.config import ChessConfig
    from haive.core.engine.aug_llm import AugLLMConfig

    # Configure both players
    config = ChessConfig(
        aug_llm_configs={
            "white": AugLLMConfig(
                temperature=0.3,
                system_message="You are a chess grandmaster playing white.",
                model="gpt-4o",
            ),
            "black": AugLLMConfig(
                temperature=0.3,
                system_message="You are a chess grandmaster playing black.",
                model="claude-opus-4-6",
            ),
        }
    )

    # Run a game
    agent = ChessAgent(config)
    result = agent.run_game()

    # Inspect the result
    print(f"Winner: {result.winner}")
    print(f"Moves: {len(result.move_history)}")
    print(f"Final board:\n{result.final_state.board}")
Multi-Player Game with Roles
    from haive.games.among_us.agent import AmongUsAgent
    from haive.games.among_us.config import AmongUsConfig
    from haive.core.engine.aug_llm import AugLLMConfig

    config = AmongUsConfig(
        n_players=8,
        n_impostors=2,
        aug_llm_configs={
            "crewmate": AugLLMConfig(temperature=0.5),
            "impostor": AugLLMConfig(temperature=0.7),  # More creative
        }
    )

    agent = AmongUsAgent(config)
    result = agent.run_game()
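One way to picture role-based configs: each player is dealt a role, then looks up the config stored under that role. The sketch below is an assumption about how such a mapping could work, not a description of what MultiPlayerGameAgent does internally. `RoleConfig` and `assign_roles` are hypothetical names, and `RoleConfig` only mimics the `temperature` field used above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleConfig:
    """Hypothetical stand-in for AugLLMConfig with a single field."""
    temperature: float


def assign_roles(n_players: int, n_impostors: int, seed=None) -> dict[str, str]:
    """Deal roles at random: exactly n_impostors impostors, the rest crewmates."""
    if not 0 < n_impostors < n_players:
        raise ValueError("need at least one impostor and one crewmate")
    roles = ["impostor"] * n_impostors + ["crewmate"] * (n_players - n_impostors)
    random.Random(seed).shuffle(roles)
    return {f"player_{i}": role for i, role in enumerate(roles)}


role_configs = {
    "crewmate": RoleConfig(temperature=0.5),
    "impostor": RoleConfig(temperature=0.7),
}

roles = assign_roles(n_players=8, n_impostors=2, seed=0)
# Every player sharing a role shares that role's config.
player_configs = {player: role_configs[role] for player, role in roles.items()}
```

Keying configs by role rather than by player keeps the configuration the same size no matter how many players join.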
Use Cases
Agent Evaluation
Compare different LLMs head-to-head:
    # Continues from the Programmatic Usage imports above.
    gpt4 = AugLLMConfig(model="gpt-4o")
    claude = AugLLMConfig(model="claude-opus-4-6")
    config = ChessConfig(aug_llm_configs={"white": gpt4, "black": claude})

    n_games = 10
    results = [ChessAgent(config).run_game() for _ in range(n_games)]
    gpt4_wins = sum(1 for r in results if r.winner == "white")
    print(f"gpt-4o won {gpt4_wins}/{n_games} games as white")
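A batch with fixed colors mixes model strength with White's first-move advantage. A common remedy is to alternate colors and tally by model rather than by color. In the sketch below, `play_once` is a hypothetical stand-in that returns "white", "black", or None for a draw. In practice it might wrap a game run with the two configs swapped, but that wiring is left as an assumption.

```python
import random
from collections import Counter
from typing import Callable, Optional


def head_to_head(
    play_once: Callable[[str, str], Optional[str]],
    model_a: str,
    model_b: str,
    n_games: int,
) -> Counter:
    """Alternate colors each game and count wins per model (plus draws)."""
    tally = Counter({model_a: 0, model_b: 0, "draw": 0})
    for game in range(n_games):
        # Even games: model_a plays white; odd games: model_b plays white.
        white, black = (model_a, model_b) if game % 2 == 0 else (model_b, model_a)
        winner_color = play_once(white, black)
        if winner_color is None:
            tally["draw"] += 1
        else:
            tally[white if winner_color == "white" else black] += 1
    return tally


def fake_play_once(white: str, black: str) -> Optional[str]:
    """Placeholder result generator; swap in a real game run here."""
    return random.choice(["white", "black", None])


tally = head_to_head(fake_play_once, "model-a", "model-b", n_games=10)
print(dict(tally))
```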
Strategic Reasoning Benchmarks
Use Chess, Go, or Risk to test long-horizon planning capabilities. The state manager provides ground-truth legal moves and outcomes.
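Because legality is known exactly, one simple metric is the fraction of an agent's proposed moves that are legal. A minimal sketch follows. `legal_move_rate` and the (proposed, legal_moves) record format are assumptions made for illustration, not a haive-games API.

```python
from typing import Hashable, Iterable


def legal_move_rate(records: Iterable[tuple[Hashable, Iterable[Hashable]]]) -> float:
    """Fraction of proposals that appear in that turn's legal move list."""
    total = 0
    legal = 0
    for proposed, legal_moves in records:
        total += 1
        if proposed in set(legal_moves):
            legal += 1
    if total == 0:
        raise ValueError("no proposals recorded")
    return legal / total


# Each record pairs what the agent proposed with what the rules allowed that turn.
# The legal-move lists are shortened here to keep the example small.
records = [
    ("e4", ["e4", "d4", "Nf3"]),
    ("Ke2", ["e4", "d4", "Nf3"]),   # not in the legal list for this turn
    ("d4", ["e4", "d4", "Nf3"]),
    ("Nf3", ["e4", "d4", "Nf3"]),
]
print(legal_move_rate(records))
```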
Social Deduction Research
Among Us, Mafia, and Clue are ideal for testing theory of mind, deception, and Bayesian reasoning under uncertainty.
Creative Writing Evaluation
Use Debate to evaluate persuasion. Use Social Media Conversation to test personality consistency.
Writing a New Game
The framework makes it straightforward:
- State — state.py with a Pydantic model
- State Manager — state_manager.py implementing GameStateManager[T]
- Config — config.py extending GameConfig with aug_llm_configs
- Agent — agent.py extending GameAgent or MultiPlayerGameAgent
- Prompts — prompts.py with role-specific system messages
- Demo — demos/games/{NN}_{name}.py
See src/haive/games/tic_tac_toe/ for a minimal example or src/haive/games/chess/ for a complex one.
Installation
pip install haive-games
# With chess support (quotes keep shells such as zsh from globbing the brackets)
pip install "haive-games[games-chess]"
# With Go support
pip install "haive-games[go]"
Documentation
📖 Full documentation: https://pr1m8.github.io/haive-games/
Related Packages
| Package | Description |
|---|---|
| haive-core | Foundation: engines, graphs |
| haive-agents | Production agents (used by game agents) |
License
MIT © pr1m8