[WIP] A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning

Project description

TextArena

TextArena is a flexible and extensible framework for training, evaluating, and benchmarking models in text-based games. It follows an OpenAI Gym-style interface, making it straightforward to integrate with a wide range of reinforcement learning and language model frameworks.

Play Online: https://textarena.ai/play
Leaderboard: https://textarena.ai/leaderboard
Community: Join our Discord

Example

Installation

Install TextArena directly from PyPI:

pip install textarena

Play Offline

Run the following command to set your OpenRouter API key:

export OPENROUTER_API_KEY="YOUR_OPENROUTER_API_KEY"
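If the key is missing, the first agent call is likely to fail, so it can help to check for it up front. The following is a minimal sketch, assuming the key is read from the OPENROUTER_API_KEY environment variable set above; require_env is a hypothetical helper and not part of TextArena.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating agents.")
    return value


if __name__ == "__main__":
    try:
        require_env("OPENROUTER_API_KEY")
        print("OpenRouter API key found.")
    except RuntimeError as err:
        print(err)
```

Calling require_env("OPENROUTER_API_KEY") before constructing the agents turns a missing key into an immediate, readable error.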

Then run the following code to play offline:

import textarena as ta

# Initialize agents
agents = {
    0: ta.agents.OpenRouterAgent(model_name="GPT-4o-mini"),
    1: ta.agents.OpenRouterAgent(model_name="anthropic/claude-3.5-haiku"),
}

# Initialize the environment and wrap it
env = ta.make(env_id="SpellingBee-v0")
env = ta.wrappers.LLMObservationWrapper(env=env)
env = ta.wrappers.SimpleRenderWrapper(
    env=env,
    player_names={0: "GPT-4o-mini", 1: "claude-3.5-haiku"},
)

env.reset(num_players=len(agents))
done = False
while not done:
    player_id, observation = env.get_observation()
    action = agents[player_id](observation)
    done, info = env.step(action=action)
rewards = env.close()
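The loop above only calls agents[player_id](observation) and passes the result to env.step, so a dry run without any API calls can use plain callables. The sketch below assumes that observations and actions are plain strings. ScriptedAgent and StubEnv are hypothetical stand-ins written for illustration; they are not TextArena classes, and the stub only mirrors the method names used in the example.

```python
from typing import List, Tuple


class ScriptedAgent:
    """Hypothetical agent: replies with a fixed list of actions, in order."""

    def __init__(self, actions: List[str]) -> None:
        self._actions = list(actions)
        self._next = 0

    def __call__(self, observation: str) -> str:
        # Cycle through the script so the agent never runs out of moves.
        action = self._actions[self._next % len(self._actions)]
        self._next += 1
        return action


class StubEnv:
    """Hypothetical turn-based environment mirroring the method names used above."""

    def __init__(self, max_turns: int = 4) -> None:
        self.max_turns = max_turns
        self.num_players = 0
        self.turn = 0
        self.log: List[Tuple[int, str]] = []

    def reset(self, num_players: int) -> None:
        self.num_players = num_players
        self.turn = 0
        self.log.clear()

    def get_observation(self) -> Tuple[int, str]:
        player_id = self.turn % self.num_players
        return player_id, f"Turn {self.turn}: player {player_id} to move."

    def step(self, action: str) -> Tuple[bool, dict]:
        # Record who played what, then advance to the next turn.
        self.log.append((self.turn % self.num_players, action))
        self.turn += 1
        return self.turn >= self.max_turns, {}

    def close(self) -> dict:
        # Made-up rewards for the stub: every player gets 0.
        return {pid: 0 for pid in range(self.num_players)}


agents = {0: ScriptedAgent(["a", "b"]), 1: ScriptedAgent(["x", "y"])}
env = StubEnv(max_turns=4)
env.reset(num_players=len(agents))
done = False
while not done:
    player_id, observation = env.get_observation()
    action = agents[player_id](observation)
    done, info = env.step(action=action)
rewards = env.close()
```

Swapping StubEnv for a wrapped ta.make environment and ScriptedAgent for an OpenRouterAgent gives back the original example; the loop itself stays the same.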

Play Online

To evaluate your model against other submitted models and human players, replace ta.make with ta.make_online and pass a model name, a model description, and an email address. Make sure that model_name is unique and that the email address is correct.

import textarena as ta

# Details that identify this model online
model_name = "GPT-4o"
model_description = "Standard OpenAI GPT-4o model."
email = "guertlerlo@cfar.a-star.edu.sg"

# Initialize agent
agent = ta.agents.OpenRouterAgent(model_name="gpt-4o")

# Initialize the online environment and wrap it
env = ta.make_online(
    env_id="SpellingBee-v0",
    model_name=model_name,
    model_description=model_description,
    email=email,
)
env = ta.wrappers.LLMObservationWrapper(env=env)

env.reset()

# A single agent plays; get_observation() still returns this agent's player_id
done = False
while not done:
    player_id, observation = env.get_observation()
    action = agent(observation)
    done, info = env.step(action=action)

rewards = env.close()
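Since a wrong email address or an empty model name is only noticed after registering, a local sanity check before calling make_online may save a round trip. This is a rough sketch: check_registration is a hypothetical helper, the email pattern is deliberately loose, and uniqueness of model_name cannot be checked locally.

```python
import re

# Deliberately loose pattern: one "@", no whitespace, and a dot in the domain.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_registration(model_name: str, email: str) -> list:
    """Return a list of problems found; an empty list means the fields look plausible."""
    problems = []
    if not model_name.strip():
        problems.append("model_name is empty")
    if not _EMAIL_RE.match(email.strip()):
        problems.append(f"email {email!r} does not look like an address")
    return problems


if __name__ == "__main__":
    for problem in check_registration("GPT-4o", "user@example.com"):
        print(problem)
```

An empty result does not guarantee that registration succeeds; it only rules out the most obvious mistakes.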

Implementation Status

Game	Players	Offline Play	Online Play	Documentation
CarPuzzle	1	❌	❌	—
Crosswords	1	✅	❌	—
FifteenPuzzle	1	✅	❌	—
GuessTheNumber	1	✅	❌	—
GuessWho	1	✅	❌	—
Hangman	1	✅	❌	—
LogicPuzzle	1	✅	❌	—
Mastermind	1	✅	❌	—
MathProof	1	❌	❌	—
Minesweeper	1	✅	❌	—
Sudoku	1	✅	❌	—
TowerOfHanoi	1	✅	❌	—
TwentyQuestions	1	✅	❌	—
WordLadder	1	✅	❌	—
WordSearch	1	✅	❌	—
Wordle	1	✅	❌	—

AirLandAndSea †	2	❌	❌	—
BattleOfSexes ‡	2	❌	❌	—
Battleship	2	✅	❌	—
Brass	2	❌	❌	—
Breakthrough ¶	2	✅	❌	—
Checkers	2	✅	❌	—
Chess	2	✅	✅	—
ConnectFour	2	✅	✅	—
Debate	2	✅	❌	—
DontSayIt	2	✅	✅	—
DracoGame ‡	2	❌	❌	—
DuopolisticCompetition ‡	2	❌	❌	—
EscalationGame ‡	2	❌	❌	—
Hive †	2	❌	❌	—
HotColdGame ‡	2	❌	❌	—
IntegrativeDistributiveNegotiation §	2	❌	❌	—
IteratedPrisonersDilemma	2	✅	❌	—
Jaipur	2	❌	❌	—
KuhnPoker ¶	2	✅	❌	—
LetterAuction	2	✅	❌	—
MemoryGame	2	✅	❌	—
MonopolyGame ‡	2	❌	❌	—
Nim ¶	2	✅	❌	—
Othello (Reversi)	2	✅	❌	—
PigDice ¶	2	✅	❌	—
PrisonersDilemma ‡	2	❌	❌	—
Santorini †	2	❌	❌	—
ScenarioPlanning	2	✅	❌	—
SeaBattle †	2	❌	❌	—
SimpleBlindAuction ¶	2	✅	❌	—
SimpleNegotiation	2	✅	✅	—
SpellingBee	2	✅	✅	—
SpiteAndMalice	2	✅	✅	—
StagHunt ‡	2	❌	❌	—
Stratego	2	✅	✅	—
Taboo	2	✅	❌	—
Tak	2	✅	✅	—
TicTacToe	2	✅	✅	—
TriGame ‡	2	❌	❌	—
TruthAndDeception	2	✅	✅	—
UltimateTicTacToe	2	✅	✅	—
WaitGoGame ‡	2	❌	❌	—
WordChains	2	✅	✅	—

ArcticScavengers †	3+	❌	❌	—
AreYouTheTraitor †	3+	❌	❌	—
BlindAuction	3–15	✅	❌	—
CharacterConclave	3–15	✅	❌	—
Codenames †	4	❌	❌	—
LiarsDice	2–15	✅	✅	—
Negotiation	3–15	✅	❌	—
Pit †	3+	❌	❌	—
Poker	2–15	✅	✅	—
Snake	2–15	✅	✅	—
Surround	2–15	✅	❌	—
TwoRoomsAndABoom †	6+	❌	❌	—
Diplomacy	3–7	✅	❌	—
7 Wonders	3+	❌	❌	—
Bohnanza	3+	❌	❌	—
Codenames	4+	❌	❌	—
Risk	3+	❌	❌	—
SettlersOfCatan	2–4	❌	❌	—
TerraformingMars	1–5	❌	❌	—
Werewolf	5+	❌	❌	—

† Games from LLM Arena: Studying the Impact of Domain Expertise and Problem Complexity in LLM Competitions

‡ Games from Language Model Negotiations: Theory-of-Mind vs. Complexity of the Game

§ Games from Negotiating with Humans by LLMs via Strategic Reasoning

¶ Games from Language Models Make Better Players than Solvers in Cooperative Games
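The status table is easier to query once it is parsed. The sketch below uses a few rows copied from the table above as tab-separated text; parse_status and playable_online are hypothetical helpers written for this example.

```python
from typing import Dict, List

# A few rows copied from the table above (tab-separated).
TABLE = """Game\tPlayers\tOffline Play\tOnline Play\tDocumentation
Wordle\t1\t✅\t❌\t—
Chess\t2\t✅\t✅\t—
Hive †\t2\t❌\t❌\t—
Poker\t2–15\t✅\t✅\t—"""


def parse_status(table: str) -> List[Dict[str, str]]:
    """Turn the tab-separated table into one dict per game, keyed by column header."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def playable_online(rows: List[Dict[str, str]]) -> List[str]:
    """Names of games whose Online Play column is a check mark."""
    return [row["Game"] for row in rows if row["Online Play"] == "✅"]


rows = parse_status(TABLE)
print(playable_online(rows))
```

The same filter on the Offline Play column lists the games that can be run locally.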

Project details

Release history

Version	Released
0.7.4	Oct 16, 2025
0.7.3	Jul 31, 2025
0.7.2	Jul 22, 2025
0.7.0	Jul 17, 2025
0.6.17	Jul 21, 2025
0.6.16	Jul 5, 2025
0.6.15	Jul 5, 2025
0.6.14	Jul 4, 2025
0.6.12	Jul 3, 2025
0.6.11	Jul 3, 2025
0.6.10	Jul 3, 2025
0.6.9	Jul 3, 2025
0.6.4	Apr 15, 2025
0.6.3	Apr 8, 2025
0.6.1	Mar 31, 2025
0.6.0	Mar 25, 2025
0.5.9	Mar 8, 2025
0.5.8	Mar 7, 2025
0.5.7	Mar 7, 2025
0.5.6	Mar 6, 2025
0.5.5	Mar 6, 2025
0.5.4	Mar 6, 2025
0.5.3 (this version)	Mar 6, 2025
0.5.0	Feb 14, 2025
0.4.9	Feb 14, 2025
0.4.8	Feb 13, 2025
0.4.6	Feb 13, 2025
0.4.5	Feb 13, 2025
0.4.4	Feb 13, 2025
0.4.2	Feb 11, 2025
0.4.1	Feb 11, 2025
0.4.0	Feb 11, 2025
0.3.9	Feb 7, 2025
0.3.8	Feb 6, 2025
0.3.6	Feb 6, 2025
0.3.5	Feb 5, 2025
0.3.4	Feb 3, 2025
0.3.2	Jan 30, 2025
0.3.1	Jan 30, 2025
0.3.0	Jan 29, 2025
0.2.7	Jan 20, 2025
0.2.5	Jan 20, 2025
0.2.0	Dec 17, 2024
0.1.6	Nov 19, 2024
0.1.5	Nov 16, 2024
0.1.3	Nov 16, 2024
0.1.2	Nov 16, 2024
0.1.1	Nov 11, 2024
0.1.0	Nov 4, 2024

Download files

Source Distribution

textarena-0.5.3.tar.gz (8.1 MB view details)

Uploaded Mar 6, 2025 Source

Built Distribution

TextArena-0.5.3-py3-none-any.whl (8.2 MB view details)

Uploaded Mar 6, 2025 Python 3

File details

Details for the file textarena-0.5.3.tar.gz.

File metadata

Download URL: textarena-0.5.3.tar.gz
Upload date: Mar 6, 2025
Size: 8.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for textarena-0.5.3.tar.gz
Algorithm	Hash digest
SHA256	`7e4764c02ed7e948f2a6d8082d629aaa129ed72dbb29cc75a3168b4a6d34b57f`
MD5	`4f31465fc484bc4bf20e9e05d1d5c18a`
BLAKE2b-256	`65889e13b62fc4d5b0f9f78f8130934a82a417bc97041010c9167ed757051e85`
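A downloaded archive can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch; sha256_of and matches are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of the file at `path`, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest listed above for textarena-0.5.3.tar.gz.
EXPECTED_SHA256 = "7e4764c02ed7e948f2a6d8082d629aaa129ed72dbb29cc75a3168b4a6d34b57f"


def matches(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, matches("textarena-0.5.3.tar.gz") should return True for an intact download of the source distribution; pass the wheel's digest as expected to check the built distribution instead.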

File details

Details for the file TextArena-0.5.3-py3-none-any.whl.

File metadata

Download URL: TextArena-0.5.3-py3-none-any.whl
Upload date: Mar 6, 2025
Size: 8.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for TextArena-0.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1da94b39f5b60491945e3664e3f2d38c405b1c43263071dc9a94cdf5c28a2211`
MD5	`eecbe4398ab16dbef47971ee375de49d`
BLAKE2b-256	`782c45993834ae04e3cae8936d40320c4d009b992ab0189284689bc9221c8dfe`

TextArena 0.5.3
