agentodyssey

Open-Ended Long-Horizon Text Game Generation Engine and Evaluation Framework for Test-Time Continual Learning Agents


AgentOdyssey is a lightweight interactive environment that supports novel game generation, a unified agent interface, and multifaceted evaluation. It is designed to evaluate test-time continual learning agents across five key abilities: exploration, world knowledge acquisition, episodic memory, skill learning, and long-horizon planning. Its main features include:

Open-Ended Long-Horizon Game Generation: Generate games with entirely new and rich entities, dynamics, and storylines from a single command.
Unified Agent Interface: All LLM-based agents share as much of their prompts as possible through inherited classes, which keeps comparisons fair. Adding a new agent requires implementing only a few methods.
Multifaceted Evaluation Metrics: Includes a range of metrics beyond game progress to probe specific failure modes of agents.


Quickstart

1. Install

conda create -n agentodyssey python=3.12 && conda activate agentodyssey
git clone https://github.com/agentodyssey/agentodyssey.git && cd agentodyssey
pip install -e .

2. Set your API key (if using proprietary LLMs). For example, for OpenAI:

export OPENAI_API_KEY="your-key"

3. Run an evaluation

# Play the game yourself
python eval.py --game_name remnant --agent HumanAgent

# Evaluate an LLM agent (e.g., LongContextAgent)
python eval.py --game_name remnant --agent LongContextAgent --llm_provider openai --llm_name gpt-5

Note: See the full list of evaluation parameters → Running Evaluations

PyPI Package

1. Install

pip install agentodyssey

2. Python API

AgentOdyssey provides a Python wrapper for seamless integration into your own evaluation pipelines:

from agentodyssey import AgentOdyssey

AgentOdyssey.run(game_name="remnant", agent="LongContextAgent", llm_provider="openai", llm_name="gpt-5")

3. CLI tool

# Play the game yourself
agentodyssey run --game-name remnant --agent HumanAgent

# Evaluate an LLM agent
agentodyssey run --game-name remnant --agent LongContextAgent --llm-provider openai --llm-name gpt-5

Note: The CLI uses hyphenated flags (--game-name), eval.py uses underscored flags (--game_name), and the Python API uses underscored keyword arguments (game_name).
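
The naming difference between the CLI and the Python API can be bridged mechanically. The following is a minimal sketch and not part of the package: `to_cli_args` is a hypothetical helper that renames underscore-style keyword arguments to the hyphenated flags shown in the CLI examples above. It assumes every option takes exactly one value.

```python
import shlex


def to_cli_args(subcommand: str, **kwargs: object) -> list[str]:
    """Build an `agentodyssey <subcommand>` argument list from keyword arguments.

    Hypothetical helper (not shipped with AgentOdyssey): it only renames
    `game_name`-style keys to `--game-name`-style flags.
    """
    args = ["agentodyssey", subcommand]
    for key, value in kwargs.items():
        args += ["--" + key.replace("_", "-"), str(value)]
    return args


cmd = to_cli_args(
    "run",
    game_name="remnant",
    agent="LongContextAgent",
    llm_provider="openai",
    llm_name="gpt-5",
)
print(shlex.join(cmd))
```

With the package installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.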

Game Generation

The generation pipeline creates a complete game world through three stages:

Entity generation populates the world with locations, objects and NPCs.
Rule generation adds new world dynamics: action rules describe player-invoked actions (e.g., pick up, craft), and step rules describe automatic environment dynamics (e.g., a day-night cycle).
Quest generation devises the main storyline that acts as goals for the agent to pursue.
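
To make the distinction between action rules and step rules concrete, here is a toy sketch. All names (`World`, `pick_up`, `day_night_cycle`, `step`) are hypothetical and do not reflect AgentOdyssey's internal classes. The sketch only illustrates that an action rule fires when the player invokes it, while a step rule fires on every environment step.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy world state; field names are illustrative assumptions."""
    time_of_day: str = "day"
    steps: int = 0
    place_objects: set[str] = field(default_factory=lambda: {"lantern"})
    inventory: set[str] = field(default_factory=set)


def pick_up(world: World, obj: str) -> bool:
    """Action rule: applies only when the player invokes it."""
    if obj not in world.place_objects:
        return False
    world.place_objects.remove(obj)
    world.inventory.add(obj)
    return True


def day_night_cycle(world: World, period: int = 3) -> None:
    """Step rule: checked automatically after every step."""
    if world.steps % period == 0:
        world.time_of_day = "night" if world.time_of_day == "day" else "day"


def step(world: World, action=None, *args) -> None:
    """Advance one step: run the optional player action, then the step rule."""
    if action is not None:
        action(world, *args)
    world.steps += 1
    day_night_cycle(world)


world = World()
step(world, pick_up, "lantern")  # player-invoked action
step(world)                      # no action; the step rule still runs
step(world)                      # third step flips day to night
print(world.inventory, world.time_of_day)
```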

# Generate a themed game and run it
agentodyssey generate "a pirate-themed island adventure" --game-name pirate
agentodyssey run --game-name pirate --agent LongContextAgent --llm-provider openai --llm-name gpt-5

Generate with full control over world size, quest structure, and the LLM used for generation:

agentodyssey generate "a haunted castle with undead enemies" \
    --game-name haunted \
    --num-places 4 \
    --num-objects 20 \
    --num-npcs 10 \
    --num-action-rules 2 \
    --num-step-rules 1 \
    --num-quest-chapters 2 \
    --quest-description "Defeat the Lich King and restore the castle" \
    --llm-provider openai \
    --llm-name gpt-5

Note: See the full list of game generation parameters → Generating Games

Learn more about the game ontology → Game Ontology

Agent Paradigms

The environment ships with agents spanning six paradigms, listed below; RL agents are planned for a later release. All LLM-based agents use ReAct prompting.

Paradigm	Agents
Baselines	`RandomAgent`, `NoMemoryAgent`
Long Context	`LongContextAgent`
Fixed-Size Memory	`ShortTermMemoryAgent`, `Mem1Agent`
RAG	`VanillaRAGAgent`, `Mem0RAGAgent`, `RaptorRAGAgent`, `VoyagerAgent`
SFT	`LoRASFTAgent`, `FullSFTAgent`
Latent	`MemoryLLMAgent`, `MPlusAgent`
RL	will be released

Some agents can be augmented with three optional add-ons: reflection, summarization, and short-term memory (see the paper for details).

Note: See detailed descriptions of each paradigm → Agent Paradigms

Learn how to implement your own agent → Custom Agents
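
As a rough illustration of how prompt sharing through inheritance can work, the sketch below defines a base class that owns the shared prompt and a subclass that supplies only its memory behavior. Everything here is an assumption made for illustration: `BaseLLMAgent`, `LastKAgent`, the `recall`/`remember`/`act` methods, and the `echo_llm` stand-in are hypothetical and are not AgentOdyssey's actual interface, which is described in the Custom Agents guide.

```python
from abc import ABC, abstractmethod


class BaseLLMAgent(ABC):
    """Hypothetical base class: owns the prompt text shared by all agents."""

    SHARED_INSTRUCTIONS = "You are playing a text game. Think, then choose one action."

    def __init__(self, llm):
        self.llm = llm  # any callable mapping a prompt string to a reply string

    @abstractmethod
    def recall(self, observation: str) -> str:
        """Return the memory context to place in the prompt."""

    @abstractmethod
    def remember(self, observation: str, action: str) -> None:
        """Store the latest step."""

    def act(self, observation: str) -> str:
        # Every subclass goes through the same prompt assembly.
        prompt = "\n".join(
            [self.SHARED_INSTRUCTIONS, self.recall(observation), "Observation: " + observation]
        )
        action = self.llm(prompt)
        self.remember(observation, action)
        return action


class LastKAgent(BaseLLMAgent):
    """Toy fixed-size memory agent keeping only the last k steps in the prompt."""

    def __init__(self, llm, k: int = 2):
        super().__init__(llm)
        self.k = k
        self.history: list[str] = []

    def recall(self, observation: str) -> str:
        return "\n".join(self.history[-self.k:])

    def remember(self, observation: str, action: str) -> None:
        self.history.append(f"{observation} -> {action}")


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call: always answers 'look'."""
    return "look"


agent = LastKAgent(echo_llm, k=2)
for obs in ["You are in a hall.", "You see a door.", "The door is locked."]:
    agent.act(obs)
print(agent.recall(""))
```

Because only `recall` and `remember` differ between subclasses in this sketch, any difference in behavior can be attributed to the memory mechanism rather than to prompt wording.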

Evaluation Metrics

AgentOdyssey evaluates agents along several axes:

Game Progress measures how far the agent advances through in-game objectives. The main reward tracks main quest stage completion, while the supplementary reward captures exploration, crafting, combat, and side quest progress.

Model Cost tracks the total input and output tokens consumed by each agent during a run.

Diagnostic Testing probes specific capabilities through targeted tests:

Metric	What it measures
World Knowledge QA	Understanding of game facts, rules, and structure (evaluated before and after gameplay)
Episodic Memory QA	Recall of specific events from the agent's own trajectory
Object Exploration (OE)	Proportion of available objects the agent has acquired
Action Exploration (AE)	Proportion of available action types the agent has executed
Action Diversity (AD)	Entropy-based measure of behavioral diversity over a sliding window

Note: See metric definitions and how to run diagnostic evaluation → Evaluation Metrics
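
The exploration and diversity metrics in the table can be approximated in a few lines. This is a sketch under assumptions: the exact definitions (window size, logarithm base, normalization, and how windows are aggregated) are given in the linked metric documentation and may differ. The function names and the choice to average per-window Shannon entropy are illustrative only.

```python
import math
from collections import Counter


def coverage(seen, available) -> float:
    """Proportion of available items encountered (used here for both OE and AE)."""
    available = set(available)
    return len(set(seen) & available) / len(available) if available else 0.0


def window_entropy(actions) -> float:
    """Shannon entropy (in bits) of the action-type distribution in one window."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def action_diversity(actions, window: int = 4) -> float:
    """Mean entropy over all sliding windows of the given size (assumed aggregation)."""
    if len(actions) < window:
        return window_entropy(actions) if actions else 0.0
    spans = [actions[i:i + window] for i in range(len(actions) - window + 1)]
    return sum(window_entropy(s) for s in spans) / len(spans)


trajectory = ["move", "move", "pick up", "craft", "move", "move"]
print(coverage(trajectory, ["move", "pick up", "craft", "attack"]))  # action exploration
print(action_diversity(trajectory, window=4))
```

In this toy trajectory the agent has tried three of four action types, and every four-step window contains the same mix of actions, so the averaged entropy equals the entropy of any single window.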

Additional Dependencies

The base requirements.txt covers most functionality, but certain agents and providers need extra packages:

Feature	Extra packages
`RaptorRAGAgent`	`tiktoken`, `umap-learn`, `tenacity`
`Mem0RAGAgent`	`mem0ai`
Gemini (`llm_provider="gemini"`)	`google-genai`
Claude (`llm_provider="claude"`)	`anthropic`

Install only what you need, e.g.:

pip install tiktoken umap-learn tenacity   # for RaptorRAGAgent
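
Before launching a run, it can help to check which optional packages are importable. The sketch below uses only the standard library. The import names in `OPTIONAL_MODULES` are assumptions based on each package's usual top-level module (for example, `umap-learn` is imported as `umap`), and `is_installed` and `missing_modules` are hypothetical helpers, not part of AgentOdyssey.

```python
import importlib.util

# Import names are assumptions based on each package's usual top-level module.
OPTIONAL_MODULES = {
    "RaptorRAGAgent": ["tiktoken", "umap", "tenacity"],
    "Mem0RAGAgent": ["mem0"],
    "gemini": ["google.genai"],
    "claude": ["anthropic"],
}


def is_installed(module: str) -> bool:
    """Return True if `module` can be located (parent packages are imported to do so)."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `google`) is itself missing.
        return False


def missing_modules(feature: str) -> list[str]:
    """List the modules a feature needs that are not importable here."""
    return [m for m in OPTIONAL_MODULES.get(feature, []) if not is_installed(m)]


for feature in OPTIONAL_MODULES:
    print(feature, "missing:", missing_modules(feature) or "nothing")
```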

Contributing and Troubleshooting

For common issues and solutions, please refer to our Troubleshooting Guide. We welcome contributions from everyone and appreciate your help making the project better. For bug reports, feature suggestions, code contributions, and general questions, please refer to our Contribution Guidelines.

Citation

✨ If you find AgentOdyssey useful, please cite our work:

@inproceedings{agentodyssey2026,
  title     = {AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents},
  author    = {Zhang, Zheyuan and Wen, Zehao and Zhang, Alvin and Wang, Andrew and Xie, Jianwen and Khashabi, Daniel and Shu, Tianmin},
  year      = {2026},
}


This version: 0.1.0 (released Apr 20, 2026)

Download files


Source Distribution

agentodyssey-0.1.0.tar.gz (29.5 MB view details)

Uploaded Apr 20, 2026 Source

Built Distribution


agentodyssey-0.1.0-py3-none-any.whl (29.6 MB view details)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file agentodyssey-0.1.0.tar.gz.

File metadata

Download URL: agentodyssey-0.1.0.tar.gz
Upload date: Apr 20, 2026
Size: 29.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for agentodyssey-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2da1f83060ff38e7dcf44d736304e7414071ecac8edd36099046b45e1997d5aa`
MD5	`2f9832f85ff9bcf570c0ad5f3d4759b7`
BLAKE2b-256	`538dc6faef8ab1fd3c0cec9f9f7495f67713ca8877a16aaba7a3ea94c5c24467`


File details

Details for the file agentodyssey-0.1.0-py3-none-any.whl.

File metadata

Download URL: agentodyssey-0.1.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 29.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for agentodyssey-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d849dd9d6ae48590aab76cad04ea7d2f65c52d96541c48a605a31e5c20b1ab94`
MD5	`b40e86d5dfd246f0a4a96a2fcbec5406`
BLAKE2b-256	`12200f1a76a6422af0dd590ccefe3e32f526e1ff394aa31f9e3213d7382e5d48`

