Agent World Model

Infinity Synthetic Environments for Agentic Reinforcement Learning

Zhaoyang Wang1, Canwen Xu2, Boyi Liu2, Yite Wang2, Siwei Han1,
Zhewei Yao2, Huaxiu Yao1, Yuxiong He2

1UNC-Chapel Hill   2Snowflake AI Research  

Agent World Model (AWM) is a fully synthetic environment generation pipeline that synthesizes 1,000 executable, SQL database-backed tool-use environments, exposed via a unified MCP interface, for large-scale multi-turn agentic reinforcement learning.


🎯 Overview

The AWM synthesis pipeline includes the following steps:

  1. Start from a high-level scenario (e.g., "an online shopping platform")
  2. Generate user tasks that serve as functional requirements
  3. Synthesize a SQLite database (schema + sample data) as the state backend
  4. Generate a Python interface layer (FastAPI + MCP) as the action/observation space
  5. Generate verification code that inspects database state changes for reward signals
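
To make the five steps concrete, here is a toy sketch of the three pieces a synthesized environment boils down to: a SQLite state backend, a tool function as the action space, and a verifier that inspects state changes for the reward signal. All names and schemas below are illustrative, not actual generated code (in AWM the tool would be exposed through FastAPI + MCP rather than called directly):

```python
import sqlite3

# Illustrative state backend: a tiny "online shopping" database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, status TEXT)")
conn.execute("INSERT INTO orders (item, status) VALUES ('laptop', 'pending')")

# Illustrative tool (action space): mutates database state and returns an observation.
def cancel_order(order_id: int) -> str:
    conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = ?", (order_id,))
    return f"order {order_id} cancelled"

# Illustrative verifier (reward signal): inspect the database state after the rollout.
def verify_cancellation(order_id: int) -> bool:
    row = conn.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    return row is not None and row[0] == "cancelled"

cancel_order(1)
print(verify_cancellation(1))  # True
```

Because rewards are computed from database state rather than from the agent's transcript, the verifier stays robust to however the agent phrases its final answer.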

🔮 Resources

We release the 1,000 synthesized executable environments, along with the corresponding tasks, databases, and verifiers, on Hugging Face. Please check out the repo at Snowflake/AgentWorldModel-1K.

Resource Link
📄 Paper arxiv.org/abs/2602.10090
💻 Code Snowflake-Labs/agent-world-model
📦 AgentWorldModel-1K 🤗 Snowflake/AgentWorldModel-1K
🤖 Arctic-AWM-4B 🤗 Snowflake/Arctic-AWM-4B
🤖 Arctic-AWM-8B 🤗 Snowflake/Arctic-AWM-8B
🤖 Arctic-AWM-14B 🤗 Snowflake/Arctic-AWM-14B

If you want to use our synthesized environments directly, download them with

hf download Snowflake/AgentWorldModel-1K --repo-type dataset --local-dir ./outputs/

Then you can skip ahead to Environment Management and Agent Demo to start using the environments.

📦 Setup

Run uv sync to set up the Python environment, then set your LLM API credentials:

# OpenAI or any other compatible services
export AWM_SYN_LLM_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key"
# optional, if you are using a custom base url
export OPENAI_BASE_URL="http://xxxxxx"

# Azure OpenAI
export AWM_SYN_LLM_PROVIDER="azure"
export AZURE_ENDPOINT_URL="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"

# configure the model/LLM for synthesis
export AWM_SYN_OVERRIDE_MODEL="your-model-name such as gpt-5"

🔥 Synthesis

AWM CLI

All synthesis is exposed through the awm command-line tool. Run awm --help to see available commands:

awm --help

Available commands:
  gen        Synthesis pipeline commands
  ├── scenario   Generate scenario names from seed set
  ├── task       Generate user tasks per scenario
  ├── db         Generate database schema and create SQLite databases
  ├── sample     Generate and insert sample data into databases
  ├── spec       Generate API specification for each scenario
  ├── env        Generate MCP environment code
  ├── verifier   Generate verification code for tasks
  └── all        Run the full synthesis pipeline
  env        Environment management commands
  ├── start      Start MCP server for a scenario
  ├── check      Check if an MCP server is running and list its tools
  ├── check_all  Check all generated environments
  └── reset_db   Reset databases to initial state
  agent      Run a tool-use agent to solve a task by interacting with the environment

Use awm <command> --help to see options for any command, e.g. awm gen task --help.

Step 1: Scenario Generation

We start with a seed set of scenarios and generate 1,000 unique scenario descriptions. Note that only the names are used as seeds; the descriptions are included in the seed file for ease of use.

export EMBEDDING_OPENAI_API_KEY="your-api-key for the embedding model"

awm gen scenario \
    --input_path outputs/seed_scenario.jsonl \
    --output_path outputs/gen_scenario.jsonl \
    --target_count 1000
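
The exported embedding key suggests that scenario uniqueness is enforced with embeddings. One common recipe for this (an assumption here, not a description of the actual pipeline) is to reject any candidate whose embedding is too close to an already-accepted scenario; the threshold below is arbitrary:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def keep_if_novel(candidate_vec, accepted_vecs, threshold=0.9):
    """Accept a candidate scenario only if its embedding is not too close
    to any already-accepted scenario (hypothetical dedup helper)."""
    return all(cosine(candidate_vec, v) < threshold for v in accepted_vecs)

accepted = [[1.0, 0.0], [0.0, 1.0]]
print(keep_if_novel([0.99, 0.05], accepted))  # near-duplicate of the first -> False
print(keep_if_novel([0.7, 0.7], accepted))    # roughly 0.71 to both -> True
```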

Step 2: Task Generation

We generate 10 tasks per scenario; these tasks also serve as the functional requirements for building the environment.

awm gen task \
    --input outputs/gen_scenario.jsonl \
    --output outputs/gen_tasks.jsonl

Step 3: Database Synthesis

We define the database schema and complete the initial state to fully support the generated tasks.

# database schema
awm gen db \
    --input outputs/gen_tasks.jsonl \
    --output outputs/gen_db.jsonl

# sample data for initial state
awm gen sample \
    --input_task outputs/gen_tasks.jsonl \
    --input_db outputs/gen_db.jsonl \
    --output outputs/gen_sample.jsonl
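
To sanity-check a created database, generic sqlite3 is enough. This illustrative helper lists every table with its row count; it is demoed on an in-memory database, since the on-disk paths of the generated .db files depend on your output layout:

```python
import sqlite3

def summarize_db(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%'")]
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}

# Demo on an in-memory database; point sqlite3.connect at a generated
# .db file to inspect real pipeline output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)", [("a",), ("b",)])
print(summarize_db(conn))  # {'products': 2}
```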

Step 4: Interface Synthesis

We first generate an API spec, which guides the generation of the environment's Python code behind the MCP interface.

# API spec (interface schema)
awm gen spec \
    --input_task outputs/gen_tasks.jsonl \
    --input_db outputs/gen_db.jsonl \
    --output outputs/gen_spec.jsonl

# Environment code
awm gen env \
    --input_spec outputs/gen_spec.jsonl \
    --input_db outputs/gen_db.jsonl \
    --output outputs/gen_envs.jsonl

Step 5: Verification Synthesis

We provide two options for verification:

  1. code-augmented LLM-as-a-Judge (sql)
  2. purely code-based Judge (code)

awm gen verifier \
    --mode sql \
    --input_task outputs/gen_tasks.jsonl \
    --output outputs/gen_verifier.jsonl
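
The purely code-based option (--mode code) can be pictured as running a state check against the final database and mapping it to a binary reward; the sql mode additionally lets an LLM judge reason over query results. The helper and check query below are illustrative only, not the generated verifier code:

```python
import sqlite3

def verify_task(conn: sqlite3.Connection, check_sql: str) -> float:
    """Illustrative code-based verifier: reward 1.0 if the check query
    finds a matching row in the final database state, else 0.0."""
    return 1.0 if conn.execute(check_sql).fetchone() else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders (status) VALUES ('shipped')")

# Task: "ship order 1" -> check the state change it should have produced.
reward = verify_task(conn, "SELECT 1 FROM orders WHERE id = 1 AND status = 'shipped'")
print(reward)  # 1.0
```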

Environment Management

Run and check each environment. The MCP endpoint will be available at http://localhost:8001/mcp.

# Reset databases to initial state
awm env reset_db \
    --input_db outputs/gen_db.jsonl \
    --input_sample outputs/gen_sample.jsonl

# Start MCP server for a scenario
awm env start \
    --scenario "scenario_name" \
    --envs_load_path outputs/gen_envs.jsonl \
    --port 8001

# Check if MCP server is running
awm env check --url http://localhost:8001/mcp

# Batch test all generated environments
awm env check_all --output outputs/gen_envs.jsonl
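
awm env check speaks MCP to the server and lists its tools. If you just want a cheap liveness probe from your own scripts, a plain TCP connect is enough; this helper is hypothetical and not part of the awm CLI:

```python
import socket

def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Cheap liveness probe: can we open a TCP connection to the server?
    (Unlike `awm env check`, this does not verify the MCP protocol.)"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Once `awm env start ... --port 8001` is running:
# is_up("localhost", 8001)  -> True
```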

Agent Demo

AWM includes a simple agent demo that connects to an MCP environment to solve tasks via multi-turn tool calling. Please start the environment and use vLLM to serve the model before running the agent.

# serve the model
vllm serve Snowflake/Arctic-AWM-4B --host 127.0.0.1 --port 8000

# start the environment
awm env start --scenario e_commerce_33 --envs_load_path outputs/gen_envs.jsonl --port 8001

# run the agent
awm agent \
    --task "show me the top 10 most expensive products" \
    --mcp_url http://localhost:8001/mcp \
    --vllm_url http://localhost:8000/v1 \
    --model Snowflake/Arctic-AWM-4B
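
The multi-turn loop at the heart of the demo can be sketched as: send the message history to the model, execute any tool calls it requests against the MCP environment, append the results, and repeat until the model answers in plain text. Below, the transport is abstracted into two callables so the control flow is visible with stubs; this is an illustrative sketch, not the actual awm agent implementation. In the real demo, chat would wrap the vLLM OpenAI-compatible endpoint and call_tool an MCP client session:

```python
import json

def agent_loop(task, chat, call_tool, max_turns=8):
    """Minimal multi-turn tool-calling loop (illustrative). `chat(messages)`
    returns an assistant message dict; `call_tool(name, args)` executes a tool."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        msg = chat(messages)
        messages.append(msg)
        if not msg.get("tool_calls"):           # plain-text answer: we are done
            return msg["content"]
        for tc in msg["tool_calls"]:            # execute each requested tool call
            result = call_tool(tc["name"], json.loads(tc["arguments"]))
            messages.append({"role": "tool", "content": json.dumps(result)})
    return None

# Stub transport for illustration (tool name and arguments are hypothetical).
def fake_chat(messages):
    if messages[-1]["role"] == "user":
        return {"role": "assistant", "tool_calls": [
            {"name": "list_products", "arguments": '{"limit": 10}'}]}
    return {"role": "assistant", "content": "Here are the top products."}

def fake_tool(name, args):
    return {"products": ["p1", "p2"][: args["limit"]]}

print(agent_loop("show me the top 10 most expensive products", fake_chat, fake_tool))
# -> Here are the top products.
```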

Citation

If you find this work useful, please cite:

@article{wang2026agentworldmodelinfinity,
      title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning}, 
      author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He},
      year={2026},
      eprint={2602.10090},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.10090}, 
}
