A Gym for Generalist LLMs.
Overview
We’re entering the era of experience, where large language models (LLMs) learn not just from static datasets, but from interactive experience gathered in complex, expressive environments.
As a step toward this, we introduce GEM — a General Experience Maker for LLMs — an open-source environment suite designed for training agentic LLMs via online reinforcement learning.
Like OpenAI Gym for traditional RL, GEM provides a standardized API and a growing collection of diverse environments. It is training framework-agnostic and supports seamless integration with six popular RL training frameworks including Oat and Tinker, offering:
- 🧩 Clean, composable environment APIs
- ⚙️ Async vectorized execution for high-throughput simulation
- 🔧 Tool integration & custom wrappers
- 🧠 Multi-environment training
- 🎈 Ready-to-use benchmark environments and algorithms
Installation
```bash
pip install -U gem-llm
```
Or install from source for the latest version:
```bash
git clone https://github.com/axon-rl/gem.git
cd gem
pip install -e .
```
Please check Getting Started for more setup details.
🔥 You can jump into examples to quickly start your agentic RL training with GEM & your favorite training framework.
Interface
GEM's interface closely follows OpenAI Gym's API. Here's an example using the `game:GuessTheNumber-v0` environment:
```python
import gem

# List all supported environments
gem.print_envs()

# Initialize the environment
env = gem.make("game:GuessTheNumber-v0")

# Reset the environment to generate the first observation
observation, info = env.reset()

# Start the agent-environment loop
while True:
    action = env.sample_random_action()  # insert policy here, e.g.,
    # (pseudocode) action = llm.generate(observation)

    # Apply the action and receive the next observation, reward,
    # and whether the episode has ended
    next_observation, reward, terminated, truncated, info = env.step(action)
    print("OBS", observation)
    print("ACT", action)

    # Update the policy (online) here, e.g.,
    # (pseudocode) policy = learn(policy, observation, action, reward, info)

    observation = next_observation

    # Exit when the episode terminates
    if terminated or truncated:
        break
```
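In the loop above the action is sampled randomly; in practice it comes from an LLM policy. Below is a minimal sketch of plugging in an LLM through an OpenAI-compatible client; the server URL and model name are illustrative assumptions, not part of GEM:

```python
import gem
from openai import OpenAI

# Assumption: an OpenAI-compatible inference server (e.g., vLLM) is running locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen2.5-7B-Instruct"  # hypothetical model name

env = gem.make("game:GuessTheNumber-v0")
observation, info = env.reset()

while True:
    # The LLM acts on the latest observation (prompting/parsing kept deliberately simple).
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": observation}],
    )
    action = response.choices[0].message.content

    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```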
Features
- Environments consist of tasks and (optional) tools. Tool-calling is achieved via an environment wrapper, as demonstrated here.
- GEM is training framework-agnostic, and we demonstrate its integration with six popular RL training frameworks.
- We provide implementations and benchmarking results for different algorithms across a diverse set of environments.
Supported Tasks
| Category | Example Environments | Description |
|---|---|---|
| Games | game:GuessTheNumber-v0-hard, game:Sudoku-v0-easy | Classic language games |
| Math | math:Math12K, math:DeepScaleR40K | Mathematical reasoning |
| Code | code:CodeContest, code:Taco8k | Competitive coding |
| QA | qa:NaturalQuestions, qa:HotpotQA | Knowledge-intensive question answering |
| ReasoningGym | rg:arc_1d, rg:letter_counting | Diverse synthetic reasoning tasks |
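Because every task exposes the same Gym-style API, switching domains only means changing the environment ID passed to `gem.make`; the IDs below are taken from the table above:

```python
import gem

# Any environment from the table above is created the same way.
math_env = gem.make("math:Math12K")
qa_env = gem.make("qa:HotpotQA")

# Both expose the reset/step interface shown in the Interface section.
observation, info = math_env.reset()
```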
Supported Tools
| Tool | Description |
|---|---|
| Python | Python code executor that parses code blocks, executes them, and returns outputs |
| Search | Calls a search engine to retrieve documents for any query |
| MCP | Calls the general MCP API to train tool-use agents |
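Tools are attached by wrapping an environment (see the Features section). The sketch below only illustrates the idea with a self-contained, hypothetical `PythonToolWrapper` and an illustrative `<python>...</python>` action format; it is not GEM's actual wrapper API, so please refer to the linked tool-calling example for the real classes:

```python
import re
import subprocess

import gem


class PythonToolWrapper:
    """Illustrative sketch, not GEM's implementation: if the action contains a
    <python>...</python> block, execute it and return the output as the next
    observation; otherwise forward the action to the wrapped task."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        match = re.search(r"<python>(.*?)</python>", action, re.DOTALL)
        if match is not None:
            result = subprocess.run(
                ["python", "-c", match.group(1)],
                capture_output=True, text=True, timeout=10,
            )
            tool_output = result.stdout or result.stderr
            # Tool turn: the episode continues and the agent sees the tool output next.
            return tool_output, 0.0, False, False, {"tool": "python"}
        return self.env.step(action)


env = PythonToolWrapper(gem.make("math:Math12K"))
```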
Supported Frameworks
| Framework | Description |
|---|---|
| Oat | vLLM + DeepSpeed, modular, no Ray |
| Tinker | SDK provided by Thinking Machines that frees you from infra issues |
| Verl | Supports diverse backends, models, and algorithms |
| RL2 | SGLang + FSDP, no Ray, easy to hack |
| ROLL | Supports diverse backends, models, and algorithms |
| OpenRLHF | Supports diverse backends, models, and algorithms |
Examples of training agents on GEM environments with all of the above frameworks can be found here!
Supported Algorithms
| Algorithm | Description |
|---|---|
| REINFORCE | A general policy gradient algorithm that can be applied to single- and multi-turn environments |
| GRPO | Mainly for bandits (single-turn), using group advantage normalization |
| PPO | Learns a turn-level critic to compute generalized advantage estimation (GAE) |
| REINFORCE + ReBN | REINFORCE with return batch normalization as introduced in our paper |
Please check out our paper for a more detailed description of each algorithm and empirical results showing their tradeoffs.
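For intuition, here is a minimal, framework-free sketch of the REINFORCE update with return batch normalization (ReBN), where returns are normalized across the batch before being used as advantages; this is an illustration of the idea rather than the benchmarked implementation:

```python
import torch


def reinforce_rebn_loss(logprobs: torch.Tensor, returns: torch.Tensor, eps: float = 1e-8):
    """REINFORCE loss with return batch normalization (illustrative sketch).

    logprobs: summed token log-probabilities of each sampled action, shape [batch].
    returns:  return observed after each action, shape [batch].
    """
    # ReBN: normalize returns across the batch into zero-mean, unit-variance advantages.
    advantages = (returns - returns.mean()) / (returns.std() + eps)
    # Policy gradient: raise the log-probability of actions with above-average return.
    return -(logprobs * advantages.detach()).mean()
```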
Contributing
We welcome all forms of contribution, from adding new environments to integrating additional training frameworks. We're planning to write a community-driven technical report, and major contributors will be recognized with authorship. Join our Discord to discuss more!
Acknowledgement
- This work is supported by Sea AI Lab with computing resources.
- Our code learns from and builds on several awesome projects such as gym, rllm, TextArena, Search-R1, and ReasoningGym.
- The training example code is built on Oat, Tinker, Verl, RL2, ROLL, and OpenRLHF.
Citation
If you find our work useful for your research, please consider citing:
- GEM paper (please prioritize citing the paper unless you believe the blog is a better fit):

```bibtex
@article{liu2025gem,
  title={GEM: A Gym for Agentic LLMs},
  author={Liu, Zichen and Sims, Anya and Duan, Keyu and Chen, Changyu and Yu, Simon and Zhou, Xiangxin and Xu, Haotian and Xiong, Shaopan and Liu, Bo and Tan, Chenmien and others},
  journal={arXiv preprint arXiv:2510.01051},
  year={2025}
}
```
- GEM blog:

```bibtex
@misc{liu2025gemblog,
  title={GEM: A Gym for Generalist LLMs},
  author={Liu, Zichen and Sims, Anya and Duan, Keyu and Chen, Changyu and Yang, Diyi and Lee, Wee Sun and Lin, Min},
  year={2025},
  howpublished={\url{https://axon-rl.notion.site/gem}},
  note={Notion Blog}
}
```