A Gym for Generalist LLMs.
Overview
We’re entering the era of experience, where large language models (LLMs) learn not just from static datasets, but from interactive experience gathered in complex, expressive environments.
As a step toward this, we introduce GEM — a General Experience Maker for LLMs — an open-source environment suite designed for training agentic LLMs via online reinforcement learning.
Like OpenAI Gym for traditional RL, GEM provides a standardized API and a growing collection of diverse environments. It is training framework-agnostic and supports seamless integration with six popular RL training frameworks including Oat and Tinker, offering:
- 🧩 Clean, composable environment APIs
- ⚙️ Async vectorized execution for high-throughput simulation
- 🔧 Tool integration & custom wrappers
- 🧠 Multi-environment training
- 🎈 Ready-to-use benchmark environments and algorithms
Installation
```bash
pip install -U gem-llm
```
Or install from source for the latest version:
```bash
git clone https://github.com/axon-rl/gem.git
cd gem
pip install -e .
```
Please check Getting Started for more setup details.
🔥 You can jump into examples to quickly start your agentic RL training with GEM & your favorite training framework.
Interface
GEM's interface closely follows OpenAI Gym's API. Here's an example using the `game:GuessTheNumber-v0` environment:
```python
import gem

# List all supported environments
gem.print_envs()

# Initialize the environment
env = gem.make("game:GuessTheNumber-v0")

# Reset the environment to generate the first observation
observation, info = env.reset()

# Start the agent-environment loop
while True:
    action = env.sample_random_action()  # insert policy here, e.g.,
    # (pseudocode) action = llm.generate(observation)

    # Apply the action and receive the next observation, reward,
    # and whether the episode has ended
    next_observation, reward, terminated, truncated, info = env.step(action)
    print("OBS", observation)
    print("ACT", action)

    # Update the policy (online) here, e.g.,
    # (pseudocode) policy = learn(policy, observation, action, reward, info)

    observation = next_observation

    # Exit when the episode terminates
    if terminated or truncated:
        break
```
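In the loop above the action is sampled randomly; in practice it comes from an LLM policy. Below is a minimal sketch of plugging in an LLM through an OpenAI-compatible client; the server URL and model name are illustrative assumptions, not part of GEM:

```python
import gem
from openai import OpenAI

# Assumption: an OpenAI-compatible inference server (e.g., vLLM) is running locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen2.5-7B-Instruct"  # hypothetical model name

env = gem.make("game:GuessTheNumber-v0")
observation, info = env.reset()

while True:
    # The LLM acts on the latest observation (prompting/parsing kept deliberately simple).
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": observation}],
    )
    action = response.choices[0].message.content

    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```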
Features
- Environments consist of tasks and (optional) tools. Tool-calling is achieved via an environment wrapper, as demonstrated here.
- GEM is training framework-agnostic, and we demonstrate its integration with six popular RL training frameworks.
- We provide implementations and benchmarking results for different algorithms across a diverse set of environments.
Supported Tasks
| Category | Example Environments | Description |
|---|---|---|
| Games | game:GuessTheNumber-v0-hard, game:Sudoku-v0-easy | Classic language games |
| Math | math:Math12K, math:DeepScaleR40K | Mathematical reasoning |
| Code | code:CodeContest, code:Taco8k | Competitive coding |
| QA | qa:NaturalQuestions, qa:HotpotQA | Knowledge-intensive question answering |
| ReasoningGym | rg:arc_1d, rg:letter_counting | Diverse synthetic reasoning tasks |
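Because every task exposes the same Gym-style API, switching domains only means changing the environment ID passed to `gem.make`; the IDs below are taken from the table above:

```python
import gem

# Any environment from the table above is created the same way.
math_env = gem.make("math:Math12K")
qa_env = gem.make("qa:HotpotQA")

# Both expose the reset/step interface shown in the Interface section.
observation, info = math_env.reset()
```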
Supported Tools
| Tool | Description |
|---|---|
| Python | Python code executor that parses code blocks, executes them, and returns outputs |
| Search | Calls a search engine to retrieve documents for any query |
| MCP | Calls the general MCP API to train tool-use agents |
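Tools are attached by wrapping an environment (see the Features section). The sketch below only illustrates the idea with a self-contained, hypothetical `PythonToolWrapper` and an illustrative `<python>...</python>` action format; it is not GEM's actual wrapper API, so please refer to the linked tool-calling example for the real classes:

```python
import re
import subprocess

import gem


class PythonToolWrapper:
    """Illustrative sketch, not GEM's implementation: if the action contains a
    <python>...</python> block, execute it and return the output as the next
    observation; otherwise forward the action to the wrapped task."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        match = re.search(r"<python>(.*?)</python>", action, re.DOTALL)
        if match is not None:
            result = subprocess.run(
                ["python", "-c", match.group(1)],
                capture_output=True, text=True, timeout=10,
            )
            tool_output = result.stdout or result.stderr
            # Tool turn: the episode continues and the agent sees the tool output next.
            return tool_output, 0.0, False, False, {"tool": "python"}
        return self.env.step(action)


env = PythonToolWrapper(gem.make("math:Math12K"))
```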
Supported Frameworks
| Framework | Description |
|---|---|
| Oat | vLLM + DeepSpeed, modular, no Ray |
| Tinker | SDK provided by Thinking Machines that frees you from infra issues |
| Verl | Supports diverse backends, models, and algorithms |
| RL2 | SGLang + FSDP, no Ray, easy to hack |
| ROLL | Supports diverse backends, models, and algorithms |
| OpenRLHF | Supports diverse backends, models, and algorithms |
Examples of training agents on GEM environments with all of the above frameworks can be found here!
Supported Algorithms
| Algorithm | Description |
|---|---|
| REINFORCE | A general policy gradient algorithm that can be applied to single- and multi-turn environments |
| GRPO | Mainly for bandits (single-turn), using group advantage normalization |
| PPO | Learns a turn-level critic to compute generalized advantage estimation (GAE) |
| REINFORCE + ReBN | REINFORCE with return batch normalization as introduced in our paper |
Please check out our paper for a more detailed description of each algorithm and empirical results showing their tradeoffs.
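For intuition, here is a minimal, framework-free sketch of the REINFORCE update with return batch normalization (ReBN), where returns are normalized across the batch before being used as advantages; this is an illustration of the idea rather than the benchmarked implementation:

```python
import torch


def reinforce_rebn_loss(logprobs: torch.Tensor, returns: torch.Tensor, eps: float = 1e-8):
    """REINFORCE loss with return batch normalization (illustrative sketch).

    logprobs: summed token log-probabilities of each sampled action, shape [batch].
    returns:  return observed after each action, shape [batch].
    """
    # ReBN: normalize returns across the batch into zero-mean, unit-variance advantages.
    advantages = (returns - returns.mean()) / (returns.std() + eps)
    # Policy gradient: raise the log-probability of actions with above-average return.
    return -(logprobs * advantages.detach()).mean()
```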
Contributing
We welcome all forms of contribution, from adding new environments to integrating additional training frameworks. We're planning to write a community-driven technical report, and major contributors will be recognized with authorship. Join our Discord to discuss more!
Acknowledgement
- This work is supported by Sea AI Lab with computing resources.
- Our code learns from and builds on several awesome projects such as gym, rllm, TextArena, Search-R1, and ReasoningGym.
- The training example code is built on Oat, Tinker, Verl, RL2, ROLL, and OpenRLHF.
Citation
If you find our work useful for your research, please consider citing:
- GEM paper (please prioritize citing the paper unless you believe the blog is a better fit):

```bibtex
@article{liu2025gem,
  title={GEM: A Gym for Agentic LLMs},
  author={Liu, Zichen and Sims, Anya and Duan, Keyu and Chen, Changyu and Yu, Simon and Zhou, Xiangxin and Xu, Haotian and Xiong, Shaopan and Liu, Bo and Tan, Chenmien and others},
  journal={arXiv preprint arXiv:2510.01051},
  year={2025}
}
```
- GEM blog:

```bibtex
@misc{liu2025gemblog,
  title={GEM: A Gym for Generalist LLMs},
  author={Liu, Zichen and Sims, Anya and Duan, Keyu and Chen, Changyu and Yang, Diyi and Lee, Wee Sun and Lin, Min},
  year={2025},
  howpublished={\url{https://axon-rl.notion.site/gem}},
  note={Notion Blog}
}
```