benchmax

Framework-Agnostic RL Environments for LLM Fine-Tuning

benchmax: Framework-Agnostic RL Environments for LLM Fine-Tuning

A lightweight, training-framework-agnostic library for defining, running, and parallelizing environments for fine-tuning open-source LLMs with reinforcement learning.

📌 News

[29 Oct 2025] 🎉 Added support for easy multi-node parallelization across all major cloud providers using SkyPilot
[29 Oct 2025] 🎉 Integration with SkyRL for distributed RL training across clusters
[Upcoming] 🛠️ Integration with Tinker API.

📘 Quickstart

Example: Multi-node parallelization of Excel Env with SkyRL and SkyPilot

RL environments can be computationally expensive to run (e.g. running tests). To handle these workloads efficiently, we distribute rollouts across multiple nodes using SkyPilot, horizontally scaling benchmax across cloud providers like GCP, AWS, Azure, etc.

SkyRL is a training framework that benchmax currently integrates with. Use our SkyRL integration to fine-tune Qwen-2.5 with RL for spreadsheet manipulation, using an Excel MCP parallelized across multiple nodes. The environment is defined in benchmax.envs.excel.excel_env.ExcelEnvSkypilot.

Prepare the dataset

```
uv run src/benchmax/adapters/skyrl/benchmax_data_process.py \
  --local_dir ~/data/excel \
  --dataset_name spreadsheetbench \
  --env_path benchmax.envs.excel.excel_env.ExcelEnvLocal
```

Note: We are using ExcelEnvLocal instead of ExcelEnvSkypilot because the MCP is only used for listing tools to prepare the system prompt.

Run training and parallelize Excel environment
```
bash examples/skyrl/run_benchmax_excel.sh
```

This Excel environment example spins up 5 nodes with 20 servers per node (100 MCP servers running in parallel in total). For more details, check out multi-node parallelization and SkyRL integration.
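The 5 × 20 layout gives 100 server slots that rollouts have to be spread over. The sketch below shows one simple way to do that, round-robin assignment. It is not how benchmax or SkyPilot schedules work; `ServerSlot`, `build_slots`, and `assign_rollouts` are hypothetical names used only for illustration, and the batch of 250 rollouts is an assumed figure.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class ServerSlot:
    """Hypothetical handle for one MCP server on one node."""
    node: int
    server: int


def build_slots(num_nodes: int, servers_per_node: int) -> list[ServerSlot]:
    """Enumerate every (node, server) pair in node-major order."""
    return [
        ServerSlot(node, server)
        for node in range(num_nodes)
        for server in range(servers_per_node)
    ]


def assign_rollouts(rollout_ids: list[int], slots: list[ServerSlot]) -> dict[int, ServerSlot]:
    """Round-robin: the i-th rollout goes to slot i mod len(slots)."""
    return dict(zip(rollout_ids, cycle(slots)))


# 5 nodes x 20 servers per node, as in the example above.
slots = build_slots(num_nodes=5, servers_per_node=20)
# Assumed batch of 250 rollouts, purely for illustration.
assignment = assign_rollouts(list(range(250)), slots)
```

With round-robin assignment the load per server differs by at most one rollout, which is usually good enough when rollouts have similar cost.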

ℹ️ Overview

benchmax comes with:

A collection of ready-to-use reinforcement learning (RL) environments for LLM fine-tuning ranging from multi-hop search to spreadsheet manipulation to CRM agents
An easy way to define, compose, and parallelize your own environments, including leveraging the existing ecosystem of MCP servers
Built-in integrations with popular RL training libraries (skyrl, etc.). benchmax is trainer-agnostic by design

Define your environment as:

A toolset (LLM calls, external APIs, calculators, MCPs, etc.).
Output parsing logic to extract structured observations.
Reward functions to score model outputs.

Rollout management, parallel execution, etc. come out of the box.
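To make the three parts concrete, here is a minimal sketch of a toolset, output parsing, and a reward function written as plain Python. It does not use benchmax's actual API: the `<tool>...</tool>` call format, the `TOOLS` registry, and every function name below are assumptions made for illustration.

```python
import json
import re
from typing import Callable, Optional

# Toolset: plain callables keyed by name (hypothetical tools).
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
    "upper": lambda text: text.upper(),
}

# Assumed model output format: <tool>{"name": ..., "args": {...}}</tool>
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def parse_tool_call(model_output: str) -> Optional[dict]:
    """Output parsing: extract a structured tool call, or None if absent or invalid."""
    match = TOOL_CALL.search(model_output)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def run_tool(call: dict) -> str:
    """Dispatch a parsed call to the toolset and return the observation."""
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return f"error: unknown tool {call.get('name')!r}"
    return tool(**call.get("args", {}))


def exact_match_reward(final_answer: str, expected: str) -> float:
    """Reward function: 1.0 for an exact match after trimming whitespace, else 0.0."""
    return 1.0 if final_answer.strip() == expected.strip() else 0.0


call = parse_tool_call('I will add them. <tool>{"name": "add", "args": {"a": 2, "b": 3}}</tool>')
observation = run_tool(call)
reward = exact_match_reward(observation, "5")
```

Keeping the three pieces as separate functions means each can be swapped or tested on its own, which is the composability the list above describes.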

⭐ Star our repository to show your support!

💡 Core Features

Built-in examples & templates

Get started with ready-to-use recipes, from Wikipedia search to spreadsheet manipulation. They are easy to copy, customize, and extend. And yes, more are on the way.

Trainer integrations

Use your own trainer or training framework - no lock-in. benchmax is already integrated into SkyRL, with more integrations (Tinker, etc.) coming soon!

MCP support

Tap into the growing MCP ecosystem and integrate MCP servers as tools within your environments.

Multi-node parallel execution

Multi-node parallelization is enabled out of the box, with state isolated across rollouts (e.g. files edited on the filesystem).
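The idea of state isolation can be illustrated on a single machine. In the sketch below, each rollout gets its own temporary workspace, so concurrent rollouts can write a file with the same name without interfering with each other. This is only an analogy using the Python standard library, with threads standing in for nodes; it is not benchmax's multi-node mechanism, and `run_rollout` is a hypothetical function.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_rollout(rollout_id: int) -> str:
    """Run one rollout in its own scratch directory and return what it wrote."""
    # A private workspace per rollout: file edits cannot collide.
    with tempfile.TemporaryDirectory(prefix=f"rollout-{rollout_id}-") as workspace:
        sheet = Path(workspace) / "sheet.csv"  # same file name in every rollout
        sheet.write_text(f"id,value\n{rollout_id},{rollout_id * 10}\n")
        return sheet.read_text()


# Threads stand in for nodes here; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_rollout, range(8)))
```

Each workspace is deleted when its rollout finishes, so nothing written by one rollout is visible to another.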

🌐 Creating & Training with Environments

What is an environment?

An environment consists of:

A list of tools that an LLM can call
A list of reward functions that evaluate the quality & correctness of the model's final output.
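As an illustration of the second item, the sketch below applies a list of reward functions to a model's final output and reports each score by name. The `(final_output, ground_truth) -> float` signature and the two example rewards are assumptions for this sketch, not benchmax's actual interface.

```python
from typing import Callable

# Assumed signature: (final_output, ground_truth) -> score in [0, 1]
RewardFn = Callable[[str, str], float]


def correctness(final_output: str, ground_truth: str) -> float:
    """1.0 if the ground truth appears in the output (case-insensitive), else 0.0."""
    return 1.0 if ground_truth.lower() in final_output.lower() else 0.0


def brevity(final_output: str, ground_truth: str) -> float:
    """Quality proxy: full score up to 50 words, linearly less beyond that."""
    words = len(final_output.split())
    return 1.0 if words <= 50 else max(0.0, 1.0 - (words - 50) / 50)


def score(final_output: str, ground_truth: str, reward_fns: list[RewardFn]) -> dict[str, float]:
    """Apply every reward function and report each score under the function's name."""
    return {fn.__name__: fn(final_output, ground_truth) for fn in reward_fns}


scores = score("The capital of France is Paris.", "paris", [correctness, brevity])
```

Returning one score per function, rather than a single sum, leaves the weighting of correctness against quality to whoever consumes the scores.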

We also support MCP servers natively, allowing you to easily leverage the many servers built by the community.

Pre-built environments

Ready-to-use environments with pre-configured tools and reward functions.

How do I create a custom environment?

With existing MCP servers (Built-in support for multi-node parallelization)
Extend BaseEnv

How about more complex environments?

Check out our Excel spreadsheet RL environment: benchmax.envs.excel.excel_env.ExcelEnv

How do I use an environment with my preferred RL Trainer?

We currently integrate with SkyRL. More integrations are on the way!

benchmax environments with skyrl

I want a specific environment

Open an issue and tag us & we will look into building you one!

🎯 Motivation

Modularity and Simplicity:

We set out to build a lightweight, modular system for defining RL environments by breaking them down into simple, composable parts: tools, tool output parsing, and reward functions.

The goal is to make it easy for software engineers to build and experiment with RL environments without needing deep RL expertise.

Trainer Integrations:

Many new RL training frameworks have appeared recently (e.g., numerous forks of verl), and we expect this to continue. They are often tightly coupled to specific environments, leading to fragmentation and limited compatibility.

We are building benchmax as a standalone library that integrates with these training frameworks and gives new frameworks an easy way to tap into an existing pool of environments. We're already integrated with SkyRL (Tinker coming soon)!

Task Recipes and Ideas:

We want benchmax to be a living library of reusable, RL-compatible task recipes, ready to inspire and extend beyond the usual suspects like math and coding. We aim to support more real-world workflows, including open-ended and long-horizon tasks.

Parallelization and Cloud Compatibility:

- Enable efficient parallelization while maintaining statefulness between rollouts.
- Facilitate easy deployment and scalability in cloud environments.

MCP as a first class citizen:

There has been an explosion of MCP servers and tools built for use cases ranging from browser use to Excel to game creation. benchmax lets you leverage and compose these existing MCP servers to build environments integrated with real-world systems, e.g. Excel.

🤝 Contributing

We welcome new environment recipes, bug reports, and trainer integrations!

📜 License

Release history

0.1.2.dev33 pre-release, Jun 11, 2026
0.1.2.dev31 pre-release, Jun 11, 2026
0.1.2.dev30 pre-release, Jun 10, 2026
0.1.2.dev29 pre-release, Jun 5, 2026
0.1.2.dev28 pre-release, Jun 4, 2026
0.1.2.dev27 pre-release, May 30, 2026
0.1.2.dev26 pre-release, May 30, 2026
0.1.2.dev25 pre-release, May 26, 2026
0.1.2.dev23 pre-release, Apr 22, 2026
0.1.2.dev22 pre-release, Apr 18, 2026
0.1.2.dev21 pre-release, Mar 30, 2026
0.1.2.dev20 pre-release, Mar 30, 2026
0.1.2.dev19 pre-release, Mar 26, 2026
0.1.2.dev18 pre-release, Mar 11, 2026
0.1.2.dev17 pre-release, Mar 10, 2026 (this version)
0.1.2.dev16 pre-release, Feb 26, 2026
0.1.2.dev15 pre-release, Feb 25, 2026
0.1.2.dev14 pre-release, Feb 19, 2026
0.1.2.dev13 pre-release, Feb 14, 2026
0.1.2.dev12 pre-release, Feb 13, 2026
0.1.2.dev11 pre-release, Feb 13, 2026
0.1.2.dev10 pre-release, Feb 13, 2026
0.1.2.dev9 pre-release, Feb 9, 2026
0.1.2.dev8 pre-release, Jan 29, 2026
0.1.2.dev7 pre-release, Jan 29, 2026 (yanked; reason given: "bad")
0.1.2.dev6 pre-release, Jan 14, 2026
0.1.2.dev5 pre-release, Nov 29, 2025
0.1.2.dev4 pre-release, Nov 22, 2025
0.1.2.dev3 pre-release, Nov 17, 2025
0.1.2.dev2 pre-release, Nov 16, 2025
0.1.2.dev1 pre-release, Oct 30, 2025
0.1.2.dev0 pre-release, Oct 27, 2025
0.1.1.dev7 pre-release, Sep 19, 2025
0.1.1.dev6 pre-release, Sep 18, 2025
0.1.1.dev5 pre-release, Aug 25, 2025
0.1.1.dev4 pre-release, Jul 29, 2025
0.1.1.dev3 pre-release, Jul 29, 2025
0.1.1.dev2 pre-release, Jul 29, 2025

Download files

Source Distribution

benchmax-0.1.2.dev17.tar.gz (72.7 kB)

Uploaded Mar 10, 2026 Source

Built Distribution

benchmax-0.1.2.dev17-py3-none-any.whl (85.6 kB)

Uploaded Mar 10, 2026 Python 3

File details

Details for the file benchmax-0.1.2.dev17.tar.gz.

File metadata

Download URL: benchmax-0.1.2.dev17.tar.gz
Upload date: Mar 10, 2026
Size: 72.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.6 (publish subcommand, macOS)

File hashes

Hashes for benchmax-0.1.2.dev17.tar.gz
Algorithm	Hash digest
SHA256	`ea0452a3e166fd941ca5ac5a038c3619376d5bd87a0c0c1fec6e3a6c0f61cdaa`
MD5	`ede1ea1f85f4802df07195022ea0eac6`
BLAKE2b-256	`aec2049c2cfa1e4c472177b0c577b3b53e4f025d1d74b4d155ea011ed0964951`

File details

Details for the file benchmax-0.1.2.dev17-py3-none-any.whl.

File metadata

Download URL: benchmax-0.1.2.dev17-py3-none-any.whl
Upload date: Mar 10, 2026
Size: 85.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.6 (publish subcommand, macOS)

File hashes

Hashes for benchmax-0.1.2.dev17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31dedd1422578c0822919bdcc4f1e18621d08127809a95493d8d85a2cbbdd124`
MD5	`c9caa2d09ddbd04c94cc71686461afc9`
BLAKE2b-256	`7ec464da10eb6bf90d2b45443dbb23ce8107510ac754c4a54e0a6bf7bdb9692a`
