benchmax: Framework-Agnostic RL Environments for LLM Fine-Tuning
A lightweight, training-framework agnostic library for defining, running, and parallelizing environments, to fine-tune OSS LLMs with reinforcement learning.
📌 News
- [29 Oct 2025] 🎉 Added support for easy multi-node parallelization across all major cloud providers using SkyPilot
ℹ️ Overview
benchmax comes with:
- A collection of ready-to-use reinforcement learning (RL) environments for LLM fine-tuning ranging from multi-hop search to spreadsheet manipulation to CRM agents
- An easy way to define, compose, and parallelize your own environments, including leveraging the existing ecosystem of MCP servers
- Trainer-agnostic by design: BaseEnv exposes a small async interface (list_tools, run_tool, compute_reward, plus optional rollout lifecycle hooks) that any rollout loop can drive
- Optional batteries-included add-ons: synthetic RAG dataset generation (benchmax[rag]), agent trace import (benchmax[traces]), and clients for the Castform training platform (benchmax.platform)
Define your environment as:
- A toolset (LLM calls, external APIs, calculators, MCPs, etc.).
- Output parsing logic to extract structured observations.
- Reward functions to score model outputs.
Rollout management, parallel execution, and related plumbing come out of the box.
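As a rough illustration of the three parts above, the sketch below defines a toy calculator environment and drives it with a hand-written rollout loop. It does not import benchmax: the method names list_tools, run_tool, and compute_reward come from the overview, but their signatures, the ToyCalculatorEnv class, and the scripted policy are assumptions made for this example, not the library's actual API.

```python
import asyncio
from typing import Any


class ToyCalculatorEnv:
    """Stand-in environment; the signatures here are assumed, not benchmax's."""

    async def list_tools(self) -> list[dict[str, Any]]:
        # Describe the tools the model may call.
        return [{"name": "add", "description": "Add two integers a and b."}]

    async def run_tool(self, name: str, **kwargs: Any) -> str:
        # Execute the tool and parse its result into a text observation.
        if name == "add":
            return str(kwargs["a"] + kwargs["b"])
        return f"error: unknown tool {name!r}"

    async def compute_reward(self, completion: str, ground_truth: str) -> dict[str, float]:
        # Score the final answer: 1.0 on exact match, else 0.0.
        return {"exact_match": float(completion.strip() == ground_truth)}


async def rollout(env: ToyCalculatorEnv) -> dict[str, float]:
    tools = await env.list_tools()
    # A real policy would be an LLM choosing among `tools`; this one is scripted.
    observation = await env.run_tool(tools[0]["name"], a=2, b=3)
    return await env.compute_reward(observation, ground_truth="5")


result = asyncio.run(rollout(ToyCalculatorEnv()))
print(result)
```

Because the loop only awaits three methods, any trainer that can call async functions could in principle drive an environment of this shape.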
⭐ Star our repository to show your support!
💡 Core Features
Built-in examples & templates
Get started with ready-to-use recipes, from Wikipedia search to spreadsheet manipulation. They are easy to copy, customize, and extend, and more are on the way.
MCP support
Tap into the growing MCP ecosystem and integrate its servers as tools within your environments.
Multi-node parallel execution
Multi-node parallelization is enabled out of the box, with state isolated across rollouts (e.g., files edited on the filesystem).
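One way to picture state isolation is to give every rollout its own scratch directory, so that file edits made in one rollout cannot leak into another. The sketch below runs on a single machine with asyncio and tempfile; run_rollout and its file layout are hypothetical, and it does not show how benchmax or SkyPilot distribute work across nodes.

```python
import asyncio
import tempfile
from pathlib import Path


async def run_rollout(rollout_id: int) -> str:
    # Each rollout gets a private workspace that is deleted when it finishes.
    with tempfile.TemporaryDirectory(prefix=f"rollout-{rollout_id}-") as workdir:
        scratch = Path(workdir) / "state.txt"
        scratch.write_text(f"edited by rollout {rollout_id}")
        await asyncio.sleep(0)  # yield control so the rollouts interleave
        # Only this rollout's edit is visible in its own workspace.
        return scratch.read_text()


async def main() -> list[str]:
    # Launch the rollouts concurrently; gather returns results in input order.
    return await asyncio.gather(*(run_rollout(i) for i in range(4)))


results = asyncio.run(main())
print(results)
```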
🌐 Creating Environments
What is an environment?
An environment consists of:
- A list of tools that an LLM can call
- A list of reward functions that evaluate the quality & correctness of the model's final output.
We also support MCP servers natively, allowing you to easily leverage the many servers built by the community.
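Since reward functions are described as a list of scorers applied to the model's final output, they can be pictured as plain callables. In this sketch the (completion, ground_truth) signature, the two scorers, and the score_output helper are hypothetical and chosen only for illustration; benchmax's own reward-function signature may differ.

```python
from typing import Callable

# Assumed shape of a reward function: (completion, ground_truth) -> score.
RewardFn = Callable[[str, str], float]


def exact_match(completion: str, ground_truth: str) -> float:
    # Correctness: full credit only if the answer matches exactly.
    return 1.0 if completion.strip() == ground_truth.strip() else 0.0


def is_concise(completion: str, ground_truth: str) -> float:
    # Quality: reward answers of at most 20 words.
    return 1.0 if len(completion.split()) <= 20 else 0.0


def score_output(completion: str, ground_truth: str, reward_fns: list[RewardFn]) -> dict[str, float]:
    # Evaluate every reward function and report the scores by function name.
    return {fn.__name__: fn(completion, ground_truth) for fn in reward_fns}


scores = score_output(" Paris ", "Paris", [exact_match, is_concise])
print(scores)
```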
Pre-built environments
Ready-to-use environments with pre-configured tools and reward functions.
- CRM
- Excel
- Math
- Wikipedia
- PostgreSQL search (benchmax[rag])
How do I create a custom environment?
- With existing MCP servers (built-in support for multi-node parallelization)
How about more complex environments?
- Check out our Excel spreadsheet RL environment: benchmax.envs.excel.excel_env.ExcelEnv
I want a specific environment
Open an issue and tag us, and we will look into building one for you!
🎯 Motivation
- Modularity and Simplicity: We set out to build a lightweight, modular system for defining RL environments by breaking them down into simple, composable parts: tools, tool output parsing, and reward functions. The goal is to make it easy for software engineers to build and experiment with RL environments without needing deep RL expertise.
- Task Recipes and Ideas: We want benchmax to be a living library of reusable, RL-compatible task recipes, ready to inspire and extend beyond the usual suspects like math and coding. We aim to support more real-world workflows, including open-ended and long-horizon tasks.
- Parallelization and Cloud Compatibility:
  - Enable efficient parallelization while maintaining statefulness between rollouts.
  - Facilitate easy deployment and scalability in cloud environments.
- MCP as a first-class citizen: There has been an explosion of MCP servers and tools built for use cases ranging from browser use to Excel to game creation. benchmax lets users leverage and compose these existing MCP servers to build environments integrated with real-world systems such as Excel.
🤝 Contributing
We welcome new environment recipes and bug reports!
📦 Add-ons
In addition to the core env library, benchmax ships several optional
modules behind extras:
| Extra | Module | Purpose |
|---|---|---|
| benchmax[rag] | benchmax.rag.* | Markdown chunking, corpus indexing (Postgres / Chroma / Pinecone / Turbopuffer), synthetic QA dataset generation, RAG-specific reward rubrics |
| benchmax[traces] | benchmax.traces | Agentic trace import (Braintrust today, Langfuse coming) and provider-agnostic processing pipeline |
| benchmax[chroma] / [pinecone] / [turbopuffer] | benchmax.rag.corpus.* | Corpus-backend pins (combine with [rag]) |
| (core) | benchmax.platform | HTTP clients for the Castform platform: storage uploads, training-job launch, rollout server. Used both internally by benchmax.rag and by the high-level castform-sdk. |
All platform URLs derive from CASTFORM_BASE_DOMAIN (default castform.com) with per-component overrides; see benchmax.config.
📜 License
Apache 2.0 © 2026 Castform