
Framework-Agnostic RL Environments for LLM Fine-Tuning



benchmax: Framework-Agnostic RL Environments for LLM Fine-Tuning

A lightweight, training-framework-agnostic library for defining, running, and parallelizing environments used to fine-tune open-source (OSS) LLMs with reinforcement learning.


📌 News

  • [29 Oct 2025] 🎉 Added support for easy multi-node parallelization across all major cloud providers using SkyPilot

ℹ️ Overview

benchmax comes with:

  • A collection of ready-to-use reinforcement learning (RL) environments for LLM fine-tuning ranging from multi-hop search to spreadsheet manipulation to CRM agents
  • An easy way to define, compose, and parallelize your own environments, including leveraging the existing ecosystem of MCP servers
  • Trainer-agnostic by design — BaseEnv exposes a small async interface (list_tools, run_tool, compute_reward, plus optional rollout lifecycle hooks) that any rollout loop can drive
  • Optional batteries-included add-ons: synthetic RAG dataset generation (benchmax[rag]), agent trace import (benchmax[traces]), and clients for the Castform training platform (benchmax.platform)
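The trainer-agnostic interface can be pictured with a minimal, self-contained sketch. It does not import benchmax: EnvLike, drive_rollout, and EchoEnv are hypothetical names, and the method signatures (a rollout_id argument, keyword tool arguments, a float reward) are assumptions rather than the real BaseEnv API. Scripted tool calls stand in for model-generated ones so the loop runs without an LLM.

```python
import asyncio
from typing import Any, Protocol


class EnvLike(Protocol):
    """HYPOTHETICAL stand-in for the BaseEnv surface; real signatures may differ."""

    async def list_tools(self) -> list[str]: ...

    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args: Any) -> str: ...

    async def compute_reward(
        self, rollout_id: str, completion: str, ground_truth: Any
    ) -> float: ...


async def drive_rollout(
    env: EnvLike,
    rollout_id: str,
    tool_calls: list[tuple[str, dict[str, Any]]],
    completion: str,
    ground_truth: Any,
) -> tuple[float, list[str]]:
    """Replay scripted tool calls (standing in for model output), then score the completion."""
    available = set(await env.list_tools())
    observations: list[str] = []
    for name, args in tool_calls:
        if name not in available:
            raise ValueError(f"unknown tool: {name}")
        # A real loop would feed each observation back into the model's context.
        observations.append(await env.run_tool(rollout_id, name, **args))
    reward = await env.compute_reward(rollout_id, completion, ground_truth)
    return reward, observations


class EchoEnv:
    """Toy environment: a single 'echo' tool and an exact-match reward."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    async def list_tools(self) -> list[str]:
        return ["echo"]

    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args: Any) -> str:
        self.calls.append((rollout_id, tool_name))
        return str(tool_args.get("text", ""))

    async def compute_reward(
        self, rollout_id: str, completion: str, ground_truth: Any
    ) -> float:
        return 1.0 if completion.strip() == str(ground_truth) else 0.0


if __name__ == "__main__":
    env = EchoEnv()
    reward, obs = asyncio.run(
        drive_rollout(env, "r0", [("echo", {"text": "hi"})], " 42 ", 42)
    )
    print(reward, obs)
```

Because drive_rollout depends only on these three methods, any trainer's rollout loop could call them the same way; that small duck-typed surface is what keeps the design independent of a particular training framework.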

Define your environment as:

  1. A toolset (LLM calls, external APIs, calculators, MCPs, etc.).
  2. Output parsing logic to extract structured observations.
  3. Reward functions to score model outputs.

Rollout management, parallel execution, and related plumbing come out of the box.
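Parts 2 and 3 (output parsing and reward functions) can be sketched as plain Python functions. The <answer>...</answer> tag convention, the 0.1 format bonus, and every function name below are hypothetical illustrations, not benchmax's built-in rewards.

```python
import re
from typing import Callable, Optional

# HYPOTHETICAL convention: the model wraps its final answer in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_answer(completion: str) -> Optional[str]:
    """Output parsing: return the last <answer> block's text, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def exact_match_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the parsed answer equals the reference (case-insensitive), else 0.0."""
    answer = parse_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer.lower() == ground_truth.strip().lower() else 0.0


def format_reward(completion: str, ground_truth: str) -> float:
    """Small shaping term: reward well-formed output even when the answer is wrong."""
    return 0.1 if parse_answer(completion) is not None else 0.0


RewardFn = Callable[[str, str], float]


def total_reward(completion: str, ground_truth: str, fns: list[RewardFn]) -> float:
    """Combine several reward functions by summing their scores."""
    return sum(fn(completion, ground_truth) for fn in fns)
```

Keeping parsing separate from scoring lets several reward functions share one parser, and summing is only one possible way to combine them.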

⭐ Star our repository to show your support!

💡 Core Features

Built-in examples & templates

Get started with ready-to-use recipes, from Wikipedia search to spreadsheet manipulation. Easy to copy, customize, and extend. And yes, more are on the way.

MCP support

Tap into the growing MCP ecosystem and integrate its servers as tools within your environments.

Multi-node parallel execution

Multi-node parallelization works out of the box, with state isolated across rollouts (e.g., when rollouts edit files on a filesystem).
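On a single machine, the isolation idea can be sketched with asyncio and one temporary directory per rollout. This is not how benchmax or SkyPilot distribute work across nodes; run_isolated_rollout and run_many are hypothetical names, and the file writes simulate a tool that edits files on disk.

```python
import asyncio
import tempfile
from pathlib import Path


async def run_isolated_rollout(rollout_id: int, edits: list[str]) -> list[str]:
    """Each rollout edits files only inside its own private temporary workspace."""
    with tempfile.TemporaryDirectory(prefix=f"rollout-{rollout_id}-") as tmp:
        notes = Path(tmp) / "notes.txt"
        notes.touch()
        for line in edits:
            # Simulated file-editing tool call; sleep(0) lets other rollouts interleave.
            with notes.open("a") as fh:
                fh.write(line + "\n")
            await asyncio.sleep(0)
        # Read the final state before the workspace is deleted on exit.
        return notes.read_text().splitlines()


async def run_many(n: int) -> list[list[str]]:
    """Run n rollouts concurrently; none can see another rollout's files."""
    return await asyncio.gather(
        *(
            run_isolated_rollout(i, [f"rollout {i} step {s}" for s in range(3)])
            for i in range(n)
        )
    )


if __name__ == "__main__":
    for result in asyncio.run(run_many(4)):
        print(result)
```

Even though the rollouts interleave, each one ends up seeing only its own three lines, which is the property that state isolation is meant to guarantee.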

🌐 Creating Environments

What is an environment?

An environment consists of:

  • A list of tools that an LLM can call
  • A list of reward functions that evaluate the quality and correctness of the model's final output

We also support MCP servers natively, allowing you to easily leverage the many servers built by the community.

Pre-built environments

Ready-to-use environments with pre-configured tools and reward functions.

How do I create a custom environment?

  1. With existing MCP servers (built-in support for multi-node parallelization)

  2. By extending BaseEnv
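For option 2, a subclass implements the tool listing, tool execution, and reward methods. Because BaseEnv's exact signatures are not shown on this page, the sketch defines a hypothetical stand-in, MiniBaseEnv, and a toy CalculatorEnv on top of it. Returning a dict of named reward components is also an assumption, chosen so that several reward functions can be logged separately.

```python
import asyncio
import operator
from abc import ABC, abstractmethod
from typing import Any


class MiniBaseEnv(ABC):
    """HYPOTHETICAL stand-in for benchmax's BaseEnv; the real class may differ."""

    @abstractmethod
    async def list_tools(self) -> list[dict[str, Any]]: ...

    @abstractmethod
    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args: Any) -> str: ...

    @abstractmethod
    async def compute_reward(
        self, rollout_id: str, completion: str, ground_truth: Any
    ) -> dict[str, float]: ...


class CalculatorEnv(MiniBaseEnv):
    """Toy custom environment exposing two arithmetic tools."""

    _OPS = {"add": operator.add, "multiply": operator.mul}

    async def list_tools(self) -> list[dict[str, Any]]:
        # Simple tool specs that a prompt template could render for the model.
        return [
            {"name": name, "parameters": {"a": "number", "b": "number"}}
            for name in self._OPS
        ]

    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args: Any) -> str:
        if tool_name not in self._OPS:
            # Errors are returned as observations so the model can recover.
            return f"error: unknown tool {tool_name!r}"
        result = self._OPS[tool_name](float(tool_args["a"]), float(tool_args["b"]))
        return str(result)

    async def compute_reward(
        self, rollout_id: str, completion: str, ground_truth: Any
    ) -> dict[str, float]:
        # One named component per reward function.
        try:
            correct = abs(float(completion.strip()) - float(ground_truth)) < 1e-6
        except ValueError:
            correct = False
        return {"exact_answer": 1.0 if correct else 0.0}


if __name__ == "__main__":
    env = CalculatorEnv()
    print(asyncio.run(env.run_tool("r0", "multiply", a=6, b=7)))
```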

How about more complex environments?

  • Check out our Excel spreadsheet RL environment: benchmax.envs.excel.excel_env.ExcelEnv

I want a specific environment

Open an issue and tag us, and we will look into building one for you!


🎯 Motivation

  • Modularity and Simplicity:

    We set out to build a lightweight, modular system for defining RL environments by breaking them down into simple, composable parts: tools, tool output parsing, and reward functions.

    The goal is to make it easy for software engineers to build and experiment with RL environments without needing deep RL expertise.

  • Task Recipes and Ideas:

    We want benchmax to be a living library of reusable, RL-compatible task recipes, ready to inspire and extend beyond the usual suspects like math and coding. We aim to support more real-world workflows, including open-ended and long-horizon tasks.

  • Parallelization and Cloud Compatibility:

    • Enable efficient parallelization of stateful rollouts, with each rollout's state kept separate from the others.
    • Facilitate easy deployment and scalability in cloud environments.
  • MCP as a first-class citizen:

    There has been an explosion of MCP servers and tools built for use cases ranging from browser use to Excel to game creation. benchmax lets you leverage and compose these existing MCP servers to build environments integrated with real-world systems such as Excel.

🤝 Contributing

We welcome new environment recipes and bug reports!


📦 Add-ons

In addition to the core env library, benchmax ships several optional modules behind extras:

  • benchmax[rag] (module benchmax.rag.*): Markdown chunking, corpus indexing (Postgres / Chroma / Pinecone / Turbopuffer), synthetic QA dataset generation, and RAG-specific reward rubrics
  • benchmax[traces] (module benchmax.traces): agentic trace import (Braintrust today, Langfuse coming) and a provider-agnostic processing pipeline
  • benchmax[chroma] / [pinecone] / [turbopuffer] (module benchmax.rag.corpus.*): corpus-backend pins (combine with [rag])
  • core, no extra needed (module benchmax.platform): HTTP clients for the Castform platform (storage uploads, training-job launch, rollout server). Used both internally by benchmax.rag and by the high-level castform-sdk.
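The Markdown chunking step in benchmax[rag] is not specified on this page. As a generic illustration only (not benchmax.rag's implementation), a heading-based chunker splits a document at headings and caps each chunk's length; chunk_markdown and its max_chars parameter are hypothetical.

```python
import re

# Matches ATX headings such as "# Title" or "### Section".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text: str, max_chars: int = 1000) -> list[dict[str, str]]:
    """Split Markdown into heading-scoped chunks, then cap each chunk's length.

    Limitation: '#' lines inside fenced code blocks are also treated as headings.
    """
    chunks: list[dict[str, str]] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        buffer.clear()
        # Oversized sections become fixed-size pieces that keep the section heading.
        for start in range(0, len(body), max_chars):
            chunks.append({"heading": heading, "text": body[start:start + max_chars]})

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()  # close the previous section under its own heading
            heading = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return chunks
```

Carrying the heading with each chunk gives a QA generator or retriever some context even when a section is split into several pieces.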

All platform URLs derive from CASTFORM_BASE_DOMAIN (default castform.com) with per-component overrides; see benchmax.config.
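One way derivation with per-component overrides can work is sketched below. Only CASTFORM_BASE_DOMAIN and its castform.com default come from this page; the CASTFORM_<COMPONENT>_URL override names, the https://<component>.<base> layout, and component_url itself are hypothetical, so check benchmax.config for the real rules.

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_BASE_DOMAIN = "castform.com"  # default stated on this page


def component_url(component: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a platform URL: explicit override first, else derive from the base domain.

    HYPOTHETICAL: the override variable name and subdomain layout are
    illustrative assumptions, not benchmax.config's actual scheme.
    """
    env = os.environ if env is None else env
    override = env.get(f"CASTFORM_{component.upper()}_URL")
    if override:
        return override.rstrip("/")
    base = env.get("CASTFORM_BASE_DOMAIN", DEFAULT_BASE_DOMAIN)
    return f"https://{component}.{base}"
```

Deriving every URL from one base domain means pointing at a staging deployment needs a single variable, while an override can still redirect one component (for example, to a local server).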

📜 License

Apache 2.0 © 2026 Castform



Download files


Source Distribution

benchmax-0.1.2.dev27.tar.gz (352.6 kB)

Built Distribution


benchmax-0.1.2.dev27-py3-none-any.whl (422.0 kB)

File details

Details for the file benchmax-0.1.2.dev27.tar.gz.

File metadata

  • Download URL: benchmax-0.1.2.dev27.tar.gz
  • Upload date:
  • Size: 352.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (publish, macOS)

File hashes

Hashes for benchmax-0.1.2.dev27.tar.gz
  • SHA256: 91c48ddd2b9c9987a7d9d1614b5da2ea4de0a7941e97d7cb68a6140fe70c0eab
  • MD5: 3cdc50b06fc12c3d99c0e899e5b06454
  • BLAKE2b-256: 0ee469e95b385a46631016c604775db6d5e840f42555fe0a03e853c7917df702
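To check a downloaded archive against the published SHA256 digest, Python's standard hashlib module is enough. sha256_of and verify are illustrative helper names; the expected digest is the SHA256 value listed for the source distribution.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for benchmax-0.1.2.dev27.tar.gz.
EXPECTED_SDIST_SHA256 = "91c48ddd2b9c9987a7d9d1614b5da2ea4de0a7941e97d7cb68a6140fe70c0eab"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in 64 KiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published hex digest, ignoring case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("benchmax-0.1.2.dev27.tar.gz")  # assumes it was downloaded here
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode: list the package in a requirements file with a --hash=sha256:<digest> option and install with --require-hashes.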


File details

Details for the file benchmax-0.1.2.dev27-py3-none-any.whl.

File metadata

  • Download URL: benchmax-0.1.2.dev27-py3-none-any.whl
  • Upload date:
  • Size: 422.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (publish, macOS)

File hashes

Hashes for benchmax-0.1.2.dev27-py3-none-any.whl
  • SHA256: 424270cce4b851f679bdd54bc8c74fd241d4a00459bb39bca16d0ad86c108f8b
  • MD5: 98f5ffcec1baae9240daa346128f83af
  • BLAKE2b-256: 4590db8e0ed01d5d72f3feb4f6c7dedac5895b9e611914e64a0316fb2bfb50b3

