
A framework for training LLM-powered agents to use tools more effectively using Reinforcement Learning


ToolBrain 🧠


ToolBrain is a lightweight open-source Python library for training agentic systems with effective tool usage and built-in reinforcement learning.
📚 Our website: toolbrain.org and Documentation & tutorials

📚 Watch Introduction Video

Support us by giving ToolBrain a ⭐ on GitHub.

✨ Key Features

🚀 Getting Started

Prerequisites

  • Python 3.10+

Installation

Create a conda environment (optional):

conda create --name toolbrain python=3.12
conda activate toolbrain

Install from PyPI:

pip install toolbrain

Or from the source code:

git clone git@github.com:ToolBrain/ToolBrain.git

Enter the cloned folder and type:

pip install .

Run the Example

Run the complete example to see ToolBrain in action (see the examples folder for more advanced usage):

python examples/01_run_hello_world.py

This will:

  • Initialize a CodeAgent with simple math tools
  • Define a customised reward function
  • Run the GRPO algorithm
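A customised reward is ordinary Python that scores a run. The sketch below is self-contained and does not reproduce ToolBrain's actual reward signature, which this README does not show; the function name and the `final_answer`, `gold_answer`, and `tol` parameters are assumptions made for illustration.

```python
def numeric_match_reward(final_answer: str, gold_answer: str, tol: float = 1e-6) -> float:
    """Hypothetical reward: 1.0 for a matching answer, 0.0 otherwise.

    Answers are compared as numbers when both parse as floats, so "12.0"
    matches "12". If parsing fails, fall back to a case-insensitive,
    whitespace-trimmed string comparison.
    """
    try:
        matched = abs(float(final_answer) - float(gold_answer)) <= tol
    except (TypeError, ValueError):
        matched = str(final_answer).strip().lower() == str(gold_answer).strip().lower()
    return 1.0 if matched else 0.0


if __name__ == "__main__":
    print(numeric_match_reward("12.0", "12"))  # numeric comparison
    print(numeric_match_reward("13", "12"))    # mismatch
```

Returning a float rather than a boolean leaves room for partial credit later, for example a small bonus when the expected tool was called.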

📖 Usage Example

Here's a minimal example of how to use ToolBrain. This script demonstrates the simplified ToolBrain API:

  1. Create a smolagents CodeAgent
  2. Create a brain with our main class Brain()
  3. Train the agent with the GRPO algorithm

from smolagents import tool, TransformersModel, CodeAgent
from toolbrain import Brain
from toolbrain.rewards import reward_exact_match

# --- 1. Define Tools (user-defined; the reward function is imported above) ---
@tool
def add(a: int, b: int) -> int:
    """
    Add two integers.

    Args:
        a (int): First addend.
        b (int): Second addend.

    Returns:
        int: Sum of a and b.
    """
    return a + b


# --- 2. Prepare Training Data ---
training_dataset = [
    {
        "query": "Use the add tool to calculate 5 + 7",
        "gold_answer": "12"
    }
]


# 3. Create agent
model = TransformersModel(
    model_id="Qwen/Qwen2.5-0.5B-Instruct",  # use a bigger model for better results
    max_new_tokens=128
)

agent = CodeAgent(
    model=model,
    tools=[add],
    max_steps=1
)

# 4. Create Brain

brain = Brain(
    agent,                          # Agent instance
    algorithm="GRPO",                # Algorithm choice
    reward_func=reward_exact_match  # Reward function; any Python function can serve as a custom reward
)

# 5. Train the agent with GRPO steps
brain.train(training_dataset, num_iterations=10)
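
For intuition about what a GRPO step does with rewards, here is a minimal, self-contained sketch of the group-relative advantage at the heart of the algorithm: several rollouts are sampled for the same query, and each rollout's reward is normalised against the group's mean and standard deviation. This is not ToolBrain's internal code; the function name and `eps` parameter are assumptions, and the population standard deviation is used here although implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalise each reward against its own group of rollouts.

    A rollout that beats the group average gets a positive advantage,
    one that falls below it gets a negative advantage. `eps` avoids a
    division by zero when every rollout earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one query: two correct (1.0), two incorrect (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When all rollouts in a group receive the same reward, every advantage is zero and that group contributes no learning signal, which is one reason a reward function that separates good from bad rollouts matters.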

Results

The following plot illustrates how ToolBrain enhances the tool usage accuracy of the small Qwen/Qwen2.5-0.5B-Instruct model after just 20 training steps using GRPO.

Figure: GRPO learning curve

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🌍 Community contributions

Our vision is for ToolBrain to become the universal Reinforcement Learning layer for any agentic framework. Whether you build your agents with LangChain, LlamaIndex, AutoGen, or a custom solution, you should be able to make them smarter with ToolBrain.

The key to this vision is our modular Adapter architecture. Adding support for a new framework is as simple as implementing a new adapter that translates the agent's internal state into ToolBrain's standard Execution Trace.

We welcome community contributions!
If you are using an agent framework not yet supported, we encourage you to build an adapter for it.
Check out our CONTRIBUTING.md guide and the existing implementations in the toolbrain/adapters/ directory to get started.

Contributors

Quy Minh Le, Minh Sao Khue Luu, Khanh-Tung Tran, Duc-Hai Nguyen, Hoang-Quoc-Viet Pham, Quan Le, Hoang Thanh Lam and Harry Nguyen


🚀 Spread the Word

If you believe in ToolBrain's vision of making agent training accessible to everyone, please consider sharing it with your network!



References

Please cite our paper using the following BibTeX entry:

@misc{le2025toolbrainflexiblereinforcementlearning,
      title={ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools}, 
      author={Quy Minh Le and Minh Sao Khue Luu and Khanh-Tung Tran and Duc-Hai Nguyen and Hoang-Quoc-Viet Pham and Quan Le and Hoang Thanh Lam and Hoang D. Nguyen},
      year={2025},
      eprint={2510.00023},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.00023}, 
}

Made with ❤️ by the ToolBrain Team
