A framework for training LLM-powered agents to use tools more effectively using Reinforcement Learning
ToolBrain 🧠
ToolBrain is a lightweight open-source Python library for training agentic systems with effective tool usage and built-in reinforcement learning.
📚 Our website: toolbrain.org and Documentation & tutorials
Support us by giving ToolBrain a ⭐ on GitHub.
✨ Key Features
- 🤖 Learning algorithms: Supports GRPO, DPO, and supervised learning.
- 🎯 Flexible rewards: Define your own reward functions or use LLM-as-judge.
- 🔧 Tool management: Scalable retrieval for managing large tool collections.
- 📊 Knowledge distillation: Distill large teacher models into smaller student models for efficiency.
- 🚀 Zero-learn: Automatically generate training tasks.
- ⚡ Efficient training: Supports FP16 finetuning, LoRA, Unsloth, and BitsAndBytes for resource-efficient training.
- 🧠 Multiple agent frameworks: Supports smolagents and LangChain, with more coming soon.
🚀 Getting Started
Prerequisites
- Python 3.10+
Installation
Create a conda environment (optional):
conda create --name toolbrain python=3.12
conda activate toolbrain
Install from PyPI:
pip install toolbrain
Or from the source code:
git clone https://github.com/ToolBrain/ToolBrain.git
Enter the cloned folder and run:
pip install .
Run the Example
Run the complete example to see ToolBrain in action (see the examples folder for more advanced usage):
python examples/01_run_hello_world.py
This will:
- Initialize a CodeAgent with simple math tools
- Define a customised reward function
- Run the GRPO algorithm
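For readers new to GRPO, the core idea is that several rollouts for the same query are scored and each reward is normalised against the group's mean and standard deviation. The sketch below illustrates only that normalisation step in plain Python. It is not ToolBrain's implementation; the function name `group_relative_advantages` and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of rollouts for the same query.

    Hypothetical helper for illustration only, not part of ToolBrain.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one query: two solved the task, two did not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that score above the group average get a positive advantage and are reinforced; those below get a negative one. When all rollouts receive the same reward, every advantage is zero and the group contributes no learning signal.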
📖 Usage Example
Here's a minimal example of how to use ToolBrain. The script demonstrates the simplified ToolBrain API:
- Create a smolagents CodeAgent
- Create a brain with our main class Brain()
- Train the agent with the GRPO algorithm
from smolagents import tool, TransformersModel, CodeAgent
from toolbrain import Brain
from toolbrain.rewards import reward_exact_match

# --- 1. Define Tools (the reward function is imported above) ---
@tool
def add(a: int, b: int) -> int:
    """
    Add two integers.

    Args:
        a (int): First addend.
        b (int): Second addend.

    Returns:
        int: Sum of a and b.
    """
    return a + b

# --- 2. Prepare Training Data ---
training_dataset = [
    {
        "query": "Use the add tool to calculate 5 + 7",
        "gold_answer": "12"
    }
]

# --- 3. Create agent ---
model = TransformersModel(
    model_id="Qwen/Qwen2.5-0.5B-Instruct",  # use a bigger model for better results
    max_new_tokens=128
)
agent = CodeAgent(
    model=model,
    tools=[add],
    max_steps=1
)

# --- 4. Create Brain ---
brain = Brain(
    agent,                          # Agent instance
    algorithm="GRPO",               # Algorithm choice
    reward_func=reward_exact_match  # Reward function; any Python function can be used as a reward
)

# --- 5. Train the agent with GRPO steps ---
brain.train(training_dataset, num_iterations=10)
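As a rough illustration of what a customised reward might look like, the sketch below scores an answer by comparing the last number it contains against the gold answer. This README does not show the signature ToolBrain expects for `reward_func`, so the function name `reward_numeric_match` and its `(final_answer, gold_answer)` arguments are assumptions; check `toolbrain.rewards` for the real interface before adapting it.

```python
import re


def reward_numeric_match(final_answer: str, gold_answer: str, tol: float = 1e-6) -> float:
    """Return 1.0 if the last number in final_answer matches gold_answer, else 0.0.

    Hypothetical reward for illustration; the argument names are assumed,
    not taken from ToolBrain's API.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", final_answer)
    if not numbers:
        return 0.0  # the agent produced no number at all
    try:
        target = float(gold_answer)
    except ValueError:
        return 0.0  # gold answer is not numeric, so this reward does not apply
    return 1.0 if abs(float(numbers[-1]) - target) <= tol else 0.0


print(reward_numeric_match("The result is 12", "12"))
print(reward_numeric_match("I think it is 13", "12"))
```

Compared with an exact string match, this tolerates extra wording around the number, which can matter for small models that rarely emit the bare answer.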
Results
The following plot illustrates how ToolBrain enhances the tool usage accuracy of the small Qwen/Qwen2.5-0.5B-Instruct model after just 20 training steps using GRPO.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
🌍 Community contributions
Our vision is for ToolBrain to become the universal Reinforcement Learning layer for any agentic framework. Whether you build your agents with LangChain, SmolAgents, LlamaIndex, AutoGen, or a custom solution, you should be able to make them smarter with ToolBrain.
The key to this vision is our modular Adapter architecture. Adding support for a new framework is as simple as implementing a new adapter that translates the agent's internal state into ToolBrain's standard Execution Trace.
We welcome community contributions!
If you are using an agent framework not yet supported, we encourage you to build an adapter for it.
Check out our CONTRIBUTING.md guide and the existing implementations in the toolbrain/adapters/ directory to get started.
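To give a feel for the adapter idea, here is a minimal, self-contained sketch. Every name in it (`TraceStep`, `AgentAdapter`, `DictFrameworkAdapter`, `fake_agent`) is hypothetical and chosen for illustration; the real base classes and trace format live in toolbrain/adapters/ and may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, Protocol


@dataclass
class TraceStep:
    """One step of a hypothetical execution trace (not ToolBrain's actual type)."""
    model_output: str
    tool_name: Optional[str] = None
    tool_args: dict[str, Any] = field(default_factory=dict)
    tool_output: Optional[str] = None


class AgentAdapter(Protocol):
    """Assumed adapter interface: run a query, return a framework-neutral trace."""
    def run(self, query: str) -> list[TraceStep]: ...


class DictFrameworkAdapter:
    """Adapter for a made-up framework whose agent returns a list of dicts."""

    def __init__(self, agent: Callable[[str], list[dict[str, Any]]]):
        self.agent = agent

    def run(self, query: str) -> list[TraceStep]:
        # Translate the framework's native records into the common trace format.
        return [
            TraceStep(
                model_output=step["text"],
                tool_name=step.get("tool"),
                tool_args=step.get("args", {}),
                tool_output=step.get("result"),
            )
            for step in self.agent(query)
        ]


def fake_agent(query: str) -> list[dict[str, Any]]:
    """Stand-in for a real agent so the sketch runs without any model."""
    return [
        {"text": "add(a=5, b=7)", "tool": "add", "args": {"a": 5, "b": 7}, "result": "12"},
        {"text": "The answer is 12"},
    ]


trace = DictFrameworkAdapter(fake_agent).run("Use the add tool to calculate 5 + 7")
for step in trace:
    print(step)
```

The point of the pattern is that the training loop only ever sees the common trace type, so supporting a new framework means writing one translation class rather than touching the learning code.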
Contributors
Quy Minh Le, Minh Sao Khue Luu, Khanh-Tung Tran, Duc-Hai Nguyen, Hoang-Quoc-Viet Pham, Quan Le, Hoang Thanh Lam and Harry Nguyen
🚀 Spread the Word
If you believe in ToolBrain's vision of making agent training accessible to everyone, please consider sharing it with your network!
References
Please cite our paper with the following bibtex:
@misc{le2025toolbrainflexiblereinforcementlearning,
  title={ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools},
  author={Quy Minh Le and Minh Sao Khue Luu and Khanh-Tung Tran and Duc-Hai Nguyen and Hoang-Quoc-Viet Pham and Quan Le and Hoang Thanh Lam and Hoang D. Nguyen},
  year={2025},
  eprint={2510.00023},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.00023},
}
Made with ❤️ by the ToolBrain Team