Python implementation of ToolTree (ICLR 2026): MCTS-based tool planning for LLM agents with dual-feedback evaluation and bidirectional pruning
Project description
ToolTree
This is an open-source Python implementation of the paper: ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning Shuo Yang, Soyeon Caren Han, Yihao Ding, Shuhe Wang, Eduard Hoy — ICLR 2026
All credit for the algorithm, equations, and core ideas belongs to the original authors. This repo provides a pip-installable implementation for practical use.
MCTS-based tool planning for LLM agents. Instead of picking tools greedily, ToolTree runs a Monte Carlo Tree Search over possible tool sequences, scores each path with dual LLM feedback, and returns the optimal multi-step plan.
Why ToolTree
When an LLM agent needs to answer a question using tools, it typically tries one tool at a time and hopes for the best. This works for simple tasks but fails on multi-hop reasoning — where the output of tool A determines which tool to call next, and a wrong first choice wastes all subsequent calls.
ToolTree treats tool selection as a search problem:
- Explores multiple tool sequences in parallel
- Prunes bad branches before executing them (saves API calls)
- Prunes branches after seeing bad outputs (stops wasting rollouts)
- Caches duplicate tool calls across the tree
- Stops early when the search converges
Install
# Core library (no LLM provider)
pip install tooltree
# With OpenAI (GPT-4o, GPT-5.1, etc.)
pip install tooltree[openai]
# With Google Gemini
pip install tooltree[gemini]
# With Anthropic Claude
pip install tooltree[anthropic]
# Everything
pip install tooltree[all]
Requires Python 3.10+.
Quick Start
from tooltree import ToolTreePlanner, Tool
# 1. Define your tools
def search_web(query: str) -> str:
# your implementation
return f"Results for {query}"
search_tool = Tool(
name="search_web",
description="Search the web for factual information.",
parameters={
"query": {"type": "string", "description": "Search query"},
},
execute=search_web,
)
# 2. Create the planner
planner = ToolTreePlanner(
tools=[search_tool],
llm="openai:gpt-4o-mini", # or "gemini:gemini-2.0-flash", "anthropic:claude-haiku-4-5-20251001"
)
# 3. Solve
result = planner.solve_sync("What is the boiling point of ethanol in Celsius?")
print(result.answer)
print(result.tool_trace) # list of {tool, arguments, output}
Set your API key before running:
# OpenAI
export OPENAI_API_KEY="sk-..."
# Gemini
export GEMINI_API_KEY="..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
Multi-hop Example
The real value of ToolTree shows on tasks that require chaining tools — where you can't know which tool to call second until you see the output of the first.
from tooltree import ToolTreePlanner, Tool, Config
tools = [
Tool(
name="search_in_file",
description="Search for a regex pattern in a file. Returns matching lines with line numbers.",
parameters={
"path": {"type": "string", "description": "File path"},
"pattern": {"type": "string", "description": "Regex pattern"},
},
execute=search_in_file,
),
Tool(
name="read_file_lines",
description="Read a range of lines from a file. Use after search_in_file.",
parameters={
"path": {"type": "string", "description": "File path"},
"start_line": {"type": "string", "description": "First line (1-indexed)"},
"end_line": {"type": "string", "description": "Last line (inclusive)"},
},
execute=read_file_lines,
),
]
config = Config(
max_rollouts=20,
max_depth=6,
tau_pre=0.3,
tau_post=0.4,
)
planner = ToolTreePlanner(tools=tools, llm="openai:gpt-4o-mini", config=config)
result = planner.solve_sync(
"What is the default value of max_rollouts in the Config class?"
)
# MCTS discovers: search_in_file -> read_file_lines (2-hop chain)
See examples/integration_test.py for a full runnable version with 4 test queries.
Configuration
from tooltree import Config
config = Config(
max_rollouts=60, # Max MCTS iterations (default: 60)
max_depth=15, # Max tool chain length (default: 15)
tau_pre=0.3, # Pre-pruning threshold — candidates below this are dropped before execution
tau_post=0.4, # Post-pruning threshold — branches below this stop expanding
top_k=20, # Keep top-K candidates after pre-pruning
lambda_exploration=1.4, # UCT exploration constant (paper Eq. 1)
early_stop_patience=10, # Stop after N rollouts with no Q improvement
early_stop_delta=1e-3, # Minimum Q improvement to count as progress
tool_timeout=30.0, # Max seconds per tool call
max_concurrent_llm_calls=5 # Parallel LLM calls during expand phase
)
| Parameter | Default | What it controls |
|---|---|---|
max_rollouts |
60 | How many MCTS iterations to run. More = better quality, higher cost. |
max_depth |
15 | Maximum number of tool calls in a single path. |
tau_pre |
0.3 | Pre-execution pruning threshold. Raise to prune more aggressively. |
tau_post |
0.4 | Post-execution pruning threshold. Raise to abandon bad branches faster. |
top_k |
20 | Max candidates kept after pre-pruning. |
lambda_exploration |
1.4 | UCT exploration constant. Higher = more exploration of untested paths. |
early_stop_patience |
10 | Rollouts without improvement before stopping early. |
tool_timeout |
30.0 | Seconds before a tool call is killed. |
max_concurrent_llm_calls |
5 | Semaphore limit for parallel LLM calls (rate limit protection). |
LLM Providers
| Provider | Spec prefix | Extra | API key env var |
|---|---|---|---|
| OpenAI | openai: |
tooltree[openai] |
OPENAI_API_KEY |
| Google Gemini | gemini: |
tooltree[gemini] |
GEMINI_API_KEY or GOOGLE_API_KEY |
| Anthropic Claude | anthropic: |
tooltree[anthropic] |
ANTHROPIC_API_KEY |
Pass as a string to ToolTreePlanner:
planner = ToolTreePlanner(tools=tools, llm="openai:gpt-4o-mini")
planner = ToolTreePlanner(tools=tools, llm="gemini:gemini-2.0-flash")
planner = ToolTreePlanner(tools=tools, llm="anthropic:claude-haiku-4-5-20251001")
Or pass a provider instance directly:
from tooltree.llm import create_llm
llm = create_llm("openai:gpt-4o-mini", temperature=0.2, max_tokens=512)
planner = ToolTreePlanner(tools=tools, llm=llm)
Use a separate, stronger model as the judge (evaluates tool outputs while a cheaper model drives search):
planner = ToolTreePlanner(
tools=tools,
llm="openai:gpt-4o-mini", # drives search (more calls)
judge_llm="openai:gpt-4o", # evaluates quality (fewer calls)
)
Custom / Local LLMs
Any class that extends LLMProvider works:
from tooltree.llm.base import LLMProvider
class MyProvider(LLMProvider):
async def complete(self, messages: list[dict], **kwargs) -> str:
# call your local model / API
...
planner = ToolTreePlanner(tools=tools, llm=MyProvider())
API Reference
ToolTreePlanner
ToolTreePlanner(
tools: Sequence[Tool],
llm: str | LLMProvider = "gemini:gemini-2.0-flash",
config: Config | None = None,
judge_llm: str | LLMProvider | None = None,
)
| Method | Description |
|---|---|
solve(query) -> Awaitable[SearchResult] |
Async search |
solve_sync(query) -> SearchResult |
Sync wrapper — works in scripts, Jupyter, and nested event loops |
SearchResult
result.answer # str — final answer synthesized from best trajectory
result.tool_trace # list[dict] — [{tool, arguments, output}, ...]
result.total_rollouts # int — iterations run
result.best_q # float — best Q-value (0.0 to 1.0)
result.tree_stats # dict — {total_nodes, max_depth_reached, early_stopped}
Tool
Tool(
name: str, # e.g. "search_web"
description: str, # Shown to the LLM — be specific
parameters: dict[str, Any], # JSON Schema-style: {param: {type, description}}
execute: Callable, # Sync or async function
)
How It Works
ToolTree runs MCTS over the space of possible tool call sequences:
- Select — Traverse the tree via UCT score until a leaf node
- Expand — Ask the LLM to generate candidate tool calls; pre-evaluate each (score before executing); drop candidates below
tau_pre; keep top-K - Execute — Run the chosen tool with timeout; cache the result
- Post-evaluate — Score the actual output; if below
tau_post, mark the branch as non-expandable - Backpropagate — Update Q-values up to root
- Repeat until
max_rolloutsor early stop
After search: extract the highest-Q path, ask the LLM to synthesize a final answer from the trajectory.
Dual pruning means bad tool calls are rejected at two checkpoints — before they waste API calls, and after they waste search budget. This keeps rollout cost low on multi-hop tasks where the naive approach would try many dead-end paths.
Running Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run all unit tests (no API key needed — uses MockLLM)
pytest tests/ -v
# Run integration test (requires API key)
TOOLTREE_MODEL=openai:gpt-4o-mini python -u examples/integration_test.py
Project Structure
src/tooltree/
├── planner.py # ToolTreePlanner — main entry point
├── config.py # Config dataclass
├── core/
│ ├── tree.py # Action, SearchState, MCTSNode
│ ├── mcts.py # MCTSPlanner — full search loop
│ ├── pruning.py # pre_prune(), post_prune()
│ └── cache.py # ResultCache — deduplicates tool calls
├── evaluation/
│ ├── pre_evaluator.py # Scores tool calls before execution
│ ├── post_evaluator.py # Scores tool outputs after execution
│ └── prompts.py # LLM judge prompts (paper Appendix B)
├── llm/
│ ├── base.py # LLMProvider ABC, create_llm() factory
│ ├── openai_provider.py
│ ├── gemini_provider.py
│ └── anthropic_provider.py
└── tools/
├── schema.py # Tool dataclass
├── registry.py # ToolRegistry
└── executor.py # ToolExecutor with timeout
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file tooltree_mcts-0.1.0.tar.gz.
File metadata
- Download URL: tooltree_mcts-0.1.0.tar.gz
- Upload date:
- Size: 70.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
95bd8e6fc9c027ddf7fac9eb756cb6fac9d6878309b86cb28f457266f4894c12
|
|
| MD5 |
93ca2eee3b10a22d22012eccfeb90026
|
|
| BLAKE2b-256 |
850f6aec7cf012a404be7c27f8ee4e3802693a5d4cb003bd8005015b21ba2a0c
|
File details
Details for the file tooltree_mcts-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tooltree_mcts-0.1.0-py3-none-any.whl
- Upload date:
- Size: 35.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3f05d4e5103ad9d0b6f2f8eae35209e9a376b12f7825b644660856bfc1a6f82d
|
|
| MD5 |
a57c0896cd109ddd472f8e9e928d231b
|
|
| BLAKE2b-256 |
a00a2341e96588227d34575dcf63d0f77d872eaf8ab2358234460331ddd993fa
|