Python implementation of ToolTree (ICLR 2026): MCTS-based tool planning for LLM agents with dual-feedback evaluation and bidirectional pruning

These details have not been verified by PyPI

Project description

ToolTree

This is an open-source Python implementation of the paper: ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning Shuo Yang, Soyeon Caren Han, Yihao Ding, Shuhe Wang, Eduard Hoy — ICLR 2026

All credit for the algorithm, equations, and core ideas belongs to the original authors. This repo provides a pip-installable implementation for practical use.

MCTS-based tool planning for LLM agents. Instead of picking tools greedily, ToolTree runs a Monte Carlo Tree Search over possible tool sequences, scores each path with dual LLM feedback, and returns the optimal multi-step plan.

Why ToolTree

When an LLM agent needs to answer a question using tools, it typically tries one tool at a time and hopes for the best. This works for simple tasks but fails on multi-hop reasoning — where the output of tool A determines which tool to call next, and a wrong first choice wastes all subsequent calls.

ToolTree treats tool selection as a search problem:

Explores multiple tool sequences in parallel
Prunes bad branches before executing them (saves API calls)
Prunes branches after seeing bad outputs (stops wasting rollouts)
Caches duplicate tool calls across the tree
Stops early when the search converges

Install

# Core library (no LLM provider)
pip install tooltree

# With OpenAI (GPT-4o, GPT-5.1, etc.)
pip install tooltree[openai]

# With Google Gemini
pip install tooltree[gemini]

# With Anthropic Claude
pip install tooltree[anthropic]

# Everything
pip install tooltree[all]

Requires Python 3.10+.

Quick Start

from tooltree import ToolTreePlanner, Tool

# 1. Define your tools
def search_web(query: str) -> str:
    # your implementation
    return f"Results for {query}"

search_tool = Tool(
    name="search_web",
    description="Search the web for factual information.",
    parameters={
        "query": {"type": "string", "description": "Search query"},
    },
    execute=search_web,
)

# 2. Create the planner
planner = ToolTreePlanner(
    tools=[search_tool],
    llm="openai:gpt-4o-mini",   # or "gemini:gemini-2.0-flash", "anthropic:claude-haiku-4-5-20251001"
)

# 3. Solve
result = planner.solve_sync("What is the boiling point of ethanol in Celsius?")

print(result.answer)
print(result.tool_trace)   # list of {tool, arguments, output}

Set your API key before running:

# OpenAI
export OPENAI_API_KEY="sk-..."

# Gemini
export GEMINI_API_KEY="..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Multi-hop Example

The real value of ToolTree shows on tasks that require chaining tools — where you can't know which tool to call second until you see the output of the first.

from tooltree import ToolTreePlanner, Tool, Config

tools = [
    Tool(
        name="search_in_file",
        description="Search for a regex pattern in a file. Returns matching lines with line numbers.",
        parameters={
            "path":    {"type": "string", "description": "File path"},
            "pattern": {"type": "string", "description": "Regex pattern"},
        },
        execute=search_in_file,
    ),
    Tool(
        name="read_file_lines",
        description="Read a range of lines from a file. Use after search_in_file.",
        parameters={
            "path":       {"type": "string", "description": "File path"},
            "start_line": {"type": "string", "description": "First line (1-indexed)"},
            "end_line":   {"type": "string", "description": "Last line (inclusive)"},
        },
        execute=read_file_lines,
    ),
]

config = Config(
    max_rollouts=20,
    max_depth=6,
    tau_pre=0.3,
    tau_post=0.4,
)

planner = ToolTreePlanner(tools=tools, llm="openai:gpt-4o-mini", config=config)

result = planner.solve_sync(
    "What is the default value of max_rollouts in the Config class?"
)
# MCTS discovers: search_in_file -> read_file_lines (2-hop chain)

See examples/integration_test.py for a full runnable version with 4 test queries.

Configuration

from tooltree import Config

config = Config(
    max_rollouts=60,           # Max MCTS iterations (default: 60)
    max_depth=15,              # Max tool chain length (default: 15)
    tau_pre=0.3,               # Pre-pruning threshold — candidates below this are dropped before execution
    tau_post=0.4,              # Post-pruning threshold — branches below this stop expanding
    top_k=20,                  # Keep top-K candidates after pre-pruning
    lambda_exploration=1.4,    # UCT exploration constant (paper Eq. 1)
    early_stop_patience=10,    # Stop after N rollouts with no Q improvement
    early_stop_delta=1e-3,     # Minimum Q improvement to count as progress
    tool_timeout=30.0,         # Max seconds per tool call
    max_concurrent_llm_calls=5 # Parallel LLM calls during expand phase
)

Parameter	Default	What it controls
`max_rollouts`	60	How many MCTS iterations to run. More = better quality, higher cost.
`max_depth`	15	Maximum number of tool calls in a single path.
`tau_pre`	0.3	Pre-execution pruning threshold. Raise to prune more aggressively.
`tau_post`	0.4	Post-execution pruning threshold. Raise to abandon bad branches faster.
`top_k`	20	Max candidates kept after pre-pruning.
`lambda_exploration`	1.4	UCT exploration constant. Higher = more exploration of untested paths.
`early_stop_patience`	10	Rollouts without improvement before stopping early.
`tool_timeout`	30.0	Seconds before a tool call is killed.
`max_concurrent_llm_calls`	5	Semaphore limit for parallel LLM calls (rate limit protection).

LLM Providers

Provider	Spec prefix	Extra	API key env var
OpenAI	`openai:`	`tooltree[openai]`	`OPENAI_API_KEY`
Google Gemini	`gemini:`	`tooltree[gemini]`	`GEMINI_API_KEY` or `GOOGLE_API_KEY`
Anthropic Claude	`anthropic:`	`tooltree[anthropic]`	`ANTHROPIC_API_KEY`

Pass as a string to ToolTreePlanner:

planner = ToolTreePlanner(tools=tools, llm="openai:gpt-4o-mini")
planner = ToolTreePlanner(tools=tools, llm="gemini:gemini-2.0-flash")
planner = ToolTreePlanner(tools=tools, llm="anthropic:claude-haiku-4-5-20251001")

Or pass a provider instance directly:

from tooltree.llm import create_llm

llm = create_llm("openai:gpt-4o-mini", temperature=0.2, max_tokens=512)
planner = ToolTreePlanner(tools=tools, llm=llm)

Use a separate, stronger model as the judge (evaluates tool outputs while a cheaper model drives search):

planner = ToolTreePlanner(
    tools=tools,
    llm="openai:gpt-4o-mini",      # drives search (more calls)
    judge_llm="openai:gpt-4o",     # evaluates quality (fewer calls)
)

Custom / Local LLMs

Any class that extends LLMProvider works:

from tooltree.llm.base import LLMProvider

class MyProvider(LLMProvider):
    async def complete(self, messages: list[dict], **kwargs) -> str:
        # call your local model / API
        ...

planner = ToolTreePlanner(tools=tools, llm=MyProvider())

API Reference

`ToolTreePlanner`

ToolTreePlanner(
    tools: Sequence[Tool],
    llm: str | LLMProvider = "gemini:gemini-2.0-flash",
    config: Config | None = None,
    judge_llm: str | LLMProvider | None = None,
)

Method	Description
`solve(query) -> Awaitable[SearchResult]`	Async search
`solve_sync(query) -> SearchResult`	Sync wrapper — works in scripts, Jupyter, and nested event loops

`SearchResult`

result.answer          # str — final answer synthesized from best trajectory
result.tool_trace      # list[dict] — [{tool, arguments, output}, ...]
result.total_rollouts  # int — iterations run
result.best_q          # float — best Q-value (0.0 to 1.0)
result.tree_stats      # dict — {total_nodes, max_depth_reached, early_stopped}

`Tool`

Tool(
    name: str,                          # e.g. "search_web"
    description: str,                   # Shown to the LLM — be specific
    parameters: dict[str, Any],         # JSON Schema-style: {param: {type, description}}
    execute: Callable,                  # Sync or async function
)

How It Works

ToolTree runs MCTS over the space of possible tool call sequences:

Select — Traverse the tree via UCT score until a leaf node
Expand — Ask the LLM to generate candidate tool calls; pre-evaluate each (score before executing); drop candidates below tau_pre; keep top-K
Execute — Run the chosen tool with timeout; cache the result
Post-evaluate — Score the actual output; if below tau_post, mark the branch as non-expandable
Backpropagate — Update Q-values up to root
Repeat until max_rollouts or early stop

After search: extract the highest-Q path, ask the LLM to synthesize a final answer from the trajectory.

Dual pruning means bad tool calls are rejected at two checkpoints — before they waste API calls, and after they waste search budget. This keeps rollout cost low on multi-hop tasks where the naive approach would try many dead-end paths.

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all unit tests (no API key needed — uses MockLLM)
pytest tests/ -v

# Run integration test (requires API key)
TOOLTREE_MODEL=openai:gpt-4o-mini python -u examples/integration_test.py

Project Structure

src/tooltree/
├── planner.py              # ToolTreePlanner — main entry point
├── config.py               # Config dataclass
├── core/
│   ├── tree.py             # Action, SearchState, MCTSNode
│   ├── mcts.py             # MCTSPlanner — full search loop
│   ├── pruning.py          # pre_prune(), post_prune()
│   └── cache.py            # ResultCache — deduplicates tool calls
├── evaluation/
│   ├── pre_evaluator.py    # Scores tool calls before execution
│   ├── post_evaluator.py   # Scores tool outputs after execution
│   └── prompts.py          # LLM judge prompts (paper Appendix B)
├── llm/
│   ├── base.py             # LLMProvider ABC, create_llm() factory
│   ├── openai_provider.py
│   ├── gemini_provider.py
│   └── anthropic_provider.py
└── tools/
    ├── schema.py           # Tool dataclass
    ├── registry.py         # ToolRegistry
    └── executor.py         # ToolExecutor with timeout

License

MIT

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Apr 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tooltree_mcts-0.1.0.tar.gz (70.8 kB view details)

Uploaded Apr 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

tooltree_mcts-0.1.0-py3-none-any.whl (35.7 kB view details)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file tooltree_mcts-0.1.0.tar.gz.

File metadata

Download URL: tooltree_mcts-0.1.0.tar.gz
Upload date: Apr 9, 2026
Size: 70.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for tooltree_mcts-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`95bd8e6fc9c027ddf7fac9eb756cb6fac9d6878309b86cb28f457266f4894c12`
MD5	`93ca2eee3b10a22d22012eccfeb90026`
BLAKE2b-256	`850f6aec7cf012a404be7c27f8ee4e3802693a5d4cb003bd8005015b21ba2a0c`

See more details on using hashes here.

File details

Details for the file tooltree_mcts-0.1.0-py3-none-any.whl.

File metadata

Download URL: tooltree_mcts-0.1.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 35.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for tooltree_mcts-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3f05d4e5103ad9d0b6f2f8eae35209e9a376b12f7825b644660856bfc1a6f82d`
MD5	`a57c0896cd109ddd472f8e9e928d231b`
BLAKE2b-256	`a00a2341e96588227d34575dcf63d0f77d872eaf8ab2358234460331ddd993fa`

See more details on using hashes here.

tooltree-mcts 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

ToolTree

Why ToolTree

Install

Quick Start

Multi-hop Example

Configuration

LLM Providers

Custom / Local LLMs

API Reference

`ToolTreePlanner`

`SearchResult`

`Tool`

How It Works

Running Tests

Project Structure

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes