Semantic tool routing for MCP. Stop injecting all schemas. Inject only what matters.
tool-rot
Semantic tool routing for MCP. Keep every tool callable. Stop sending every schema.
tool-rot is a transparent MCP proxy for teams running many MCP servers. It indexes all upstream tools locally, returns only the most relevant tool schemas on each turn, and forwards tool calls to the original server unchanged.
$ tool-rot bench --k 3
Queries tested 100
K (tools injected) 3
Recall@K 91.3% (correct tool was in top-3)
Avg token reduction 94.6%
Avg routing latency 8.7ms
Index build time 420ms
Why This Exists
MCP tool schemas are useful context, but they get expensive fast. A few servers can add thousands of schema tokens before the model sees the user's actual request. Most of those tools are irrelevant on any given turn.
That creates three problems:
- Cost: repeated schema tokens are paid for every turn.
- Attention: the model has to scan unused tools before solving the task.
- Accuracy: large tool menus make wrong tool selection more likely.
tool-rot treats tool schemas like a retrieval problem. The full toolset stays available behind the proxy, while the model only sees the schemas that are likely to matter right now.
What It Guarantees
- Execution is never blocked by routing. Filtering only affects tools/list; tools/call still forwards to the upstream server.
- Tool names are namespaced. Exposed tools use MCP-compliant server.tool names, so duplicate names across servers do not collide.
- Resources and prompts pass through. Tool schemas are routed; resources and prompts are aggregated and forwarded without semantic filtering.
- Routing is local. MiniLM ONNX embeddings and FAISS run locally; no routing API key is required.
- Schemas are slimmed. Nonessential display metadata is removed before indexing and injection.
- Misses are observable. Tool-call hits and misses are logged, and status can recommend a higher k.
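The first two guarantees can be illustrated with a small lookup table. This is a minimal sketch, not tool-rot's actual implementation: `merge_tools` and `resolve_call` are hypothetical names, and the tool lists are made up. It shows why namespacing avoids collisions and why call forwarding does not depend on which schemas were returned by tools/list.

```python
def merge_tools(servers: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map each exposed 'server.tool' name to its (server, tool) origin."""
    table: dict[str, tuple[str, str]] = {}
    for server, tools in servers.items():
        for tool in tools:
            table[f"{server}.{tool}"] = (server, tool)
    return table


def resolve_call(table: dict[str, tuple[str, str]], exposed_name: str) -> tuple[str, str]:
    """Find the upstream target for a tools/call.

    The lookup uses the full table, not the routed subset, so a tool
    that was filtered out of tools/list can still be called.
    """
    if exposed_name not in table:
        raise LookupError(f"unknown tool: {exposed_name!r}")
    return table[exposed_name]


# Two servers expose a tool called "search"; namespacing keeps both.
servers = {
    "filesystem": ["search", "read_file"],
    "github": ["search", "create_pull_request"],
}
table = merge_tools(servers)
```

Because the table is keyed by the full exposed name, the sketch does not need to assume anything about which characters a server name may contain.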
Install
pip install tool-rot
For SSE transport support:
pip install "tool-rot[sse]"
Quick Start
One-line Cursor integration:
pip install tool-rot && tool-rot init cursor --filesystem .
This creates tool-rot.toml and .cursor/mcp.json with local filesystem MCP routing enabled.
To configure manually instead, create tool-rot.toml:
[proxy]
transport = "stdio"
k = 3
max_k = 8
auto_tune_k = false
always_inject = ["filesystem.read_file"]
[routing]
embedder = "all-MiniLM-L6-v2"
index_type = "flat"
ranking_mode = "hybrid"
cache_dir = ".tool-rot"
[logging]
session_log = ".tool-rot/session.jsonl"
verbose = false
[[server]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "."]
[[server]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "${GITHUB_TOKEN}" }
Generate client config:
# Claude Code
tool-rot cc-snippet
# Cursor
tool-rot cursor-snippet
Then start using your MCP client normally. The client connects to tool-rot; tool-rot connects to your real MCP servers.
Architecture
MCP client
|
| tools/list, tools/call
v
tool-rot proxy
|
| list_tools / call_tool
+--> filesystem MCP
+--> github MCP
+--> slack MCP
+--> ...
Routing sidecar:
tool schemas -> schema slimming -> embeddings -> FAISS index
user query -> hybrid semantic + lexical retrieval -> top-K schemas
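The hybrid retrieval step in the sidecar can be sketched as a weighted sum of a vector similarity and a lexical overlap score. This is an illustration only: the function names, the `alpha` weight, and the token-overlap formula are assumptions, and the vectors are passed in by the caller rather than produced by MiniLM or searched with FAISS.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_overlap(query: str, text: str) -> float:
    """Fraction of distinct query tokens that also appear in the tool text."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_top_k(query, query_vec, tools, k=3, alpha=0.7):
    """Rank tools by alpha * semantic + (1 - alpha) * lexical score.

    tools is a list of (name, text, vector) tuples; alpha is an
    assumed weighting, not a tool-rot setting.
    """
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * lexical_overlap(query, text), name)
        for name, text, vec in tools
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy 2-d vectors stand in for real embeddings.
tools = [
    ("github.create_pull_request", "create a pull request in a github repository", [1.0, 0.0]),
    ("filesystem.read_file", "read a file from disk", [0.0, 1.0]),
    ("github.list_issues", "list issues in a github repository", [0.8, 0.6]),
]
top = hybrid_top_k("open a pull request", [0.9, 0.1], tools, k=2)
```

Setting `alpha = 1.0` in this sketch corresponds to vector-only ranking, which is one way to think about the difference between the hybrid and semantic ranking modes.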
Turn flow:
- A client asks for tools/list.
- tool-rot routes against the current query context.
- It returns the top-K namespaced schemas plus always_inject tools.
- If the model calls a tool, tool-rot forwards the call to the matching upstream server.
- Hits, misses, token estimates, and latency are logged to .tool-rot/session.jsonl.
Claude Code can provide per-turn query context through the packaged tool-rot-hook. Other hosts can POST {"query": "..."} to http://127.0.0.1:4748/query before tools/list. Without query context, tool-rot falls back to default routing.
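For hosts without a packaged hook, the query POST described above might look like the sketch below. The URL and JSON body come from the text; the helper names, the timeout, and the `Authorization: Bearer ...` header format for the optional context token are assumptions and should be checked against the project's documentation.

```python
import json
import urllib.request

CONTEXT_URL = "http://127.0.0.1:4748/query"


def build_query_request(query: str, token: str = "", url: str = CONTEXT_URL):
    """Build the POST request that carries the latest user query."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed header format for the optional bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_query(query: str, token: str = "", timeout: float = 2.0) -> int:
    """Send the query context and return the HTTP status code."""
    request = build_query_request(query, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A host would call `send_query(latest_user_message)` just before it requests tools/list. `send_query` needs a running proxy; `build_query_request` can be inspected without one.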
CLI
# Start the proxy
tool-rot serve --config tool-rot.toml
# Show session stats and K recommendations
tool-rot status
tool-rot status --json
# Benchmark Recall@K and token reduction
tool-rot bench --k 3
tool-rot bench --tools tools.json --queries queries.jsonl --k 3 --json
# Compare routing against all-tools and compression baselines
tool-rot eval-report --k 3
tool-rot eval-report --k 3 --output eval.json
# Regenerate real open-source MCP savings reports
tool-rot mcp-smoke-report --output-dir reports
# Inspect or rebuild the local index
tool-rot index show
tool-rot index show --server github
tool-rot index rebuild
# Generate client snippets
tool-rot cc-snippet
tool-rot cursor-snippet
See docs/CLIENT_SETUP.md for exact Cursor and Claude Code configuration examples.
Benchmark And Eval
bench answers: "If I inject K tools, how often is the correct tool included?"
tool-rot bench --k 3
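Recall@K, as described above, is the fraction of queries whose correct tool appears among the top K routed tools. A minimal sketch of that calculation follows; the function name and the sample rankings are illustrative, not taken from tool-rot's benchmark code or data.

```python
def recall_at_k(results: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of queries whose correct tool is in the top-k ranking.

    results holds one (ranked_tool_names, correct_tool) pair per query.
    """
    if not results:
        return 0.0
    hits = sum(1 for ranked, correct in results if correct in ranked[:k])
    return hits / len(results)


# Illustrative rankings for four queries.
results = [
    (["github.create_pull_request", "github.list_issues", "filesystem.read_file"],
     "github.create_pull_request"),
    (["filesystem.read_file", "filesystem.write_file", "github.list_issues"],
     "filesystem.write_file"),
    (["github.list_issues", "filesystem.read_file", "github.create_pull_request"],
     "github.create_pull_request"),
    (["filesystem.read_file", "github.list_issues", "filesystem.write_file"],
     "memory.create_entities"),
]

for k in (1, 2, 3):
    print(k, recall_at_k(results, k))
```

Raising K can only keep recall the same or increase it, which is why the trade-off is always against token savings.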
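Custom eval files: a JSON array of tool definitions (tools.json), followed by one JSON object per line of queries (queries.jsonl):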
[
{
"server_name": "github",
"name": "create_pull_request",
"description": "Create a pull request in a GitHub repository.",
"inputSchema": {
"type": "object",
"properties": {
"repo": { "type": "string" },
"title": { "type": "string" }
}
}
}
]
{"query":"open a pull request for this branch","correct_tool":"github.create_pull_request"}
eval-report compares:
- all_tools: sends every schema, 100% tool visibility, no token reduction.
- compression: slims every schema, still sends all tools.
- semantic: vector retrieval only.
- hybrid: vector retrieval plus lexical/BM25-like scoring.
Use this before rolling out a large MCP setup. Real tool names, descriptions, and query patterns matter.
Real MCP Results
These results were measured against official open-source MCP servers from npm. Token counts use len(minified_json) // 4, so treat them as comparable estimates rather than provider billing numbers.
| MCP setup | Direct tools | tool-rot tools | Tokens saved | Reduction |
|---|---|---|---|---|
| Filesystem | 14 | 3 | 2,094 | 79.2% |
| Memory | 9 | 2 | 1,736 | 75.5% |
| Sequential Thinking | 1 | 1 | 10 | 0.9% |
| Everything | 13 | 2 | 1,267 | 87.3% |
| Combined official stack | 37 | 7 | 5,436 | 71.6% |
Full reports live in reports/. The single-tool Sequential Thinking server is a useful negative control: routing cannot save much when a server only exposes one tool.
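The token estimate used for the table, len(minified_json) // 4, can be reproduced in a few lines. This sketch assumes the whole list of schemas is serialized as one minified JSON document; whether the reports minify per schema or per list is not stated here, and the function names are hypothetical.

```python
import json


def estimate_tokens(schemas: list[dict]) -> int:
    """Approximate token count as len(minified JSON) // 4."""
    minified = json.dumps(schemas, separators=(",", ":"))
    return len(minified) // 4


def reduction(all_schemas: list[dict], routed_schemas: list[dict]) -> tuple[int, float]:
    """Return (tokens saved, percent reduction) for a routed subset."""
    before = estimate_tokens(all_schemas)
    after = estimate_tokens(routed_schemas)
    saved = before - after
    percent = 100.0 * saved / before if before else 0.0
    return saved, percent
```

Because the estimate is a character count divided by four, it is useful for comparing setups against each other but will not match any provider's tokenizer exactly.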
To reproduce these results, run:
tool-rot mcp-smoke-report --output-dir reports
See docs/REPRODUCING_RESULTS.md for details.
Configuration Reference
[proxy]
transport = "stdio" # stdio | sse
port = 4747 # SSE listener port
context_api_port = 4748 # Local query-context API
context_api_token = "" # Optional bearer token; env TOOL_ROT_CONTEXT_TOKEN also works
k = 3 # Initial tools injected per turn
max_k = 8 # Upper bound for auto-tuned K
auto_tune_k = false # Increase effective K after routing misses
always_inject = ["server.tool"] # Tools always returned by tools/list
[routing]
embedder = "all-MiniLM-L6-v2" # all-MiniLM-L6-v2 | tfidf
index_type = "flat" # flat | hnsw
ranking_mode = "hybrid" # hybrid | semantic
cache_dir = ".tool-rot"
[logging]
session_log = ".tool-rot/session.jsonl"
verbose = false
log_query_preview = false # Avoid logging prompt text by default
[[server]]
name = "local-server"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-name"]
env = { API_KEY = "${API_KEY}" }
[[server]]
name = "remote-server"
url = "https://example.com/sse"
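The interaction between k, max_k, and auto_tune_k can be pictured with a small state holder. This is a hypothetical policy, not tool-rot's actual tuning logic: the class name `KTuner` and the step size of one per miss are assumptions. Only the ideas of increasing the effective K after misses and capping it at max_k come from the reference above.

```python
from dataclasses import dataclass


@dataclass
class KTuner:
    k: int = 3                # initial tools injected per turn
    max_k: int = 8            # upper bound for the auto-tuned value
    auto_tune_k: bool = False

    def __post_init__(self) -> None:
        # Start from the configured k; this is the value routing would use.
        self.effective_k = self.k

    def record(self, hit: bool) -> int:
        """Record a tool-call hit or miss and return the effective K."""
        if self.auto_tune_k and not hit:
            # Assumed policy: grow by one per miss, never past max_k.
            self.effective_k = min(self.effective_k + 1, self.max_k)
        return self.effective_k


tuner = KTuner(k=3, max_k=5, auto_tune_k=True)
for hit in (True, False, False, False):
    print(tuner.record(hit))
```

With auto_tune_k left at false, the effective value in this sketch never moves from k, which matches the default shown in the reference.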
Production Notes
- Use namespaced always_inject values such as filesystem.read_file.
- Keep tool descriptions short but specific; routing quality depends on schema quality.
- Start with k = 3, run bench, then adjust based on Recall@K and token savings.
- Enable auto_tune_k when correctness matters more than maximum token savings.
- Watch .tool-rot/session.jsonl or tool-rot status for routing misses.
- Upstream notifications/tools/list_changed events trigger a tool refresh and index rebuild.
- See docs/ROUTING_GUIDE.md for recommended k values by tool count.
Client Setup
Claude Code
tool-rot cc-snippet
Add the generated MCP config and hook config to .claude/settings.json. The hook sends the latest user query to the local context API, so routing has query context from the first turn.
Cursor
tool-rot cursor-snippet
Add the generated JSON to .cursor/mcp.json or your user-level Cursor MCP config. Cursor can use tool-rot as a normal MCP server over stdio or sse.
Development
python3.12 -m venv .venv
.venv/bin/python -m pip install -e ".[dev,sse]"
.venv/bin/python -m pytest tests -q
Focused checks:
.venv/bin/python -m pytest tests/test_prod_hardening.py tests/test_p0_p1.py -q
.venv/bin/python -m compileall tool_rot tests -q
Release steps are documented in docs/RELEASE_CHECKLIST.md.
Status
tool-rot is intended for local and team MCP workflows where large tool menus are creating measurable context overhead. It is designed to be conservative: route schemas aggressively, but keep execution passthrough intact.
Known limits:
- Query context is best when the host can send the latest user message before tools/list.
- Resource and prompt passthrough is aggregation-only; semantic routing is currently tools-only.
- HNSW is available as a config option, but you should benchmark it on your own tool corpus before using it for large deployments.
License
Apache-2.0