Intelligent middleware for AI agent tool orchestration
ToolFusion
Stop wasting tokens. Start fusing results.
ToolFusion is async-first middleware that sits between your AI agent framework and your tools. It eliminates redundant tool calls, deduplicates overlapping results, resolves conflicting outputs across tools, and fuses multi-source data — making every agent faster, cheaper, and more reliable.
The Problem
Every production AI agent system hits the same wall:
```text
Agent: "What's the weather in NYC?"
  → Tool A: calls OpenWeatherMap API          ← $0.001, 200ms
  → Tool B: calls WeatherAPI                  ← $0.001, 300ms
  → Tool A again (retry/loop): same API call  ← $0.001, 200ms  ← WASTED
  → Agent context now has 3 overlapping results with conflicting temperatures
  → LLM processes all 3 → extra tokens → higher cost → possible hallucination
```
| Problem | Impact | How ToolFusion Fixes It |
|---|---|---|
| Duplicate tool calls — LLMs call the same tool repeatedly with identical/similar params | Wasted API calls, latency, cost | Exact + semantic caching, request coalescing |
| Redundant results in context — multiple tools return overlapping info | Token waste, context pollution | Two-stage deduplication (SimHash → semantic) |
| Conflicting tool outputs — two tools report different values for the same fact | LLM hallucinates or picks arbitrarily | Source-weighted conflict resolution |
| Thundering herds — cache expiry causes simultaneous re-execution storms | Backend overload, spikes | TTL jitter + single-flight coalescing |
| No visibility — developers can't see what tools are doing | Silent failures, impossible debugging | Structured telemetry + result envelopes |
| Framework lock-in — caching techniques are framework-specific | Can't reuse across LangChain, CrewAI, OpenAI, etc. | Framework-agnostic middleware with adapters |
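TTL jitter, listed above as one half of the thundering-herd fix, randomizes each entry's expiry so that entries written at the same moment do not all expire at the same moment. A minimal sketch, assuming a uniform ±10% spread; `jittered_ttl` is a hypothetical helper, not part of ToolFusion's API:

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_ttl shifted by a uniform random offset of up to ±jitter_fraction."""
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    rng = rng or random.Random()
    spread = base_ttl * jitter_fraction
    # Entries cached together now expire at slightly different times
    return base_ttl + rng.uniform(-spread, spread)
```

With `base_ttl=600`, entries written in the same burst expire somewhere between roughly 540 s and 660 s instead of all at 600 s, so re-execution is spread out rather than arriving as one spike.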
Before vs After ToolFusion
```python
# ❌ WITHOUT ToolFusion: every call hits the API, duplicates pile up
results = []
for query in ["python async", "python asynchronous", "async python"]:
    result = await web_search(query)  # 3 API calls for ~the same query
    results.append(result)            # 3 overlapping results in context
# Agent processes all 3 → wasted tokens, possible conflicts
```

```python
# ✅ WITH ToolFusion: duplicates caught, results fused
# (top-level await shown for brevity; run inside an async function)
from toolfusion import ToolFusion

tf = ToolFusion(preset="balanced")

@tf.tool(cache_mode="semantic_ok", ttl=600)
async def web_search(query: str) -> dict:
    return await call_search_api(query)  # your existing search client

r1 = await tf.call("web_search", {"query": "python async"})         # executes
r2 = await tf.call("web_search", {"query": "python async"})         # L1 cache hit (sub-ms)
r3 = await tf.call("web_search", {"query": "python asynchronous"})  # L2 semantic hit (~1ms)
fused = await tf.fuse([r1, r3])  # deduplicates + merges
# → 1 API call instead of 3, one clean unified result, full provenance
```
Install
```bash
pip install toolfusion
```

Optional extras for production use:

```bash
# Redis cache backend + FAISS vector search
pip install "toolfusion[redis,faiss]"

# Full stack: Redis, FAISS, spaCy NER, sentence-transformers, OpenTelemetry, LLM fusion
pip install "toolfusion[all]"
```
Available extras
| Extra | Packages | Use Case |
|---|---|---|
| redis | redis[hiredis] | Distributed cache backend + vector search |
| faiss | faiss-cpu | Fast similarity search for semantic cache |
| spacy | spacy | Entity extraction for fusion/conflict resolution |
| transformers | sentence-transformers, torch | Higher-accuracy embeddings (accurate preset) |
| otel | opentelemetry-api, opentelemetry-sdk | Distributed tracing and metrics |
| llm | openai | LLM-based fusion strategy |
| all | All of the above | Everything |
| dev | pytest, pytest-asyncio, ruff | Development/testing |
Quick Start
Async (recommended)
```python
import asyncio
from toolfusion import ToolFusion

async def main():
    async with ToolFusion(preset="balanced") as tf:

        @tf.tool(cache_mode="semantic_ok", ttl=300, freshness="daily")
        async def search(query: str) -> dict:
            # Your actual tool implementation
            return {"results": [f"Result for: {query}"]}

        # First call: executes the tool
        r1 = await tf.call("search", {"query": "python async patterns"})
        print(r1.cache_info.source)  # "miss"

        # Identical call: instant L1 cache hit
        r2 = await tf.call("search", {"query": "python async patterns"})
        print(r2.cache_info.source)  # "l1_cache"

        # Similar call: semantic L2 cache hit
        r3 = await tf.call("search", {"query": "python asynchronous patterns"})
        print(r3.cache_info.source)  # "l2_cache"

asyncio.run(main())
```
Sync
```python
from toolfusion import ToolFusion

with ToolFusion(preset="fast") as tf:

    @tf.tool(cache_mode="exact_only", ttl=60)
    def calculate(x: int, y: int) -> int:
        return x + y

    r = calculate(2, 3)  # returns a ToolFusionResult, not the raw int
    print(r.result)      # 5
```
Framework Adapters
ToolFusion is framework-agnostic; adapters are provided for the following frameworks:

```python
# LangChain
from toolfusion.adapters import langchain_adapter
wrapped_tools = langchain_adapter.wrap(your_langchain_tools, preset="balanced")

# OpenAI Agents SDK
from toolfusion.adapters import openai_adapter
wrapped_fn = openai_adapter.wrap(your_tool_function, preset="balanced")

# CrewAI
from toolfusion.adapters import crewai_adapter
wrapped_tools = crewai_adapter.wrap(your_crewai_tools, preset="balanced")

# AutoGen
from toolfusion.adapters import autogen_adapter
wrapped_fn = autogen_adapter.wrap(your_autogen_function, preset="balanced")

# MCP (Model Context Protocol)
from toolfusion.adapters import mcp_adapter
wrapped_server = mcp_adapter.wrap(your_mcp_server, preset="balanced")

# Haystack
from toolfusion.adapters import haystack_adapter
wrapped_component = haystack_adapter.wrap(your_component, preset="balanced")
```
How It Works
```text
        Agent Tool Call
               │
               ▼
┌─────────────────────────────┐
│     Request Interceptor     │  canonical key + secret redaction
│  ┌───────────────────────┐  │
│  │   L1 Cache (Exact)    │  │  sub-ms hash lookup
│  └───────────┬───────────┘  │
│         miss │              │
│  ┌───────────┴───────────┐  │
│  │  Single-Flight Gate   │  │  coalesce concurrent duplicate calls
│  └───────────┬───────────┘  │
│  ┌───────────┴───────────┐  │
│  │  L2 Cache (Semantic)  │  │  embedding similarity search
│  └───────────┬───────────┘  │
│         miss │              │
│  ┌───────────┴───────────┐  │
│  │    Tool Execution     │  │  actual tool call + circuit breaker
│  └───────────┬───────────┘  │
│  ┌───────────┴───────────┐  │
│  │    VAAC Admission     │  │  should we cache this result?
│  └───────────┬───────────┘  │
│  ┌───────────┴───────────┐  │
│  │    Dedup + Fusion     │  │  remove overlaps, resolve conflicts
│  └───────────┬───────────┘  │
│              ▼              │
│       Result Envelope       │  structured output with provenance
└─────────────────────────────┘
```
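The first stage, the request interceptor, turns each call into a canonical cache key so that logically identical calls hit the same L1 entry. A minimal sketch of that idea, assuming JSON serialization with sorted keys and a SHA-256 digest; `canonical_key` is a hypothetical helper (ToolFusion's own logic lives in `components/key_builder.py` and also performs secret redaction, which this sketch omits). The `volatile_fields` argument mirrors the per-tool option of the same name:

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def canonical_key(tool_name: str, params: Dict[str, Any],
                  volatile_fields: Iterable[str] = ()) -> str:
    """Deterministic cache key: drop volatile fields, serialize canonically, hash."""
    dropped = set(volatile_fields)
    stable = {k: v for k, v in params.items() if k not in dropped}
    # sort_keys + fixed separators make equal dicts serialize to identical strings
    payload = json.dumps({"tool": tool_name, "params": stable},
                         sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because parameters are serialized with `sort_keys=True`, argument order never changes the key, and stripping volatile fields lets two calls that differ only in a `request_id` share one cache entry.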
Configuration
Presets
Choose a preset that controls the speed/accuracy tradeoff:
| Preset | Embedder | Semantic Threshold | Dedup | Fusion | Best For |
|---|---|---|---|---|---|
| fast | Model2Vec | 0.88 | SimHash only | Heuristic | High-throughput, cost-sensitive |
| balanced | Model2Vec | 0.92 | Hybrid (SimHash→semantic) | Heuristic | General production (default) |
| accurate | sentence-transformers | 0.95 | Hybrid, conservative | Heuristic + optional LLM | Medical, financial, research |
| exact_only | None | N/A | Exact hash only | Union | Maximum safety, latency-critical |

```python
tf = ToolFusion(preset="balanced")  # or "fast", "accurate", "exact_only"
```
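The semantic threshold decides how close a new query must be to a cached one before the L2 cache answers it. The sketch below assumes the threshold is a cosine-similarity cutoff and takes embeddings as plain lists of floats (in ToolFusion they would come from Model2Vec or sentence-transformers); `SemanticCache` is a hypothetical linear-scan class, not the library's implementation:

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Toy L2 cache: answer only when the closest stored query clears the threshold."""

    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[Sequence[float], Any]] = []

    def put(self, embedding: Sequence[float], value: Any) -> None:
        self._entries.append((embedding, value))

    def get(self, embedding: Sequence[float]) -> Optional[Any]:
        # Linear scan for clarity; a vector index (numpy, FAISS, Redis) replaces this at scale
        best_value, best_sim = None, -1.0
        for stored, value in self._entries:
            sim = cosine(stored, embedding)
            if sim > best_sim:
                best_value, best_sim = value, sim
        return best_value if best_sim >= self.threshold else None


cached = [1.0, 0.0]
query = [0.93, math.sqrt(1 - 0.93 ** 2)]  # cosine similarity 0.93 with `cached`
for threshold in (0.92, 0.95):
    cache = SemanticCache(threshold)
    cache.put(cached, "cached result")
    print(threshold, cache.get(query))
```

For the same pair of vectors, a cache at the balanced threshold (0.92) returns the cached result, while one at the accurate threshold (0.95) misses and the tool would execute. Raising the threshold trades hit rate for a lower risk of answering with a result for a different question.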
Per-Tool Policy
Every tool can override the global preset:
```python
@tf.tool(
    cache_mode="semantic_ok",       # off | exact_only | semantic_ok | semantic_verify
    risk="medium",                  # low | medium | high (high → no semantic cache)
    freshness="daily",              # static | daily | realtime | evented
    ttl=600,                        # cache TTL in seconds
    reliability_weight=0.8,         # 0.0–1.0, used in conflict resolution
    cacheable=True,                 # False → never cache (for write/mutation tools)
    max_result_size=524288,         # max bytes to cache
    dedup_strategy="hybrid",        # exact | simhash | minhash | semantic | hybrid | none
    volatile_fields=["request_id", "timestamp"],  # stripped before building the cache key
    depends_on=["other_tool"],      # invalidation cascade
)
async def my_tool(query: str) -> dict:
    ...
```
YAML Configuration
Generate a config file:
```bash
toolfusion init --preset balanced
```

This creates toolfusion.yaml with all settings. See USAGE.md for the full config reference.

```python
tf = ToolFusion(config="toolfusion.yaml")
```
Result Envelope
Every call returns a ToolFusionResult — never a raw value:
```python
result = await tf.call("my_tool", {"query": "test"})

result.result               # The actual tool output
result.cache_info.source    # "miss" | "l1_cache" | "l1_cache_stale" | "l2_cache"
result.cache_info.hit       # True/False
result.latency.total_ms     # End-to-end latency
result.latency.execute_ms   # Time spent in actual tool execution
result.sources              # Provenance: which tools contributed
result.conflicts            # Detected conflicts with resolution details
result.dedup_stats          # How many duplicates were removed
result.errors               # Per-tool errors with retryable flag
result.degraded             # True if result is partial due to failures
result.tokens_saved         # Estimated tokens saved by caching/dedup
result.metadata             # Your custom metadata
```
Multi-Tool Fusion
When multiple tools return data for the same query, fuse them:
```python
r1 = await tf.call("weather_api_a", {"city": "NYC"})
r2 = await tf.call("weather_api_b", {"city": "NYC"})
fused = await tf.fuse([r1, r2])

print(fused.result)                          # Merged, deduplicated result
print(fused.conflicts)                       # Any conflicting values with resolution
print(fused.dedup_stats.duplicates_removed)  # Overlapping data removed
```
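The README describes conflict resolution as source-weighted and says `reliability_weight` feeds into it, but it does not spell out the rule. One plausible rule is a weighted vote: for every field, sum the reliability weights of the sources reporting each value and keep the heaviest. A minimal sketch under that assumption; `resolve_conflicts` and the flat field dictionaries are hypothetical:

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# (source name, reliability weight, flat field -> value mapping); values must be hashable
SourceResult = Tuple[str, float, Dict[str, Any]]


def resolve_conflicts(
    results: List[SourceResult],
) -> Tuple[Dict[str, Any], Dict[str, Dict[Any, float]]]:
    """Weighted vote per field; returns (merged fields, conflicting fields with totals)."""
    votes: Dict[str, Dict[Any, float]] = defaultdict(lambda: defaultdict(float))
    for _source, weight, fields in results:
        for field, value in fields.items():
            votes[field][value] += weight
    # Keep the value with the largest summed weight (first seen wins ties)
    merged = {field: max(totals, key=totals.get) for field, totals in votes.items()}
    # A field is a conflict when sources reported more than one distinct value
    conflicts = {field: dict(totals) for field, totals in votes.items() if len(totals) > 1}
    return merged, conflicts


merged, conflicts = resolve_conflicts([
    ("weather_api_a", 0.9, {"city": "NYC", "temp_f": 71}),
    ("weather_api_b", 0.6, {"city": "NYC", "temp_f": 73}),
])
print(merged)     # {'city': 'NYC', 'temp_f': 71}
print(conflicts)  # {'temp_f': {71: 0.9, 73: 0.6}}
```

A single more reliable source wins a two-way disagreement, but two agreeing lower-weight sources can outvote it (0.6 + 0.5 > 0.9). Returning the per-value totals keeps the resolution explainable, in the spirit of `fused.conflicts`.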
CLI
```bash
toolfusion doctor                  # Check dependencies and runtime config
toolfusion init                    # Generate toolfusion.yaml
toolfusion config --interactive    # Interactive config wizard
toolfusion stats                   # Runtime statistics
toolfusion stats --live            # Live dashboard
toolfusion bench                   # Run benchmark suite
toolfusion bench --compare         # Compare all presets
toolfusion cache inspect           # Inspect cache entries
toolfusion cache clear             # Clear all caches
toolfusion explain <key>           # Explain a specific cache entry
```
Key Features
Single-Flight Request Coalescing
When 10 concurrent calls request the same tool with the same parameters, only 1 actually executes. The other 9 wait and share the result:
```python
# 10 concurrent identical calls → only 1 execution
results = await asyncio.gather(*[
    tf.call("slow_api", {"q": "test"}) for _ in range(10)
])
# All 10 get the same result, but the API was called only once
```
- Uses `asyncio.shield()` to prevent follower cancellation from killing the leader
- Configurable leader timeout prevents stuck requests
- Background sweeper cleans up orphaned entries
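The coalescing described above can be sketched in plain asyncio: the first caller for a key starts the work as a task, later callers await that same task, and every caller awaits it through `asyncio.shield()` so that cancelling one waiting caller does not cancel the shared work. This is a minimal sketch, not ToolFusion's implementation; `SingleFlight` is a hypothetical name, and the leader timeout and background sweeper are omitted (a done-callback removes finished entries instead):

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class SingleFlight:
    """Coalesce concurrent calls that share a key into a single execution."""

    def __init__(self) -> None:
        self._inflight: Dict[Hashable, "asyncio.Future[Any]"] = {}

    async def do(self, key: Hashable, fn: Callable[[], Awaitable[Any]]) -> Any:
        fut = self._inflight.get(key)
        if fut is None:
            # First caller is the leader: start the real work as a task
            fut = asyncio.ensure_future(fn())
            self._inflight[key] = fut

            def _forget(done: "asyncio.Future[Any]") -> None:
                # Drop the entry once finished, unless a newer call replaced it
                if self._inflight.get(key) is done:
                    del self._inflight[key]

            fut.add_done_callback(_forget)
        # shield(): a cancelled caller stops waiting without cancelling the shared task
        return await asyncio.shield(fut)


async def main() -> None:
    calls = 0

    async def slow_api() -> dict:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.1)  # simulated network latency
        return {"q": "test"}

    sf = SingleFlight()
    results = await asyncio.gather(*[sf.do(("slow_api", "test"), slow_api) for _ in range(10)])
    print(calls, len(results))  # 1 10


asyncio.run(main())
```

Exceptions propagate the same way as results: if the leader's call fails, every waiting caller receives that exception, and the entry is cleared so the next call retries.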
Two-Stage Deduplication
When multiple results overlap, ToolFusion removes redundancy in two stages:
- Stage 1 (Fast): SimHash/MinHash fingerprinting for O(1) near-duplicate detection
- Stage 2 (Precise): Embedding-based semantic similarity for confirmed duplicates
From each cluster of duplicates, the most informative representative is kept (longest, most entities, most recent).
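The two stages can be sketched with the standard library alone. This version makes several stand-in assumptions: SimHash is computed over lowercase words, stage 2 uses bag-of-words cosine similarity in place of real embeddings, the thresholds (12 differing bits, 0.8 similarity) are arbitrary, fingerprints are compared by linear scan rather than looked up in an index, and the representative is simply the longest text. `simhash`, `bow_cosine`, and `dedup` are hypothetical helpers, not ToolFusion functions:

```python
import hashlib
import math
from collections import Counter
from typing import List


def simhash(text: str, bits: int = 64) -> int:
    """SimHash over lowercase words; bits must be <= 64 (the token hash is 8 bytes)."""
    weights = [0] * bits
    for token, count in Counter(text.lower().split()).items():
        h = int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")
        for i in range(bits):
            weights[i] += count if (h >> i) & 1 else -count
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def bow_cosine(a: str, b: str) -> float:
    """Stand-in for embedding similarity: cosine over word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(n * cb[t] for t, n in ca.items())
    na = math.sqrt(sum(n * n for n in ca.values()))
    nb = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def dedup(texts: List[str], max_hamming: int = 12, min_cosine: float = 0.8) -> List[str]:
    """Keep one representative (here: the longest text) per cluster of near-duplicates."""
    kept: List[str] = []
    fingerprints: List[int] = []
    for text in sorted(texts, key=len, reverse=True):  # longest first becomes representative
        fp = simhash(text)
        duplicate = any(
            hamming(fp, kfp) <= max_hamming          # stage 1: cheap fingerprint check
            and bow_cosine(text, k) >= min_cosine    # stage 2: confirm the candidate
            for k, kfp in zip(kept, fingerprints)
        )
        if not duplicate:
            kept.append(text)
            fingerprints.append(fp)
    return kept
```

Stage 1 only nominates candidates; a pair is dropped only when stage 2 confirms it, so an accidental fingerprint collision between unrelated texts cannot delete a result.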
VAAC Cache Admission
Not every result is worth caching. VAAC (Value-Aware Admission Control) uses a multi-armed bandit to decide:
- High-latency tools → more valuable to cache
- Stable results → more cacheable
- Frequently called tools → benefit more from caching
- Write/mutation tools → never cached (enforced)
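VAAC learns its admission decisions with a bandit; the sketch below does not. It is a simpler, static value-scoring heuristic that only shows how the four signals listed above can combine into one admit/skip decision. `ToolStats`, `admission_score`, `should_admit`, and the threshold are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    avg_latency_ms: float   # slower tools save more time per cache hit
    calls_per_hour: float   # frequently called tools get more hits
    stability: float        # 0.0-1.0, share of repeat calls that returned the same result
    is_mutation: bool = False


def admission_score(stats: ToolStats) -> float:
    """Rough estimate of latency (ms) saved per hour by caching this tool's results."""
    if stats.is_mutation:
        return 0.0  # write/mutation tools are never cached
    return stats.avg_latency_ms * stats.calls_per_hour * stats.stability


def should_admit(stats: ToolStats, threshold_ms_per_hour: float = 1_000.0) -> bool:
    return not stats.is_mutation and admission_score(stats) >= threshold_ms_per_hour
```

A fixed formula like this cannot adapt when its guesses are wrong; a bandit-based policy such as VAAC instead adjusts its decisions from observed cache outcomes.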
Stale-While-Revalidate
For eligible tools, expired cache entries are served immediately while a background refresh happens:
```python
@tf.tool(freshness="daily", cache_mode="semantic_ok")
async def news_feed(topic: str) -> dict:
    ...

# After the TTL expires: serves the stale result instantly, refreshes in the background
```
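Stale-while-revalidate can be sketched as a cache that, once an entry's TTL has passed, returns the old value right away and schedules at most one background refresh per key. A minimal asyncio sketch; `SWRCache` is hypothetical, it ignores ToolFusion's freshness classes and eligibility rules, does not coalesce concurrent misses, and takes an injectable clock so expiry can be tested:

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Set, Tuple

Fetch = Callable[[], Awaitable[Any]]


class SWRCache:
    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}   # key -> (value, stored_at)
        self._refreshing: Set[str] = set()              # keys with a refresh in flight
        self._tasks: Set["asyncio.Task[None]"] = set()  # strong refs to background tasks

    async def get(self, key: str, fetch: Fetch) -> Tuple[Any, str]:
        """Return (value, status) where status is 'miss', 'fresh', or 'stale'."""
        entry = self._data.get(key)
        if entry is None:
            value = await fetch()
            self._data[key] = (value, self.clock())
            return value, "miss"
        value, stored_at = entry
        if self.clock() - stored_at < self.ttl:
            return value, "fresh"
        if key not in self._refreshing:
            # Expired: answer with the stale value now, refresh once in the background
            self._refreshing.add(key)
            task = asyncio.ensure_future(self._refresh(key, fetch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        return value, "stale"

    async def _refresh(self, key: str, fetch: Fetch) -> None:
        try:
            self._data[key] = (await fetch(), self.clock())
        except Exception:
            pass  # keep serving the stale value; the next expired read retries
        finally:
            self._refreshing.discard(key)
```

The caller pays the tool's latency only on the very first miss; after that, an expired read costs no more than a fresh one, and the refreshed value appears on a later read.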
Circuit Breaker
Tools that fail repeatedly are temporarily disabled:
- After 5 consecutive failures → circuit opens (tool calls rejected immediately)
- After 30s recovery timeout → circuit half-opens (one test call allowed)
- On success → circuit closes (normal operation resumes)
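The three rules above form a small state machine. A minimal synchronous sketch using the stated defaults (5 consecutive failures, 30 s recovery timeout); `CircuitBreaker` is hypothetical, and re-opening the circuit when the half-open probe fails is an assumption the README does not state:

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"  # let exactly one probe call through
                return True
            return False  # reject immediately while open
        if self.state == "half_open":
            return False  # a probe is already in flight
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self.failures = 0  # resetting here makes the threshold count consecutive failures

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._open()  # assumption: a failed probe re-opens and restarts the timer
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
```

A caller checks `allow_request()` before executing the tool and reports the outcome with `record_success()` or `record_failure()`.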
Project Layout
```text
toolfusion/
├── core.py               # Main ToolFusion class
├── cli.py                # CLI commands (typer)
├── bench.py              # Benchmark suite
├── adapters/             # Framework adapters (LangChain, OpenAI, CrewAI, etc.)
├── cache/                # L1/L2 cache backends (memory, SQLite, Redis)
│   ├── backends/         # Cache storage implementations
│   └── vector/           # Vector index implementations (numpy, FAISS, Redis)
├── components/           # Core algorithms
│   ├── admission.py      # VAAC cache admission policy
│   ├── dedup.py          # Two-stage deduplication
│   ├── embedder.py       # Embedding providers (Model2Vec, sentence-transformers)
│   ├── fusion.py         # Cross-tool fusion + conflict resolution
│   └── key_builder.py    # Canonical cache key construction
├── config/               # Configuration system + presets
├── engine/               # Runtime, single-flight, telemetry
├── execution/            # Tool executor + circuit breaker
├── infra/                # Factories, utilities
├── orchestration/        # Pipeline orchestration logic
├── schema/               # Data models, protocols, errors
└── security/             # HMAC signing for cache integrity
```
Docs
| Document | Description |
|---|---|
| API Reference | Complete API documentation with all parameters |
| Usage Guide | Practical how-to guide with examples |
| Spec | Full technical specification |
| Changelog | Version history |
| Contributing | How to contribute |
Research
ToolFusion's design is informed by peer-reviewed research:
| Paper / Source | Key Finding | ToolFusion Application |
|---|---|---|
| ToolCaching (arXiv:2601.15335) | VAAC: 11% higher hit ratio, 34% lower latency vs LRU/LFU | Cache admission engine |
| GPT Semantic Cache (arXiv:2411.05276) | 68.8% API call reduction, 97%+ accuracy | L2 semantic cache |
| Model2Vec (MinishLab) | 500x faster embeddings, ~30MB, numpy-only | Default embedder |
| Discord singleflight | 7.6x RPS improvement | Request coalescing |
License
Download files
File details
Details for the file toolfusion-0.1.0.tar.gz.
File metadata
- Download URL: toolfusion-0.1.0.tar.gz
- Upload date:
- Size: 86.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba5c98ddb86b4fe7165b49c9638ef077b250ca292bc1408ec4d974f0267446e6 |
| MD5 | b14f15ef0d2c220c09494d2164f5ad98 |
| BLAKE2b-256 | 4300a6ebd2f2908cd9f7ea85c052b3900c20103b2b3dc8ae895e80f76152d5e1 |
File details
Details for the file toolfusion-0.1.0-py3-none-any.whl.
File metadata
- Download URL: toolfusion-0.1.0-py3-none-any.whl
- Upload date:
- Size: 81.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9f3acb9b7aaeae9e2fe8937112f094a99a63dc6df3a6c43a242e926e5f30424 |
| MD5 | 6871586c393277fd0e0bc5a9df435e16 |
| BLAKE2b-256 | b4617d8fb04d2973f29e04053703d7afbf59e567ab1800a9de7c2fb39e3e19b8 |