
Intelligent middleware for AI agent tool orchestration

ToolFusion

Stop wasting tokens. Start fusing results.

ToolFusion is async-first middleware that sits between your AI agent framework and your tools. It eliminates redundant tool calls, deduplicates overlapping results, resolves conflicting outputs across tools, and fuses multi-source data, with the goal of making agents faster, cheaper, and more reliable.

The Problem

Every production AI agent system hits the same wall:

Agent: "What's the weather in NYC?"
  → Tool A: calls OpenWeatherMap API          ← $0.001, 200ms
  → Tool B: calls WeatherAPI                  ← $0.001, 300ms
  → Tool A again (retry/loop): same API call  ← $0.001, 200ms   ← WASTED
  → Agent context now has 3 overlapping results with conflicting temperatures
  → LLM processes all 3 → extra tokens → higher cost → possible hallucination

| Problem | Impact | How ToolFusion Fixes It |
| --- | --- | --- |
| Duplicate tool calls: LLMs call the same tool repeatedly with identical or similar params | Wasted API calls, latency, cost | Exact + semantic caching, request coalescing |
| Redundant results in context: multiple tools return overlapping info | Token waste, context pollution | Two-stage deduplication (SimHash → semantic) |
| Conflicting tool outputs: two tools report different values for the same fact | LLM hallucinates or picks arbitrarily | Source-weighted conflict resolution |
| Thundering herds: cache expiry causes simultaneous re-execution storms | Backend overload, spikes | TTL jitter + single-flight coalescing |
| No visibility: developers can't see what tools are doing | Silent failures, impossible debugging | Structured telemetry + result envelopes |
| Framework lock-in: caching techniques are framework-specific | Can't reuse across LangChain, CrewAI, OpenAI, etc. | Framework-agnostic middleware with adapters |

Before vs After ToolFusion

# ❌ WITHOUT ToolFusion — every call hits the API, duplicates pile up
results = []
for query in ["python async", "python asynchronous", "async python"]:
    result = await web_search(query)   # 3 API calls for ~same query
    results.append(result)             # 3 overlapping results in context
# Agent processes all 3 → wasted tokens, possible conflicts

# ✅ WITH ToolFusion — duplicates caught, results fused
tf = ToolFusion(preset="balanced")

@tf.tool(cache_mode="semantic_ok", ttl=600)
async def web_search(query: str) -> dict:
    return await call_search_api(query)

r1 = await tf.call("web_search", {"query": "python async"})        # executes
r2 = await tf.call("web_search", {"query": "python async"})        # L1 cache hit (sub-ms)
r3 = await tf.call("web_search", {"query": "python asynchronous"}) # L2 semantic hit (~1ms)

fused = await tf.fuse([r1, r3])  # deduplicates + merges
# → 1 API call instead of 3, clean unified result, full provenance

Install

pip install toolfusion

Optional extras for production use:

# Redis cache backend + FAISS vector search
pip install "toolfusion[redis,faiss]"

# Full stack: Redis, FAISS, spaCy NER, sentence-transformers, OpenTelemetry, LLM fusion
pip install "toolfusion[all]"

Available extras

| Extra | Packages | Use Case |
| --- | --- | --- |
| redis | redis[hiredis] | Distributed cache backend + vector search |
| faiss | faiss-cpu | Fast similarity search for semantic cache |
| spacy | spacy | Entity extraction for fusion/conflict resolution |
| transformers | sentence-transformers, torch | Higher-accuracy embeddings (accurate preset) |
| otel | opentelemetry-api, opentelemetry-sdk | Distributed tracing and metrics |
| llm | openai | LLM-based fusion strategy |
| all | All of the above | Everything |
| dev | pytest, pytest-asyncio, ruff | Development/testing |

Quick Start

Async (recommended)

import asyncio
from toolfusion import ToolFusion

async def main():
    async with ToolFusion(preset="balanced") as tf:

        @tf.tool(cache_mode="semantic_ok", ttl=300, freshness="daily")
        async def search(query: str) -> dict:
            # Your actual tool implementation
            return {"results": [f"Result for: {query}"]}

        # First call — executes the tool
        r1 = await tf.call("search", {"query": "python async patterns"})
        print(r1.cache_info.source)  # "miss"

        # Identical call — instant L1 cache hit
        r2 = await tf.call("search", {"query": "python async patterns"})
        print(r2.cache_info.source)  # "l1_cache"

        # Similar call — semantic L2 cache hit
        r3 = await tf.call("search", {"query": "python asynchronous patterns"})
        print(r3.cache_info.source)  # "l2_cache"

asyncio.run(main())

Sync

from toolfusion import ToolFusion

with ToolFusion(preset="fast") as tf:

    @tf.tool(cache_mode="exact_only", ttl=60)
    def calculate(x: int, y: int) -> int:
        return x + y

    r = calculate(2, 3)
    print(r.result)  # 5

Framework Adapters

ToolFusion works with any agent framework:

# LangChain
from toolfusion.adapters import langchain_adapter
wrapped_tools = langchain_adapter.wrap(your_langchain_tools, preset="balanced")

# OpenAI Agents SDK
from toolfusion.adapters import openai_adapter
wrapped_fn = openai_adapter.wrap(your_tool_function, preset="balanced")

# CrewAI
from toolfusion.adapters import crewai_adapter
wrapped_tools = crewai_adapter.wrap(your_crewai_tools, preset="balanced")

# AutoGen
from toolfusion.adapters import autogen_adapter
wrapped_fn = autogen_adapter.wrap(your_autogen_function, preset="balanced")

# MCP (Model Context Protocol)
from toolfusion.adapters import mcp_adapter
wrapped_server = mcp_adapter.wrap(your_mcp_server, preset="balanced")

# Haystack
from toolfusion.adapters import haystack_adapter
wrapped_component = haystack_adapter.wrap(your_component, preset="balanced")

How It Works

Agent Tool Call
      │
      ▼
┌─────────────────────────────┐
│  Request Interceptor        │  canonical key + secret redaction
│  ┌───────────────────────┐  │
│  │  L1 Cache (Exact)     │  │  sub-ms hash lookup
│  └───────────┬───────────┘  │
│         miss │              │
│  ┌───────────────────────┐  │
│  │  Single-Flight Gate   │  │  coalesce concurrent duplicate calls
│  └───────────┬───────────┘  │
│  ┌───────────────────────┐  │
│  │  L2 Cache (Semantic)  │  │  embedding similarity search
│  └───────────┬───────────┘  │
│         miss │              │
│  ┌───────────────────────┐  │
│  │  Tool Execution       │  │  actual tool call + circuit breaker
│  └───────────┬───────────┘  │
│  ┌───────────────────────┐  │
│  │  VAAC Admission       │  │  should we cache this result?
│  └───────────┬───────────┘  │
│  ┌───────────────────────┐  │
│  │  Dedup + Fusion       │  │  remove overlaps, resolve conflicts
│  └───────────┬───────────┘  │
│              ▼              │
│  Result Envelope            │  structured output with provenance
└─────────────────────────────┘
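To make the "L2 Cache (Semantic)" step more concrete, here is a minimal sketch of a threshold-gated similarity lookup. It is not ToolFusion's implementation: the `SemanticCache` class, its `put`/`get` methods, and the toy 3-dimensional vectors are assumptions for illustration, and a real deployment would obtain vectors from an embedding model and use a proper vector index rather than a linear scan.

```python
import math
from typing import Any, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical L2 cache: linear scan over stored (vector, value) pairs."""

    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], Any]] = []

    def put(self, vector: list[float], value: Any) -> None:
        self._entries.append((vector, value))

    def get(self, vector: list[float]) -> Optional[Any]:
        """Return the best match at or above the threshold, else None."""
        best_score, best_value = 0.0, None
        for stored, value in self._entries:
            score = cosine(vector, stored)
            if score >= self.threshold and score > best_score:
                best_score, best_value = score, value
        return best_value


cache = SemanticCache(threshold=0.92)
cache.put([1.0, 0.0, 0.0], "cached result")
near = cache.get([0.98, 0.1, 0.0])   # small angle to the stored vector -> hit
far = cache.get([0.0, 1.0, 0.0])     # orthogonal to the stored vector -> miss
```

A higher threshold trades hit rate for safety: fewer queries count as "the same question", so fewer wrong answers are served from cache.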

Configuration

Presets

Choose a preset that controls the speed/accuracy tradeoff:

| Preset | Embedder | Semantic Threshold | Dedup | Fusion | Best For |
| --- | --- | --- | --- | --- | --- |
| fast | Model2Vec | 0.88 | SimHash only | Heuristic | High-throughput, cost-sensitive |
| balanced | Model2Vec | 0.92 | Hybrid (SimHash→semantic) | Heuristic | General production (default) |
| accurate | sentence-transformers | 0.95 | Hybrid, conservative | Heuristic + optional LLM | Medical, financial, research |
| exact_only | None | N/A | Exact hash only | Union | Maximum safety, latency-critical |

tf = ToolFusion(preset="balanced")  # or "fast", "accurate", "exact_only"

Per-Tool Policy

Every tool can override the global preset:

@tf.tool(
    cache_mode="semantic_ok",     # off | exact_only | semantic_ok | semantic_verify
    risk="medium",                # low | medium | high (high → no semantic cache)
    freshness="daily",            # static | daily | realtime | evented
    ttl=600,                      # cache TTL in seconds
    reliability_weight=0.8,       # 0.0–1.0, used in conflict resolution
    cacheable=True,               # False → never cache (for write/mutation tools)
    max_result_size=524288,       # max bytes to cache
    dedup_strategy="hybrid",      # exact | simhash | minhash | semantic | hybrid | none
    volatile_fields=["request_id", "timestamp"],  # stripped before cache key
    depends_on=["other_tool"],    # invalidation cascade
)
async def my_tool(query: str) -> dict:
    ...
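The `volatile_fields` option suggests that some parameters are dropped before the cache key is computed. A minimal sketch of that idea follows, using only the standard library. The `canonical_key` function is hypothetical and is not taken from ToolFusion's `key_builder.py`; it only handles top-level fields and assumes the parameters are JSON-serializable.

```python
import hashlib
import json
from typing import Any, Iterable


def canonical_key(tool: str, params: dict[str, Any], volatile_fields: Iterable[str] = ()) -> str:
    """Build a deterministic key from a tool name and its stable parameters."""
    drop = set(volatile_fields)
    stable = {k: v for k, v in params.items() if k not in drop}
    # sort_keys makes the JSON independent of dict insertion order.
    payload = json.dumps({"tool": tool, "params": stable}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


volatile = ["request_id", "timestamp"]
a = canonical_key("my_tool", {"query": "test", "request_id": "abc"}, volatile)
b = canonical_key("my_tool", {"request_id": "xyz", "query": "test"}, volatile)
c = canonical_key("my_tool", {"query": "other"}, volatile)
# a and b differ only in a volatile field and in key order, so they share a key;
# c has a different query, so it gets a different key.
```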

YAML Configuration

Generate a config file:

toolfusion init --preset balanced

This creates toolfusion.yaml with all settings. See USAGE.md for the full config reference.

tf = ToolFusion(config="toolfusion.yaml")

Result Envelope

Every call returns a ToolFusionResult — never a raw value:

result = await tf.call("my_tool", {"query": "test"})

result.result              # The actual tool output
result.cache_info.source   # "miss" | "l1_cache" | "l1_cache_stale" | "l2_cache"
result.cache_info.hit      # True/False
result.latency.total_ms    # End-to-end latency
result.latency.execute_ms  # Time spent in actual tool execution
result.sources             # Provenance: which tools contributed
result.conflicts           # Detected conflicts with resolution details
result.dedup_stats         # How many duplicates were removed
result.errors              # Per-tool errors with retryable flag
result.degraded            # True if result is partial due to failures
result.tokens_saved        # Estimated tokens saved by caching/dedup
result.metadata            # Your custom metadata

Multi-Tool Fusion

When multiple tools return data for the same query, fuse them:

r1 = await tf.call("weather_api_a", {"city": "NYC"})
r2 = await tf.call("weather_api_b", {"city": "NYC"})

fused = await tf.fuse([r1, r2])
print(fused.result)                          # Merged, deduplicated result
print(fused.conflicts)                       # Any conflicting values with resolution
print(fused.dedup_stats.duplicates_removed)  # Overlapping data removed
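One simple way to picture source-weighted conflict resolution is a weighted vote: each source's claim counts in proportion to its reliability weight, and the value with the highest total wins. The sketch below assumes that model. The `resolve` function and the third source `weather_api_c` are hypothetical, and ToolFusion's actual resolution logic may differ.

```python
from collections import defaultdict
from typing import Any


def resolve(claims: list[tuple[str, Any, float]]) -> tuple[Any, dict[Any, float]]:
    """Pick the value whose supporting sources have the highest total weight.

    Each claim is (source_name, value, reliability_weight). Values must be
    hashable, and the list must not be empty.
    """
    totals: dict[Any, float] = defaultdict(float)
    for _source, value, weight in claims:
        totals[value] += weight
    winner = max(totals, key=lambda v: totals[v])
    return winner, dict(totals)


claims = [
    ("weather_api_a", 21, 0.8),   # one reliable source says 21
    ("weather_api_b", 23, 0.5),   # two less reliable sources agree on 23
    ("weather_api_c", 23, 0.4),
]
value, totals = resolve(claims)
# 23 wins: its combined weight (0.5 + 0.4) exceeds the 0.8 behind 21.
```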

CLI

toolfusion doctor              # Check dependencies and runtime config
toolfusion init                # Generate toolfusion.yaml
toolfusion config --interactive  # Interactive config wizard
toolfusion stats               # Runtime statistics
toolfusion stats --live        # Live dashboard
toolfusion bench               # Run benchmark suite
toolfusion bench --compare     # Compare all presets
toolfusion cache inspect       # Inspect cache entries
toolfusion cache clear         # Clear all caches
toolfusion explain <key>       # Explain a specific cache entry

Key Features

Single-Flight Request Coalescing

When 10 concurrent calls request the same tool with the same parameters, only 1 actually executes. The other 9 wait and share the result:

# 10 concurrent identical calls → only 1 execution
results = await asyncio.gather(*[
    tf.call("slow_api", {"q": "test"}) for _ in range(10)
])
# All 10 get the same result, but the API was called only once

  • Uses asyncio.shield() to prevent follower cancellation from killing the leader
  • Configurable leader timeout prevents stuck requests
  • Background sweeper cleans up orphaned entries
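The coalescing behavior described above can be sketched with plain asyncio. This is an illustrative minimum, not ToolFusion's engine: the `SingleFlight` class and its `do` method are hypothetical names, and the leader timeout and background sweeper mentioned in the list are left out.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SingleFlight:
    """Coalesce concurrent calls that share a key into one execution."""

    def __init__(self) -> None:
        self._inflight: dict[str, asyncio.Future] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            # First caller becomes the leader and starts the real work.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # Forget the key once the work finishes, so later calls run again.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # shield() keeps the shared task running if one waiter is cancelled.
        return await asyncio.shield(task)


async def demo() -> tuple[int, list[str]]:
    executions = 0

    async def slow_api() -> str:
        nonlocal executions
        executions += 1
        await asyncio.sleep(0.01)
        return "result"

    flight = SingleFlight()
    results = await asyncio.gather(
        *[flight.do("slow_api:test", slow_api) for _ in range(10)]
    )
    return executions, list(results)


executions, results = asyncio.run(demo())
# executions is 1 and all 10 entries in results are "result"
```

Because there is no `await` between looking up the key and registering the task, two callers on the same event loop cannot both become leader.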

Two-Stage Deduplication

When multiple results overlap, ToolFusion removes redundancy in two stages:

  1. Stage 1 (Fast): SimHash/MinHash fingerprinting, where each fixed-size fingerprint comparison is O(1), to flag near-duplicate candidates
  2. Stage 2 (Precise): Embedding-based semantic similarity to confirm duplicates

From each cluster of duplicates, the most informative representative is kept (longest, most entities, most recent).
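For readers unfamiliar with stage 1, the following is a small, generic SimHash sketch over whitespace tokens. It illustrates the fingerprinting technique only; the tokenization, the 64-bit width, and the choice of BLAKE2b as the token hash are assumptions, not details confirmed for ToolFusion's `dedup.py`.

```python
import hashlib

BITS = 64


def simhash(text: str) -> int:
    """Compute a 64-bit SimHash fingerprint over lowercase whitespace tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


doc_a = "Python async patterns for production services"
doc_b = "python ASYNC patterns for production services"
doc_c = "services production for patterns async python"
# Case and token order do not affect this fingerprint, so all three match exactly.
same_case = hamming(simhash(doc_a), simhash(doc_b))
same_bag = hamming(simhash(doc_a), simhash(doc_c))
```

In practice two texts are treated as near-duplicate candidates when their Hamming distance falls under a small cutoff, and those candidates are then passed to the more expensive semantic check.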

VAAC Cache Admission

Not every result is worth caching. VAAC (Value-Aware Admission Control) uses a multi-armed bandit to decide:

  • High-latency tools → more valuable to cache
  • Stable results → more cacheable
  • Frequently called tools → benefit more from caching
  • Write/mutation tools → never cached (enforced)
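The README does not say which bandit algorithm or reward signal VAAC uses, so the sketch below swaps in the simplest option, an epsilon-greedy bandit with two arms ("admit" and "skip"). The `AdmissionBandit` class, the `should_cache` helper, and the reward values are all hypothetical and chosen only to show the shape of the decision loop.

```python
import random
from typing import Optional


class AdmissionBandit:
    """Epsilon-greedy bandit over two arms: 'admit' and 'skip'."""

    ARMS = ("admit", "skip")

    def __init__(self, epsilon: float = 0.1, seed: Optional[int] = None) -> None:
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        self._pulls = {arm: 0 for arm in self.ARMS}
        self._mean = {arm: 0.0 for arm in self.ARMS}

    def choose(self) -> str:
        # Try every arm once before trusting the estimates.
        for arm in self.ARMS:
            if self._pulls[arm] == 0:
                return arm
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self._rng.random() < self.epsilon:
            return self._rng.choice(self.ARMS)
        return max(self.ARMS, key=lambda arm: self._mean[arm])

    def update(self, arm: str, reward: float) -> None:
        self._pulls[arm] += 1
        # Incremental mean: m += (r - m) / n
        self._mean[arm] += (reward - self._mean[arm]) / self._pulls[arm]


def should_cache(bandit: AdmissionBandit, cacheable: bool) -> bool:
    """Write/mutation tools bypass the bandit and are never cached."""
    return cacheable and bandit.choose() == "admit"


bandit = AdmissionBandit(epsilon=0.0)  # pure greedy keeps the demo deterministic
# Assumed rewards: admitting saved 0.2 s on a later hit; skipping saved nothing.
for _ in range(2):
    arm = bandit.choose()
    bandit.update(arm, 0.2 if arm == "admit" else 0.0)
decision = bandit.choose()
```

A production policy would likely keep separate estimates per tool and fold in the factors listed above (latency, result stability, call frequency) as part of the reward.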

Stale-While-Revalidate

For eligible tools, expired cache entries are served immediately while a background refresh happens:

@tf.tool(freshness="daily", cache_mode="semantic_ok")
async def news_feed(topic: str) -> dict:
    ...
# After TTL expires: serves stale result instantly, refreshes in background
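
A stripped-down sketch of the stale-while-revalidate pattern is shown below. It is not ToolFusion's cache: the `SWRCache` class, the injectable `clock`, and the status labels "miss", "fresh", and "stale" are assumptions made so the example is deterministic and self-contained.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class SWRCache:
    """Serve expired entries immediately while refreshing them in the background."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._store: dict[str, tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._refreshing: dict[str, asyncio.Future] = {}

    async def get(self, key: str, fetch: Callable[[], Awaitable[Any]]) -> tuple[Any, str]:
        entry = self._store.get(key)
        if entry is None:
            value = await fetch()
            self._store[key] = (value, self._clock())
            return value, "miss"
        value, stored_at = entry
        if self._clock() - stored_at < self.ttl:
            return value, "fresh"
        # Expired: start at most one background refresh, return the old value now.
        if key not in self._refreshing:
            self._refreshing[key] = asyncio.ensure_future(self._refresh(key, fetch))
        return value, "stale"

    async def _refresh(self, key: str, fetch: Callable[[], Awaitable[Any]]) -> None:
        try:
            self._store[key] = (await fetch(), self._clock())
        finally:
            self._refreshing.pop(key, None)


async def demo() -> tuple[int, list[tuple[Any, str]]]:
    calls = 0
    now = [0.0]  # fake clock so the demo does not depend on real time

    async def news_feed() -> str:
        nonlocal calls
        calls += 1
        return f"v{calls}"

    cache = SWRCache(ttl=60.0, clock=lambda: now[0])
    seen = [await cache.get("news", news_feed)]      # first call executes the tool
    seen.append(await cache.get("news", news_feed))  # within TTL
    now[0] = 61.0                                    # TTL has expired
    seen.append(await cache.get("news", news_feed))  # old value, refresh scheduled
    await asyncio.sleep(0.01)                        # let the refresh finish
    seen.append(await cache.get("news", news_feed))  # refreshed value
    return calls, seen


calls, seen = asyncio.run(demo())
```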

Circuit Breaker

Tools that fail repeatedly are temporarily disabled:

  • After 5 consecutive failures → circuit opens (tool calls rejected immediately)
  • After 30s recovery timeout → circuit half-opens (one test call allowed)
  • On success → circuit closes (normal operation resumes)
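
The three states above map onto a small state machine. The sketch below uses the thresholds from the list (5 failures, 30 s) as defaults; the `CircuitBreaker` class, its method names, and the injectable `clock` are hypothetical and only illustrate the pattern, not ToolFusion's executor.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a timeout."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._probe_in_flight = False

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        state = self.state
        if state == "closed":
            return True
        if state == "half_open" and not self._probe_in_flight:
            self._probe_in_flight = True  # let exactly one test call through
            return True
        return False

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self._failures += 1
        self._probe_in_flight = False
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open and restart the timer


now = [0.0]  # fake clock keeps the walkthrough deterministic
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
state_after_failures = breaker.state      # "open"
rejected = breaker.allow()                # False while open
now[0] = 30.0
probe_allowed = breaker.allow()           # True: the single half-open test call
second_probe = breaker.allow()            # False: only one probe at a time
breaker.record_success()
state_after_success = breaker.state       # "closed"
```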

Project Layout

toolfusion/
├── core.py               # Main ToolFusion class
├── cli.py                # CLI commands (typer)
├── bench.py              # Benchmark suite
├── adapters/             # Framework adapters (LangChain, OpenAI, CrewAI, etc.)
├── cache/                # L1/L2 cache backends (memory, SQLite, Redis)
│   ├── backends/         # Cache storage implementations
│   └── vector/           # Vector index implementations (numpy, FAISS, Redis)
├── components/           # Core algorithms
│   ├── admission.py      # VAAC cache admission policy
│   ├── dedup.py          # Two-stage deduplication
│   ├── embedder.py       # Embedding providers (Model2Vec, sentence-transformers)
│   ├── fusion.py         # Cross-tool fusion + conflict resolution
│   └── key_builder.py    # Canonical cache key construction
├── config/               # Configuration system + presets
├── engine/               # Runtime, single-flight, telemetry
├── execution/            # Tool executor + circuit breaker
├── infra/                # Factories, utilities
├── orchestration/        # Pipeline orchestration logic
├── schema/               # Data models, protocols, errors
└── security/             # HMAC signing for cache integrity

Docs

| Document | Description |
| --- | --- |
| API Reference | Complete API documentation with all parameters |
| Usage Guide | Practical how-to guide with examples |
| Spec | Full technical specification |
| Changelog | Version history |
| Contributing | How to contribute |

Research

ToolFusion's design is informed by published research and engineering write-ups:

| Paper / Source | Key Finding | ToolFusion Application |
| --- | --- | --- |
| ToolCaching (arXiv:2601.15335) | VAAC: 11% higher hit ratio, 34% lower latency vs LRU/LFU | Cache admission engine |
| GPT Semantic Cache (arXiv:2411.05276) | 68.8% API call reduction, 97%+ accuracy | L2 semantic cache |
| Model2Vec (MinishLab) | 500x faster embeddings, ~30MB, numpy-only | Default embedder |
| Discord singleflight | 7.6x RPS improvement | Request coalescing |

License

MIT
