Efficient MCP server aggregation with HTTP/SSE support and 98% schema token reduction

ToolMux v2.2

Efficient MCP server aggregation with FastMCP 3.x foundation

ToolMux proxies multiple MCP (Model Context Protocol) servers through a single interface, reducing token overhead while maintaining full tool access. It supports three operating modes optimized for different use cases.

Features

  • FastMCP 3.x Foundation — Proper MCP protocol compliance via FastMCP framework
  • Three Operating Modes — Meta (80%+ savings), Gateway (60%+ savings), Proxy (native fastmcp)
  • Native Proxy Mode — Uses fastmcp 3.0's create_proxy() for true transparent proxying with session isolation and MCP feature forwarding
  • CondenseTransform — Token optimization via fastmcp's Transform system: condensed descriptions/schemas in tools/list, full details on demand via helper tools
  • Smart Description Condensation — First-sentence extraction with filler phrase removal
  • Schema Condensation — Strips verbose extras, keeps names/types/required
  • Progressive Disclosure — Full descriptions via list_all_tools() and get_tool_schema(), condensed in tools/list
  • Self-Healing Bundle Resolution — Auto-resolves broken server configs from mcp-registry, user bundles, XDG, Claude Desktop, and Cursor bundles
  • Parallel Backend Init — Thread pool (10 workers, 30s timeout) for fast startup
  • MCP Instructions — All modes embed instructions in the MCP initialize response telling the LLM to call list_all_tools() first
  • LLM-Powered Description Optimization — optimize_descriptions tool lets the connected LLM generate high-quality tool descriptions, replacing algorithmic condensation
  • Tool Collision Resolution — Automatic server-name prefixing for duplicate tool names

Installation

# Via PyPI
pip install toolmux

# Via uvx (recommended, no install needed)
uvx toolmux

# From source
git clone https://github.com/subnetangel/ToolMux.git
cd ToolMux
pip install -e .

# Verify
toolmux --version

Quick Start

1. Configure backend servers

Create ~/shared/toolmux/mcp.json (or ~/toolmux/mcp.json):

{
  "mode": "gateway",
  "servers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user"]
    },
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git", "--repository", "/path/to/repo"]
    }
  }
}

2. Run ToolMux

# Default gateway mode
toolmux

# Specific mode
toolmux --mode meta
toolmux --mode proxy

# Custom config
toolmux --config /path/to/mcp.json

3. Use with any MCP client

Add to your MCP client configuration (e.g., Claude Desktop, Cursor, Kiro, VS Code):

{
  "mcpServers": {
    "toolmux": {
      "command": "toolmux",
      "args": ["--mode", "gateway"]
    }
  }
}

Operating Modes

ToolMux offers three modes that trade off between token savings and tool transparency. All modes share a common set of helper tools (list_all_tools, get_tool_schema, get_tool_count, manage_servers, optimize_descriptions) and embed MCP instructions telling the LLM to call list_all_tools() first.

Mode Comparison

|  | Gateway (default) | Meta | Proxy |
|---|---|---|---|
| Token savings | ~60-85% | ~80-93% | ~69% |
| tools/list size | 1 tool per server + helpers | 5 meta-tools | All backend tools (condensed) |
| Tool invocation | server(tool="name", arguments={...}) | invoke(name="name", args={...}) | tool_name(param="value") |
| Backend init | BackendManager (parallel threads) | BackendManager (parallel threads) | fastmcp create_proxy() |
| Session handling | Shared subprocess per server | Shared subprocess per server | Persistent sessions, reused across calls (fastmcp 3.1.1+) |
| MCP feature forwarding | No (stdio relay) | No (stdio relay) | Yes (sampling, elicitation, logging, progress) |
| Best for | Balanced savings + usability | Maximum savings, many servers | Full MCP compliance, advanced features |

Gateway Mode (Default) — ~60-85% Token Savings

Collapses each backend server into a single tool. The LLM sees one tool per server (e.g., filesystem, git) instead of dozens of individual tools. Each server-tool's description lists all its sub-tools with their purpose and required parameters.

How it works:

  1. On startup, BackendManager initializes all backends in parallel (10-worker thread pool, 30s timeout). If a build cache exists (.toolmux_cache.json), tools are loaded instantly from cache and backends init in the background.
  2. Tools are grouped by server. For each server, a single FastMCP tool is registered with a rich description listing all sub-tools (e.g., "Tools: read_file (Read complete file contents; required: path), write_file (...), ...").
  3. The LLM calls list_all_tools() first to discover all available tools with full descriptions.
  4. To invoke a sub-tool, the LLM calls the server-tool with tool= and arguments= parameters: filesystem(tool="read_file", arguments={"path": "/tmp/example.txt"}).
  5. On first invocation of each sub-tool, the response is enriched with the full description and parameter schema (progressive disclosure). On errors, the full schema is always appended.

tools/list returns:
  - filesystem (server-tool): "Tools: read_file, write_file, ..."
  - git (server-tool): "Tools: git_status, git_log, ..."
  - list_all_tools (native): MUST call first — full descriptions grouped by server
  - get_tool_schema (native): Get full parameter details for any tool
  - get_tool_count (native): Get tool count statistics by server
  - manage_servers (native): Add, remove, validate, test backend servers
  - optimize_descriptions (native): LLM-powered description optimization

Calling pattern:
  list_all_tools()  # discover all tools with full descriptions
  filesystem(tool="read_file", arguments={"path": "/tmp/example.txt"})

Token savings mechanism: Instead of exposing N tools with full descriptions and schemas in tools/list, gateway exposes ~S server-tools (where S << N) plus helper tools. Descriptions are condensed to first-sentence + required params. Full details are disclosed progressively on first use.
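
The grouping in step 2 can be sketched as follows. This is a hypothetical illustration of collapsing a server's tools into one rich description, not ToolMux's actual internals; the function and field names are assumptions.

```python
def build_server_description(tools):
    """Collapse a list of backend tool specs into one gateway-style
    server-tool description (illustrative sketch)."""
    parts = []
    for t in tools:
        # First sentence of the description, without the trailing period
        detail = t.get("description", "").rstrip(".").split(". ")[0]
        required = t.get("inputSchema", {}).get("required", [])
        if required:
            detail += "; required: " + ", ".join(required)
        parts.append(f"{t['name']} ({detail})")
    return "Tools: " + ", ".join(parts)

tools = [
    {"name": "read_file",
     "description": "Read complete file contents. Uses UTF-8 by default.",
     "inputSchema": {"required": ["path"]}},
    {"name": "write_file",
     "description": "Write data to a file.",
     "inputSchema": {"required": ["path", "content"]}},
]
# -> "Tools: read_file (Read complete file contents; required: path), ..."
print(build_server_description(tools))
```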

Meta Mode — ~80-93% Token Savings

Exposes only 5 generic meta-tools regardless of how many backend tools exist. The LLM discovers tools via list_all_tools() / catalog_tools(), inspects schemas via get_tool_schema(), and executes via invoke().

How it works:

  1. Same BackendManager parallel init as gateway mode.
  2. Instead of registering per-server or per-tool entries, 5 fixed tools are registered: list_all_tools, catalog_tools, get_tool_schema, invoke, get_tool_count.
  3. catalog_tools() returns a JSON array of all backend tools with name, server, condensed description, and parameter names.
  4. get_tool_schema(name="tool_name") returns the full description and inputSchema for a specific tool.
  5. invoke(name="tool_name", args={...}) routes the call to the correct backend server. Results are enriched with full docstrings on first invocation.

tools/list returns:
  - list_all_tools: MUST call first — full descriptions grouped by server
  - catalog_tools: List all backend tools with name, server, description
  - get_tool_schema: Get full schema for a tool
  - invoke: Execute a backend tool
  - get_tool_count: Tool count by server
  - manage_servers: Add, remove, validate, test backend servers
  - optimize_descriptions: LLM-powered description optimization

Workflow: list_all_tools() → get_tool_schema("tool") → invoke("tool", args)

Token savings mechanism: tools/list always returns exactly 7 tools (5 meta + 2 management) regardless of backend count. A setup with 200 backend tools still only shows 7 in tools/list. The tradeoff is an extra round-trip: the LLM must call get_tool_schema() before invoke() to know the parameters.
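
The routing behind invoke() can be sketched with a small registry that maps each backend tool name to its owning server. Class and method names here are hypothetical; ToolMux's BackendManager differs in detail.

```python
class MetaRouter:
    """Minimal sketch of meta-mode dispatch (not ToolMux's internals)."""
    def __init__(self):
        self._registry = {}  # tool name -> (server name, callable)

    def register(self, server, name, fn):
        self._registry[name] = (server, fn)

    def catalog_tools(self):
        # What catalog_tools() might return: name + owning server
        return [{"name": n, "server": s} for n, (s, _) in self._registry.items()]

    def invoke(self, name, args):
        # invoke() routes the call to the correct backend
        if name not in self._registry:
            raise KeyError(f"Unknown tool: {name}")
        server, fn = self._registry[name]
        return fn(**args)

router = MetaRouter()
router.register("filesystem", "read_file", lambda path: f"<contents of {path}>")
router.register("git", "git_status", lambda: "clean")
print(router.invoke("read_file", {"path": "/tmp/example.txt"}))
# -> <contents of /tmp/example.txt>
```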

Proxy Mode — Native FastMCP Proxy with Token Optimization

Uses fastmcp 3.0's native create_proxy() for true transparent proxying. All backend tools are exposed directly — the LLM calls them by name just like normal MCP tools. Token optimization is applied via CondenseTransform, which condenses descriptions and schemas in tools/list while helper tools return full uncondensed details.

How it works:

  1. Server configs are converted to standard mcpServers format and passed to create_proxy(), which creates a FastMCP proxy with MCPConfigTransport for each backend.
  2. CondenseTransform (a fastmcp Transform subclass) is applied to the proxy. It intercepts tools/list responses and replaces each tool's description with a condensed version (first sentence, filler removed, max 80 chars) and each schema with a minimal version (property names + types + required only).
  3. Helper tools (list_all_tools, get_tool_schema, get_tool_count) are registered directly on the proxy. They query the proxy's internal tool list before the transform is applied, so they return full uncondensed descriptions and schemas.
  4. For multi-server setups, tools are prefixed as {server}_{tool} (e.g., filesystem_read_file). Single-server setups leave tools unprefixed.
  5. Sessions are persistent and reused across tool calls (fastmcp 3.1.1+).

tools/list returns all backend tools with CONDENSED descriptions/schemas.
  - Single server: tools unprefixed (echo_tool)
  - Multi server: tools prefixed as {server}_{tool}

Helper tools (return FULL uncondensed info):
  - list_all_tools(): MUST call first — full descriptions grouped by server
  - get_tool_schema(name): full description + full inputSchema
  - get_tool_count(): tool counts by server
  - manage_servers: Add, remove, validate, test backend servers

Call directly: echo_tool(message="hello")

Proxy mode features:

  • True transparent proxying via fastmcp's MCPConfigTransport
  • Session isolation per request
  • Automatic MCP feature forwarding (sampling, elicitation, logging, progress)
  • CondenseTransform for ~69% token reduction in tools/list
  • Progressive disclosure: condensed by default, full on demand via helper tools

Token savings mechanism: All tools appear in tools/list (unlike gateway/meta), but descriptions are condensed from paragraphs to single sentences and schemas are stripped to names/types/required. The LLM calls list_all_tools() once to get full descriptions, then calls tools directly.
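
The naming rule from step 4 (unprefixed for a single server, {server}_{tool} for several) can be sketched as:

```python
def proxy_tool_names(servers):
    """servers: dict of server name -> list of tool names.
    Single-server setups stay unprefixed; multi-server setups are
    prefixed as {server}_{tool} (illustrative sketch)."""
    if len(servers) == 1:
        (tools,) = servers.values()
        return list(tools)
    return [f"{server}_{tool}"
            for server, tools in servers.items()
            for tool in tools]

assert proxy_tool_names({"echo": ["echo_tool"]}) == ["echo_tool"]
print(proxy_tool_names({"filesystem": ["read_file", "write_file"],
                        "web": ["fetch"]}))
# -> ['filesystem_read_file', 'filesystem_write_file', 'web_fetch']
```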

Shared Features

Progressive Disclosure

All modes use progressive disclosure to minimize tokens while keeping full information accessible:

  1. tools/list — Condensed descriptions and schemas (what the LLM sees on connect)
  2. list_all_tools() — Full descriptions grouped by server (LLM calls this first)
  3. get_tool_schema(name) — Full description + complete inputSchema for a specific tool
  4. First-use enrichment (gateway/meta only) — On the first invocation of each tool, the response includes the full description and parameter schema appended to the result
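
Step 4 can be sketched as a wrapper that remembers which tools have been called: the first response carries the full description and schema, later responses stay lean. The names here are illustrative, not ToolMux's internals.

```python
class FirstUseEnricher:
    """Sketch of first-use enrichment (gateway/meta modes)."""
    def __init__(self, catalog):
        self.catalog = catalog  # tool name -> {"description", "schema"}
        self.seen = set()

    def enrich(self, name, result):
        if name in self.seen:
            return result  # lean response after first use
        self.seen.add(name)
        info = self.catalog[name]
        return (f"{result}\n---\n{info['description']}\n"
                f"Schema: {info['schema']}")

enricher = FirstUseEnricher({
    "read_file": {"description": "Read complete file contents.",
                  "schema": {"required": ["path"]}},
})
first = enricher.enrich("read_file", "file data")   # enriched
again = enricher.enrich("read_file", "file data")   # lean
print(len(first) > len(again))
# -> True
```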

Description Condensation

The condense_description() function:

  1. Normalizes whitespace (collapses newlines and multiple spaces)
  2. Removes filler phrases ("Use this tool to", "This tool allows you to", etc.)
  3. Capitalizes the first letter after filler removal
  4. Extracts the first sentence (up to ., !, or ?)
  5. Trims to 80 characters without cutting mid-word
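
The five steps above can be sketched as a small function. This is a minimal re-implementation for illustration; the real condense_description() may use a longer filler list and handle more edge cases.

```python
import re

FILLERS = ("Use this tool to ", "This tool allows you to ")

def condense_description(text, limit=80):
    """Sketch of ToolMux-style description condensation."""
    # 1. Normalize whitespace (collapse newlines and runs of spaces)
    text = " ".join(text.split())
    # 2. Remove a leading filler phrase
    for filler in FILLERS:
        if text.startswith(filler):
            text = text[len(filler):]
            break
    # 3. Capitalize the first letter
    text = text[:1].upper() + text[1:]
    # 4. Keep only the first sentence
    text = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]
    # 5. Trim to the limit without cutting mid-word
    if len(text) > limit:
        text = text[:limit].rsplit(" ", 1)[0]
    return text

print(condense_description(
    "Use this tool to  read the complete contents of a file.\n"
    "Supports text and binary modes."))
# -> Read the complete contents of a file.
```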

Schema Condensation

The condense_schema() function strips schemas down to:

  • Property names and types
  • Array item types
  • Required field list

Removed: descriptions, defaults, examples, enums, pattern constraints, nested object details.
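
A sketch of that stripping, again as an illustrative re-implementation rather than ToolMux's exact code:

```python
def condense_schema(schema):
    """Keep property names/types, array item types, and the required list;
    drop descriptions, defaults, enums, and nested detail (sketch)."""
    out = {"type": "object",
           "properties": {},
           "required": schema.get("required", [])}
    for name, prop in schema.get("properties", {}).items():
        slim = {"type": prop.get("type", "any")}
        if prop.get("type") == "array" and "items" in prop:
            slim["items"] = {"type": prop["items"].get("type", "any")}
        out["properties"][name] = slim
    return out

full = {
    "type": "object",
    "properties": {
        "path": {"type": "string", "description": "File path", "default": "."},
        "lines": {"type": "array",
                  "items": {"type": "integer", "description": "Line numbers"}},
    },
    "required": ["path"],
}
print(condense_schema(full))
```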

LLM-Powered Description Optimization

The optimize_descriptions tool lets the connected LLM generate higher-quality descriptions than the algorithmic condensation:

  1. optimize_descriptions(action="generate") — Returns all tools with full descriptions
  2. The LLM writes concise (<60 char) descriptions for each tool
  3. optimize_descriptions(action="save", server="name", descriptions={...}) — Saves to cache
  4. Restart ToolMux to use the optimized descriptions

Use optimize_descriptions(action="status") to check if descriptions have been optimized.

Build Cache

ToolMux caches tool descriptions in .toolmux_cache.json next to the config file. The cache is validated against a SHA-256 hash of mcp.json — any config change invalidates it.

  • Cache hit: Tools load instantly from cache. Backends init in the background for actual tool calls.
  • Cache miss: Server names are registered as placeholders immediately (so mcp.run() starts without delay). Backends init in the background. A cache is auto-generated once backends finish.
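
The validation logic can be sketched as hashing the config bytes and storing the digest alongside the cached tools; any mismatch is a cache miss. The real .toolmux_cache.json layout may differ from this sketch.

```python
import hashlib, json, os, tempfile

def config_hash(path):
    """SHA-256 digest of the config file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def cache_valid(cache, config_path):
    return cache.get("config_hash") == config_hash(config_path)

with tempfile.TemporaryDirectory() as d:
    cfg = os.path.join(d, "mcp.json")
    with open(cfg, "w") as f:
        json.dump({"mode": "gateway", "servers": {}}, f)
    cache = {"config_hash": config_hash(cfg), "tools": []}
    assert cache_valid(cache, cfg)        # cache hit: load tools instantly
    with open(cfg, "w") as f:             # any config change...
        json.dump({"mode": "meta", "servers": {}}, f)
    assert not cache_valid(cache, cfg)    # ...invalidates the cache
    print("cache validation ok")
```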

Server Management

The manage_servers tool provides runtime server management:

  • manage_servers(action="list") — List all configured servers
  • manage_servers(action="add", name="my-mcp", command="cmd") — Add a server (auto-resolves from bundles if no command given)
  • manage_servers(action="remove", name="my-mcp") — Remove a server
  • manage_servers(action="validate") — Check all server commands exist on PATH
  • manage_servers(action="test", name="my-mcp") — Start server and verify it returns tools

Self-Healing Bundle Resolution

When a configured server command fails or returns 0 tools, ToolMux automatically searches for the correct launch config in these locations (in order):

  1. mcp-registry bundles (~/.config/smithy-mcp/bundles/)
  2. User bundles (~/.aim/bundles/)
  3. XDG mcp config (~/.config/mcp/mcp.json)
  4. Claude Desktop (~/.claude/claude_desktop_config.json)
  5. Cursor (~/.cursor/mcp.json)

If a fix is found, it's persisted back to mcp.json so it only happens once.
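
The search loop can be sketched as walking candidate config files in order and returning the first launch config found for the broken server. Only the plain JSON sources from the list above appear here; the bundle-directory layouts are omitted, and the function name is an assumption.

```python
import json, os, tempfile

DEFAULT_CANDIDATES = [
    "~/.config/mcp/mcp.json",                # XDG mcp config
    "~/.claude/claude_desktop_config.json",  # Claude Desktop
    "~/.cursor/mcp.json",                    # Cursor
]

def resolve_server(name, candidates=None):
    """Return the first launch config found for `name`, else None (sketch)."""
    for path in candidates or DEFAULT_CANDIDATES:
        path = os.path.expanduser(path)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError):
            continue
        # Both "mcpServers" (client configs) and "servers" (ToolMux) shapes
        servers = data.get("mcpServers") or data.get("servers") or {}
        if name in servers:
            return servers[name]  # caller persists this back to mcp.json
    return None

# Demo against a temporary stand-in for one of the sources above.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mcp.json")
    with open(path, "w") as f:
        json.dump({"mcpServers": {"git": {"command": "uvx",
                                          "args": ["mcp-server-git"]}}}, f)
    print(resolve_server("git", [path]))
    # -> {'command': 'uvx', 'args': ['mcp-server-git']}
```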

CLI Reference

toolmux [OPTIONS]

Options:
  --mode {gateway,meta,proxy}  Operating mode (default: gateway)
  --config PATH                Path to mcp.json config file
  --version                    Print version and exit
  --list-servers               List configured servers and exit
  --build-cache                Generate LLM description cache and exit
  --manage [list|add|remove|validate|test]  Manage backend servers

Configuration

Config File Discovery Order

  1. --config flag (explicit path)
  2. ./mcp.json (project-local)
  3. ~/shared/toolmux/mcp.json (shared environments — persists across sessions)
  4. ~/toolmux/mcp.json (local installs)
  5. First-run setup creates ~/shared/toolmux/mcp.json

Config Format

{
  "mode": "gateway",
  "cache_model": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
  "servers": {
    "server-name": {
      "command": "npx",
      "args": ["-y", "package-name"],
      "env": {"KEY": "value"},
      "cwd": "/optional/working/dir",
      "description": "Optional human description"
    },
    "http-server": {
      "transport": "http",
      "base_url": "https://api.example.com/mcp",
      "headers": {"Authorization": "Bearer token"},
      "timeout": 30
    }
  }
}
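
The two entry shapes above (stdio with command/args, HTTP with transport: "http" and base_url) can be partitioned with a small helper. The helper name is illustrative, not part of ToolMux's API.

```python
import json

def split_servers(config):
    """Partition servers into stdio and HTTP entries per the format above."""
    stdio, http = {}, {}
    for name, spec in config.get("servers", {}).items():
        (http if spec.get("transport") == "http" else stdio)[name] = spec
    return stdio, http

config = json.loads("""
{
  "mode": "gateway",
  "servers": {
    "git": {"command": "uvx", "args": ["mcp-server-git"]},
    "api": {"transport": "http", "base_url": "https://api.example.com/mcp"}
  }
}
""")
stdio, http = split_servers(config)
print(sorted(stdio), sorted(http))
# -> ['git'] ['api']
```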

Architecture

MCP Client (Agent/IDE)
    ↕ stdio JSON-RPC
FastMCP Server (ToolMux)
    ├── Mode Router → meta | gateway | proxy
    │
    ├── Gateway/Meta Mode
    │   ├── BackendManager (parallel init, tool routing)
    │   ├── Pure Functions (condense, enrich, collisions)
    │   ├── Build Cache (SHA-256 validated, auto-generated)
    │   ├── Self-Healing Bundle Resolution
    │   └── manage_servers + optimize_descriptions
    │
    └── Proxy Mode (fastmcp native)
        ├── create_proxy(mcpServers config)
        ├── CondenseTransform (token optimization)
        ├── Helper tools (list_all_tools, get_tool_schema, get_tool_count)
        ├── manage_servers
        └── Session isolation + MCP feature forwarding

Development

# Install in development mode
pip install -e ".[dev]"

# Run tests
python3 -m pytest tests/ -v

# Run with benchmark output
python3 -m pytest tests/test_token_optimization.py -v -s

Test Suite

| File | Tests | Coverage |
|---|---|---|
| test_pure_functions.py | 24 | Property-based (hypothesis) + unit tests for all pure functions |
| test_list_all_tools.py | 20 | list_all_tools across all modes, server filtering, cache integration |
| test_bundle_resolution.py | 20 | Self-healing bundle resolution across 5 config sources |
| test_config_cli.py | 15 | Config discovery, CLI args, version sync, build cache |
| test_backend.py | 11 | BackendManager, HttpMcpClient, parallel init |
| test_protocol_e2e.py | 11 | MCP protocol compliance, end-to-end mode workflows |
| test_token_optimization.py | 6 | Token savings benchmarks per mode |
| Total | 107 | 0 failures |

Version History

| Version | Changes |
|---|---|
| 2.1.0 | Native proxy mode via fastmcp create_proxy(), CondenseTransform for proxy token optimization, helper tools (list_all_tools/get_tool_schema/get_tool_count) bypass transform in proxy mode, session isolation per request, MCP feature forwarding (sampling, elicitation, logging, progress) |
| 2.0.8 | list_all_tools in all modes, MCP instructions in initialize response, .gitignore bundle fix |
| 2.0.7 | Self-healing bundle resolution (5 config sources), 8 test fixes, publish script symlink fix |
| 2.0.6 | list_all_tools gateway tool with server filtering and cached description support |
| 2.0.5 | Cache-first startup (no more init timeout), graceful stdin EOF handling, stderr suppression, version sync |
| 2.0.0 | Initial v2: FastMCP foundation, 3 operating modes, BackendManager, parallel init, smart condensation, build cache, collision resolution |

License

MIT
