
MCP-RLM-Proxy: Intelligent Middleware for MCP Servers

Production-ready middleware implementing Recursive Language Model principles (arXiv:2512.24601) for efficient multi-server management, automatic large-response handling, and first-class proxy tools for recursive data exploration. 100% compatible with the MCP specification - works with any existing MCP server without modification.

Quick Start for Current MCP Users

Already using MCP servers? Add this as middleware in 5 minutes:

# 1. Install
pip install mcp-rlm-proxy

# 2. Create a starter config in your working directory
mcp-rlm-proxy --init-config ./mcp.json

# 3. Add your existing servers to mcp.json ($EDITOR ./mcp.json), for example:
cat > mcp.json << EOF
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/your/path"]
    }
  }
}
EOF

# 4. Run the proxy
mcp-rlm-proxy --config ./mcp.json

That's it! Your servers now have automatic large-response handling and three powerful proxy tools for recursive exploration.


Why Use This as Middleware?

The Problem with Direct MCP Connections

When AI agents connect directly to MCP servers:

  • Token waste: 85-95% of returned data is often unnecessary
  • Context pollution: Irrelevant data dilutes important information
  • No multi-server aggregation: Must connect to each server separately
  • Performance degradation: Large responses slow everything down
  • Cost explosion: Every unnecessary token costs money

The Solution: Intelligent Middleware

+---------------+
|  MCP Client   |  (Claude Desktop, Cursor, Custom Client)
+-------+-------+
        | ONE connection
        v
+---------------+
| MCP-RLM       |  <-- THIS MIDDLEWARE
| Proxy         |  - Connects to N servers
|               |  - Auto-truncates large responses
|               |  - Caches + provides proxy_filter / proxy_search / proxy_explore
|               |  - Tracks token savings
+-------+-------+
        | Manages connections to your servers
    +---+----+--------+--------+
    v        v        v        v
+-----+  +-----+  +-----+  +-----+
| FS  |  | Git |  | API |  | DB  |  <-- Your existing servers
+-----+  +-----+  +-----+  +-----+      (NO changes needed!)

Benefits

  • Zero Friction: Works with existing MCP servers (no code changes)
  • Huge Token Savings: 85-95% reduction typical
  • Multi-Server: Aggregate tools from many servers through one interface
  • Clean Schemas: No _meta injection; tool schemas are passed through unmodified
  • Agent-Friendly: Three first-class proxy tools with flat, simple parameters; proxy_filter uses a sandboxed Python REPL for flexible programmatic transformations
  • Auto-Truncation: Large responses automatically truncated + cached for follow-up
  • Multi-Agent Ready: Per-agent cache isolation supports hundreds of concurrent agents
  • Production Ready: Connection pooling, error handling, metrics, TTL-based caching, memory-aware eviction

How It Works

Architecture Overview

  1. Client connects to proxy (instead of individual servers)
  2. Proxy connects to N servers (configured in mcp.json)
  3. Tools are aggregated with server prefixes (filesystem_read_file)
  4. Tool schemas pass through clean - no modification, no _meta injection
  5. Large responses are auto-truncated and cached with a cache_id
  6. Three proxy tools let agents drill into cached data without re-executing
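The name aggregation in step 3 is simple to picture. A minimal sketch (hypothetical helper, not the proxy's actual code):

```python
def aggregate_tools(servers: dict[str, list[str]]) -> list[str]:
    """Flatten tools from many servers into one namespace by prefixing each
    tool name with its server name, then append the three built-in proxy tools."""
    aggregated = [f"{server}_{tool}"
                  for server, tools in servers.items()
                  for tool in tools]
    return aggregated + ["proxy_filter", "proxy_search", "proxy_explore"]

tools = aggregate_tools({
    "filesystem": ["read_file", "list_directory"],
    "git": ["log"],
})
# yields names like "filesystem_read_file" and "git_log",
# followed by the three proxy tools
```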

The Proxy Tools

| Tool | Purpose | Key Parameters |
| --- | --- | --- |
| proxy_filter | Transform/filter using a Python REPL | cache_id, code (required), return_format |
| proxy_search | Grep/BM25/fuzzy/context search on a cached or fresh result | cache_id, pattern, mode, max_results |
| proxy_explore | Discover data structure without loading content | cache_id, max_depth |

All parameters are flat, top-level, simple types - no nested objects required. Each tool can work in two modes:

  • Cached mode: pass cache_id from a previous truncated response
  • Fresh mode: pass tool + arguments to call and filter in one step

Typical Agent Workflow

Step 1: Agent calls filesystem_read_file(path="large-data.json")
        -> Response is 50,000 chars -> auto-truncated + cached
        -> Agent receives first 8,000 chars + cache_id="a1b2c3d4e5f6"

Step 2: Agent calls proxy_explore(cache_id="a1b2c3d4e5f6")
        -> Returns structure summary: types, field names, sizes, sample
        -> 200 tokens instead of 50,000

Step 3: Agent calls proxy_filter(cache_id="a1b2c3d4e5f6", code="[{k: item[k] for k in ['name', 'email']} for item in data]")
        -> Returns only projected fields using Python REPL
        -> 500 tokens instead of 50,000

Step 4: Agent calls proxy_search(cache_id="a1b2c3d4e5f6", pattern="error", mode="bm25", top_k=3)
        -> Returns top-3 most relevant chunks
        -> 800 tokens instead of 50,000

Total: ~1,500 tokens vs 50,000+ (97% savings!)

Token Savings Impact

Real-World Token Reduction Examples

| Use Case | Without Proxy | With Proxy | Savings | Cost Impact* |
| --- | --- | --- | --- | --- |
| User Profile API | 2,500 tokens | 150 tokens | 94% | $0.075 -> $0.0045 |
| Log File Search (1MB) | 280,000 tokens | 800 tokens | 99.7% | Rate limited -> $0.024 |
| Database Query (100 rows) | 15,000 tokens | 1,200 tokens | 92% | $0.45 -> $0.036 |
| File System Scan | 8,000 tokens | 400 tokens | 95% | $0.24 -> $0.012 |

* Estimated using GPT-4 pricing ($0.03/1K input tokens)

Compound Savings in Multi-Step Workflows

For a typical AI agent workflow with 10 tool calls:

  • Without proxy: 10 calls x 10,000 tokens avg = 100,000 tokens -> $3.00
  • With proxy: 10 calls x 800 tokens avg = 8,000 tokens -> $0.24
  • Total savings per workflow: $2.76 (92% reduction)
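Spelled out, the arithmetic behind those figures (at the $0.03/1K input-token rate used above):

```python
RATE = 0.03 / 1000  # dollars per input token ($0.03 per 1K tokens)
CALLS = 10

# Average tokens per call without and with the proxy, from the example above
without = CALLS * 10_000 * RATE     # 100,000 tokens -> $3.00
with_proxy = CALLS * 800 * RATE     # 8,000 tokens   -> $0.24
savings = without - with_proxy      # $2.76 per workflow
pct = 100 * (1 - with_proxy / without)  # 92% reduction
```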

Proxy Tool Reference

proxy_filter

Transform or filter cached or fresh tool results with a sandboxed Python REPL. The proxy executes the Python code you supply with the cached payload bound to the variable data, enabling arbitrary programmatic transformations.

Simple field projection:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{k: item[k] for k in ['name', 'email']} for item in data]"
}

Complex filtering with conditions:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[item for item in data if item.get('status') == 'active' and item.get('score', 0) > 80]"
}

Aggregation:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "{'total': len(data), 'avg_score': sum(item.get('score', 0) for item in data) / len(data) if data else 0}"
}

With return format:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{'name': item['name'], 'email': item['email']} for item in data]",
  "return_format": "json"
}

With fresh call:

{
  "tool": "filesystem_read_file",
  "arguments": {"path": "data.json"},
  "code": "[item['name'] for item in data]"
}
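Because the code strings above are ordinary Python expressions evaluated with the payload bound to data, they can be sanity-checked locally before sending (the sample records here are invented for illustration):

```python
data = [
    {"name": "Ada", "email": "ada@example.com", "status": "active", "score": 91},
    {"name": "Bob", "email": "bob@example.com", "status": "inactive", "score": 55},
]

# Field projection (same expression as the first example above)
projected = [{k: item[k] for k in ["name", "email"]} for item in data]

# Conditional filtering
active = [item for item in data
          if item.get("status") == "active" and item.get("score", 0) > 80]

# Aggregation
summary = {"total": len(data),
           "avg_score": sum(item.get("score", 0) for item in data) / len(data) if data else 0}
# projected keeps only name/email; active keeps Ada's record; summary is {'total': 2, 'avg_score': 73.0}
```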

proxy_search

Search within a cached or fresh result. Modes: regex, bm25, fuzzy, context.

{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "ERROR|FATAL",
  "mode": "regex",
  "case_insensitive": true,
  "max_results": 20,
  "context_lines": 2
}

BM25 relevance search:

{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "database connection timeout",
  "mode": "bm25",
  "top_k": 5
}
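For intuition, bm25 mode ranks chunks of the cached text by term relevance. A toy version of that scoring (illustrative only, not the proxy's implementation):

```python
import math
import re

def bm25_rank(chunks, query, k1=1.5, b=0.75, top_k=5):
    """Score each chunk against the query with classic BM25 and
    return the top_k highest-scoring chunks."""
    docs = [re.findall(r"\w+", c.lower()) for c in chunks]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    terms = re.findall(r"\w+", query.lower())
    scored = []
    for doc, chunk in zip(docs, chunks):
        score = 0.0
        for t in terms:
            df = sum(1 for d in docs if t in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = doc.count(t)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scored.append((score, chunk))
    return [c for s, c in sorted(scored, key=lambda x: -x[0])[:top_k]]

top = bm25_rank(
    ["db connection timeout after 30s", "user logged in", "retrying database connection"],
    "database connection timeout", top_k=2)
# the two chunks containing the query terms outrank "user logged in"
```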

proxy_explore

Discover the structure of data without loading it all.

{
  "cache_id": "a1b2c3d4e5f6",
  "max_depth": 3
}

Returns: types, field names, sizes, and a small sample.
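The kind of summary proxy_explore produces can be approximated in a few lines (illustrative sketch; the real output format may differ):

```python
def summarize(value, max_depth=3, depth=0):
    """Describe types, field names, and sizes without emitting the content itself."""
    if depth >= max_depth:
        return type(value).__name__
    if isinstance(value, dict):
        return {k: summarize(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        head = summarize(value[0], max_depth, depth + 1) if value else "empty"
        return {"type": "list", "length": len(value), "items": head}
    return type(value).__name__

shape = summarize({"users": [{"name": "Ada", "score": 91}] * 500})
# -> {'users': {'type': 'list', 'length': 500, 'items': {'name': 'str', 'score': 'int'}}}
```

A few hundred bytes of shape information stand in for the full payload, which is where the large savings come from.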


Multi-Agent Support

The proxy is designed to handle hundreds of concurrent agents efficiently through per-agent cache isolation.

How It Works

When enableAgentIsolation is enabled (default), each agent gets:

  • Dedicated cache quota: 20 entries and 100MB memory per agent (configurable)
  • Isolated cache space: One agent's cache doesn't affect others
  • Smart eviction: Large, idle, rarely-accessed entries evicted first
  • Automatic agent management: LRU eviction of agent caches when max agents reached
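The quota-plus-eviction behavior described above can be sketched as follows (class name, policy details, and sizes are assumptions for illustration, not the proxy's actual code):

```python
from collections import OrderedDict

class AgentCache:
    """Illustrative per-agent cache with entry and memory quotas plus LRU eviction."""

    def __init__(self, max_entries=20, max_bytes=100 * 1024 * 1024):
        self.max_entries, self.max_bytes = max_entries, max_bytes
        self.entries = OrderedDict()  # cache_id -> payload, oldest first
        self.used_bytes = 0

    def put(self, cache_id, payload):
        size = len(payload)
        # Evict least-recently-used entries until both quotas are satisfied
        while self.entries and (len(self.entries) >= self.max_entries
                                or self.used_bytes + size > self.max_bytes):
            _, old = self.entries.popitem(last=False)
            self.used_bytes -= len(old)
        self.entries[cache_id] = payload
        self.used_bytes += size

    def get(self, cache_id):
        self.entries.move_to_end(cache_id)  # mark as recently used
        return self.entries[cache_id]
```

Because each agent owns its own instance, one agent filling its quota never evicts another agent's entries.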

Benefits for Multi-Agent Scenarios

| Scenario (100 agents) | Without Isolation (shared 50-entry cache) | With Isolation (20 entries/agent) |
| --- | --- | --- |
| Cache capacity | ~0.5 entries/agent, 10-20% hit rate | 20 entries/agent, 70-80% hit rate |
| Cache thrashing | High (agents evict each other's entries) | None (isolated caches) |
| Memory usage | Unbounded risk | Predictable (~2GB for 100 agents) |
| Performance | Degrades with more agents | Consistent per agent |

Configuration for Multi-Agent

{
  "proxySettings": {
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000,
    "cacheTTLSeconds": 600
  }
}

Settings:

  • enableAgentIsolation: Enable per-agent cache isolation (recommended for 10+ agents)
  • maxEntriesPerAgent: Maximum cache entries per agent (default: 20)
  • maxMemoryPerAgent: Maximum memory per agent in bytes (default: 100MB)
  • maxTotalAgents: Maximum concurrent agent caches (default: 1000)

Cache ID Format

With agent isolation enabled, cache IDs are prefixed with the agent identifier:

  • Format: {agent_id}:{cache_id}
  • Example: agent_1:abc123def456
  • The proxy automatically handles agent ID extraction and prefixing
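The prefixing scheme amounts to a pair of tiny helpers (hypothetical names; the proxy handles this internally):

```python
def scoped_id(agent_id: str, cache_id: str) -> str:
    """Build an agent-scoped cache ID in the {agent_id}:{cache_id} format."""
    return f"{agent_id}:{cache_id}"

def split_id(scoped: str) -> tuple[str, str]:
    """Split an agent-scoped cache ID back into its two parts."""
    agent_id, _, cache_id = scoped.partition(":")
    return agent_id, cache_id

scoped_id("agent_1", "abc123def456")  # -> "agent_1:abc123def456"
split_id("agent_1:abc123def456")      # -> ("agent_1", "abc123def456")
```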

Backward Compatibility

If enableAgentIsolation is false, the proxy uses a shared cache (backward compatible with single-agent deployments).


Configuration

mcp.json

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
    },
    "git": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "/repo"]
    }
  },
  "proxySettings": {
    "maxResponseSize": 8000,
    "cacheMaxEntries": 50,
    "cacheTTLSeconds": 300,
    "enableAutoTruncation": true,
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000
  }
}

Proxy Settings

| Setting | Default | Description |
| --- | --- | --- |
| maxResponseSize | 8000 | Character threshold for auto-truncation |
| cacheMaxEntries | 50 | Maximum cached responses (per agent if isolation enabled) |
| cacheTTLSeconds | 300 | Cache entry time-to-live (seconds) |
| enableAutoTruncation | true | Enable/disable auto-truncation + caching |
| enableAgentIsolation | true | Enable per-agent cache isolation (recommended for multi-agent) |
| maxEntriesPerAgent | 20 | Maximum cache entries per agent (when isolation enabled) |
| maxMemoryPerAgent | 104857600 | Maximum memory per agent in bytes (100MB default) |
| maxTotalAgents | 1000 | Maximum concurrent agent caches |
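The interaction between maxResponseSize and caching can be pictured like this (a sketch assuming cache IDs are short content hashes; the proxy's actual ID generation may differ):

```python
import hashlib

def maybe_truncate(text: str, max_response_size: int = 8000):
    """Return the response unchanged if it is small enough; otherwise return
    the first max_response_size characters plus a cache_id for follow-up
    calls to proxy_explore / proxy_filter / proxy_search."""
    if len(text) <= max_response_size:
        return text, None
    cache_id = hashlib.sha256(text.encode()).hexdigest()[:12]
    return text[:max_response_size], cache_id

body, cid = maybe_truncate("x" * 50_000)
# body is 8,000 chars; cid is a 12-character hex ID
```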

Installation

pip install mcp-rlm-proxy

# For development:
# git clone https://github.com/pratikjadhav2726/mcp-rlm-proxy.git && cd mcp-rlm-proxy && uv sync

Running the Proxy

mcp-rlm-proxy --config ./mcp.json

Using with Claude Desktop

Edit your Claude Desktop config:

{
  "mcpServers": {
    "proxy": {
      "command": "mcp-rlm-proxy",
      "args": ["--config", "/absolute/path/to/mcp.json"]
    }
  }
}

Using Programmatically

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="mcp-rlm-proxy",
    args=["--config", "/absolute/path/to/mcp.json"]
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()

        # List tools (prefixed with server names + 3 proxy tools)
        tools = await session.list_tools()

        # Call a tool - if response is large, it's auto-truncated with cache_id
        result = await session.call_tool("filesystem_read_file", {
            "path": "large-data.json"
        })

        # Drill into the cached data using the cache_id from the truncated response
        filtered = await session.call_tool("proxy_filter", {
            "cache_id": "a1b2c3d4e5f6",
            "code": "[{'name': u['name'], 'email': u['email']} for u in data['users']]"
        })

Legacy _meta Support

For backward compatibility, the _meta parameter is still accepted in tool arguments but is no longer advertised in schemas. If you pass _meta.projection or _meta.grep, the proxy will apply them. However, the recommended approach is to use the proxy tools instead:

| Old way (_meta) | New way (proxy tools) |
| --- | --- |
| Hidden in nested _meta.projection | proxy_filter(code="[item['name'] for item in data]") |
| Hidden in nested _meta.grep | proxy_search(pattern="ERROR") |
| Not discoverable by agents | First-class tools visible in list_tools() |

Search Modes

| Mode | Use When | Token Savings |
| --- | --- | --- |
| structure (proxy_explore) | You don't know the data format | 99.9%+ |
| bm25 | You know what you need, not where it is | 99%+ |
| fuzzy | You need to handle typos/variations | 98%+ |
| context | You need full paragraphs | 95%+ |
| regex | You know the exact pattern | 95%+ |

Performance Monitoring

Automatic tracking of token savings and performance:

INFO: Token savings: 50,000 -> 500 tokens (99.0% reduction)

=== Proxy Performance Summary ===
  Total calls: 127
  Projection calls: 45
  Grep calls: 23
  Auto-truncated: 15
  Original tokens: 2,450,000
  Filtered tokens: 125,000
  Tokens saved: 2,325,000
  Savings: 94.9%
  Active connections: 3

Cache Statistics (Multi-Agent)

With agent isolation enabled, you can monitor per-agent cache usage:

# Get aggregate cache statistics
stats = await proxy_server.cache.stats()
# Returns:
# {
#   "total_agents": 42,
#   "total_entries": 840,
#   "total_cached_bytes": 52428800,
#   "max_agents": 1000,
#   "max_entries_per_agent": 20,
#   "max_memory_per_agent": 104857600,
#   "agents": [
#     {
#       "agent_id": "agent_1",
#       "entries": 15,
#       "memory_bytes": 3145728,
#       "last_accessed_at": 1234567890.123
#     },
#     ...
#   ]
# }

# Get statistics for a specific agent
agent_stats = await proxy_server.cache.stats(agent_id="agent_1")

Comparison with RLM Paper Concepts

| RLM Paper Concept | MCP-RLM-Proxy Implementation |
| --- | --- |
| External Environment | Tool outputs treated as inspectable data stores |
| Recursive Decomposition | proxy_explore -> proxy_filter -> proxy_search workflow |
| Programmatic Exploration | proxy_search with multiple modes |
| Snippet Processing | Auto-truncation + cached follow-up |
| Cost Efficiency | 85-95% token reduction vs. full context loading |
| Long Context Handling | Processes multi-MB tool outputs without context limits |


Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE


Built for the AI agent community
