MCP-RLM-Proxy: Intelligent Middleware for MCP Servers
Production-ready middleware implementing Recursive Language Model principles (arXiv:2512.24601) for efficient multi-server management, automatic large-response handling, and first-class proxy tools for recursive data exploration. 100% compatible with the MCP specification - works with any existing MCP server without modification.
Quick Start for Current MCP Users
Already using MCP servers? Add this as middleware in 5 minutes:
# 1. Install
pip install mcp-rlm-proxy
# 2. Create a config in your working directory (mcp.json)
mcp-rlm-proxy --init-config ./mcp.json
# 3. Edit mcp.json to add your servers
$EDITOR ./mcp.json
# (Alternatively, write the config directly)
cat > mcp.json << EOF
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/your/path"]
    }
  }
}
EOF
# 4. Run the proxy
mcp-rlm-proxy --config ./mcp.json
That's it! Your servers now have automatic large-response handling and three powerful proxy tools for recursive exploration.
Why Use This as Middleware?
The Problem with Direct MCP Connections
When AI agents connect directly to MCP servers:
- Token waste: 85-95% of returned data is often unnecessary
- Context pollution: Irrelevant data dilutes important information
- No multi-server aggregation: Must connect to each server separately
- Performance degradation: Large responses slow everything down
- Cost explosion: Every unnecessary token costs money
The Solution: Intelligent Middleware
+---------------+
| MCP Client | (Claude Desktop, Cursor, Custom Client)
+-------+-------+
| ONE connection
v
+---------------+
| MCP-RLM | <-- THIS MIDDLEWARE
| Proxy | - Connects to N servers
| | - Auto-truncates large responses
| | - Caches + provides proxy_filter / proxy_search / proxy_explore
| | - Tracks token savings
+-------+-------+
| Manages connections to your servers
+---+----+--------+--------+
v v v v
+-----+ +-----+ +-----+ +-----+
| FS | | Git | | API | | DB | <-- Your existing servers
+-----+ +-----+ +-----+ +-----+ (NO changes needed!)
Benefits
- Zero Friction: Works with existing MCP servers (no code changes)
- Huge Token Savings: 85-95% reduction typical
- Multi-Server: Aggregate tools from many servers through one interface
- Clean Schemas: No _meta injection; tool schemas are passed through unmodified
- Agent-Friendly: Three first-class proxy tools with flat, simple parameters; proxy_filter uses a Python REPL for flexible programmatic transformations
- Auto-Truncation: Large responses are automatically truncated and cached for follow-up
- Multi-Agent Ready: Per-agent cache isolation supports hundreds of concurrent agents
- Production Ready: Connection pooling, error handling, metrics, TTL-based caching, memory-aware eviction
How It Works
Architecture Overview
- Client connects to proxy (instead of individual servers)
- Proxy connects to N servers (configured in mcp.json)
- Tools are aggregated with server prefixes (filesystem_read_file)
- Tool schemas pass through clean - no modification, no _meta injection
- Large responses are auto-truncated and cached with a cache_id
- Three proxy tools let agents drill into cached data without re-executing
The Proxy Tools
| Tool | Purpose | Key Parameters |
|---|---|---|
| proxy_filter | Transform/filter using a Python REPL | cache_id, code (required), return_format |
| proxy_search | Grep/BM25/fuzzy/context search on a cached or fresh result | cache_id, pattern, mode, max_results |
| proxy_explore | Discover data structure without loading content | cache_id, max_depth |
All parameters are flat, top-level, simple types - no nested objects required. Each tool can work in two modes:
- Cached mode: pass cache_id from a previous truncated response
- Fresh mode: pass tool + arguments to call and filter in one step
Typical Agent Workflow
Step 1: Agent calls filesystem_read_file(path="large-data.json")
-> Response is 50,000 chars -> auto-truncated + cached
-> Agent receives first 8,000 chars + cache_id="a1b2c3d4e5f6"
Step 2: Agent calls proxy_explore(cache_id="a1b2c3d4e5f6")
-> Returns structure summary: types, field names, sizes, sample
-> 200 tokens instead of 50,000
Step 3: Agent calls proxy_filter(cache_id="a1b2c3d4e5f6", code="[{k: item[k] for k in ['name', 'email']} for item in data]")
-> Returns only projected fields using Python REPL
-> 500 tokens instead of 50,000
Step 4: Agent calls proxy_search(cache_id="a1b2c3d4e5f6", pattern="error", mode="bm25", top_k=3)
-> Returns top-3 most relevant chunks
-> 800 tokens instead of 50,000
Total: ~1,500 tokens vs 50,000+ (97% savings!)
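The same four-step loop can be driven programmatically with the mcp Python client (also shown in "Using Programmatically" below). A minimal sketch; the cache_id is hard-coded here to mirror the steps above, whereas a real agent reads it out of the truncated response:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(command="mcp-rlm-proxy", args=["--config", "./mcp.json"])

async def workflow():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Step 1: large response is auto-truncated and cached by the proxy
            await session.call_tool("filesystem_read_file", {"path": "large-data.json"})
            # Step 2: structure summary instead of raw content
            await session.call_tool("proxy_explore", {"cache_id": "a1b2c3d4e5f6"})
            # Step 3: project only the fields the task needs
            await session.call_tool("proxy_filter", {
                "cache_id": "a1b2c3d4e5f6",
                "code": "[{k: item[k] for k in ['name', 'email']} for item in data]"
            })
            # Step 4: rank the most relevant chunks
            await session.call_tool("proxy_search", {
                "cache_id": "a1b2c3d4e5f6", "pattern": "error", "mode": "bm25", "top_k": 3
            })

asyncio.run(workflow())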
Token Savings Impact
Real-World Token Reduction Examples
| Use Case | Without Proxy | With Proxy | Savings | Cost Impact* |
|---|---|---|---|---|
| User Profile API | 2,500 tokens | 150 tokens | 94% | $0.075 -> $0.0045 |
| Log File Search (1MB) | 280,000 tokens | 800 tokens | 99.7% | Rate limited -> $0.024 |
| Database Query (100 rows) | 15,000 tokens | 1,200 tokens | 92% | $0.45 -> $0.036 |
| File System Scan | 8,000 tokens | 400 tokens | 95% | $0.24 -> $0.012 |
* Estimated using GPT-4 pricing ($0.03/1K input tokens)
Compound Savings in Multi-Step Workflows
For a typical AI agent workflow with 10 tool calls:
- Without proxy: 10 calls x 10,000 tokens avg = 100,000 tokens -> $3.00
- With proxy: 10 calls x 800 tokens avg = 8,000 tokens -> $0.24
- Total savings per workflow: $2.76 (92% reduction)
Proxy Tool Reference
proxy_filter
Transform or filter cached or fresh tool results using a Python REPL: execute Python code to programmatically transform data. The cached payload is available as the variable data in a sandboxed Python environment.
Simple field projection:
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{k: item[k] for k in ['name', 'email']} for item in data]"
}
Complex filtering with conditions:
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[item for item in data if item.get('status') == 'active' and item.get('score', 0) > 80]"
}
Aggregation:
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "{'total': len(data), 'avg_score': sum(item.get('score', 0) for item in data) / len(data) if data else 0}"
}
With return format:
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{'name': item['name'], 'email': item['email']} for item in data]",
  "return_format": "json"
}
With a fresh call:
{
  "tool": "filesystem_read_file",
  "arguments": {"path": "data.json"},
  "code": "[item['name'] for item in data]"
}
proxy_search
Search within a cached or fresh result. Modes: regex, bm25, fuzzy, context.
Regex search with context lines:
{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "ERROR|FATAL",
  "mode": "regex",
  "case_insensitive": true,
  "max_results": 20,
  "context_lines": 2
}
BM25 relevance search:
{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "database connection timeout",
  "mode": "bm25",
  "top_k": 5
}
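BM25 mode ranks chunks of the cached text by lexical relevance to the pattern and returns only the top_k best matches. A minimal sketch of the idea using the third-party rank-bm25 package (an assumption for illustration; the proxy's own chunking and scoring may differ):
# pip install rank-bm25
from rank_bm25 import BM25Okapi

chunks = [
    "ERROR db pool exhausted: database connection timeout after 30s",
    "INFO user login succeeded",
    "WARN retrying database connection (attempt 2)",
]
bm25 = BM25Okapi([c.lower().split() for c in chunks])
query = "database connection timeout".split()
print(bm25.get_top_n(query, chunks, n=2))  # two most relevant chunks, best first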
proxy_explore
Discover the structure of data without loading it all.
{
  "cache_id": "a1b2c3d4e5f6",
  "max_depth": 3
}
Returns: types, field names, sizes, and a small sample.
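A toy version of the idea: walk the value to max_depth, recording types, keys, and sizes instead of content (illustrative only, not the proxy's implementation):
def explore(value, max_depth=3, depth=0):
    # Summarize shape, never content
    if depth >= max_depth:
        return type(value).__name__
    if isinstance(value, dict):
        return {k: explore(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        sample = explore(value[0], max_depth, depth + 1) if value else None
        return {"type": "list", "length": len(value), "item_shape": sample}
    return type(value).__name__

print(explore({"users": [{"name": "Ada", "email": "a@x.io", "score": 91}] * 500}))
# -> {'users': {'type': 'list', 'length': 500,
#               'item_shape': {'name': 'str', 'email': 'str', 'score': 'int'}}}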
Multi-Agent Support
The proxy is designed to handle hundreds of concurrent agents efficiently through per-agent cache isolation.
How It Works
When enableAgentIsolation is enabled (default), each agent gets:
- Dedicated cache quota: 20 entries and 100MB memory per agent (configurable)
- Isolated cache space: One agent's cache doesn't affect others
- Smart eviction: Large, idle, rarely-accessed entries are evicted first (see the sketch after this list)
- Automatic agent management: LRU eviction of agent caches when max agents reached
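A toy scoring function capturing that eviction preference - bigger, idler, less-used entries go first (purely illustrative; the proxy's exact policy is not documented here):
import time

def eviction_score(size_bytes: int, last_access: float, hit_count: int) -> float:
    # Higher score = evicted sooner
    idle_seconds = time.time() - last_access
    return size_bytes * (1.0 + idle_seconds) / (1.0 + hit_count)

# Pick the victim when an agent exceeds its quota (hypothetical entries)
entries = {"a": (3_000_000, time.time() - 500, 1), "b": (50_000, time.time() - 5, 9)}
victim = max(entries, key=lambda k: eviction_score(*entries[k]))
print(victim)  # -> 'a': large and idle beats small and hot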
Benefits for Multi-Agent Scenarios
| Scenario | Without Isolation | With Isolation |
|---|---|---|
| 100 agents, shared cache (50 entries) | ~0.5 entries/agent, 10-20% hit rate | N/A |
| 100 agents, per-agent isolation (20 entries) | N/A | 20 entries/agent, 70-80% hit rate |
| Cache thrashing | High (agents evict each other's entries) | None (isolated caches) |
| Memory usage | Unbounded risk | Predictable (~2GB for 100 agents) |
| Performance | Degrades with more agents | Consistent per agent |
Configuration for Multi-Agent
{
  "proxySettings": {
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000,
    "cacheTTLSeconds": 600
  }
}
Settings:
- enableAgentIsolation: Enable per-agent cache isolation (recommended for 10+ agents)
- maxEntriesPerAgent: Maximum cache entries per agent (default: 20)
- maxMemoryPerAgent: Maximum memory per agent in bytes (default: 100MB)
- maxTotalAgents: Maximum concurrent agent caches (default: 1000)
Cache ID Format
With agent isolation enabled, cache IDs are prefixed with the agent identifier:
- Format: {agent_id}:{cache_id}
- Example: agent_1:abc123def456
- The proxy automatically handles agent ID extraction and prefixing
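For illustration, splitting a prefixed ID back into its parts (hypothetical helper; the proxy performs this internally):
def split_cache_id(cache_id: str):
    # '{agent_id}:{cache_id}' -> (agent_id, raw_id); unprefixed -> (None, cache_id)
    agent_id, sep, raw_id = cache_id.partition(":")
    return (agent_id, raw_id) if sep else (None, cache_id)

print(split_cache_id("agent_1:abc123def456"))  # -> ('agent_1', 'abc123def456')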
Backward Compatibility
If enableAgentIsolation is false, the proxy uses a shared cache (backward compatible with single-agent deployments).
Configuration
mcp.json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
    },
    "git": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "/repo"]
    }
  },
  "proxySettings": {
    "maxResponseSize": 8000,
    "cacheMaxEntries": 50,
    "cacheTTLSeconds": 300,
    "enableAutoTruncation": true,
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000
  }
}
Proxy Settings
| Setting | Default | Description |
|---|---|---|
| maxResponseSize | 8000 | Character threshold for auto-truncation |
| cacheMaxEntries | 50 | Maximum cached responses (per agent if isolation enabled) |
| cacheTTLSeconds | 300 | Cache entry time-to-live (seconds) |
| enableAutoTruncation | true | Enable/disable auto-truncation + caching |
| enableAgentIsolation | true | Enable per-agent cache isolation (recommended for multi-agent) |
| maxEntriesPerAgent | 20 | Maximum cache entries per agent (when isolation enabled) |
| maxMemoryPerAgent | 104857600 | Maximum memory per agent in bytes (100MB default) |
| maxTotalAgents | 1000 | Maximum concurrent agent caches |
Installation
pip install mcp-rlm-proxy
# For development:
# git clone https://github.com/pratikjadhav2726/mcp-rlm-proxy.git && cd mcp-rlm-proxy && uv sync
Running the Proxy
mcp-rlm-proxy --config ./mcp.json
Using with Claude Desktop
Edit your Claude Desktop config:
{
  "mcpServers": {
    "proxy": {
      "command": "mcp-rlm-proxy",
      "args": ["--config", "/absolute/path/to/mcp.json"]
    }
  }
}
Using Programmatically
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
server_params = StdioServerParameters(
    command="mcp-rlm-proxy",
    args=["--config", "/absolute/path/to/mcp.json"]
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()
        # List tools (prefixed with server names + 3 proxy tools)
        tools = await session.list_tools()
        # Call a tool - if the response is large, it's auto-truncated with a cache_id
        result = await session.call_tool("filesystem_read_file", {
            "path": "large-data.json"
        })
        # Drill into the cached data with the Python REPL filter
        filtered = await session.call_tool("proxy_filter", {
            "cache_id": "a1b2c3d4e5f6",  # cache_id returned with the truncated response
            "code": "[{k: u[k] for k in ['name', 'email']} for u in data['users']]"
        })
Legacy _meta Support
For backward compatibility, the _meta parameter is still accepted in tool arguments but is no longer advertised in schemas. If you pass _meta.projection or _meta.grep, the proxy will apply them. However, the recommended approach is to use the proxy tools instead:
| Old way (_meta) | New way (proxy tools) |
|---|---|
| Hidden in nested _meta.projection | proxy_filter(code="[item['name'] for item in data]") |
| Hidden in nested _meta.grep | proxy_search(pattern="ERROR") |
| Not discoverable by agents | First-class tools visible in list_tools() |
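For reference, a legacy call embedding _meta in the tool arguments might look like this (shape inferred from the parameter names above; prefer the proxy tools):
{
  "path": "app.log",
  "_meta": {"grep": "ERROR"}
}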
Search Modes
| Mode | Use When | Token Savings |
|---|---|---|
| structure (proxy_explore) | Don't know the data format | 99.9%+ |
| bm25 | Know what, not where | 99%+ |
| fuzzy | Handle typos/variations | 98%+ |
| context | Need full paragraphs | 95%+ |
| regex | Know the exact pattern | 95%+ |
Performance Monitoring
Automatic tracking of token savings and performance:
INFO: Token savings: 50,000 -> 500 tokens (99.0% reduction)
=== Proxy Performance Summary ===
Total calls: 127
Projection calls: 45
Grep calls: 23
Auto-truncated: 15
Original tokens: 2,450,000
Filtered tokens: 125,000
Tokens saved: 2,325,000
Savings: 94.9%
Active connections: 3
Cache Statistics (Multi-Agent)
With agent isolation enabled, you can monitor per-agent cache usage:
# Get aggregate cache statistics
stats = await proxy_server.cache.stats()
# Returns:
# {
# "total_agents": 42,
# "total_entries": 840,
# "total_cached_bytes": 52428800,
# "max_agents": 1000,
# "max_entries_per_agent": 20,
# "max_memory_per_agent": 104857600,
# "agents": [
# {
# "agent_id": "agent_1",
# "entries": 15,
# "memory_bytes": 3145728,
# "last_accessed_at": 1234567890.123
# },
# ...
# ]
# }
# Get statistics for a specific agent
agent_stats = await proxy_server.cache.stats(agent_id="agent_1")
Comparison with RLM Paper Concepts
| RLM Paper Concept | MCP-RLM-Proxy Implementation |
|---|---|
| External Environment | Tool outputs treated as inspectable data stores |
| Recursive Decomposition | proxy_explore -> proxy_filter -> proxy_search workflow |
| Programmatic Exploration | proxy_search with multiple modes |
| Snippet Processing | Auto-truncation + cached follow-up |
| Cost Efficiency | 85-95% token reduction vs. full context loading |
| Long Context Handling | Processes multi-MB tool outputs without context limits |
Documentation
- Architecture - System design and data flow
- Configuration - Configuration options and validation
- Performance - Performance benchmarks and optimization
Related Concepts
- Recursive Language Models Paper: arXiv:2512.24601
- Model Context Protocol: MCP Specification
Contributing
See CONTRIBUTING.md for development setup and guidelines.
License
MIT License - see LICENSE
Built for the AI agent community