
MCP-RLM-Proxy: Intelligent Middleware for MCP Servers

Production-ready middleware implementing Recursive Language Model principles (arXiv:2512.24601) for efficient multi-server management, automatic large-response handling, and first-class proxy tools for recursive data exploration. 100% compatible with the MCP specification - works with any existing MCP server without modification.

Quick Start for Current MCP Users

Already using MCP servers? Add this as middleware in 5 minutes:

# 1. Install
pip install mcp-rlm-proxy

# 2. Create a starter config in your working directory
mcp-rlm-proxy --init-config ./mcp.json

# 3. Add your existing servers to mcp.json ($EDITOR ./mcp.json), for example:
cat > mcp.json << EOF
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/your/path"]
    }
  }
}
EOF

# 4. Run the proxy
mcp-rlm-proxy --config ./mcp.json

That's it! Your servers now have automatic large-response handling and three powerful proxy tools for recursive exploration.


Why Use This as Middleware?

The Problem with Direct MCP Connections

When AI agents connect directly to MCP servers:

  • Token waste: 85-95% of returned data is often unnecessary
  • Context pollution: Irrelevant data dilutes important information
  • No multi-server aggregation: Must connect to each server separately
  • Performance degradation: Large responses slow everything down
  • Cost explosion: Every unnecessary token costs money

The Solution: Intelligent Middleware

+---------------+
|  MCP Client   |  (Claude Desktop, Cursor, Custom Client)
+-------+-------+
        | ONE connection
        v
+---------------+
| MCP-RLM       |  <-- THIS MIDDLEWARE
| Proxy         |  - Connects to N servers
|               |  - Auto-truncates large responses
|               |  - Caches + provides proxy_filter / proxy_search / proxy_explore
|               |  - Tracks token savings
+-------+-------+
        | Manages connections to your servers
    +---+----+--------+--------+
    v        v        v        v
+-----+  +-----+  +-----+  +-----+
| FS  |  | Git |  | API |  | DB  |  <-- Your existing servers
+-----+  +-----+  +-----+  +-----+      (NO changes needed!)

Benefits

  • Zero Friction: Works with existing MCP servers (no code changes)
  • Huge Token Savings: 85-95% reduction typical
  • Multi-Server: Aggregate tools from many servers through one interface
  • Clean Schemas: No _meta injection; tool schemas are passed through unmodified
  • Agent-Friendly: Three first-class proxy tools with flat, simple parameters; proxy_filter uses a sandboxed Python REPL for flexible programmatic transformations
  • Auto-Truncation: Large responses automatically truncated + cached for follow-up
  • Multi-Agent Ready: Per-agent cache isolation supports hundreds of concurrent agents
  • Production Ready: Connection pooling, error handling, metrics, TTL-based caching, memory-aware eviction

How It Works

Architecture Overview

  1. Client connects to proxy (instead of individual servers)
  2. Proxy connects to N servers (configured in mcp.json)
  3. Tools are aggregated with server prefixes (filesystem_read_file)
  4. Tool schemas pass through clean - no modification, no _meta injection
  5. Large responses are auto-truncated and cached with a cache_id
  6. Three proxy tools let agents drill into cached data without re-executing
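The name aggregation in step 3 is simple to picture. A minimal sketch (hypothetical helper, not the proxy's actual code):

```python
def aggregate_tools(servers: dict[str, list[str]]) -> list[str]:
    """Flatten tools from many servers into one namespace by prefixing each
    tool name with its server name, then append the three built-in proxy tools."""
    aggregated = [f"{server}_{tool}"
                  for server, tools in servers.items()
                  for tool in tools]
    return aggregated + ["proxy_filter", "proxy_search", "proxy_explore"]

tools = aggregate_tools({
    "filesystem": ["read_file", "list_directory"],
    "git": ["log"],
})
# yields names like "filesystem_read_file" and "git_log",
# followed by the three proxy tools
```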

The Proxy Tools

| Tool | Purpose | Key Parameters |
| --- | --- | --- |
| proxy_filter | Transform/filter using a Python REPL | cache_id, code (required), return_format |
| proxy_search | Grep/BM25/fuzzy/context search on a cached or fresh result | cache_id, pattern, mode, max_results |
| proxy_explore | Discover data structure without loading content | cache_id, max_depth |

All parameters are flat, top-level, simple types - no nested objects required. Each tool can work in two modes:

  • Cached mode: pass cache_id from a previous truncated response
  • Fresh mode: pass tool + arguments to call and filter in one step

Typical Agent Workflow

Step 1: Agent calls filesystem_read_file(path="large-data.json")
        -> Response is 50,000 chars -> auto-truncated + cached
        -> Agent receives first 8,000 chars + cache_id="a1b2c3d4e5f6"

Step 2: Agent calls proxy_explore(cache_id="a1b2c3d4e5f6")
        -> Returns structure summary: types, field names, sizes, sample
        -> 200 tokens instead of 50,000

Step 3: Agent calls proxy_filter(cache_id="a1b2c3d4e5f6", code="[{k: item[k] for k in ['name', 'email']} for item in data]")
        -> Returns only projected fields using Python REPL
        -> 500 tokens instead of 50,000

Step 4: Agent calls proxy_search(cache_id="a1b2c3d4e5f6", pattern="error", mode="bm25", top_k=3)
        -> Returns top-3 most relevant chunks
        -> 800 tokens instead of 50,000

Total: ~1,500 tokens vs 50,000+ (97% savings!)

Token Savings Impact

Real-World Token Reduction Examples

| Use Case | Without Proxy | With Proxy | Savings | Cost Impact* |
| --- | --- | --- | --- | --- |
| User Profile API | 2,500 tokens | 150 tokens | 94% | $0.075 -> $0.0045 |
| Log File Search (1MB) | 280,000 tokens | 800 tokens | 99.7% | Rate limited -> $0.024 |
| Database Query (100 rows) | 15,000 tokens | 1,200 tokens | 92% | $0.45 -> $0.036 |
| File System Scan | 8,000 tokens | 400 tokens | 95% | $0.24 -> $0.012 |

* Estimated using GPT-4 pricing ($0.03/1K input tokens)

Compound Savings in Multi-Step Workflows

For a typical AI agent workflow with 10 tool calls:

  • Without proxy: 10 calls x 10,000 tokens avg = 100,000 tokens -> $3.00
  • With proxy: 10 calls x 800 tokens avg = 8,000 tokens -> $0.24
  • Total savings per workflow: $2.76 (92% reduction)
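Spelled out, the arithmetic behind those figures (at the $0.03/1K input-token rate used above):

```python
RATE = 0.03 / 1000  # dollars per input token ($0.03 per 1K tokens)
CALLS = 10

# Average tokens per call without and with the proxy, from the example above
without = CALLS * 10_000 * RATE     # 100,000 tokens -> $3.00
with_proxy = CALLS * 800 * RATE     # 8,000 tokens   -> $0.24
savings = without - with_proxy      # $2.76 per workflow
pct = 100 * (1 - with_proxy / without)  # 92% reduction
```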

Proxy Tool Reference

proxy_filter

Transform or filter cached or fresh tool results with a sandboxed Python REPL. The proxy executes the Python code you supply with the cached payload bound to the variable data, enabling arbitrary programmatic transformations.

Simple field projection:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{k: item[k] for k in ['name', 'email']} for item in data]"
}

Complex filtering with conditions:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[item for item in data if item.get('status') == 'active' and item.get('score', 0) > 80]"
}

Aggregation:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "{'total': len(data), 'avg_score': sum(item.get('score', 0) for item in data) / len(data) if data else 0}"
}

With return format:

{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{'name': item['name'], 'email': item['email']} for item in data]",
  "return_format": "json"
}

With fresh call:

{
  "tool": "filesystem_read_file",
  "arguments": {"path": "data.json"},
  "code": "[item['name'] for item in data]"
}
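Because the code strings above are ordinary Python expressions evaluated with the payload bound to data, they can be sanity-checked locally before sending (the sample records here are invented for illustration):

```python
data = [
    {"name": "Ada", "email": "ada@example.com", "status": "active", "score": 91},
    {"name": "Bob", "email": "bob@example.com", "status": "inactive", "score": 55},
]

# Field projection (same expression as the first example above)
projected = [{k: item[k] for k in ["name", "email"]} for item in data]

# Conditional filtering
active = [item for item in data
          if item.get("status") == "active" and item.get("score", 0) > 80]

# Aggregation
summary = {"total": len(data),
           "avg_score": sum(item.get("score", 0) for item in data) / len(data) if data else 0}
# projected keeps only name/email; active keeps Ada's record; summary is {'total': 2, 'avg_score': 73.0}
```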

proxy_search

Search within a cached or fresh result. Modes: regex, bm25, fuzzy, context.

{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "ERROR|FATAL",
  "mode": "regex",
  "case_insensitive": true,
  "max_results": 20,
  "context_lines": 2
}

BM25 relevance search:

{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "database connection timeout",
  "mode": "bm25",
  "top_k": 5
}
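For intuition, bm25 mode ranks chunks of the cached text by term relevance. A toy version of that scoring (illustrative only, not the proxy's implementation):

```python
import math
import re

def bm25_rank(chunks, query, k1=1.5, b=0.75, top_k=5):
    """Score each chunk against the query with classic BM25 and
    return the top_k highest-scoring chunks."""
    docs = [re.findall(r"\w+", c.lower()) for c in chunks]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    terms = re.findall(r"\w+", query.lower())
    scored = []
    for doc, chunk in zip(docs, chunks):
        score = 0.0
        for t in terms:
            df = sum(1 for d in docs if t in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = doc.count(t)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scored.append((score, chunk))
    return [c for s, c in sorted(scored, key=lambda x: -x[0])[:top_k]]

top = bm25_rank(
    ["db connection timeout after 30s", "user logged in", "retrying database connection"],
    "database connection timeout", top_k=2)
# the two chunks containing the query terms outrank "user logged in"
```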

proxy_explore

Discover the structure of data without loading it all.

{
  "cache_id": "a1b2c3d4e5f6",
  "max_depth": 3
}

Returns: types, field names, sizes, and a small sample.
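The kind of summary proxy_explore produces can be approximated in a few lines (illustrative sketch; the real output format may differ):

```python
def summarize(value, max_depth=3, depth=0):
    """Describe types, field names, and sizes without emitting the content itself."""
    if depth >= max_depth:
        return type(value).__name__
    if isinstance(value, dict):
        return {k: summarize(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        head = summarize(value[0], max_depth, depth + 1) if value else "empty"
        return {"type": "list", "length": len(value), "items": head}
    return type(value).__name__

shape = summarize({"users": [{"name": "Ada", "score": 91}] * 500})
# -> {'users': {'type': 'list', 'length': 500, 'items': {'name': 'str', 'score': 'int'}}}
```

A few hundred bytes of shape information stand in for the full payload, which is where the large savings come from.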


Multi-Agent Support

The proxy is designed to handle hundreds of concurrent agents efficiently through per-agent cache isolation.

How It Works

When enableAgentIsolation is enabled (default), each agent gets:

  • Dedicated cache quota: 20 entries and 100MB memory per agent (configurable)
  • Isolated cache space: One agent's cache doesn't affect others
  • Smart eviction: Large, idle, rarely-accessed entries evicted first
  • Automatic agent management: LRU eviction of agent caches when max agents reached
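The quota-plus-eviction behavior described above can be sketched as follows (class name, policy details, and sizes are assumptions for illustration, not the proxy's actual code):

```python
from collections import OrderedDict

class AgentCache:
    """Illustrative per-agent cache with entry and memory quotas plus LRU eviction."""

    def __init__(self, max_entries=20, max_bytes=100 * 1024 * 1024):
        self.max_entries, self.max_bytes = max_entries, max_bytes
        self.entries = OrderedDict()  # cache_id -> payload, oldest first
        self.used_bytes = 0

    def put(self, cache_id, payload):
        size = len(payload)
        # Evict least-recently-used entries until both quotas are satisfied
        while self.entries and (len(self.entries) >= self.max_entries
                                or self.used_bytes + size > self.max_bytes):
            _, old = self.entries.popitem(last=False)
            self.used_bytes -= len(old)
        self.entries[cache_id] = payload
        self.used_bytes += size

    def get(self, cache_id):
        self.entries.move_to_end(cache_id)  # mark as recently used
        return self.entries[cache_id]
```

Because each agent owns its own instance, one agent filling its quota never evicts another agent's entries.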

Benefits for Multi-Agent Scenarios

| Scenario (100 agents) | Without Isolation (shared 50-entry cache) | With Isolation (20 entries/agent) |
| --- | --- | --- |
| Cache capacity | ~0.5 entries/agent, 10-20% hit rate | 20 entries/agent, 70-80% hit rate |
| Cache thrashing | High (agents evict each other's entries) | None (isolated caches) |
| Memory usage | Unbounded risk | Predictable (~2GB for 100 agents) |
| Performance | Degrades with more agents | Consistent per agent |

Configuration for Multi-Agent

{
  "proxySettings": {
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000,
    "cacheTTLSeconds": 600
  }
}

Settings:

  • enableAgentIsolation: Enable per-agent cache isolation (recommended for 10+ agents)
  • maxEntriesPerAgent: Maximum cache entries per agent (default: 20)
  • maxMemoryPerAgent: Maximum memory per agent in bytes (default: 100MB)
  • maxTotalAgents: Maximum concurrent agent caches (default: 1000)

Cache ID Format

With agent isolation enabled, cache IDs are prefixed with the agent identifier:

  • Format: {agent_id}:{cache_id}
  • Example: agent_1:abc123def456
  • The proxy automatically handles agent ID extraction and prefixing
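The prefixing scheme amounts to a pair of tiny helpers (hypothetical names; the proxy handles this internally):

```python
def scoped_id(agent_id: str, cache_id: str) -> str:
    """Build an agent-scoped cache ID in the {agent_id}:{cache_id} format."""
    return f"{agent_id}:{cache_id}"

def split_id(scoped: str) -> tuple[str, str]:
    """Split an agent-scoped cache ID back into its two parts."""
    agent_id, _, cache_id = scoped.partition(":")
    return agent_id, cache_id

scoped_id("agent_1", "abc123def456")  # -> "agent_1:abc123def456"
split_id("agent_1:abc123def456")      # -> ("agent_1", "abc123def456")
```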

Backward Compatibility

If enableAgentIsolation is false, the proxy uses a shared cache (backward compatible with single-agent deployments).


Configuration

mcp.json

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
    },
    "git": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "/repo"]
    }
  },
  "proxySettings": {
    "maxResponseSize": 8000,
    "cacheMaxEntries": 50,
    "cacheTTLSeconds": 300,
    "enableAutoTruncation": true,
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000
  }
}

Proxy Settings

| Setting | Default | Description |
| --- | --- | --- |
| maxResponseSize | 8000 | Character threshold for auto-truncation |
| cacheMaxEntries | 50 | Maximum cached responses (per agent if isolation enabled) |
| cacheTTLSeconds | 300 | Cache entry time-to-live (seconds) |
| enableAutoTruncation | true | Enable/disable auto-truncation + caching |
| enableAgentIsolation | true | Enable per-agent cache isolation (recommended for multi-agent) |
| maxEntriesPerAgent | 20 | Maximum cache entries per agent (when isolation enabled) |
| maxMemoryPerAgent | 104857600 | Maximum memory per agent in bytes (100MB default) |
| maxTotalAgents | 1000 | Maximum concurrent agent caches |
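The interaction between maxResponseSize and caching can be pictured like this (a sketch assuming cache IDs are short content hashes; the proxy's actual ID generation may differ):

```python
import hashlib

def maybe_truncate(text: str, max_response_size: int = 8000):
    """Return the response unchanged if it is small enough; otherwise return
    the first max_response_size characters plus a cache_id for follow-up
    calls to proxy_explore / proxy_filter / proxy_search."""
    if len(text) <= max_response_size:
        return text, None
    cache_id = hashlib.sha256(text.encode()).hexdigest()[:12]
    return text[:max_response_size], cache_id

body, cid = maybe_truncate("x" * 50_000)
# body is 8,000 chars; cid is a 12-character hex ID
```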

Installation

pip install mcp-rlm-proxy

# For development:
# git clone https://github.com/pratikjadhav2726/mcp-rlm-proxy.git && cd mcp-rlm-proxy && uv sync

Running the Proxy

mcp-rlm-proxy --config ./mcp.json

Using with Claude Desktop

Edit your Claude Desktop config:

{
  "mcpServers": {
    "proxy": {
      "command": "mcp-rlm-proxy",
      "args": ["--config", "/absolute/path/to/mcp.json"]
    }
  }
}

Using Programmatically

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="mcp-rlm-proxy",
    args=["--config", "/absolute/path/to/mcp.json"]
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()

        # List tools (prefixed with server names + 3 proxy tools)
        tools = await session.list_tools()

        # Call a tool - if response is large, it's auto-truncated with cache_id
        result = await session.call_tool("filesystem_read_file", {
            "path": "large-data.json"
        })

        # Drill into the cached data using the cache_id from the truncated response
        filtered = await session.call_tool("proxy_filter", {
            "cache_id": "a1b2c3d4e5f6",
            "code": "[{'name': u['name'], 'email': u['email']} for u in data['users']]"
        })

Legacy _meta Support

For backward compatibility, the _meta parameter is still accepted in tool arguments but is no longer advertised in schemas. If you pass _meta.projection or _meta.grep, the proxy will apply them. However, the recommended approach is to use the proxy tools instead:

| Old way (_meta) | New way (proxy tools) |
| --- | --- |
| Hidden in nested _meta.projection | proxy_filter(code="[item['name'] for item in data]") |
| Hidden in nested _meta.grep | proxy_search(pattern="ERROR") |
| Not discoverable by agents | First-class tools visible in list_tools() |

Search Modes

| Mode | Use When | Token Savings |
| --- | --- | --- |
| structure (proxy_explore) | You don't know the data format | 99.9%+ |
| bm25 | You know what you need, not where it is | 99%+ |
| fuzzy | You need to handle typos/variations | 98%+ |
| context | You need full paragraphs | 95%+ |
| regex | You know the exact pattern | 95%+ |

Performance Monitoring

Automatic tracking of token savings and performance:

INFO: Token savings: 50,000 -> 500 tokens (99.0% reduction)

=== Proxy Performance Summary ===
  Total calls: 127
  Projection calls: 45
  Grep calls: 23
  Auto-truncated: 15
  Original tokens: 2,450,000
  Filtered tokens: 125,000
  Tokens saved: 2,325,000
  Savings: 94.9%
  Active connections: 3

Cache Statistics (Multi-Agent)

With agent isolation enabled, you can monitor per-agent cache usage:

# Get aggregate cache statistics
stats = await proxy_server.cache.stats()
# Returns:
# {
#   "total_agents": 42,
#   "total_entries": 840,
#   "total_cached_bytes": 52428800,
#   "max_agents": 1000,
#   "max_entries_per_agent": 20,
#   "max_memory_per_agent": 104857600,
#   "agents": [
#     {
#       "agent_id": "agent_1",
#       "entries": 15,
#       "memory_bytes": 3145728,
#       "last_accessed_at": 1234567890.123
#     },
#     ...
#   ]
# }

# Get statistics for a specific agent
agent_stats = await proxy_server.cache.stats(agent_id="agent_1")

Comparison with RLM Paper Concepts

| RLM Paper Concept | MCP-RLM-Proxy Implementation |
| --- | --- |
| External Environment | Tool outputs treated as inspectable data stores |
| Recursive Decomposition | proxy_explore -> proxy_filter -> proxy_search workflow |
| Programmatic Exploration | proxy_search with multiple modes |
| Snippet Processing | Auto-truncation + cached follow-up |
| Cost Efficiency | 85-95% token reduction vs. full context loading |
| Long Context Handling | Processes multi-MB tool outputs without context limits |


Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE


Built for the AI agent community
