Zero-friction shared memory for multi-agent AI systems. Pass URL pointers instead of token-expensive blobs.
ContextRelay: The Zero-Friction S3 for Agentic Memory
Pass a URL. Not a token wall. ContextRelay stores massive AI context payloads at the Cloudflare edge and gives you back a single URL. Agents exchange the pointer, not the data.
The Problem
Multi-agent AI pipelines have a dirty secret: they burn most of their token budget passing data around, not thinking.
When Agent A (Claude) finishes building a 50,000-token architecture spec and needs to hand it to Agent B (Mistral), your orchestrator has two options, and both are terrible:
| Option | Cost |
|---|---|
| Pass the full text in the next prompt | 50,000 tokens × $0.003/1K = $0.15 per handoff |
| Truncate it | You lose context. Agent B works blind. |
At scale (hundreds of agents, thousands of handoffs per day) you are paying a token tax on data transit, not intelligence. This is waste, not compute.
The Solution: Token Cost Arbitrage
ContextRelay replaces token-expensive data blobs with sub-100ms URL pointers.
```
Without ContextRelay:            With ContextRelay:
─────────────────────            ──────────────────────────────────────
Agent A → [50KB JSON]            Agent A → POST /push → [UUID url]
   ↓                                ↓
Agent B                          Agent B → GET /pull/<id> → [50KB JSON]
(50K tokens burned)              (73ms, ~0 tokens)
```
The math: a 50KB context payload costs ~12,500 tokens to pass directly. Via ContextRelay, the pointer URL is ~80 characters, effectively zero tokens. At 1,000 agent handoffs/day, that's roughly $37/day saved for 50KB payloads, and ~$150/day for full 50,000-token payloads like the spec above.
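The arithmetic can be sketched as a quick back-of-envelope check. It assumes ~4 characters per token (a common rough heuristic) and the $0.003/1K input price from the table above; with those assumptions, a 50KB direct handoff costs about $0.037, i.e. roughly $37/day at 1,000 handoffs, while the ~$150/day figure corresponds to passing a full 50,000-token payload each time.

```python
# Back-of-envelope: cost of passing a payload directly vs. via a pointer URL.
# Assumes ~4 characters per token and the $0.003/1K input price quoted above.

CHARS_PER_TOKEN = 4
PRICE_PER_1K_TOKENS = 0.003

def handoff_cost_usd(payload_chars: int) -> float:
    """Approximate prompt cost of passing `payload_chars` characters directly."""
    tokens = payload_chars / CHARS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS

direct = handoff_cost_usd(50_000)   # 50KB payload ≈ 12,500 tokens ≈ $0.0375
pointer = handoff_cost_usd(80)      # ~80-char URL ≈ 20 tokens ≈ $0.00006
daily_savings = (direct - pointer) * 1000  # at 1,000 handoffs/day

print(f"direct: ${direct:.4f}, pointer: ${pointer:.5f}, saved/day: ${daily_savings:.2f}")
```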
ContextRelay runs on Cloudflare Workers: globally distributed V8 isolates with sub-millisecond cold starts. Your context lives at the edge, milliseconds from wherever your agents are running.
Quickstart โ MCP Users (Claude Desktop, Cursor)
Install the server:
```shell
pip install contextrelay-mcp
```
Add to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "contextrelay": {
      "command": "contextrelay-mcp",
      "env": {
        "CONTEXTRELAY_URL": "https://contextrelay.your-account.workers.dev"
      }
    }
  }
}
```
Restart Claude Desktop. You now have two native tools:
- `push_context`: offload any large payload, get back a URL
- `pull_context`: retrieve any payload from a `workers.dev/pull/` URL
Claude will call these automatically when handling large context handoffs.
Quickstart โ Python SDK
```shell
pip install contextrelay-mcp
```
```python
from contextrelay import ContextRelay

hub = ContextRelay("https://contextrelay.your-account.workers.dev")

# Agent A: offload 50KB of context, hand off a URL
url = hub.push(large_json_string)
print(url)  # https://...workers.dev/pull/3f7a2b...

# Agent B: retrieve the full payload in one call
data = hub.pull(url)
```
Five lines. No infrastructure. No token waste.
Self-Hosting the Edge API
ContextRelay is fully self-hostable. You own your data.
1. Clone & deploy in 3 commands:

   ```shell
   git clone https://github.com/cmhashim/contextrelay
   cd contextrelay/api && npm install
   wrangler deploy
   ```

2. Create your KV namespace (Cloudflare stores the payloads):

   ```shell
   wrangler kv namespace create CONTEXT_KV
   # Copy the returned ID into wrangler.toml
   ```

3. Set your worker URL in the SDK or MCP server:

   ```shell
   export CONTEXTRELAY_URL="https://contextrelay.your-account.workers.dev"
   ```
Free Cloudflare tier covers 100,000 Worker requests/day and 1GB KV storage.
Architecture
```
Your Agent
   │
   │ POST /push (payload)
   ▼
Cloudflare Worker ─── globally distributed, <1ms cold start
   │
   │ KV.put(uuid, payload, ttl=86400)
   ▼
Cloudflare KV ─── edge-replicated, 24hr TTL
   │
   │ returns { url: "https://.../pull/<uuid>" }
   ▼
Your Agent ─── passes URL to next agent (80 chars, ~0 tokens)
   │
   │ GET /pull/<uuid>
   ▼
Cloudflare Worker → KV.get(uuid) → payload
```
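The push/pull contract in the diagram can be sketched with an in-memory stand-in for Cloudflare KV. Everything here (`FakeKV`, the example base URL) is illustrative, not ContextRelay's actual Worker code:

```python
# Minimal in-memory sketch of the push/pull contract shown above.
# FakeKV stands in for Cloudflare KV; all names here are illustrative.
import time
import uuid

class FakeKV:
    """Toy key-value store with per-key TTL, mimicking KV.put / KV.get."""
    def __init__(self):
        self._store = {}

    def put(self, key, value, ttl=86400):
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() > expires:        # expired entries behave as missing
            del self._store[key]
            return None
        return value

BASE = "https://contextrelay.example.workers.dev"
kv = FakeKV()

def push(payload: str) -> str:
    """POST /push: store under a fresh UUID, return the pointer URL."""
    key = uuid.uuid4().hex
    kv.put(key, payload, ttl=86400)      # 24h TTL, as in the diagram
    return f"{BASE}/pull/{key}"

def pull(url: str) -> str:
    """GET /pull/<uuid>: extract the key from the URL, fetch the payload."""
    key = url.rsplit("/", 1)[-1]
    return kv.get(key)

url = push("big payload")
assert pull(url) == "big payload"
```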
Benchmarks (live Cloudflare deployment):
| Operation | Payload | Latency |
|---|---|---|
| push | 125 KB | ~250ms |
| pull | 125 KB | ~75ms |
| pull | 220 KB | ~100ms |
Pub/Sub Signaling (no polling)
Multi-agent orchestration usually means Agent B polling, or your framework wiring up a callback by hand. ContextRelay ships a WebSocket signaling layer so Agent B can subscribe to a channel and get the pointer URL the millisecond Agent A pushes it.
```python
from contextrelay import ContextRelay
import threading

hub = ContextRelay("https://contextrelay.your-account.workers.dev")

# --- Agent B, in a background thread ---
def on_context_ready(url):
    payload = hub.pull(url)
    print(f"Agent B got {len(payload)} chars: {payload[:80]}...")

threading.Thread(
    target=hub.subscribe, args=("project_x", on_context_ready), daemon=True
).start()

# --- Agent A, some time later ---
hub.push(huge_spec, channel="project_x")
# Agent B's callback fires within ~20 ms of the push returning.
```
Under the hood: each channel is a Cloudflare Durable Object using Hibernatable WebSockets. Idle channels pay zero CPU/memory; fan-out is in-memory on the same DO instance. The Python SDK auto-reconnects with exponential backoff and a 30-second ping keepalive; drops are transparent.
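The reconnect behaviour can be sketched as a capped exponential backoff schedule. The base delay and cap below are assumptions for illustration, not the SDK's actual constants:

```python
# Illustrative sketch of capped exponential backoff for WebSocket reconnects.
# The 0.5s base and 30s cap are assumptions, not ContextRelay's real values.
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Delay (seconds) before each reconnect attempt: base * 2^n, capped."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]

print(backoff_delays(7))
# [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```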
Metadata & Peek (decide before you download)
Before pulling a 200 KB payload into a context window, an agent should be able to peek at what it is. ContextRelay lets the producer attach a small plaintext metadata header on push, and any agent can read it with one lightweight call:
```python
url = hub.push(
    big_payload,
    metadata={"summary": "Database schema for orders service",
              "size_kb": 80, "type": "sql"},
)

# Agent B, on receiving url:
hub.peek(url)
# → {"summary": "Database schema for orders service", "size_kb": 80, "type": "sql"}

# Agent decides to pull only if it matches the task.
hub.pull(url)
```
Route: `GET /peek/:id` returns only the metadata object, not the payload. A server-side JSON parse means the heavy data field never touches the wire. This works even when the payload is encrypted: metadata stays plaintext by design, so peek needs no key.

The MCP server exposes `peek_context(url)` with guidance to LLM clients to call it first on any ContextRelay URL, so agents stop burning tokens on pulls they didn't need.
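The storage layout that makes peek cheap can be sketched as follows. The envelope shape (`metadata` and `data` stored together, parsed server-side) is an assumption inferred from the route description above, not ContextRelay's actual schema:

```python
# Sketch of how a /peek route can return metadata without shipping the payload.
# The stored-envelope shape is an assumption inferred from the text above.
import json

def store_envelope(payload: str, metadata: dict) -> str:
    """What the Worker might persist in KV on push: metadata + data together."""
    return json.dumps({"metadata": metadata, "data": payload})

def peek(stored: str) -> dict:
    """Server-side: parse the envelope, return only the metadata object."""
    return json.loads(stored)["metadata"]

stored = store_envelope("x" * 200_000, {"summary": "orders schema", "type": "sql"})
print(peek(stored))
# {'summary': 'orders schema', 'type': 'sql'}; the 200 KB payload never leaves the edge
```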
End-to-End Encryption (opt-in)
Passing secrets through a third-party edge (API keys, PII, proprietary code) means trusting that edge. ContextRelay's opt-in E2EE removes the trust assumption. Encryption runs entirely client-side; Cloudflare sees only ciphertext.
```python
hub = ContextRelay("https://contextrelay.your-account.workers.dev")

url = hub.push(secret_payload, encrypted=True)
# url → https://.../pull/<uuid>#key=<fernet_key>

plaintext = hub.pull(url)  # decrypted locally
```
How the key stays private: per RFC 3986, URL fragments (everything after `#`) are never transmitted to the server by HTTP clients. So when the SDK calls `GET /pull/<uuid>`, the `#key=...` portion is stripped locally and never leaves your machine. The Worker stores, and only ever sees, opaque Fernet ciphertext (`gAAAAA...`).

- Cipher: Fernet (AES-128-CBC + HMAC-SHA256)
- Key: fresh 256-bit URL-safe base64 key per upload
- Errors: a wrong or missing key raises `ValueError("Failed to decrypt: Invalid or missing key")`, never a partial or corrupt payload
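The fragment handling can be sketched with the standard library. This is a minimal illustration of the RFC 3986 behaviour described above, not the SDK's internals, and the URL is a made-up example:

```python
# Sketch: the fragment (#key=...) is client-side only and never hits the wire.
# Mirrors the RFC 3986 behaviour described above; not the SDK's actual code.
from urllib.parse import urlsplit, urlunsplit, parse_qs

url = "https://contextrelay.example.workers.dev/pull/3f7a2b#key=SECRET_FERNET_KEY"

parts = urlsplit(url)
request_url = urlunsplit(parts._replace(fragment=""))  # what the HTTP client sends
key = parse_qs(parts.fragment).get("key", [None])[0]   # what stays on your machine

print(request_url)  # https://contextrelay.example.workers.dev/pull/3f7a2b
print(key)          # SECRET_FERNET_KEY
```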
Roadmap
- Phase 1: Core edge API + Python SDK + MCP server
- Phase 2: WebSocket pub/sub (agents subscribe to context-ready events)
- Phase 3: E2EE (Fernet, key in URL fragment; server never sees plaintext)
- Phase 4: Metadata & peek (decide before you pull)
- Phase 5: LangChain / CrewAI / AutoGen native integrations
License
MIT; see LICENSE.
File details
Details for the file contextrelay-0.2.0.tar.gz.
File metadata
- Download URL: contextrelay-0.2.0.tar.gz
- Upload date:
- Size: 18.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `223a9cd7fe25e6579cc5430d62b8f6e26450a8c438f239f2e29786f3344f5a18` |
| MD5 | `aa5d9d0f8054450adc56581554988f83` |
| BLAKE2b-256 | `10362712c88fa6493ab235055fd7f8b8fd1b8aeb71a7da34aab1f818e9a3208f` |
File details
Details for the file contextrelay-0.2.0-py3-none-any.whl.
File metadata
- Download URL: contextrelay-0.2.0-py3-none-any.whl
- Upload date:
- Size: 22.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f10c4f96ef01d75e2832c12b3bf49a275b7d80757c8463a89c8213a88d6862cf` |
| MD5 | `698370a90251b1699f8980e0f4a7d518` |
| BLAKE2b-256 | `49dfdfd400c6268e36a6fc3ddb970e3da0a3d362ae973289a995a058114a0a66` |