
Zero-friction shared memory for multi-agent AI systems. Pass URL pointers instead of token-expensive blobs.


ContextRelay 🔗 The Zero-Friction S3 for Agentic Memory

PyPI · License: MIT · Python 3.9+

Pass a URL. Not a token wall. ContextRelay stores massive AI context payloads at the Cloudflare edge and gives you back a single URL. Agents exchange the pointer, not the data.


The Problem

Multi-agent AI pipelines have a dirty secret: they burn most of their token budget passing data around, not thinking.

When Agent A (Claude) finishes building a 50,000-token architecture spec and needs to hand it to Agent B (Mistral), your orchestrator has two options, and both are terrible:

Option                                   Cost
Pass the full text in the next prompt    50,000 tokens × $0.003/1K = $0.15 per handoff
Truncate it                              You lose context. Agent B works blind.

At scale (hundreds of agents, thousands of handoffs per day) you are paying a token tax on data transit, not intelligence. This is waste, not compute.


The Solution: Token Cost Arbitrage

ContextRelay replaces token-expensive data blobs with sub-100ms URL pointers.

Without ContextRelay:         With ContextRelay:
─────────────────────         ────────────────────────────────────
Agent A → [50KB JSON]         Agent A → POST /push → [UUID url]
           ↓                                            ↓
        Agent B               Agent B → GET /pull/<id> → [50KB JSON]
        (50K tokens burned)            (73ms, ~0 tokens)

The math: the pointer URL is ~80 characters, effectively zero tokens, regardless of payload size. Passing the 50,000-token spec above inline costs $0.15 per handoff, so at 1,000 agent handoffs/day that's ~$150/day saved. (A 50 KB payload is roughly 12,500 tokens; even that smaller handoff costs ~$0.04 inline.)
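The arithmetic behind these figures can be sanity-checked in a few lines. The ~4 characters/token ratio is the usual rough estimate, and the $0.003/1K price is the one used above:

```python
# Back-of-the-envelope token arbitrage: pass a pointer instead of a payload.
# Assumes ~4 characters per token and $0.003 per 1K input tokens, as above.
CHARS_PER_TOKEN = 4
PRICE_PER_1K_TOKENS = 0.003

def handoff_cost(payload_chars: int) -> float:
    """Dollar cost of passing a payload inline in the next prompt."""
    tokens = payload_chars / CHARS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS

inline = handoff_cost(50_000 * CHARS_PER_TOKEN)   # the 50,000-token spec
pointer = handoff_cost(80)                        # an ~80-char ContextRelay URL

print(f"inline:  ${inline:.2f} per handoff")      # $0.15
print(f"pointer: ${pointer:.5f} per handoff")     # effectively $0
print(f"saved at 1,000 handoffs/day: ${(inline - pointer) * 1000:.0f}")
```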

ContextRelay runs on Cloudflare Workers: globally distributed V8 isolates with sub-millisecond cold starts. Your context lives at the edge, milliseconds from wherever your agents are running.


Quickstart โ€” MCP Users (Claude Desktop, Cursor)

Install the server:

pip install contextrelay-mcp

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "contextrelay": {
      "command": "contextrelay-mcp",
      "env": {
        "CONTEXTRELAY_URL": "https://contextrelay.your-account.workers.dev"
      }
    }
  }
}

Restart Claude Desktop. You now have two native tools:

  • push_context: offload any large payload, get back a URL
  • pull_context: retrieve any payload from a workers.dev/pull/ URL

Claude will call these automatically when handling large context handoffs.


Quickstart โ€” Python SDK

pip install contextrelay-mcp

from contextrelay import ContextRelay

hub = ContextRelay("https://contextrelay.your-account.workers.dev")

# Agent A: offload 50KB of context, hand off a URL
url = hub.push(large_json_string)
print(url)  # https://...workers.dev/pull/3f7a2b...

# Agent B: retrieve the full payload in one call
data = hub.pull(url)

Five lines. No infrastructure. No token waste.


Self-Hosting the Edge API

ContextRelay is fully self-hostable. You own your data.

  • Clone & deploy in 3 commands:

    git clone https://github.com/cmhashim/contextrelay
    cd contextrelay/api && npm install
    wrangler deploy
    
  • Create your KV namespace (Cloudflare stores the payloads):

    wrangler kv namespace create CONTEXT_KV
    # Copy the returned ID into wrangler.toml
    
  • Set your worker URL in the SDK or MCP server:

    export CONTEXTRELAY_URL="https://contextrelay.your-account.workers.dev"
    

Free Cloudflare tier covers 100,000 Worker requests/day and 1GB KV storage.
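A rough capacity check against that free tier, assuming one push plus one pull per handoff and the ~50 KB payloads used in the examples above:

```python
# Rough free-tier capacity check. Assumptions: 1 handoff = 1 push + 1 pull,
# ~50 KB per payload, and the default 24h TTL, per the numbers in this README.
FREE_REQUESTS_PER_DAY = 100_000
FREE_KV_BYTES = 1 * 1024**3          # 1 GB of KV storage
PAYLOAD_BYTES = 50 * 1024            # ~50 KB per context payload

handoffs_per_day = FREE_REQUESTS_PER_DAY // 2
payloads_in_storage = FREE_KV_BYTES // PAYLOAD_BYTES

print(handoffs_per_day)       # 50000 handoffs/day fit in the request quota
print(payloads_in_storage)    # ~21k live 50 KB payloads fit within the TTL window
```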


Architecture

Your Agent
    │
    │  POST /push (payload)
    ▼
Cloudflare Worker  ←── globally distributed, <1ms cold start
    │
    │  KV.put(uuid, payload, ttl=86400)
    ▼
Cloudflare KV  ←── edge-replicated, 24hr TTL
    │
    │  returns { url: "https://.../pull/<uuid>" }
    ▼
Your Agent  ──→  passes URL to next agent (80 chars, ~0 tokens)
                       │
                       │  GET /pull/<uuid>
                       ▼
               Cloudflare Worker → KV.get(uuid) → payload

Benchmarks (live Cloudflare deployment):

Operation   Payload   Latency
push        125 KB    ~250 ms
pull        125 KB    ~75 ms
pull        220 KB    ~100 ms

Pub/Sub Signaling (no polling)

Multi-agent orchestration usually means Agent B polling, or your framework wiring up a callback by hand. ContextRelay ships a WebSocket signaling layer so Agent B can subscribe to a channel and get the pointer URL the millisecond Agent A pushes it.

from contextrelay import ContextRelay
import threading

hub = ContextRelay("https://contextrelay.your-account.workers.dev")

# --- Agent B, in a background thread ---
def on_context_ready(url):
    payload = hub.pull(url)
    print(f"Agent B got {len(payload)} chars: {payload[:80]}...")

threading.Thread(
    target=hub.subscribe, args=("project_x", on_context_ready), daemon=True
).start()

# --- Agent A, some time later ---
hub.push(huge_spec, channel="project_x")
# Agent B's callback fires within ~20 ms of the push returning.

Under the hood: each channel is a Cloudflare Durable Object using Hibernatable WebSockets. Idle channels pay zero CPU/memory; fan-out is in-memory on the same DO instance. The Python SDK auto-reconnects with exponential backoff and a 30-second ping keepalive, so drops are transparent.
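The reconnect behaviour can be sketched as a standard exponential-backoff schedule. The base delay, cap, and jitter below are illustrative assumptions, not the SDK's actual constants:

```python
import random

# Sketch of a reconnect schedule: exponential backoff with a cap and jitter.
# base/cap/jitter values are illustrative, not ContextRelay's real constants.
def backoff_delays(base=0.5, cap=30.0, attempts=8, jitter=0.1):
    """Yield the wait (in seconds) before each reconnect attempt."""
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)     # 0.5s, 1s, 2s, ... capped
        yield delay + random.uniform(0, jitter * delay)

for i, d in enumerate(backoff_delays(), 1):
    print(f"attempt {i}: wait ~{d:.1f}s")
```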


Metadata & Peek (decide before you download)

Before pulling a 200 KB payload into a context window, an agent should be able to peek at what it is. ContextRelay lets the producer attach a small plaintext metadata header on push, and any agent can read it with one lightweight call:

url = hub.push(
    big_payload,
    metadata={"summary": "Database schema for orders service",
              "size_kb": 80, "type": "sql"},
)

# Agent B, on receiving url:
hub.peek(url)
# โ†’ {"summary": "Database schema for orders service", "size_kb": 80, "type": "sql"}

# Agent decides to pull only if it matches the task.
hub.pull(url)

Route: GET /peek/:id returns only the metadata object (not the payload). Server-side JSON parse means the heavy data field never touches the wire. It works even when the payload is encrypted: metadata stays plaintext by design, so peek needs no key.

The MCP server exposes peek_context(url) with guidance to LLM clients to call it first on any ContextRelay URL, so agents stop burning tokens on pulls they didn't need.
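On the consumer side, a peek-before-pull guard might look like the sketch below. The size threshold and keyword heuristic are illustrative assumptions; only the metadata shape comes from the example above:

```python
# Illustrative consumer-side guard: inspect metadata before deciding to pull.
# The 150 KB threshold and keyword match are assumptions, not SDK behaviour.
def should_pull(meta: dict, task_keywords: set, max_kb: int = 150) -> bool:
    """Pull only if the payload is relevant to the task and not oversized."""
    if meta.get("size_kb", 0) > max_kb:
        return False
    summary = meta.get("summary", "").lower()
    return any(kw.lower() in summary for kw in task_keywords)

meta = {"summary": "Database schema for orders service", "size_kb": 80, "type": "sql"}
print(should_pull(meta, {"schema", "orders"}))   # True
print(should_pull(meta, {"frontend"}))           # False

# In an agent loop, wired to the SDK calls above:
#   meta = hub.peek(url)
#   payload = hub.pull(url) if should_pull(meta, task_keywords) else None
```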


End-to-End Encryption (opt-in)

Passing secrets (API keys, PII, proprietary code) through a third-party edge means trusting that edge. ContextRelay's opt-in E2EE removes the trust assumption. Encryption runs entirely client-side; Cloudflare sees only ciphertext.

hub = ContextRelay("https://contextrelay.your-account.workers.dev")

url = hub.push(secret_payload, encrypted=True)
# url โ†’ https://.../pull/<uuid>#key=<fernet_key>

plaintext = hub.pull(url)   # โ†’ decrypted locally

How the key stays private:

Per RFC 3986, URL fragments (everything after #) are never transmitted to the server by HTTP clients. So when the SDK calls GET /pull/<uuid>, the #key=... portion is stripped locally and never leaves your machine. The Worker stores, and only ever sees, opaque Fernet ciphertext (gAAAAA...).
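You can see this fragment handling with nothing but the standard library. This is a sketch of what the SDK does locally, not its actual implementation:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, parse_qs

# Sketch of client-side key extraction: the #key=... fragment is parsed and
# removed locally, so the key never appears in the HTTP request.
def split_encrypted_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (fetch_url, key): the URL actually sent over the wire, and the
    locally-held Fernet key parsed out of the #key=... fragment (or None)."""
    parts = urlsplit(url)
    key = parse_qs(parts.fragment).get("key", [None])[0]
    fetch_url = parts._replace(fragment="").geturl()
    return fetch_url, key

url = "https://contextrelay.example.workers.dev/pull/3f7a2b#key=abc123"
fetch_url, key = split_encrypted_url(url)
print(fetch_url)  # https://contextrelay.example.workers.dev/pull/3f7a2b
print(key)        # abc123
```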

  • Cipher: Fernet (AES-128-CBC + HMAC-SHA256)
  • Key: fresh 256-bit URL-safe base64 key per upload
  • Errors: a wrong or missing key raises ValueError("Failed to decrypt: Invalid or missing key"), never a partial or corrupt payload

Roadmap

  • Phase 1: Core edge API + Python SDK + MCP server
  • Phase 2: WebSocket pub/sub (agents subscribe to context-ready events)
  • Phase 3: E2EE (Fernet, key in URL fragment; server never sees plaintext)
  • Phase 4: Metadata & peek (decide before you pull)
  • Phase 5: LangChain / CrewAI / AutoGen native integrations

License

MIT. See LICENSE.
