Governance proxy for every AI tool — semantic cache, token quotas, vendor routing, Chrome extension
AgentMesh
The governance proxy for every AI tool your team uses.
"Istio for AI — intercept, cache, and govern every LLM call across Claude Code, VS Code Copilot, ChatGPT, Gemini, and your own agents. One proxy, one policy, one bill."
What it does
AgentMesh sits between your engineers and every LLM API. It enforces token budgets, semantically caches repeated prompts, and routes calls to the cheapest capable model — without touching a single line of agent code.
```
Claude Code / VS Code Copilot / Cursor
ChatGPT web / Claude.ai / Gemini web      ──►  AgentMesh Proxy  ──►  Anthropic
Your LangGraph / CrewAI / AutoGen agents                             OpenAI
                                                                     Google
```
It catches everything — not just the agents you wrote, but also the AI tools your engineers use every day in their browsers.
Benchmark — real numbers, demo mode, no API keys needed
```bash
pip install agentmesh-proxy sentence-transformers
python examples/benchmark.py
```
20 requests across 5 topic clusters, each cluster with 4 phrasings (persona prefix, markdown, British spelling, plain):
| Metric | Result |
|---|---|
| Total requests | 20 |
| Exact cache hits | 2 (10%) |
| Semantic cache hits | 15 (75%) |
| Total misses | 3 (15%) |
| Cost WITHOUT AgentMesh | $0.0030 ($3/M token baseline) |
| Cost WITH AgentMesh | $0.0008 |
| Savings | $0.0023 (75%) |
| Effective cost/request | $0.00004 |
85% of requests never reached the LLM. The 3 misses are the cold-start first call per cluster.
The problem it solves
- Uber burned through their entire 2026 AI budget in 4 months
- Amazon shut down an internal AI leaderboard because engineers ran pointless loops to inflate scores ("tokenmaxxing")
- A single recursive agent loop, undetected, can generate a $47,000 API bill
- Only 38% of enterprises have end-to-end AI cost monitoring (Cloud Security Alliance, 2026)
The root cause: every AI tool — Claude Code, GitHub Copilot, ChatGPT, your custom agents — talks to LLM APIs independently, with no shared governance layer. AgentMesh is that layer.
Three ways to use it
1. Proxy mode — zero code changes, covers everything
```bash
pip install agentmesh-proxy
agentmesh serve --port 8080 --demo
```
Point any tool at localhost:8080:
```bash
# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8080

# VS Code Copilot / Cursor / any OpenAI SDK tool
export OPENAI_BASE_URL=http://localhost:8080/v1

# curl test
curl http://localhost:8080/v1/messages \
  -H "x-api-key: any" \
  -H "content-type: application/json" \
  -H "X-AgentMesh-Team: engineering" \
  -d '{"model":"claude-haiku-4-5","max_tokens":512,"messages":[{"role":"user","content":"Review this code..."}]}'
```
Every response includes governance headers:
```
X-AgentMesh-Cache: hit            # exact | semantic | miss
X-AgentMesh-Tokens: 0             # 0 on cache hit
X-AgentMesh-Cost-USD: 0.000000    # $0 on cache hit
X-AgentMesh-Quota-Pct: 23%        # team budget consumed
X-AgentMesh-Vendor: anthropic
X-AgentMesh-Model: claude-haiku-4-5
```
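Clients that want to log these values can turn the headers into typed fields. The helper below is a hypothetical sketch, not part of the `agentmesh` package: `GovernanceInfo` and `parse_governance_headers` are illustrative names, and the parsing assumes only the header names and value formats shown above (a `%`-suffixed quota, a decimal cost string).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GovernanceInfo:
    """Typed view of X-AgentMesh-* headers (hypothetical helper, not agentmesh API)."""
    cache: str
    tokens: int
    cost_usd: float
    quota_pct: Optional[float]
    vendor: Optional[str]
    model: Optional[str]


def parse_governance_headers(headers: Mapping[str, str]) -> GovernanceInfo:
    # HTTP header names are case-insensitive, so normalise the keys first.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    quota = h.get("x-agentmesh-quota-pct")
    return GovernanceInfo(
        cache=h.get("x-agentmesh-cache", "miss"),  # assume "miss" if absent
        tokens=int(h.get("x-agentmesh-tokens", "0")),
        cost_usd=float(h.get("x-agentmesh-cost-usd", "0")),
        # "23%" -> 23.0; None when the header is missing
        quota_pct=float(quota.rstrip("%")) if quota else None,
        vendor=h.get("x-agentmesh-vendor"),
        model=h.get("x-agentmesh-model"),
    )


info = parse_governance_headers({
    "X-AgentMesh-Cache": "hit",
    "X-AgentMesh-Tokens": "0",
    "X-AgentMesh-Cost-USD": "0.000000",
    "X-AgentMesh-Quota-Pct": "23%",
    "X-AgentMesh-Vendor": "anthropic",
    "X-AgentMesh-Model": "claude-haiku-4-5",
})
print(info.cache, info.tokens, info.quota_pct)
```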
2. Chrome Extension — governance for ChatGPT, Claude.ai, Gemini
The extension intercepts prompts typed into web AI tools before they hit the LLM. It shows a governance overlay on every submission, checks the semantic cache, and displays per-session savings in a popup.
Load the extension:
- Clone this repo: `git clone https://github.com/anilatambharii/agentmesh`
- Generate icons: `cd agentmesh-extension && python generate_icons.py`
- Open `chrome://extensions` → Enable Developer Mode → Load Unpacked → select `agentmesh-extension/`
- Click the AgentMesh popup → set Port to match your running proxy
What it catches:
- `chat.openai.com` / `chatgpt.com`: content script intercepts the input box
- `claude.ai`: content script intercepts the input box
- `gemini.google.com`: content script intercepts the input box
- `api.anthropic.com` / `api.openai.com`: `declarativeNetRequest` silently redirects API calls made from the browser to the proxy
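Chrome's `declarativeNetRequest` API expresses this kind of redirect as a rule whose `redirect.transform` swaps the scheme, host, and port while keeping the path. The sketch below generates such a rule set as JSON; it is an illustration under assumptions (proxy on `localhost:8080`, only `xmlhttprequest` traffic matched, rule IDs numbered from 1, `build_redirect_rules` is a hypothetical helper), not the extension's actual rule file.

```python
import json

PROXY_HOST = "localhost"  # assumption: proxy runs locally
PROXY_PORT = "8080"       # URLTransform.port is a string in the DNR API
API_HOSTS = ["api.anthropic.com", "api.openai.com"]


def build_redirect_rules(hosts, host=PROXY_HOST, port=PROXY_PORT):
    """Hypothetical helper: DNR rules that send browser API calls to the proxy."""
    rules = []
    for rule_id, domain in enumerate(hosts, start=1):  # DNR rule IDs must be >= 1
        rules.append({
            "id": rule_id,
            "priority": 1,
            "action": {
                "type": "redirect",
                # transform keeps path and query; only scheme/host/port change
                "redirect": {"transform": {"scheme": "http", "host": host, "port": port}},
            },
            "condition": {
                "requestDomains": [domain],
                "resourceTypes": ["xmlhttprequest"],  # fetch() and XHR requests
            },
        })
    return rules


rules = build_redirect_rules(API_HOSTS)

if __name__ == "__main__":
    print(json.dumps(rules, indent=2))
```

Using `transform` rather than a fixed redirect URL means `/v1/messages` on the vendor host lands on `/v1/messages` at the proxy, which matches the paths the proxy already serves.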
How the overlay works:
```
┌─────────────────────────────────────────┐
│  AgentMesh Governance                   │
│                                         │
│  [●] Cache HIT — saved 847 tokens       │
│  Team: engineering    Quota: 23%        │
│                                         │
│  [Send original]  [Cancel]              │
└─────────────────────────────────────────┘
```
Popup stats (persist across Chrome restarts):
```
AgentMesh Connected
3 Prompts
2 Cache Hits
87 Tokens Saved
$0.002 Cost Saved
```
3. SDK mode — wrap existing agents
```python
from agentmesh import AgentMesh
from agentmesh.policy.engine import Policy

# YAML is indentation-sensitive: keep the nesting below intact
policy = Policy.from_yaml("""
policies:
  - name: engineering-team
    budget:
      daily_tokens: 1_000_000
      monthly_usd: 3_000
      hard_stop: true
    circuit_breaker:
      max_iterations: 25
    compliance:
      frameworks: [eu-ai-act, soc2]
""")

mesh = AgentMesh(policy=policy)

# Zero changes to agent code: wrap the objects you already have
governed_graph = mesh.wrap_langgraph(your_langgraph_graph)
governed_crew = mesh.wrap_crewai(your_crew)
governed_agent = mesh.wrap_openai_agent(your_openai_agent)
```
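The `circuit_breaker.max_iterations` setting targets runaway loops like the recursive agent bill mentioned earlier. A minimal sketch of the idea, assuming a per-run counter that refuses further LLM calls once a run exceeds the limit; `IterationCircuitBreaker` and `CircuitOpenError` are hypothetical names, not the `agentmesh` API.

```python
class CircuitOpenError(RuntimeError):
    """Hypothetical error: raised when a run exceeds its iteration budget."""


class IterationCircuitBreaker:
    """Hypothetical sketch: counts LLM calls per agent run, trips after max_iterations."""

    def __init__(self, max_iterations: int = 25):
        self.max_iterations = max_iterations
        self._counts: dict[str, int] = {}

    def before_call(self, run_id: str) -> None:
        # Called before each LLM call; refuse the call once the limit is exceeded.
        count = self._counts.get(run_id, 0) + 1
        if count > self.max_iterations:
            raise CircuitOpenError(
                f"run {run_id!r} exceeded {self.max_iterations} iterations"
            )
        self._counts[run_id] = count

    def reset(self, run_id: str) -> None:
        # Start a fresh budget, e.g. when a human restarts the run.
        self._counts.pop(run_id, None)


breaker = IterationCircuitBreaker(max_iterations=25)
calls = 0
try:
    while True:  # simulate an agent stuck in a loop
        breaker.before_call("run-1")
        calls += 1
except CircuitOpenError as err:
    print(calls, err)
```

Tripping before the call, not after, is what keeps the 26th request from ever reaching a vendor.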
Three-layer cache
Every prompt passes through three cache layers before touching an LLM:
| Layer | Match | Cost |
|---|---|---|
| 1. Exact match | SHA-256 of normalised prompt | 0 tokens, instant |
| 2. Semantic match | sentence-transformers cosine similarity | 0 tokens, ~5ms |
| 3. Vendor cache | Anthropic `cache_control` | 10% of input cost |
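The lookup order for the first two layers can be sketched as below. This is a simplified illustration, not AgentMesh's implementation: `TwoLayerCache` is a hypothetical class, `toy_embed` is a bag-of-words stand-in for a sentence-transformers model so the sketch runs without dependencies, normalisation is reduced to lowercasing plus whitespace collapse, and 0.70 is the cosine threshold quoted in the release notes. Layer 3 applies at the vendor call itself, so it is not modelled here.

```python
import hashlib
import math
from collections import Counter
from typing import Callable, Optional

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    # Stand-in for a real sentence embedding: bag-of-words counts.
    return dict(Counter(text.split()))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TwoLayerCache:
    """Hypothetical sketch: exact SHA-256 layer, then semantic cosine layer."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed, threshold: float = 0.70):
        self.embed = embed
        self.threshold = threshold
        self.exact: dict[str, str] = {}                # SHA-256 key -> response
        self.semantic: list[tuple[Vector, str]] = []  # (embedding, response)

    @staticmethod
    def normalise(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> tuple[str, Optional[str]]:
        text = self.normalise(prompt)
        # Layer 1: exact match on the hash of the normalised prompt
        hit = self.exact.get(self._key(text))
        if hit is not None:
            return "exact", hit
        # Layer 2: best cosine match, accepted only at or above the threshold
        vec = self.embed(text)
        best_score, best = 0.0, None
        for cached_vec, response in self.semantic:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best = score, response
        if best is not None and best_score >= self.threshold:
            return "semantic", best
        return "miss", None  # caller forwards to the LLM

    def put(self, prompt: str, response: str) -> None:
        text = self.normalise(prompt)
        self.exact[self._key(text)] = response
        self.semantic.append((self.embed(text), response))


cache = TwoLayerCache()
cache.put("Review this microservices design", "LGTM")
print(cache.get("review  this MICROSERVICES design"))
print(cache.get("please review this microservices design"))
print(cache.get("Write a Fibonacci function in Python"))
```

The linear scan over stored embeddings keeps the sketch short; a production cache would use a vector index instead.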
Layer 2 catches prompts that mean the same thing but are worded differently:
| Original | Rephrased | Similarity | Result |
|---|---|---|---|
| Review this microservices design... | You are a senior architect. Review... | 0.99 | HIT: persona stripped |
| Review this microservices design... | Analyse this distributed system... | 0.85 | HIT: British spelling normalised |
| Review this microservices design... | ``**Review** this `microservices` design...`` | 0.97 | HIT: markdown stripped |
| Review this microservices design... | Review this distributed system: orders calls payments via REST... | 0.70 | HIT: semantic match |
| Review this microservices design... | Write a Fibonacci function in Python | -0.05 | MISS: correctly different |
Normalisation pipeline (applied before hashing and embedding):
- Persona prefix strip: `"You are a senior SWE."` removed
- Filler word strip: `"Please can you"` removed
- Markdown strip: `**bold**`, `# headers`, `` `code` `` removed
- Date normalisation: `"June 13 2026"` → `"2026-06-13"`
- Number normalisation: `"1,000,000"` → `"1000000"`
- British→American spelling: `"optimise"` → `"optimize"`
- Code argument canonicalisation: `login(user, pwd)` ≡ `login(username, password)`
- Lowercase + whitespace collapse
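Most of these steps are plain text rewrites. The sketch below implements a subset with regular expressions (persona prefix, filler words, markdown, thousands separators, a three-word British→American table, lowercase and whitespace); the patterns and word lists are illustrative assumptions, not AgentMesh's rules. Date normalisation and code-argument canonicalisation are left out because they need real parsers.

```python
import re

# Illustrative patterns only; AgentMesh's actual rules are not documented here.
PERSONA_RE = re.compile(r"^\s*you are an? [^.]*\.\s*", re.IGNORECASE)
FILLER_RE = re.compile(
    r"\b(?:please can you|could you please|can you|please)\b\s*", re.IGNORECASE
)
BOLD_RE = re.compile(r"\*\*(.+?)\*\*")
HEADER_RE = re.compile(r"^#+\s*", re.MULTILINE)
CODE_RE = re.compile(r"`([^`]*)`")
THOUSANDS_RE = re.compile(r"(?<=\d),(?=\d{3}\b)")
BRITISH_TO_AMERICAN = {"optimise": "optimize", "analyse": "analyze", "normalise": "normalize"}
SPELLING_RE = re.compile(r"\b(" + "|".join(BRITISH_TO_AMERICAN) + r")\b")


def normalise(prompt: str) -> str:
    text = PERSONA_RE.sub("", prompt)   # "You are a senior SWE. " -> ""
    text = FILLER_RE.sub("", text)      # "Please can you " -> ""
    text = BOLD_RE.sub(r"\1", text)     # **bold** -> bold
    text = HEADER_RE.sub("", text)      # "# Title" -> "Title"
    text = CODE_RE.sub(r"\1", text)     # `code` -> code
    text = THOUSANDS_RE.sub("", text)   # 1,000,000 -> 1000000
    text = text.lower()
    text = SPELLING_RE.sub(lambda m: BRITISH_TO_AMERICAN[m.group(1)], text)
    return " ".join(text.split())       # collapse all whitespace runs


print(normalise("You are a senior SWE. Please can you **optimise** this `query` for 1,000,000 rows?"))
```

Because both the exact hash and the embedding are computed from this output, two prompts that normalise to the same string always produce a Layer 1 hit.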
Token quota governance
Per-team, per-user, per-tool limits with pre-call blocking and real-time observability.
```
# Start proxy with team limits
agentmesh serve --port 8080 \
  --team-limit engineering=2000000 \
  --team-limit sales=500000 \
  --warn-at 80 \
  --hard-stop-at 100

# Request from a team at 85% quota
X-AgentMesh-Quota-Pct: 85%
X-AgentMesh-Quota-Warn: Quota warning: team 'engineering' at 85% (300,000 tokens remaining)

# Request from a team at 102% quota → 429
HTTP 429
{"error": {"type": "quota_exceeded", "message": "Quota exceeded: team 'engineering' used 2,040,000/2,000,000 tokens"}}
```
New in this release:
- Pre-call blocking — blocked before the LLM call using estimated token count, not after
- Global vs team conflict resolution — all quota dimensions checked; most restrictive wins
- Temp grant expiry — emergency escalation grants expire after 24h (configurable)
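Pre-call blocking with "most restrictive wins" can be sketched as: estimate the request's tokens, project usage for every limit that applies (global, team, user, tool), and act on the worst projection. The `Limit` class, the ~4-characters-per-token estimate, and the way a temporary grant raises a limit until it expires are assumptions for illustration; the 80% and 100% thresholds mirror the `--warn-at` and `--hard-stop-at` flags above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Limit:
    """Hypothetical quota dimension, e.g. "global" or "team:engineering"."""
    name: str
    limit_tokens: int
    used_tokens: int
    grant_tokens: int = 0                      # temporary escalation grant
    grant_expires: Optional[datetime] = None   # grant ignored after this time

    def effective_limit(self, now: datetime) -> int:
        if self.grant_expires is not None and now < self.grant_expires:
            return self.limit_tokens + self.grant_tokens
        return self.limit_tokens


def estimate_tokens(prompt: str, max_tokens: int) -> int:
    # Crude pre-call estimate: ~4 characters per input token plus the output budget.
    return len(prompt) // 4 + max_tokens


def check_quota(limits, prompt, max_tokens, warn_at=80.0, hard_stop_at=100.0, now=None):
    """Return (status, projected_pct, limit_name) for the most restrictive limit."""
    now = now or datetime.now(timezone.utc)
    estimate = estimate_tokens(prompt, max_tokens)

    def projected_pct(lim: Limit) -> float:
        return 100.0 * (lim.used_tokens + estimate) / lim.effective_limit(now)

    worst = max(limits, key=projected_pct)  # most restrictive dimension wins
    pct = projected_pct(worst)
    if pct > hard_stop_at:
        return "block", pct, worst.name     # proxy would reply 429 without calling the LLM
    if pct >= warn_at:
        return "warn", pct, worst.name
    return "allow", pct, worst.name


now = datetime(2026, 6, 13, tzinfo=timezone.utc)
limits = [
    Limit("global", 10_000_000, 4_000_000),
    Limit("team:engineering", 2_000_000, 1_700_000),
]
print(check_quota(limits, "Review this code...", max_tokens=512, now=now))

sales = [Limit("team:sales", 500_000, 510_000, grant_tokens=100_000,
               grant_expires=now + timedelta(hours=24))]
print(check_quota(sales, "hi", 0, now=now)[0])                        # grant active
print(check_quota(sales, "hi", 0, now=now + timedelta(hours=25))[0])  # grant expired
```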
Architecture
```
Engineers                          Business users
──────────────────────────         ──────────────────────────────────────
Claude Code (terminal)             ChatGPT web  ──► Chrome Extension
VS Code Copilot (IDE)              Claude.ai    ──► Chrome Extension
Cursor (IDE)                       Gemini web   ──► Chrome Extension
Your agents (LangGraph etc.)       ──────────────────────────────────────
            │                                       │
            │ ANTHROPIC_BASE_URL                    │ declarativeNetRequest
            │ = http://localhost:8080               │ api.anthropic.com ──► localhost:8080
            │                                       │ api.openai.com    ──► localhost:8080
            └───────────────────┬───────────────────┘
                                │
                      ┌─────────▼──────────┐
                      │  AgentMesh Proxy   │
                      │                    │
                      │ 1. Exact cache     │ SHA-256 → 0 tokens
                      │ 2. Quota check     │ pre-call estimation
                      │ 3. Compression     │ budget < 30%
                      │ 4. Dry-run gate    │ require_approval mode
                      │ 5. Vendor route    │ cheapest capable model
                      │ 6. Audit log       │ Ed25519 tamper-evident
                      │ 7. LLM call        │ Anthropic cache_control
                      │ 8. Cache store     │ semantic + exact
                      │ 9. Cost calc       │ per-team attribution
                      └─────────┬──────────┘
                                │
                  ┌─────────────┼──────────────┐
                  ▼             ▼              ▼
              Anthropic      OpenAI         Google
           (Haiku/Sonnet) (GPT-4o-mini) (Gemini Flash)
```
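Step 5, "cheapest capable model", reduces to filtering candidates by a capability requirement and taking the lowest estimated cost. In the sketch below the vendors, model names, capability tiers, and per-million-token prices are all placeholders invented for illustration, not real pricing, and the integer capability scale is a hypothetical stand-in for whatever signal AgentMesh actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    vendor: str
    model: str
    capability: int          # hypothetical tier: higher handles harder tasks
    usd_per_m_input: float   # placeholder price per million input tokens
    usd_per_m_output: float  # placeholder price per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.usd_per_m_input
                + output_tokens * self.usd_per_m_output) / 1_000_000


# Placeholder catalogue: names and prices are invented for illustration.
CANDIDATES = [
    ModelOption("vendor-a", "small", 1, 0.10, 0.40),
    ModelOption("vendor-b", "medium", 2, 0.80, 4.00),
    ModelOption("vendor-c", "large", 3, 3.00, 15.00),
]


def route(required_capability: int, input_tokens: int, output_tokens: int,
          candidates=CANDIDATES) -> ModelOption:
    """Pick the cheapest model whose capability meets the requirement."""
    capable = [m for m in candidates if m.capability >= required_capability]
    if not capable:
        raise ValueError(f"no model with capability >= {required_capability}")
    return min(capable, key=lambda m: m.cost(input_tokens, output_tokens))


choice = route(2, input_tokens=1_000, output_tokens=500)
print(choice.vendor, choice.model, f"${choice.cost(1_000, 500):.4f}")
```

Costing with both input and output prices matters: a model with cheap input but expensive output can lose for long completions.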
Observability dashboard
```bash
agentmesh observe --port 7861   # SSE event stream
```
Or start everything together:
```bash
agentmesh serve --port 8080 --demo --observe
# Opens: http://localhost:7860  (Gradio dashboard)
#        http://localhost:7861  (SSE stream)
#        http://localhost:8080  (proxy)
```
Events streamed in real time:
```
{"kind": "cache_hit", "team": "engineering", "tokens_saved": 847}
{"kind": "cache_miss", "team": "engineering", "model": "claude-haiku-4-5"}
{"kind": "quota_warn", "team": "engineering", "quota_pct": 0.85}
{"kind": "quota_block", "team": "sales", "quota_pct": 1.02}
{"kind": "llm_call", "vendor": "anthropic", "tokens": 1234, "cost_usd": 0.000185}
```
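A consumer of the stream on port 7861 could tally savings per team. This sketch assumes standard Server-Sent Events framing (`data: <json>` lines, events separated by blank lines) and the event fields shown above; `parse_sse` and `summarise` are hypothetical helpers, and they parse a captured text sample instead of opening a live connection so the example runs without the proxy.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def parse_sse(text: str) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event in a captured stream."""
    data_lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("data:"):
            value = line[5:]
            # Per the SSE format, one space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            # A blank line ends the event; multi-line data is joined with "\n".
            yield json.loads("\n".join(data_lines))
            data_lines = []
    if data_lines:  # tolerate a capture that ends without a trailing blank line
        yield json.loads("\n".join(data_lines))


def summarise(events: Iterable[dict]) -> tuple[dict[str, int], set[str]]:
    """Total tokens saved per team, plus the teams that hit a quota block."""
    saved: dict[str, int] = defaultdict(int)
    blocked: set[str] = set()
    for event in events:
        if event.get("kind") == "cache_hit":
            saved[event["team"]] += event.get("tokens_saved", 0)
        elif event.get("kind") == "quota_block":
            blocked.add(event["team"])
    return dict(saved), blocked


sample = (
    'data: {"kind": "cache_hit", "team": "engineering", "tokens_saved": 847}\n\n'
    'data: {"kind": "quota_block", "team": "sales", "quota_pct": 1.02}\n\n'
    'data: {"kind": "cache_hit", "team": "engineering", "tokens_saved": 153}\n\n'
)
saved, blocked = summarise(parse_sse(sample))
print(saved, blocked)
```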
Quickstart (60 seconds)
```bash
# 1. Install
pip install agentmesh-proxy sentence-transformers

# 2. Start proxy in demo mode (no API keys needed)
agentmesh serve --port 8080 --demo

# 3. Point Claude Code at it
export ANTHROPIC_BASE_URL=http://localhost:8080

# 4. Run the benchmark
python examples/benchmark.py
# → 85% cache hit rate, 75% cost reduction on 20 requests

# 5. Run the full test suite
python examples/test_extension_e2e.py
# → 13/13 PASS
```
Framework support
| Framework | Status |
|---|---|
| LangGraph | Full support |
| CrewAI | Full support |
| OpenAI Agents SDK | Full support |
| AutoGen v2 / AG2 | Full support |
| Pydantic AI | Full support |
| Haystack 2.x | Full support |
| Google ADK | Full support |
| NVIDIA NIM | Full support |
| Raw `anthropic` / `openai` SDK | Full support |
| Chrome extension (ChatGPT, Claude.ai, Gemini) | Full support |
| Microsoft Semantic Kernel | In progress (v0.3) |
What's new (June 2026)
- Chrome Extension: governance overlay for ChatGPT, Claude.ai, Gemini web
- sentence-transformers semantic cache: 384-dim embeddings replace character bigrams; catches paraphrased prompts at a 0.70 cosine threshold
- Anthropic prompt caching: `cache_control: ephemeral` wired into every system prompt (10% of normal input cost on cached reads)
- Streaming cache: streamed responses are now accumulated and cached after completion
- Pre-call quota blocking: requests are blocked before the LLM call using token estimation
- Normalisation pipeline: markdown and persona prefixes stripped, dates and British spelling normalised, all before cache key generation
- Stats persistence: Chrome extension stats survive service worker restarts
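The streaming-cache change means a streamed response is stored only after it has finished. A minimal sketch of that pattern, assuming the upstream stream is an iterator of text chunks and the cache is a plain dict keyed by a prompt hash; `cache_after_stream` is a hypothetical name, not an AgentMesh internal.

```python
from typing import Iterable, Iterator, MutableMapping


def cache_after_stream(key: str, chunks: Iterable[str],
                       cache: MutableMapping[str, str]) -> Iterator[str]:
    """Hypothetical sketch: pass chunks through; store the full text on completion."""
    parts: list[str] = []
    for chunk in chunks:
        parts.append(chunk)
        yield chunk  # the client still receives tokens as they arrive
    # Reached only when the stream finishes; if the client disconnects or the
    # upstream raises, this line never runs and nothing partial is cached.
    cache[key] = "".join(parts)


cache: dict[str, str] = {}
streamed = list(cache_after_stream("prompt-hash", iter(["Hel", "lo", "!"]), cache))
print(streamed, cache)
```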
Roadmap
- Redis cache backend (shared across proxy instances)
- VS Code extension (native IDE panel)
- SAML/SSO identity propagation for enterprise quota
- Slack/Teams bot intercept
- OpenTelemetry trace export
- Per-prompt cost alerts (Slack/PagerDuty webhook)
Contributing
See CONTRIBUTING.md. PRs welcome — especially Redis backend, VS Code extension, and additional vendor support.
License
Apache 2.0 — see LICENSE.
Built by Anil Prasad — open to feedback, collabs, and conversations about enterprise AI governance.
File details
Details for the file agentmesh_proxy-0.2.1.tar.gz.
File metadata
- Download URL: agentmesh_proxy-0.2.1.tar.gz
- Upload date:
- Size: 975.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f0d9d590a94ddc065327c6aeef930d1b65a6bc0d247eba3d97c1c4a79fa1059d` |
| MD5 | `9af8a2764252f13f1feb6ad93801f37e` |
| BLAKE2b-256 | `90823be39850b6f513edf87a5c8d83535dc727276fa53e293980895859161bee` |
File details
Details for the file agentmesh_proxy-0.2.1-py3-none-any.whl.
File metadata
- Download URL: agentmesh_proxy-0.2.1-py3-none-any.whl
- Upload date:
- Size: 99.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ea161b7f11cf6af3191991b13430202128af02aa71d45814435586db9b9c88a8` |
| MD5 | `2ca57757549b21048e0f7ce46fce5ef3` |
| BLAKE2b-256 | `9808866b61c145e4009a4aa64dd04b4ac03b8b28037d3cb3b10dc276c4848974` |