
Governance proxy for every AI tool — semantic cache, token quotas, vendor routing, Chrome extension


AgentMesh

The governance proxy for every AI tool your team uses.

"Istio for AI — intercept, cache, and govern every LLM call across Claude Code, VS Code Copilot, ChatGPT, Gemini, and your own agents. One proxy, one policy, one bill."

License: Apache 2.0 | Python 3.10+


AgentMesh demo — 85% cache hit rate, 75% cost reduction


What it does

AgentMesh sits between your engineers and every LLM API. It enforces token budgets, semantically caches repeated prompts, and routes calls to the cheapest capable model — without touching a single line of agent code.

Claude Code / VS Code Copilot / Cursor
ChatGPT web / Claude.ai / Gemini web          ──►  AgentMesh Proxy  ──►  Anthropic
Your LangGraph / CrewAI / AutoGen agents                                   OpenAI
                                                                           Google

It catches everything — not just the agents you wrote, but also the AI tools your engineers use every day in their browsers.


Benchmark — real numbers, demo mode, no API keys needed

pip install agentmesh-proxy sentence-transformers
python examples/benchmark.py

20 requests across 5 topic clusters, each cluster with 4 phrasings (persona prefix, markdown, British spelling, plain):

Total requests          20
Exact cache hits         2  (10%)
Semantic cache hits     15  (75%)
Total misses             3  (15%)

Cost WITHOUT AgentMesh  $0.0030  ($3/M token baseline)
Cost WITH AgentMesh     $0.0008
Savings                 $0.0023  (75%)
Effective cost/request  $0.00004

85% of requests never reached the LLM. The 3 misses are the cold-start first call per cluster.


The problem it solves

  • Uber burned through their entire 2026 AI budget in 4 months
  • Amazon shut down an internal AI leaderboard because engineers ran pointless loops to inflate scores ("tokenmaxxing")
  • A single recursive agent loop, undetected, can generate a $47,000 API bill
  • Only 38% of enterprises have end-to-end AI cost monitoring (Cloud Security Alliance, 2026)

The root cause: every AI tool — Claude Code, GitHub Copilot, ChatGPT, your custom agents — talks to LLM APIs independently, with no shared governance layer. AgentMesh is that layer.


Three ways to use it

1. Proxy mode — zero code changes, covers everything

pip install agentmesh-proxy
agentmesh serve --port 8080 --demo

Point any tool at localhost:8080:

# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8080

# VS Code Copilot / Cursor / any OpenAI SDK tool
export OPENAI_BASE_URL=http://localhost:8080/v1

# curl test
curl http://localhost:8080/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: any" \
  -H "X-AgentMesh-Team: engineering" \
  -d '{"model":"claude-haiku-4-5","max_tokens":512,"messages":[{"role":"user","content":"Review this code..."}]}'
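Scripts that don't use an SDK can send the same request with Python's standard library. This is a minimal sketch, assuming the proxy accepts the Anthropic Messages-style JSON body used in the curl example; `build_messages_request` is a hypothetical helper, not part of the `agentmesh` package. It only constructs the request, so nothing is sent until you pass it to `urllib.request.urlopen` with the proxy running.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:8080/v1/messages"  # proxy started with `agentmesh serve --port 8080`


def build_messages_request(prompt: str, team: str,
                           model: str = "claude-haiku-4-5") -> urllib.request.Request:
    """Build (but do not send) the same request as the curl example.

    Hypothetical helper for illustration only.
    """
    body = {
        "model": model,
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": "any",          # placeholder key, as in the curl example
            "X-AgentMesh-Team": team,    # team used for quota and cost attribution
        },
        method="POST",
    )


req = build_messages_request("Review this code...", team="engineering")
# With the proxy running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("X-AgentMesh-Cache"))
```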

Every response includes governance headers:

X-AgentMesh-Cache:     hit          # exact | semantic | miss
X-AgentMesh-Tokens:    0            # 0 on cache hit
X-AgentMesh-Cost-USD:  0.000000     # $0 on cache hit
X-AgentMesh-Quota-Pct: 23%          # team budget consumed
X-AgentMesh-Vendor:    anthropic
X-AgentMesh-Model:     claude-haiku-4-5
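A client or CI job can read these headers to track savings per call. A minimal sketch, assuming the value formats shown above (cost as a decimal string, quota as a percentage such as `23%`); `parse_governance_headers` and `Governance` are hypothetical names. It accepts any mapping of header names to values, for example `dict(resp.headers)` from a `urllib` response.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class Governance:
    cache: str        # cache outcome, e.g. "hit" or "miss"
    tokens: int       # 0 on a cache hit
    cost_usd: float   # 0.0 on a cache hit
    quota_pct: float  # team budget consumed, as a fraction (0.23 means 23%)
    vendor: str
    model: str


def parse_governance_headers(headers: Mapping[str, str]) -> Governance:
    """Turn X-AgentMesh-* response headers into typed values."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    return Governance(
        cache=h.get("x-agentmesh-cache", "miss"),
        tokens=int(h.get("x-agentmesh-tokens", "0")),
        cost_usd=float(h.get("x-agentmesh-cost-usd", "0")),
        quota_pct=float(h.get("x-agentmesh-quota-pct", "0%").rstrip("%")) / 100,
        vendor=h.get("x-agentmesh-vendor", ""),
        model=h.get("x-agentmesh-model", ""),
    )


g = parse_governance_headers({
    "X-AgentMesh-Cache": "hit",
    "X-AgentMesh-Tokens": "0",
    "X-AgentMesh-Cost-USD": "0.000000",
    "X-AgentMesh-Quota-Pct": "23%",
    "X-AgentMesh-Vendor": "anthropic",
    "X-AgentMesh-Model": "claude-haiku-4-5",
})
print(g.cache, g.tokens, g.quota_pct)
```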

2. Chrome Extension — governance for ChatGPT, Claude.ai, Gemini

The extension intercepts prompts typed into web AI tools before they hit the LLM. It shows a governance overlay on every submission, checks the semantic cache, and displays per-session savings in a popup.

Load the extension:

  1. Clone this repo: git clone https://github.com/anilatambharii/agentmesh
  2. Generate icons: cd agentmesh/agentmesh-extension && python generate_icons.py
  3. Open chrome://extensions → Enable Developer Mode → Load Unpacked → select agentmesh-extension/
  4. Open the AgentMesh popup from the Chrome toolbar and set Port to match your running proxy

What it catches:

  • chat.openai.com / chatgpt.com — content script intercepts input box
  • claude.ai — content script intercepts input box
  • gemini.google.com — content script intercepts input box
  • api.anthropic.com / api.openai.com — declarativeNetRequest silently redirects API calls
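The API-call redirect relies on Chrome's `declarativeNetRequest` rules. The sketch below generates rules of that shape in Python: a `redirect` action whose `transform` swaps the scheme, host, and port while keeping the path, matched on `requestDomains`. The rule IDs, priority, resource type, and target port are assumptions for illustration; the rules the AgentMesh extension actually ships may differ.

```python
import json

PROXY_PORT = 8080  # should match the port set in the extension popup
API_HOSTS = ["api.anthropic.com", "api.openai.com"]


def redirect_rules(port: int = PROXY_PORT) -> list[dict]:
    """One rule per API host: https://<host>/<path> becomes http://localhost:<port>/<path>."""
    rules = []
    for rule_id, host in enumerate(API_HOSTS, start=1):  # rule IDs must be unique positive integers
        rules.append({
            "id": rule_id,
            "priority": 1,
            "action": {
                "type": "redirect",
                # A URL transform keeps path and query, replacing only scheme, host, and port.
                "redirect": {"transform": {"scheme": "http", "host": "localhost", "port": str(port)}},
            },
            "condition": {
                "requestDomains": [host],
                "resourceTypes": ["xmlhttprequest"],  # fetch()/XHR calls made by web pages
            },
        })
    return rules


rules_json = json.dumps(redirect_rules(), indent=2)
print(rules_json)
```

The resulting JSON would go in a static ruleset declared in the extension manifest, or be passed to `chrome.declarativeNetRequest.updateDynamicRules` when the popup's port setting changes.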

How the overlay works:

  ┌─────────────────────────────────────────┐
  │  AgentMesh Governance                   │
  │                                         │
  │  [●] Cache HIT — saved 847 tokens       │
  │  Team: engineering   Quota: 23%         │
  │                                         │
  │  [Send original]  [Cancel]              │
  └─────────────────────────────────────────┘

Popup stats (persist across Chrome restarts):

AgentMesh Connected
3  Prompts
2  Cache Hits
87 Tokens Saved
$0.002 Cost Saved

3. SDK mode — wrap existing agents

from agentmesh import AgentMesh
from agentmesh.policy.engine import Policy

policy = Policy.from_yaml("""
policies:
  - name: engineering-team
    budget:
      daily_tokens: 1_000_000
      monthly_usd: 3_000
      hard_stop: true
    circuit_breaker:
      max_iterations: 25
    compliance:
      frameworks: [eu-ai-act, soc2]
""")

mesh = AgentMesh(policy=policy)

# Zero changes to agent code
governed_graph = mesh.wrap_langgraph(your_langgraph_graph)
governed_crew  = mesh.wrap_crewai(your_crew)
governed_agent = mesh.wrap_openai_agent(your_openai_agent)
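The `circuit_breaker.max_iterations` setting targets the runaway-loop failure mode from "The problem it solves". The idea can be sketched as a per-run call counter that refuses further calls once the limit is passed; `IterationBreaker` and `CircuitOpen` are hypothetical names for illustration, not classes exported by `agentmesh`.

```python
class CircuitOpen(RuntimeError):
    """Raised when an agent run exceeds its iteration budget."""


class IterationBreaker:
    """Counts LLM calls per agent run and trips after max_iterations."""

    def __init__(self, max_iterations: int = 25) -> None:  # same default as the policy above
        self.max_iterations = max_iterations
        self.calls: dict[str, int] = {}

    def before_call(self, run_id: str) -> None:
        count = self.calls.get(run_id, 0) + 1
        if count > self.max_iterations:
            # Fail closed: block the call instead of letting a recursive loop keep spending.
            raise CircuitOpen(f"run {run_id!r} exceeded {self.max_iterations} iterations")
        self.calls[run_id] = count

    def reset(self, run_id: str) -> None:
        self.calls.pop(run_id, None)


breaker = IterationBreaker(max_iterations=3)
for _ in range(3):
    breaker.before_call("run-1")  # calls 1 to 3 are allowed
try:
    breaker.before_call("run-1")  # the 4th call trips the breaker
except CircuitOpen as exc:
    print(exc)  # run 'run-1' exceeded 3 iterations
```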

Three-layer cache

Every prompt passes through three cache layers before touching an LLM:

Layer 1 — Exact match      SHA-256 of normalised prompt    → 0 tokens, instant
Layer 2 — Semantic match   sentence-transformers cosine    → 0 tokens, ~5ms
Layer 3 — Vendor cache     Anthropic cache_control         → 10% of input cost

Layer 2 catches prompts that mean the same thing but are worded differently:

All rows are compared against the same original prompt, "Review this microservices design...":

Rephrased                                                          Similarity  Result
You are a senior architect. Review...                              0.99        HIT (persona stripped)
Analyse this distributed system...                                 0.85        HIT (British spelling normalised)
**Review** this `microservices` design...                          0.97        HIT (markdown stripped)
Review this distributed system: orders calls payments via REST...  0.70        HIT (semantic match)
Write a Fibonacci function in Python                               -0.05       MISS (correctly different)

Normalisation pipeline (applied before hashing and embedding):

  1. Persona prefix strip: "You are a senior SWE." removed
  2. Filler word strip: "Please can you" removed
  3. Markdown strip: **bold**, # headers, `code` removed
  4. Date normalisation: "June 13 2026" → "2026-06-13"
  5. Number normalisation: "1,000,000" → "1000000"
  6. British→American spelling: "optimise" → "optimize"
  7. Code argument canonicalisation: login(user, pwd) → login(username, password)
  8. Lowercase + whitespace collapse
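Several of these steps can be approximated with regular expressions. This is an illustrative subset, not the project's implementation: the persona and filler patterns and the two-word spelling map are assumptions chosen to match the examples in the list, and dates (step 4) and code arguments (step 7) are not handled. Lowercasing is done before the spelling lookup so the map only needs lowercase keys.

```python
import re

# Hypothetical, deliberately small rule sets; a real pipeline would need far larger ones.
PERSONA = re.compile(r"^\s*you are an? [^.]*\.\s*", re.IGNORECASE)                    # step 1
FILLER = re.compile(r"\b(please can you|could you please|please)\b\s*", re.IGNORECASE)  # step 2
BRITISH_TO_US = {"optimise": "optimize", "analyse": "analyze"}                          # step 6


def normalise(prompt: str) -> str:
    text = PERSONA.sub("", prompt)                            # 1. persona prefix
    text = FILLER.sub("", text)                               # 2. filler words
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)    # 3. markdown headers
    text = re.sub(r"(\*\*|__|`)", "", text)                   # 3. bold and inline-code markers
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)           # 5. "1,000,000" -> "1000000"
    text = text.lower()                                       # 8. lowercase (before spelling lookup)
    for gb, us in BRITISH_TO_US.items():                      # 6. British -> American
        text = re.sub(rf"\b{gb}", us, text)
    return re.sub(r"\s+", " ", text).strip()                  # 8. whitespace collapse


print(normalise("You are a senior SWE. Please can you **optimise** this `loop` for 1,000,000 rows?"))
# optimize this loop for 1000000 rows?
```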

Token quota governance

Per-team, per-user, per-tool limits with pre-call blocking and real-time observability.

# Start proxy with team limits
agentmesh serve --port 8080 \
  --team-limit engineering=2000000 \
  --team-limit sales=500000 \
  --warn-at 80 \
  --hard-stop-at 100

# Request from a team at 85% quota
X-AgentMesh-Quota-Pct:  85%
X-AgentMesh-Quota-Warn: Quota warning: team 'engineering' at 85% (300,000 tokens remaining)

# Request from a team at 102% quota → 429
HTTP 429
{"error": {"type": "quota_exceeded", "message": "Quota exceeded: team 'engineering' used 2,040,000/2,000,000 tokens"}}

New in this release:

  • Pre-call blocking — blocked before the LLM call using estimated token count, not after
  • Global vs team conflict resolution — all quota dimensions checked; most restrictive wins
  • Temp grant expiry — emergency escalation grants expire after 24h (configurable)

Architecture

Engineers                    Business users
──────────────────────────   ──────────────────────────────────────
Claude Code (terminal)       ChatGPT web  ──► Chrome Extension
VS Code Copilot (IDE)        Claude.ai    ──► Chrome Extension
Cursor (IDE)                 Gemini web   ──► Chrome Extension
Your agents (LangGraph etc.) ──────────────────────────────────────
         │                              │
         │  ANTHROPIC_BASE_URL          │  declarativeNetRequest
         │  = http://localhost:8080     │  api.anthropic.com ──► localhost:8080
         │                              │  api.openai.com   ──► localhost:8080
         └──────────────┬───────────────┘
                        │
              ┌─────────▼──────────┐
              │   AgentMesh Proxy  │
              │                    │
              │  1. Exact cache    │   SHA-256 → 0 tokens
              │  2. Quota check    │   pre-call estimation
              │  3. Compression    │   budget < 30%
              │  4. Dry-run gate   │   require_approval mode
              │  5. Vendor route   │   cheapest capable model
              │  6. Audit log      │   Ed25519 tamper-evident
              │  7. LLM call       │   Anthropic cache_control
              │  8. Cache store    │   semantic + exact
              │  9. Cost calc      │   per-team attribution
              └─────────┬──────────┘
                        │
          ┌─────────────┼──────────────┐
          ▼             ▼              ▼
     Anthropic       OpenAI         Google
     (Haiku/Sonnet)  (GPT-4o-mini)  (Gemini Flash)
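Step 6 mentions an Ed25519 tamper-evident audit log. Python's standard library has no Ed25519, so the sketch below substitutes a related technique, a SHA-256 hash chain, to show what tamper-evident means in practice: each entry commits to the previous entry's hash, so editing any record makes verification fail. Signing each entry (for example with Ed25519) would additionally prove who wrote it; that part is not shown. `AuditLog` is a hypothetical name.

```python
import hashlib
import json


class AuditLog:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        # Canonical JSON (sorted keys, fixed separators) so a record always hashes the same way.
        payload = prev + json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"record": record, "prev": prev, "hash": self._digest(prev, record)})

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append({"team": "engineering", "model": "claude-haiku-4-5", "tokens": 1234})
log.append({"team": "sales", "cache": "hit", "tokens": 0})
print(log.verify())                     # True
log.entries[0]["record"]["tokens"] = 1  # tamper with an earlier record
print(log.verify())                     # False
```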

Observability dashboard

agentmesh observe --port 7861   # SSE event stream

Or start everything together:

agentmesh serve --port 8080 --demo --observe
# Opens: http://localhost:7860  (Gradio dashboard)
#        http://localhost:7861  (SSE stream)
#        http://localhost:8080  (proxy)

Events streamed in real time:

{"kind": "cache_hit",   "team": "engineering", "tokens_saved": 847}
{"kind": "cache_miss",  "team": "engineering", "model": "claude-haiku-4-5"}
{"kind": "quota_warn",  "team": "engineering", "quota_pct": 0.85}
{"kind": "quota_block", "team": "sales",       "quota_pct": 1.02}
{"kind": "llm_call",    "vendor": "anthropic", "tokens": 1234, "cost_usd": 0.000185}
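Any Server-Sent Events client can consume the stream. The sketch below parses captured SSE text and totals `tokens_saved` per team from `cache_hit` events. It assumes each event is sent as a standard `data: <json>` line followed by a blank line, with payloads shaped like the examples above; the sample stream, including the `sales` figure, is made up, and `parse_sse` and `tokens_saved_by_team` are hypothetical helpers.

```python
import json
from collections import Counter


def parse_sse(text: str) -> list[dict]:
    """Parse Server-Sent Events text into one JSON payload per event."""
    events: list[dict] = []
    data_lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            # In the SSE format, a single space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:  # a blank line dispatches the pending event
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:  # tolerate a final event with no trailing blank line
        events.append(json.loads("\n".join(data_lines)))
    return events


def tokens_saved_by_team(events: list[dict]) -> Counter:
    saved: Counter = Counter()
    for ev in events:
        if ev.get("kind") == "cache_hit":
            saved[ev["team"]] += ev.get("tokens_saved", 0)
    return saved


stream = (
    'data: {"kind": "cache_hit", "team": "engineering", "tokens_saved": 847}\n\n'
    'data: {"kind": "cache_miss", "team": "engineering", "model": "claude-haiku-4-5"}\n\n'
    'data: {"kind": "cache_hit", "team": "sales", "tokens_saved": 120}\n\n'
)
print(tokens_saved_by_team(parse_sse(stream)))  # Counter({'engineering': 847, 'sales': 120})
```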

Quickstart (60 seconds)

# 1. Install
pip install agentmesh-proxy sentence-transformers

# 2. Start proxy in demo mode (no API keys needed)
agentmesh serve --port 8080 --demo

# 3. Point Claude Code at it
export ANTHROPIC_BASE_URL=http://localhost:8080

# 4. Run the benchmark
python examples/benchmark.py
# → 85% cache hit rate, 75% cost reduction on 20 requests

# 5. Run the end-to-end extension tests
python examples/test_extension_e2e.py
# → 13/13 PASS

Framework support

Framework                                      Status
LangGraph                                      Full support
CrewAI                                         Full support
OpenAI Agents SDK                              Full support
AutoGen v2 / AG2                               Full support
Pydantic AI                                    Full support
Haystack 2.x                                   Full support
Google ADK                                     Full support
NVIDIA NIM                                     Full support
Raw anthropic / openai SDK                     Full support
Chrome extension (ChatGPT, Claude.ai, Gemini)  Full support
Microsoft Semantic Kernel                      In progress (v0.3)

What's new (June 2026)

  • Chrome Extension: governance overlay for ChatGPT, Claude.ai, and Gemini web
  • sentence-transformers semantic cache: 384-dim embeddings replace character bigrams and catch paraphrased prompts at a 0.70 cosine-similarity threshold
  • Anthropic prompt caching: cache_control: {"type": "ephemeral"} is set on every system prompt (cached reads cost 10% of the normal input price)
  • Streaming cache: streamed responses are accumulated and cached once the stream completes
  • Pre-call quota blocking: requests are blocked before the LLM call, based on an estimated token count
  • Normalisation pipeline: markdown, dates, British spelling, and persona prefixes are normalised or stripped before the cache key is generated
  • Stats persistence: Chrome extension stats survive service worker restarts
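The prompt-caching item refers to Anthropic's `cache_control` field, which marks a prompt prefix (here the system prompt) as cacheable. Below is a minimal sketch of a Messages API body with that marker, plus the 10% cached-read arithmetic at the $3/M input baseline used in the benchmark. `with_cached_system` and `input_cost_usd` are hypothetical helpers, and whether the proxy builds bodies exactly this way is an assumption. The cost helper ignores the surcharge Anthropic applies to cache writes, and Anthropic only caches prefixes above a model-specific minimum length.

```python
import json


def with_cached_system(system_prompt: str, user_prompt: str,
                       model: str = "claude-haiku-4-5", max_tokens: int = 512) -> dict:
    """Build a Messages API body whose system prompt is marked for prompt caching."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is sent as a list of content blocks so it can carry cache_control.
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": user_prompt}],
    }


def input_cost_usd(tokens: int, usd_per_mtok: float, cached_read: bool) -> float:
    # Cached reads are billed at 10% of the normal input price; cache-write surcharges are ignored.
    rate = usd_per_mtok * (0.10 if cached_read else 1.0)
    return tokens * rate / 1_000_000


body = with_cached_system("You review Python code for bugs.", "Review this code...")
print(json.dumps(body["system"][0]["cache_control"]))  # {"type": "ephemeral"}
```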

Roadmap

  • Redis cache backend (shared across proxy instances)
  • VS Code extension (native IDE panel)
  • SAML/SSO identity propagation for enterprise quota
  • Slack/Teams bot intercept
  • OpenTelemetry trace export
  • Per-prompt cost alerts (Slack/PagerDuty webhook)

Contributing

See CONTRIBUTING.md. PRs welcome — especially Redis backend, VS Code extension, and additional vendor support.


License

Apache 2.0 — see LICENSE.


Built by Anil Prasad — open to feedback, collabs, and conversations about enterprise AI governance.



Download files


Source Distribution

agentmesh_proxy-0.2.1.tar.gz (975.5 kB)

Built Distribution


agentmesh_proxy-0.2.1-py3-none-any.whl (99.2 kB)

File details

Details for the file agentmesh_proxy-0.2.1.tar.gz.

File metadata

  • Download URL: agentmesh_proxy-0.2.1.tar.gz
  • Size: 975.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for agentmesh_proxy-0.2.1.tar.gz
Algorithm Hash digest
SHA256 f0d9d590a94ddc065327c6aeef930d1b65a6bc0d247eba3d97c1c4a79fa1059d
MD5 9af8a2764252f13f1feb6ad93801f37e
BLAKE2b-256 90823be39850b6f513edf87a5c8d83535dc727276fa53e293980895859161bee


File details

Details for the file agentmesh_proxy-0.2.1-py3-none-any.whl.

File hashes

Hashes for agentmesh_proxy-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ea161b7f11cf6af3191991b13430202128af02aa71d45814435586db9b9c88a8
MD5 2ca57757549b21048e0f7ce46fce5ef3
BLAKE2b-256 9808866b61c145e4009a4aa64dd04b4ac03b8b28037d3cb3b10dc276c4848974

