LLM API cost interceptor and budget enforcer for AI agents
agentcents
LLM API cost tracking proxy and budget enforcement for AI agents.
Drop agentcents between your agent and any LLM provider. It tracks every call, enforces budgets, caches responses, and tells you exactly where your money is going — across cloud APIs and local models.
Your Agent → agentcents proxy (localhost:8082) → OpenAI / Anthropic / Ollama
No code changes required. Just point your LLM client at the proxy.
Install
pip install agentcents
Pro features require a license key from labhamfounder.gumroad.com/l/agentcents-pro.
What to expect
Zero configuration to get started. Install, start the proxy, point your LLM client at it — that's it.
Step 1: pip install agentcents (one time)
Step 2: start the proxy (once per session)
Step 3: point your LLM client at it (set base_url and add one header)
Step 4: agentcents usage (see your costs)
The free tier needs no agentcents API key, account, or signup.
Configuration is optional — only add ~/.agentcents.toml when you want:
| You want... | What to add |
|---|---|
| Hard budget limits | [budgets] daily = 5.00 |
| Routing warnings when budget runs low | [routing] threshold_pct = 80 |
| Track local Ollama power costs | [local] gpu_watts = 40 |
| Separate costs per agent | X-Agentcents-Tag header on each call |
Pricing data syncs automatically on proxy startup — you never need to run agentcents sync manually unless you want to force a refresh after a provider announces new models.
Pro license — activate once per machine:
agentcents activate <your-key>
Pro features are then available immediately. No restart needed.
Quick Start
1. Start the proxy
uvicorn agentcents.proxy:app --port 8082
2. Point your LLM client at the proxy
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8082",
default_headers={"X-Agentcents-Target": "https://api.anthropic.com"},
)
3. Check your costs
agentcents usage
agentcents recent
That's it. Every call is now tracked.
Configuration
Create ~/.agentcents.toml to configure budgets, routing, and local models.
# ~/.agentcents.toml
# ── Budgets ────────────────────────────────────────────────────────────────
[budgets]
daily = 5.00 # hard block at $5/day across all calls
monthly = 50.00 # used by `agentcents rolling` reporting
# Per-tag daily budgets (optional)
[budgets.tags.my-agent]
daily = 1.00
[budgets.tags.research]
daily = 2.00
# ── Auto-routing ───────────────────────────────────────────────────────────
[routing]
mode = "warn" # "warn" — log suggestion only
# "swap" — silently swap model (Pro)
# "off" — disable routing
threshold_pct = 80 # trigger when X% of daily budget is used
skip_tool_use = true # never swap requests that use tools
# ── Local Models (Ollama) ──────────────────────────────────────────────────
[local]
gpu_watts = 40 # your GPU/chip TDP in watts
# M1 Max ≈ 40W, M2 Ultra ≈ 60W, RTX 4090 ≈ 450W
electricity_rate = 0.12 # $/kWh — check your electricity bill
ollama_base_url = "http://localhost:11434"
# ── Advisor ────────────────────────────────────────────────────────────────
[advisor]
min_saving_pct = 20 # only suggest swaps that save ≥ 20%
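One way to read min_saving_pct is as a plain percentage comparison between what a call costs now and what it would cost on a cheaper model. The sketch below illustrates that reading only. The function names are hypothetical and the advisor's actual calculation is not documented here.

```python
def saving_pct(current_cost: float, candidate_cost: float) -> float:
    """Percentage saved by switching from current_cost to candidate_cost."""
    if current_cost <= 0:
        raise ValueError("current_cost must be positive")
    return (current_cost - candidate_cost) / current_cost * 100


def worth_suggesting(current_cost: float, candidate_cost: float,
                     min_saving_pct: float = 20.0) -> bool:
    """True when the saving meets the configured minimum (assumed semantics)."""
    return saving_pct(current_cost, candidate_cost) >= min_saving_pct
```

With the default of 20, a swap from $1.00 to $0.90 per call would be skipped, while a swap from $1.00 to $0.50 would be suggested.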
Budget behavior
| Spend vs. daily budget | Action |
|---|---|
| Below 80% | Normal |
| 80% to under 100% (threshold set by threshold_pct) | ⚠ ROUTING WARN logged, X-Agentcents-Suggest header added |
| 100% or more | 429 budget_exceeded returned, call blocked |
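The table above can be expressed as a small decision function. This is a minimal sketch of the documented thresholds, not the proxy's own code. The name budget_action and its return values are made up for illustration.

```python
def budget_action(spent: float, daily_budget: float,
                  threshold_pct: float = 80.0) -> str:
    """Map today's spend to 'normal', 'warn', or 'block'."""
    if daily_budget <= 0:
        raise ValueError("daily_budget must be positive")
    used_pct = spent / daily_budget * 100
    if used_pct >= 100:
        return "block"   # documented behavior: 429 budget_exceeded
    if used_pct >= threshold_pct:
        return "warn"    # documented behavior: ROUTING WARN + suggest header
    return "normal"


if __name__ == "__main__":
    for spent in (1.00, 4.50, 5.00):
        print(f"${spent:.2f} of $5.00: {budget_action(spent, 5.00)}")
```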
Request Headers
Add these headers to your LLM client requests to control agentcents behavior.
| Header | Required | Example | Description |
|---|---|---|---|
| X-Agentcents-Target | Yes | https://api.anthropic.com | Provider base URL to forward to |
| X-Agentcents-Tag | No | my-agent | Group calls for cost reporting |
| X-Agentcents-Session | No | agent-run-42 | Track individual agent sessions |
| X-Agentcents-Cache | No | off | Disable cache for this request |
| X-Agentcents-Cache | No | exact | Exact-match cache only, skip semantic |
Examples
# Tag calls by project
client = anthropic.Anthropic(
base_url="http://localhost:8082",
default_headers={
"X-Agentcents-Target": "https://api.anthropic.com",
"X-Agentcents-Tag": "research-agent",
"X-Agentcents-Session": "run-001",
},
)
# Disable cache for a specific call
response = client.messages.create(
model="claude-3-haiku-20240307",
max_tokens=100,
messages=[...],
extra_headers={"X-Agentcents-Cache": "off"},
)
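If several agents share the same setup, a small helper can keep the header names in one place. The sketch below uses only the header names and cache values listed in the table. The helper agentcents_headers is hypothetical and not part of the package.

```python
from typing import Dict, Optional

ALLOWED_CACHE_VALUES = ("off", "exact")


def agentcents_headers(target: str,
                       tag: Optional[str] = None,
                       session: Optional[str] = None,
                       cache: Optional[str] = None) -> Dict[str, str]:
    """Build the X-Agentcents-* request headers for one client or call."""
    if cache is not None and cache not in ALLOWED_CACHE_VALUES:
        raise ValueError(f"cache must be one of {ALLOWED_CACHE_VALUES}")
    headers = {"X-Agentcents-Target": target}  # the only required header
    if tag is not None:
        headers["X-Agentcents-Tag"] = tag
    if session is not None:
        headers["X-Agentcents-Session"] = session
    if cache is not None:
        headers["X-Agentcents-Cache"] = cache
    return headers
```

The returned dict can be passed as default_headers when constructing a client, or as extra_headers on a single call.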
Response headers
| Header | Description |
|---|---|
| X-Agentcents-Cache: exact-hit | Response served from exact-match cache |
| X-Agentcents-Cache: semantic-hit | Response served from semantic cache (Pro) |
| X-Agentcents-Suggest: <model> | Cheaper model suggested (routing warn) |
| X-Agentcents-Routed: <model> | Model was swapped to this (routing swap, Pro) |
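An agent that wants to react to these headers can fold them into a small summary. The sketch below works on any mapping of header names to values and relies only on the header names and values listed above. The function name and the shape of the returned dict are illustrative assumptions.

```python
from typing import Dict, Mapping


def summarize_agentcents_headers(headers: Mapping[str, str]) -> Dict[str, object]:
    """Extract cache and routing information from response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    cache = lowered.get("x-agentcents-cache")
    return {
        "cache_hit": cache in ("exact-hit", "semantic-hit"),
        "cache_kind": cache,
        "suggested_model": lowered.get("x-agentcents-suggest"),
        "routed_model": lowered.get("x-agentcents-routed"),
    }
```

With the Anthropic Python SDK, one way to get at the response headers is client.messages.with_raw_response.create(...), whose result exposes a headers attribute.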
Local Models (Ollama)
Route Ollama calls through agentcents to track GPU power costs alongside cloud API costs.
Start Ollama normally:
ollama serve
Point your Ollama client at the proxy:
# Instead of http://localhost:11434
# Use http://localhost:8082/ollama
curl http://localhost:8082/ollama/api/chat -d '{
"model": "llama3:8b",
"stream": false,
"messages": [{"role": "user", "content": "hello"}]
}'
Or use the OpenAI-compatible endpoint:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8082/ollama/v1",
api_key="ollama",
)
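The curl call above can also be made from Python with only the standard library. This sketch assumes the proxy forwards Ollama's /api/chat response unchanged, so the reply text sits under message.content as it does when talking to Ollama directly. The function names are illustrative.

```python
import json
import urllib.request

PROXY_OLLAMA_URL = "http://localhost:8082/ollama"


def build_chat_request(prompt: str, model: str = "llama3:8b") -> urllib.request.Request:
    """Build a non-streaming /api/chat request routed through the proxy."""
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{PROXY_OLLAMA_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "llama3:8b") -> str:
    """Send the request; needs the proxy and Ollama to be running."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

Calling chat("hello") sends the same request as the curl example.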
Power cost is estimated as:
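cost = (inference_seconds / 3600) × (gpu_watts / 1000) × electricity_rate
Dividing gpu_watts by 1000 converts watts to kilowatts so the result matches the $/kWh electricity_rate. Configure gpu_watts and electricity_rate in ~/.agentcents.toml.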
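As a worked example of the power cost estimate, the sketch below converts watts to kilowatts so the units line up with a rate in $/kWh. The function name is hypothetical, and the defaults mirror the sample config values (40 W, $0.12/kWh).

```python
def power_cost_usd(inference_seconds: float,
                   gpu_watts: float = 40.0,
                   electricity_rate: float = 0.12) -> float:
    """Estimate the electricity cost of one local inference, in dollars."""
    hours = inference_seconds / 3600
    kilowatts = gpu_watts / 1000   # the rate is per kWh, not per Wh
    return hours * kilowatts * electricity_rate


if __name__ == "__main__":
    # A 30-second generation on a 40 W chip at $0.12/kWh.
    print(f"${power_cost_usd(30):.6f}")
```

Under these assumptions a full hour of inference on a 40 W chip costs about half a cent, which is why local costs tend to look small next to cloud API costs.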
CLI Reference
agentcents <command> [options]
Cost reporting
agentcents usage # cost summary last 24h
agentcents usage --hours 168 # last 7 days
agentcents usage --tag my-agent # filter by tag
agentcents recent # last 20 individual calls
agentcents recent --n 50 # last 50 calls
agentcents rolling # 30-day rolling spend
agentcents rolling --days 7 # 7-day rolling spend
agentcents agents # per-agent/session breakdown
agentcents agents --hours 48 # last 48h
agentcents local # local vs cloud cost comparison
Live monitoring
agentcents watch # live tail of calls (Pro)
agentcents watch --poll 1 # refresh every 1 second
agentcents dashboard # full TUI dashboard (Pro)
Budget alerts
agentcents alerts # recent budget alerts
agentcents alerts --n 50 # last 50 alerts
Catalog & models
agentcents models # list all models with pricing
agentcents sync # force sync pricing + chains
Intelligence (Pro)
agentcents suggest # model swap suggestions based on usage
agentcents suggest --hours 168 # based on last 7 days
agentcents train # train XGBoost cost predictor
License
agentcents activate <key> # activate Pro license
agentcents deactivate # remove Pro license
agentcents features # show available features
Pro Features
| Feature | Free | Pro |
|---|---|---|
| Proxy + cost logging | ✓ | ✓ |
| Exact-match cache | ✓ | ✓ |
| Budget alerts + hard block | ✓ | ✓ |
| CLI reporting | ✓ | ✓ |
| Web dashboard | ✓ | ✓ |
| Local Ollama tracking | ✓ | ✓ |
| Semantic similarity cache | — | ✓ |
| Multi-agent TUI dashboard | — | ✓ |
| Live watch | — | ✓ |
| Model swap advisor | — | ✓ |
| Auto-routing (swap mode) | — | ✓ |
| XGBoost cost predictor | — | ✓ |
Get Pro at labhamfounder.gumroad.com/l/agentcents-pro.
Supported Providers
Any provider that speaks the OpenAI API format:
| Provider | Target URL |
|---|---|
| Anthropic | https://api.anthropic.com |
| OpenAI | https://api.openai.com |
| Google Gemini | https://generativelanguage.googleapis.com |
| OpenRouter | https://openrouter.ai/api |
| Groq | https://api.groq.com/openai |
| Ollama | via /ollama route (no header needed) |
Sync
agentcents keeps two files updated in ~/.agentcents/:
| File | Contents | Source |
|---|---|---|
| models.json | Model pricing ($/M tokens) | OpenRouter + LiteLLM |
| chains.json | Downgrade chains for routing | labham.com |
These update in two ways:
- Proxy startup: if the files are older than 24h, the proxy fetches fresh data automatically when you run uvicorn agentcents.proxy:app
- Manual: run agentcents sync any time to force an update
agentcents sync
# Syncing pricing catalog...
# Chains updated to v1.0.1
# Done.
Why this matters: Anthropic and OpenAI release new models frequently. Without syncing, agentcents may not recognize new model IDs or have accurate pricing. Run agentcents sync after any major provider announcement.
If sync fails (no internet, server down), agentcents falls back to the bundled data/chains.json and data/fallback.json that shipped with the package.
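To check from a script whether the local catalog would be refreshed on the next proxy start, the 24-hour rule can be approximated with a file modification time check. This is a sketch of the rule as described, not the package's own logic. The function name is made up, and treating a missing file as stale is an assumption.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 24 * 3600


def needs_sync(path: str, now: Optional[float] = None) -> bool:
    """True if the file is missing or was last modified more than 24h ago."""
    if not os.path.exists(path):
        return True  # assumption: a missing file counts as stale
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) > MAX_AGE_SECONDS


if __name__ == "__main__":
    catalog = os.path.expanduser("~/.agentcents/models.json")
    print("sync recommended" if needs_sync(catalog) else "catalog is fresh")
```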
Architecture
~/.agentcents.toml — budgets, routing, local config
~/.agentcents/models.json — pricing catalog (auto-updated)
~/.agentcents/chains.json — downgrade chains (auto-updated)
~/.agentcents/ledger.db — all call records (SQLite)
The proxy runs entirely locally. No call data leaves your machine.
Pricing data syncs from OpenRouter and LiteLLM APIs.
License validation calls agentcents-license.labham.workers.dev.
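Because the ledger is a plain SQLite file, it can be inspected with standard tools. The ledger's schema is not documented here, so this sketch assumes nothing about table or column names and only lists the tables it finds. It opens the file read-only so inspection cannot modify any records.

```python
import sqlite3
from pathlib import Path
from typing import List, Union

LEDGER_PATH = Path.home() / ".agentcents" / "ledger.db"


def list_tables(db_path: Union[str, Path] = LEDGER_PATH) -> List[str]:
    """Return the table names in a SQLite file, opened read-only."""
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

From there, PRAGMA table_info(<table>) shows the columns of any table you want to query further.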
License
Copyright (c) 2026 Labham LLC. All rights reserved. Licensed under the Labham Commercial License.