Provider-agnostic hard dollar budgets + loop detection for AI agent runs. Stops the spend before it happens.
agentcap
Hard dollar budgets + zombie-loop detection for AI agent runs. A provider-agnostic SDK that stops the spend before it happens — not a dashboard that shows you the bill after a flaky tool burned $400 overnight.
agentcap wraps your OpenAI-compatible client in one line. Before every model
call it estimates the cost and refuses to exceed a hard cap, and it hashes
repeated call/tool cycles to catch agents stuck in zombie retry loops — tripping
a circuit breaker that kills the run with a typed exception. Local-first, no
account, bring your own keys.
Why
Observability tools (LangSmith, Langfuse, Helicone) tell you a run cost $380
after it happened. Framework limits (max_iterations) count steps, not
dollars, and evaporate the moment you switch frameworks. agentcap is the
missing piece: an in-process, pre-call brake that's provider-agnostic and
dead simple to adopt.
Install
    pip install agentcap-guard    # from PyPI (import name stays `agentcap`)
Or from source:
    pip install -e .              # core, zero required deps
    pip install -e ".[extras]"    # + tiktoken (accurate token counts) + rich
Note: the PyPI distribution name is `agentcap-guard` (the bare `agentcap` name was already taken), but the import name and CLI command are still `agentcap`: `pip install agentcap-guard`, then `import agentcap`.
Python 3.11+.
Quick start
Wrap any OpenAI-compatible client:
    from openai import OpenAI
    from agentcap import guard

    client = guard(OpenAI(), budget=5.00)  # hard $5 cap for this run

    # use it exactly like the normal client — reads pass through untouched
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hello"}],
    )

    # a call that would push spend over $5 raises BudgetExceeded *before* sending.
    # the same model/messages repeated past the loop threshold raises LoopDetected.
Works with OpenAI, Azure OpenAI, DeepSeek, Alibaba Qwen (DashScope), and any
OpenAI-compatible endpoint — agentcap only needs
client.chat.completions.create.
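To make the "only needs `client.chat.completions.create`" point concrete, here is a minimal sketch of how a wrapper could intercept that one method and forward everything else. It is not agentcap's implementation: `GuardedClient`, `before_call`, and the fake client are hypothetical names invented for illustration, and the fake client stands in for a real SDK so the snippet runs without a key.

```python
from types import SimpleNamespace


class GuardedClient:
    """Hypothetical stand-in for a guard()-style wrapper (not agentcap's code)."""

    def __init__(self, client, before_call):
        self._client = client
        self._before_call = before_call  # hook run before every create()
        # Rebuild only the chat.completions.create path; every other
        # attribute is forwarded to the real client by __getattr__ below.
        completions = SimpleNamespace(create=self._create)
        self.chat = SimpleNamespace(completions=completions)

    def _create(self, **kwargs):
        self._before_call(kwargs)  # may raise to block the call before sending
        return self._client.chat.completions.create(**kwargs)

    def __getattr__(self, name):
        # Only reached for attributes the wrapper itself does not define.
        return getattr(self._client, name)


# A fake "OpenAI-compatible" client: it just echoes the model name back.
def fake_create(**kwargs):
    return {"echo": kwargs["model"]}


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create)),
    api_key="not-a-real-key",
)

seen = []
wrapped = GuardedClient(fake_client, before_call=lambda kw: seen.append(kw["model"]))
result = wrapped.chat.completions.create(model="gpt-4o", messages=[])
```

Because the wrapper depends only on the shape of the call path, any client that exposes the same attribute chain could be wrapped the same way; a budget or loop check would live in the `before_call` hook.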
Or wrap a whole run (for frameworks / custom loops that aren't a plain client):
    import agentcap

    # `done`, `prompt`, `m`, `msgs`, and `call_model` come from your own loop
    with agentcap.budget(2.50) as cap:
        while not done:
            cap.check(model="claude-opus-4-1", prompt_text=prompt,      # pre-call brake
                      step_signature={"model": m, "messages": msgs})    # loop guard
            usage = call_model(prompt)
            cap.record_usage("claude-opus-4-1",
                             usage.prompt_tokens, usage.completion_tokens)
See it stop a runaway agent (no API key needed)
    python examples/demo_budget_stop.py
A flaky-tool agent that would loop forever on expensive calls, stopped hard at a $0.50 cap. Then inspect what happened:
    agentcap runs             # recent runs: spend, budget, outcome
    agentcap show <run_id>    # full event timeline, with the call that tripped it
Run it against your real Qwen/OpenAI key:
    set DASHSCOPE_API_KEY=sk-...    # your Qwen key (Windows cmd; on macOS/Linux use `export DASHSCOPE_API_KEY=sk-...`)
    python examples/demo_real_qwen.py
Use it in an autonomous agent loop
The whole point: when you let an agent run itself in a while loop, a hard
budget + loop guard makes "burned $400 overnight" and "stuck retrying the same
tool forever" structurally impossible — one cap.check(...) per step:
    import agentcap

    with agentcap.budget(5.00) as cap:            # hard $5 ceiling for this run
        while not done:
            cap.check(model="gpt-4o",             # pre-call brake: raises BudgetExceeded
                      prompt_text=prompt,         # before sending if it would exceed,
                      step_signature=step)        # and LoopDetected on a zombie loop
            resp = call_model(prompt)
            cap.record_usage("gpt-4o",
                             resp.usage.prompt_tokens,
                             resp.usage.completion_tokens)
See the full, runnable demo (no API key needed) showing both a budget stop and a zombie-loop stop:
    python examples/demo_agent_loop.py
CLI
| Command | What it does |
|---|---|
| `agentcap runs` | list recent runs with cost, budget, outcome |
| `agentcap show <run_id>` | full event timeline for one run |
| `agentcap pricing` | show the active pricing table |
| `agentcap doctor` | sanity-check: tokenizer, pricing coverage for used models |
How it works
- Pre-call cost estimation. Prompt tokens (via `tiktoken` if installed, else a conservative char-based estimate) + a conservative max-output assumption × a per-model price (`pricing.toml`). If `spent + estimated > budget`, the call is aborted before sending.
- Reconciliation. After each call, real token usage from the response updates the running total, so the budget is enforced against truth, not just estimates.
- Loop detection. Each step's `(model, messages)` / `(tool, args)` signature is hashed; if the same hash repeats past the threshold within a rolling window, that's a zombie loop. An allowlist exempts legitimately-repeating calls (polling, pagination).
- Circuit breaker. On budget-exceed or loop-detected, the breaker latches: a typed exception (`BudgetExceeded` / `LoopDetected`) is raised and all further calls through the wrapped client re-raise it. One hard stop.
- Audit log. Append-only JSONL (one line per event) is the source of truth; a SQLite roll-up powers `agentcap runs`. State lives in `.agentcap/` (override with `AGENTCAP_HOME`).
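The loop-detection and circuit-breaker steps above can be sketched in a few lines. This is an illustration under stated assumptions, not agentcap's code: the `LoopGuard` class, its `threshold`, `window`, and `allowlist` parameters, the SHA-256-over-canonical-JSON signature, and the local `LoopDetected` class are all hypothetical choices made for the example.

```python
import hashlib
import json
from collections import deque


class LoopDetected(RuntimeError):
    """Local stand-in for the typed exception named above."""


class LoopGuard:
    """Hypothetical rolling-window loop detector with a latching breaker."""

    def __init__(self, threshold=3, window=10, allowlist=()):
        self.threshold = threshold          # repeats tolerated inside the window
        self.recent = deque(maxlen=window)  # rolling window of step hashes
        self.allowlist = set(allowlist)     # tool names that may repeat freely
        self.tripped = None                 # latched exception, once raised

    @staticmethod
    def signature(step):
        # Canonical JSON so equal steps hash equally regardless of key order.
        blob = json.dumps(step, sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def check(self, step):
        if self.tripped is not None:
            raise self.tripped              # latched: every later call fails too
        if step.get("tool") in self.allowlist:
            return                          # e.g. polling or pagination
        digest = self.signature(step)
        self.recent.append(digest)
        if self.recent.count(digest) > self.threshold:
            self.tripped = LoopDetected(f"step repeated: {digest[:12]}")
            raise self.tripped


# Simulate an agent that retries the identical tool call forever.
guard = LoopGuard(threshold=3, window=10, allowlist={"poll_status"})
step = {"tool": "fetch", "args": {"url": "https://example.com"}}
calls = 0
try:
    while True:
        guard.check(step)
        calls += 1
except LoopDetected:
    pass
```

In this sketch the fourth identical step trips the guard, and because the exception is stored on the instance, any later `check` re-raises it even for a different step. That is the "latch" behaviour: one stop, not a warning the loop can retry past.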
Pricing table
pricing.toml is data, not code — edit it, or point agentcap at your own
copy with AGENTCAP_PRICING=/path/to/pricing.toml. Unknown models fall back to
a deliberately high default so costs over-estimate rather than silently
under-count. agentcap doctor warns when a run used a model missing from the
table.
Status
MVP. Core (budget + loops + breaker + audit + CLI) is implemented and tested
offline. Not yet built: the HTTP sidecar proxy for non-Python stacks, the Anthropic
native wrapper, and the hosted Team tier (shared budgets + Slack alerts). See
../pain-radar/specs/02-agent-budget-guardrails/spec.md for the full plan.
License
MIT.