
Provider-agnostic hard dollar budgets + loop detection for AI agent runs. Stops the spend before it happens.


agentcap

Hard dollar budgets + zombie-loop detection for AI agent runs. A provider-agnostic SDK that stops the spend before it happens — not a dashboard that shows you the bill after a flaky tool burned $400 overnight.

agentcap wraps your OpenAI-compatible client in one line. Before every model call it estimates the cost and refuses to exceed a hard cap, and it hashes repeated call/tool cycles to catch agents stuck in zombie retry loops — tripping a circuit breaker that kills the run with a typed exception. Local-first, no account, bring your own keys.

Why

Observability tools (LangSmith, Langfuse, Helicone) tell you a run cost $380 after it happened. Framework limits (max_iterations) count steps, not dollars, and evaporate the moment you switch frameworks. agentcap is the missing piece: an in-process, pre-call brake that's provider-agnostic and dead simple to adopt.

Install

pip install agentcap-guard              # from PyPI (import name stays `agentcap`)

Or from source:

pip install -e .            # core, zero required deps
pip install -e ".[extras]" # + tiktoken (accurate token counts) + rich

Note: the PyPI distribution name is agentcap-guard (the bare agentcap name was already taken), but the import name and CLI command are still agentcap: run pip install agentcap-guard, then import agentcap.

Python 3.11+.

Quick start

Wrap any OpenAI-compatible client:

from openai import OpenAI
from agentcap import guard

client = guard(OpenAI(), budget=5.00)   # hard $5 cap for this run

# use it exactly like the normal client — reads pass through untouched
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
# a call that would push spend over $5 raises BudgetExceeded *before* sending.
# the same model/messages repeated past the loop threshold raises LoopDetected.

Works with OpenAI, Azure OpenAI, DeepSeek, Alibaba Qwen (DashScope), and any OpenAI-compatible endpoint — agentcap only needs client.chat.completions.create.

Or wrap a whole run (for frameworks / custom loops that aren't a plain client):

import agentcap

MODEL = "claude-opus-4-1"

# `done`, `prompt`, `msgs`, and `call_model` stand in for your own loop state and model call.
with agentcap.budget(2.50) as cap:
    while not done:
        cap.check(model=MODEL, prompt_text=prompt,                      # pre-call brake
                  step_signature={"model": MODEL, "messages": msgs})    # loop guard
        usage = call_model(prompt)
        cap.record_usage(MODEL,
                         usage.prompt_tokens, usage.completion_tokens)

See it stop a runaway agent (no API key needed)

python examples/demo_budget_stop.py

A flaky-tool agent that would loop forever on expensive calls, stopped hard at a $0.50 cap. Then inspect what happened:

agentcap runs                # recent runs: spend, budget, outcome
agentcap show <run_id>       # full event timeline, with the call that tripped it

Run it against your real Qwen/OpenAI key. The Windows cmd form is shown; on macOS/Linux use export instead of set. Replace sk-... with your own Qwen key:

set DASHSCOPE_API_KEY=sk-...
python examples/demo_real_qwen.py

Use it in an autonomous agent loop

The whole point: when you let an agent run itself in a while loop, a hard budget + loop guard makes "burned $400 overnight" and "stuck retrying the same tool forever" structurally impossible — one cap.check(...) per step:

import agentcap

with agentcap.budget(5.00) as cap:          # hard $5 ceiling for this run
    while not done:
        cap.check(model="gpt-4o",            # pre-call brake: raises BudgetExceeded
                  prompt_text=prompt,        #   before sending if it would exceed,
                  step_signature=step)       #   and LoopDetected on a zombie loop
        resp = call_model(prompt)
        cap.record_usage("gpt-4o",
                         resp.usage.prompt_tokens,
                         resp.usage.completion_tokens)

See the full, runnable demo (no API key needed) showing both a budget stop and a zombie-loop stop:

python examples/demo_agent_loop.py
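
To make the step_signature argument more concrete, here is a minimal, self-contained sketch of one way a rolling-window loop guard could work. It is not agentcap's actual implementation: the class names LoopGuardSketch and LoopDetectedSketch, the threshold and window parameters, and the reading of "past the threshold" as "more than threshold repeats" are all assumptions made for illustration.

```python
import hashlib
import json
from collections import deque


class LoopDetectedSketch(RuntimeError):
    """Illustrative stand-in for a typed loop exception (hypothetical name)."""


class LoopGuardSketch:
    """Hypothetical rolling-window loop guard; not agentcap's real class."""

    def __init__(self, threshold=3, window=10, allowlist=()):
        self.threshold = threshold          # identical steps tolerated inside the window
        self.recent = deque(maxlen=window)  # rolling window of recent step hashes
        self.allowlist = set(allowlist)     # hashes that are allowed to repeat freely

    @staticmethod
    def signature_hash(step_signature):
        # Canonical JSON (sorted keys) so key order does not change the hash.
        blob = json.dumps(step_signature, sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def check(self, step_signature):
        digest = self.signature_hash(step_signature)
        if digest in self.allowlist:
            return digest                   # exempt, e.g. polling or pagination
        self.recent.append(digest)
        if self.recent.count(digest) > self.threshold:
            raise LoopDetectedSketch(
                f"same step seen more than {self.threshold} times in the window"
            )
        return digest
```

With threshold=3, three identical calls to check(...) pass and a fourth identical call inside the window raises LoopDetectedSketch. Passing a step's hash in allowlist exempts it, which mirrors the allowlist idea for polling and pagination.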

CLI

Command                   What it does
agentcap runs             list recent runs with cost, budget, outcome
agentcap show <run_id>    full event timeline for one run
agentcap pricing          show the active pricing table
agentcap doctor           sanity-check: tokenizer, pricing coverage for used models

How it works

  • Pre-call cost estimation. Prompt tokens (via tiktoken if installed, else a conservative char-based estimate) + a conservative max-output assumption × a per-model price (pricing.toml). If spent + estimated > budget, the call is aborted before sending.
  • Reconciliation. After each call, real token usage from the response updates the running total — so the budget is enforced against truth, not just estimates.
  • Loop detection. Each step's (model, messages) / (tool, args) signature is hashed; if the same hash repeats past the threshold within a rolling window, that's a zombie loop. An allowlist exempts legitimately-repeating calls (polling, pagination).
  • Circuit breaker. On budget-exceed or loop-detected, the breaker latches: a typed exception (BudgetExceeded / LoopDetected) is raised and all further calls through the wrapped client re-raise it. One hard stop.
  • Audit log. Append-only JSONL (one line per event) is the source of truth; a SQLite roll-up powers agentcap runs. State lives in .agentcap/ (override with AGENTCAP_HOME).
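
The first, second, and fourth bullets above (pre-call estimate, reconciliation, latching breaker) can be pictured with a small standalone sketch. This is an illustration under stated assumptions, not agentcap's code: the names BudgetSketch and BudgetExceededSketch are hypothetical, separate input and output prices per 1,000 tokens are assumed, the character-based fallback is assumed to be one token per three characters, and the prices in the usage below are placeholders.

```python
class BudgetExceededSketch(RuntimeError):
    """Illustrative stand-in for a typed budget exception (hypothetical name)."""


class BudgetSketch:
    """Hypothetical pre-call budget brake with a latching breaker."""

    def __init__(self, budget_usd, input_price_per_1k, output_price_per_1k,
                 assumed_max_output_tokens=1024):
        self.budget = budget_usd
        self.in_price = input_price_per_1k
        self.out_price = output_price_per_1k
        self.max_out = assumed_max_output_tokens  # conservative output assumption
        self.spent = 0.0                          # running total from real usage
        self.tripped = None                       # latched exception, if any

    def estimate_tokens(self, text):
        # Assumed char-based fallback: one token per 3 characters, rounded up.
        return -(-len(text) // 3)

    def cost(self, prompt_tokens, completion_tokens):
        return (prompt_tokens * self.in_price
                + completion_tokens * self.out_price) / 1000

    def check(self, prompt_text):
        # Once tripped, every later check re-raises the same exception.
        if self.tripped is not None:
            raise self.tripped
        estimated = self.cost(self.estimate_tokens(prompt_text), self.max_out)
        if self.spent + estimated > self.budget:
            self.tripped = BudgetExceededSketch(
                f"spent {self.spent:.4f} + estimated {estimated:.4f} "
                f"exceeds budget {self.budget:.4f}"
            )
            raise self.tripped
        return estimated

    def record_usage(self, prompt_tokens, completion_tokens):
        # Reconciliation: replace the estimate with the provider-reported usage.
        self.spent += self.cost(prompt_tokens, completion_tokens)
```

The key design point is ordering: check runs before the request is sent and compares spent + estimated against the cap, while record_usage runs afterwards with the token counts reported in the response, so estimation error does not accumulate across calls.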

Pricing table

pricing.toml is data, not code — edit it, or point agentcap at your own copy with AGENTCAP_PRICING=/path/to/pricing.toml. Unknown models fall back to a deliberately high default so costs over-estimate rather than silently under-count. agentcap doctor warns when a run used a model missing from the table.

Status

MVP. Core (budget + loops + breaker + audit + CLI) is implemented and tested offline. Not yet built: the HTTP sidecar proxy for non-Python stacks, the Anthropic native wrapper, and the hosted Team tier (shared budgets + Slack alerts). See ../pain-radar/specs/02-agent-budget-guardrails/spec.md for the full plan.

License

MIT.
