AgentBudget
Real-time cost enforcement for AI agent sessions.
Website · Docs · PyPI · npm · Go · GitHub
What's New in v0.4.0 — LangChain/LangGraph coverage, safer concurrency & costs
- LangChain / LangGraph: LangChainBudgetCallback now tracks modern chat-model usage (usage_metadata) and LangGraph runs, adds optional tool-cost tracking, and provides a context-manager lifecycle with on_hard_limit support. (#26)
- Per-request isolation in drop-in mode: init()/teardown() session state is now scoped per thread and per async task, so concurrent requests no longer overwrite or tear down each other's sessions. (#20)
- Cost validation: negative, NaN, and infinite costs are now rejected with InvalidCost before they can corrupt the ledger. (#21)
- Streaming, fixed: cost is recorded even when you break out of a stream early or an async stream ends incompletely, and OpenAI's stream_options={"include_usage": True} is now injected automatically. (#15)
See the changelog for the full list. Earlier 0.3.0 features — streaming, wrap_client(), finalization_reserve, would_exceed(), and OpenRouter model names — are documented below.
What is AgentBudget?
AgentBudget is an open-source Python SDK that puts a hard dollar limit on any AI agent session. It wraps LLM calls, tool calls, and external API requests with real-time cost tracking and automatic circuit breaking — so your agent can never silently burn through your budget.
One line to set a budget. Zero infrastructure to manage. Works with any LLM provider.
Quickstart
Drop-in Mode (Recommended)
Two lines. Zero code changes to your existing agent.
import agentbudget
import openai
agentbudget.init("$5.00")
# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Analyze this market..."}]
)
print(agentbudget.spent()) # e.g. 0.0035
print(agentbudget.remaining()) # e.g. 4.9965
print(agentbudget.report()) # Full cost breakdown
agentbudget.teardown() # Stop tracking, get final report
agentbudget.init() patches OpenAI and Anthropic SDKs so every call is tracked automatically. The patch is process-wide, but the active budget/session is isolated to the current thread or async task. teardown() only closes the current context's session, and the SDKs are unpatched after the last active context exits.
For concurrent apps:
- Call agentbudget.init() inside each request thread that needs its own session.
- asyncio tasks inherit the current session at task creation; call agentbudget.init() inside the task to give it an independent session.
- For explicit server-side scoping, wrap_client() or manual BudgetSession usage is still the most predictable option.
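The per-context behavior described above matches the semantics of Python's contextvars module, where every asyncio task runs in a copy of the context that was current when the task was created. The following is a minimal sketch assuming a ContextVar-based design (the source does not say how AgentBudget implements this); Session, init(), and charge() are hypothetical stand-ins, not AgentBudget's API.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Session:
    """Hypothetical stand-in for an AgentBudget session."""
    limit: float
    spent: float = 0.0


# One value per context; asyncio.create_task() copies the current context.
_current: ContextVar[Optional[Session]] = ContextVar("current_session", default=None)


def init(limit: float) -> Session:
    """Start a session visible in this context and in tasks created from it later."""
    session = Session(limit)
    _current.set(session)
    return session


def charge(cost: float) -> float:
    """Charge the session visible in the current context and return its new total."""
    session = _current.get()
    if session is None:
        raise RuntimeError("no active session in this context")
    session.spent += cost
    return session.spent


async def inheriting_task() -> float:
    # No init() here: this task sees the session that was current when it was created.
    return charge(1.0)


async def independent_task() -> float:
    # init() only changes this task's copy of the context.
    init(2.0)
    return charge(0.5)


async def main() -> Tuple[float, float, float, bool]:
    parent = init(5.0)
    inherited = await asyncio.create_task(inheriting_task())
    independent = await asyncio.create_task(independent_task())
    return parent.spent, inherited, independent, _current.get() is parent


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The inheriting task charges the parent's session object, the task that calls init() gets its own session, and the parent still sees its original session afterwards. The sketch sticks to asyncio because whether a new thread inherits the caller's context depends on the Python version and build.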
Manual Mode
For full control, use the context manager API directly.
import openai
from agentbudget import AgentBudget
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Auto-cost LLM responses
    response = session.wrap(openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Research competitors in the CRM space"}]
    ))
    # Track tool/API calls with known costs (call_serp_api stands for your own function)
    result = session.track(call_serp_api(query="CRM market"), cost=0.01)
    # When the $5 limit is hit, BudgetExhausted is raised
    # No silent overruns. No surprise bills.
print(session.report())
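To make the hard-limit behavior concrete, here is a minimal sketch of a per-session ledger. It assumes a tracked cost is always recorded (the call has already happened) and that BudgetExhausted is raised once the running total goes over max_spend; LedgerSession and this BudgetExhausted class are hypothetical stand-ins, and AgentBudget's real check may differ, for example in whether reaching the limit exactly counts as exceeding it. Rejecting NaN, infinite, and negative costs mirrors the v0.4.0 validation, but raises ValueError here instead of InvalidCost.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List


class BudgetExhausted(Exception):
    """Hypothetical stand-in for agentbudget's BudgetExhausted."""


@dataclass
class LedgerSession:
    max_spend: float
    spent: float = 0.0
    events: List[Dict[str, Any]] = field(default_factory=list)

    @property
    def remaining(self) -> float:
        return max(0.0, self.max_spend - self.spent)

    def track(self, result: Any, cost: float, tool_name: str = "unknown") -> Any:
        """Record a known cost and pass the result through unchanged."""
        if not math.isfinite(cost) or cost < 0:
            raise ValueError(f"invalid cost: {cost!r}")
        # The call already happened, so its cost is recorded even if it breaks the budget.
        self.spent += cost
        self.events.append({"tool": tool_name, "cost": cost})
        if self.spent > self.max_spend:
            raise BudgetExhausted(f"spent ${self.spent:.2f} of ${self.max_spend:.2f}")
        return result


if __name__ == "__main__":
    session = LedgerSession(max_spend=1.00)
    print(session.track("page 1", cost=0.50, tool_name="scrape"))
    print(session.remaining)
    try:
        session.track("page 2", cost=0.75, tool_name="scrape")
    except BudgetExhausted as exc:
        print("stopped:", exc)
```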
Install
Python
pip install agentbudget
Python 3.9+. For LangChain integration: pip install agentbudget[langchain]. For AutoGen integration: pip install agentbudget[autogen].
Go
go get github.com/AgentBudget/agentbudget/sdks/go
Go 1.21+. No external dependencies. Imported directly from GitHub — no registry needed.
TypeScript / JavaScript
npm install agentbudget
Node.js 18+. Works with openai ≥ 4.0 and @anthropic-ai/sdk ≥ 0.20 (both optional peer deps).
Multi-Language SDK
AgentBudget ships first-party SDKs for Python, Go, and TypeScript. All three share the same session/budget pattern and built-in pricing table.
Go
import (
	"fmt"
	agentbudget "github.com/AgentBudget/agentbudget/sdks/go"
)
budget, _ := agentbudget.New("$5.00")
session := budget.NewSession()
defer session.Close()
// After your OpenAI or Anthropic call, record token usage:
if err := session.WrapUsage("gpt-4o", inputTokens, outputTokens); err != nil {
	// *agentbudget.BudgetExhausted or *agentbudget.LoopDetected
}
fmt.Printf("spent: $%.4f remaining: $%.4f\n", session.Spent(), session.Remaining())
See /sdks/go/README.md for full docs.
TypeScript
import { AgentBudget } from "agentbudget";
import OpenAI from "openai";
const budget = new AgentBudget("$5.00");
const session = budget.newSession();
const resp = await new OpenAI().chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Analyze this market..." }],
});
session.wrapOpenAI(resp);
console.log(`spent: $${session.spent.toFixed(4)}`);
session.close();
See /sdks/typescript/README.md for full docs.
See /sdks/README.md for the full feature parity matrix and monorepo structure.
Drop-in API
| Function | Description |
|---|---|
| agentbudget.init(budget) | Start tracking in the current thread/task context. Returns session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.wrap_client(client, session) | Attach tracking to a specific client instance only. |
| agentbudget.register_model(name, input, output) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session visible in the current context. |
| agentbudget.teardown() | Stop tracking in the current context and return its final report. |
Features
Streaming Support
Streaming responses (stream=True) are fully tracked. Cost is recorded even if you break out of the stream early — chunks pass through to your code unchanged.
# Drop-in mode — works automatically
agentbudget.init("$5.00")
client = openai.OpenAI()
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Summarize this report"}],
stream=True, # include_usage is added automatically for OpenAI
)
for chunk in stream:
    if chunk.choices:  # the final usage-only chunk has an empty choices list
        print(chunk.choices[0].delta.content or "", end="")
print(agentbudget.spent()) # cost recorded after the stream
OpenAI note: In drop-in mode (init()) and with wrap_client(), AgentBudget automatically adds stream_options={"include_usage": True} so token counts appear on the final chunk; you don't need to pass it yourself (an explicit value is always respected). Anthropic streams always include usage. The only time you must set it manually is if you call the OpenAI API directly and pass the response to session.wrap().
Async streaming works the same way:
# client is an openai.AsyncOpenAI() instance here
async for chunk in await client.chat.completions.create(stream=True, ...):
    ...
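The streaming behavior described above (cost recorded even after an early break) can be illustrated with a generator whose finally clause does the recording. This is a minimal sketch, not AgentBudget's implementation: chunks are plain dicts standing in for OpenAI chunk objects, and record() is a hypothetical callback.

```python
from contextlib import closing
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Chunk = Dict[str, Any]


def tracked_stream(
    chunks: Iterable[Chunk],
    record: Callable[[Optional[Chunk]], None],
) -> Iterator[Chunk]:
    """Yield chunks unchanged and call record(usage) exactly once when the stream ends."""
    usage: Optional[Chunk] = None
    try:
        for chunk in chunks:
            if chunk.get("usage") is not None:
                # With include_usage, OpenAI reports usage on a final chunk with no choices.
                usage = chunk["usage"]
            yield chunk
    finally:
        # Runs on exhaustion and on close(), including close() triggered by garbage
        # collection, once iteration has started. usage is None if the usage chunk never arrived.
        record(usage)


if __name__ == "__main__":
    stream = [
        {"choices": [{"delta": {"content": "Hel"}}]},
        {"choices": [{"delta": {"content": "lo"}}]},
        {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 2}},
    ]
    with closing(tracked_stream(stream, lambda u: print("\nrecorded:", u))) as s:
        for chunk in s:
            for choice in chunk["choices"]:
                print(choice["delta"]["content"], end="")
```

After a plain break, CPython closes the generator only when its last reference goes away, so wrapping it in contextlib.closing makes the recording point deterministic.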
Explicit Per-Client Tracking
By default, agentbudget.init() installs process-wide OpenAI/Anthropic patches and resolves the active session from the current thread/task context at call time. If you need finer control — multiple budgets, different clients per task, or just prefer explicit scope — use wrap_client():
from agentbudget import AgentBudget
import agentbudget
import openai
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Only this client instance is tracked
    client = agentbudget.wrap_client(openai.OpenAI(), session)
    response = client.chat.completions.create(...)  # tracked
    other = openai.OpenAI()
    other.chat.completions.create(...)  # NOT tracked
Works with openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, and anthropic.AsyncAnthropic.
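One way to picture per-instance tracking is a proxy that forwards every attribute to the wrapped client and reports responses from chat.completions.create() only. This is a minimal synchronous sketch under that assumption: TrackedClient and on_response are hypothetical names, SimpleNamespace objects stand in for openai.OpenAI(), and the source does not say whether wrap_client() uses a proxy or patches the instance itself.

```python
from types import SimpleNamespace
from typing import Any, Callable


class _TrackedCompletions:
    """Forward to the real completions resource, reporting each create() response."""

    def __init__(self, inner: Any, on_response: Callable[[Any], None]) -> None:
        self._inner = inner
        self._on_response = on_response

    def create(self, *args: Any, **kwargs: Any) -> Any:
        response = self._inner.create(*args, **kwargs)
        self._on_response(response)
        return response

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not defined on this proxy.
        return getattr(self._inner, name)


class _TrackedChat:
    def __init__(self, inner: Any, on_response: Callable[[Any], None]) -> None:
        self._inner = inner
        self.completions = _TrackedCompletions(inner.completions, on_response)

    def __getattr__(self, name: str) -> Any:
        return getattr(self._inner, name)


class TrackedClient:
    """Per-instance proxy: other client instances are left untouched."""

    def __init__(self, client: Any, on_response: Callable[[Any], None]) -> None:
        self._client = client
        self.chat = _TrackedChat(client.chat, on_response)

    def __getattr__(self, name: str) -> Any:
        return getattr(self._client, name)


if __name__ == "__main__":
    class FakeCompletions:
        def create(self, **kwargs: Any) -> dict:
            return {"model": kwargs.get("model"), "usage": {"prompt_tokens": 5, "completion_tokens": 3}}

    seen: list = []
    raw = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()), api_key="sk-test")
    tracked = TrackedClient(raw, seen.append)
    tracked.chat.completions.create(model="gpt-4o")  # reported
    raw.chat.completions.create(model="gpt-4o")      # not reported
    print(len(seen), tracked.api_key)
```

An async client would need an async create() in the same position; the forwarding idea is unchanged.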
Finalization Reserve
Prevent your agent from being cut off mid-task. Reserve a fraction of the budget exclusively for the final response step:
budget = AgentBudget(
max_spend="$1.00",
    finalization_reserve=0.05,  # hard limit fires at $0.95; the last $0.05 is kept in reserve for the final step
)
For manual control, check before the final call:
def run_agent(budget, client, estimated_final_cost):  # illustrative wrapper so `return` is valid
    with budget.session() as session:
        # ... do work ...
        if session.would_exceed(estimated_final_cost):
            return "Here's what I completed so far: ..."
        # Safe to proceed: this call won't hit the hard limit
        return session.wrap(client.chat.completions.create(...))
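The reserve arithmetic in the example above works out as max_spend × (1 − finalization_reserve) = $1.00 × 0.95 = $0.95. Below is a minimal sketch of that calculation and of a would_exceed-style check, assuming the check compares against the reserve-adjusted limit; hard_limit() and would_exceed() here are hypothetical helpers, not AgentBudget's API.

```python
import math


def hard_limit(max_spend: float, finalization_reserve: float = 0.0) -> float:
    """Spend level at which the hard limit trips, keeping the reserve for the final step."""
    if not (math.isfinite(max_spend) and max_spend > 0):
        raise ValueError("max_spend must be a positive finite number")
    if not 0.0 <= finalization_reserve < 1.0:
        raise ValueError("finalization_reserve must be in [0, 1)")
    return max_spend * (1.0 - finalization_reserve)


def would_exceed(spent: float, estimated_cost: float, limit: float) -> bool:
    """True if a call costing estimated_cost would push spending past limit."""
    return spent + estimated_cost > limit


if __name__ == "__main__":
    limit = hard_limit(1.00, finalization_reserve=0.05)
    print(f"hard limit: ${limit:.2f}")        # hard limit: $0.95
    print(would_exceed(0.90, 0.04, limit))  # False: about $0.94 stays under $0.95
    print(would_exceed(0.90, 0.10, limit))  # True: $1.00 would cross $0.95
```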
Circuit Breaker
Three levels of protection against runaway spend:
budget = AgentBudget(
max_spend="$5.00",
soft_limit=0.9, # Warn at 90% spent
max_repeated_calls=10, # Trip after 10 repeated calls
loop_window_seconds=60.0, # Within a 60-second window
on_soft_limit=lambda r: print("90% budget used"),
on_hard_limit=lambda r: alert_ops_team(r),
on_loop_detected=lambda r: print("Loop detected!"),
)
- Soft limit: fires a callback when spending exceeds a threshold, so the agent can wrap up gracefully.
- Hard limit: raises BudgetExhausted. No more calls are allowed.
- Loop detection: catches infinite loops before they drain the budget.
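The soft-limit and loop-detection levels above can be sketched with a one-shot threshold callback and a per-key sliding window of timestamps. This is an illustration under stated assumptions: SoftLimit, LoopDetector, and this LoopDetected class are hypothetical; a "repeated call" is assumed to mean the same caller-supplied key (for example a model name plus prompt); and the call that brings the count to max_repeated_calls is assumed to trip the breaker.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Hashable, Optional


class LoopDetected(Exception):
    """Hypothetical stand-in for agentbudget's loop-detection error."""


class SoftLimit:
    """Call on_soft_limit once, the first time spending reaches threshold * max_spend."""

    def __init__(self, max_spend: float, threshold: float = 0.9,
                 on_soft_limit: Optional[Callable[[float], None]] = None) -> None:
        self.trigger_at = max_spend * threshold
        self.on_soft_limit = on_soft_limit
        self._fired = False

    def check(self, spent: float) -> None:
        if not self._fired and spent >= self.trigger_at:
            self._fired = True
            if self.on_soft_limit is not None:
                self.on_soft_limit(spent)


class LoopDetector:
    """Raise LoopDetected when one key repeats max_repeated_calls times within window_seconds."""

    def __init__(self, max_repeated_calls: int = 10, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_repeated_calls = max_repeated_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._calls: Dict[Hashable, Deque[float]] = {}

    def record(self, key: Hashable) -> None:
        now = self.clock()
        times = self._calls.setdefault(key, deque())
        times.append(now)
        # Drop timestamps that have slid out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_repeated_calls:
            raise LoopDetected(f"{key!r} called {len(times)} times within {self.window_seconds:g}s")


if __name__ == "__main__":
    soft = SoftLimit(5.00, 0.9, on_soft_limit=lambda s: print(f"90% budget used (${s:.2f})"))
    for spent in (4.00, 4.60, 4.90):
        soft.check(spent)  # prints once, at $4.60
    detector = LoopDetector(max_repeated_calls=3)
    try:
        for _ in range(3):
            detector.record(("gpt-4o", "same prompt"))
    except LoopDetected as exc:
        print("loop:", exc)
```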
Async Support
async with budget.async_session() as session:
    # client is an openai.AsyncOpenAI() instance
    response = await session.wrap_async(
        client.chat.completions.create(model="gpt-4o", messages=[...])
    )
    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)
Nested Budgets
Parent sessions allocate sub-budgets to child tasks. Costs roll up automatically.
budget = AgentBudget(max_spend="$10.00")  # $10 parent budget
with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")
    print(parent.spent)      # 1.50
    print(parent.remaining)  # 8.50
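Roll-up can be modeled by charging a cost to the child and then to every ancestor, refusing the charge if any level would go over its cap. This is a minimal sketch under that assumption; NestedSession is hypothetical, and, consistent with the numbers in the example above, the child's allocation is not deducted from the parent up front: only actual spend rolls up.

```python
from typing import Iterator, Optional


class BudgetExhausted(Exception):
    """Hypothetical stand-in for agentbudget's BudgetExhausted."""


class NestedSession:
    def __init__(self, max_spend: float, parent: Optional["NestedSession"] = None) -> None:
        self.max_spend = max_spend
        self.parent = parent
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.max_spend - self.spent

    def child_session(self, max_spend: float) -> "NestedSession":
        # Refuse allocations larger than what this session still has left.
        if max_spend > self.remaining:
            raise BudgetExhausted(f"cannot allocate ${max_spend:.2f}; ${self.remaining:.2f} left")
        return NestedSession(max_spend, parent=self)

    def _chain(self) -> Iterator["NestedSession"]:
        node: Optional[NestedSession] = self
        while node is not None:
            yield node
            node = node.parent

    def track(self, cost: float) -> None:
        # Check every level first so a refused charge leaves no level partially updated.
        for node in self._chain():
            if node.spent + cost > node.max_spend:
                raise BudgetExhausted(f"${cost:.2f} would exceed a ${node.max_spend:.2f} cap")
        for node in self._chain():
            node.spent += cost


if __name__ == "__main__":
    parent = NestedSession(10.0)
    child = parent.child_session(max_spend=2.0)
    child.track(1.50)
    print(parent.spent, parent.remaining)  # 1.5 8.5
```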
Webhooks
Stream budget events to any HTTP endpoint for alerting and billing.
budget = AgentBudget(
max_spend="$5.00",
webhook_url="https://your-app.com/api/budget-events",
)
Events are sent as JSON with event_type (soft_limit, hard_limit, loop_detected) and the full cost report.
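On the receiving side, an endpoint only needs to parse the JSON body and branch on event_type. Here is a minimal standard-library sketch. The actions it returns are placeholders and the port is arbitrary. The source does not name the field that carries the cost report, so the sketch reads only event_type.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any


def handle_budget_event(payload: Any) -> str:
    """Decide what to do with one budget event based on its event_type."""
    if not isinstance(payload, dict):
        return "reject"
    event_type = payload.get("event_type")
    if event_type == "hard_limit":
        return "page on-call"
    if event_type == "loop_detected":
        return "open incident"
    if event_type == "soft_limit":
        return "log warning"
    return "ignore"  # unknown event types are accepted but not acted on


class BudgetEventHandler(BaseHTTPRequestHandler):
    """Accepts POSTs on any path and answers with the chosen action as JSON."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"null")
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        action = handle_budget_event(payload)
        body = json.dumps({"action": action}).encode()
        self.send_response(400 if action == "reject" else 200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # For local testing, point webhook_url at http://127.0.0.1:8000/api/budget-events
    HTTPServer(("127.0.0.1", 8000), BudgetEventHandler).serve_forever()
```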
Track Tool Decorator
Annotate any function to auto-track cost on every call.
@session.track_tool(cost=0.02, tool_name="search")
def my_search(query):
    return api.search(query)
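A fixed-cost decorator like track_tool can be sketched with functools.wraps, handling both plain and async functions. This is a minimal sketch. ToolLedger is hypothetical, and tool_name falling back to the function's name is an assumption. Charging only after a call returns normally is a design choice here, since some paid APIs bill failed requests too.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, Dict, Optional


class ToolLedger:
    """Hypothetical ledger that charges a fixed cost per call of a decorated function."""

    def __init__(self) -> None:
        self.by_tool: Dict[str, float] = {}
        self.calls: Dict[str, int] = {}

    def _charge(self, name: str, cost: float) -> None:
        self.by_tool[name] = self.by_tool.get(name, 0.0) + cost
        self.calls[name] = self.calls.get(name, 0) + 1

    def track_tool(self, cost: float, tool_name: Optional[str] = None) -> Callable[[Callable], Callable]:
        def decorator(fn: Callable) -> Callable:
            name = tool_name or fn.__name__
            if inspect.iscoroutinefunction(fn):
                @functools.wraps(fn)
                async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                    result = await fn(*args, **kwargs)
                    self._charge(name, cost)  # charged only if the call returns normally
                    return result
                return async_wrapper

            @functools.wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                result = fn(*args, **kwargs)
                self._charge(name, cost)  # charged only if the call returns normally
                return result
            return wrapper
        return decorator


if __name__ == "__main__":
    ledger = ToolLedger()

    @ledger.track_tool(cost=0.02, tool_name="search")
    def my_search(query: str) -> str:
        return f"results for {query}"

    @ledger.track_tool(cost=0.01)
    async def async_search(query: str) -> str:
        return f"async results for {query}"

    my_search("CRM")
    asyncio.run(async_search("CRM"))
    print(ledger.calls, ledger.by_tool)
```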
Integrations
LangChain / LangGraph
pip install agentbudget[langchain]
from agentbudget.integrations.langchain import LangChainBudgetCallback
# Use as a context manager so the session is finalized (duration + on_hard_limit).
with LangChainBudgetCallback(budget="$5.00") as callback:
    agent.invoke(
        {"input": "Research competitors"},
        config={"callbacks": [callback]},
    )
print(callback.get_report())
Costs are tracked from both legacy LLMResult token usage and modern chat-model
usage_metadata, so LangGraph runs and chat models (OpenAI, Anthropic, …) are
counted correctly. To also charge tool calls against the budget, pass per-tool costs:
callback = LangChainBudgetCallback(
budget="$5.00",
tool_costs={"web_search": 0.01, "code_exec": 0.005},
)
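The two usage shapes mentioned above use different key names. Chat models expose usage_metadata with input_tokens and output_tokens. Legacy OpenAI-style LLMResult output typically carries token_usage with prompt_tokens and completion_tokens. Below is a minimal sketch of normalizing both shapes and pricing tool calls from a tool_costs mapping. The helper names and the example-model prices are hypothetical. Treating tools missing from tool_costs as free is an assumption.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# Hypothetical (input, output) prices in dollars per 1M tokens.
PRICES: Dict[str, Tuple[float, float]] = {"example-model": (2.00, 8.00)}


def tokens_from_usage(
    usage_metadata: Optional[Mapping[str, Any]] = None,
    legacy_token_usage: Optional[Mapping[str, Any]] = None,
) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) from whichever usage shape is present."""
    if usage_metadata:  # chat models: AIMessage.usage_metadata
        return (int(usage_metadata.get("input_tokens", 0)),
                int(usage_metadata.get("output_tokens", 0)))
    if legacy_token_usage:  # legacy LLMResult.llm_output["token_usage"], OpenAI-style keys
        return (int(legacy_token_usage.get("prompt_tokens", 0)),
                int(legacy_token_usage.get("completion_tokens", 0)))
    return 0, 0


def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def tool_cost(tool_name: str, tool_costs: Mapping[str, float]) -> float:
    # Assumption: tools without an entry in tool_costs are not charged.
    return tool_costs.get(tool_name, 0.0)


if __name__ == "__main__":
    modern = tokens_from_usage({"input_tokens": 1000, "output_tokens": 500, "total_tokens": 1500})
    legacy = tokens_from_usage(legacy_token_usage={"prompt_tokens": 1000, "completion_tokens": 500})
    print(modern == legacy, llm_cost("example-model", *modern))  # True 0.006
    print(tool_cost("web_search", {"web_search": 0.01, "code_exec": 0.005}))  # 0.01
```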
AutoGen
pip install agentbudget[autogen]
from agentbudget.integrations.autogen import BudgetedAssistantAgent, BudgetedUserProxyAgent
assistant = BudgetedAssistantAgent(name="assistant", budget="$5.00")
user = BudgetedUserProxyAgent(name="user", budget="$5.00")
user.initiate_chat(assistant, message="Research competitors in the CRM space")
print(assistant.get_report())
CrewAI
from agentbudget.integrations.crewai import CrewAIBudgetMiddleware
with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(crew.kickoff(), cost=0.50, tool_name="crew_run")
print(middleware.get_report())
Raw OpenAI / Anthropic SDK
from agentbudget import AgentBudget
budget = AgentBudget("$5.00")
with budget.session() as s:
    response = s.wrap(client.chat.completions.create(...))
Supported Models
Built-in pricing for 40+ models across OpenAI, Anthropic, Google Gemini, Mistral, and Cohere.
| Provider | Models |
|---|---|
| OpenAI | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3, o3-pro, o4-mini |
| Anthropic | claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-5, claude-sonnet-4, claude-haiku-4-5, claude-3-opus, claude-3-sonnet, claude-3-haiku |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Mistral | mistral-large, mistral-small, mistral-medium, codestral, open-mistral-nemo |
| Cohere | command-r-plus, command-r, command, command-light |
Custom Model Pricing
New model just launched? Don't wait for a release — register it at runtime:
import agentbudget
agentbudget.register_model(
"gpt-5",
input_price_per_million=5.00,
output_price_per_million=20.00,
)
# Or batch register multiple models:
agentbudget.register_models({
"gpt-5": (5.00, 20.00),
"gpt-5-mini": (0.50, 2.00),
})
Dated model variants (e.g. gpt-4o-2025-06-15) are automatically matched to their base model pricing.
OpenRouter model names (e.g. "openai/gpt-4o", "anthropic/claude-3-5-sonnet") are supported — the provider prefix is stripped automatically before the pricing lookup.
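Name resolution and pricing can be sketched in four steps: strip any OpenRouter-style provider prefix, try an exact match, fall back to the name without a trailing date, then price tokens per million. This is a minimal sketch. resolve_model() and cost_usd() are hypothetical helpers, and the PRICES entries reuse the register_models() example above. Accepting both YYYY-MM-DD and YYYYMMDD date suffixes is an assumption.

```python
import re
from typing import Dict, Tuple

# (input $ per 1M tokens, output $ per 1M tokens), mirroring the register_models() example.
PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
}

_DATE_SUFFIX = re.compile(r"-(?:\d{4}-\d{2}-\d{2}|\d{8})$")


def resolve_model(name: str, prices: Dict[str, Tuple[float, float]] = PRICES) -> str:
    """Map a requested model name onto a key in prices."""
    name = name.rsplit("/", 1)[-1]  # "openai/gpt-5" -> "gpt-5"
    if name in prices:
        return name
    base = _DATE_SUFFIX.sub("", name)  # "gpt-5-2025-06-15" -> "gpt-5"
    if base in prices:
        return base
    raise KeyError(f"no pricing for {name!r}; register it first")


def cost_usd(model: str, input_tokens: int, output_tokens: int,
             prices: Dict[str, Tuple[float, float]] = PRICES) -> float:
    in_price, out_price = prices[resolve_model(model, prices)]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    print(resolve_model("openai/gpt-5-mini-2025-06-15"))  # gpt-5-mini
    print(cost_usd("gpt-5", 10_000, 2_000))               # 0.09
```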
Missing a model from built-in pricing? PRs welcome — pricing data is in agentbudget/pricing.py.
Cost Report
Every session produces a structured cost report:
{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80, "gpt-4o-mini": 0.32}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05, "scrape": 0.25}},
    },
    "duration_seconds": 34.2,
    "terminated_by": None,  # or "budget_exhausted" or "loop_detected"
    "events": [...],
}
Pipe it to your observability stack, billing system, or just log it.
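As one example of "just log it", a report shaped like the one above can be checked and written as a single JSON log line with the standard library. This is a minimal sketch. log_cost_report() and the logger name are hypothetical. The check assumes the breakdown sections add up to total_spent, as they do in the example report.

```python
import json
import logging
import math
from typing import Any, Dict

logger = logging.getLogger("agent.costs")  # hypothetical logger name


def log_cost_report(report: Dict[str, Any]) -> str:
    """Check the report's totals, log it as one JSON line, and return that line."""
    breakdown_total = sum(section["total"] for section in report["breakdown"].values())
    if not math.isclose(breakdown_total, report["total_spent"], abs_tol=1e-9):
        raise ValueError(
            f"breakdown sums to {breakdown_total}, but total_spent is {report['total_spent']}"
        )
    line = json.dumps(report, sort_keys=True, default=str)  # default=str covers non-JSON values
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log_cost_report({
        "session_id": "sess_abc123",
        "budget": 5.00,
        "total_spent": 3.42,
        "remaining": 1.58,
        "breakdown": {"llm": {"total": 3.12, "calls": 8}, "tools": {"total": 0.30, "calls": 6}},
        "terminated_by": None,
        "events": [],
    })
```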
The Problem
AI agents are unpredictable by design. An agent might make 3 LLM calls or 300, use cheap models or expensive ones, invoke 1 tool or 50.
- The Loop Problem — A stuck agent makes 200 LLM calls in 10 minutes. $50-$200 before anyone notices.
- The Invisible Spend — Tokens aren't dollars. GPT-4o costs 15x more than GPT-4o-mini for similar token counts.
- Multi-Provider Chaos — One session calls OpenAI, Anthropic, Google, and 3 APIs. No unified real-time view.
- The Scaling Problem — 1,000 concurrent sessions with 5% failure rate = 50 runaway agents.
AgentBudget fills the gap: Real-time, dollar-denominated, per-session budget enforcement that spans LLM calls + tool calls + external APIs, works across providers, and kills runaway sessions automatically.
What It's NOT
- Not an LLM proxy. Wraps your existing client calls in-process.
- Not an observability platform. Produces cost data — pipe it wherever you want.
- Not a billing system. Enforces budgets, doesn't invoice customers.
- Not infrastructure. No Redis, no servers, no cloud account. It's a library.
License
Apache 2.0
Ship your agents with confidence. Set a budget. Move on.
Download files
File details
Details for the file agentbudget-0.4.0.tar.gz.
File metadata
- Download URL: agentbudget-0.4.0.tar.gz
- Upload date:
- Size: 75.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 773fa513ca4c1a93282749c3c6d62f8e2d2ea35752e6941f045d1c21f9d33ecc |
| MD5 | 8d58ed7abcaa05de846b8651e4a74497 |
| BLAKE2b-256 | 4cff9752fd259063e682a6ef87327b85be0762b11093e3562cba193dca0bc733 |
Provenance
The following attestation bundles were made for agentbudget-0.4.0.tar.gz:
Publisher: workflow.yml on AgentBudget/agentbudget
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentbudget-0.4.0.tar.gz
- Subject digest: 773fa513ca4c1a93282749c3c6d62f8e2d2ea35752e6941f045d1c21f9d33ecc
- Sigstore transparency entry: 1676058079
- Sigstore integration time:
- Permalink: AgentBudget/agentbudget@471c281b70f37baced41686da73e426dddd143a2
- Branch / Tag: refs/heads/main
- Owner: https://github.com/AgentBudget
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@471c281b70f37baced41686da73e426dddd143a2
- Trigger Event: workflow_dispatch
File details
Details for the file agentbudget-0.4.0-py3-none-any.whl.
File metadata
- Download URL: agentbudget-0.4.0-py3-none-any.whl
- Upload date:
- Size: 43.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d68418bda1767bd75078298511acb9cc624272ac4ac015aee93a2874d68b71a4 |
| MD5 | b46245eba515756138b1b5d8624728a4 |
| BLAKE2b-256 | 545007f17e1a91a0ad08c1d1729c37837a16c0d04c40cc0b25f26f0af22f0d29 |
Provenance
The following attestation bundles were made for agentbudget-0.4.0-py3-none-any.whl:
Publisher: workflow.yml on AgentBudget/agentbudget
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentbudget-0.4.0-py3-none-any.whl
- Subject digest: d68418bda1767bd75078298511acb9cc624272ac4ac015aee93a2874d68b71a4
- Sigstore transparency entry: 1676058093
- Sigstore integration time:
- Permalink: AgentBudget/agentbudget@471c281b70f37baced41686da73e426dddd143a2
- Branch / Tag: refs/heads/main
- Owner: https://github.com/AgentBudget
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@471c281b70f37baced41686da73e426dddd143a2
- Trigger Event: workflow_dispatch