Budget-limited LLM API client wrapper to prevent runaway AI agent costs
Agent Budget Guard
Hard spending limits for LLM API calls. Prevents runaway agent costs.
Wraps the OpenAI, Anthropic, and Google Gemini SDK clients. Each wrapper is a drop-in replacement for the original client: it adds budget enforcement and requires no other changes to your code.
Install
pip install agent-budget-guard
Quickstart
OpenAI
from agent_budget_guard import BudgetedSession

client = BudgetedSession.openai(budget_usd=5.00)

# Non-streaming: identical to normal OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

# Streaming: works the same way; cost is tracked from the final chunk
for chunk in client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
    # A usage-only chunk can arrive with an empty choices list, so guard the index
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Budget, spend, and reservation totals for this session
print(client.session.get_summary())
Anthropic
from agent_budget_guard import BudgetedSession

client = BudgetedSession.anthropic(budget_usd=5.00)

# Non-streaming
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.content[0].text)

# Streaming
for event in client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
    # Only content_block_delta events carry generated text
    if event.type == "content_block_delta":
        print(event.delta.text, end="")
Google Gemini
from agent_budget_guard import BudgetedSession

client = BudgetedSession.google(budget_usd=5.00)

# Non-streaming
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello",
)
print(response.text)

# Streaming: Google uses a separate method (mirrors the underlying SDK)
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Hello",
):
    print(chunk.text, end="")
API Keys
Set the standard environment variable for each provider:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
Or pass api_key= directly to any factory method.
Manual Wrapping
If you already have a client instance, wrap it directly:
from openai import OpenAI
from agent_budget_guard import BudgetedSession
session = BudgetedSession(budget_usd=5.00)
client = session.wrap_openai(OpenAI())
The same pattern applies to wrap_anthropic() and wrap_google().
Callbacks
client = BudgetedSession.openai(
    budget_usd=5.00,
    on_budget_exceeded=lambda e: print(f"Budget hit: {e}"),
    on_warning=lambda w: print(f"{w['threshold']}% of budget used"),
    warning_thresholds=[50, 90],  # default: [30, 80, 95]
)
on_budget_exceeded: called when a request would exceed the budget. With this callback set, the blocked call returns None instead of raising; without it, a BudgetExceededError is raised.
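Because a blocked call returns None when on_budget_exceeded is set, calling code has to check the result before reading from it. The sketch below shows one way to do that. `message_text` is a hypothetical helper, not part of agent-budget-guard, and the `SimpleNamespace` object only stands in for an OpenAI-style response so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any, Optional


def message_text(response: Optional[Any], fallback: str = "") -> str:
    """Return the first choice's text, or `fallback` if the call was blocked.

    Hypothetical helper for illustration; not part of agent-budget-guard.
    """
    if response is None:  # the budget was exceeded and the callback handled it
        return fallback
    return response.choices[0].message.content or fallback


# Stand-in for an OpenAI-style response, used only to exercise the helper.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)

print(message_text(fake_response))               # a normal response
print(message_text(None, fallback="[skipped]"))  # a blocked call
```

An agent loop would typically treat the fallback as a signal to stop issuing further requests.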
on_warning: called when budget utilization crosses a threshold. Each threshold fires at most once per session. The callback receives a dict:
{
    "threshold": 50,    # which % threshold was crossed
    "spent": 2.51,      # total spent so far
    "remaining": 2.49,  # budget left
    "budget": 5.00,     # total budget
}
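The once-per-session behaviour can be pictured with a small stand-alone sketch. `WarningTracker` is a hypothetical class written for illustration; it is not the library's implementation and only assumes the payload shape shown above.

```python
from typing import Callable, Dict, List, Set


class WarningTracker:
    """Hypothetical illustration of once-per-session threshold warnings."""

    def __init__(
        self,
        budget: float,
        thresholds: List[int],
        on_warning: Callable[[Dict[str, float]], None],
    ) -> None:
        self.budget = budget  # assumed to be greater than zero
        self.thresholds = sorted(thresholds)
        self.on_warning = on_warning
        self._fired: Set[int] = set()  # thresholds already reported

    def record(self, spent: float) -> None:
        """Fire on_warning for each threshold newly reached by `spent`."""
        used_pct = spent / self.budget * 100
        for threshold in self.thresholds:
            if used_pct >= threshold and threshold not in self._fired:
                self._fired.add(threshold)
                self.on_warning({
                    "threshold": threshold,
                    "spent": spent,
                    "remaining": self.budget - spent,
                    "budget": self.budget,
                })


seen: List[int] = []
tracker = WarningTracker(5.00, [50, 90], lambda w: seen.append(w["threshold"]))
for total_spent in (1.00, 2.51, 3.00, 4.60, 4.90):
    tracker.record(total_spent)
print(seen)
```

In this run the 50% warning fires when spending reaches 2.51 and the 90% warning fires at 4.60; the later updates at 3.00 and 4.90 trigger nothing because both thresholds have already been reported.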
Concurrent Agents
All agents that share a client draw from the same budget pool. Reservations are atomic, so concurrent calls cannot race past the limit.
import concurrent.futures

from agent_budget_guard import BudgetedSession, BudgetExceededError

client = BudgetedSession.openai(budget_usd=10.00)


def agent_task(task_id):
    for _ in range(10):
        try:
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": f"Task {task_id}"}],
            )
        except BudgetExceededError:
            # The shared budget is exhausted, so this agent stops
            return


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(agent_task, range(5))
How It Works
- Estimates cost before the API call (token counting + model pricing)
- Atomically reserves that amount from the budget
- Makes the API call only if within budget
- Calculates actual cost from the response (or final stream chunk)
- Commits actual cost, releases the reservation
- Fires warning callbacks if thresholds are crossed
The invariant spent + reserved <= budget holds at all times, even under concurrency.
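The reserve-then-commit steps above can be sketched with a lock-protected ledger. This is a minimal illustration under stated assumptions, not the library's code: `BudgetLedger` and `guarded_call` are hypothetical names, the API call is simulated, and the sketch preserves the invariant only when the estimate is an upper bound on the actual cost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BudgetLedger:
    """Hypothetical reserve/commit ledger; not the library's actual class."""

    def __init__(self, budget_usd: float) -> None:
        self.budget = budget_usd
        self.spent = 0.0
        self.reserved = 0.0
        self._lock = threading.Lock()

    def reserve(self, estimate_usd: float) -> bool:
        """Hold `estimate_usd` if it fits in the budget; return False otherwise."""
        with self._lock:  # the check and the update happen as one atomic step
            if self.spent + self.reserved + estimate_usd > self.budget:
                return False
            self.reserved += estimate_usd
            return True

    def commit(self, estimate_usd: float, actual_usd: float) -> None:
        """Release the reservation and record the actual cost."""
        with self._lock:
            self.reserved -= estimate_usd
            self.spent += actual_usd


def guarded_call(ledger: BudgetLedger, estimate_usd: float, actual_usd: float) -> bool:
    """Simulate one API call; `actual_usd` stands in for the billed cost."""
    if not ledger.reserve(estimate_usd):
        return False  # the real library raises BudgetExceededError at this point
    ledger.commit(estimate_usd, actual_usd)
    return True


ledger = BudgetLedger(budget_usd=10.0)


def worker(worker_id: int) -> int:
    # Each worker attempts 40 calls with a placeholder cost of 0.125 USD each.
    return sum(guarded_call(ledger, 0.125, 0.125) for _ in range(40))


with ThreadPoolExecutor(max_workers=5) as pool:
    successes = sum(pool.map(worker, range(5)))

print(successes, ledger.spent, ledger.reserved)
```

Five workers attempt 200 calls in total, but only as many as fit in the 10.00 budget succeed. The placeholder cost of 0.125 is chosen because it is exactly representable in binary floating point, which keeps the totals exact in this demonstration.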
Session API
client.session.get_total_spent() # USD spent so far
client.session.get_remaining_budget() # USD remaining (accounts for in-flight calls)
client.session.get_reserved() # USD reserved for in-flight calls
client.session.get_budget() # total budget
client.session.get_summary() # dict with all of the above
client.session.reset() # reset to zero (don't use mid-flight)
Supported Models
OpenAI — GPT-5.2, GPT-5.1, GPT-5-mini, GPT-5-nano, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, GPT-4o, GPT-4o-mini, o1, o1-pro, o3, o3-pro, o4-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo
Batch pricing: BudgetedSession.openai(budget_usd=5.00, tier="batch")
Anthropic — claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku
Google Gemini — gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash, gemini-1.5-flash-8b
Development
git clone https://github.com/Digital-Ibraheem/agent-budget-guard.git
cd agent-budget-guard
pip install -e ".[dev]"
pytest
File details
Details for the file agent_budget_guard-0.2.0.tar.gz.
File metadata
- Download URL: agent_budget_guard-0.2.0.tar.gz
- Upload date:
- Size: 30.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a97f2680057e777cf47a603d7caee0614ac416188c6414e8cfe08ee21c07ffcf |
| MD5 | 9fc47b4a67bcb0a43291a5eac49f5272 |
| BLAKE2b-256 | 456b8fe13fdd240f136c2802272422a6e48a79f8fbb2613b8183b0b4831a1a6e |
File details
Details for the file agent_budget_guard-0.2.0-py3-none-any.whl.
File metadata
- Download URL: agent_budget_guard-0.2.0-py3-none-any.whl
- Upload date:
- Size: 30.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2428aea80be4527f0766ae0e0b0052fbe0444544acfb22ab6810fa61931211d |
| MD5 | b0c99d1b4b696a189a1e025e9f0ee614 |
| BLAKE2b-256 | 7ace702e111f6db067ad712e88b6f0f1529742476eb17bdd47552aded1e149f2 |