
Budget-limited LLM API client wrapper to prevent runaway AI agent costs


Agent Budget Guard

Hard spending limits for LLM API calls. Prevents runaway agent costs.

Wraps the OpenAI, Anthropic, and Google Gemini SDK clients. Each wrapped client is a drop-in replacement that adds budget enforcement with no other changes to your code.

Install

pip install agent-budget-guard

Quickstart

OpenAI

from agent_budget_guard import BudgetedSession

client = BudgetedSession.openai(budget_usd=5.00)

# Non-streaming — identical to normal OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Streaming works the same way; cost is tracked from the final chunk
for chunk in client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
    # A usage-only final chunk has an empty choices list, so check before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

print(client.session.get_summary())
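Tracking cost from the final chunk works because providers report token usage at the end of a stream. For OpenAI that usage chunk is only sent when `stream_options={"include_usage": True}` is set, and its `choices` list is empty; how this package arranges that is not documented here. A minimal sketch of a pass-through stream that records usage, assuming OpenAI-style chunk objects; `tracked_stream` and the fake chunks are hypothetical, not this package's API:

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Iterator


def tracked_stream(
    chunks: Iterable[SimpleNamespace],
    on_usage: Callable[[int, int], None],
) -> Iterator[SimpleNamespace]:
    """Hypothetical helper: yield chunks unchanged, then report the last usage seen."""
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # typically only the final chunk carries usage
        yield chunk
    # Runs once the caller has consumed the whole stream.
    if usage is not None:
        on_usage(usage.prompt_tokens, usage.completion_tokens)


def delta(text: str) -> SimpleNamespace:
    """Build a fake OpenAI-style content chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))], usage=None
    )


# Fake stream: two content chunks, then a usage-only chunk with empty choices.
fake = [
    delta("Hel"),
    delta("lo"),
    SimpleNamespace(choices=[], usage=SimpleNamespace(prompt_tokens=8, completion_tokens=2)),
]

recorded = []
text = ""
for chunk in tracked_stream(fake, lambda i, o: recorded.append((i, o))):
    if chunk.choices and chunk.choices[0].delta.content:
        text += chunk.choices[0].delta.content

print(text, recorded)  # Hello [(8, 2)]
```

If the caller breaks out of the loop early, the lines after the `for` in `tracked_stream` never run; a real wrapper would want a `try`/`finally` so a reservation is still released when the generator is closed.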

Anthropic

client = BudgetedSession.anthropic(budget_usd=5.00)

# Non-streaming
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.content[0].text)

# Streaming
for event in client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
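    # Only text deltas have .text; tool-use blocks stream input_json_delta instead
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="")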

Google Gemini

client = BudgetedSession.google(budget_usd=5.00)

# Non-streaming
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello",
)
print(response.text)

# Streaming: Google uses a separate method (mirrors the underlying SDK)
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Hello",
):
    if chunk.text:  # .text is None for chunks without text parts
        print(chunk.text, end="")

API Keys

Set the standard environment variable for each provider:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...

Or pass api_key= directly to any factory method.
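The lookup order implied here (an explicit `api_key=` first, then the provider's environment variable) can be sketched as below. `resolve_api_key` is a hypothetical helper, and the precedence is an assumption that matches how the official SDK constructors behave rather than documented behavior of this package.

```python
import os
from typing import Optional

# Environment variable per provider, as listed in this README.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def resolve_api_key(provider: str, api_key: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit api_key wins, else read the env var."""
    if api_key:
        return api_key
    var = ENV_VARS[provider]
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it or pass api_key=")
    return key


os.environ["ANTHROPIC_API_KEY"] = "sk-ant-example"  # demo value only
print(resolve_api_key("anthropic"))                      # sk-ant-example
print(resolve_api_key("anthropic", api_key="sk-ant-x"))  # sk-ant-x
```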

Manual Wrapping

If you already have a client instance, wrap it directly:

from openai import OpenAI
from agent_budget_guard import BudgetedSession

session = BudgetedSession(budget_usd=5.00)
client = session.wrap_openai(OpenAI())

Same pattern for wrap_anthropic() and wrap_google().
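One common way to build a drop-in wrapper like this is a delegating proxy: every attribute passes through to the real client except the one method that needs a budget check. The sketch below is hypothetical (`GuardedProxy`, `guard`, and the `target` path are illustrative names, not necessarily how `wrap_openai()` is implemented) and uses a fake client so it runs without a network or API key.

```python
from types import SimpleNamespace
from typing import Any, Callable, Tuple


class GuardedProxy:
    """Hypothetical delegating proxy: every attribute passes through to the
    wrapped client except the method at `target`, which is routed via `guard`."""

    def __init__(self, wrapped: Any, guard: Callable[..., Any],
                 target: Tuple[str, ...] = ("chat", "completions", "create"),
                 path: Tuple[str, ...] = ()):
        self._wrapped = wrapped
        self._guard = guard
        self._target = target
        self._path = path

    def __getattr__(self, name: str) -> Any:
        # Only called for names not set in __init__, i.e. the client's own attributes.
        attr = getattr(self._wrapped, name)
        path = self._path + (name,)
        if path == self._target:
            return lambda *args, **kwargs: self._guard(attr, *args, **kwargs)
        if path == self._target[: len(path)]:
            # Still on the way to the target (e.g. client.chat): wrap this level too.
            return GuardedProxy(attr, self._guard, self._target, path)
        return attr  # anything else is returned untouched


calls = []

def guard(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    calls.append(kwargs.get("model"))  # a real guard would estimate and reserve here
    return fn(*args, **kwargs)

# Fake client shaped like the OpenAI SDK.
fake = SimpleNamespace(
    api_key="sk-test",
    chat=SimpleNamespace(completions=SimpleNamespace(create=lambda **kw: "ok")),
)
client = GuardedProxy(fake, guard)
print(client.chat.completions.create(model="gpt-4o-mini"))  # ok
print(client.api_key, calls)  # sk-test ['gpt-4o-mini']
```

The same class would cover an Anthropic-shaped client with `target=("messages", "create")`, since only the intercepted path differs.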

Callbacks

client = BudgetedSession.openai(
    budget_usd=5.00,
    on_budget_exceeded=lambda e: print(f"Budget hit: {e}"),
    on_warning=lambda w: print(f"{w['threshold']}% of budget used"),
    warning_thresholds=[50, 90],  # default: [30, 80, 95]
)

on_budget_exceeded: called when a request would exceed the budget. If this callback is set, the blocked call returns None instead of raising, so check the response for None before using it. Without this callback, a BudgetExceededError is raised.
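The two modes described above can be sketched as a small gate. `enforce` and the local `BudgetExceededError` class are hypothetical stand-ins so the example runs without the package; they mirror the documented behavior, not the package's internals.

```python
from typing import Any, Callable, Optional


class BudgetExceededError(Exception):
    """Local stand-in for the package's exception so the sketch runs on its own."""


def enforce(
    estimated_cost: float,
    remaining: float,
    call: Callable[[], Any],
    on_budget_exceeded: Optional[Callable[[BudgetExceededError], None]] = None,
) -> Any:
    """Hypothetical gate mirroring the documented behavior."""
    if estimated_cost <= remaining:
        return call()
    err = BudgetExceededError(f"needs ${estimated_cost:.2f}, ${remaining:.2f} left")
    if on_budget_exceeded is None:
        raise err              # default: the caller sees an exception
    on_budget_exceeded(err)
    return None                # callback mode: the caller sees None


# Callback mode: check for None before using the response.
response = enforce(2.00, 0.50, lambda: "response", lambda e: print(f"Budget hit: {e}"))
if response is None:
    print("skipped: over budget")
```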

on_warning: called when budget utilization crosses one of the warning_thresholds percentages. Each threshold fires at most once per session. The callback receives a dict:

{
    "threshold": 50,       # which % threshold was crossed
    "spent": 2.51,         # total spent so far
    "remaining": 2.49,     # budget left
    "budget": 5.00         # total budget
}
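"Each threshold fires at most once per session" amounts to remembering which thresholds have already been reported. A minimal sketch; `ThresholdTracker` is hypothetical, and firing every newly crossed threshold in order when one large call jumps past several is this sketch's choice, not confirmed package behavior.

```python
from typing import Callable, Dict, List


class ThresholdTracker:
    """Hypothetical sketch: fire each warning threshold at most once per session."""

    def __init__(self, budget: float, thresholds: List[int],
                 on_warning: Callable[[Dict[str, float]], None]):
        self.budget = budget
        self.thresholds = sorted(thresholds)
        self.on_warning = on_warning
        self.fired = set()  # thresholds already reported

    def check(self, spent: float) -> None:
        pct = spent / self.budget * 100
        for t in self.thresholds:
            if pct >= t and t not in self.fired:
                self.fired.add(t)
                # Same keys as the dict shown above.
                self.on_warning({
                    "threshold": t,
                    "spent": spent,
                    "remaining": self.budget - spent,
                    "budget": self.budget,
                })


seen = []
tracker = ThresholdTracker(5.00, [50, 90], lambda w: seen.append(w["threshold"]))
tracker.check(2.51)   # crosses 50%
tracker.check(3.00)   # 60%: 50 already fired, nothing new
tracker.check(4.80)   # crosses 90%
print(seen)  # [50, 90]
```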

Concurrent Agents

All threads using the same client draw from one shared budget pool. Each call's estimated cost is reserved atomically, so concurrent agents cannot together overspend the budget.

import concurrent.futures
from agent_budget_guard import BudgetedSession, BudgetExceededError

client = BudgetedSession.openai(budget_usd=10.00)

def agent_task(task_id):
    for _ in range(10):
        try:
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": f"Task {task_id}"}]
            )
        except BudgetExceededError:
            return

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # list() consumes the results so any unexpected exception is re-raised here
    list(executor.map(agent_task, range(5)))

How It Works

  1. Estimates cost before the API call (token counting + model pricing)
  2. Atomically reserves that amount from the budget
  3. Makes the API call only if within budget
  4. Calculates actual cost from the response (or final stream chunk)
  5. Commits actual cost, releases the reservation
  6. Fires warning callbacks if thresholds are crossed

spent + reserved <= budget at all times, even under concurrency.
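Steps 2 to 5 can be sketched as a reserve/commit ledger guarded by a lock. `BudgetPool` is hypothetical, not this package's class. It stores amounts as integer micro-dollars so repeated additions cannot drift like floats, and it assumes the pre-call estimate is a true upper bound on actual cost; if a call ever cost more than its reservation, `spent` could exceed the budget.

```python
import threading


class BudgetExceededError(Exception):
    """Local stand-in for the package's exception so the sketch is self-contained."""


class BudgetPool:
    """Hypothetical reserve/commit ledger in integer micro-dollars (1 USD = 1_000_000).
    Invariant, checked under the lock: spent + reserved <= budget."""

    def __init__(self, budget_usd: float):
        self.budget = round(budget_usd * 1_000_000)
        self.spent = 0
        self.reserved = 0
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> None:
        # Check and update in one critical section, so two threads cannot
        # both see the same headroom and overspend it together.
        with self._lock:
            if self.spent + self.reserved + estimate > self.budget:
                raise BudgetExceededError(f"cannot reserve {estimate} micro-USD")
            self.reserved += estimate

    def commit(self, estimate: int, actual: int) -> None:
        # Swap the reservation for the real cost (assumes actual <= estimate).
        with self._lock:
            self.reserved -= estimate
            self.spent += actual

    def remaining(self) -> int:
        # Like get_remaining_budget(): in-flight reservations count as used.
        with self._lock:
            return self.budget - self.spent - self.reserved


pool = BudgetPool(budget_usd=1.00)
completed = []  # list.append is atomic under CPython's GIL

def agent(task_id: int) -> None:
    for _ in range(10):
        try:
            pool.reserve(100_000)        # estimated upper bound: $0.10
        except BudgetExceededError:
            return
        pool.commit(100_000, 80_000)     # actual cost: $0.08
        completed.append(task_id)

threads = [threading.Thread(target=agent, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(completed), pool.spent, pool.reserved, pool.remaining())
```

How many calls complete depends on thread timing, but `spent + reserved` never passes the budget, and `reserved` returns to zero once every call has committed.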

Session API

client.session.get_total_spent()        # USD spent so far
client.session.get_remaining_budget()   # USD remaining (accounts for in-flight calls)
client.session.get_reserved()           # USD reserved for in-flight calls
client.session.get_budget()             # total budget
client.session.get_summary()            # dict with all of the above
client.session.reset()                  # reset to zero (don't use mid-flight)

Supported Models

OpenAI — GPT-5.2, GPT-5.1, GPT-5-mini, GPT-5-nano, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, GPT-4o, GPT-4o-mini, o1, o1-pro, o3, o3-pro, o4-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo

Batch pricing: BudgetedSession.openai(budget_usd=5.00, tier="batch")
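The cost estimate in step 1 of How It Works (token counting plus model pricing) reduces to per-million-token arithmetic. This sketch uses placeholder prices, not this package's pricing table. It assumes the estimate charges for every allowed output token so it stays an upper bound, and that the batch tier is a flat 50% of standard. The 50% figure matches OpenAI's published Batch API discount, but how the package applies `tier="batch"` is an assumption.

```python
# Placeholder prices in USD per 1M tokens: (input, output). Illustrative only.
PRICES = {
    "example-model": (0.15, 0.60),
}
BATCH_DISCOUNT = 0.5  # assumption: batch tier billed at half the standard rate


def estimate_cost(model: str, input_tokens: int, max_output_tokens: int,
                  tier: str = "standard") -> float:
    """Hypothetical upper-bound estimate: assume every allowed output token is used."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens * in_price + max_output_tokens * out_price) / 1_000_000
    if tier == "batch":
        cost *= BATCH_DISCOUNT
    return cost


print(round(estimate_cost("example-model", 2_000, 1_024), 6))           # 0.000914
print(round(estimate_cost("example-model", 2_000, 1_024, "batch"), 6))  # 0.000457
```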

Anthropic — claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku

Google Gemini — gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash, gemini-1.5-flash-8b

Development

git clone https://github.com/Digital-Ibraheem/agent-budget-guard.git
cd agent-budget-guard
pip install -e ".[dev]"
pytest

Download files

Source Distribution

agent_budget_guard-0.2.0.tar.gz (30.3 kB)

Uploaded Source

Built Distribution


agent_budget_guard-0.2.0-py3-none-any.whl (30.8 kB)

Uploaded Python 3

File details

Details for the file agent_budget_guard-0.2.0.tar.gz.

File metadata

  • Download URL: agent_budget_guard-0.2.0.tar.gz
  • Size: 30.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agent_budget_guard-0.2.0.tar.gz
Algorithm Hash digest
SHA256 a97f2680057e777cf47a603d7caee0614ac416188c6414e8cfe08ee21c07ffcf
MD5 9fc47b4a67bcb0a43291a5eac49f5272
BLAKE2b-256 456b8fe13fdd240f136c2802272422a6e48a79f8fbb2613b8183b0b4831a1a6e


File details

Details for the file agent_budget_guard-0.2.0-py3-none-any.whl.


File hashes

Hashes for agent_budget_guard-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c2428aea80be4527f0766ae0e0b0052fbe0444544acfb22ab6810fa61931211d
MD5 b0c99d1b4b696a189a1e025e9f0ee614
BLAKE2b-256 7ace702e111f6db067ad712e88b6f0f1529742476eb17bdd47552aded1e149f2

