Budget-limited LLM API client wrapper to prevent runaway AI agent costs

Agent Budget Guard

Hard spending limits for LLM API calls. Prevents runaway agent costs.

Wraps the OpenAI, Anthropic, and Google Gemini SDK clients. Each wrapper is a drop-in replacement that adds budget enforcement and requires no other changes to your code.

Install

pip install agent-budget-guard

Quickstart

OpenAI

from agent_budget_guard import BudgetedSession

client = BudgetedSession.openai(budget_usd=5.00)

# Non-streaming — identical to normal OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Streaming — works the same way, cost tracked from final chunk
for chunk in client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
    # A usage-only final chunk may carry an empty choices list, so guard the index
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

print(client.session.get_summary())

Anthropic

from agent_budget_guard import BudgetedSession

client = BudgetedSession.anthropic(budget_usd=5.00)

# Non-streaming
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.content[0].text)

# Streaming
for event in client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
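    # Only text deltas carry .text; other delta types (e.g. tool input JSON) do not
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="")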

Google Gemini

from agent_budget_guard import BudgetedSession

client = BudgetedSession.google(budget_usd=5.00)

# Non-streaming
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello",
)
print(response.text)

# Streaming — Google uses a separate method (mirrors the underlying SDK)
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Hello",
):
    print(chunk.text, end="")

API Keys

Set the standard environment variable for each provider:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...

Or pass api_key= directly to any factory method.

Manual Wrapping

If you already have a client instance, wrap it directly:

from openai import OpenAI
from agent_budget_guard import BudgetedSession

session = BudgetedSession(budget_usd=5.00)
client = session.wrap_openai(OpenAI())

Same pattern for wrap_anthropic() and wrap_google().

Callbacks

from agent_budget_guard import BudgetedSession

client = BudgetedSession.openai(
    budget_usd=5.00,
    on_budget_exceeded=lambda e: print(f"Budget hit: {e}"),
    on_warning=lambda w: print(f"{w['threshold']}% of budget used"),
    warning_thresholds=[50, 90],  # default: [30, 80, 95]
)

on_budget_exceeded is called when a request would exceed the budget. With this callback set, the blocked call returns None instead of raising, so check the return value before using it. Without the callback, a BudgetExceededError is raised.
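Because a blocked call returns None when on_budget_exceeded is set, code that reads the response needs a guard. The helper below is a minimal sketch of that pattern for an OpenAI-style response; reply_text is a hypothetical name, not part of the package, and the SimpleNamespace object only stands in for a real response.

```python
from types import SimpleNamespace
from typing import Optional


def reply_text(response) -> Optional[str]:
    """Return the first choice's text, or None if the call was blocked."""
    if response is None:  # budget exceeded and the callback handled it
        return None
    return response.choices[0].message.content


# Stand-in for an OpenAI-style response, used here only for illustration.
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hi"))]
)
print(reply_text(fake))
print(reply_text(None))
```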

on_warning — called when utilization crosses a threshold. Each threshold fires once per session. The callback receives:

{
    "threshold": 50,       # which % threshold was crossed
    "spent": 2.51,         # total spent so far
    "remaining": 2.49,     # budget left
    "budget": 5.00         # total budget
}
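The once-per-session behavior can be pictured with a small standalone sketch. It is not the package's implementation: ThresholdNotifier and record are hypothetical names, and the only details taken from the description above are the payload keys and the rule that each threshold fires once.

```python
from typing import Callable, Iterable


class ThresholdNotifier:
    """Fires a callback once per percentage threshold as spending grows."""

    def __init__(self, budget: float, thresholds: Iterable[int],
                 on_warning: Callable[[dict], None]) -> None:
        self.budget = budget
        self.pending = sorted(thresholds)  # thresholds not yet fired
        self.on_warning = on_warning

    def record(self, spent: float) -> None:
        """Call with the running total after each committed cost."""
        utilization = spent / self.budget * 100
        # Fire every pending threshold that the new total has reached.
        while self.pending and utilization >= self.pending[0]:
            threshold = self.pending.pop(0)
            self.on_warning({
                "threshold": threshold,
                "spent": spent,
                "remaining": self.budget - spent,
                "budget": self.budget,
            })


fired = []
notifier = ThresholdNotifier(5.00, [50, 90], fired.append)
for total in (1.00, 2.51, 3.00, 4.80):
    notifier.record(total)

print([w["threshold"] for w in fired])
```

Here the 50% warning fires at a running total of 2.51 and is not repeated at 3.00, because the threshold is removed from the pending list once it has fired.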

Concurrent Agents

All agents share the same budget pool with atomic reservation — no race conditions.

import concurrent.futures
from agent_budget_guard import BudgetedSession, BudgetExceededError

client = BudgetedSession.openai(budget_usd=10.00)

def agent_task(task_id):
    for _ in range(10):
        try:
            client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": f"Task {task_id}"}]
            )
        except BudgetExceededError:
            return

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # list() consumes the results so any unexpected exception is re-raised here
    list(executor.map(agent_task, range(5)))

How It Works

1. Estimates cost before the API call (token counting + model pricing)
2. Atomically reserves that amount from the budget
3. Makes the API call only if within budget
4. Calculates actual cost from the response (or final stream chunk)
5. Commits actual cost, releases the reservation
6. Fires warning callbacks if thresholds are crossed

spent + reserved <= budget at all times, even under concurrency.
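The reserve-then-commit cycle can be sketched with a lock-protected ledger. This is an illustration under stated assumptions, not the package's code: BudgetLedger, reserve, commit, and worker are hypothetical names, every call is given a fixed estimate of 0.25, and the actual cost is assumed never to exceed the estimate, which is what keeps the invariant intact in this toy version.

```python
import threading


class BudgetLedger:
    """Toy ledger that keeps spent + reserved <= budget under a lock."""

    def __init__(self, budget: float) -> None:
        self.budget = budget
        self.spent = 0.0
        self.reserved = 0.0
        self._lock = threading.Lock()

    def reserve(self, estimate: float) -> bool:
        """Atomically set aside the estimate; False if it would not fit."""
        with self._lock:
            if self.spent + self.reserved + estimate > self.budget:
                return False
            self.reserved += estimate
            return True

    def commit(self, estimate: float, actual: float) -> None:
        """Release the reservation and record what the call really cost."""
        with self._lock:
            self.reserved -= estimate
            self.spent += actual


def worker(ledger: BudgetLedger, calls: int) -> None:
    for _ in range(calls):
        if not ledger.reserve(0.25):
            return  # budget exhausted, stop this agent
        # The real API call would happen here.
        ledger.commit(0.25, actual=0.25)


ledger = BudgetLedger(10.0)
threads = [threading.Thread(target=worker, args=(ledger, 10)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ledger.spent, ledger.reserved)
```

Eight workers try to spend 20.00 in total against a 10.00 budget. Because the check and the reservation happen under one lock, no interleaving of threads can push spent + reserved past the budget.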

Session API

client.session.get_total_spent()        # USD spent so far
client.session.get_remaining_budget()   # USD remaining (accounts for in-flight calls)
client.session.get_reserved()           # USD reserved for in-flight calls
client.session.get_budget()             # total budget
client.session.get_summary()            # dict with all of the above
client.session.reset()                  # reset to zero (don't use mid-flight)
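The getters compose naturally into small monitoring helpers. The sketch below computes how much of the budget is spent or held by in-flight calls, using only the method names listed above; utilization_percent is a hypothetical helper, and FakeSession is a stub with fixed values that stands in for client.session.

```python
def utilization_percent(session) -> float:
    """Percentage of the budget that is spent or held by in-flight calls."""
    committed = session.get_total_spent() + session.get_reserved()
    return committed / session.get_budget() * 100


class FakeSession:
    """Stub with fixed values, standing in for client.session."""

    def get_total_spent(self) -> float:
        return 2.0

    def get_reserved(self) -> float:
        return 0.5

    def get_budget(self) -> float:
        return 5.0


print(utilization_percent(FakeSession()))
```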

Supported Models

OpenAI — GPT-5.2, GPT-5.1, GPT-5-mini, GPT-5-nano, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, GPT-4o, GPT-4o-mini, o1, o1-pro, o3, o3-pro, o4-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo

Batch pricing: BudgetedSession.openai(budget_usd=5.00, tier="batch")
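As a rough picture of how a pricing tier could feed into a pre-call estimate, the sketch below looks up per-token prices by model and tier. Everything in it is an assumption made for illustration: the PRICES table, the model name example-model, and the numbers are placeholders rather than real provider pricing, and estimate_cost is not a function of the package.

```python
# Placeholder prices in USD per million tokens; NOT real provider pricing.
PRICES = {
    ("example-model", "standard"): {"input": 1.00, "output": 4.00},
    ("example-model", "batch"): {"input": 0.50, "output": 2.00},
}


def estimate_cost(model: str, input_tokens: int, max_output_tokens: int,
                  tier: str = "standard") -> float:
    """Upper-bound cost: assumes the reply uses the full output allowance."""
    price = PRICES[(model, tier)]
    return (input_tokens * price["input"]
            + max_output_tokens * price["output"]) / 1_000_000


standard = estimate_cost("example-model", 2_000, 1_000)
batch = estimate_cost("example-model", 2_000, 1_000, tier="batch")
print(standard, batch)
```

Estimating against the maximum output length errs on the high side, which suits a reservation: the unused part can be released once the actual cost is known.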

Anthropic — claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku

Google Gemini — gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash, gemini-1.5-flash-8b

Development

git clone https://github.com/Digital-Ibraheem/agent-budget-guard.git
cd agent-budget-guard
pip install -e ".[dev]"
pytest

Release history

0.2.0 (this version): Feb 21, 2026
0.1.4: Feb 18, 2026
0.1.3: Feb 16, 2026
0.1.2: Feb 16, 2026
0.1.1: Feb 16, 2026
0.1.0: Feb 16, 2026

Download files

Source Distribution

agent_budget_guard-0.2.0.tar.gz (30.3 kB), uploaded Feb 21, 2026, Source

Built Distribution

agent_budget_guard-0.2.0-py3-none-any.whl (30.8 kB), uploaded Feb 21, 2026, Python 3

File details

Details for the file agent_budget_guard-0.2.0.tar.gz.

File metadata

Download URL: agent_budget_guard-0.2.0.tar.gz
Upload date: Feb 21, 2026
Size: 30.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agent_budget_guard-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a97f2680057e777cf47a603d7caee0614ac416188c6414e8cfe08ee21c07ffcf`
MD5	`9fc47b4a67bcb0a43291a5eac49f5272`
BLAKE2b-256	`456b8fe13fdd240f136c2802272422a6e48a79f8fbb2613b8183b0b4831a1a6e`

File details

Details for the file agent_budget_guard-0.2.0-py3-none-any.whl.

File metadata

Download URL: agent_budget_guard-0.2.0-py3-none-any.whl
Upload date: Feb 21, 2026
Size: 30.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agent_budget_guard-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c2428aea80be4527f0766ae0e0b0052fbe0444544acfb22ab6810fa61931211d`
MD5	`b0c99d1b4b696a189a1e025e9f0ee614`
BLAKE2b-256	`7ace702e111f6db067ad712e88b6f0f1529742476eb17bdd47552aded1e149f2`

agent-budget-guard 0.2.0
