TokenFence

Cost circuit breaker for AI agents. Guard your LLM spend with automatic model downgrade and kill switch. Supports OpenAI, Anthropic Claude, Google Gemini, and DeepSeek.

Install

pip install tokenfence[openai]

Quick Start

import openai
from tokenfence import guard

client = guard(
    openai.OpenAI(),
    budget='$0.50',
    fallback='gpt-4o-mini',
    on_limit='stop',
)

# Use exactly like a normal OpenAI client
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello'}],
)

# Check spend
print(client.tokenfence.spent)      # 0.0023
print(client.tokenfence.remaining)  # 0.4977
print(client.tokenfence.calls)      # 1

Anthropic Claude

import anthropic
from tokenfence import guard

client = guard(
    anthropic.Anthropic(),
    budget='$1.00',
    fallback='claude-3-haiku-20240307',
    on_limit='stop',
)

# Use exactly like a normal Anthropic client
response = client.messages.create(
    model='claude-3-5-sonnet-20241022',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Hello'}],
)

# Check spend
print(client.tokenfence.spent)      # 0.00105
print(client.tokenfence.remaining)  # 0.99895

Async Support

For async applications, use async_guard:

import openai
from tokenfence import async_guard

client = async_guard(
    openai.AsyncOpenAI(),
    budget='$0.50',
    fallback='gpt-4o-mini',
    on_limit='stop',
)

# Use exactly like a normal async OpenAI client
response = await client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello'}],
)

print(client.tokenfence.spent)

Works with anthropic.AsyncAnthropic too:

import anthropic
from tokenfence import async_guard

client = async_guard(
    anthropic.AsyncAnthropic(),
    budget='$1.00',
    fallback='claude-3-haiku-20240307',
    on_limit='raise',
)

response = await client.messages.create(
    model='claude-3-5-sonnet-20241022',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Hello'}],
)
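When one guarded client is shared across many concurrent agent tasks, spend accounting has to be atomic or parallel calls can race past the budget. The sketch below shows the idea with a plain asyncio.Lock; it is illustrative only — the class and method names are assumptions, not TokenFence's actual internals:

```python
import asyncio

class SharedBudget:
    """Concurrency-safe spend accounting for a budget shared across tasks.
    Illustrative sketch; TokenFence's real implementation may differ."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0
        self._lock = asyncio.Lock()

    async def charge(self, cost: float) -> bool:
        # Check-and-add under a lock so concurrent tasks cannot overspend.
        async with self._lock:
            if self.spent + cost > self.budget:
                return False  # kill switch: refuse the call
            self.spent += cost
            return True

async def main():
    budget = SharedBudget(0.01)
    # Five tasks each try to spend $0.004 against a $0.01 budget.
    results = await asyncio.gather(*(budget.charge(0.004) for _ in range(5)))
    return results, budget.spent

results, spent = asyncio.run(main())
print(results, spent)  # exactly 2 of the 5 charges fit; the rest are refused
```

Whatever the scheduling order, exactly two $0.004 charges fit under $0.01, which is why the check and the increment must happen under the same lock.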

How It Works

  1. Track — every wrapped call (chat.completions.create() for OpenAI, messages.create() for Anthropic) records token usage and calculates its cost.
  2. Downgrade — when cumulative spend hits the threshold (default 80% of budget), the model is transparently swapped to your fallback.
  3. Kill switch — when the budget is fully consumed:
    • on_limit='stop' — returns a synthetic response explaining the budget was exceeded.
    • on_limit='warn' — logs a warning but allows the call through.
    • on_limit='raise' — raises BudgetExceeded.
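The three steps above amount to a small state machine over cumulative spend. Here is a plain-Python sketch of that flow; the per-token prices, class names, and method names are assumptions for illustration, not TokenFence's internals:

```python
# Illustrative sketch of the track -> downgrade -> kill-switch flow.
PRICES = {  # assumed USD per 1M tokens: (input, output)
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

class BudgetExceeded(Exception):
    pass

class Guard:
    def __init__(self, budget, fallback=None, on_limit="stop", threshold=0.8):
        self.budget, self.fallback = budget, fallback
        self.on_limit, self.threshold = on_limit, threshold
        self.spent = 0.0

    def choose_model(self, requested):
        # Step 2: downgrade once spend crosses threshold * budget.
        if self.fallback and self.spent >= self.threshold * self.budget:
            return self.fallback
        return requested

    def record(self, model, prompt_tokens, completion_tokens):
        # Step 1: turn reported token usage into a dollar cost.
        p_in, p_out = PRICES[model]
        self.spent += (prompt_tokens * p_in + completion_tokens * p_out) / 1_000_000

    def check(self):
        # Step 3: kill switch once the budget is fully consumed.
        if self.spent < self.budget:
            return "ok"
        if self.on_limit == "raise":
            raise BudgetExceeded(f"spent ${self.spent:.4f} of ${self.budget:.2f}")
        return "stop" if self.on_limit == "stop" else "warn"

guard = Guard(budget=0.50, fallback="gpt-4o-mini")
model = guard.choose_model("gpt-4o")  # still "gpt-4o" while under threshold
```

Note the downgrade check runs before each call, while cost recording happens after it, so the first call that crosses the threshold completes on the expensive model and only subsequent calls are swapped.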

API

guard(client, *, budget, fallback=None, on_limit='stop', threshold=0.8)

Parameter  Type                                  Description
client     openai.OpenAI | anthropic.Anthropic   A provider client instance to wrap
budget     str | float                           Max spend — '$0.50' or 0.50
fallback   str | None                            Model to downgrade to when the threshold is hit
on_limit   str                                   'stop', 'warn', or 'raise'
threshold  float                                 Fraction of budget at which downgrade kicks in (0.0–1.0)
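Since budget accepts either a dollar string or a number, a normalization step like the following is implied. This is a plausible sketch, not TokenFence's actual parser:

```python
def parse_budget(budget):
    """Normalize a budget given as '$0.50', '0.50', or 0.5 to a float in USD.
    Illustrative sketch; the library's real validation may differ."""
    if isinstance(budget, (int, float)):
        value = float(budget)
    elif isinstance(budget, str):
        # Strip a leading dollar sign and thousands separators.
        value = float(budget.lstrip("$").replace(",", ""))
    else:
        raise TypeError(f"budget must be str or number, got {type(budget).__name__}")
    if value <= 0:
        raise ValueError("budget must be positive")
    return value

print(parse_budget("$0.50"))  # 0.5
```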

async_guard(client, *, budget, fallback=None, on_limit='stop', threshold=0.8)

Same parameters as guard(), but for async clients (openai.AsyncOpenAI, anthropic.AsyncAnthropic).

client.tokenfence

Attribute   Description
.spent      Total USD spent so far
.remaining  USD remaining in the budget
.calls      Number of tracked API calls
.budget     The configured budget in USD
.reset()    Reset spend tracking to zero
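The relationship between these attributes — remaining derived from budget minus spent, and reset() opening a fresh budget window for the next task — can be mirrored by a tiny tracker. This stand-in is inferred from the table above, not copied from TokenFence's source:

```python
class Tracker:
    """Minimal stand-in for the client.tokenfence namespace (illustrative)."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0
        self.calls = 0

    @property
    def remaining(self) -> float:
        # Derived, never stored: budget minus cumulative spend, floored at 0.
        return max(self.budget - self.spent, 0.0)

    def record(self, cost: float) -> None:
        self.spent += cost
        self.calls += 1

    def reset(self) -> None:
        # Reuse one guarded client across tasks by giving each task
        # a fresh budget window.
        self.spent = 0.0
        self.calls = 0

t = Tracker(0.50)
t.record(0.0023)
print(t.spent, t.remaining, t.calls)  # spent ≈ 0.0023, remaining ≈ 0.4977, calls = 1
t.reset()
print(t.spent, t.calls)  # back to 0.0 and 0
```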

Limits on Free Tier

The free Hobby tier includes 50K tracked requests/month. For production workloads:

Tier   Requests  Price
Hobby  50K/mo    Free
Pro    500K/mo   $49/mo
Team   2M/mo     $149/mo

Upgrade to Pro at tokenfence.dev — 7-day free trial, no credit card required to start.

License

MIT

Download files

Download the file for your platform.

Source Distribution

tokenfence-0.3.2.tar.gz (14.7 kB)

Uploaded Source

Built Distribution


tokenfence-0.3.2-py3-none-any.whl (11.6 kB)

Uploaded Python 3

File details

Details for the file tokenfence-0.3.2.tar.gz.

File metadata

  • Download URL: tokenfence-0.3.2.tar.gz
  • Upload date:
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tokenfence-0.3.2.tar.gz
Algorithm Hash digest
SHA256 820a4c7f115f7bd9a9e4da0fa0c150f5e0a49ce8b62ad3e1530d84be7414b7b6
MD5 fc863ad2b75e3a050dce04ab7d2f066e
BLAKE2b-256 dbbf525d26af96c831b74eeb7fa9a2516708107078ebea8502d994d219747c39


File details

Details for the file tokenfence-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: tokenfence-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tokenfence-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a224ec87cf26d7a1738182df49fca0ed69cec013b49d2bb48a67e32ce4a1e6db
MD5 f520787845c3428879fcc8017244c538
BLAKE2b-256 ea2bb285d2aef6541a839e49fd6f6041673b6a643b0ae6ee9216756d068ef98f

