Lightweight token tracking, cost management, and budget enforcement for LLM API calls

Project description

TokenBudget

Stop bleeding money on LLM API calls.

A lightweight Python library for tracking tokens, managing costs, and enforcing budgets across all major LLM providers.

Python 3.9+ · MIT License · PyPI


The Problem

You're building with LLMs and:

  • Costs spiral out of control with no visibility
  • You have no idea which API calls are eating your budget
  • Production bills make you cry
  • Observability platforms are clunky and require external services

There's no simple pip install library that just works.

The Solution

from tokenbudget import TokenTracker, budget

tracker = TokenTracker()
client = tracker.wrap_openai(openai.OpenAI())

# Every call is tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

print(tracker.usage)
# Usage(total_tokens=25, total_cost_usd=0.000375, calls=1)

That's it. No platforms, no external services, no configuration.


Features

  • Token Tracking - Automatic tracking for OpenAI, Anthropic, and Google
  • Cost Calculation - Built-in, regularly updated pricing database
  • Budget Enforcement - Decorators to prevent overspending
  • Response Caching - Save money with zero-cost cached responses
  • Usage Reports - Beautiful tables plus CSV/JSON exports
  • Multi-Provider - One tracker for all your LLM calls
  • Thread-Safe - Works seamlessly in concurrent applications
  • Async Support - Works with async clients out of the box


Installation

# Basic installation
pip install tokenbudget

# With OpenAI support
pip install tokenbudget[openai]

# With Anthropic support
pip install tokenbudget[anthropic]

# With everything
pip install tokenbudget[all]

Quick Examples

1. Basic Tracking

from tokenbudget import TokenTracker
import openai

tracker = TokenTracker()
client = tracker.wrap_openai(openai.OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

print(f"Tokens: {tracker.usage.total_tokens}")
print(f"Cost: ${tracker.usage.total_cost_usd:.6f}")

2. Budget Enforcement

from tokenbudget import budget, BudgetExceeded

@budget(max_cost_usd=1.00, max_tokens=50000)
def my_llm_pipeline(data):
    # All LLM calls inside are tracked
    # Raises BudgetExceeded if limit is hit
    result = process_with_llm(data)
    return result

# Or as context manager
with budget(max_cost_usd=0.50) as ctx:
    response = client.chat.completions.create(...)
    print(f"Remaining: ${ctx.remaining_budget:.4f}")

3. Multi-Provider Tracking

import openai
import anthropic
from tokenbudget import TokenTracker

tracker = TokenTracker()

# Track OpenAI
openai_client = tracker.wrap_openai(openai.OpenAI())
openai_client.chat.completions.create(model="gpt-4o", ...)

# Track Anthropic
anthropic_client = tracker.wrap_anthropic(anthropic.Anthropic())
anthropic_client.messages.create(model="claude-sonnet-4-5", ...)

# Combined reporting
print(f"Total cost: ${tracker.total_cost_usd:.4f}")
print(tracker.usage_by_provider)

4. Response Caching

tracker = TokenTracker(cache="memory")
client = tracker.wrap_openai(openai.OpenAI())

# First call - costs money
response1 = client.chat.completions.create(model="gpt-4o", ...)

# Identical call - FREE (cached)
response2 = client.chat.completions.create(model="gpt-4o", ...)

stats = tracker.cache_stats
print(f"Saved: ${stats.saved_cost_usd:.4f}")
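Conceptually, a cache like this only needs a deterministic fingerprint of the request. A minimal stdlib sketch of the idea (the library's actual cache keys and internals may differ):

```python
import hashlib
import json

def cache_key(model: str, messages: list[dict]) -> str:
    """Derive a deterministic key from the request payload."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

cache: dict[str, str] = {}

def cached_call(model, messages, call_api):
    key = cache_key(model, messages)
    if key in cache:  # identical request: served for free
        return cache[key]
    result = call_api(model, messages)
    cache[key] = result
    return result

calls = []
def fake_api(model, messages):
    calls.append(1)  # stand-in for a paid API call
    return "4"

msgs = [{"role": "user", "content": "What is 2+2?"}]
cached_call("gpt-4o", msgs, fake_api)
cached_call("gpt-4o", msgs, fake_api)
print(len(calls))  # only one real call was made
```

Sorting the JSON keys before hashing keeps the fingerprint stable even if the request dict is built in a different order.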

5. Usage Reports

from tokenbudget import generate_table_report

print(generate_table_report(tracker))

# Output:
# ┌─────────────────────────────────────┐
# │ TokenBudget Usage Report            │
# ├─────────────────────────────────────┤
# │ Provider   │ Calls │ Tokens │ Cost  │
# │ openai     │   15  │ 12.3k  │ $0.24 │
# │ anthropic  │    8  │  8.1k  │ $0.18 │
# ├─────────────────────────────────────┤
# │ Total      │   23  │ 20.4k  │ $0.42 │
# └─────────────────────────────────────┘

# Export to CSV/JSON
tracker.export_csv("usage.csv")
tracker.export_json("usage.json")
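The CSV export can be pictured as one row per provider. The column names below are an assumption for illustration, not the library's documented schema; a stdlib-only sketch:

```python
import csv
import io

# Hypothetical per-provider usage rows (column names are assumptions)
rows = [
    {"provider": "openai", "calls": 15, "tokens": 12300, "cost_usd": 0.24},
    {"provider": "anthropic", "calls": 8, "tokens": 8100, "cost_usd": 0.18},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["provider", "calls", "tokens", "cost_usd"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```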

Supported Models

OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini

Anthropic: claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5, claude-3-5-sonnet, claude-3-opus

Google: gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash

Need a custom model? Easy:

from tokenbudget import register_model

register_model(
    "my-custom-model",
    input_per_1k=0.001,
    output_per_1k=0.002,
    provider="custom"
)

API Reference

TokenTracker

tracker = TokenTracker(cache=None)  # cache: "memory", "disk", or None

Methods:

  • wrap_openai(client) - Wrap OpenAI client
  • wrap_anthropic(client) - Wrap Anthropic client
  • track(model, prompt_tokens, completion_tokens, provider) - Manual tracking
  • reset() - Reset all statistics

Properties:

  • usage - Overall usage stats
  • usage_by_provider - Per-provider breakdown
  • total_cost_usd - Total cost across all calls
  • cache_stats - Cache hit/miss statistics
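The tracker's bookkeeping amounts to a thread-safe accumulator keyed by provider. A minimal stdlib sketch of that idea (not the library's actual implementation):

```python
import threading
from dataclasses import dataclass

@dataclass
class Usage:
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    calls: int = 0

class MiniTracker:
    """Accumulates per-call usage under a lock so concurrent calls stay consistent."""

    def __init__(self):
        self._lock = threading.Lock()
        self.usage = Usage()
        self.usage_by_provider: dict[str, Usage] = {}

    def track(self, provider: str, tokens: int, cost_usd: float) -> None:
        with self._lock:
            # Update both the overall total and the per-provider bucket
            per_provider = self.usage_by_provider.setdefault(provider, Usage())
            for u in (self.usage, per_provider):
                u.total_tokens += tokens
                u.total_cost_usd += cost_usd
                u.calls += 1

tracker = MiniTracker()
tracker.track("openai", 25, 0.000375)
tracker.track("anthropic", 40, 0.0006)
print(tracker.usage.calls, tracker.usage.total_tokens)  # 2 65
```

Holding one lock for both updates is what keeps `usage` and `usage_by_provider` from drifting apart under concurrency.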

Budget Enforcement

@budget(max_cost_usd=None, max_tokens=None, tracker=None)
def my_function():
    ...

# Or as context manager
with budget(max_cost_usd=1.0) as ctx:
    ...
    print(ctx.remaining_budget)
    print(ctx.current_usage)

Exceptions:

  • BudgetExceeded - Cost limit exceeded
  • TokenLimitReached - Token limit exceeded
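Enforcement itself reduces to a spend check before each tracked call. A minimal sketch of the context-manager form with hypothetical internals (the real `BudgetExceeded` lives in tokenbudget; this stand-in only illustrates the mechanism):

```python
class BudgetExceeded(Exception):
    """Raised when accumulated cost would pass the configured limit."""

class budget_ctx:
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent = 0.0

    @property
    def remaining_budget(self) -> float:
        return self.max_cost_usd - self.spent

    def charge(self, cost_usd: float) -> None:
        # Refuse the call *before* spending past the limit
        if self.spent + cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"would spend ${self.spent + cost_usd:.4f}")
        self.spent += cost_usd

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

with budget_ctx(max_cost_usd=0.50) as ctx:
    ctx.charge(0.30)
    print(f"Remaining: ${ctx.remaining_budget:.4f}")  # Remaining: $0.2000
```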

Pricing

from tokenbudget import get_price, register_model, calculate_cost

# Get model pricing
price = get_price("gpt-4o")
print(price.input_per_1k, price.output_per_1k)

# Calculate cost
cost = calculate_cost("gpt-4o", input_tokens=1000, output_tokens=500)

# Register custom model
register_model("my-model", input_per_1k=0.001, output_per_1k=0.002)
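The cost formula is plain per-1k-token arithmetic; with the rates registered above, 1,000 input tokens and 500 output tokens come to $0.002. A quick stdlib check of that arithmetic:

```python
def cost(input_tokens, output_tokens, input_per_1k, output_per_1k):
    # cost = (input tokens / 1000) * input rate + (output tokens / 1000) * output rate
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k

print(cost(1000, 500, input_per_1k=0.001, output_per_1k=0.002))  # 0.002
```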

Reports

from tokenbudget import generate_table_report, export_csv, export_json

# Pretty table
print(generate_table_report(tracker))

# Export
export_csv(tracker, "usage.csv")
export_json(tracker, "usage.json")

Custom Providers

Support any LLM provider:

from tokenbudget.providers.custom import CustomProvider

custom = CustomProvider(
    tracker=tracker,
    provider_name="my-llm-service",
    extract_model=lambda r: r["model"],
    extract_prompt_tokens=lambda r: r["usage"]["input"],
    extract_completion_tokens=lambda r: r["usage"]["output"],
)

# Track your custom response
custom.track(api_response)
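The extractor lambdas simply pull fields out of whatever shape your provider returns. A stdlib sketch of that pattern on a hypothetical response dict (field names are illustrative, not a real provider's schema):

```python
# Hypothetical response shape; your provider's fields will differ
api_response = {"model": "my-llm-v1", "usage": {"input": 120, "output": 45}}

extract_model = lambda r: r["model"]
extract_prompt_tokens = lambda r: r["usage"]["input"]
extract_completion_tokens = lambda r: r["usage"]["output"]

# What a tracker would record for this call
record = {
    "model": extract_model(api_response),
    "prompt_tokens": extract_prompt_tokens(api_response),
    "completion_tokens": extract_completion_tokens(api_response),
}
print(record)
```

Because the extractors are plain callables, the same tracking code works against any response format without the library needing to know it in advance.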

Contributing

Contributions welcome! Please feel free to submit a Pull Request.


License

MIT License - see LICENSE file for details.


Author

Built by a developer tired of surprise LLM bills.

If this saved you money, consider starring the repo.

Project details


Download files

Download the file for your platform.

Source Distribution

tokenbudget-0.1.2.tar.gz (20.0 kB)

Uploaded Source

Built Distribution


tokenbudget-0.1.2-py3-none-any.whl (20.3 kB)

Uploaded Python 3

File details

Details for the file tokenbudget-0.1.2.tar.gz.

File metadata

  • Download URL: tokenbudget-0.1.2.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for tokenbudget-0.1.2.tar.gz:

  • SHA256: 8444886720c24c75e3dab72913ece4c8ae8aee962d8541973060a6a6976ea553
  • MD5: 01faabf8d135a04f02a72442a98af4ab
  • BLAKE2b-256: 648959b9e2044d2bc129c7b0ab491fc0a5e321691272ab21c4378ea5bbccbfab


File details

Details for the file tokenbudget-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tokenbudget-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 20.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for tokenbudget-0.1.2-py3-none-any.whl:

  • SHA256: db556383cec38e6ba97d50d6d141c041e9571b9b6394a5ec0988f0184bd7ffde
  • MD5: bcdbd19634d26f37fed7becfe067fb25
  • BLAKE2b-256: 2ab95d7c1c339bab271883d0cd1b6ae7e24a914c27aae117658dd6fe3cccee58

