
TokenBudget

Stop bleeding money on LLM API calls.

A lightweight Python library for tracking tokens, managing costs, and enforcing budgets across all major LLM providers.

Python 3.9+ · MIT License · PyPI


The Problem

You're building with LLMs and:

  • Costs spiral out of control with no visibility
  • No idea which API calls are eating your budget
  • Production bills that make you cry
  • Clunky observability platforms that require external services

There's no simple pip install library that just works.

The Solution

from tokenbudget import TokenTracker
import openai

tracker = TokenTracker()
client = tracker.wrap_openai(openai.OpenAI())

# Every call is tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

print(tracker.usage)
# Usage(total_tokens=25, total_cost_usd=0.000375, calls=1)

That's it. No platforms, no external services, no configuration.


Features

  • Token Tracking - Automatic tracking for OpenAI, Anthropic, and Google
  • Cost Calculation - Built-in pricing database, kept up to date
  • Budget Enforcement - Decorators to prevent overspending
  • Response Caching - Save money with zero-cost cached responses
  • Usage Reports - Beautiful tables plus CSV/JSON exports
  • Multi-Provider - One tracker for all your LLM calls
  • Thread-Safe - Works seamlessly in concurrent applications
  • Async Support - Works with async clients out of the box
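The thread-safety feature comes down to serializing updates to shared counters. A minimal stdlib-only sketch of how such accumulation might work, assuming a lock around the mutable totals (all names below are illustrative, not tokenbudget internals):

```python
import threading
from dataclasses import dataclass

@dataclass
class Usage:
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    calls: int = 0

class SketchTracker:
    """Illustrative thread-safe accumulator, not tokenbudget's actual code."""
    def __init__(self):
        self._lock = threading.Lock()
        self.usage = Usage()

    def record(self, tokens: int, cost_usd: float) -> None:
        # The lock makes the three updates atomic with respect to other threads,
        # so concurrent calls never interleave a half-applied update.
        with self._lock:
            self.usage.total_tokens += tokens
            self.usage.total_cost_usd += cost_usd
            self.usage.calls += 1

tracker = SketchTracker()
threads = [threading.Thread(target=lambda: tracker.record(10, 0.001)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With eight threads each recording one call, the totals come out exact rather than racy.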


Installation

# Basic installation
pip install tokenbudget

# With OpenAI support
pip install tokenbudget[openai]

# With Anthropic support
pip install tokenbudget[anthropic]

# With everything
pip install tokenbudget[all]

Quick Examples

1. Basic Tracking

from tokenbudget import TokenTracker
import openai

tracker = TokenTracker()
client = tracker.wrap_openai(openai.OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

print(f"Tokens: {tracker.usage.total_tokens}")
print(f"Cost: ${tracker.usage.total_cost_usd:.6f}")

2. Budget Enforcement

from tokenbudget import budget, BudgetExceeded

@budget(max_cost_usd=1.00, max_tokens=50000)
def my_llm_pipeline(data):
    # All LLM calls inside are tracked
    # Raises BudgetExceeded if limit is hit
    result = process_with_llm(data)
    return result

# Or as context manager
with budget(max_cost_usd=0.50) as ctx:
    response = client.chat.completions.create(...)
    print(f"Remaining: ${ctx.remaining_budget:.4f}")

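Under the hood, a budget context manager only needs to compare an accumulated cost against the limit after each charge. A rough stdlib-only sketch of that idea (the `BudgetExceeded` name matches the docs above; everything else is illustrative, not the library's implementation):

```python
class BudgetExceeded(Exception):
    """Raised when accumulated spend passes the configured limit."""

class BudgetSketch:
    """Illustrative budget guard, not tokenbudget's implementation."""
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    @property
    def remaining_budget(self) -> float:
        return self.max_cost_usd - self.spent_usd

    def charge(self, cost_usd: float) -> None:
        # Each tracked call adds to the running total; an overrun raises
        # immediately instead of letting the next call go out.
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.max_cost_usd:.2f}"
            )

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

with BudgetSketch(max_cost_usd=0.50) as ctx:
    ctx.charge(0.30)
    remaining = ctx.remaining_budget  # 0.20 left of the 0.50 budget
```

The check-after-charge design means one call can overshoot slightly; a stricter guard would estimate cost before sending the request.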
3. Multi-Provider Tracking

import openai
import anthropic
from tokenbudget import TokenTracker

tracker = TokenTracker()

# Track OpenAI
openai_client = tracker.wrap_openai(openai.OpenAI())
openai_client.chat.completions.create(model="gpt-4o", ...)

# Track Anthropic
anthropic_client = tracker.wrap_anthropic(anthropic.Anthropic())
anthropic_client.messages.create(model="claude-sonnet-4-5", ...)

# Combined reporting
print(f"Total cost: ${tracker.total_cost_usd:.4f}")
print(tracker.usage_by_provider)

4. Response Caching

tracker = TokenTracker(cache="memory")
client = tracker.wrap_openai(openai.OpenAI())

# First call - costs money
response1 = client.chat.completions.create(model="gpt-4o", ...)

# Identical call - FREE (cached)
response2 = client.chat.completions.create(model="gpt-4o", ...)

stats = tracker.cache_stats
print(f"Saved: ${stats.saved_cost_usd:.4f}")
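Caching "identical calls" implies a deterministic key derived from the request. One plausible way to build such a key with the stdlib, purely illustrative since the library's actual keying scheme isn't documented here:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Canonical JSON (sorted keys, fixed separators) ensures that equal
    # requests serialize identically and therefore hash to the same key.
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key("gpt-4o", [{"role": "user", "content": "Hello"}])
k2 = cache_key("gpt-4o", [{"role": "user", "content": "Hello"}])
k3 = cache_key("gpt-4o", [{"role": "user", "content": "Hi"}])
```

Identical requests (`k1`, `k2`) collide on purpose; any change to the model or messages (`k3`) produces a different key and misses the cache.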

5. Usage Reports

from tokenbudget import generate_table_report

print(generate_table_report(tracker))

# Output:
# ┌─────────────────────────────────────┐
# │ TokenBudget Usage Report            │
# ├─────────────────────────────────────┤
# │ Provider   │ Calls │ Tokens │ Cost  │
# │ openai     │   15  │ 12.3k  │ $0.24 │
# │ anthropic  │    8  │  8.1k  │ $0.18 │
# ├─────────────────────────────────────┤
# │ Total      │   23  │ 20.4k  │ $0.42 │
# └─────────────────────────────────────┘

# Export to CSV/JSON
tracker.export_csv("usage.csv")
tracker.export_json("usage.json")
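An exported CSV presumably carries one row per provider, mirroring the table above. A stdlib sketch of what writing such a report might look like (the column names are assumptions, not tokenbudget's documented schema):

```python
import csv
import io

# Per-provider rows, matching the figures in the table report above.
rows = [
    {"provider": "openai", "calls": 15, "tokens": 12300, "cost_usd": 0.24},
    {"provider": "anthropic", "calls": 8, "tokens": 8100, "cost_usd": 0.18},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["provider", "calls", "tokens", "cost_usd"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

Writing to `io.StringIO` here keeps the sketch self-contained; swapping in `open("usage.csv", "w", newline="")` would produce the file on disk.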

Supported Models

OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini

Anthropic: claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5, claude-3-5-sonnet, claude-3-opus

Google: gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash

Need a custom model? Easy:

from tokenbudget import register_model

register_model(
    "my-custom-model",
    input_per_1k=0.001,
    output_per_1k=0.002,
    provider="custom"
)

API Reference

TokenTracker

tracker = TokenTracker(cache=None)  # cache: "memory", "disk", or None

Methods:

  • wrap_openai(client) - Wrap OpenAI client
  • wrap_anthropic(client) - Wrap Anthropic client
  • track(model, prompt_tokens, completion_tokens, provider) - Manual tracking
  • reset() - Reset all statistics

Properties:

  • usage - Overall usage stats
  • usage_by_provider - Per-provider breakdown
  • total_cost_usd - Total cost across all calls
  • cache_stats - Cache hit/miss statistics

Budget Enforcement

@budget(max_cost_usd=None, max_tokens=None, tracker=None)
def my_function():
    ...

# Or as context manager
with budget(max_cost_usd=1.0) as ctx:
    ...
    print(ctx.remaining_budget)
    print(ctx.current_usage)

Exceptions:

  • BudgetExceeded - Cost limit exceeded
  • TokenLimitReached - Token limit exceeded

Pricing

from tokenbudget import get_price, register_model, calculate_cost

# Get model pricing
price = get_price("gpt-4o")
print(price.input_per_1k, price.output_per_1k)

# Calculate cost
cost = calculate_cost("gpt-4o", input_tokens=1000, output_tokens=500)

# Register custom model
register_model("my-model", input_per_1k=0.001, output_per_1k=0.002)
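The arithmetic behind per-1k pricing is simple enough to verify by hand. A worked sketch of the calculation (the prices below are examples, not values from the bundled database):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_per_1k: float, output_per_1k: float) -> float:
    # Per-1k pricing: scale each token count down to thousands,
    # then apply the corresponding rate.
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k

# 1,000 input + 500 output tokens at $0.0025 / $0.01 per 1k tokens:
example = cost_usd(1000, 500, 0.0025, 0.01)  # 0.0025 + 0.0050 = 0.0075
```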

Reports

from tokenbudget import generate_table_report, export_csv, export_json

# Pretty table
print(generate_table_report(tracker))

# Export
export_csv(tracker, "usage.csv")
export_json(tracker, "usage.json")

Custom Providers

Support any LLM provider:

from tokenbudget.providers.custom import CustomProvider

custom = CustomProvider(
    tracker=tracker,
    provider_name="my-llm-service",
    extract_model=lambda r: r["model"],
    extract_prompt_tokens=lambda r: r["usage"]["input"],
    extract_completion_tokens=lambda r: r["usage"]["output"],
)

# Track your custom response
custom.track(api_response)
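The extractor callables are just functions from a raw response to the fields the tracker needs. Applying the lambdas above to a hypothetical response dict shows the contract (the response shape here is invented for illustration):

```python
# Hypothetical raw response from a custom LLM service.
api_response = {
    "model": "my-llm-v1",
    "usage": {"input": 120, "output": 45},
}

# The same extractors as in the CustomProvider example above.
extract_model = lambda r: r["model"]
extract_prompt_tokens = lambda r: r["usage"]["input"]
extract_completion_tokens = lambda r: r["usage"]["output"]

model = extract_model(api_response)
prompt_tokens = extract_prompt_tokens(api_response)
completion_tokens = extract_completion_tokens(api_response)
```

Any response shape works as long as the three extractors can pull out a model name and the two token counts.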

Contributing

Contributions welcome! Please feel free to submit a Pull Request.


License

MIT License - see LICENSE file for details.


Author

Built by a developer tired of surprise LLM bills.

If this saved you money, consider starring the repo.

Project details


Download files

Download the file for your platform.

Source Distribution

tokenbudget-0.1.0.tar.gz (16.3 kB)

Built Distribution

tokenbudget-0.1.0-py3-none-any.whl (18.4 kB)

File details

Details for the file tokenbudget-0.1.0.tar.gz.

File metadata

  • Download URL: tokenbudget-0.1.0.tar.gz
  • Upload date:
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for tokenbudget-0.1.0.tar.gz

  • SHA256: 22e1b44a17abc1e2bd29e5a68b10477e47a1ddfffdca23cac341450a197bdaa4
  • MD5: aceb77786d65f42a7d24ad2e5c1f9d9d
  • BLAKE2b-256: 5e5d96ff2bcf1a71cd4662b7c429d91377b04930fad5d73db0c9b97a81d88529


File details

Details for the file tokenbudget-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tokenbudget-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for tokenbudget-0.1.0-py3-none-any.whl

  • SHA256: 76627f4d0c1c35c6212610fbc4f1b0e2be08676a3008989b27f61a930718bcad
  • MD5: 200eb880306f594023fd31af24c2a2c5
  • BLAKE2b-256: 854e1884bf8cd4f84d9e93303aa7973bf14245bd9177c89e0542a7f2967b79c8

