Skip to main content

A lightweight Python library for tracking OpenAI and Anthropic SDK costs with budget alerts

Project description

tokencost

A lightweight Python library for tracking LLM API costs with budget alerts and spending limits. Works directly with OpenAI and Anthropic SDKs.

Installation

pip install llm-tokencost

With provider SDKs:

# For OpenAI SDK integration
pip install llm-tokencost[openai]

# For Anthropic SDK integration
pip install llm-tokencost[anthropic]

# For all providers
pip install llm-tokencost[all]

Quick Start

With OpenAI SDK

from openai import OpenAI
from tokencost import CostTracker, track_openai

tracker = CostTracker(budget=1.0)
client = track_openai(OpenAI(), tracker)

# Use the client as normal - costs are tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

With Anthropic SDK

from anthropic import Anthropic
from tokencost import CostTracker, track_anthropic

tracker = CostTracker(budget=1.0)
client = track_anthropic(Anthropic(), tracker)

# Use the client as normal - costs are tracked automatically
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

With Budget Alerts

from openai import OpenAI
from tokencost import CostTracker, BudgetExceededError, track_openai

def alert(tracker):
    print(f"Budget exceeded! Spent ${tracker.total_cost:.2f}")

tracker = CostTracker(
    budget=5.00,
    on_budget_exceeded=alert,
    raise_on_budget=True
)

client = track_openai(OpenAI(), tracker)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except BudgetExceededError as e:
    print(f"Stopped at ${e.total_cost:.2f} (budget: ${e.budget:.2f})")

print(f"Total: ${tracker.total_cost:.4f} across {tracker.request_count} requests")

Features

  • Real-time cost tracking during LLM API calls
  • Budget alerts via callback and/or exception
  • OpenAI SDK support — track chat.completions and embeddings
  • Anthropic SDK support — track messages API
  • Async support — works with AsyncOpenAI and AsyncAnthropic
  • Streaming support — costs tracked after stream completes
  • Per-model cost aggregation via cost_by_model property
  • RAG cost tracking — separate budgets for embeddings vs completions
  • Automatic exit summary — prints cost report when program ends
  • Thread-safe for concurrent usage
  • Accurate pricing for 1600+ models via litellm's pricing database

OpenAI SDK Integration

Wrapping a Client

Use track_openai() to wrap an OpenAI client instance:

from openai import OpenAI, AsyncOpenAI
from tokencost import CostTracker, track_openai

tracker = CostTracker(budget=1.0)

# Wrap sync client
client = track_openai(OpenAI(), tracker)

# Or wrap async client
async_client = track_openai(AsyncOpenAI(), tracker)

# Both chat completions and embeddings are tracked
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Hello world"]
)

print(f"Total: ${tracker.total_cost:.6f}")
print(f"Completions: ${tracker.completion_cost:.6f}")
print(f"Embeddings: ${tracker.embedding_cost:.6f}")

Global Patching

Use patch_openai() to automatically track all OpenAI client instances:

from openai import OpenAI
from tokencost import CostTracker, patch_openai, unpatch_openai

tracker = CostTracker()
patch_openai(tracker)

# All clients now track costs automatically
client = OpenAI()
response = client.chat.completions.create(...)

print(f"Cost: ${tracker.total_cost:.6f}")

# Remove patches when done
unpatch_openai()

Streaming Support

Streaming responses are fully supported with automatic cost tracking:

client = track_openai(OpenAI(), tracker)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# Cost is tracked after stream completes
print(f"\nCost: ${tracker.total_cost:.6f}")

Anthropic SDK Integration

Wrapping a Client

Use track_anthropic() to wrap an Anthropic client instance:

from anthropic import Anthropic, AsyncAnthropic
from tokencost import CostTracker, track_anthropic

tracker = CostTracker(budget=1.0)

# Wrap sync client
client = track_anthropic(Anthropic(), tracker)

# Or wrap async client
async_client = track_anthropic(AsyncAnthropic(), tracker)

# Messages are tracked automatically
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

Global Patching

Use patch_anthropic() to automatically track all Anthropic client instances:

from anthropic import Anthropic
from tokencost import CostTracker, patch_anthropic, unpatch_anthropic

tracker = CostTracker()
patch_anthropic(tracker)

# All clients now track costs automatically
client = Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

# Remove patches when done
unpatch_anthropic()

Streaming Support

Streaming responses are fully supported:

client = track_anthropic(Anthropic(), tracker)

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="")

# Cost is tracked after stream completes
print(f"\nCost: ${tracker.total_cost:.6f}")

Note: Anthropic does not provide embedding models. For embeddings, use OpenAI, Voyage AI, or other embedding providers.

RAG Cost Tracking

For RAG applications, you can set separate budgets for embeddings and completions:

from tokencost import (
    CostTracker,
    EmbeddingBudgetExceededError,
    CompletionBudgetExceededError,
)

tracker = CostTracker(
    budget=1.00,              # Total budget
    embedding_budget=0.10,    # Limit embedding costs
    completion_budget=0.90,   # Limit completion costs
    raise_on_budget=True
)

# With separate callbacks
tracker = CostTracker(
    embedding_budget=0.10,
    completion_budget=0.50,
    on_embedding_budget_exceeded=lambda t: print("Embedding budget exceeded!"),
    on_completion_budget_exceeded=lambda t: print("Completion budget exceeded!"),
)

# Track costs by type
print(f"Embedding cost: ${tracker.embedding_cost:.6f} ({tracker.embedding_count} requests)")
print(f"Completion cost: ${tracker.completion_cost:.6f} ({tracker.completion_count} requests)")

# Check budget status
print(f"Embedding budget exceeded: {tracker.embedding_budget_exceeded}")
print(f"Completion budget exceeded: {tracker.completion_budget_exceeded}")

Async Support

import asyncio
from openai import AsyncOpenAI
from anthropic import AsyncAnthropic
from tokencost import CostTracker, track_openai, track_anthropic

async def main():
    tracker = CostTracker()

    # Async OpenAI
    openai_client = track_openai(AsyncOpenAI(), tracker)
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # Async Anthropic
    anthropic_client = track_anthropic(AsyncAnthropic(), tracker)
    response = await anthropic_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}]
    )

    print(f"Cost: ${tracker.total_cost:.6f}")

asyncio.run(main())

Per-Model Cost Breakdown

from openai import OpenAI
from anthropic import Anthropic
from tokencost import CostTracker, track_openai, track_anthropic

tracker = CostTracker()

openai_client = track_openai(OpenAI(), tracker)
anthropic_client = track_anthropic(Anthropic(), tracker)

# Make calls to different models...
openai_client.chat.completions.create(model="gpt-4o", messages=[...])
openai_client.chat.completions.create(model="gpt-4o-mini", messages=[...])
anthropic_client.messages.create(model="claude-opus-4-6", max_tokens=1024, messages=[...])

# Get cost breakdown by model
for model, cost in tracker.cost_by_model.items():
    print(f"{model}: ${cost:.6f}")

API Reference

CostTracker

CostTracker(
    budget: float | None = None,           # Total spending limit in USD
    embedding_budget: float | None = None, # Embedding-specific budget
    completion_budget: float | None = None,# Completion-specific budget
    on_budget_exceeded: Callable | None = None,  # Callback when total exceeded
    on_embedding_budget_exceeded: Callable | None = None,  # Callback for embeddings
    on_completion_budget_exceeded: Callable | None = None, # Callback for completions
    raise_on_budget: bool = False,         # Raise exception when exceeded
    print_summary: bool = True             # Print summary on program exit
)

Properties:

  • total_cost: float — Running total in USD
  • request_count: int — Number of successful requests
  • history: list[dict] — All logged requests
  • budget: float | None — Configured total budget
  • budget_exceeded: bool — Whether total budget has been exceeded
  • cost_by_model: dict[str, float] — Cost aggregated by model name
  • embedding_cost: float — Total embedding cost in USD
  • completion_cost: float — Total completion cost in USD
  • embedding_count: int — Number of embedding requests
  • completion_count: int — Number of completion requests
  • embedding_budget: float | None — Configured embedding budget
  • completion_budget: float | None — Configured completion budget
  • embedding_budget_exceeded: bool — Whether embedding budget exceeded
  • completion_budget_exceeded: bool — Whether completion budget exceeded
  • cost_by_request_type: dict[str, float] — Cost breakdown by type

Methods:

  • reset() — Clear all tracked data

OpenAI Integration

# Wrap a client instance
track_openai(client, tracker) -> WrappedClient

# Global patching
patch_openai(tracker)   # Patch all OpenAI clients
unpatch_openai()        # Remove patches

Anthropic Integration

# Wrap a client instance
track_anthropic(client, tracker) -> WrappedClient

# Global patching
patch_anthropic(tracker)   # Patch all Anthropic clients
unpatch_anthropic()        # Remove patches

Exceptions

class BudgetExceededError(Exception):
    budget: float       # Configured budget
    total_cost: float   # Actual spend when exceeded

class EmbeddingBudgetExceededError(BudgetExceededError):
    # Raised when embedding budget is exceeded

class CompletionBudgetExceededError(BudgetExceededError):
    # Raised when completion budget is exceeded

Pricing Utilities

from tokencost import (
    calculate_cost,
    calculate_embedding_cost,
    get_model_pricing,
    is_embedding_model,
    list_models,
)

# Calculate cost for a completion
cost = calculate_cost("gpt-4o", prompt_tokens=1000, completion_tokens=500)

# Calculate cost for embeddings
cost = calculate_embedding_cost("text-embedding-3-small", input_tokens=1000)

# Get pricing info for a model
pricing = get_model_pricing("gpt-4o")
print(pricing["input_cost_per_token"])

# Check if model is an embedding model
is_embedding_model("text-embedding-3-small")  # True

# List all supported models
models = list_models()

Exit Summary

When your program ends, a cost summary is automatically printed:

==================================================
LLM COST SUMMARY
==================================================
Total Cost:     $0.002459
Total Requests: 5
Total Budget:   $1.00 (OK)
Remaining:      $0.997541
--------------------------------------------------
By Type:
  Embeddings:  1 requests = $0.000500 | Budget: $0.10 (OK)
  Completions: 4 requests = $0.001959 | Budget: $0.90 (OK)
--------------------------------------------------
Requests:
  1. [C] gpt-4: 7+18 tokens = $0.000750
  2. [C] gpt-4: 13+17 tokens = $0.000900
  3. [E] text-embedding-3-small: 100+0 tokens = $0.000500
  4. [C] gpt-3.5-turbo: 8+82 tokens = $0.000166
  5. [C] gpt-3.5-turbo: 10+58 tokens = $0.000143
==================================================

[C] = Completion, [E] = Embedding. Disable with print_summary=False.

History Entry Format

Each request is logged with:

{
    "model": "gpt-4",
    "prompt_tokens": 150,
    "completion_tokens": 50,
    "cost": 0.0123,
    "timestamp": "2026-02-22T10:30:00Z",
    "request_type": "completion"  # or "embedding"
}

Development

git clone https://github.com/Paawan13/llm-tokencost.git
cd tokencost
pip install -e ".[dev]"
pytest

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_tokencost-0.7.0.tar.gz (30.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_tokencost-0.7.0-py3-none-any.whl (23.1 kB view details)

Uploaded Python 3

File details

Details for the file llm_tokencost-0.7.0.tar.gz.

File metadata

  • Download URL: llm_tokencost-0.7.0.tar.gz
  • Upload date:
  • Size: 30.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_tokencost-0.7.0.tar.gz
Algorithm Hash digest
SHA256 c5ea5974274bf981cdee1a89be45eba8c2ed92aff32d04931ac4c609d2cba0a5
MD5 3e92b76e985bf3eaab4e07a54d017392
BLAKE2b-256 b4dcbd06a3b93d1b26763f34537c670b38729cc7ab07b8f3752f1ac52026558c

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_tokencost-0.7.0.tar.gz:

Publisher: publish.yml on Paawan13/llm-tokencost

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llm_tokencost-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: llm_tokencost-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 23.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_tokencost-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fddeda7daa142994397f61611fc6253186773f5f900ab2f0ab4ced747963ba6b
MD5 7d4069b43a6a1cb681a52b0f7ff864f4
BLAKE2b-256 a1f821706ef5b93c522be09f06586f11d4e02c639a18465825976e53872ec9a9

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_tokencost-0.7.0-py3-none-any.whl:

Publisher: publish.yml on Paawan13/llm-tokencost

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page