A lightweight Python library for tracking OpenAI and Anthropic SDK costs with budget alerts

tokencost

A lightweight Python library for tracking LLM API costs with budget alerts and spending limits. Works directly with OpenAI and Anthropic SDKs.

Installation

pip install llm-tokencost

With provider SDKs (quote the extras so shells such as zsh do not treat the brackets as a glob pattern):

# For OpenAI SDK integration
pip install "llm-tokencost[openai]"

# For Anthropic SDK integration
pip install "llm-tokencost[anthropic]"

# For all providers
pip install "llm-tokencost[all]"

Quick Start

With OpenAI SDK

from openai import OpenAI
from tokencost import CostTracker, track_openai

tracker = CostTracker(budget=1.0)
client = track_openai(OpenAI(), tracker)

# Use the client as normal - costs are tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

With Anthropic SDK

from anthropic import Anthropic
from tokencost import CostTracker, track_anthropic

tracker = CostTracker(budget=1.0)
client = track_anthropic(Anthropic(), tracker)

# Use the client as normal - costs are tracked automatically
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

With Budget Alerts

from openai import OpenAI
from tokencost import CostTracker, BudgetExceededError, track_openai

def alert(tracker):
    print(f"Budget exceeded! Spent ${tracker.total_cost:.2f}")

tracker = CostTracker(
    budget=5.00,
    on_budget_exceeded=alert,
    raise_on_budget=True
)

client = track_openai(OpenAI(), tracker)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except BudgetExceededError as e:
    print(f"Stopped at ${e.total_cost:.2f} (budget: ${e.budget:.2f})")

print(f"Total: ${tracker.total_cost:.4f} across {tracker.request_count} requests")
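The interaction between `on_budget_exceeded` and `raise_on_budget` can be pictured with the minimal, stand-alone sketch below. `MiniTracker` and `MiniBudgetError` are hypothetical names, not tokencost's API, and the order of operations (record the cost, call the callback, then raise) is an assumption about how the two options combine rather than something this README states.

```python
from typing import Callable, Optional


class MiniBudgetError(Exception):
    """Hypothetical stand-in for BudgetExceededError."""

    def __init__(self, total_cost: float, budget: float) -> None:
        super().__init__(f"spent ${total_cost:.2f} of ${budget:.2f}")
        self.total_cost = total_cost
        self.budget = budget


class MiniTracker:
    """Hypothetical tracker: records a cost first, then enforces the budget."""

    def __init__(
        self,
        budget: Optional[float] = None,
        on_budget_exceeded: Optional[Callable[["MiniTracker"], None]] = None,
        raise_on_budget: bool = False,
    ) -> None:
        self.budget = budget
        self.on_budget_exceeded = on_budget_exceeded
        self.raise_on_budget = raise_on_budget
        self.total_cost = 0.0
        self.request_count = 0

    @property
    def budget_exceeded(self) -> bool:
        # Assumption: "exceeded" means strictly greater than the budget.
        return self.budget is not None and self.total_cost > self.budget

    def record(self, cost: float) -> None:
        # The request has already been billed, so its cost is always recorded.
        self.total_cost += cost
        self.request_count += 1
        if self.budget_exceeded:
            if self.on_budget_exceeded is not None:
                self.on_budget_exceeded(self)
            if self.raise_on_budget:
                raise MiniBudgetError(self.total_cost, self.budget)


alerts: list[float] = []
t = MiniTracker(budget=5.00, on_budget_exceeded=lambda tr: alerts.append(tr.total_cost),
                raise_on_budget=True)
t.record(3.00)
try:
    t.record(2.50)
except MiniBudgetError as e:
    print(f"Stopped at ${e.total_cost:.2f} (budget: ${e.budget:.2f})")  # Stopped at $5.50 (budget: $5.00)
```

Because a request's cost is only known after the provider has billed it, a check of this kind can stop the next request but not the one that crossed the limit.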

Features

Real-time cost tracking during LLM API calls
Budget alerts via callback and/or exception
OpenAI SDK support — track chat.completions and embeddings
Anthropic SDK support — track messages API
Async support — works with AsyncOpenAI and AsyncAnthropic
Streaming support — costs tracked after stream completes
Per-model cost aggregation via cost_by_model property
RAG cost tracking — separate budgets for embeddings vs completions
Automatic exit summary — prints cost report when program ends
Thread-safe for concurrent usage
Accurate pricing for 1600+ models via litellm's pricing database
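Thread safety for a shared tracker typically means guarding every read-modify-write of the running totals with a lock. The sketch below is a generic illustration using a hypothetical `LockedTotals` class; it is not taken from tokencost's source.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedTotals:
    """Hypothetical accumulator: one lock guards every update and read."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total_cost = 0.0
        self._request_count = 0

    def add(self, cost: float) -> None:
        # `+=` on an attribute is a read followed by a write; without the lock
        # two threads could read the same old value and one update would be lost.
        with self._lock:
            self._total_cost += cost
            self._request_count += 1

    def snapshot(self) -> tuple[float, int]:
        # Read both values under the lock so they are consistent with each other.
        with self._lock:
            return self._total_cost, self._request_count


totals = LockedTotals()
with ThreadPoolExecutor(max_workers=8) as pool:  # exiting the block waits for all tasks
    for _ in range(1000):
        pool.submit(totals.add, 0.25)

cost, count = totals.snapshot()
print(f"{count} requests, ${cost:.2f}")  # 1000 requests, $250.00
```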

OpenAI SDK Integration

Wrapping a Client

Use track_openai() to wrap an OpenAI client instance:

from openai import OpenAI, AsyncOpenAI
from tokencost import CostTracker, track_openai

tracker = CostTracker(budget=1.0)

# Wrap sync client
client = track_openai(OpenAI(), tracker)

# Or wrap async client
async_client = track_openai(AsyncOpenAI(), tracker)

# Both chat completions and embeddings are tracked
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Hello world"]
)

print(f"Total: ${tracker.total_cost:.6f}")
print(f"Completions: ${tracker.completion_cost:.6f}")
print(f"Embeddings: ${tracker.embedding_cost:.6f}")
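One common way to "wrap" a client is a proxy that forwards every attribute lookup to the real client and intercepts only the methods whose cost should be recorded. The sketch below uses a fake client built from `SimpleNamespace` so it runs without the OpenAI SDK; `TrackedProxy` and the list of method paths are illustrative assumptions, not tokencost's actual implementation. It relies on the OpenAI response shape, where token counts live on `response.usage`.

```python
from types import SimpleNamespace
from typing import Any, Callable


class TrackedProxy:
    """Forward attribute access to `target`; wrap only the listed method paths."""

    def __init__(self, target: Any, paths: list[tuple[str, ...]],
                 record: Callable[[Any, Any], None]) -> None:
        self._target = target
        self._paths = paths
        self._record = record

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        remaining = [path[1:] for path in self._paths if path and path[0] == name]
        if not remaining:
            return attr  # untracked attribute: pass straight through
        if () in remaining:
            # `name` is a tracked method: call it, then record model + usage.
            def tracked(*args: Any, **kwargs: Any) -> Any:
                response = attr(*args, **kwargs)
                self._record(kwargs.get("model"), response.usage)
                return response
            return tracked
        return TrackedProxy(attr, remaining, self._record)


# Fake client with the same attribute layout as the OpenAI SDK (no network).
def fake_create(**kwargs: Any) -> Any:
    return SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=5))


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create)),
    embeddings=SimpleNamespace(create=fake_create),
    api_key="sk-test",
)

log: list[tuple[str, int]] = []
client = TrackedProxy(
    fake_client,
    [("chat", "completions", "create"), ("embeddings", "create")],
    lambda model, usage: log.append((model, usage.prompt_tokens)),
)
client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}])
client.embeddings.create(model="text-embedding-3-small", input=["Hello world"])
print(client.api_key)  # sk-test (untracked attributes pass through)
print(log)  # [('gpt-4o', 12), ('text-embedding-3-small', 12)]
```

The advantage of this design over subclassing is that the caller keeps the full client surface unchanged; only the listed paths gain tracking.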

Global Patching

Use patch_openai() to automatically track all OpenAI client instances:

from openai import OpenAI
from tokencost import CostTracker, patch_openai, unpatch_openai

tracker = CostTracker()
patch_openai(tracker)

# All clients now track costs automatically
client = OpenAI()
response = client.chat.completions.create(...)

print(f"Cost: ${tracker.total_cost:.6f}")

# Remove patches when done
unpatch_openai()

Streaming Support

Streaming responses are fully supported with automatic cost tracking:

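from openai import OpenAI
from tokencost import CostTracker, track_openai

tracker = CostTracker()
client = track_openai(OpenAI(), tracker)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    # Some chunks (such as a final usage-only chunk) carry no choices
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")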

# Cost is tracked after stream completes
print(f"\nCost: ${tracker.total_cost:.6f}")
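OpenAI's Chat Completions API only reports token usage for a stream when the request sets `stream_options={"include_usage": True}`; the usage then arrives in one extra final chunk whose `choices` list is empty. Whether tokencost adds that option for you is not stated in this README. The sketch below shows one way a wrapper could record cost "after the stream completes": a generator that passes chunks through and records usage once the source is exhausted. `track_stream` is a hypothetical helper and the chunks are fakes built with `SimpleNamespace`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Iterator


def track_stream(chunks: Iterable[Any], record: Callable[[Any], None]) -> Iterator[Any]:
    """Yield chunks unchanged; record usage once the source is exhausted."""
    usage = None
    for chunk in chunks:
        # With include_usage, the last chunk has `usage` set and empty `choices`.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        yield chunk
    # Runs only if the consumer iterated to the end.
    if usage is not None:
        record(usage)


def text_chunk(text: str) -> Any:
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))], usage=None
    )


fake_stream = [
    text_chunk("Hel"),
    text_chunk("lo!"),
    SimpleNamespace(choices=[], usage=SimpleNamespace(prompt_tokens=9, completion_tokens=3)),
]

recorded: list[Any] = []
for chunk in track_stream(fake_stream, recorded.append):
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
print()
print(recorded[0].completion_tokens)  # 3
```

If the caller breaks out of the loop early, the code after the generator's loop never runs, so under this design a partially consumed stream would not be recorded.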

Anthropic SDK Integration

Wrapping a Client

Use track_anthropic() to wrap an Anthropic client instance:

from anthropic import Anthropic, AsyncAnthropic
from tokencost import CostTracker, track_anthropic

tracker = CostTracker(budget=1.0)

# Wrap sync client
client = track_anthropic(Anthropic(), tracker)

# Or wrap async client
async_client = track_anthropic(AsyncAnthropic(), tracker)

# Messages are tracked automatically
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")
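The two SDKs report token counts under different names: OpenAI chat completions use `usage.prompt_tokens` and `usage.completion_tokens`, while Anthropic's Messages API uses `usage.input_tokens` and `usage.output_tokens`. A tracker that handles both needs a normalization step roughly like the hypothetical `normalize_usage` below (a sketch, not tokencost's code).

```python
from types import SimpleNamespace
from typing import Any, NamedTuple


class TokenCounts(NamedTuple):
    prompt_tokens: int
    completion_tokens: int


def normalize_usage(usage: Any) -> TokenCounts:
    """Map OpenAI- or Anthropic-style usage objects onto one shape."""
    if hasattr(usage, "input_tokens"):
        # Anthropic Messages API naming
        return TokenCounts(usage.input_tokens, usage.output_tokens)
    # OpenAI naming; embedding usage has no completion_tokens field
    return TokenCounts(usage.prompt_tokens, getattr(usage, "completion_tokens", 0) or 0)


anthropic_usage = SimpleNamespace(input_tokens=15, output_tokens=40)
openai_usage = SimpleNamespace(prompt_tokens=15, completion_tokens=40)
embedding_usage = SimpleNamespace(prompt_tokens=8, total_tokens=8)

print(normalize_usage(anthropic_usage) == normalize_usage(openai_usage))  # True
print(normalize_usage(embedding_usage))  # TokenCounts(prompt_tokens=8, completion_tokens=0)
```

Anthropic usage objects can also carry prompt-caching fields such as `cache_read_input_tokens`, which this sketch ignores.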

Global Patching

Use patch_anthropic() to automatically track all Anthropic client instances:

from anthropic import Anthropic
from tokencost import CostTracker, patch_anthropic, unpatch_anthropic

tracker = CostTracker()
patch_anthropic(tracker)

# All clients now track costs automatically
client = Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Cost: ${tracker.total_cost:.6f}")

# Remove patches when done
unpatch_anthropic()

Streaming Support

Streaming responses are fully supported:

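from anthropic import Anthropic
from tokencost import CostTracker, track_anthropic

tracker = CostTracker()
client = track_anthropic(Anthropic(), tracker)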

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="")

# Cost is tracked after stream completes
print(f"\nCost: ${tracker.total_cost:.6f}")
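With Anthropic streaming, input tokens are reported in the `message_start` event and the output count arrives in `message_delta` events, whose usage figures are cumulative. The sketch below models the raw events as plain dicts (the SDK yields typed event objects instead), and `usage_from_events` is a hypothetical helper. When you use `client.messages.stream(...)` directly, the SDK's `stream.get_final_message()` also returns the final message with its `usage` populated.

```python
from typing import Any, Iterable


def usage_from_events(events: Iterable[dict[str, Any]]) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from Anthropic-style raw stream events."""
    input_tokens = output_tokens = 0
    for event in events:
        if event["type"] == "message_start":
            usage = event["message"]["usage"]
            input_tokens = usage["input_tokens"]
            output_tokens = usage.get("output_tokens", 0)
        elif event["type"] == "message_delta":
            # message_delta counts are cumulative, so overwrite rather than add.
            output_tokens = event["usage"]["output_tokens"]
    return input_tokens, output_tokens


events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 12, "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "message_delta", "usage": {"output_tokens": 8}},
    {"type": "message_stop"},
]
print(usage_from_events(events))  # (12, 8)
```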

Note: Anthropic does not provide embedding models. For embeddings, use OpenAI, Voyage AI, or other embedding providers.

RAG Cost Tracking

For RAG applications, you can set separate budgets for embeddings and completions:

from tokencost import (
    CostTracker,
    EmbeddingBudgetExceededError,
    CompletionBudgetExceededError,
)

tracker = CostTracker(
    budget=1.00,              # Total budget
    embedding_budget=0.10,    # Limit embedding costs
    completion_budget=0.90,   # Limit completion costs
    raise_on_budget=True
)

# Alternatively, attach a separate callback to each budget
tracker = CostTracker(
    embedding_budget=0.10,
    completion_budget=0.50,
    on_embedding_budget_exceeded=lambda t: print("Embedding budget exceeded!"),
    on_completion_budget_exceeded=lambda t: print("Completion budget exceeded!"),
)

# Track costs by type
print(f"Embedding cost: ${tracker.embedding_cost:.6f} ({tracker.embedding_count} requests)")
print(f"Completion cost: ${tracker.completion_cost:.6f} ({tracker.completion_count} requests)")

# Check budget status
print(f"Embedding budget exceeded: {tracker.embedding_budget_exceeded}")
print(f"Completion budget exceeded: {tracker.completion_budget_exceeded}")
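Separate embedding and completion budgets amount to keeping one running total per request type and choosing the exception class by type. The sketch below uses hypothetical names (`TypedBudgets`, `BudgetError` and its two subclasses) that mirror, but are not, tokencost's classes; it also shows why catching the base class handles both budget kinds.

```python
from typing import Optional


class BudgetError(Exception):
    def __init__(self, total_cost: float, budget: float) -> None:
        super().__init__(f"${total_cost:.4f} exceeds ${budget:.4f}")
        self.total_cost = total_cost
        self.budget = budget


class EmbeddingBudgetError(BudgetError):
    pass


class CompletionBudgetError(BudgetError):
    pass


class TypedBudgets:
    """Hypothetical per-type budget check; each request's cost is recorded first."""

    def __init__(self, embedding_budget: Optional[float] = None,
                 completion_budget: Optional[float] = None) -> None:
        self.budgets = {"embedding": embedding_budget, "completion": completion_budget}
        self.costs = {"embedding": 0.0, "completion": 0.0}
        self.counts = {"embedding": 0, "completion": 0}

    def record(self, request_type: str, cost: float) -> None:
        self.costs[request_type] += cost
        self.counts[request_type] += 1
        budget = self.budgets[request_type]
        if budget is not None and self.costs[request_type] > budget:
            error = EmbeddingBudgetError if request_type == "embedding" else CompletionBudgetError
            raise error(self.costs[request_type], budget)


b = TypedBudgets(embedding_budget=0.10, completion_budget=0.90)
b.record("completion", 0.50)
try:
    b.record("embedding", 0.25)
except BudgetError as e:  # the base class also catches both subclasses
    print(type(e).__name__, e.budget)  # EmbeddingBudgetError 0.1
```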

Async Support

import asyncio
from openai import AsyncOpenAI
from anthropic import AsyncAnthropic
from tokencost import CostTracker, track_openai, track_anthropic

async def main():
    tracker = CostTracker()

    # Async OpenAI
    openai_client = track_openai(AsyncOpenAI(), tracker)
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # Async Anthropic
    anthropic_client = track_anthropic(AsyncAnthropic(), tracker)
    response = await anthropic_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}]
    )

    print(f"Cost: ${tracker.total_cost:.6f}")

asyncio.run(main())
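For the async clients, a wrapper awaits the underlying call and records usage once the response is available. On a single event loop, updates that contain no `await` cannot interleave, so concurrent requests launched with `asyncio.gather` can share one tracker without a lock. This is a minimal sketch with a fake coroutine standing in for `AsyncAnthropic().messages.create`; `track_async` is hypothetical.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Awaitable, Callable


def track_async(create: Callable[..., Awaitable[Any]],
                totals: dict[str, int]) -> Callable[..., Awaitable[Any]]:
    """Wrap an async `create` so usage is recorded after the await completes."""
    async def tracked(**kwargs: Any) -> Any:
        response = await create(**kwargs)
        # No await between these reads and writes, so another coroutine on
        # the same event loop cannot run in the middle of the update.
        totals["requests"] += 1
        totals["output_tokens"] += response.usage.output_tokens
        return response
    return tracked


async def fake_create(**kwargs: Any) -> Any:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return SimpleNamespace(usage=SimpleNamespace(input_tokens=5, output_tokens=7))


async def main() -> dict[str, int]:
    totals = {"requests": 0, "output_tokens": 0}
    create = track_async(fake_create, totals)
    await asyncio.gather(*(create(model="claude-opus-4-6", max_tokens=1024) for _ in range(3)))
    return totals


print(asyncio.run(main()))  # {'requests': 3, 'output_tokens': 21}
```

This reasoning holds for one event loop; a tracker shared across threads that each run their own loop would still need a lock.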

Per-Model Cost Breakdown

from openai import OpenAI
from anthropic import Anthropic
from tokencost import CostTracker, track_openai, track_anthropic

tracker = CostTracker()

openai_client = track_openai(OpenAI(), tracker)
anthropic_client = track_anthropic(Anthropic(), tracker)

# Make calls to different models...
openai_client.chat.completions.create(model="gpt-4o", messages=[...])
openai_client.chat.completions.create(model="gpt-4o-mini", messages=[...])
anthropic_client.messages.create(model="claude-opus-4-6", max_tokens=1024, messages=[...])

# Get cost breakdown by model
for model, cost in tracker.cost_by_model.items():
    print(f"{model}: ${cost:.6f}")
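Assuming `cost_by_model` is a grouped sum over history entries shaped like the "History Entry Format" described in this README, the same breakdown can be reproduced in a few lines. The costs below are made-up sample values, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up sample entries using the README's history fields.
history = [
    {"model": "gpt-4o", "cost": 0.0025, "request_type": "completion"},
    {"model": "gpt-4o-mini", "cost": 0.0002, "request_type": "completion"},
    {"model": "gpt-4o", "cost": 0.0015, "request_type": "completion"},
    {"model": "text-embedding-3-small", "cost": 0.00001, "request_type": "embedding"},
]


def aggregate(entries: list[dict], key: str) -> dict[str, float]:
    """Sum `cost` grouped by `key` (for example "model" or "request_type")."""
    totals: dict[str, float] = defaultdict(float)
    for entry in entries:
        totals[entry[key]] += entry["cost"]
    return dict(totals)


# Most expensive model first
for model, cost in sorted(aggregate(history, "model").items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: ${cost:.6f}")
```

Grouping by `"request_type"` instead of `"model"` gives a breakdown analogous to `cost_by_request_type`.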

API Reference

CostTracker

CostTracker(
    budget: float | None = None,           # Total spending limit in USD
    embedding_budget: float | None = None, # Embedding-specific budget
    completion_budget: float | None = None, # Completion-specific budget
    on_budget_exceeded: Callable | None = None,  # Callback when total exceeded
    on_embedding_budget_exceeded: Callable | None = None,  # Callback for embeddings
    on_completion_budget_exceeded: Callable | None = None, # Callback for completions
    raise_on_budget: bool = False,         # Raise exception when exceeded
    print_summary: bool = True             # Print summary on program exit
)

Properties:

total_cost: float — Running total in USD
request_count: int — Number of successful requests
history: list[dict] — All logged requests
budget: float | None — Configured total budget
budget_exceeded: bool — Whether total budget has been exceeded
cost_by_model: dict[str, float] — Cost aggregated by model name
embedding_cost: float — Total embedding cost in USD
completion_cost: float — Total completion cost in USD
embedding_count: int — Number of embedding requests
completion_count: int — Number of completion requests
embedding_budget: float | None — Configured embedding budget
completion_budget: float | None — Configured completion budget
embedding_budget_exceeded: bool — Whether embedding budget exceeded
completion_budget_exceeded: bool — Whether completion budget exceeded
cost_by_request_type: dict[str, float] — Cost breakdown by type

Methods:

reset() — Clear all tracked data

OpenAI Integration

# Wrap a client instance
track_openai(client, tracker) -> WrappedClient

# Global patching
patch_openai(tracker)   # Patch all OpenAI clients
unpatch_openai()        # Remove patches

Anthropic Integration

# Wrap a client instance
track_anthropic(client, tracker) -> WrappedClient

# Global patching
patch_anthropic(tracker)   # Patch all Anthropic clients
unpatch_anthropic()        # Remove patches

Exceptions

class BudgetExceededError(Exception):
    budget: float       # Configured budget
    total_cost: float   # Actual spend when exceeded

class EmbeddingBudgetExceededError(BudgetExceededError):
    """Raised when the embedding budget is exceeded."""

class CompletionBudgetExceededError(BudgetExceededError):
    """Raised when the completion budget is exceeded."""

Pricing Utilities

from tokencost import (
    calculate_cost,
    calculate_embedding_cost,
    get_model_pricing,
    is_embedding_model,
    list_models,
)

# Calculate cost for a completion
cost = calculate_cost("gpt-4o", prompt_tokens=1000, completion_tokens=500)

# Calculate cost for embeddings
cost = calculate_embedding_cost("text-embedding-3-small", input_tokens=1000)

# Get pricing info for a model
pricing = get_model_pricing("gpt-4o")
print(pricing["input_cost_per_token"])

# Check if model is an embedding model
is_embedding_model("text-embedding-3-small")  # True

# List all supported models
models = list_models()
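The per-request arithmetic behind `calculate_cost` is presumably the usual token-times-price sum, cost = prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token, with per-token prices taken from litellm's pricing data. The sketch below uses a hypothetical two-model price table with invented numbers, not real prices, and `estimate_cost` is an illustrative name.

```python
# Hypothetical per-token prices in USD (invented; real values come from litellm's data).
PRICES = {
    "example-chat-model": {"input_cost_per_token": 2e-06, "output_cost_per_token": 8e-06},
    "example-embedding-model": {"input_cost_per_token": 1e-07, "output_cost_per_token": 0.0},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
    """Prompt tokens at the input price plus completion tokens at the output price."""
    try:
        pricing = PRICES[model]
    except KeyError:
        raise ValueError(f"no pricing for {model!r}") from None
    return (prompt_tokens * pricing["input_cost_per_token"]
            + completion_tokens * pricing["output_cost_per_token"])


print(f"${estimate_cost('example-chat-model', 1000, 500):.6f}")  # $0.006000
print(f"${estimate_cost('example-embedding-model', 1000):.6f}")  # $0.000100
```

Embeddings fit the same formula with zero completion tokens, which is why a single function can price both request types.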

Exit Summary

When your program exits, a cost summary like the following is printed automatically:

==================================================
LLM COST SUMMARY
==================================================
Total Cost:     $0.002459
Total Requests: 5
Total Budget:   $1.00 (OK)
Remaining:      $0.997541
--------------------------------------------------
By Type:
  Embeddings:  1 requests = $0.000500 | Budget: $0.10 (OK)
  Completions: 4 requests = $0.001959 | Budget: $0.90 (OK)
--------------------------------------------------
Requests:
  1. [C] gpt-4: 7+18 tokens = $0.000750
  2. [C] gpt-4: 13+17 tokens = $0.000900
  3. [E] text-embedding-3-small: 100+0 tokens = $0.000500
  4. [C] gpt-3.5-turbo: 8+82 tokens = $0.000166
  5. [C] gpt-3.5-turbo: 10+58 tokens = $0.000143
==================================================

[C] = Completion, [E] = Embedding. Disable with print_summary=False.
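An exit summary like this is commonly implemented with the standard library's `atexit` module. The sketch below builds a report in a similar layout from history entries and registers a printer for interpreter shutdown; `format_summary` and `session_history` are hypothetical, and the layout only approximates the sample above.

```python
import atexit
from typing import Optional

RULE = "=" * 50


def format_summary(history: list[dict], budget: Optional[float] = None) -> str:
    """Build a plain-text cost report from history entries (illustrative layout)."""
    total = sum(entry["cost"] for entry in history)
    lines = [
        RULE,
        "LLM COST SUMMARY",
        RULE,
        f"Total Cost:     ${total:.6f}",
        f"Total Requests: {len(history)}",
    ]
    if budget is not None:
        # "EXCEEDED" is an assumed label; only "(OK)" appears in the sample above.
        status = "EXCEEDED" if total > budget else "OK"
        lines.append(f"Total Budget:   ${budget:.2f} ({status})")
    lines.append("Requests:")
    for i, entry in enumerate(history, start=1):
        tag = "E" if entry["request_type"] == "embedding" else "C"
        lines.append(
            f"  {i}. [{tag}] {entry['model']}: "
            f"{entry['prompt_tokens']}+{entry['completion_tokens']} tokens = ${entry['cost']:.6f}"
        )
    lines.append(RULE)
    return "\n".join(lines)


session_history: list[dict] = []


def _print_summary() -> None:
    if session_history:  # stay quiet if nothing was tracked
        print(format_summary(session_history, budget=1.00))


# atexit handlers run at normal interpreter shutdown, not after os._exit()
# or when the process is killed by a signal.
atexit.register(_print_summary)
```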

History Entry Format

Each request is logged with:

{
    "model": "gpt-4",
    "prompt_tokens": 150,
    "completion_tokens": 50,
    "cost": 0.0123,
    "timestamp": "2026-02-22T10:30:00Z",
    "request_type": "completion"  # or "embedding"
}
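A history record with the fields above can be built as shown below. `make_entry` is a hypothetical helper, and formatting the timestamp as UTC ISO 8601 with a trailing `Z` is an assumption chosen to match the example entry, not a confirmed detail of tokencost.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_entry(model: str, prompt_tokens: int, completion_tokens: int, cost: float,
               request_type: str = "completion", now: Optional[datetime] = None) -> dict:
    """Build one history record (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return {
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost": cost,
        # UTC, ISO 8601, trailing "Z" as in the example entry
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_type": request_type,  # "completion" or "embedding"
    }


entry = make_entry("gpt-4", 150, 50, 0.0123,
                   now=datetime(2026, 2, 22, 10, 30, tzinfo=timezone.utc))
print(entry["timestamp"])  # 2026-02-22T10:30:00Z
print(json.dumps(entry))  # only plain types, so the record serializes directly
```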

Development

git clone https://github.com/Paawan13/llm-tokencost.git
cd llm-tokencost
pip install -e ".[dev]"
pytest

License

MIT

Project details

Maintainers

Paawan13

Release history

This version

0.7.0

Feb 27, 2026

0.3.0

Feb 22, 2026

Download files

Source Distribution

llm_tokencost-0.7.0.tar.gz (30.9 kB)

Uploaded Feb 27, 2026 Source

Built Distribution

llm_tokencost-0.7.0-py3-none-any.whl (23.1 kB)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file llm_tokencost-0.7.0.tar.gz.

File metadata

Download URL: llm_tokencost-0.7.0.tar.gz
Upload date: Feb 27, 2026
Size: 30.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_tokencost-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`c5ea5974274bf981cdee1a89be45eba8c2ed92aff32d04931ac4c609d2cba0a5`
MD5	`3e92b76e985bf3eaab4e07a54d017392`
BLAKE2b-256	`b4dcbd06a3b93d1b26763f34537c670b38729cc7ab07b8f3752f1ac52026558c`

Provenance

The following attestation bundles were made for llm_tokencost-0.7.0.tar.gz:

Publisher: publish.yml on Paawan13/llm-tokencost

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_tokencost-0.7.0.tar.gz
- Subject digest: c5ea5974274bf981cdee1a89be45eba8c2ed92aff32d04931ac4c609d2cba0a5
- Sigstore transparency entry: 1003332254
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: Paawan13/llm-tokencost@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Paawan13
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Trigger Event: workflow_dispatch

File details

Details for the file llm_tokencost-0.7.0-py3-none-any.whl.

File metadata

Download URL: llm_tokencost-0.7.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 23.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_tokencost-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fddeda7daa142994397f61611fc6253186773f5f900ab2f0ab4ced747963ba6b`
MD5	`7d4069b43a6a1cb681a52b0f7ff864f4`
BLAKE2b-256	`a1f821706ef5b93c522be09f06586f11d4e02c639a18465825976e53872ec9a9`

Provenance

The following attestation bundles were made for llm_tokencost-0.7.0-py3-none-any.whl:

Publisher: publish.yml on Paawan13/llm-tokencost

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_tokencost-0.7.0-py3-none-any.whl
- Subject digest: fddeda7daa142994397f61611fc6253186773f5f900ab2f0ab4ced747963ba6b
- Sigstore transparency entry: 1003332262
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: Paawan13/llm-tokencost@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Paawan13
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Trigger Event: workflow_dispatch
