tokencost
A lightweight Python library for tracking LLM API costs with budget alerts and spending limits. Works directly with OpenAI and Anthropic SDKs.
Installation
pip install llm-tokencost
With provider SDKs:
# For OpenAI SDK integration
pip install llm-tokencost[openai]
# For Anthropic SDK integration
pip install llm-tokencost[anthropic]
# For all providers
pip install llm-tokencost[all]
Quick Start
With OpenAI SDK
from openai import OpenAI
from tokencost import CostTracker, track_openai
tracker = CostTracker(budget=1.0)
client = track_openai(OpenAI(), tracker)
# Use the client as normal - costs are tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(f"Cost: ${tracker.total_cost:.6f}")
With Anthropic SDK
from anthropic import Anthropic
from tokencost import CostTracker, track_anthropic
tracker = CostTracker(budget=1.0)
client = track_anthropic(Anthropic(), tracker)
# Use the client as normal - costs are tracked automatically
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(f"Cost: ${tracker.total_cost:.6f}")
With Budget Alerts
from openai import OpenAI
from tokencost import CostTracker, BudgetExceededError, track_openai
def alert(tracker):
    print(f"Budget exceeded! Spent ${tracker.total_cost:.2f}")

tracker = CostTracker(
    budget=5.00,
    on_budget_exceeded=alert,
    raise_on_budget=True
)
client = track_openai(OpenAI(), tracker)
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except BudgetExceededError as e:
    print(f"Stopped at ${e.total_cost:.2f} (budget: ${e.budget:.2f})")
print(f"Total: ${tracker.total_cost:.4f} across {tracker.request_count} requests")
Features
- Real-time cost tracking during LLM API calls
- Budget alerts via callback and/or exception
- OpenAI SDK support — track chat.completions and embeddings
- Anthropic SDK support — track the messages API
- Async support — works with AsyncOpenAI and AsyncAnthropic
- Streaming support — costs tracked after the stream completes
- Per-model cost aggregation via the cost_by_model property
- RAG cost tracking — separate budgets for embeddings vs completions
- Automatic exit summary — prints cost report when program ends
- Thread-safe for concurrent usage (see the sketch after this list)
- Accurate pricing for 1600+ models via litellm's pricing database
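Because the tracker is documented as thread-safe, a single instance can be shared by concurrent workers. A minimal sketch, assuming an OpenAI API key is configured; the worker function and prompts are illustrative:

import concurrent.futures
from openai import OpenAI
from tokencost import CostTracker, track_openai

# One shared tracker; concurrent workers report into it without extra locking.
tracker = CostTracker(budget=1.0)
client = track_openai(OpenAI(), tracker)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

prompts = ["Hello!", "What is 2+2?", "Name a color."]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(ask, prompts))

print(f"Total across {tracker.request_count} requests: ${tracker.total_cost:.6f}")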
OpenAI SDK Integration
Wrapping a Client
Use track_openai() to wrap an OpenAI client instance:
from openai import OpenAI, AsyncOpenAI
from tokencost import CostTracker, track_openai
tracker = CostTracker(budget=1.0)
# Wrap sync client
client = track_openai(OpenAI(), tracker)
# Or wrap async client
async_client = track_openai(AsyncOpenAI(), tracker)
# Both chat completions and embeddings are tracked
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Hello world"]
)
print(f"Total: ${tracker.total_cost:.6f}")
print(f"Completions: ${tracker.completion_cost:.6f}")
print(f"Embeddings: ${tracker.embedding_cost:.6f}")
Global Patching
Use patch_openai() to automatically track all OpenAI client instances:
from openai import OpenAI
from tokencost import CostTracker, patch_openai, unpatch_openai
tracker = CostTracker()
patch_openai(tracker)
# All clients now track costs automatically
client = OpenAI()
response = client.chat.completions.create(...)
print(f"Cost: ${tracker.total_cost:.6f}")
# Remove patches when done
unpatch_openai()
Streaming Support
Streaming responses are fully supported with automatic cost tracking:
client = track_openai(OpenAI(), tracker)
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
# Cost is tracked after stream completes
print(f"\nCost: ${tracker.total_cost:.6f}")
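Streaming on the async client is expected to behave the same way. A minimal sketch, assuming the wrapped AsyncOpenAI client records the cost once the stream is exhausted:

import asyncio
from openai import AsyncOpenAI
from tokencost import CostTracker, track_openai

async def main():
    tracker = CostTracker()
    client = track_openai(AsyncOpenAI(), tracker)

    # stream=True on the async client yields an async iterator of chunks
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

    # As with sync streams, the cost is recorded after the stream ends
    print(f"\nCost: ${tracker.total_cost:.6f}")

asyncio.run(main())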
Anthropic SDK Integration
Wrapping a Client
Use track_anthropic() to wrap an Anthropic client instance:
from anthropic import Anthropic, AsyncAnthropic
from tokencost import CostTracker, track_anthropic
tracker = CostTracker(budget=1.0)
# Wrap sync client
client = track_anthropic(Anthropic(), tracker)
# Or wrap async client
async_client = track_anthropic(AsyncAnthropic(), tracker)
# Messages are tracked automatically
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(f"Cost: ${tracker.total_cost:.6f}")
Global Patching
Use patch_anthropic() to automatically track all Anthropic client instances:
from anthropic import Anthropic
from tokencost import CostTracker, patch_anthropic, unpatch_anthropic
tracker = CostTracker()
patch_anthropic(tracker)
# All clients now track costs automatically
client = Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(f"Cost: ${tracker.total_cost:.6f}")
# Remove patches when done
unpatch_anthropic()
Streaming Support
Streaming responses are fully supported:
client = track_anthropic(Anthropic(), tracker)
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="")
# Cost is tracked after stream completes
print(f"\nCost: ${tracker.total_cost:.6f}")
Note: Anthropic does not provide embedding models. For embeddings, use OpenAI, Voyage AI, or other embedding providers.
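A common pattern is therefore to pair Anthropic completions with OpenAI embeddings under a single tracker. A minimal sketch combining the two wrappers shown above:

from anthropic import Anthropic
from openai import OpenAI
from tokencost import CostTracker, track_anthropic, track_openai

# One tracker aggregates spend across both providers
tracker = CostTracker(budget=1.0)
openai_client = track_openai(OpenAI(), tracker)
anthropic_client = track_anthropic(Anthropic(), tracker)

# Embed with OpenAI...
openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=["Hello world"],
)

# ...and complete with Anthropic
anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(f"Embeddings: ${tracker.embedding_cost:.6f}")
print(f"Completions: ${tracker.completion_cost:.6f}")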
RAG Cost Tracking
For RAG applications, you can set separate budgets for embeddings and completions:
from tokencost import (
    CostTracker,
    EmbeddingBudgetExceededError,
    CompletionBudgetExceededError,
)
tracker = CostTracker(
    budget=1.00,             # Total budget
    embedding_budget=0.10,   # Limit embedding costs
    completion_budget=0.90,  # Limit completion costs
    raise_on_budget=True
)
# With separate callbacks
tracker = CostTracker(
    embedding_budget=0.10,
    completion_budget=0.50,
    on_embedding_budget_exceeded=lambda t: print("Embedding budget exceeded!"),
    on_completion_budget_exceeded=lambda t: print("Completion budget exceeded!"),
)
# Track costs by type
print(f"Embedding cost: ${tracker.embedding_cost:.6f} ({tracker.embedding_count} requests)")
print(f"Completion cost: ${tracker.completion_cost:.6f} ({tracker.completion_count} requests)")
# Check budget status
print(f"Embedding budget exceeded: {tracker.embedding_budget_exceeded}")
print(f"Completion budget exceeded: {tracker.completion_budget_exceeded}")
Async Support
import asyncio
from openai import AsyncOpenAI
from anthropic import AsyncAnthropic
from tokencost import CostTracker, track_openai, track_anthropic
async def main():
    tracker = CostTracker()

    # Async OpenAI
    openai_client = track_openai(AsyncOpenAI(), tracker)
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # Async Anthropic
    anthropic_client = track_anthropic(AsyncAnthropic(), tracker)
    response = await anthropic_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}]
    )

    print(f"Cost: ${tracker.total_cost:.6f}")

asyncio.run(main())
Per-Model Cost Breakdown
from openai import OpenAI
from anthropic import Anthropic
from tokencost import CostTracker, track_openai, track_anthropic
tracker = CostTracker()
openai_client = track_openai(OpenAI(), tracker)
anthropic_client = track_anthropic(Anthropic(), tracker)
# Make calls to different models...
openai_client.chat.completions.create(model="gpt-4o", messages=[...])
openai_client.chat.completions.create(model="gpt-4o-mini", messages=[...])
anthropic_client.messages.create(model="claude-opus-4-6", max_tokens=1024, messages=[...])
# Get cost breakdown by model
for model, cost in tracker.cost_by_model.items():
print(f"{model}: ${cost:.6f}")
API Reference
CostTracker
CostTracker(
    budget: float | None = None,                            # Total spending limit in USD
    embedding_budget: float | None = None,                  # Embedding-specific budget
    completion_budget: float | None = None,                 # Completion-specific budget
    on_budget_exceeded: Callable | None = None,             # Callback when total budget exceeded
    on_embedding_budget_exceeded: Callable | None = None,   # Callback for embeddings
    on_completion_budget_exceeded: Callable | None = None,  # Callback for completions
    raise_on_budget: bool = False,                          # Raise exception when exceeded
    print_summary: bool = True                              # Print summary on program exit
)
Properties:
- total_cost: float — Running total in USD
- request_count: int — Number of successful requests
- history: list[dict] — All logged requests
- budget: float | None — Configured total budget
- budget_exceeded: bool — Whether the total budget has been exceeded
- cost_by_model: dict[str, float] — Cost aggregated by model name
- embedding_cost: float — Total embedding cost in USD
- completion_cost: float — Total completion cost in USD
- embedding_count: int — Number of embedding requests
- completion_count: int — Number of completion requests
- embedding_budget: float | None — Configured embedding budget
- completion_budget: float | None — Configured completion budget
- embedding_budget_exceeded: bool — Whether the embedding budget has been exceeded
- completion_budget_exceeded: bool — Whether the completion budget has been exceeded
- cost_by_request_type: dict[str, float] — Cost breakdown by request type
Methods:
- reset() — Clear all tracked data
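For illustration, a short sketch that reads a few of these properties and then clears the tracker (all names as documented above):

from tokencost import CostTracker

tracker = CostTracker(budget=1.0)
# ... after some tracked calls ...

print(f"Spent ${tracker.total_cost:.6f} over {tracker.request_count} requests")
print(f"Budget exceeded: {tracker.budget_exceeded}")
for model, cost in tracker.cost_by_model.items():
    print(f"  {model}: ${cost:.6f}")

# Start a fresh accounting window
tracker.reset()
assert tracker.total_cost == 0.0 and tracker.request_count == 0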
OpenAI Integration
# Wrap a client instance
track_openai(client, tracker) -> WrappedClient
# Global patching
patch_openai(tracker) # Patch all OpenAI clients
unpatch_openai() # Remove patches
Anthropic Integration
# Wrap a client instance
track_anthropic(client, tracker) -> WrappedClient
# Global patching
patch_anthropic(tracker) # Patch all Anthropic clients
unpatch_anthropic() # Remove patches
Exceptions
class BudgetExceededError(Exception):
    budget: float      # Configured budget
    total_cost: float  # Actual spend when exceeded

class EmbeddingBudgetExceededError(BudgetExceededError):
    """Raised when the embedding budget is exceeded."""

class CompletionBudgetExceededError(BudgetExceededError):
    """Raised when the completion budget is exceeded."""
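Because the type-specific errors subclass BudgetExceededError, catch them before the base class. A sketch assuming raise_on_budget=True:

from openai import OpenAI
from tokencost import (
    CostTracker,
    BudgetExceededError,
    CompletionBudgetExceededError,
    EmbeddingBudgetExceededError,
    track_openai,
)

tracker = CostTracker(
    budget=1.00,
    embedding_budget=0.10,
    completion_budget=0.50,
    raise_on_budget=True,
)
client = track_openai(OpenAI(), tracker)

try:
    client.embeddings.create(model="text-embedding-3-small", input=["Hello world"])
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except EmbeddingBudgetExceededError as e:
    print(f"Embedding spend hit ${e.total_cost:.4f} (limit ${e.budget:.2f})")
except CompletionBudgetExceededError as e:
    print(f"Completion spend hit ${e.total_cost:.4f} (limit ${e.budget:.2f})")
except BudgetExceededError as e:
    print(f"Total spend hit ${e.total_cost:.4f} (limit ${e.budget:.2f})")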
Pricing Utilities
from tokencost import (
    calculate_cost,
    calculate_embedding_cost,
    get_model_pricing,
    is_embedding_model,
    list_models,
)
# Calculate cost for a completion
cost = calculate_cost("gpt-4o", prompt_tokens=1000, completion_tokens=500)
# Calculate cost for embeddings
cost = calculate_embedding_cost("text-embedding-3-small", input_tokens=1000)
# Get pricing info for a model
pricing = get_model_pricing("gpt-4o")
print(pricing["input_cost_per_token"])
# Check if model is an embedding model
is_embedding_model("text-embedding-3-small") # True
# List all supported models
models = list_models()
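These utilities also allow a rough pre-flight check before making a call. A minimal sketch; the token counts are placeholder estimates, and the remaining-budget arithmetic simply uses the documented budget and total_cost properties:

from tokencost import CostTracker, calculate_cost

tracker = CostTracker(budget=1.00)

# Estimate the cost of a call before sending it. The 500-token
# completion allowance is an assumption, not library behavior.
estimated = calculate_cost("gpt-4o", prompt_tokens=2000, completion_tokens=500)
remaining = tracker.budget - tracker.total_cost

if estimated > remaining:
    print(f"Skipping call: ~${estimated:.4f} exceeds remaining ${remaining:.4f}")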
Exit Summary
When your program ends, a cost summary is automatically printed:
==================================================
LLM COST SUMMARY
==================================================
Total Cost: $0.002459
Total Requests: 5
Total Budget: $1.00 (OK)
Remaining: $0.997541
--------------------------------------------------
By Type:
Embeddings: 1 requests = $0.000500 | Budget: $0.10 (OK)
Completions: 4 requests = $0.001959 | Budget: $0.90 (OK)
--------------------------------------------------
Requests:
1. [C] gpt-4: 7+18 tokens = $0.000750
2. [C] gpt-4: 13+17 tokens = $0.000900
3. [E] text-embedding-3-small: 100+0 tokens = $0.000500
4. [C] gpt-3.5-turbo: 8+82 tokens = $0.000166
5. [C] gpt-3.5-turbo: 10+58 tokens = $0.000143
==================================================
[C] = Completion, [E] = Embedding. Disable with print_summary=False.
History Entry Format
Each request is logged with:
{
    "model": "gpt-4",
    "prompt_tokens": 150,
    "completion_tokens": 50,
    "cost": 0.0123,
    "timestamp": "2026-02-22T10:30:00Z",
    "request_type": "completion"  # or "embedding"
}
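Because history is a plain list of dicts, it is easy to post-process. For example, re-deriving per-type totals from the fields above (the cost_by_request_type property exposes the same breakdown directly):

from collections import defaultdict
from tokencost import CostTracker

tracker = CostTracker()
# ... after some tracked calls ...

totals = defaultdict(float)
for entry in tracker.history:
    totals[entry["request_type"]] += entry["cost"]

for request_type, cost in sorted(totals.items()):
    print(f"{request_type}: ${cost:.6f}")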
Development
git clone https://github.com/Paawan13/llm-tokencost.git
cd llm-tokencost
pip install -e ".[dev]"
pytest
License
MIT
Project details
File details
Details for the file llm_tokencost-0.7.0.tar.gz.
File metadata
- Download URL: llm_tokencost-0.7.0.tar.gz
- Size: 30.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5ea5974274bf981cdee1a89be45eba8c2ed92aff32d04931ac4c609d2cba0a5 |
| MD5 | 3e92b76e985bf3eaab4e07a54d017392 |
| BLAKE2b-256 | b4dcbd06a3b93d1b26763f34537c670b38729cc7ab07b8f3752f1ac52026558c |
Provenance
The following attestation bundles were made for llm_tokencost-0.7.0.tar.gz:
Publisher: publish.yml on Paawan13/llm-tokencost
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_tokencost-0.7.0.tar.gz
- Subject digest: c5ea5974274bf981cdee1a89be45eba8c2ed92aff32d04931ac4c609d2cba0a5
- Sigstore transparency entry: 1003332254
- Permalink: Paawan13/llm-tokencost@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Paawan13
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Trigger Event: workflow_dispatch
File details
Details for the file llm_tokencost-0.7.0-py3-none-any.whl.
File metadata
- Download URL: llm_tokencost-0.7.0-py3-none-any.whl
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fddeda7daa142994397f61611fc6253186773f5f900ab2f0ab4ced747963ba6b
|
|
| MD5 |
7d4069b43a6a1cb681a52b0f7ff864f4
|
|
| BLAKE2b-256 |
a1f821706ef5b93c522be09f06586f11d4e02c639a18465825976e53872ec9a9
|
Provenance
The following attestation bundles were made for llm_tokencost-0.7.0-py3-none-any.whl:
Publisher: publish.yml on Paawan13/llm-tokencost
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_tokencost-0.7.0-py3-none-any.whl
- Subject digest: fddeda7daa142994397f61611fc6253186773f5f900ab2f0ab4ced747963ba6b
- Sigstore transparency entry: 1003332262
- Permalink: Paawan13/llm-tokencost@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Paawan13
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7d4267efea719b466c2d6449fa7e4cfaf40451a7
- Trigger Event: workflow_dispatch