Universal LLM token counting and cost management. Track, compare, and optimize your LLM API spending.
tokonomix
Know what your LLM calls actually cost.
tokonomix is a lightweight Python library for token counting and cost management across every major LLM provider. It replaces guesswork with exact numbers — so you can track spending, set budgets, and compare providers before committing to a model.
from tokonomix import estimate_cost, calculate_cost, compare_models
# How much will this prompt cost?
est = estimate_cost("Explain quantum computing in simple terms", model="gpt-4o")
print(f"Input: {est.estimated_input_tokens} tokens, ${est.estimated_input_cost:.6f}")
# Track exact costs from API responses
usage = calculate_cost("claude-sonnet-4-20250514", input_tokens=1500, output_tokens=800)
print(f"Total: ${usage.total_cost:.6f}")
# Which model is cheapest for this prompt?
results = compare_models("Your long prompt here...", output_tokens=2000)
for r in results[:5]:
    print(f" {r['model']:<30} ${r['total_cost']:.6f}")
Why tokonomix?
Every team using LLM APIs has the same question: "how much is this costing us?"
Existing solutions are either unmaintained (tokencost hasn't been updated in 7 months), locked behind a signup wall, or buried inside massive frameworks. tokonomix is none of those things. It's a focused library that does one job well.
What you get:
- Accurate pricing for 40+ models across OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, and Cohere
- Proper handling of cached input tokens, thinking/reasoning tokens, and batch pricing
- Token counting via tiktoken (with a fallback estimator when tiktoken isn't installed)
- Cost tracking with decorators and context managers
- Budget management with threshold alerts
- Cross-provider comparison to find the cheapest model for any input
- A CLI for quick estimates without writing code
Install
pip install tokonomix
For token counting with tiktoken (recommended):
pip install tokonomix[tiktoken]
For the CLI:
pip install tokonomix[cli]
Everything:
pip install tokonomix[all]
Usage
Count tokens
from tokonomix import count_tokens, count_message_tokens
count_tokens("Hello, world!", model="gpt-4o")
# 4
count_message_tokens(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "What is 2+2?"},
    ],
    model="gpt-4o",
)
# 18
Estimate cost before calling the API
from tokonomix import estimate_cost
est = estimate_cost("Your prompt text here", model="claude-sonnet-4-20250514")
print(est.estimated_input_tokens) # 5
print(est.estimated_input_cost) # Decimal('0.000015')
print(est.estimated_max_output_cost) # Decimal('0.240000')
Calculate exact cost from API response
from tokonomix import calculate_cost
# After getting token counts from the API response:
usage = calculate_cost(
    model="gpt-4o",
    input_tokens=1500,
    output_tokens=800,
    cached_tokens=500,  # prompt caching discount
)
print(usage.total_cost) # Decimal('0.010875')
print(usage.input_cost) # Decimal('0.002875')
print(usage.output_cost) # Decimal('0.008000')
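The arithmetic behind a cached-token discount can be sketched in a few lines. This is an illustration only: the rates are made-up placeholders, `cost_with_cache` is a hypothetical helper rather than part of tokonomix, and it assumes cached tokens are a subset of the input tokens billed at a lower rate, which may not match the library's own accounting.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Placeholder per-million-token rates (hypothetical, not real provider prices).
INPUT_RATE = Decimal("2.00")
CACHED_RATE = Decimal("0.50")
OUTPUT_RATE = Decimal("8.00")


def cost_with_cache(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> dict:
    """Split input tokens into cached and uncached portions and price each."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    input_cost = (uncached * INPUT_RATE + cached_tokens * CACHED_RATE) / MILLION
    output_cost = output_tokens * OUTPUT_RATE / MILLION
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }


breakdown = cost_with_cache(input_tokens=1500, output_tokens=800, cached_tokens=500)
print(breakdown["input_cost"], breakdown["output_cost"], breakdown["total_cost"])
```

Keeping every rate as a `Decimal` built from a string means the per-token division stays exact, so the three figures always add up.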
Track costs across multiple calls
from tokonomix import CostTracker
with CostTracker() as tracker:
    # After each API call, record the usage:
    tracker.record("gpt-4o", input_tokens=500, output_tokens=200)
    tracker.record("gpt-4o", input_tokens=300, output_tokens=150)
    tracker.record("claude-sonnet-4-20250514", input_tokens=1000, output_tokens=500)
print(tracker.total_cost) # Decimal('0.013325')
print(tracker.by_model()) # {'claude-sonnet-4-20250514': ..., 'gpt-4o': ...}
print(tracker.by_provider()) # {'anthropic': ..., 'openai': ...}
print(tracker.summary())
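For readers curious how a tracker like this might accumulate costs, here is a minimal standalone sketch. `MiniTracker` is a hypothetical class written for illustration, not the tokonomix implementation; it takes already-computed costs rather than token counts, and it uses a lock on the assumption that records may arrive from several threads.

```python
import threading
from collections import defaultdict
from decimal import Decimal


class MiniTracker:
    """Hypothetical stand-in for a cost tracker; not the tokonomix implementation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._costs = defaultdict(Decimal)  # model name -> accumulated cost

    def __enter__(self) -> "MiniTracker":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # never swallow exceptions

    def record(self, model: str, cost: Decimal) -> None:
        # The lock keeps concurrent += updates from interleaving.
        with self._lock:
            self._costs[model] += cost

    @property
    def total_cost(self) -> Decimal:
        with self._lock:
            return sum(self._costs.values(), Decimal(0))

    def by_model(self) -> dict:
        with self._lock:
            return dict(self._costs)


with MiniTracker() as tracker:
    tracker.record("model-a", Decimal("0.0030"))
    tracker.record("model-a", Decimal("0.0020"))
    tracker.record("model-b", Decimal("0.0100"))

print(tracker.total_cost)
print(tracker.by_model())
```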
Use the decorator for automatic tracking
import openai
from tokonomix import track_cost
@track_cost(model="gpt-4o")
def ask_gpt(prompt: str) -> dict:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return {
        "text": response.choices[0].message.content,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
    }
result = ask_gpt("What is Python?")
print(ask_gpt.get_last_usage().total_cost)
print(ask_gpt.get_total_cost())
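A decorator of this kind can be approximated with the standard library alone. The sketch below is an assumption about the general pattern, not tokonomix's code: `track_cost_sketch`, `fake_llm_call`, and the rates are hypothetical, and the wrapped function is expected to return a dict with `input_tokens` and `output_tokens` keys.

```python
import functools
from decimal import Decimal

# Placeholder per-million-token rates (hypothetical).
INPUT_RATE = Decimal("2.00")
OUTPUT_RATE = Decimal("8.00")


def track_cost_sketch(func):
    """Accumulate cost from the token counts in the wrapped function's result dict."""
    costs = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        cost = (
            result["input_tokens"] * INPUT_RATE
            + result["output_tokens"] * OUTPUT_RATE
        ) / Decimal(1_000_000)
        costs.append(cost)
        return result

    # Expose the recorded costs as attributes on the wrapped function.
    wrapper.get_last_cost = lambda: costs[-1] if costs else None
    wrapper.get_total_cost = lambda: sum(costs, Decimal(0))
    return wrapper


@track_cost_sketch
def fake_llm_call(prompt: str) -> dict:
    # Stand-in for a real API call; returns fixed token counts.
    return {"text": "ok", "input_tokens": 100, "output_tokens": 50}


fake_llm_call("first")
fake_llm_call("second")
print(fake_llm_call.get_last_cost(), fake_llm_call.get_total_cost())
```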
Set budgets
from tokonomix import Budget, BudgetExceededError
budget = Budget(limit=5.00, period="daily")
budget.on_threshold(0.8, lambda b: print(f"Warning: {b.utilization:.0%} of daily budget used"))
# In your API call loop:
try:
    usage = calculate_cost("gpt-4o", input_tokens=1000, output_tokens=500)
    budget.record(usage.total_cost)
except BudgetExceededError:
    print("Daily budget exceeded, switching to cheaper model")
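The threshold-alert idea can be shown without the library. The following is a rough sketch under stated assumptions: `SimpleBudget` and `BudgetExceeded` are hypothetical names, each alert fires once when utilization first reaches its fraction, and period resets (daily, monthly) are left out entirely.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when recorded spend goes past the limit (hypothetical name)."""


class SimpleBudget:
    def __init__(self, limit: str) -> None:
        self.limit = Decimal(limit)
        self.spent = Decimal(0)
        self._alerts = []  # each entry: [threshold, callback, fired]

    @property
    def utilization(self) -> Decimal:
        return self.spent / self.limit

    def on_threshold(self, fraction: str, callback) -> None:
        self._alerts.append([Decimal(fraction), callback, False])

    def record(self, cost: Decimal) -> None:
        self.spent += cost
        for alert in self._alerts:
            threshold, callback, fired = alert
            if not fired and self.utilization >= threshold:
                alert[2] = True  # fire each alert only once
                callback(self)
        if self.spent > self.limit:
            raise BudgetExceeded(f"spent {self.spent} of {self.limit}")


warnings = []
budget = SimpleBudget(limit="1.00")
budget.on_threshold("0.8", lambda b: warnings.append(f"{b.utilization:.0%} used"))

budget.record(Decimal("0.50"))  # 50% of the limit: no alert yet
budget.record(Decimal("0.35"))  # 85% of the limit: crosses the 80% threshold
try:
    budget.record(Decimal("0.25"))  # 110% of the limit: over budget
except BudgetExceeded as exc:
    print("stopped:", exc)
print(warnings)
```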
Compare providers
from tokonomix import compare_models, cheapest_model, format_comparison
# Compare all models
results = compare_models("Your prompt text", output_tokens=1000)
print(format_comparison(results, top_n=10))
# Find the absolute cheapest
model = cheapest_model("Your prompt text", min_context_window=128000)
print(f"Use {model.model_id}: ${model.input_per_million}/M input")
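Conceptually, a comparison like this prices the same token counts against each model's rates and sorts the results. The sketch below assumes a tiny in-memory price table; the model names, the rates, and `rank_by_cost` are all hypothetical and carry no real provider pricing.

```python
from decimal import Decimal

# Placeholder price table: model -> (input, output) USD per million tokens.
# These names and rates are hypothetical, not real provider pricing.
PRICES = {
    "model-large": (Decimal("10.00"), Decimal("30.00")),
    "model-small": (Decimal("0.20"), Decimal("0.80")),
    "model-medium": (Decimal("2.00"), Decimal("8.00")),
}


def rank_by_cost(input_tokens: int, output_tokens: int) -> list:
    """Return (model, total_cost) pairs sorted from cheapest to most expensive."""
    million = Decimal(1_000_000)
    totals = [
        (name, (input_tokens * in_rate + output_tokens * out_rate) / million)
        for name, (in_rate, out_rate) in PRICES.items()
    ]
    return sorted(totals, key=lambda pair: pair[1])


ranking = rank_by_cost(input_tokens=1_000, output_tokens=2_000)
for name, total in ranking:
    print(f"{name:<15} ${total:.6f}")
```

Because output tokens usually cost more than input tokens, the expected output length can change the ranking as much as the prompt length does.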
Look up model details
from tokonomix import get_model, list_models, find_models, Provider
model = get_model("gpt-4o")
print(model.input_per_million) # Decimal('2.50')
print(model.cached_input_per_million) # Decimal('1.25')
print(model.context_window) # 128000
# List all Anthropic models
for m in list_models(Provider.ANTHROPIC):
    print(f"{m.model_id}: ${m.input_per_million}/M in, ${m.output_per_million}/M out")
# Search models
for m in find_models("claude"):
    print(m.model_id)
CLI
# Estimate cost for a prompt
tokonomix estimate "What is the meaning of life?" -m gpt-4o
# Estimate from a file
tokonomix estimate @prompt.txt -m claude-sonnet-4-20250514
# Compare costs across all providers
tokonomix compare "Your prompt" -n 10
# Filter by provider
tokonomix compare "Your prompt" -p openai,anthropic
# List all models
tokonomix models
# List models for a specific provider
tokonomix models -p google
# Get detailed pricing
tokonomix price gpt-4.1
# Find the cheapest model
tokonomix cheapest "Your prompt" -c 128000
Supported Models
| Provider | Models | Cached Pricing | Thinking Tokens |
|---|---|---|---|
| OpenAI | GPT-4.1, GPT-4o, o1, o3, o4-mini, embeddings | Yes | Yes (o-series) |
| Anthropic | Claude Opus 4, Sonnet 4, 3.7/3.5 Sonnet, 3.5 Haiku | Yes | Yes (Opus 4) |
| Google | Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro/Flash | Yes | Yes (2.5 series) |
| Mistral | Large, Small, Codestral, Pixtral | No | No |
| DeepSeek | Chat (V3), Reasoner (R1) | Yes | Yes (Reasoner) |
| xAI | Grok-2, Grok-3, Grok-3 Mini | No | Yes (Mini) |
| Cohere | Command R+, Command R, embeddings | No | No |
Prices are verified against official provider pricing pages. If you notice a discrepancy, please open an issue.
Updating Prices
Model pricing changes frequently. When prices change:
- Update the relevant entries in src/tokonomix/models.py
- Run the tests to verify consistency
- Submit a PR
We aim to update prices within 48 hours of provider announcements.
How It Works
- Token counting uses tiktoken for accurate BPE tokenization. For non-OpenAI models, tiktoken's o200k_base encoding provides a reasonable approximation. If tiktoken isn't installed, a word-based heuristic kicks in.
- Pricing is stored as Decimal values to avoid floating-point rounding issues. $2.50 per million tokens is exactly $0.0000025 per token, not $0.0000024999999999.
- The tracker is thread-safe and uses monotonic timestamps for period-based budgets.
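Two of these points are easy to try with the standard library. The sketch below is illustrative only: `approx_token_count` is a hypothetical heuristic (one token per word or punctuation mark), not the estimator tokonomix ships, and the float comparison uses the classic `0.1 + 0.2` case to show why exact decimal arithmetic matters for prices.

```python
import re
from decimal import Decimal


def approx_token_count(text: str) -> int:
    """Rough estimate: one token per word or punctuation mark (hypothetical heuristic)."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# Binary floats cannot represent most decimal fractions exactly,
# so sums of small amounts drift; Decimal keeps them exact.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(float_sum == 0.3, decimal_sum == Decimal("0.3"))

# $2.50 per million tokens as an exact per-token price.
per_token = Decimal("2.50") / Decimal(1_000_000)
tokens = approx_token_count("Hello, world!")
print(tokens, tokens * per_token)
```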
Contributing
Contributions are welcome — especially pricing updates, new provider support, and bug fixes.
git clone https://github.com/zbhatti/tokonomix.git
cd tokonomix
pip install -e ".[all]"
pip install pytest ruff mypy
pytest
See Also
Part of the stef41 LLM toolkit — open-source tools for every stage of the LLM lifecycle:
| Project | What it does |
|---|---|
| datacrux | Training data quality — dedup, PII, contamination |
| castwright | Synthetic instruction data generation |
| datamix | Dataset mixing & curriculum optimization |
| toksight | Tokenizer analysis & comparison |
| trainpulse | Training health monitoring |
| ckpt | Checkpoint inspection, diffing & merging |
| quantbench | Quantization quality analysis |
| infermark | Inference benchmarking |
| modeldiff | Behavioral regression testing |
| vibesafe | AI-generated code safety scanner |
| injectionguard | Prompt injection detection |
License
Apache 2.0