Pre-call cost estimation, tracking, and budgets for OpenAI, Gemini, and Claude
LLM Cost Guardian
Pre-call cost estimation, session budget tracking, and transparent cost reporting for OpenAI, Anthropic (Claude), and Google Gemini.
Know what an API call will cost before you make it. Track cumulative spend across your session. Set soft or hard budgets. Works in Python scripts and Jupyter notebooks.
Table of Contents
- Features
- Installation
- Quick Start
- Usage Guide
- Sample output
- Supported providers
- Pricing source
- Limitations
- Alternative installation (wheel)
- Feedback & contributing
Features
- Pre-call cost table: shows text tokens, image tokens (using official per-provider formulas), and max output cost before the call is made
- Precise image token estimation: OpenAI tile/patch formulas, Anthropic pixel formula, Gemini tile formula
- Post-call actual cost: tracks real token counts from the API response; reports per-call cost and cumulative session total after every call
- Session budget: set a USD limit; soft mode warns without blocking, strict mode raises an exception
- Cumulative tracking: share one TokenTracker across multiple clients to track spend across your entire session
- Modality disclaimer: warns when audio, video, or document content is detected (cost not computed for those)
- Works everywhere: plain print() output, compatible with Python scripts and Jupyter notebooks
- Pricing from LiteLLM: 395+ models loaded from the open-source LiteLLM pricing JSON
Installation
# Base package (no provider SDK included)
pip install llm-token-guardian
# With a specific provider SDK
pip install "llm-token-guardian[openai]"
pip install "llm-token-guardian[anthropic]"
pip install "llm-token-guardian[google]"
# All providers
pip install "llm-token-guardian[all]"
If pip install is unavailable in your environment, see Alternative installation (wheel).
Quick Start
import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain LLM cost tracking."}],
        max_completion_tokens=128,
    )

print(response.choices[0].message.content)
print(f"Session total: ${tracker.usage.total_cost_usd:.8f} USD")
Usage Guide
Wrapping your client
Wrap your existing provider client — no need to change how you call the API.
OpenAI
import openai
from llm_token_guardian import TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_completion_tokens=64,
)
Anthropic (Claude)
import anthropic
from llm_token_guardian import TokenTracker, wrap_anthropic_sync

tracker = TokenTracker()
client = wrap_anthropic_sync(anthropic.Anthropic(), tracker)

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello!"}],
)
Google Gemini
from google import genai
from llm_token_guardian import TokenTracker, wrap_gemini_sync
tracker = TokenTracker()
client = wrap_gemini_sync(genai.Client(api_key="..."), "gemini-2.0-flash", tracker)
response = client.generate_content("Hello!")
Reporting modes
Pass reporting= to any wrap_* function to control output verbosity:
| Mode | Output |
|---|---|
| "both" | Pre-call estimate table + post-call actual cost (default) |
| "pre" | Pre-call estimate table only |
| "post" | Post-call actual cost only |
| "none" | Silent, no output at all |
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="post")
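To make the wrapping idea and the four reporting modes concrete, here is a minimal, self-contained sketch of the general pattern. It is not the library's implementation: `wrap_call` and `fake_create` are hypothetical names, and the printed messages are placeholders for the real cost tables.

```python
from typing import Any, Callable

VALID_MODES = {"both", "pre", "post", "none"}


def wrap_call(fn: Callable[..., Any], reporting: str = "both") -> Callable[..., Any]:
    """Return a function that calls fn unchanged and prints notes around it."""
    if reporting not in VALID_MODES:
        raise ValueError(f"unknown reporting mode: {reporting!r}")

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        if reporting in ("both", "pre"):
            print("[Pre-call] estimate would be printed here", flush=True)
        result = fn(*args, **kwargs)  # arguments pass through untouched
        if reporting in ("both", "post"):
            print("[Post-call] actual cost would be printed here", flush=True)
        return result

    return wrapped


def fake_create(**kwargs: Any) -> dict:
    """Hypothetical stand-in for a provider call; returns a canned response."""
    return {"model": kwargs["model"], "content": "ok"}


create = wrap_call(fake_create, reporting="post")
response = create(model="gpt-4o", messages=[])
print(response["content"])
```

Because the wrapper forwards every argument and returns the original result, calling code does not need to change when reporting is switched on or off.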
Session tracking
Pass the same TokenTracker instance to all wrapped clients to accumulate cost across all calls in a session. The post-call summary after every call shows both the per-call cost and the running session total:
tracker = TokenTracker()
openai_client = wrap_openai_sync(openai.OpenAI(), tracker)
claude_client = wrap_anthropic_sync(anthropic.Anthropic(), tracker)
openai_client.chat.completions.create(...) # post-call shows: "Session: $X (1 call)"
claude_client.messages.create(...) # post-call shows: "Session: $Y (2 calls)"
# Full summary at any time
print(f"Total spend : ${tracker.usage.total_cost_usd:.8f} USD")
print(f"Total calls : {tracker.usage.calls}")
print(f"Total tokens : {tracker.usage.total_tokens:,}")
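The accumulation behind a shared tracker can be pictured with the sketch below. It assumes nothing about the library's internals: `SketchTracker`, `SketchUsage`, and `record` are hypothetical, only the three attribute names mirror `tracker.usage` as used above, and the token counts and costs are illustrative values.

```python
from dataclasses import dataclass


@dataclass
class SketchUsage:
    """Running totals; attribute names mirror tracker.usage in the README."""
    calls: int = 0
    total_tokens: int = 0
    total_cost_usd: float = 0.0


class SketchTracker:
    """Hypothetical tracker that every wrapped client would report into."""

    def __init__(self) -> None:
        self.usage = SketchUsage()

    def record(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        # Each call adds to the same totals, whichever client made it.
        self.usage.calls += 1
        self.usage.total_tokens += input_tokens + output_tokens
        self.usage.total_cost_usd += cost_usd
        print(
            f"Session: ${self.usage.total_cost_usd:.8f} USD "
            f"({self.usage.calls} calls total)",
            flush=True,
        )


tracker = SketchTracker()
tracker.record(12, 23, 0.001875)  # e.g. reported by one wrapped client
tracker.record(40, 60, 0.000785)  # e.g. reported by a second wrapped client
```

Sharing one instance is what makes the totals span providers; two separate trackers would each see only their own calls.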
Budget control
Use budget() as a context manager to set a spending limit.
import openai
from llm_token_guardian import budget, TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

# Soft mode: warn when the budget is exceeded, but never block the call
with budget(max_cost_usd=0.05, tracker=tracker, strict=False):
    client.chat.completions.create(...)

# Strict mode: raise BudgetExceeded if the pre-call estimate exceeds the remaining budget
with budget(max_cost_usd=0.05, tracker=tracker, strict=True):
    client.chat.completions.create(...)
The budget is cumulative — it subtracts the actual cost of each call, so the remaining budget shrinks as you make calls inside the context.
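The soft and strict behaviours, and the way actual costs shrink the remaining budget, can be sketched as follows. This is an illustration of the described rules only: `SketchBudget`, `check`, and `charge` are hypothetical names, and the local `BudgetExceeded` class merely stands in for the library's exception.

```python
class BudgetExceeded(Exception):
    """Raised in strict mode when an estimate does not fit the remaining budget."""


class SketchBudget:
    """Hypothetical stand-in for the bookkeeping behind budget()."""

    def __init__(self, max_cost_usd: float, strict: bool) -> None:
        self.max_cost_usd = max_cost_usd
        self.strict = strict
        self.spent_usd = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.max_cost_usd - self.spent_usd

    def check(self, estimate_usd: float) -> bool:
        """Compare a pre-call estimate with what is left; True means it fits."""
        if estimate_usd <= self.remaining_usd:
            return True
        message = (
            f"estimate ${estimate_usd:.6f} exceeds "
            f"remaining ${self.remaining_usd:.6f}"
        )
        if self.strict:
            raise BudgetExceeded(message)
        print(f"Warning: {message}", flush=True)  # soft mode: warn, do not block
        return False

    def charge(self, actual_usd: float) -> None:
        """Subtract the actual post-call cost from the budget."""
        self.spent_usd += actual_usd


soft = SketchBudget(max_cost_usd=0.05, strict=False)
soft.charge(0.02)             # a first call actually cost $0.02
soft_fits = soft.check(0.04)  # warns, because only about $0.03 is left

strict = SketchBudget(max_cost_usd=0.05, strict=True)
strict.charge(0.02)
outcome = "allowed"
try:
    strict.check(0.04)
except BudgetExceeded as exc:
    outcome = f"blocked: {exc}"
print(outcome)
```

Checking the estimate before the call and charging the actual cost after it is what lets strict mode stop a request before any money is spent.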
Vision / image requests
Image costs are estimated before the call using official per-provider token formulas:
| Provider | Formula |
|---|---|
| OpenAI gpt-4o, gpt-4.1, o-series | Tile-based: scale → 512px tiles × 170 tokens + 85 base |
| OpenAI gpt-4.1-mini, gpt-4.1-nano, o4-mini | Patch-based: 32px patches × per-model multiplier |
| Anthropic Claude | ceil(width × height / 750) tokens |
| Google Gemini | ≤384px both dims → 258 tokens; larger → ceil(w/768) × ceil(h/768) × 258 |
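As a worked illustration of the table, the sketch below implements the Anthropic and Gemini rows exactly as written, plus the OpenAI tile-based row. The table only says "scale" for the tile-based case, so the two scaling steps here (fit within 2048×2048, then shrink so the shortest side is at most 768) are an assumption based on OpenAI's published high-detail rules, not something this README confirms. The patch-based row is omitted because its per-model multipliers are not given. All function names are hypothetical.

```python
import math


def anthropic_image_tokens(width: int, height: int) -> int:
    """ceil(width x height / 750), as in the table."""
    return math.ceil(width * height / 750)


def gemini_image_tokens(width: int, height: int) -> int:
    """258 tokens if both dimensions are <= 384 px, else 258 per 768 px tile."""
    if width <= 384 and height <= 384:
        return 258
    return math.ceil(width / 768) * math.ceil(height / 768) * 258


def openai_tile_image_tokens(width: int, height: int) -> int:
    """512 px tiles x 170 tokens + 85 base, after the assumed scaling steps."""
    # Assumed step 1: fit the image inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Assumed step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return tiles * 170 + 85


for name, fn in [
    ("OpenAI (tile)", openai_tile_image_tokens),
    ("Anthropic", anthropic_image_tokens),
    ("Gemini", gemini_image_tokens),
]:
    print(f"{name:<14} 1024x1024 -> {fn(1024, 1024)} tokens")
```

For a 1024×1024 image the tile-based sketch yields 765 tokens, which agrees with the image row in the sample output later in this README.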
Pass images the same way you normally would — the wrapper detects and measures them automatically:
import base64

# Read the image once; the raw bytes are reused for Gemini below
with open("photo.jpg", "rb") as f:
    image_bytes = f.read()
image_b64 = base64.b64encode(image_bytes).decode()

# OpenAI (client from wrap_openai_sync): data URI in an image_url block
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        {"type": "text", "text": "What is in this image?"},
    ]}],
    max_completion_tokens=64,
)

# Anthropic (client from wrap_anthropic_sync): base64 source block
client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": [
        {"type": "image", "source": {
            "type": "base64", "media_type": "image/jpeg", "data": image_b64,
        }},
        {"type": "text", "text": "What is in this image?"},
    ]}],
)

# Gemini (client from wrap_gemini_sync): Part.from_bytes
from google.genai import types

client.generate_content([
    types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
    "What is in this image?",
])
Unsupported modalities: If audio, video, or PDF document content is detected, a warning is printed. The API call still proceeds — only text and image cost estimates are affected.
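One way such a modality check could work is to scan the content blocks of a request for types other than text and images. The sketch below is an assumption about the general approach, not the library's detection code: the function name is hypothetical, and the set of supported block types is illustrative rather than exhaustive.

```python
from typing import Any

# Block types treated as covered by the estimate (illustrative, not exhaustive).
SUPPORTED_BLOCK_TYPES = {"text", "image_url", "image"}


def unsupported_block_types(messages: list[dict[str, Any]]) -> set[str]:
    """Collect content-block types that a text/image estimator would skip."""
    found: set[str] = set()
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            continue  # a plain string is text only
        for block in content or []:
            block_type = str(block.get("type"))
            if block_type not in SUPPORTED_BLOCK_TYPES:
                found.add(block_type)
    return found


messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "user", "content": [
        {"type": "text", "text": "Transcribe this."},
        {"type": "input_audio", "input_audio": {"data": "...", "format": "wav"}},
    ]},
]

skipped = unsupported_block_types(messages)
if skipped:
    # Warn only; the request itself would still be sent.
    print(f"Warning: cost not computed for: {sorted(skipped)}", flush=True)
```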
Jupyter notebook usage
llm-token-guardian uses plain print() with flush=True and requires no display libraries. It works in Jupyter notebooks without any changes.
# Jupyter notebook cell:
import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2 + 2?"}],
        max_completion_tokens=32,
    )

print(response.choices[0].message.content)
The pre-call cost table and post-call summary print inline in the cell output.
Sample output
[Pre-call] gpt-4o (openai)
Source      : https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
Prices as of: February 19, 2026
Budget      : $0.099821 remaining of $0.100000 total

Component                   Tokens      Cost (USD)
──────────────────────────────────────────────────
Text input                      ~9     $0.00004500
Image (1024×1024 px)          ~765     $0.00382500
Max output                      64     $0.00032000
──────────────────────────────────────────────────
Estimated max total           ~838     $0.00419000

Response: A golden retriever sitting on a park bench.

[Post-call] gpt-4o
This call : $0.00187500 USD (12 in + 23 out tokens)
Session   : $0.00266000 USD (2 calls total)
Budget    : $0.097340 remaining of $0.100000 total
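The arithmetic behind the pre-call table can be reproduced in a few lines. The per-token rate below is inferred from the sample rows themselves (each cost divided by its token count gives 0.000005 USD); it is not a quoted price, and real pricing normally uses separate input and output rates.

```python
RATE_USD_PER_TOKEN = 0.000005  # inferred from the sample rows, not a quoted price

rows = [
    ("Text input", 9),
    ("Image (1024x1024 px)", 765),
    ("Max output", 64),
]

# Cost of each component is simply tokens x rate.
costs = {name: tokens * RATE_USD_PER_TOKEN for name, tokens in rows}
total_tokens = sum(tokens for _, tokens in rows)
total_cost = sum(costs.values())

for name, tokens in rows:
    print(f"{name:<24}{tokens:>6}   ${costs[name]:.8f}")
print(f"{'Estimated max total':<24}{total_tokens:>6}   ${total_cost:.8f}")
```

The total is an upper bound because the "Max output" row assumes the model uses every allowed output token.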
Supported providers
| Provider | Models loaded | Wrapper |
|---|---|---|
| OpenAI | 210+ (GPT-4o, GPT-4.1, o-series, …) | wrap_openai_sync |
| Anthropic | 31+ (Claude Haiku, Sonnet, Opus variants) | wrap_anthropic_sync |
| Google Gemini | 154+ (Gemini 2.0 Flash, 1.5 Pro/Flash, …) | wrap_gemini_sync |
List all available models and their prices:
from llm_token_guardian import list_models

for name, price in list_models().items():
    print(f"{name:50s} ${price.input_per_1k:.6f}/1K in ${price.output_per_1k:.6f}/1K out")
Look up a specific model:
from llm_token_guardian import get_price
p = get_price("gpt-4o")
print(f"Input : ${p.input_per_1k:.6f} / 1K tokens")
print(f"Output: ${p.output_per_1k:.6f} / 1K tokens")
print(f"Vision: {p.supports_vision}")
print(f"Max input tokens : {p.max_input_tokens:,}")
print(f"Max output tokens: {p.max_output_tokens:,}")
Pricing source
All pricing data is loaded from the open-source LiteLLM pricing file:
model_prices_and_context_window.json
Bundled snapshot date: February 19, 2026
To refresh with the latest prices at runtime:
from llm_token_guardian import refresh_pricing
refresh_pricing() # downloads latest from GitHub
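For orientation, the LiteLLM file stores prices per token, while the lookups above report prices per 1K tokens. The sketch below shows that conversion on a small inline sample instead of downloading the file. The field names follow the LiteLLM JSON format; the numeric values are illustrative and may not match the current file, and `per_1k_prices` is a hypothetical helper, not part of this package.

```python
import json

# Tiny inline sample in the shape of the LiteLLM pricing file (values illustrative).
SAMPLE_JSON = """
{
  "gpt-4o": {
    "litellm_provider": "openai",
    "input_cost_per_token": 2.5e-06,
    "output_cost_per_token": 1e-05,
    "max_input_tokens": 128000,
    "max_output_tokens": 16384,
    "supports_vision": true
  },
  "entry-without-prices": {
    "litellm_provider": "openai"
  }
}
"""


def per_1k_prices(raw: str) -> dict[str, tuple[float, float]]:
    """Map model name -> (input USD per 1K tokens, output USD per 1K tokens)."""
    table: dict[str, tuple[float, float]] = {}
    for name, entry in json.loads(raw).items():
        if "input_cost_per_token" not in entry:
            continue  # skip entries that carry no token pricing
        table[name] = (
            entry["input_cost_per_token"] * 1000,
            entry.get("output_cost_per_token", 0.0) * 1000,
        )
    return table


prices = per_1k_prices(SAMPLE_JSON)
for name, (inp, out) in prices.items():
    print(f"{name}: ${inp:.6f}/1K in  ${out:.6f}/1K out")
```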
Limitations
- Text and image only: cost estimation covers text and image inputs. If you pass audio, video, or document (PDF) content, a warning is displayed but no cost is computed for those modalities. The API call still proceeds normally.
- Estimates vs. actual billing: the pre-call table shows an upper bound (it assumes all max_output_tokens are used). The post-call cost is computed from the actual token counts returned by the API, using the stored price-per-token rates. This closely matches your dashboard in most cases, but can differ due to:
  - Prompt caching discounts (Anthropic cache read/write, OpenAI cached prompt tokens)
  - Batch API pricing (usually 50% discount)
  - Volume discounts or custom pricing tiers
  - Price changes after the bundled snapshot date
- Always verify on your provider dashboard: use this tool as a helpful guide, not a billing authority.
- Synchronous wrappers are fully featured: async variants (wrap_anthropic_async, wrap_gemini_async) are included and follow the same interface pattern.
- Model coverage: if a model is not in the pricing database, a ModelNotFoundError is raised explaining which providers are supported.
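To see how a list-price calculation can drift from the bill, the sketch below compares a plain tokens-times-rate cost with the same call billed at the batch discount mentioned above (taken as 50%). The per-1K prices and token counts are hypothetical inputs chosen for easy arithmetic, and `list_price_cost` is an illustrative helper, not a function of this package.

```python
def list_price_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
) -> float:
    """Cost at stored list prices, with no discounts applied."""
    return input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k


BATCH_DISCOUNT = 0.50  # "usually 50%" per the list above; actual terms vary

# Hypothetical call: 12,000 input and 3,000 output tokens at made-up prices.
reported = list_price_cost(12_000, 3_000, input_per_1k=0.0025, output_per_1k=0.01)
billed_if_batch = reported * (1 - BATCH_DISCOUNT)

print(f"Reported at list price : ${reported:.6f}")
print(f"Billed via batch API   : ${billed_if_batch:.6f}")
print(f"Overstatement          : ${reported - billed_if_batch:.6f}")
```

Any discount the tool does not model shows up the same way: the reported figure stays at list price while the dashboard shows less.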
Alternative installation (wheel)
If pip install llm-token-guardian is unavailable, install from a pre-built .whl file.
Download the wheel from the Releases page, then:
pip install llm_token_guardian-0.1.0-py3-none-any.whl
# With a provider extra:
pip install "llm_token_guardian-0.1.0-py3-none-any.whl[openai]"
pip install "llm_token_guardian-0.1.0-py3-none-any.whl[anthropic]"
pip install "llm_token_guardian-0.1.0-py3-none-any.whl[google]"
Build the wheel yourself from source:
git clone https://github.com/iamsaugatpandey/llm-token-guardian.git
cd llm-token-guardian
pip install build
python -m build
# Outputs dist/llm_token_guardian-0.1.0-py3-none-any.whl
pip install dist/llm_token_guardian-0.1.0-py3-none-any.whl
Feedback & contributing
Email: saugatpandey02@gmail.com (feedback, questions, and feature suggestions are very welcome)
GitHub Issues: github.com/iamsaugatpandey/llm-token-guardian/issues (bug reports, feature requests, and general discussions)
Contributing: The repository will be public on GitHub — pull requests are welcome! Fork, open an issue to discuss your idea, and submit a PR.
⭐ Star the repo if you find this useful — it helps others discover the project and motivates continued development!
Pricing data sourced from BerriAI/litellm — thank you to the LiteLLM team for maintaining this open dataset.
File details
Details for the file llm_token_guardian-0.1.1.tar.gz.
File metadata
- Download URL: llm_token_guardian-0.1.1.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 425db0d2c8c53cabc6bc181a63effdf78f386906d7bb4326c072a3002914444a |
| MD5 | 1797e6d907671a0ec215e780b5bdd53f |
| BLAKE2b-256 | 7ba0ef4bf39e73a74de8d4992e6f51b79c4cd6c9a98aa1616c75d379925ef8f4 |
File details
Details for the file llm_token_guardian-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llm_token_guardian-0.1.1-py3-none-any.whl
- Upload date:
- Size: 101.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ac144ee135b6a877479c38c63a02bce04b5e34256f1e6fe8474b71fd3cf69028 |
| MD5 | baf6598bf38c53911b30c0c037329238 |
| BLAKE2b-256 | d4cfe0b12b5ddb4a543c22351cdbecc98ecd239b0a0a684a8fc03c72e0a2fef2 |