Pre-call cost estimation, tracking, and budgets for OpenAI, Gemini, and Claude

LLM Cost Guardian

Pre-call cost estimation, session budget tracking, and transparent cost reporting for OpenAI, Anthropic (Claude), and Google Gemini.

Know what an API call will cost before you make it. Track cumulative spend across your session. Set soft or hard budgets. Works in Python scripts and Jupyter notebooks.


Features

  • Pre-call cost table — shows text tokens, image tokens (using official per-provider formulas), and max output cost before the call is made
  • Precise image token estimation — OpenAI tile/patch formulas, Anthropic pixel formula, Gemini tile formula
  • Post-call actual cost — tracks real token counts from the API response; reports per-call cost and cumulative session total after every call
  • Session budget — set a USD limit; soft mode warns without blocking, strict mode raises an exception
  • Cumulative tracking — share one TokenTracker across multiple clients to track spend across your entire session
  • Modality disclaimer — warns when audio, video, or document content is detected (cost not computed for those)
  • Works everywhere — plain print() output, compatible with Python scripts and Jupyter notebooks
  • Pricing from LiteLLM — 395+ models loaded from the open-source LiteLLM pricing JSON

Installation

# Base package (no provider SDK included)
pip install llm-token-guardian

# With a specific provider SDK
pip install "llm-token-guardian[openai]"
pip install "llm-token-guardian[anthropic]"
pip install "llm-token-guardian[google]"

# All providers
pip install "llm-token-guardian[all]"

If pip install is unavailable in your environment, see Alternative installation (wheel).


Quick Start

import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain LLM cost tracking."}],
        max_completion_tokens=128,
    )
    print(response.choices[0].message.content)

print(f"Session total: ${tracker.usage.total_cost_usd:.8f} USD")

Usage Guide

Wrapping your client

Wrap your existing provider client — no need to change how you call the API.

OpenAI

import openai
from llm_token_guardian import TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_completion_tokens=64,
)

Anthropic (Claude)

import anthropic
from llm_token_guardian import TokenTracker, wrap_anthropic_sync

tracker = TokenTracker()
client = wrap_anthropic_sync(anthropic.Anthropic(), tracker)

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello!"}],
)

Google Gemini

from google import genai
from llm_token_guardian import TokenTracker, wrap_gemini_sync

tracker = TokenTracker()
client = wrap_gemini_sync(genai.Client(api_key="..."), "gemini-2.0-flash", tracker)

response = client.generate_content("Hello!")

Reporting modes

Pass reporting= to any wrap_* function to control output verbosity:

Mode      Output
"both"    Pre-call estimate table + post-call actual cost (default)
"pre"     Pre-call estimate table only
"post"    Post-call actual cost only
"none"    Silent, no output at all

client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="post")

Session tracking

Pass the same TokenTracker instance to all wrapped clients to accumulate cost across all calls in a session. The post-call summary after every call shows both the per-call cost and the running session total:

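import anthropic
import openai
from llm_token_guardian import TokenTracker, wrap_anthropic_sync, wrap_openai_sync

# One tracker shared by both wrapped clients
tracker = TokenTracker()

openai_client  = wrap_openai_sync(openai.OpenAI(), tracker)
claude_client  = wrap_anthropic_sync(anthropic.Anthropic(), tracker)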

openai_client.chat.completions.create(...)   # post-call shows: "Session: $X (1 call)"
claude_client.messages.create(...)           # post-call shows: "Session: $Y (2 calls)"

# Full summary at any time
print(f"Total spend  : ${tracker.usage.total_cost_usd:.8f} USD")
print(f"Total calls  : {tracker.usage.calls}")
print(f"Total tokens : {tracker.usage.total_tokens:,}")

Budget control

Use budget() as a context manager to set a spending limit.

import openai
from llm_token_guardian import budget, TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

# Soft mode — warn when budget is exceeded, but never block the call
with budget(max_cost_usd=0.05, tracker=tracker, strict=False):
    client.chat.completions.create(...)

# Strict mode — raise BudgetExceeded if the pre-call estimate exceeds remaining budget
with budget(max_cost_usd=0.05, tracker=tracker, strict=True):
    client.chat.completions.create(...)

The budget is cumulative — it subtracts the actual cost of each call, so the remaining budget shrinks as you make calls inside the context.
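The soft/strict distinction can be pictured with a small stand-alone sketch. This is not the library's implementation: SketchBudget and SketchBudgetExceeded are hypothetical names, and the sketch models only the behavior described above (the pre-call estimate is compared with the remaining budget, strict mode raises, soft mode warns, and the actual cost is subtracted after each call).

```python
from dataclasses import dataclass


class SketchBudgetExceeded(Exception):
    """Hypothetical stand-in for the library's BudgetExceeded."""


@dataclass
class SketchBudget:
    max_cost_usd: float
    strict: bool = False
    spent_usd: float = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.max_cost_usd - self.spent_usd

    def check(self, estimated_cost_usd: float) -> bool:
        """Call before a request; True means the estimate fits the budget."""
        if estimated_cost_usd <= self.remaining_usd:
            return True
        message = (
            f"estimate ${estimated_cost_usd:.6f} exceeds "
            f"remaining ${self.remaining_usd:.6f}"
        )
        if self.strict:
            raise SketchBudgetExceeded(message)
        print(f"[budget warning] {message}")  # soft mode: warn, do not block
        return False

    def record(self, actual_cost_usd: float) -> None:
        """Call after a request with the actual cost from the response."""
        self.spent_usd += actual_cost_usd


soft = SketchBudget(max_cost_usd=0.05, strict=False)
soft.check(0.02)           # estimate fits the full budget
soft.record(0.01)          # actual cost came in below the estimate
fits = soft.check(0.045)   # exceeds the 0.04 remaining: warns, returns False
```

Because the remaining budget shrinks by the actual cost rather than the estimate, a call that used fewer output tokens than its limit leaves more headroom for the next one.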


Vision / image requests

Image costs are estimated before the call using official per-provider token formulas:

Provider / models                            Formula
OpenAI gpt-4o, gpt-4.1, o-series             Tile-based: scale → 512px tiles × 170 tokens + 85 base
OpenAI gpt-4.1-mini, gpt-4.1-nano, o4-mini   Patch-based: 32px patches × per-model multiplier
Anthropic Claude                             ceil(width × height / 750) tokens
Google Gemini                                ≤384px both dims → 258 tokens; larger → ceil(w/768) × ceil(h/768) × 258
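As a rough illustration, three of the formulas in the table can be written out directly. The function names are hypothetical and this is not the package's code. The OpenAI scaling step (fit inside 2048×2048, then shrink so the shorter side is 768px, never upscaling) is an assumption about what "scale" means in the table. The patch-based formula is omitted because the per-model multipliers are not listed here.

```python
import math


def openai_tile_tokens(width: int, height: int) -> int:
    """Tile formula: 170 tokens per 512px tile plus 85 base tokens."""
    w, h = width, height
    # Assumed scaling: fit within 2048x2048, then shrink short side to 768.
    if max(w, h) > 2048:
        scale = 2048 / max(w, h)
        w, h = round(w * scale), round(h * scale)
    if min(w, h) > 768:
        scale = 768 / min(w, h)
        w, h = round(w * scale), round(h * scale)
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return tiles * 170 + 85


def anthropic_image_tokens(width: int, height: int) -> int:
    """Pixel formula: ceil(width * height / 750)."""
    return math.ceil(width * height / 750)


def gemini_image_tokens(width: int, height: int) -> int:
    """258 tokens for small images, otherwise 258 per 768px tile."""
    if width <= 384 and height <= 384:
        return 258
    return math.ceil(width / 768) * math.ceil(height / 768) * 258


for name, fn in [
    ("OpenAI (tile)", openai_tile_tokens),
    ("Anthropic", anthropic_image_tokens),
    ("Gemini", gemini_image_tokens),
]:
    print(f"{name:15s} 1024x1024 -> {fn(1024, 1024)} tokens")
```

For a 1024×1024 image the tile formula gives 4 tiles × 170 + 85 = 765 tokens, which is the figure shown in the sample output.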

Pass images the same way you normally would — the wrapper detects and measures them automatically:

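import base64
from pathlib import Path

# Read the image once and base64-encode it for the OpenAI and Anthropic payloads
image_b64 = base64.b64encode(Path("photo.jpg").read_bytes()).decode()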

# OpenAI — data URI in image_url block
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        {"type": "text", "text": "What is in this image?"},
    ]}],
    max_completion_tokens=64,
)

# Anthropic — base64 source block
client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": [
        {"type": "image", "source": {
            "type": "base64", "media_type": "image/jpeg", "data": image_b64,
        }},
        {"type": "text", "text": "What is in this image?"},
    ]}],
)

# Gemini — Part.from_bytes
from google.genai import types
client.generate_content([
    types.Part.from_bytes(data=open("photo.jpg", "rb").read(), mime_type="image/jpeg"),
    "What is in this image?",
])

Unsupported modalities: If audio, video, or PDF document content is detected, a warning is printed. The API call still proceeds, but the cost estimate covers only the text and image parts of the request.
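One way such detection could work is to scan the content blocks of each message for types other than text and images. The sketch below is an assumption about the approach, not the wrapper's actual logic. The function name and the set of supported block types are hypothetical, and only chat-style message lists with a "type" field on each block are handled.

```python
# Block types whose cost this sketch treats as estimable (assumed set).
SUPPORTED_BLOCK_TYPES = {"text", "image_url", "image"}


def unsupported_block_types(messages: list[dict]) -> set[str]:
    """Return the content-block types that fall outside text and images."""
    found: set[str] = set()
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-string content is text only
        for block in content or []:
            block_type = block.get("type")
            if block_type not in SUPPORTED_BLOCK_TYPES:
                found.add(str(block_type))
    return found


messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Transcribe this clip."},
        {"type": "input_audio", "input_audio": {"data": "...", "format": "wav"}},
    ]},
]

skipped = unsupported_block_types(messages)
if skipped:
    print(f"Warning: cost not computed for: {', '.join(sorted(skipped))}", flush=True)
```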


Jupyter notebook usage

llm-token-guardian uses plain print() with flush=True and requires no display libraries. It works in Jupyter notebooks without any changes.

# Jupyter notebook cell:
import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2 + 2?"}],
        max_completion_tokens=32,
    )

print(response.choices[0].message.content)

The pre-call cost table and post-call summary print inline in the cell output.


Sample output

[Pre-call]  gpt-4o  (openai)
  Source      : https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
  Prices as of: February 19, 2026
  Budget      : $0.099821 remaining of $0.100000 total

  Component                   Tokens      Cost (USD)
  ──────────────────────────────────────────────────
  Text input                      ~9      $0.00004500
  Image  (1024×1024 px)         ~765      $0.00382500
  Max output                      64      $0.00032000
  ──────────────────────────────────────────────────
  Estimated max total           ~838      $0.00419000

Response: A golden retriever sitting on a park bench.

[Post-call] gpt-4o
  This call   : $0.00187500 USD  (12 in + 23 out tokens)
  Session     : $0.00266000 USD  (2 calls total)
  Budget      : $0.097340 remaining of $0.100000 total
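The pre-call table is plain arithmetic: each row is tokens / 1000 × price per 1K tokens, and the last row sums the components. The short sketch below reproduces the table's rows. The per-1K rate is back-computed from the sample rows themselves and is illustrative only, not a value read from the pricing file.

```python
# (component, tokens, USD per 1K tokens); rates back-computed from the sample rows
rows = [
    ("Text input", 9, 0.005),
    ("Image (1024x1024 px)", 765, 0.005),
    ("Max output", 64, 0.005),
]


def cost_usd(tokens: int, usd_per_1k: float) -> float:
    """Cost of a token count at a per-1K-token rate."""
    return tokens / 1000 * usd_per_1k


total_tokens = sum(tokens for _, tokens, _ in rows)
total_cost = sum(cost_usd(tokens, rate) for _, tokens, rate in rows)

for label, tokens, rate in rows:
    print(f"{label:24s} {tokens:>6d}  ${cost_usd(tokens, rate):.8f}")
print(f"{'Estimated max total':24s} {total_tokens:>6d}  ${total_cost:.8f}")
```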

Supported providers

Provider    Models loaded                                Wrapper
OpenAI      210+ (GPT-4o, GPT-4.1, o-series, …)          wrap_openai_sync
Anthropic   31+ (Claude Haiku, Sonnet, Opus variants)    wrap_anthropic_sync
Google      154+ (Gemini 2.0 Flash, 1.5 Pro/Flash, …)    wrap_gemini_sync

List all available models and their prices:

from llm_token_guardian import list_models

for name, price in list_models().items():
    print(f"{name:50s}  ${price.input_per_1k:.6f}/1K in   ${price.output_per_1k:.6f}/1K out")

Look up a specific model:

from llm_token_guardian import get_price

p = get_price("gpt-4o")
print(f"Input : ${p.input_per_1k:.6f} / 1K tokens")
print(f"Output: ${p.output_per_1k:.6f} / 1K tokens")
print(f"Vision: {p.supports_vision}")
print(f"Max input tokens : {p.max_input_tokens:,}")
print(f"Max output tokens: {p.max_output_tokens:,}")

Pricing source

All pricing data is loaded from the open-source LiteLLM pricing file:

model_prices_and_context_window.json

Bundled snapshot date: February 19, 2026

To refresh with the latest prices at runtime:

from llm_token_guardian import refresh_pricing
refresh_pricing()  # downloads latest from GitHub
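For orientation, the LiteLLM file stores prices per token (fields such as input_cost_per_token and output_cost_per_token), while the examples above print prices per 1K tokens. The sketch below shows one way that conversion might look. SketchPrice and parse_prices are hypothetical names, the inline JSON is a made-up entry with illustrative values, and none of this is claimed to match the package's loader.

```python
import json
from dataclasses import dataclass

# Made-up entry in the shape of the LiteLLM pricing file (values illustrative).
SAMPLE_JSON = """
{
  "example-model": {
    "litellm_provider": "openai",
    "input_cost_per_token": 0.0000025,
    "output_cost_per_token": 0.00001,
    "supports_vision": true
  },
  "entry-without-text-pricing": {
    "litellm_provider": "openai"
  }
}
"""


@dataclass(frozen=True)
class SketchPrice:
    input_per_1k: float
    output_per_1k: float
    supports_vision: bool


def parse_prices(raw: str) -> dict[str, SketchPrice]:
    """Convert per-token prices to per-1K prices, skipping incomplete entries."""
    prices: dict[str, SketchPrice] = {}
    for name, entry in json.loads(raw).items():
        if "input_cost_per_token" not in entry or "output_cost_per_token" not in entry:
            continue
        prices[name] = SketchPrice(
            input_per_1k=entry["input_cost_per_token"] * 1000,
            output_per_1k=entry["output_cost_per_token"] * 1000,
            supports_vision=bool(entry.get("supports_vision", False)),
        )
    return prices


prices = parse_prices(SAMPLE_JSON)
for name, price in prices.items():
    print(f"{name}: ${price.input_per_1k:.6f}/1K in, ${price.output_per_1k:.6f}/1K out")
```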

Limitations

  1. Text and image only — cost estimation covers text and image inputs. If you pass audio, video, or document (PDF) content, a warning is displayed but no cost is computed for those modalities. The API call still proceeds normally.

  2. Estimates vs. actual billing: the pre-call table shows an upper bound (it assumes the request's full output-token limit is used). The post-call cost is computed from the actual token counts returned by the API, multiplied by the stored per-token rates. This usually matches your provider dashboard closely, but it can differ due to:

    • Prompt caching discounts (Anthropic cache read/write, OpenAI cached prompt tokens)
    • Batch API pricing (usually 50% discount)
    • Volume discounts or custom pricing tiers
    • Price changes after the bundled snapshot date
  3. Always verify on your provider dashboard: use this tool as a helpful guide, not a billing authority.

  4. Async support: the synchronous wrappers are fully featured; async variants (wrap_anthropic_async, wrap_gemini_async) are also included and follow the same interface pattern.

  5. Model coverage — if a model is not in the pricing database, a ModelNotFoundError is raised explaining which providers are supported.


Alternative installation (wheel)

If pip install llm-token-guardian is unavailable, install from a pre-built .whl file.

Download the wheel from the Releases page, then:

pip install llm_token_guardian-0.1.0-py3-none-any.whl

# With a provider extra:
pip install "llm_token_guardian-0.1.0-py3-none-any.whl[openai]"
pip install "llm_token_guardian-0.1.0-py3-none-any.whl[anthropic]"
pip install "llm_token_guardian-0.1.0-py3-none-any.whl[google]"

Build the wheel yourself from source:

git clone https://github.com/iamsaugatpandey/llm-token-guardian.git
cd llm-token-guardian
pip install build
python -m build
# Outputs dist/llm_token_guardian-0.1.0-py3-none-any.whl
pip install dist/llm_token_guardian-0.1.0-py3-none-any.whl

Feedback & contributing

Email: saugatpandey02@gmail.com
Feedback, questions, and feature suggestions are very welcome.

GitHub Issues: github.com/iamsaugatpandey/llm-token-guardian/issues
Bug reports, feature requests, and general discussions.

Contributing: The repository will be public on GitHub — pull requests are welcome! Fork, open an issue to discuss your idea, and submit a PR.

⭐ Star the repo if you find this useful — it helps others discover the project and motivates continued development!


Pricing data sourced from BerriAI/litellm — thank you to the LiteLLM team for maintaining this open dataset.
