Pre-call cost estimation, tracking, and budgets for OpenAI, Gemini, and Claude

LLM Cost Guardian

Pre-call cost estimation, session budget tracking, and transparent cost reporting for OpenAI, Anthropic (Claude), and Google Gemini.

Know what an API call will cost before you make it. Track cumulative spend across your session. Set soft or hard budgets. Works in Python scripts and Jupyter notebooks.

Features
Installation
Quick Start
Usage Guide
Sample output
Supported providers
Pricing source
Limitations
Alternative installation (wheel)
Feedback & contributing

Features

Pre-call cost table — shows text tokens, image tokens (using official per-provider formulas), and max output cost before the call is made
Precise image token estimation — OpenAI tile/patch formulas, Anthropic pixel formula, Gemini tile formula
Post-call actual cost — tracks real token counts from the API response; reports per-call cost and cumulative session total after every call
Session budget — set a USD limit; soft mode warns without blocking, strict mode raises an exception
Cumulative tracking — share one TokenTracker across multiple clients to track spend across your entire session
Modality disclaimer — warns when audio, video, or document content is detected (cost not computed for those)
Works everywhere — plain print() output, compatible with Python scripts and Jupyter notebooks
Pricing from LiteLLM — 395+ models loaded from the open-source LiteLLM pricing JSON

Installation

# Base package (no provider SDK included)
pip install llm-token-guardian

# With a specific provider SDK
pip install "llm-token-guardian[openai]"
pip install "llm-token-guardian[anthropic]"
pip install "llm-token-guardian[google]"

# All providers
pip install "llm-token-guardian[all]"

If pip install is unavailable in your environment, see Alternative installation (wheel).

Quick Start

import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain LLM cost tracking."}],
        max_completion_tokens=128,
    )
    print(response.choices[0].message.content)

print(f"Session total: ${tracker.usage.total_cost_usd:.8f} USD")

Usage Guide

Wrapping your client

Wrap your existing provider client — no need to change how you call the API.

OpenAI

import openai
from llm_token_guardian import TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_completion_tokens=64,
)

Anthropic (Claude)

import anthropic
from llm_token_guardian import TokenTracker, wrap_anthropic_sync

tracker = TokenTracker()
client = wrap_anthropic_sync(anthropic.Anthropic(), tracker)

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello!"}],
)

Google Gemini

from google import genai
from llm_token_guardian import TokenTracker, wrap_gemini_sync

tracker = TokenTracker()
client = wrap_gemini_sync(genai.Client(api_key="..."), "gemini-2.0-flash", tracker)

response = client.generate_content("Hello!")

Reporting modes

Pass reporting= to any wrap_* function to control output verbosity:

Mode	Output
`"both"`	Pre-call estimate table + post-call actual cost (default)
`"pre"`	Pre-call estimate table only
`"post"`	Post-call actual cost only
`"none"`	Silent — no output at all

client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="post")

Session tracking

Pass the same TokenTracker instance to all wrapped clients to accumulate cost across all calls in a session. The post-call summary after every call shows both the per-call cost and the running session total:

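import anthropic
import openai
from llm_token_guardian import TokenTracker, wrap_anthropic_sync, wrap_openai_sync

tracker = TokenTracker()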

openai_client  = wrap_openai_sync(openai.OpenAI(), tracker)
claude_client  = wrap_anthropic_sync(anthropic.Anthropic(), tracker)

openai_client.chat.completions.create(...)   # post-call shows: "Session: $X (1 call)"
claude_client.messages.create(...)           # post-call shows: "Session: $Y (2 calls)"

# Full summary at any time
print(f"Total spend  : ${tracker.usage.total_cost_usd:.8f} USD")
print(f"Total calls  : {tracker.usage.calls}")
print(f"Total tokens : {tracker.usage.total_tokens:,}")

Budget control

Use budget() as a context manager to set a spending limit.

import openai
from llm_token_guardian import budget, TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

# Soft mode — warn when budget is exceeded, but never block the call
with budget(max_cost_usd=0.05, tracker=tracker, strict=False):
    client.chat.completions.create(...)

# Strict mode — raise BudgetExceeded if the pre-call estimate exceeds remaining budget
with budget(max_cost_usd=0.05, tracker=tracker, strict=True):
    client.chat.completions.create(...)

The budget is cumulative: the actual cost of each call is subtracted from it, so the remaining budget shrinks as you make calls inside the context.
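The difference between soft and strict mode can be pictured with a small stand-alone model. This is only a sketch of the idea under stated assumptions, not the library's implementation: `BudgetSketch`, `BudgetExceededSketch`, `check`, and `record` are hypothetical names invented for illustration.

```python
class BudgetExceededSketch(Exception):
    """Hypothetical stand-in for the library's BudgetExceeded exception."""


class BudgetSketch:
    """Toy model of a cumulative budget (illustrative only)."""

    def __init__(self, max_cost_usd: float, strict: bool = False) -> None:
        self.max_cost_usd = max_cost_usd
        self.strict = strict
        self.spent_usd = 0.0

    @property
    def remaining_usd(self) -> float:
        # Remaining budget shrinks as actual costs are recorded.
        return self.max_cost_usd - self.spent_usd

    def check(self, estimated_cost_usd: float) -> bool:
        """Return True if the pre-call estimate fits in the remaining budget.

        Strict mode raises instead of returning False; soft mode only warns.
        """
        fits = estimated_cost_usd <= self.remaining_usd
        if not fits:
            message = (
                f"estimate ${estimated_cost_usd:.6f} exceeds "
                f"remaining ${self.remaining_usd:.6f}"
            )
            if self.strict:
                raise BudgetExceededSketch(message)
            print(f"[budget warning] {message}", flush=True)
        return fits

    def record(self, actual_cost_usd: float) -> None:
        """Subtract the actual cost of a finished call from the budget."""
        self.spent_usd += actual_cost_usd


soft = BudgetSketch(max_cost_usd=0.05, strict=False)
soft.record(0.04)     # actual cost of an earlier call
soft.check(0.02)      # over budget: soft mode warns and the call would still go ahead
```

Checking against the estimate before the call and subtracting the actual cost afterwards mirrors the two comments in the example above; how the real wrapper orders these steps internally is not shown in this README.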

Vision / image requests

Image costs are estimated before the call using official per-provider token formulas:

Provider	Formula
OpenAI `gpt-4o`, `gpt-4.1`, o-series	Tile-based: scale → 512px tiles × 170 tokens + 85 base
OpenAI `gpt-4.1-mini`, `gpt-4.1-nano`, `o4-mini`	Patch-based: 32px patches × per-model multiplier
Anthropic Claude	`ceil(width × height / 750)` tokens
Google Gemini	≤384px both dims → 258 tokens; larger → `ceil(w/768) × ceil(h/768) × 258`
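The formulas in the table can be sketched in a few lines of plain Python. This is an illustration, not the package's code: the function names are hypothetical, and the two OpenAI scaling steps (fit within 2048 × 2048, then shrink so the shortest side is at most 768 px, never upscaling) are an assumption based on OpenAI's published high-detail image rules. The patch-based variant is omitted because the per-model multipliers are not listed here.

```python
import math


def openai_tile_tokens(width: int, height: int) -> int:
    """Tile-based estimate: 170 tokens per 512px tile plus 85 base tokens."""
    w, h = float(width), float(height)
    # Assumed step 1: scale down to fit within a 2048 x 2048 square.
    longest = max(w, h)
    if longest > 2048:
        w, h = w * 2048 / longest, h * 2048 / longest
    # Assumed step 2: scale down so the shortest side is at most 768px.
    shortest = min(w, h)
    if shortest > 768:
        w, h = w * 768 / shortest, h * 768 / shortest
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return tiles * 170 + 85


def anthropic_image_tokens(width: int, height: int) -> int:
    """Pixel formula from the table: ceil(width * height / 750)."""
    return math.ceil(width * height / 750)


def gemini_image_tokens(width: int, height: int) -> int:
    """Small images cost a flat 258 tokens; larger ones are counted in 768px tiles."""
    if width <= 384 and height <= 384:
        return 258
    return math.ceil(width / 768) * math.ceil(height / 768) * 258


print(openai_tile_tokens(1024, 1024))      # 765
print(anthropic_image_tokens(1000, 750))   # 1000
print(gemini_image_tokens(1024, 1024))     # 1032
```

The 765-token result for a 1024 × 1024 image agrees with the image row shown under Sample output.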

Pass images the same way you normally would — the wrapper detects and measures them automatically:

import base64

# Read and base64-encode the image; the with-block closes the file afterwards
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# OpenAI — data URI in image_url block
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        {"type": "text", "text": "What is in this image?"},
    ]}],
    max_completion_tokens=64,
)

# Anthropic — base64 source block
client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": [
        {"type": "image", "source": {
            "type": "base64", "media_type": "image/jpeg", "data": image_b64,
        }},
        {"type": "text", "text": "What is in this image?"},
    ]}],
)

# Gemini — Part.from_bytes
from google.genai import types
client.generate_content([
    types.Part.from_bytes(data=open("photo.jpg", "rb").read(), mime_type="image/jpeg"),
    "What is in this image?",
])

Unsupported modalities: If audio, video, or PDF document content is detected, a warning is printed. The API call still proceeds — only text and image cost estimates are affected.
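As a rough picture of what such detection could look like, the sketch below scans chat-style messages for content blocks whose `type` is neither text nor image. It is a hypothetical helper, not the wrapper's actual logic: the function name, the `COSTED_BLOCK_TYPES` set, and the assumption that every block is a dict with a `type` key are all illustrative.

```python
# Block types this sketch treats as costed (text and images only).
COSTED_BLOCK_TYPES = {"text", "image_url", "image"}


def find_uncosted_block_types(messages: list[dict]) -> set[str]:
    """Return the block types in `messages` that would get no cost estimate."""
    found: set[str] = set()
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-text message, nothing to inspect
        for block in content or []:
            block_type = block.get("type", "unknown")
            if block_type not in COSTED_BLOCK_TYPES:
                found.add(block_type)
    return found


messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Transcribe this clip."},
        {"type": "input_audio", "input_audio": {"data": "...", "format": "wav"}},
    ]},
]

uncosted = find_uncosted_block_types(messages)
if uncosted:
    print(f"Warning: no cost computed for: {sorted(uncosted)}", flush=True)
```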

Jupyter notebook usage

llm-token-guardian uses plain print() with flush=True and requires no display libraries. It works in Jupyter notebooks without any changes.

# Jupyter notebook cell:
import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2 + 2?"}],
        max_completion_tokens=32,
    )

print(response.choices[0].message.content)

The pre-call cost table and post-call summary print inline in the cell output.

Sample output

[Pre-call]  gpt-4o  (openai)
  Source      : https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
  Prices as of: February 19, 2026
  Budget      : $0.099821 remaining of $0.100000 total

  Component                   Tokens      Cost (USD)
  ──────────────────────────────────────────────────
  Text input                      ~9      $0.00004500
  Image  (1024×1024 px)         ~765      $0.00382500
  Max output                      64      $0.00032000
  ──────────────────────────────────────────────────
  Estimated max total           ~838      $0.00419000

Response: A golden retriever sitting on a park bench.

[Post-call] gpt-4o
  This call   : $0.00187500 USD  (12 in + 23 out tokens)
  Session     : $0.00266000 USD  (2 calls total)
  Budget      : $0.097340 remaining of $0.100000 total

Supported providers

Provider	Models loaded	Wrapper
OpenAI	210+ (GPT-4o, GPT-4.1, o-series, …)	`wrap_openai_sync`
Anthropic	31+ (Claude Haiku, Sonnet, Opus variants)	`wrap_anthropic_sync`
Google	154+ (Gemini 2.0 Flash, 1.5 Pro/Flash, …)	`wrap_gemini_sync`

List all available models and their prices:

from llm_token_guardian import list_models

for name, price in list_models().items():
    print(f"{name:50s}  ${price.input_per_1k:.6f}/1K in   ${price.output_per_1k:.6f}/1K out")

Look up a specific model:

from llm_token_guardian import get_price

p = get_price("gpt-4o")
print(f"Input : ${p.input_per_1k:.6f} / 1K tokens")
print(f"Output: ${p.output_per_1k:.6f} / 1K tokens")
print(f"Vision: {p.supports_vision}")
print(f"Max input tokens : {p.max_input_tokens:,}")
print(f"Max output tokens: {p.max_output_tokens:,}")

Pricing source

All pricing data is loaded from the open-source LiteLLM pricing file:

model_prices_and_context_window.json

Bundled snapshot date: February 19, 2026

To refresh with the latest prices at runtime:

from llm_token_guardian import refresh_pricing
refresh_pricing()  # downloads latest from GitHub
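For orientation, entries in the LiteLLM file store prices per single token, while the attributes shown earlier (`input_per_1k`, `output_per_1k`) are per 1,000 tokens. The sketch below shows that conversion on a made-up entry. The field names follow the LiteLLM JSON as commonly published, but the model name and all values are illustrative placeholders rather than current prices, and `per_1k_prices` is a hypothetical helper, not part of this package.

```python
import json

# Made-up entry in the shape of the LiteLLM pricing file (values are placeholders).
SAMPLE_PRICING_JSON = """
{
  "example-model": {
    "litellm_provider": "openai",
    "input_cost_per_token": 2.5e-06,
    "output_cost_per_token": 1e-05,
    "max_input_tokens": 128000,
    "max_output_tokens": 16384,
    "supports_vision": true
  }
}
"""


def per_1k_prices(entry: dict) -> tuple[float, float]:
    """Convert per-token prices to (input, output) prices per 1,000 tokens."""
    return (
        entry["input_cost_per_token"] * 1000,
        entry["output_cost_per_token"] * 1000,
    )


pricing = json.loads(SAMPLE_PRICING_JSON)
input_per_1k, output_per_1k = per_1k_prices(pricing["example-model"])
print(f"Input : ${input_per_1k:.6f} / 1K tokens")
print(f"Output: ${output_per_1k:.6f} / 1K tokens")
```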

Limitations

Text and image only: cost estimation covers text and image inputs. If you pass audio, video, or document (PDF) content, a warning is displayed but no cost is computed for those modalities. The API call still proceeds normally.
Estimates vs. actual billing: the pre-call table shows an upper bound (it assumes all max_output_tokens are used). The post-call cost is computed from the actual token counts returned by the API, multiplied by the stored per-token rates. This closely matches your dashboard in most cases, but can differ due to:
- Prompt caching discounts (Anthropic cache read/write, OpenAI cached prompt tokens)
- Batch API pricing (usually 50% discount)
- Volume discounts or custom pricing tiers
- Price changes after the bundled snapshot date
Always verify on your provider dashboard: use this tool as a helpful guide, not a billing authority.
Sync and async wrappers: the synchronous wrappers are fully featured. Async variants (wrap_anthropic_async, wrap_gemini_async) are also included and follow the same interface pattern.
Model coverage: if a model is not in the pricing database, a ModelNotFoundError is raised explaining which providers are supported.
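The gap between the pre-call upper bound and the post-call cost is simple arithmetic, sketched below. The per-1K prices and token counts are hypothetical placeholders, and `call_cost_usd` is an illustrative helper, not a function exported by the package.

```python
def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
) -> float:
    """Cost of one call from token counts and per-1K-token prices."""
    return input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k


# Hypothetical prices in USD per 1,000 tokens (not real rates).
INPUT_PER_1K = 0.0025
OUTPUT_PER_1K = 0.01

# Pre-call: output tokens are unknown, so assume the full output limit is used.
estimated_max = call_cost_usd(774, 64, INPUT_PER_1K, OUTPUT_PER_1K)

# Post-call: use the token counts reported in the API response.
actual = call_cost_usd(774, 23, INPUT_PER_1K, OUTPUT_PER_1K)

print(f"Estimated max: ${estimated_max:.8f} USD")
print(f"Actual       : ${actual:.8f} USD")
```

Because the estimate assumes the output limit is fully used, it is at least as large as the actual cost whenever the estimated input token count is accurate. Discounts such as prompt caching or batch pricing are not modeled in this sketch.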

Alternative installation (wheel)

If pip install llm-token-guardian is unavailable, install from a pre-built .whl file.

Download the wheel from the Releases page, then:

pip install llm_token_guardian-0.1.1-py3-none-any.whl

# With a provider extra:
pip install "llm_token_guardian-0.1.1-py3-none-any.whl[openai]"
pip install "llm_token_guardian-0.1.1-py3-none-any.whl[anthropic]"
pip install "llm_token_guardian-0.1.1-py3-none-any.whl[google]"

Build the wheel yourself from source:

git clone https://github.com/iamsaugatpandey/llm-token-guardian.git
cd llm-token-guardian
pip install build
python -m build
# Outputs dist/llm_token_guardian-0.1.1-py3-none-any.whl
pip install dist/llm_token_guardian-0.1.1-py3-none-any.whl

Feedback & contributing

Email: saugatpandey02@gmail.com (feedback, questions, and feature suggestions are very welcome).

GitHub Issues: github.com/iamsaugatpandey/llm-token-guardian/issues (bug reports, feature requests, and general discussions).

Contributing: The repository will be public on GitHub, and pull requests are welcome. Fork it, open an issue to discuss your idea, and submit a PR.

⭐ Star the repo if you find this useful — it helps others discover the project and motivates continued development!

Pricing data sourced from BerriAI/litellm — thank you to the LiteLLM team for maintaining this open dataset.

Version 0.1.1, released Feb 20, 2026
