LLM token cost monitoring, budget enforcement, and optimization.


🛡️ TokenShield

Real-time token cost monitoring, budget enforcement, and optimization for LLM applications.

Stop burning money on LLM API calls. TokenShield gives you per-request cost tracking, budget gates, and automatic optimization — before the invoice arrives.

The Problem

Month 1:  $50    "This is cheap!"
Month 2:  $200   "Growth is normal"
Month 3:  $3,400 "WHAT HAPPENED?!"

LLM costs are invisible until the bill arrives. A single misconfigured loop, a verbose system prompt, or an unbounded tool list can 10x your spend overnight.

The Solution

from tokenshield import Shield, BudgetPolicy

shield = Shield(
    model="gpt-4o",
    policy=BudgetPolicy(
        max_cost_per_request=0.05,     # $0.05 per request
        max_cost_per_hour=2.00,        # $2/hour
        max_cost_per_day=20.00,        # $20/day
        alert_threshold_pct=80,        # Alert at 80% of any limit
    )
)

# Wrap any LLM call
result = shield.call(
    messages=[{"role": "user", "content": "Summarize this order"}],
    tools=tool_schemas,
)

print(shield.report())
# ┌─────────────────────────────────┐
# │ Requests today:     142         │
# │ Tokens (in/out):    89K / 12K   │
# │ Cost today:         $4.23       │
# │ Budget remaining:   $15.77      │
# │ Avg cost/request:   $0.030      │
# │ Most expensive:     search (48%)│
# └─────────────────────────────────┘

Features

Feature	Description
Cost Tracking	Per-request, per-hour, per-day cost accumulation with model-aware pricing
Budget Gates	Hard limits that reject calls before they execute (no surprise bills)
Alert Hooks	Webhook/callback when approaching budget thresholds
Token Estimation	Pre-flight token count estimation before calling the API
Model Pricing DB	Built-in pricing for GPT-4o, Claude, Gemini, Mistral, and custom models
Optimization Tips	Automatic suggestions: "Your system prompt is 4,200 tokens — consider trimming"
Dashboard Export	JSON/CSV export for cost dashboards and observability tools
Async Support	Full async/await support for high-throughput applications
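
As a rough illustration of the Dashboard Export row, the sketch below serializes per-request cost records to JSON and CSV with the standard library only. The record fields (`timestamp`, `model`, `input_tokens`, `output_tokens`, `cost_usd`) and the helper names `to_json` and `to_csv` are assumptions made for this example, not TokenShield's actual export schema.

```python
import csv
import io
import json

# Hypothetical per-request records; field names are illustrative assumptions.
records = [
    {"timestamp": "2026-05-07T10:00:00Z", "model": "gpt-4o",
     "input_tokens": 1200, "output_tokens": 300, "cost_usd": 0.006},
    {"timestamp": "2026-05-07T10:00:05Z", "model": "gpt-4o-mini",
     "input_tokens": 800, "output_tokens": 150, "cost_usd": 0.00021},
]


def to_json(rows):
    """Serialize records as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize records as CSV with a header row (expects at least one row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
```

Either string can be written to a file or posted to an observability tool; a flat one-row-per-request layout keeps aggregation (by model, by hour) on the dashboard side.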

Architecture

┌──────────────────────────────────────────────────────────┐
│                     Your Application                      │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────┐   ┌──────────┐   ┌──────────────────────┐ │
│  │ shield   │──→│ estimator│──→│ budget_gate          │ │
│  │ .call()  │   │ (tokens) │   │ (allow / reject)     │ │
│  └──────────┘   └──────────┘   └──────────┬───────────┘ │
│       │                                     │            │
│       │         ┌──────────┐   ┌───────────▼──────────┐ │
│       │         │ tracker  │←──│ LLM API call         │ │
│       │         │ (costs)  │   │ (litellm / openai)   │ │
│       │         └────┬─────┘   └──────────────────────┘ │
│       │              │                                   │
│  ┌────▼──────────────▼─────┐   ┌──────────────────────┐ │
│  │ reporter                │   │ alert_hooks          │ │
│  │ (dashboard / export)    │   │ (webhook / callback) │ │
│  └─────────────────────────┘   └──────────────────────┘ │
│                                                          │
└──────────────────────────────────────────────────────────┘
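
The order of stages in the diagram can be sketched in a few lines. This is a minimal illustration under stated assumptions: `run_pipeline` and the stub stages are hypothetical stand-ins, not TokenShield's internal classes, and the per-character cost is a made-up number chosen only to exercise the gate.

```python
def run_pipeline(prompt, estimator, budget_gate, llm_call, tracker):
    """Wire the stages in the order shown in the diagram (illustrative only)."""
    estimated_cost = estimator(prompt)        # estimator: pre-flight estimate
    if not budget_gate(estimated_cost):       # budget_gate: allow / reject
        return None                           # rejected: the API is never called
    text, actual_cost = llm_call(prompt)      # LLM API call
    tracker.append(actual_cost)               # tracker: record the billed cost
    return text


# Stub stages, all hypothetical stand-ins for the real components.
calls_made = []


def stub_estimator(prompt):
    return len(prompt) * 0.00001   # pretend each character costs $0.00001


def stub_gate(estimated_cost):
    return estimated_cost <= 0.05  # per-request limit of $0.05


def stub_llm(prompt):
    calls_made.append(prompt)
    return "ok", 0.002             # response text and the billed cost


tracker = []
allowed = run_pipeline("Summarize this order", stub_estimator, stub_gate, stub_llm, tracker)
blocked = run_pipeline("x" * 10_000, stub_estimator, stub_gate, stub_llm, tracker)
```

The point of the ordering is that the gate runs on an estimate before any money is spent, while the tracker records what the provider actually billed afterwards.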

Quick Start

pip install tokenshield-ai  # published on PyPI as tokenshield-ai; imported as tokenshield

Basic Usage

from tokenshield import Shield

shield = Shield(model="gpt-4o")

# Track a call (wrap your existing LLM call)
result = shield.call(messages=[...])

# Check current spend
print(f"Today: ${shield.tracker.cost_today:.2f}")
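
One way a figure like `cost_today` could be maintained is a sliding time window over recorded costs. The `RollingCost` class below is a hypothetical helper written for this illustration; the library may well use calendar-day buckets or another scheme instead.

```python
from collections import deque


class RollingCost:
    """Sum costs recorded within the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, cost) pairs, oldest first
        self.total = 0.0

    def add(self, cost: float, now: float) -> None:
        self.events.append((now, cost))
        self.total += cost

    def current(self, now: float) -> float:
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_cost = self.events.popleft()
            self.total -= old_cost
        return self.total


hourly = RollingCost(window_seconds=3600)
hourly.add(0.03, now=0)
hourly.add(0.05, now=1800)
spend_at_30_min = hourly.current(now=1800)   # both requests still in the window
spend_at_61_min = hourly.current(now=3660)   # the first request has expired
```

Timestamps are passed in explicitly so the behavior is deterministic; in an application, `time.time()` or `time.monotonic()` would supply `now`.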

Budget Enforcement

from tokenshield import Shield, BudgetPolicy

shield = Shield(
    model="gpt-4o",
    policy=BudgetPolicy(max_cost_per_request=0.10)
)

try:
    result = shield.call(messages=huge_prompt)
except shield.BudgetExceeded as e:
    print(f"Blocked! Estimated cost ${e.estimated_cost:.3f} exceeds limit")
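
To make the pre-flight rejection concrete, here is a minimal sketch of a gate that estimates a worst-case cost and raises before any API call. Everything in it is an assumption for illustration: `BudgetExceededError`, `estimate_input_tokens`, and `check_budget` are hypothetical names, and the four-characters-per-token rule is a rough heuristic rather than a real tokenizer.

```python
class BudgetExceededError(Exception):
    """Hypothetical exception; carries the estimate that tripped the limit."""

    def __init__(self, estimated_cost: float, limit: float):
        super().__init__(
            f"estimated ${estimated_cost:.4f} exceeds limit ${limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.limit = limit


def estimate_input_tokens(messages) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    # A real tokenizer would give more accurate counts.
    chars = sum(len(m["content"]) for m in messages)
    return chars // 4 + 1


def check_budget(messages, max_output_tokens, input_price, output_price, limit):
    """Raise before the call if the worst-case cost exceeds `limit` (USD)."""
    input_tokens = estimate_input_tokens(messages)
    estimated = (input_tokens * input_price
                 + max_output_tokens * output_price) / 1_000_000
    if estimated > limit:
        raise BudgetExceededError(estimated, limit)
    return estimated


small = [{"role": "user", "content": "Summarize this order"}]
huge = [{"role": "user", "content": "word " * 100_000}]

ok_cost = check_budget(small, 500, input_price=2.50, output_price=10.00, limit=0.10)

try:
    check_budget(huge, 500, input_price=2.50, output_price=10.00, limit=0.10)
    blocked_cost = None
except BudgetExceededError as exc:
    blocked_cost = exc.estimated_cost
```

Charging the full `max_output_tokens` in the estimate is a deliberately conservative choice: the output length is unknown before the call, so the gate assumes the most expensive outcome.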

Alert Hooks

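from tokenshield import Shield, BudgetPolicy

# `slack` stands in for whatever notification client your application already uses.
shield = Shield(
    model="gpt-4o",
    policy=BudgetPolicy(max_cost_per_day=20.00, alert_threshold_pct=80),
    on_alert=lambda msg: slack.post(channel="#llm-costs", text=msg),
)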
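
The threshold behavior can be pictured with a small self-contained sketch. `ThresholdAlert` is a hypothetical class written for this example, and firing only once per crossing is an assumed design choice, not documented TokenShield behavior.

```python
class ThresholdAlert:
    """Fire a callback once when spend crosses a percentage of the limit."""

    def __init__(self, limit: float, threshold_pct: float, on_alert):
        self.limit = limit
        self.threshold = limit * threshold_pct / 100
        self.on_alert = on_alert
        self.spent = 0.0
        self.fired = False

    def record(self, cost: float) -> None:
        self.spent += cost
        # Only alert on the first crossing, to avoid flooding the channel.
        if not self.fired and self.spent >= self.threshold:
            self.fired = True
            pct = 100 * self.spent / self.limit
            self.on_alert(f"LLM spend at {pct:.0f}% of ${self.limit:.2f} daily limit")


alerts = []
alert = ThresholdAlert(limit=20.00, threshold_pct=80, on_alert=alerts.append)
for cost in (10.00, 5.00, 2.00, 1.00):
    alert.record(cost)
```

Here the callback simply appends to a list; swapping in a function that posts to a chat channel or webhook would follow the same shape.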

Optimization Suggestions

tips = shield.optimize(messages, tools)
# [
#   "System prompt is 3,800 tokens (63% of input). Consider compressing.",
#   "18 tools bound but only 3 used. Use dynamic tool binding to save ~2,250 tokens.",
#   "History has 45 messages. Consider windowing to last 20.",
# ]
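
Tips of this kind can come from a few simple heuristics. The sketch below is an assumption about how such checks might look, not the library's implementation: `suggest_optimizations`, the 50% system-prompt threshold, the 20-message window, and the 150-tokens-per-tool figure are all illustrative choices.

```python
def suggest_optimizations(system_tokens, input_tokens, tools_bound, tools_used,
                          history_len, max_history=20, tokens_per_tool=150):
    """Return human-readable tips; every threshold here is an assumption."""
    tips = []
    share = system_tokens / input_tokens if input_tokens else 0.0
    if share > 0.5:
        tips.append(
            f"System prompt is {system_tokens:,} tokens "
            f"({share:.0%} of input). Consider compressing."
        )
    unused = len(set(tools_bound) - set(tools_used))
    if unused:
        tips.append(
            f"{len(tools_bound)} tools bound but only {len(tools_used)} used. "
            f"Dropping unused tools could save ~{unused * tokens_per_tool:,} tokens."
        )
    if history_len > max_history:
        tips.append(
            f"History has {history_len} messages. "
            f"Consider windowing to last {max_history}."
        )
    return tips


example_tips = suggest_optimizations(
    system_tokens=3800,
    input_tokens=6000,
    tools_bound=[f"tool_{i}" for i in range(18)],
    tools_used=["tool_0", "tool_1", "tool_2"],
    history_len=45,
)
```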

Pricing Database

Built-in pricing (updated monthly):

Model	Input ($/1M tokens)	Output ($/1M tokens)	Context (tokens)
gpt-4o	$2.50	$10.00	128K
gpt-4o-mini	$0.15	$0.60	128K
claude-3.5-sonnet	$3.00	$15.00	200K
claude-3-haiku	$0.25	$1.25	200K
gemini-1.5-pro	$1.25	$5.00	1M
mistral-large	$2.00	$6.00	128K

Add custom models:

shield.pricing.add("my-finetuned-model", input=5.00, output=15.00)
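
For reference, per-million-token prices translate into a per-request cost as follows. The `PRICING` dictionary and `request_cost` function are illustrative names, not part of the TokenShield API; the prices are copied from the table above.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 4,000 input tokens and 500 output tokens on gpt-4o:
# 4,000 * 2.50 / 1M + 500 * 10.00 / 1M = 0.010 + 0.005 = 0.015 USD
gpt4o_cost = request_cost("gpt-4o", 4_000, 500)
mini_cost = request_cost("gpt-4o-mini", 4_000, 500)
```

Output tokens are priced several times higher than input tokens for every model in the table, so capping response length often matters as much as trimming prompts.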

Documentation

Architecture & Data Flow — Mermaid diagrams of the full pipeline
Benchmarks — Cost savings measurements across real workloads
API Reference — Full class/method documentation

License

MIT — see LICENSE

Project details


This version

2.0.0

May 7, 2026

