LLM token cost monitoring, budget enforcement, and optimization.
🛡️ TokenShield
Real-time token cost monitoring, budget enforcement, and optimization for LLM applications.
Stop burning money on LLM API calls. TokenShield gives you per-request cost tracking, budget gates, and automatic optimization, all before the invoice arrives.
The Problem
Month 1: $50 "This is cheap!"
Month 2: $200 "Growth is normal"
Month 3: $3,400 "WHAT HAPPENED?!"
LLM costs are invisible until the bill arrives. A single misconfigured loop, a verbose system prompt, or an unbounded tool list can 10x your spend overnight.
The Solution
from tokenshield import Shield, BudgetPolicy

shield = Shield(
    model="gpt-4o",
    policy=BudgetPolicy(
        max_cost_per_request=0.05,  # $0.05 per request
        max_cost_per_hour=2.00,     # $2/hour
        max_cost_per_day=20.00,     # $20/day
        alert_threshold_pct=80,     # Alert at 80% of any limit
    )
)

# Wrap any LLM call
result = shield.call(
    messages=[{"role": "user", "content": "Summarize this order"}],
    tools=tool_schemas,
)

print(shield.report())
# ┌─────────────────────────────────┐
# │ Requests today:    142          │
# │ Tokens (in/out):   89K / 12K    │
# │ Cost today:        $4.23        │
# │ Budget remaining:  $15.77       │
# │ Avg cost/request:  $0.030       │
# │ Most expensive:    search (48%) │
# └─────────────────────────────────┘
Features
| Feature | Description |
|---|---|
| Cost Tracking | Per-request, per-hour, per-day cost accumulation with model-aware pricing |
| Budget Gates | Hard limits that reject calls before they execute (no surprise bills) |
| Alert Hooks | Webhook/callback when approaching budget thresholds |
| Token Estimation | Pre-flight token count estimation before calling the API |
| Model Pricing DB | Built-in pricing for GPT-4o, Claude, Gemini, Mistral, and custom models |
| Optimization Tips | Automatic suggestions: "Your system prompt is 4,200 tokens; consider trimming" |
| Dashboard Export | JSON/CSV export for cost dashboards and observability tools |
| Async Support | Full async/await support for high-throughput applications |
Architecture
┌────────────────────────────────────────────────────────────┐
│                      Your Application                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────────┐  │
│  │  shield  │───▶│ estimator│───▶│     budget_gate      │  │
│  │ .call()  │    │ (tokens) │    │   (allow / reject)   │  │
│  └──────────┘    └──────────┘    └──────────┬───────────┘  │
│       │                                     │              │
│       │          ┌──────────┐    ┌──────────▼───────────┐  │
│       │          │ tracker  │◀───│     LLM API call     │  │
│       │          │ (costs)  │    │  (litellm / openai)  │  │
│       │          └────┬─────┘    └──────────────────────┘  │
│       │               │                                    │
│  ┌────▼───────────────▼─────┐    ┌──────────────────────┐  │
│  │         reporter         │    │     alert_hooks      │  │
│  │   (dashboard / export)   │    │ (webhook / callback) │  │
│  └──────────────────────────┘    └──────────────────────┘  │
│                                                            │
└────────────────────────────────────────────────────────────┘
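The flow in the diagram can be sketched in a few lines of plain Python. This is an illustration only, not TokenShield's actual implementation: `guarded_call`, `Tracker`, `estimate_tokens`, `fake_llm`, and the 4-characters-per-token heuristic are all hypothetical, and the prices are the gpt-4o figures from the pricing table.

```python
from dataclasses import dataclass

INPUT_PRICE = 2.50 / 1_000_000    # USD per input token (gpt-4o, assumed)
OUTPUT_PRICE = 10.00 / 1_000_000  # USD per output token (gpt-4o, assumed)


class BudgetExceeded(Exception):
    """Raised by the gate when the estimated cost is over the limit."""


def estimate_tokens(messages):
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4 + 1


@dataclass
class Tracker:
    total_cost: float = 0.0
    requests: int = 0

    def record(self, input_tokens, output_tokens):
        self.total_cost += input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
        self.requests += 1


def guarded_call(messages, llm_call, tracker, max_cost_per_request, max_output_tokens=256):
    # 1. estimator: worst case assumes the reply uses the full output allowance.
    est_cost = estimate_tokens(messages) * INPUT_PRICE + max_output_tokens * OUTPUT_PRICE
    # 2. budget_gate: reject before any money is spent.
    if est_cost > max_cost_per_request:
        raise BudgetExceeded(f"estimated ${est_cost:.4f} > limit ${max_cost_per_request:.4f}")
    # 3. LLM API call: expected to return (text, input_tokens, output_tokens).
    text, used_in, used_out = llm_call(messages)
    # 4. tracker: record the usage actually reported by the provider.
    tracker.record(used_in, used_out)
    return text


def fake_llm(messages):
    # Stub provider so the sketch runs without network access.
    return "ok", 12, 5


tracker = Tracker()
reply = guarded_call(
    [{"role": "user", "content": "Summarize this order"}], fake_llm, tracker, 0.05
)
print(reply, tracker.requests, f"${tracker.total_cost:.5f}")
```

The point of the ordering is that the gate runs on an estimate before the call, while the tracker runs on real usage after it, so the two numbers can differ.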
Quick Start
pip install tokenshield-ai
Basic Usage
from tokenshield import Shield
shield = Shield(model="gpt-4o")
# Track a call (wrap your existing LLM call)
result = shield.call(messages=[...])
# Check current spend
print(f"Today: ${shield.tracker.cost_today:.2f}")
Budget Enforcement
from tokenshield import Shield, BudgetPolicy

shield = Shield(
    model="gpt-4o",
    policy=BudgetPolicy(max_cost_per_request=0.10)
)

try:
    result = shield.call(messages=huge_prompt)
except shield.BudgetExceeded as e:
    print(f"Blocked! Estimated cost ${e.estimated_cost:.3f} exceeds limit")
Alert Hooks
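from tokenshield import Shield, BudgetPolicy

# `slack` stands in for whatever notification client your application already uses.
shield = Shield(
    model="gpt-4o",
    policy=BudgetPolicy(max_cost_per_day=20.00, alert_threshold_pct=80),
    on_alert=lambda msg: slack.post(channel="#llm-costs", text=msg),
)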
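The threshold logic behind such a hook might look like the sketch below: the callback fires once, the first time accumulated spend reaches the configured percentage of the limit. `ThresholdAlert` is a hypothetical helper written for this example, and firing only once per limit is an assumption rather than documented TokenShield behavior.

```python
class ThresholdAlert:
    """Calls on_alert once when spend first reaches threshold_pct of the limit."""

    def __init__(self, limit_usd, threshold_pct, on_alert):
        self.limit_usd = limit_usd
        self.threshold_usd = limit_usd * threshold_pct / 100
        self.on_alert = on_alert
        self.spent = 0.0
        self._fired = False

    def add(self, cost):
        self.spent += cost
        if not self._fired and self.spent >= self.threshold_usd:
            self._fired = True  # avoid repeating the alert on every later call
            pct = 100 * self.spent / self.limit_usd
            self.on_alert(
                f"LLM spend ${self.spent:.2f} is {pct:.0f}% of the ${self.limit_usd:.2f} limit"
            )


alerts = []
alert = ThresholdAlert(limit_usd=20.00, threshold_pct=80, on_alert=alerts.append)
for cost in (10.00, 5.00, 2.00, 1.00):
    alert.add(cost)
print(alerts)  # one message, sent when spend reached $17.00
```

Any callable that takes a string works as `on_alert`, which is why a list's `append` can stand in for a Slack or webhook client in tests.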
Optimization Suggestions
tips = shield.optimize(messages, tools)
# [
# "System prompt is 3,800 tokens (63% of input). Consider compressing.",
# "18 tools bound but only 3 used. Use dynamic tool binding to save ~2,250 tokens.",
# "History has 45 messages. Consider windowing to last 20.",
# ]
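Two of these checks are simple enough to sketch directly. The example below is an assumption-laden illustration, not the library's logic: `prompt_tips`, the 50% system-prompt share, the 20-message window, and the 4-characters-per-token estimate are all made up for the demonstration.

```python
def estimate_tokens(text):
    # Rough heuristic, not a real tokenizer: about 4 characters per token.
    return max(1, len(text) // 4)


def prompt_tips(messages, max_system_share=0.5, max_history=20):
    tips = []
    counts = [(m["role"], estimate_tokens(m["content"])) for m in messages]
    total = sum(n for _, n in counts)
    system = sum(n for role, n in counts if role == "system")

    # Flag a system prompt that dominates the input.
    if total and system / total > max_system_share:
        tips.append(
            f"System prompt is {system:,} tokens ({system / total:.0%} of input). "
            "Consider compressing."
        )

    # Flag a conversation history longer than the window.
    history = [m for m in messages if m["role"] != "system"]
    if len(history) > max_history:
        tips.append(
            f"History has {len(history)} messages. Consider windowing to last {max_history}."
        )
    return tips


demo = [{"role": "system", "content": "x" * 4000}] + [
    {"role": "user", "content": "hello there"} for _ in range(25)
]
for tip in prompt_tips(demo):
    print(tip)
```

A production version would count tokens with the model's own tokenizer, since character-based estimates can be far off for code or non-English text.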
Pricing Database
Built-in pricing (updated monthly):
| Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|
| gpt-4o | $2.50 | $10.00 | 128K |
| gpt-4o-mini | $0.15 | $0.60 | 128K |
| claude-3.5-sonnet | $3.00 | $15.00 | 200K |
| claude-3-haiku | $0.25 | $1.25 | 200K |
| gemini-1.5-pro | $1.25 | $5.00 | 1M |
| mistral-large | $2.00 | $6.00 | 128K |
Add custom models:
shield.pricing.add("my-finetuned-model", input=5.00, output=15.00)
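Because prices are quoted per 1M tokens, the cost of a single request is a weighted sum of its input and output token counts. A minimal sketch of that arithmetic, using two rows from the table above (`PRICING` and `request_cost` are illustrative names, not part of the TokenShield API):

```python
# Prices in USD per 1M tokens, copied from the pricing table.
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request under per-1M-token pricing."""
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


cost = request_cost("gpt-4o", input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")  # $0.0100
```

Output tokens cost four times as much as input tokens for both models shown, so capping response length often saves more than trimming the prompt by the same number of tokens.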
Documentation
- Architecture & Data Flow: Mermaid diagrams of the full pipeline
- Benchmarks: Cost savings measurements across real workloads
- API Reference: Full class/method documentation
License
MIT. See LICENSE.
Download files
Source Distribution
tokenshield_ai-2.0.0.tar.gz (12.5 kB)
File details
Details for the file tokenshield_ai-2.0.0.tar.gz.
File metadata
- Download URL: tokenshield_ai-2.0.0.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7db181e421fd98d1682a8cc9c9e221138f364e8a792320941dd7ca522a1a80f4 |
| MD5 | 3be56b705efac8b87e767ffcf9e22780 |
| BLAKE2b-256 | 06d9b1d0db13be39e0e498679532bf7395c947c1e6ac8019a2662fd9e49ec6a9 |