LLM Cost Guardian
New here? Start with the Getting Started Guide.
Real-time cost monitoring and budget enforcement for LLM API calls.
Why?
LLM API costs can spiral out of control fast - a single runaway loop can burn through hundreds of dollars in minutes. LLM Cost Guardian wraps your existing clients with transparent tracking and automatic budget enforcement so you never get a surprise bill again.
Features
- Real-time cost tracking - automatic per-call cost calculation from token usage
- Budget enforcement - hard caps, soft warnings, and sliding window policies
- Drop-in wrappers - wrap OpenAI and Anthropic clients with one line of code
- Prometheus export - expose metrics for your monitoring stack
- JSON & CSV export - save usage reports for analysis
- CLI tool - estimate costs and view reports from the terminal
- Extensible - add custom models, policies, and exporters
- Thread-safe - safe for concurrent use in async applications
Quick Start
pip install llm-cost-guardian
from llm_cost_guardian import CostTracker, HardCapPolicy, BudgetManager
tracker = CostTracker()
budget = BudgetManager().add(HardCapPolicy(limit_usd=5.00))
# Track a call (or use the wrapper for automatic tracking)
tracker.record("gpt-4o", input_tokens=1500, output_tokens=800)
budget.enforce(tracker) # raises BudgetError if over limit
print(f"Cost so far: ${tracker.total_cost:.4f}")
Architecture
┌──────────────┐     ┌─────────────────────────────────────┐     ┌──────────────┐
│              │     │          LLM Cost Guardian          │     │              │
│  Your Code   │────>│  ┌───────────┐  ┌──────────────┐    │────>│   LLM API    │
│              │     │  │  Tracker  │  │    Budget    │    │     │  (OpenAI /   │
│              │<────│  │  (costs)  │  │  (policies)  │    │<────│ Anthropic /  │
│              │     │  └───────────┘  └──────────────┘    │     │   Google)    │
└──────────────┘     │  ┌───────────┐  ┌──────────────┐    │     └──────────────┘
                     │  │ Exporters │  │     CLI      │    │
                     │  │ (JSON/CSV/│  │              │    │
                     │  │Prometheus)│  │              │    │
                     │  └───────────┘  └──────────────┘    │
                     └─────────────────────────────────────┘
Usage
Basic Cost Tracking
from llm_cost_guardian import CostTracker
tracker = CostTracker()
# Record API calls manually
tracker.record("gpt-4o", input_tokens=1500, output_tokens=800)
tracker.record("claude-3-5-haiku-20241022", input_tokens=2000, output_tokens=600)
print(f"Total: ${tracker.total_cost:.6f}")
print(f"Tokens: {tracker.total_tokens:,}")
print(tracker.cost_by_model())
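The per-call arithmetic behind these totals can be checked by hand. The sketch below is not the library's code: `PRICES_PER_1M` and `call_cost` are hypothetical names, and the prices are copied from the Supported Models table, assuming cost is simply tokens times the per-1M rate.

```python
# Prices in USD per 1M tokens (input, output), copied from the Supported Models table.
PRICES_PER_1M = {
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-haiku-20241022": (0.80, 4.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost of a single call in USD."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The same two calls recorded in the example above.
gpt_cost = call_cost("gpt-4o", 1500, 800)
haiku_cost = call_cost("claude-3-5-haiku-20241022", 2000, 600)

print(f"gpt-4o: ${gpt_cost:.6f}")
print(f"haiku:  ${haiku_cost:.6f}")
print(f"total:  ${gpt_cost + haiku_cost:.6f}")
```

Under these assumptions the gpt-4o call costs about $0.01175 and the Haiku call about $0.004, so the two calls together come to roughly $0.01575.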
Drop-in Client Wrappers
Wrap your existing client once; the rest of your code stays unchanged:
from openai import OpenAI
from llm_cost_guardian import CostTracker, TrackedOpenAI
tracker = CostTracker()
client = TrackedOpenAI(OpenAI(), tracker)
# Use exactly like the normal client - costs tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(f"This call cost: ${tracker.total_cost:.6f}")
Works the same way with Anthropic:
from anthropic import Anthropic
from llm_cost_guardian import CostTracker, TrackedAnthropic
tracker = CostTracker()
client = TrackedAnthropic(Anthropic(), tracker)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
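To illustrate how a drop-in wrapper of this kind can work in principle, here is a minimal proxy sketch. It is an assumption about the general pattern, not the implementation of `TrackedOpenAI` or `TrackedAnthropic`: `UsageRecorder`, `FakeCompletions`, and `TrackedCompletions` are hypothetical stand-ins, and the fake response only mimics the `usage.prompt_tokens` / `usage.completion_tokens` fields that OpenAI chat completions return.

```python
from types import SimpleNamespace


class UsageRecorder:
    """Hypothetical stand-in for a tracker: collects (model, input, output) tuples."""

    def __init__(self):
        self.calls = []

    def record(self, model, input_tokens, output_tokens):
        self.calls.append((model, input_tokens, output_tokens))


class FakeCompletions:
    """Hypothetical stand-in for client.chat.completions; returns canned usage."""

    def create(self, model, messages):
        usage = SimpleNamespace(prompt_tokens=12, completion_tokens=5)
        return SimpleNamespace(model=model, usage=usage)


class TrackedCompletions:
    """Hypothetical proxy: forwards create() unchanged, then records token usage."""

    def __init__(self, inner, recorder):
        self._inner = inner
        self._recorder = recorder

    def create(self, **kwargs):
        response = self._inner.create(**kwargs)
        self._recorder.record(
            response.model,
            response.usage.prompt_tokens,
            response.usage.completion_tokens,
        )
        return response  # the caller sees the original response object


recorder = UsageRecorder()
completions = TrackedCompletions(FakeCompletions(), recorder)
response = completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(recorder.calls)
```

Because the proxy returns the original response untouched, calling code does not need to know that tracking is happening.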
Budget Policies
Stack multiple policies for layered protection:
from openai import OpenAI
from llm_cost_guardian import (
    BudgetManager,
    HardCapPolicy,
    SoftWarningPolicy,
    SlidingWindowPolicy,
    CostTracker,
    TrackedOpenAI,
)
tracker = CostTracker()
budget = BudgetManager(
    on_warn=lambda result: print(f"WARNING: {result.message}")
)
budget.add(SoftWarningPolicy(warning_usd=1.00))  # warn at $1
budget.add(HardCapPolicy(limit_usd=5.00))        # block at $5
budget.add(SlidingWindowPolicy(                  # $0.50/hour max
    limit_usd=0.50,
    window_seconds=3600,
))
# Attach to a client
client = TrackedOpenAI(OpenAI(), tracker, budget)
# Budget is enforced automatically before each API call
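A sliding-window limit is the least obvious of the three policies, so a rough sketch of the idea may help. This is one possible approach, not the library's `SlidingWindowPolicy`: `SlidingWindowBudget`, `BudgetExceeded`, and the injectable `clock` parameter are hypothetical names introduced for illustration.

```python
import time
from collections import deque


class BudgetExceeded(Exception):
    """Hypothetical error type for this sketch."""


class SlidingWindowBudget:
    """Allow at most limit_usd of spend within the last window_seconds."""

    def __init__(self, limit_usd, window_seconds, clock=time.monotonic):
        self.limit_usd = limit_usd
        self.window_seconds = window_seconds
        self._clock = clock
        self._events = deque()  # (timestamp, cost_usd), oldest first

    def record(self, cost_usd):
        self._events.append((self._clock(), cost_usd))

    def spent_in_window(self):
        # Drop events that have aged out of the window, then sum the rest.
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def check(self):
        spent = self.spent_in_window()
        if spent > self.limit_usd:
            raise BudgetExceeded(f"${spent:.2f} spent in window, limit ${self.limit_usd:.2f}")


# Drive the sketch with a fake clock so the example is deterministic.
now = [0.0]
window = SlidingWindowBudget(limit_usd=0.50, window_seconds=3600, clock=lambda: now[0])

window.record(0.30)  # t = 0 s
now[0] = 1800.0
window.record(0.25)  # t = 30 min, $0.55 inside the last hour

try:
    window.check()
    blocked = False
except BudgetExceeded:
    blocked = True

now[0] = 3700.0  # the t = 0 event is now older than one hour
remaining = window.spent_in_window()
print(blocked, f"{remaining:.2f}")
```

Unlike a hard cap on lifetime spend, the window recovers on its own: once older calls age out, new calls are allowed again.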
Exporting Data
from llm_cost_guardian import to_json, to_csv, to_prometheus, save_json
# JSON string
print(to_json(tracker))
# CSV string
print(to_csv(tracker))
# Prometheus metrics
print(to_prometheus(tracker))
# Save to file
save_json(tracker, "usage_report.json")
CLI Usage
# List supported models and pricing
llm-cost-guardian models
llm-cost-guardian models --provider openai --json-output
# Estimate cost for a specific call
llm-cost-guardian estimate gpt-4o --input-tokens 10000 --output-tokens 5000
# View a saved report
llm-cost-guardian report usage_report.json
Prometheus Export
Expose a /metrics endpoint for your monitoring stack:
from flask import Flask, Response
from llm_cost_guardian import CostTracker, to_prometheus
app = Flask(__name__)
tracker = CostTracker() # shared instance
@app.route("/metrics")
def metrics():
    return Response(to_prometheus(tracker), content_type="text/plain")
Output format:
# HELP llm_cost_guardian_total_cost_usd Total cost in USD
# TYPE llm_cost_guardian_total_cost_usd gauge
llm_cost_guardian_total_cost_usd 0.01234500
# HELP llm_cost_guardian_cost_by_model_usd Cost per model in USD
# TYPE llm_cost_guardian_cost_by_model_usd gauge
llm_cost_guardian_cost_by_model_usd{model="gpt-4o"} 0.00750000
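For readers unfamiliar with the Prometheus text format, the output above can be produced with plain string formatting. The sketch below is an illustration only, not the library's `to_prometheus`: `render_metrics` and `escape_label` are hypothetical helpers, and the eight-decimal formatting is inferred from the sample output.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline, as Prometheus label values require."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(total_cost: float, cost_by_model: dict) -> str:
    """Hypothetical renderer for the two gauges shown in the sample output."""
    lines = [
        "# HELP llm_cost_guardian_total_cost_usd Total cost in USD",
        "# TYPE llm_cost_guardian_total_cost_usd gauge",
        f"llm_cost_guardian_total_cost_usd {total_cost:.8f}",
        "# HELP llm_cost_guardian_cost_by_model_usd Cost per model in USD",
        "# TYPE llm_cost_guardian_cost_by_model_usd gauge",
    ]
    # One sample line per model, with the model name as a label.
    for model, cost in sorted(cost_by_model.items()):
        label = escape_label(model)
        lines.append(f'llm_cost_guardian_cost_by_model_usd{{model="{label}"}} {cost:.8f}')
    return "\n".join(lines) + "\n"


metrics_text = render_metrics(0.012345, {"gpt-4o": 0.0075})
print(metrics_text, end="")
```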
Configuration
LLM Cost Guardian supports YAML configuration files:
# llm_cost_guardian.yml
budget:
  hard_cap_usd: 10.00
  soft_warning_usd: 5.00
  sliding_window:
    limit_usd: 2.00
    window_seconds: 3600
export:
  format: json
  path: ./reports/usage.json
# Override or add custom model pricing
models:
  my-fine-tuned-model:
    provider: openai
    input_cost_per_1m: 5.00
    output_cost_per_1m: 15.00
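As a rough illustration of what "override or add" could mean for the `models:` section, the sketch below merges custom entries over a default pricing table. It assumes the config has already been parsed into a dictionary; `DEFAULT_PRICING` and `merge_pricing` are hypothetical names, and how the library actually loads and applies its configuration is not shown here.

```python
# Hypothetical built-in pricing, using the gpt-4o row from the Supported Models table.
DEFAULT_PRICING = {
    "gpt-4o": {"provider": "openai", "input_cost_per_1m": 2.50, "output_cost_per_1m": 10.00},
}

# What a YAML loader would return for the models: section of the config above.
config = {
    "models": {
        "my-fine-tuned-model": {
            "provider": "openai",
            "input_cost_per_1m": 5.00,
            "output_cost_per_1m": 15.00,
        },
    },
}


def merge_pricing(defaults: dict, overrides: dict) -> dict:
    """Copy the defaults, then let config entries override or extend them."""
    merged = {name: dict(entry) for name, entry in defaults.items()}
    for name, entry in overrides.items():
        merged.setdefault(name, {}).update(entry)
    return merged


pricing = merge_pricing(DEFAULT_PRICING, config.get("models", {}))

# Cost of 10,000 input and 5,000 output tokens on the custom model.
entry = pricing["my-fine-tuned-model"]
cost = (10_000 * entry["input_cost_per_1m"] + 5_000 * entry["output_cost_per_1m"]) / 1_000_000
print(f"${cost:.4f}")
```

With the prices from the config, that call works out to about $0.125.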
Supported Models
| Model | Provider | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|
| gpt-4o | OpenAI | $2.50 | $10.00 |
| gpt-4o-mini | OpenAI | $0.15 | $0.60 |
| gpt-4-turbo | OpenAI | $10.00 | $30.00 |
| gpt-4 | OpenAI | $30.00 | $60.00 |
| gpt-3.5-turbo | OpenAI | $0.50 | $1.50 |
| o1 | OpenAI | $15.00 | $60.00 |
| o1-mini | OpenAI | $3.00 | $12.00 |
| o3-mini | OpenAI | $1.10 | $4.40 |
| claude-opus-4-20250514 | Anthropic | $15.00 | $75.00 |
| claude-sonnet-4-20250514 | Anthropic | $3.00 | $15.00 |
| claude-3-5-sonnet-20241022 | Anthropic | $3.00 | $15.00 |
| claude-3-5-haiku-20241022 | Anthropic | $0.80 | $4.00 |
| claude-3-opus-20240229 | Anthropic | $15.00 | $75.00 |
| claude-3-haiku-20240307 | Anthropic | $0.25 | $1.25 |
| gemini-2.0-flash | Google | $0.10 | $0.40 |
| gemini-1.5-pro | Google | $1.25 | $5.00 |
| gemini-1.5-flash | Google | $0.075 | $0.30 |
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.