LLM Cost Guardian

New here? Start with the Getting Started Guide.

Real-time cost monitoring and budget enforcement for LLM API calls.


Why?

LLM API costs can spiral out of control fast - a single runaway loop can burn through hundreds of dollars in minutes. LLM Cost Guardian wraps your existing clients with transparent tracking and automatic budget enforcement so you never get a surprise bill again.

Features

  • 📊 Real-time cost tracking - automatic per-call cost calculation from token usage
  • 🛡️ Budget enforcement - hard caps, soft warnings, and sliding window policies
  • 🔌 Drop-in wrappers - wrap OpenAI and Anthropic clients with one line of code
  • 📈 Prometheus export - expose metrics for your monitoring stack
  • 💾 JSON & CSV export - save usage reports for analysis
  • 🖥️ CLI tool - estimate costs and view reports from the terminal
  • 🧩 Extensible - add custom models, policies, and exporters
  • 🔒 Thread-safe - safe for concurrent use in async applications

Quick Start

pip install llm-cost-guardian

from llm_cost_guardian import CostTracker, HardCapPolicy, BudgetManager

tracker = CostTracker()
budget = BudgetManager().add(HardCapPolicy(limit_usd=5.00))

# Track a call (or use the wrapper for automatic tracking)
tracker.record("gpt-4o", input_tokens=1500, output_tokens=800)
budget.enforce(tracker)  # raises BudgetError if over limit
print(f"Cost so far: ${tracker.total_cost:.4f}")

Architecture

┌──────────────┐     ┌─────────────────────────────────────┐     ┌──────────────┐
│              │     │        LLM Cost Guardian            │     │              │
│  Your Code   │────>│  ┌───────────┐   ┌──────────────┐   │────>│   LLM API    │
│              │     │  │  Tracker  │   │   Budget     │   │     │  (OpenAI /   │
│              │<────│  │  (costs)  │   │  (policies)  │   │<────│  Anthropic / │
│              │     │  └───────────┘   └──────────────┘   │     │   Google)    │
└──────────────┘     │  ┌───────────┐   ┌──────────────┐   │     └──────────────┘
                     │  │ Exporters │   │     CLI      │   │
                     │  │ (JSON/CSV/│   │              │   │
                     │  │Prometheus)│   │              │   │
                     │  └───────────┘   └──────────────┘   │
                     └─────────────────────────────────────┘

Usage

Basic Cost Tracking

from llm_cost_guardian import CostTracker

tracker = CostTracker()

# Record API calls manually
tracker.record("gpt-4o", input_tokens=1500, output_tokens=800)
tracker.record("claude-3-5-haiku-20241022", input_tokens=2000, output_tokens=600)

print(f"Total: ${tracker.total_cost:.6f}")
print(f"Tokens: {tracker.total_tokens:,}")
print(tracker.cost_by_model())
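
For intuition, the per-call figure comes down to simple arithmetic on token counts and per-million-token prices. The following is a minimal standalone sketch of that calculation, not the library's internal code: the `PRICES` dict and `call_cost` helper are hypothetical names, and the prices are copied from the Supported Models table.

```python
# Illustrative only: PRICES and call_cost are hypothetical names, not part of
# llm_cost_guardian. Prices are (input, output) in USD per 1M tokens.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-haiku-20241022": (0.80, 4.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


gpt_cost = call_cost("gpt-4o", input_tokens=1500, output_tokens=800)
haiku_cost = call_cost("claude-3-5-haiku-20241022", input_tokens=2000, output_tokens=600)

print(f"gpt-4o: ${gpt_cost:.6f}")    # gpt-4o: $0.011750
print(f"haiku:  ${haiku_cost:.6f}")  # haiku:  $0.004000
```

With these table prices, the two recorded calls above would add up to roughly $0.0158.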

Drop-in Client Wrappers

Wrap your existing client once; the calls you already make need no changes:

from openai import OpenAI
from llm_cost_guardian import CostTracker, TrackedOpenAI

tracker = CostTracker()
client = TrackedOpenAI(OpenAI(), tracker)

# Use exactly like the normal client - costs tracked automatically
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(f"This call cost: ${tracker.total_cost:.6f}")

Works the same way with Anthropic:

from anthropic import Anthropic
from llm_cost_guardian import CostTracker, TrackedAnthropic

tracker = CostTracker()
client = TrackedAnthropic(Anthropic(), tracker)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

Budget Policies

Stack multiple policies for layered protection:

from openai import OpenAI
from llm_cost_guardian import (
    BudgetManager,
    HardCapPolicy,
    SoftWarningPolicy,
    SlidingWindowPolicy,
    CostTracker,
    TrackedOpenAI,
)

tracker = CostTracker()
budget = BudgetManager(
    on_warn=lambda result: print(f"WARNING: {result.message}")
)
budget.add(SoftWarningPolicy(warning_usd=1.00))       # warn at $1
budget.add(HardCapPolicy(limit_usd=5.00))              # block at $5
budget.add(SlidingWindowPolicy(                         # $0.50/hour max
    limit_usd=0.50,
    window_seconds=3600,
))

# Attach to a client
client = TrackedOpenAI(OpenAI(), tracker, budget)
# Budget is enforced automatically before each API call
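
To make the sliding-window idea concrete: a hard cap looks at total spend, while a sliding window presumably looks only at spend within the last `window_seconds`. The sketch below shows one way such a check could work. It is an assumption about the concept, not the library's implementation, and `SlidingWindowSpend` is a hypothetical class that does not exist in llm_cost_guardian.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowSpend:
    """Hypothetical helper: tracks spend inside a rolling time window."""

    def __init__(self, limit_usd: float, window_seconds: float) -> None:
        self.limit_usd = limit_usd
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, cost_usd) pairs, oldest first

    def record(self, cost_usd: float, now: Optional[float] = None) -> None:
        timestamp = time.monotonic() if now is None else now
        self._events.append((timestamp, cost_usd))

    def spent(self, now: Optional[float] = None) -> float:
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Drop charges that have aged out of the window.
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def over_limit(self, now: Optional[float] = None) -> bool:
        return self.spent(now) > self.limit_usd


# Explicit timestamps (in seconds) keep the example deterministic.
window = SlidingWindowSpend(limit_usd=0.50, window_seconds=3600)
window.record(0.30, now=0)
window.record(0.30, now=1800)
print(window.over_limit(now=1800))  # True: $0.60 spent within the last hour
print(window.over_limit(now=3700))  # False: the first charge has aged out
```

A window like this caps the rate of spending, which is why it pairs well with a hard cap on the total.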

Exporting Data

from llm_cost_guardian import CostTracker, to_json, to_csv, to_prometheus, save_json

tracker = CostTracker()  # or reuse the tracker from the examples above

# JSON string
print(to_json(tracker))

# CSV string
print(to_csv(tracker))

# Prometheus metrics
print(to_prometheus(tracker))

# Save to file
save_json(tracker, "usage_report.json")

CLI Usage

# List supported models and pricing
llm-cost-guardian models
llm-cost-guardian models --provider openai --json-output

# Estimate cost for a specific call
llm-cost-guardian estimate gpt-4o --input-tokens 10000 --output-tokens 5000

# View a saved report
llm-cost-guardian report usage_report.json

Prometheus Export

Expose a /metrics endpoint for your monitoring stack:

from flask import Flask, Response
from llm_cost_guardian import CostTracker, to_prometheus

app = Flask(__name__)
tracker = CostTracker()  # shared instance

@app.route("/metrics")
def metrics():
    # version=0.0.4 identifies the Prometheus text exposition format
    return Response(to_prometheus(tracker), content_type="text/plain; version=0.0.4")

Output format:

# HELP llm_cost_guardian_total_cost_usd Total cost in USD
# TYPE llm_cost_guardian_total_cost_usd gauge
llm_cost_guardian_total_cost_usd 0.01234500
# HELP llm_cost_guardian_cost_by_model_usd Cost per model in USD
# TYPE llm_cost_guardian_cost_by_model_usd gauge
llm_cost_guardian_cost_by_model_usd{model="gpt-4o"} 0.00750000
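
If you want to check a scrape by hand, the output above can be reproduced with a few lines of string formatting. This is a standalone sketch that mirrors the sample output, not the code behind `to_prometheus`; `render_metrics` is a hypothetical helper, and it does not escape special characters in label values.

```python
from typing import Mapping


def render_metrics(total_cost: float, cost_by_model: Mapping[str, float]) -> str:
    """Hypothetical helper: render costs in the Prometheus text format."""
    lines = [
        "# HELP llm_cost_guardian_total_cost_usd Total cost in USD",
        "# TYPE llm_cost_guardian_total_cost_usd gauge",
        f"llm_cost_guardian_total_cost_usd {total_cost:.8f}",
        "# HELP llm_cost_guardian_cost_by_model_usd Cost per model in USD",
        "# TYPE llm_cost_guardian_cost_by_model_usd gauge",
    ]
    # One sample per model, with the model name as a label.
    for model, cost in sorted(cost_by_model.items()):
        lines.append(
            f'llm_cost_guardian_cost_by_model_usd{{model="{model}"}} {cost:.8f}'
        )
    return "\n".join(lines) + "\n"


metrics_text = render_metrics(0.012345, {"gpt-4o": 0.0075})
print(metrics_text, end="")
```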

Configuration

LLM Cost Guardian supports YAML configuration files:

# llm_cost_guardian.yml
budget:
  hard_cap_usd: 10.00
  soft_warning_usd: 5.00
  sliding_window:
    limit_usd: 2.00
    window_seconds: 3600

export:
  format: json
  path: ./reports/usage.json

# Override or add custom model pricing
models:
  my-fine-tuned-model:
    provider: openai
    input_cost_per_1m: 5.00
    output_cost_per_1m: 15.00
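
As a rough illustration of what "override or add" could mean for the `models` section, the sketch below layers config entries over a built-in price table. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `BUILTIN_PRICING`, `config_models`, and `estimate` are hypothetical names, and the library's actual merge behavior may differ.

```python
# Hypothetical built-in table (USD per 1M tokens); values from Supported Models.
BUILTIN_PRICING = {
    "gpt-4o": {
        "provider": "openai",
        "input_cost_per_1m": 2.50,
        "output_cost_per_1m": 10.00,
    },
}

# What the `models` section above would look like once parsed into a dict.
config_models = {
    "my-fine-tuned-model": {
        "provider": "openai",
        "input_cost_per_1m": 5.00,
        "output_cost_per_1m": 15.00,
    },
}

# Later entries win, so config values override or extend the built-ins.
pricing = {**BUILTIN_PRICING, **config_models}


def estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call under the merged table."""
    entry = pricing[model]
    return (
        input_tokens * entry["input_cost_per_1m"]
        + output_tokens * entry["output_cost_per_1m"]
    ) / 1_000_000


print(f"${estimate('my-fine-tuned-model', 10_000, 5_000):.4f}")  # $0.1250
```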

Supported Models

Model                        Provider    Input / 1M tokens   Output / 1M tokens
gpt-4o                       OpenAI      $2.50               $10.00
gpt-4o-mini                  OpenAI      $0.15               $0.60
gpt-4-turbo                  OpenAI      $10.00              $30.00
gpt-4                        OpenAI      $30.00              $60.00
gpt-3.5-turbo                OpenAI      $0.50               $1.50
o1                           OpenAI      $15.00              $60.00
o1-mini                      OpenAI      $3.00               $12.00
o3-mini                      OpenAI      $1.10               $4.40
claude-opus-4-20250514       Anthropic   $15.00              $75.00
claude-sonnet-4-20250514     Anthropic   $3.00               $15.00
claude-3-5-sonnet-20241022   Anthropic   $3.00               $15.00
claude-3-5-haiku-20241022    Anthropic   $0.80               $4.00
claude-3-opus-20240229       Anthropic   $15.00              $75.00
claude-3-haiku-20240307      Anthropic   $0.25               $1.25
gemini-2.0-flash             Google      $0.10               $0.40
gemini-1.5-pro               Google      $1.25               $5.00
gemini-1.5-flash             Google      $0.075              $0.30

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.
