LLM Cost Guard

Python 3.9+ · License: MIT

Real-time cost tracking, budget enforcement, and usage analytics for LLM applications. Supports OpenAI, Anthropic, AWS Bedrock, and Google Vertex AI.

Features

  • Real-time Cost Tracking: Track costs as they happen, not when the bill arrives
  • Budget Enforcement: Set limits with configurable actions (warn, throttle, block)
  • Multi-Provider Support: OpenAI, Anthropic, AWS Bedrock, Google Vertex AI
  • LangChain Integration: Native callback support for LangChain applications
  • Rate Limiting: Control request rates per model, provider, or custom tags
  • Hierarchical Tracking: Group related LLM calls with spans
  • Flexible Storage: In-memory, SQLite, PostgreSQL, Redis, DynamoDB backends
  • Zero External Dependencies: Works offline with no external services required

Installation

pip install llm-cost-guard

With optional integrations:

# LangChain support
pip install llm-cost-guard[langchain]

# AWS Bedrock support
pip install llm-cost-guard[bedrock]

# All optional dependencies
pip install llm-cost-guard[all]

Quick Start

Basic Usage

import openai  # requires the `openai` package and an API key in the environment
from llm_cost_guard import CostTracker

tracker = CostTracker()

# Decorator-based tracking
@tracker.track
def my_llm_call():
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    return response

result = my_llm_call()

# Check costs
print(tracker.last_call().total_cost)  # e.g. 0.0015 (USD)
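
To make the decorator pattern concrete, here is a minimal stand-alone sketch of how a tracking decorator could derive a cost from a response's token usage. It does not use llm-cost-guard and is not its implementation: `MiniTracker`, `CallRecord`, `PRICES_PER_1K`, and the fake response are all hypothetical, and the rates are placeholders rather than real provider prices.

```python
from dataclasses import dataclass
from functools import wraps
from types import SimpleNamespace

# Placeholder per-1k-token rates in USD; not real provider prices.
PRICES_PER_1K = {"example-model": {"input": 0.01, "output": 0.03}}


@dataclass
class CallRecord:
    model: str
    input_tokens: int
    output_tokens: int
    total_cost: float


class MiniTracker:
    """Hypothetical tracker that records a cost for each decorated call."""

    def __init__(self):
        self.records = []

    def track(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            response = func(*args, **kwargs)
            usage = response.usage
            rates = PRICES_PER_1K[response.model]
            cost = (usage.prompt_tokens / 1000 * rates["input"]
                    + usage.completion_tokens / 1000 * rates["output"])
            self.records.append(CallRecord(
                response.model, usage.prompt_tokens,
                usage.completion_tokens, cost))
            return response
        return wrapper

    def last_call(self):
        return self.records[-1]


mini = MiniTracker()


@mini.track
def fake_llm_call():
    # Stand-in for an API response with OpenAI-style usage fields.
    return SimpleNamespace(
        model="example-model",
        usage=SimpleNamespace(prompt_tokens=1000, completion_tokens=500))


fake_llm_call()
print(f"{mini.last_call().total_cost:.4f}")
```

The decorated function returns the response unchanged, so adding tracking does not alter the caller's code path.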

With Budget Enforcement

from llm_cost_guard import CostTracker, Budget, BudgetAction

tracker = CostTracker(
    budgets=[
        Budget(
            name="daily",
            limit=10.00,
            period="day",
            action=BudgetAction.WARN
        ),
        Budget(
            name="monthly",
            limit=500.00,
            period="month",
            action=BudgetAction.BLOCK
        ),
    ]
)

# Get notified when approaching limits
@tracker.on_budget_warning
def handle_warning(budget, current):
    print(f"Warning: Budget '{budget.name}' at {current/budget.limit*100:.0f}%")

@tracker.on_budget_exceeded
def handle_exceeded(budget):
    print(f"Budget '{budget.name}' exceeded!")
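
The difference between the WARN and BLOCK actions can be sketched as a small stand-alone check. This is an illustration only: `Action`, `BudgetExceeded`, `check_budget`, and the 80% warning threshold (`warn_ratio`) are assumptions made for the example, not names or defaults taken from the library.

```python
from enum import Enum


class Action(Enum):
    WARN = "warn"
    BLOCK = "block"


class BudgetExceeded(Exception):
    """Raised when a blocking budget has been used up."""


def check_budget(spent, limit, action, warn_ratio=0.8):
    """Return "ok", "warning", or "exceeded"; raise if a BLOCK budget is exhausted."""
    if spent >= limit:
        if action is Action.BLOCK:
            raise BudgetExceeded(f"spent {spent:.2f} of {limit:.2f}")
        return "exceeded"
    if spent >= warn_ratio * limit:
        return "warning"
    return "ok"


print(check_budget(5.00, 10.00, Action.WARN))
print(check_budget(8.50, 10.00, Action.WARN))
```

A warning budget only reports its state, while a blocking budget raises so the caller cannot proceed with the request.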

Manual Recording

# For custom integrations
record = tracker.record(
    provider="openai",
    model="gpt-4o",
    input_tokens=1234,
    output_tokens=567,
    tags={"team": "search", "feature": "autocomplete"}
)

print(record.total_cost)  # e.g. 0.0208 (USD; depends on the pricing table in use)

Wrapped Clients

from llm_cost_guard import CostTracker
from llm_cost_guard.clients import TrackedOpenAI

tracker = CostTracker()
client = TrackedOpenAI(tracker=tracker)

# Automatic tracking - no decorators needed
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

LangChain Integration

from langchain_openai import ChatOpenAI  # requires the `langchain-openai` package
from llm_cost_guard import CostTracker
from llm_cost_guard.integrations.langchain import CostTrackingCallback

tracker = CostTracker()

llm = ChatOpenAI(
    model="gpt-4o",
    callbacks=[CostTrackingCallback(tracker)]
)

result = llm.invoke("Hello!")
print(tracker.last_call().total_cost)

Hierarchical Tracking (Spans)

# Track costs for complex operations like agents.
# `agent` and `query` stand in for your own agent object and its input.
with tracker.span("customer_support_agent", tags={"user_id": "123"}) as span:
    result = agent.invoke(query)

    print(span.total_cost)      # $0.45 (sum of all calls)
    print(span.call_count)      # 5
    print(span.models_used)     # ["gpt-4o", "gpt-3.5-turbo"]
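
As a rough picture of what a span aggregates, the following stand-alone sketch builds a context manager that sums the calls made inside it. It is not the library's implementation: `SpanSummary`, `span`, and `add_call` are hypothetical, and calls are added by hand here, whereas the example above suggests a real tracker attributes them automatically.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SpanSummary:
    name: str
    tags: dict
    costs: list = field(default_factory=list)
    models: list = field(default_factory=list)

    def add_call(self, model, cost):
        """Attribute one LLM call to this span."""
        self.costs.append(cost)
        if model not in self.models:
            self.models.append(model)

    @property
    def total_cost(self):
        return sum(self.costs)

    @property
    def call_count(self):
        return len(self.costs)

    @property
    def models_used(self):
        return list(self.models)


@contextmanager
def span(name, tags=None):
    # Yield a fresh summary that collects every call made inside the block.
    yield SpanSummary(name, tags or {})


with span("customer_support_agent", tags={"user_id": "123"}) as s:
    s.add_call("model-a", 0.25)
    s.add_call("model-b", 0.125)
    s.add_call("model-a", 0.075)

print(f"{s.total_cost:.2f}", s.call_count, s.models_used)
```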

Configuration

Storage Backends

# In-memory (default, development)
tracker = CostTracker(backend="memory")

# SQLite (single-machine persistence)
tracker = CostTracker(backend="sqlite:///costs.db")

# PostgreSQL (production)
tracker = CostTracker(backend="postgresql://user:pass@host/db")

# Redis (distributed, real-time)
tracker = CostTracker(backend="redis://localhost:6379/0")
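
The backend strings above follow the usual URL convention, where the scheme names the storage type. The sketch below shows how such a string can be taken apart with the standard library; `backend_kind` is a hypothetical helper for illustration and says nothing about how llm-cost-guard parses the value internally.

```python
from urllib.parse import urlparse


def backend_kind(spec):
    """Return the URL scheme of a backend spec, or the bare name if it has none."""
    scheme = urlparse(spec).scheme
    return scheme or spec


for spec in ("memory",
             "sqlite:///costs.db",
             "postgresql://user:pass@host/db",
             "redis://localhost:6379/0"):
    print(spec, "->", backend_kind(spec))

# Host and port are available from the parsed URL as well.
redis_url = urlparse("redis://localhost:6379/0")
print(redis_url.hostname, redis_url.port)
```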

Rate Limiting

from llm_cost_guard import CostTracker, RateLimit

tracker = CostTracker(
    rate_limits=[
        RateLimit(
            name="requests-per-minute",
            limit=100,
            period="minute",
            scope="global"
        ),
        RateLimit(
            name="user-requests",
            limit=10,
            period="minute",
            scope="tag:user_id"
        )
    ]
)
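
To illustrate what a per-scope limit means, here is a small stand-alone sliding-window limiter keyed by an arbitrary string such as a user ID. It is a sketch of one common technique, not the algorithm the library uses; `SlidingWindowLimiter` and its `allow` method are hypothetical names.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `period_seconds` window."""

    def __init__(self, limit, period_seconds):
        self.limit = limit
        self.period = period_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key="global", now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.period:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, period_seconds=60)
print(limiter.allow("user:1", now=0.0))   # first request for this user
print(limiter.allow("user:1", now=1.0))   # second request, still within the limit
print(limiter.allow("user:1", now=2.0))   # third request inside the same window
```

Passing `now` explicitly keeps the example deterministic; in normal use the limiter reads the monotonic clock itself.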

Fail-Safe Modes

tracker = CostTracker(
    # "block": refuse LLM calls if tracking fails (strict)
    on_tracking_failure="block",

    # "allow": let LLM calls proceed but log a warning (favors availability)
    # on_tracking_failure="allow",

    # "fallback": temporarily record to an in-memory store
    # on_tracking_failure="fallback",
)
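
The three modes trade strictness against availability. The stand-alone sketch below shows one way such a policy could behave when the storage layer raises; `record_with_policy`, `TrackingUnavailable`, and `failing_store` are hypothetical and only mirror the option names above.

```python
import logging

logger = logging.getLogger("cost_tracking_sketch")


class TrackingUnavailable(Exception):
    """Raised when tracking fails and the policy is "block"."""


def record_with_policy(store, fallback, record, policy):
    """Try to persist `record`; on failure apply `policy`.

    Returns True when the LLM call may proceed.
    """
    try:
        store(record)
        return True
    except Exception as exc:
        if policy == "block":
            raise TrackingUnavailable("tracking failed; call blocked") from exc
        if policy == "allow":
            logger.warning("tracking failed, allowing call: %s", exc)
            return True
        if policy == "fallback":
            fallback.append(record)  # keep the record in memory for later
            return True
        raise ValueError(f"unknown policy: {policy!r}")


def failing_store(record):
    # Simulates an unreachable storage backend.
    raise ConnectionError("backend unreachable")


buffer = []
print(record_with_policy(failing_store, buffer, {"cost": 0.01}, "fallback"))
print(buffer)
```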

CLI

# View current costs
llm-cost-guard status

# Generate report
llm-cost-guard report --period day --group-by model

# Check health
llm-cost-guard health

# List supported models and pricing
llm-cost-guard models --provider openai

# Export data
llm-cost-guard export --format csv --output costs.csv

Supported Providers

Provider          Models
OpenAI            GPT-4o, GPT-4, GPT-3.5, o1, Embeddings, DALL-E
Anthropic         Claude 3.5, Claude 3, Claude 2
AWS Bedrock       Claude, Titan, Llama, Mistral, Cohere
Google Vertex AI  Gemini 1.5, Gemini 1.0, PaLM 2

Reporting

# Daily summary
tracker.daily_report()

# Cost by model
tracker.report_by_model(period="week")

# Query with filters
report = tracker.get_costs(
    start_date="2024-01-01",
    end_date="2024-01-31",
    tags={"team": "search"},
    group_by=["model", "feature"]
)

# Export to DataFrame
df = tracker.to_dataframe()
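
Grouping by several fields, as `group_by=["model", "feature"]` does above, amounts to summing costs per unique combination of those fields. A minimal stand-alone sketch with made-up records follows; `group_costs` and the record layout are assumptions for illustration, not the library's return format.

```python
from collections import defaultdict


def group_costs(records, group_by):
    """Sum the "cost" of each record per unique tuple of the `group_by` fields."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec.get(name) for name in group_by)
        totals[key] += rec["cost"]
    return dict(totals)


# Made-up sample data.
records = [
    {"model": "model-a", "feature": "autocomplete", "cost": 0.5},
    {"model": "model-a", "feature": "autocomplete", "cost": 0.25},
    {"model": "model-b", "feature": "ranking", "cost": 1.0},
]

summary = group_costs(records, ["model", "feature"])
for key, total in summary.items():
    print(key, total)
```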

Security

  • No API key logging: Keys are never stored, logged, or transmitted
  • No prompt storage by default: Only metadata (tokens, cost) stored
  • PII redaction: Optional redaction for user IDs
  • Encryption support: For SQL/Redis backends

tracker = CostTracker(
    store_prompts=False,          # Default: never store prompts
    redact_user_ids=True,         # Hash user IDs in storage
)
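
Hashing user IDs before storage can be done with a keyed hash so that the same user always maps to the same opaque value without the raw ID being stored. The sketch below uses HMAC-SHA256 from the standard library; `redact_user_id` and the secret are hypothetical, and the source does not state which hashing scheme `redact_user_ids=True` applies.

```python
import hashlib
import hmac


def redact_user_id(user_id, secret):
    """Return a stable, opaque hex digest for `user_id` using HMAC-SHA256."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"example-secret-change-me"  # placeholder; load from secure config in practice
token = redact_user_id("123", secret)
print(len(token))
print(token == redact_user_id("123", secret))
```

A keyed hash is used here instead of a plain SHA-256 because short, guessable IDs are otherwise easy to reverse by enumeration.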

Custom Pricing

For negotiated enterprise rates:

tracker = CostTracker(
    pricing_overrides={
        "openai/gpt-4": {
            "input_cost_per_1k": 0.02,    # Your negotiated rate
            "output_cost_per_1k": 0.04,
        }
    }
)
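
As a worked example of per-1k pricing, the sketch below applies the override rates shown above to a call with 1,234 input tokens and 567 output tokens. It assumes cost scales linearly with token count, which the key names suggest but the source does not spell out; `call_cost` is a hypothetical helper.

```python
def call_cost(input_tokens, output_tokens, input_cost_per_1k, output_cost_per_1k):
    """Linear per-1k-token pricing: tokens / 1000 * rate, summed over both directions."""
    return ((input_tokens / 1000) * input_cost_per_1k
            + (output_tokens / 1000) * output_cost_per_1k)


# 1.234 * 0.02 = 0.02468 and 0.567 * 0.04 = 0.02268
cost = call_cost(1234, 567, input_cost_per_1k=0.02, output_cost_per_1k=0.04)
print(f"{cost:.5f}")  # 0.04736
```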

Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests.

License

MIT License - see LICENSE for details.

Author

Prashant Dudami - AI/ML Architect & LLM Infrastructure Expert
