LLM Token Optimization and Cost Management for AI Product Managers and Developers

🎯 Token Calculator

Production-Ready LLM Cost Management and Observability for AI Product Managers

Token Calculator is a comprehensive toolkit for building, monitoring, and optimizing production AI agents. Track costs across multi-agent workflows, detect context rot before it causes hallucinations, and make data-driven decisions about model selection, all with enterprise-grade observability.

🎯 Built for AI Product Managers

If you're building AI agents in production, you know the challenges:

  • 💸 Cost Blindness: You don't see costs until the monthly bill arrives
  • 🤖 Multi-Agent Complexity: Hard to track which agent in your workflow costs what
  • 🔥 Context Rot: Conversations degrade over time, causing hallucinations
  • 📊 No Visibility: Can't debug token usage through complex agent workflows
  • 🎲 Model Selection: Guessing which model offers the best cost/quality trade-off
  • ⚠️ Production Incidents: Context overflows break your app at 2 AM

Token Calculator solves all of these problems.

✨ Key Features for Production AI

📊 Cost Tracking with Multi-Dimensional Analysis

Track every LLM call with custom labels, query costs by any dimension, and identify cost anomalies before they become incidents.

from token_calculator import CostTracker, create_storage

# Track with custom dimensions
tracker = CostTracker(
    storage=create_storage("sqlite", db_path="costs.db"),
    default_labels={"environment": "production", "team": "ai"}
)

tracker.track_call(
    model="gpt-4",
    input_tokens=1000,
    output_tokens=500,
    agent_id="customer-support",
    user_id="user-123",
    session_id="session-456"
)

# Query costs by any dimension
report = tracker.get_costs(
    start_date="this-month",
    group_by=["agent_id", "model"],
    filters={"environment": "production"}
)
print(report)
# Output:
# Cost Report (1,234 calls)
#   Total Cost: $456.78
#   Breakdown:
#     customer-support | gpt-4: $234.56
#     rag-agent | gpt-4o: $123.45

🤖 Multi-Agent Workflow Tracking

Track token usage across complex agent orchestrations, identify bottlenecks, and optimize inter-agent communication.

from token_calculator import WorkflowTracker

tracker = WorkflowTracker(workflow_id="customer-support-v2")

# Track each agent in your workflow
with tracker.track_agent("router", model="gpt-4o-mini") as ctx:
    result = router.run(query)
    ctx.track_call(input_tokens=150, output_tokens=20)

with tracker.track_agent("executor", model="gpt-4") as ctx:
    final = executor.run(result)
    ctx.track_call(input_tokens=800, output_tokens=300)

# Analyze workflow
analysis = tracker.analyze()
print(analysis)
# Output:
# Workflow Analysis: customer-support-v2
#   Total Cost: $0.0520
#   Bottleneck: executor ($0.0450)
#   Efficiency: 75/100
#   Recommendations:
#     • executor accounts for >50% of cost

๐Ÿฅ Context Health Monitoring

Detect context rot, prevent hallucinations, and intelligently compress conversations before quality degrades.

from token_calculator import ConversationMonitor

monitor = ConversationMonitor(model="gpt-4", agent_id="support-agent")

for user_msg, assistant_msg in conversation:
    monitor.add_turn(user_msg, assistant_msg)

    health = monitor.check_health()

    if health.status == "context_rot":
        # Compress before quality degrades
        compressed = monitor.compress_context(
            strategy="semantic",
            target_tokens=4000,
            keep_recent=3
        )
        # Reset conversation with compressed context

print(health)
# Output:
# โš ๏ธ Context Health: CONTEXT_ROT
#   Quality Score: 65/100
#   Context Usage: 78.5%
#   Rot: 45.0%
#   Warnings:
#     ⚠️ 45% of context appears irrelevant
#   Recommendations:
#     💡 Use compress_context() to remove irrelevant context

📈 Cost Forecasting & Budgeting

Forecast future costs, set budgets, and get alerted before you overspend.

from token_calculator import CostForecaster, BudgetTracker

forecaster = CostForecaster(storage=tracker.storage)

# Forecast next month
forecast = forecaster.forecast_monthly(agent_id="rag-agent")
print(forecast)
# Output:
# 📈 Monthly Forecast:
#   Predicted: $1,234.56
#   Range: $987.65 - $1,481.47
#   Trend: increasing

# Set budget and track
budget = BudgetTracker(storage=tracker.storage)
budget.set_budget(amount=10000, period="monthly")

status = budget.get_status()
if not status.on_track:
    print(f"⚠️ Projected overage: ${status.projected_overage:.2f}")

🚨 Real-Time Alerting

Get notified immediately when costs spike, contexts overflow, or budgets are exceeded.

from token_calculator import AlertManager, AlertRule

alerts = AlertManager(webhook_url="https://hooks.slack.com/...")

# Cost spike alert
alerts.add_rule(AlertRule(
    name="cost-spike",
    condition=lambda e: e.cost > 1.0,
    severity="warning",
    message_template="High cost call: ${cost:.2f} for {agent_id}",
    channels=["console", "webhook"]
))

# Budget alert
alerts.add_budget_alert(
    budget_amount=10000,
    threshold_pct=0.8,  # Alert at 80%
    severity="warning"
)

# Alerts trigger automatically
triggered = alerts.check_event(event)

🎯 Model Recommendation Engine

Stop guessing which model to use. Get data-driven recommendations based on your usage patterns.

from token_calculator import ModelSelector

selector = ModelSelector(storage=tracker.storage)

# Get recommendation
rec = selector.recommend(
    current_model="gpt-4",
    requirements={"max_cost_per_1k": 0.01},
    usage_context="simple_qa"
)

print(rec)
# Output:
# 💡 Model Recommendation: gpt-4o-mini
#    Current: gpt-4
#    Monthly Savings: $450.00
#    Quality Impact: -10%
#    Confidence: 85%
#    Reasoning: gpt-4o-mini costs <50% of gpt-4. Fast, cost-effective for simple Q&A

# A/B test the recommendation
test = selector.create_ab_test(
    name="gpt4-vs-gpt4o",
    model_a="gpt-4",
    model_b="gpt-4o",
    traffic_split=0.1,
    duration_days=7
)

# After 7 days...
results = selector.get_test_results(test)
print(results.recommendation)

🔌 One-Line LangChain Integration

Already using LangChain? Add tracking with one line of code.

from langchain_openai import ChatOpenAI
from token_calculator import CostTracker, create_storage
from token_calculator.integrations.langchain import TokenCalculatorCallback

tracker = CostTracker(storage=create_storage("sqlite", db_path="costs.db"))

callback = TokenCalculatorCallback(
    tracker=tracker,
    agent_id="my-agent",
    environment="production"
)

# Just add callbacks parameter!
llm = ChatOpenAI(callbacks=[callback])

# All LLM calls are now tracked automatically
result = llm.invoke("Hello!")

# Check costs
report = tracker.get_costs(start_date="today")

📦 Installation

pip install token-calculator

Optional dependencies:

# For LangChain integration
pip install token-calculator[langchain]

# For PostgreSQL storage
pip install token-calculator[postgres]

# All optional dependencies
pip install token-calculator[all]

🚀 Quick Start

1. Basic Cost Tracking

from token_calculator import CostTracker, create_storage

tracker = CostTracker(
    storage=create_storage("sqlite", db_path="costs.db")
)

# Track LLM calls
tracker.track_call(
    model="gpt-4",
    input_tokens=1000,
    output_tokens=500,
    agent_id="my-agent"
)

# Get costs
report = tracker.get_costs(start_date="this-month")
print(f"Total cost: ${report.total_cost:.2f}")

2. Multi-Agent Workflow

from token_calculator import WorkflowTracker

tracker = WorkflowTracker(workflow_id="my-workflow")

with tracker.track_agent("planner", model="gpt-4o") as ctx:
    # Your agent code
    ctx.track_call(input_tokens=500, output_tokens=100)

with tracker.track_agent("executor", model="gpt-4") as ctx:
    # Your agent code
    ctx.track_call(input_tokens=1000, output_tokens=300)

analysis = tracker.analyze()
print(f"Total cost: ${analysis.total_cost:.4f}")

3. Context Health Monitoring

from token_calculator import ConversationMonitor

monitor = ConversationMonitor(model="gpt-4")

monitor.add_turn(
    user_message="What's the weather?",
    assistant_message="I don't have real-time weather data."
)

health = monitor.check_health()
if health.status != "healthy":
    print(health.recommendations)

📚 Complete Examples

AI Product Manager Daily Workflow

See examples/ai_pm_daily_workflow.py for a complete example showing:

  • ✅ Morning cost review and anomaly detection
  • ✅ Budget tracking and forecasting
  • ✅ Multi-agent workflow tracking
  • ✅ Context health monitoring
  • ✅ Setting up alerts
  • ✅ Model selection and A/B testing
  • ✅ Incident investigation
  • ✅ Weekly executive reporting

LangChain Integration

See examples/langchain_integration.py for:

  • ✅ Basic LangChain integration
  • ✅ Chain tracking
  • ✅ Multi-agent RAG systems
  • ✅ Production monitoring
  • ✅ Model optimization

๐Ÿ—๏ธ Architecture

Token Calculator uses a modular architecture:

Application Layer (Your Code)
    ↓
Tracking Layer (CostTracker, WorkflowTracker, ConversationMonitor)
    ↓
Intelligence Layer (Forecaster, ModelSelector, HealthCheck)
    ↓
Alert Layer (AlertManager, BudgetTracker)
    ↓
Storage Layer (SQLite, PostgreSQL, In-Memory)

Storage Backends

  • In-Memory: Fast, for testing/development
  • SQLite: Production-ready for single-machine deployments
  • PostgreSQL: Multi-instance production deployments

# SQLite
storage = create_storage("sqlite", db_path="costs.db")

# PostgreSQL
storage = create_storage(
    "postgresql",
    host="localhost",
    database="token_calculator",
    user="user",
    password="pass"
)

# In-Memory
storage = create_storage("memory")

📊 Supported Models

40+ models across 6 providers:

  • ✅ OpenAI: GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini, GPT-3.5 Turbo
  • ✅ Anthropic: Claude 4.5 Opus, Claude 3.5 Sonnet, Claude 3.5 Haiku
  • ✅ Google: Gemini Pro, Gemini 1.5 Pro, Gemini 1.5 Flash
  • ✅ Meta: Llama 2, Llama 3, Llama 3.1 (all sizes)
  • ✅ Mistral: Mistral 7B, 8x7B, Small, Medium, Large
  • ✅ Cohere: Command, Command R, Command R+
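As a back-of-the-envelope illustration of how a per-call cost is derived from token counts and per-1K-token prices (the model name and prices below are placeholders, not the library's actual pricing tables):

```python
# Illustrative only: cost = tokens / 1000 * price-per-1K, summed over
# input and output. These prices are placeholders, not real model pricing.
PRICES_PER_1K = {
    "example-model": {"input": 0.00015, "output": 0.0006},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call under the placeholder prices."""
    price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + \
           (output_tokens / 1000) * price["output"]

# 1,000 input + 500 output tokens under the placeholder prices:
print(f"{call_cost('example-model', 1000, 500):.5f}")  # -> 0.00045
```

Real per-model prices differ by orders of magnitude, which is why grouping costs by model (as in the cost reports above) matters.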

🎯 Use Cases

For AI Product Managers

  • 📊 Track costs across all agents and workflows
  • 🎯 Identify which agents/users drive costs
  • 📈 Forecast costs and plan budgets
  • 🚨 Get alerted before incidents
  • 💡 Optimize model selection for cost/quality
  • 📋 Generate executive reports

For AI Engineers

  • ๐Ÿ” Debug token usage in complex workflows
  • ๐Ÿฅ Monitor context health and prevent degradation
  • โšก Optimize prompts systematically
  • ๐Ÿงช A/B test different models
  • ๐Ÿ”Œ Integrate with existing LangChain apps

For AI Teams

  • 💰 Shared budget tracking
  • 📊 Cross-team cost visibility
  • 🎯 Standardized monitoring
  • 🚨 Centralized alerting
  • 📈 Trend analysis

🔧 Configuration

Environment Variables

# Storage
export TOKEN_CALC_STORAGE=sqlite
export TOKEN_CALC_STORAGE_PATH=/path/to/costs.db

# Alerts
export TOKEN_CALC_WEBHOOK_URL=https://hooks.slack.com/...

# Default labels
export TOKEN_CALC_DEFAULT_LABELS=environment:production,team:ai
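For illustration, the `TOKEN_CALC_DEFAULT_LABELS` format above (comma-separated `key:value` pairs) could be parsed like this; the parsing function is a sketch for clarity, not the library's own code:

```python
import os

def parse_default_labels(raw: str) -> dict:
    """Parse 'key:value,key:value' into a dict (illustrative sketch)."""
    labels = {}
    for pair in raw.split(","):
        if ":" in pair:
            key, value = pair.split(":", 1)
            labels[key.strip()] = value.strip()
    return labels

os.environ["TOKEN_CALC_DEFAULT_LABELS"] = "environment:production,team:ai"
print(parse_default_labels(os.environ["TOKEN_CALC_DEFAULT_LABELS"]))
# -> {'environment': 'production', 'team': 'ai'}
```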

Configuration File

# token_calculator.yaml
storage:
  backend: sqlite
  path: ./costs.db

tracking:
  default_labels:
    environment: production
    team: ai-platform

alerts:
  rules:
    - name: budget-exceeded
      type: budget
      threshold: 1.0
      severity: critical

budgets:
  - name: monthly-prod
    amount: 10000
    period: monthly

📖 Documentation

๐Ÿค Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE for details.

๐Ÿ™ Acknowledgments

Built for AI Product Managers building the future of AI agents.

📞 Support


Built with โค๏ธ for AI Product Managers

Stop guessing. Start measuring. Build better AI agents.
