LLM Token Optimization and Cost Management for AI Product Managers and Developers
🎯 Token Calculator
Production-Ready LLM Cost Management and Observability for AI Product Managers
Token Calculator is a comprehensive toolkit for building, monitoring, and optimizing production AI agents. Track costs across multi-agent workflows, detect context rot before it causes hallucinations, and make data-driven decisions about model selection, all with enterprise-grade observability.
🎯 Built for AI Product Managers
If you're building AI agents in production, you know the challenges:
- 💸 Cost Blindness: You don't see costs until the monthly bill arrives
- 🤖 Multi-Agent Complexity: Hard to track which agent in your workflow costs what
- 🔥 Context Rot: Conversations degrade over time, causing hallucinations
- 🔍 No Visibility: Can't debug token usage through complex agent workflows
- 🎲 Model Selection: Guessing which model offers the best cost/quality trade-off
- ⚠️ Production Incidents: Context overflows break your app at 2 AM
Token Calculator solves all of these problems.
✨ Key Features for Production AI
📊 Cost Tracking with Multi-Dimensional Analysis
Track every LLM call with custom labels, query costs by any dimension, and identify cost anomalies before they become incidents.
```python
from token_calculator import CostTracker, create_storage

# Track with custom dimensions
tracker = CostTracker(
    storage=create_storage("sqlite", db_path="costs.db"),
    default_labels={"environment": "production", "team": "ai"}
)

tracker.track_call(
    model="gpt-4",
    input_tokens=1000,
    output_tokens=500,
    agent_id="customer-support",
    user_id="user-123",
    session_id="session-456"
)

# Query costs by any dimension
report = tracker.get_costs(
    start_date="this-month",
    group_by=["agent_id", "model"],
    filters={"environment": "production"}
)

print(report)
# Output:
# Cost Report (1,234 calls)
# Total Cost: $456.78
# Breakdown:
#   customer-support | gpt-4:  $234.56
#   rag-agent        | gpt-4o: $123.45
```
🤖 Multi-Agent Workflow Tracking
Track token usage across complex agent orchestrations, identify bottlenecks, and optimize inter-agent communication.
```python
from token_calculator import WorkflowTracker

tracker = WorkflowTracker(workflow_id="customer-support-v2")

# Track each agent in your workflow
with tracker.track_agent("router", model="gpt-4o-mini") as ctx:
    result = router.run(query)
    ctx.track_call(input_tokens=150, output_tokens=20)

with tracker.track_agent("executor", model="gpt-4") as ctx:
    final = executor.run(result)
    ctx.track_call(input_tokens=800, output_tokens=300)

# Analyze workflow
analysis = tracker.analyze()
print(analysis)
# Output:
# Workflow Analysis: customer-support-v2
# Total Cost: $0.0520
# Bottleneck: executor ($0.0450)
# Efficiency: 75/100
# Recommendations:
#   • executor accounts for >50% of cost
```
🔥 Context Health Monitoring
Detect context rot, prevent hallucinations, and intelligently compress conversations before quality degrades.
```python
from token_calculator import ConversationMonitor

monitor = ConversationMonitor(model="gpt-4", agent_id="support-agent")

for user_msg, assistant_msg in conversation:
    monitor.add_turn(user_msg, assistant_msg)

    health = monitor.check_health()
    if health.status == "context_rot":
        # Compress before quality degrades
        compressed = monitor.compress_context(
            strategy="semantic",
            target_tokens=4000,
            keep_recent=3
        )
        # Reset conversation with compressed context

print(health)
# Output:
# ⚠️ Context Health: CONTEXT_ROT
# Quality Score: 65/100
# Context Usage: 78.5%
# Rot: 45.0%
# Warnings:
#   ⚠️ 45% of context appears irrelevant
# Recommendations:
#   💡 Use compress_context() to remove irrelevant context
```
📈 Cost Forecasting & Budgeting
Forecast future costs, set budgets, and get alerted before you overspend.
```python
from token_calculator import CostForecaster, BudgetTracker

forecaster = CostForecaster(storage=tracker.storage)

# Forecast next month
forecast = forecaster.forecast_monthly(agent_id="rag-agent")
print(forecast)
# Output:
# 📈 Monthly Forecast:
# Predicted: $1,234.56
# Range: $987.65 - $1,481.47
# Trend: increasing

# Set budget and track
budget = BudgetTracker(storage=tracker.storage)
budget.set_budget(amount=10000, period="monthly")

status = budget.get_status()
if not status.on_track:
    print(f"⚠️ Projected overage: ${status.projected_overage:.2f}")
```
🚨 Real-Time Alerting
Get notified immediately when costs spike, contexts overflow, or budgets are exceeded.
```python
from token_calculator import AlertManager, AlertRule

alerts = AlertManager(webhook_url="https://hooks.slack.com/...")

# Cost spike alert
alerts.add_rule(AlertRule(
    name="cost-spike",
    condition=lambda e: e.cost > 1.0,
    severity="warning",
    message_template="High cost call: ${cost:.2f} for {agent_id}",
    channels=["console", "webhook"]
))

# Budget alert
alerts.add_budget_alert(
    budget_amount=10000,
    threshold_pct=0.8,  # Alert at 80%
    severity="warning"
)

# Alerts trigger automatically
triggered = alerts.check_event(event)
```
🎯 Model Recommendation Engine
Stop guessing which model to use. Get data-driven recommendations based on your usage patterns.
```python
from token_calculator import ModelSelector

selector = ModelSelector(storage=tracker.storage)

# Get recommendation
rec = selector.recommend(
    current_model="gpt-4",
    requirements={"max_cost_per_1k": 0.01},
    usage_context="simple_qa"
)

print(rec)
# Output:
# 💡 Model Recommendation: gpt-4o-mini
# Current: gpt-4
# Monthly Savings: $450.00
# Quality Impact: -10%
# Confidence: 85%
# Reasoning: gpt-4o-mini costs <50% of gpt-4. Fast, cost-effective for simple Q&A

# A/B test the recommendation
test = selector.create_ab_test(
    name="gpt4-vs-gpt4o",
    model_a="gpt-4",
    model_b="gpt-4o",
    traffic_split=0.1,
    duration_days=7
)

# After 7 days...
results = selector.get_test_results(test)
print(results.recommendation)
```
🔗 One-Line LangChain Integration
Already using LangChain? Add tracking with one line of code.
```python
from langchain_openai import ChatOpenAI
from token_calculator import CostTracker, create_storage
from token_calculator.integrations.langchain import TokenCalculatorCallback

tracker = CostTracker(storage=create_storage("sqlite", db_path="costs.db"))
callback = TokenCalculatorCallback(
    tracker=tracker,
    agent_id="my-agent",
    environment="production"
)

# Just add the callbacks parameter!
llm = ChatOpenAI(callbacks=[callback])

# All LLM calls are now tracked automatically
result = llm.invoke("Hello!")

# Check costs
report = tracker.get_costs(start_date="today")
```
📦 Installation
```bash
pip install token-calculator
```
Optional dependencies:
```bash
# For LangChain integration
pip install token-calculator[langchain]

# For PostgreSQL storage
pip install token-calculator[postgres]

# All optional dependencies
pip install token-calculator[all]
```
🚀 Quick Start
1. Basic Cost Tracking
```python
from token_calculator import CostTracker, create_storage

tracker = CostTracker(
    storage=create_storage("sqlite", db_path="costs.db")
)

# Track LLM calls
tracker.track_call(
    model="gpt-4",
    input_tokens=1000,
    output_tokens=500,
    agent_id="my-agent"
)

# Get costs
report = tracker.get_costs(start_date="this-month")
print(f"Total cost: ${report.total_cost:.2f}")
```
2. Multi-Agent Workflow
```python
from token_calculator import WorkflowTracker

tracker = WorkflowTracker(workflow_id="my-workflow")

with tracker.track_agent("planner", model="gpt-4o") as ctx:
    # Your agent code
    ctx.track_call(input_tokens=500, output_tokens=100)

with tracker.track_agent("executor", model="gpt-4") as ctx:
    # Your agent code
    ctx.track_call(input_tokens=1000, output_tokens=300)

analysis = tracker.analyze()
print(f"Total cost: ${analysis.total_cost:.4f}")
```
3. Context Health Monitoring
```python
from token_calculator import ConversationMonitor

monitor = ConversationMonitor(model="gpt-4")

monitor.add_turn(
    user_message="What's the weather?",
    assistant_message="I don't have real-time weather data."
)

health = monitor.check_health()
if health.status != "healthy":
    print(health.recommendations)
```
📚 Complete Examples
AI Product Manager Daily Workflow
See examples/ai_pm_daily_workflow.py for a complete example showing:
- ✅ Morning cost review and anomaly detection
- ✅ Budget tracking and forecasting
- ✅ Multi-agent workflow tracking
- ✅ Context health monitoring
- ✅ Setting up alerts
- ✅ Model selection and A/B testing
- ✅ Incident investigation
- ✅ Weekly executive reporting
LangChain Integration
See examples/langchain_integration.py for:
- ✅ Basic LangChain integration
- ✅ Chain tracking
- ✅ Multi-agent RAG systems
- ✅ Production monitoring
- ✅ Model optimization
🏗️ Architecture
Token Calculator uses a modular architecture:
```
Application Layer (Your Code)
        ↓
Tracking Layer (CostTracker, WorkflowTracker, ConversationMonitor)
        ↓
Intelligence Layer (Forecaster, ModelSelector, HealthCheck)
        ↓
Alert Layer (AlertManager, BudgetTracker)
        ↓
Storage Layer (SQLite, PostgreSQL, In-Memory)
```
Storage Backends
- In-Memory: Fast, for testing/development
- SQLite: Production-ready for single-machine deployments
- PostgreSQL: Multi-instance production deployments
```python
# SQLite
storage = create_storage("sqlite", db_path="costs.db")

# PostgreSQL
storage = create_storage(
    "postgresql",
    host="localhost",
    database="token_calculator",
    user="user",
    password="pass"
)

# In-Memory
storage = create_storage("memory")
```
🌐 Supported Models
40+ models across 6 providers:
- ✅ OpenAI: GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini, GPT-3.5 Turbo
- ✅ Anthropic: Claude 4.5 Opus, Claude 3.5 Sonnet, Claude 3.5 Haiku
- ✅ Google: Gemini Pro, Gemini 1.5 Pro, Gemini 1.5 Flash
- ✅ Meta: Llama 2, Llama 3, Llama 3.1 (all sizes)
- ✅ Mistral: Mistral 7B, 8x7B, Small, Medium, Large
- ✅ Cohere: Command, Command R, Command R+
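Per-model pricing is what drives every cost figure above. As a rough sketch of how a per-call cost follows from per-million-token rates (the `EXAMPLE_RATES` table and `estimate_cost` helper below are illustrative only, not the library's API, and real provider rates change often):

```python
# Illustrative per-million-token rates in USD (input, output).
# Always check your provider's current pricing page.
EXAMPLE_RATES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one call's cost: tokens times the per-million-token rate."""
    rates = EXAMPLE_RATES[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

print(f"${estimate_cost('gpt-4o-mini', 1000, 500):.6f}")  # prints $0.000450
```

The same arithmetic is why the recommendation engine can quote savings: a 1,000-in / 500-out call costs roughly 17x more on gpt-4o than on gpt-4o-mini at these example rates.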
🎯 Use Cases
For AI Product Managers
- 📊 Track costs across all agents and workflows
- 🎯 Identify which agents/users drive costs
- 📈 Forecast costs and plan budgets
- 🚨 Get alerted before incidents
- 💡 Optimize model selection for cost/quality
- 📝 Generate executive reports
For AI Engineers
- 🔍 Debug token usage in complex workflows
- 🔥 Monitor context health and prevent degradation
- ⚡ Optimize prompts systematically
- 🧪 A/B test different models
- 🔗 Integrate with existing LangChain apps
For AI Teams
- 💰 Shared budget tracking
- 📊 Cross-team cost visibility
- 🎯 Standardized monitoring
- 🚨 Centralized alerting
- 📈 Trend analysis
🔧 Configuration
Environment Variables
```bash
# Storage
export TOKEN_CALC_STORAGE=sqlite
export TOKEN_CALC_STORAGE_PATH=/path/to/costs.db

# Alerts
export TOKEN_CALC_WEBHOOK_URL=https://hooks.slack.com/...

# Default labels
export TOKEN_CALC_DEFAULT_LABELS=environment:production,team:ai
```
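The TOKEN_CALC_DEFAULT_LABELS value packs labels into a single comma-separated key:value string. A minimal sketch of parsing that format into a labels dict (`parse_labels` is a hypothetical helper shown for illustration, not a documented function of the package):

```python
import os

def parse_labels(raw: str) -> dict[str, str]:
    """Parse 'key:value,key:value' into a dict, skipping malformed segments."""
    labels = {}
    for pair in raw.split(","):
        if ":" in pair:
            key, value = pair.split(":", 1)
            labels[key.strip()] = value.strip()
    return labels

raw = os.environ.get("TOKEN_CALC_DEFAULT_LABELS", "environment:production,team:ai")
print(parse_labels(raw))  # e.g. {'environment': 'production', 'team': 'ai'}
```

A dict in this shape matches the `default_labels` argument that CostTracker accepts in the examples above.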
Configuration File
```yaml
# token_calculator.yaml
storage:
  backend: sqlite
  path: ./costs.db

tracking:
  default_labels:
    environment: production
    team: ai-platform

alerts:
  rules:
    - name: budget-exceeded
      type: budget
      threshold: 1.0
      severity: critical

budgets:
  - name: monthly-prod
    amount: 10000
    period: monthly
```
📖 Documentation
- Product Requirements Document - Vision and requirements
- Architecture Design - Technical architecture
- Gap Analysis - Feature roadmap
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
Built for AI Product Managers building the future of AI agents.
📞 Support
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Email: Contact
Built with ❤️ for AI Product Managers
Stop guessing. Start measuring. Build better AI agents.
File details
Details for the file token_calculator-2.2.0.tar.gz.
File metadata
- Download URL: token_calculator-2.2.0.tar.gz
- Upload date:
- Size: 89.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea0e408af1089fca818cec4ac8429c234716d0043c6974f53e0c5e270ab085bd |
| MD5 | 92adda013fc8c3c80c800354f84942ef |
| BLAKE2b-256 | 01f5c3e9a3f581f707c1a76017a58b9987c9d258258a4d52dfa9a8ca81f1c30b |
File details
Details for the file token_calculator-2.2.0-py3-none-any.whl.
File metadata
- Download URL: token_calculator-2.2.0-py3-none-any.whl
- Upload date:
- Size: 70.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7cb463209616071c28cb8d072212484e9f3663e57d561147122641c9f5a7efa2 |
| MD5 | 0e77ba9a715af50bdd93956cc388ce02 |
| BLAKE2b-256 | b39c9fd8613110928e38d9a25e09c9f108d9156a32908e4847ce9a11e6781b10 |