token-copilot

Your AI copilot for LLM costs - Modern, plugin-based cost tracking for LangChain, LangGraph, and LlamaIndex

License: MIT | Python 3.8+


🚀 What is token-copilot?

A lightweight, production-ready library for tracking and optimizing LLM costs. Track costs in real-time, enforce budgets, analyze usage patterns, and export data for reporting - all with minimal configuration.

Key Benefits:

  • 🎯 Zero Config - Start tracking with 2 lines of code
  • 💰 Budget Control - Automatic budget enforcement and alerts
  • 📊 Multi-Tenant - Track costs by user, org, session, or any dimension
  • 🔌 Plugin-Based - Add features only when needed
  • 🌐 Framework Agnostic - Works with LangChain, LangGraph, and LlamaIndex
  • ☁️ Azure OpenAI - Full support with automatic cost calculation

Verified Working:

  • ✅ 12/12 core tests passing with Azure OpenAI
  • ✅ Production-tested with real workloads
  • ✅ 19+ supported LLM models (OpenAI, Anthropic, Ollama)

⚡ Quick Start

pip install token-copilot

from token_copilot import TokenCoPilot
from langchain_openai import ChatOpenAI

# Create copilot with budget
copilot = TokenCoPilot(budget_limit=10.00)
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[copilot])

# Use normally
result = llm.invoke("What is Python?")

# Get metrics
print(f"Cost: ${copilot.cost:.4f}")
print(f"Tokens: {copilot.tokens:,}")
print(f"Remaining: ${copilot.get_remaining_budget():.2f}")

That's it! You're now tracking costs and enforcing budgets.


📦 Installation

Basic (Core Features)

pip install token-copilot

With Analytics

# Quote the extras so shells such as zsh do not expand the brackets
pip install "token-copilot[analytics]"

With Streaming

pip install "token-copilot[streaming]"

All Features

pip install "token-copilot[all]"

✨ Features

Core Features

| Feature | Description |
|---|---|
| Cost Tracking | Automatic token and cost tracking for all LLM calls |
| Budget Enforcement | Hard stops at budget limits with configurable actions |
| Multi-Tenant | Track costs by user, organization, session, or custom dimensions |
| DataFrame Export | Export to pandas for advanced analytics and reporting |
| 19+ Models | Support for OpenAI, Anthropic, Azure OpenAI, and Ollama models |
| Real-time Stats | Get total cost, tokens, averages, and remaining budget instantly |

Optional Plugins

| Plugin | Description |
|---|---|
| Persistence | Save cost history to SQLite or JSON for long-term tracking |
| Analytics | Detect waste, anomalies, and efficiency issues automatically |
| Streaming | Stream events to Webhook, Kafka, Syslog, or OpenTelemetry |
| Routing | Intelligent model selection based on cost and quality |
| Adaptive | Auto-adjust parameters based on remaining budget |
| Forecasting | Predict when budget will be exhausted |

Framework Support

  • ✅ LangChain - Full support via callbacks
  • ✅ LangGraph - Works with graph-based workflows
  • ✅ LlamaIndex - Dedicated callback handler
  • ✅ Azure OpenAI - Automatic model detection and cost calculation

🎯 Usage Patterns

Choose the style that fits your needs:

1. Minimal (Simplest)

from token_copilot import TokenCoPilot
from langchain_openai import ChatOpenAI

copilot = TokenCoPilot(budget_limit=10.00)
llm = ChatOpenAI(callbacks=[copilot])
result = llm.invoke("Hello!")
print(f"Cost: ${copilot.cost:.4f}")

2. Builder (Fluent API)

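from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot
from token_copilot.plugins import SQLiteBackend

copilot = (TokenCoPilot(budget_limit=100.00)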
    .with_persistence(backend=SQLiteBackend("costs.db"))
    .with_analytics(detect_anomalies=True)
    .with_adaptive()
    .build()
)
llm = ChatOpenAI(callbacks=[copilot])

3. Factory Presets

from langchain_openai import ChatOpenAI
from token_copilot.presets import production

copilot = production(
    budget_limit=1000.00,
    webhook_url="https://monitoring.example.com",
    detect_anomalies=True
)
llm = ChatOpenAI(callbacks=[copilot])

4. Context Manager

from langchain_openai import ChatOpenAI
from token_copilot import track_costs

with track_costs(budget_limit=5.00) as copilot:
    llm = ChatOpenAI(callbacks=[copilot])
    result = llm.invoke("Hello!")
    print(f"Cost: ${copilot.cost:.4f}")
# Automatic summary on exit
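
The "summary on exit" behavior can be pictured with a small standard-library sketch. The `SimpleTracker` class and `tracked` function below are hypothetical names used only for illustration; they show the general context-manager technique under simple assumptions, not how token-copilot implements `track_costs`.

```python
from contextlib import contextmanager


class SimpleTracker:
    """Hypothetical tracker that accumulates cost for the demo."""

    def __init__(self, budget_limit):
        self.budget_limit = budget_limit
        self.cost = 0.0
        self.calls = 0

    def add(self, cost):
        """Record the cost of one call."""
        self.cost += cost
        self.calls += 1


@contextmanager
def tracked(budget_limit):
    """Yield a tracker and print a summary when the block exits."""
    tracker = SimpleTracker(budget_limit)
    try:
        yield tracker
    finally:
        # Runs on normal exit and also when the block raises.
        print(f"Summary: {tracker.calls} calls, ${tracker.cost:.4f} "
              f"of ${tracker.budget_limit:.2f}")


with tracked(budget_limit=5.00) as tracker:
    tracker.add(0.25)   # made-up per-call costs
    tracker.add(0.5)
```

Putting the summary in a `finally` clause means it is still printed if a call inside the block fails.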

5. Decorator

from langchain_openai import ChatOpenAI
from token_copilot.decorators import track_cost

@track_cost(budget_limit=5.00)
def process_text(text):
    llm = ChatOpenAI(callbacks=[process_text.copilot])
    return llm.invoke(f"Process: {text}")

result = process_text("my text")
print(f"Cost: ${process_text.copilot.cost:.4f}")

📖 Core Examples

Budget Enforcement

from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot, BudgetExceededError

copilot = TokenCoPilot(
    budget_limit=1.00,
    on_budget_exceeded="raise"  # Options: "raise", "warn", "ignore"
)

llm = ChatOpenAI(callbacks=[copilot])

try:
    result = llm.invoke("Expensive task...")
except BudgetExceededError as e:
    print(f"Budget exceeded: {e}")
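
To make the three `on_budget_exceeded` modes concrete, here is a minimal, self-contained sketch of how such a policy could behave. `BudgetGuard` and `BudgetExceeded` are hypothetical names invented for this illustration; the sketch is an assumption about the general idea, not token-copilot's implementation.

```python
import warnings


class BudgetExceeded(Exception):
    """Hypothetical stand-in for a budget error (not the library's class)."""


class BudgetGuard:
    """Illustrative budget policy with "raise", "warn", and "ignore" modes."""

    def __init__(self, limit, mode="raise"):
        if mode not in ("raise", "warn", "ignore"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.limit = limit
        self.mode = mode
        self.spent = 0.0

    def record(self, cost):
        """Add a call's cost, then apply the policy if the limit is passed."""
        self.spent += cost
        if self.spent > self.limit:
            message = f"spent ${self.spent:.2f} of ${self.limit:.2f}"
            if self.mode == "raise":
                raise BudgetExceeded(message)
            if self.mode == "warn":
                warnings.warn(message)
            # "ignore" falls through silently
        return self.limit - self.spent


guard = BudgetGuard(limit=1.00, mode="raise")
guard.record(0.60)          # within budget
try:
    guard.record(0.60)      # pushes the total past the limit
except BudgetExceeded as exc:
    print(f"Budget exceeded: {exc}")
```

In this sketch the cost is recorded before the check, so the call that crosses the limit is still counted.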

Multi-Tenant Tracking

from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot

copilot = TokenCoPilot(budget_limit=100.00)
llm = ChatOpenAI(callbacks=[copilot])

# Track per user
result = llm.invoke(
    "Hello",
    config={
        "metadata": {
            "user_id": "user_123",
            "org_id": "org_456",
            "session_id": "session_789"
        }
    }
)

# Get costs by dimension
user_costs = copilot.tracker.get_costs_by("user_id")
org_costs = copilot.tracker.get_costs_by("org_id")

print(f"User user_123: ${user_costs['user_123']:.4f}")
print(f"Org org_456: ${org_costs['org_456']:.4f}")
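
Grouping costs by a metadata dimension comes down to summing per key. The sketch below shows that idea with plain Python; the `calls` records and the `costs_by` helper are hypothetical and are not part of token-copilot's tracker.

```python
from collections import defaultdict

# Hypothetical call records: (cost in USD, metadata attached to the call).
calls = [
    (0.25, {"user_id": "user_123", "org_id": "org_456"}),
    (0.50, {"user_id": "user_123", "org_id": "org_456"}),
    (0.125, {"user_id": "user_999", "org_id": "org_456"}),
]


def costs_by(records, dimension):
    """Sum cost per distinct value of one metadata key."""
    totals = defaultdict(float)
    for cost, metadata in records:
        key = metadata.get(dimension)
        if key is not None:      # skip calls that lack this dimension
            totals[key] += cost
    return dict(totals)


user_costs = costs_by(calls, "user_id")
org_costs = costs_by(calls, "org_id")
print(user_costs)
print(org_costs)
```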

DataFrame Export & Analytics

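from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot

copilot = TokenCoPilot()
llm = ChatOpenAI(callbacks=[copilot])

# Make calls, tagging each with a user_id so it can be grouped later
for i in range(100):
    result = llm.invoke(
        f"Task {i}",
        config={"metadata": {"user_id": f"user_{i % 3}"}}
    )

# Export to DataFrame (requires pandas)
df = copilot.to_dataframe()

# Analyze
print(df.groupby('user_id')['cost'].sum())
print(df.groupby('model')['cost'].mean())

# Time series (resample assumes the DataFrame is indexed by timestamp)
hourly_costs = df.resample('H')['cost'].sum()

# Save reports
df.to_csv('llm_costs.csv')
df.to_excel('llm_costs.xlsx')  # needs an Excel writer such as openpyxl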

Statistics & Metrics

from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot

copilot = TokenCoPilot(budget_limit=50.00)
llm = ChatOpenAI(callbacks=[copilot])

# Make some calls...

# Get statistics
stats = copilot.get_stats()
print(f"Total Calls: {stats['total_calls']}")
print(f"Total Cost: ${stats['total_cost']:.4f}")
print(f"Total Tokens: {stats['total_tokens']:,}")
print(f"Avg Cost/Call: ${stats['avg_cost_per_call']:.4f}")
print(f"Avg Tokens/Call: {stats['avg_tokens_per_call']:.1f}")

# Check remaining budget
remaining = copilot.get_remaining_budget()
print(f"Remaining: ${remaining:.2f}")
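
The statistics above are simple aggregates over the tracked calls. As a rough sketch, assuming each call is stored as a `(tokens, cost)` pair, they could be derived as follows. The `summarize` helper is hypothetical; only the dictionary keys are taken from the example above.

```python
def summarize(calls, budget_limit=None):
    """Build a stats dict from (tokens, cost) pairs."""
    total_calls = len(calls)
    total_tokens = sum(tokens for tokens, _ in calls)
    total_cost = sum(cost for _, cost in calls)
    stats = {
        "total_calls": total_calls,
        "total_cost": total_cost,
        "total_tokens": total_tokens,
        # Guard against division by zero when nothing has been tracked yet.
        "avg_cost_per_call": total_cost / total_calls if total_calls else 0.0,
        "avg_tokens_per_call": total_tokens / total_calls if total_calls else 0.0,
    }
    if budget_limit is not None:
        stats["remaining"] = budget_limit - total_cost
    return stats


# Two made-up calls against a $50 budget.
stats = summarize([(1200, 0.5), (800, 0.25)], budget_limit=50.00)
print(f"Total Calls: {stats['total_calls']}")
print(f"Avg Cost/Call: ${stats['avg_cost_per_call']:.4f}")
print(f"Remaining: ${stats['remaining']:.2f}")
```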

🔌 Plugin Examples

Persistence (Save History)

from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot
from token_copilot.plugins import SQLiteBackend

# SQLite backend (production-ready)
backend = SQLiteBackend(db_path="costs.db")
copilot = (TokenCoPilot(budget_limit=100.00)
    .with_persistence(backend=backend, session_id="session_123")
)

llm = ChatOpenAI(callbacks=[copilot])
response = llm.invoke("Hello!")

# Query historical data
plugin = copilot._plugin_manager.get_plugins()[0]
summary = plugin.get_summary()
print(f"Total cost: ${summary['total_cost']:.2f}")
print(f"Total calls: {summary['total_calls']}")

# Get recent events
from datetime import datetime, timedelta
events = plugin.get_events(
    start_time=datetime.now() - timedelta(hours=24)
)
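
For readers curious what SQLite-backed cost history involves, the sketch below stores a few events with Python's built-in `sqlite3` module and runs the same two kinds of query (an overall summary and a last-24-hours filter). The `cost_events` table and its columns are assumptions made for this illustration and are not the schema used by `SQLiteBackend`.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory database for the demo; a file path would persist across runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cost_events (
           ts TEXT NOT NULL,          -- ISO-8601 timestamp
           session_id TEXT NOT NULL,
           model TEXT NOT NULL,
           tokens INTEGER NOT NULL,
           cost REAL NOT NULL
       )"""
)

now = datetime.now()
rows = [
    ((now - timedelta(hours=30)).isoformat(), "session_123", "gpt-4o-mini", 900, 0.5),
    ((now - timedelta(hours=2)).isoformat(), "session_123", "gpt-4o-mini", 400, 0.25),
    ((now - timedelta(hours=1)).isoformat(), "session_123", "gpt-4o-mini", 300, 0.125),
]
conn.executemany("INSERT INTO cost_events VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Summary over the full history.
total_cost, total_calls = conn.execute(
    "SELECT SUM(cost), COUNT(*) FROM cost_events"
).fetchone()

# Events from the last 24 hours (ISO-8601 strings sort chronologically).
cutoff = (now - timedelta(hours=24)).isoformat()
recent = conn.execute(
    "SELECT ts, cost FROM cost_events WHERE ts >= ? ORDER BY ts", (cutoff,)
).fetchall()

print(f"Total cost: ${total_cost:.2f}")
print(f"Total calls: {total_calls}")
print(f"Events in last 24h: {len(recent)}")
```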

Analytics (Detect Issues)

from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot

copilot = (TokenCoPilot(budget_limit=100.00)
    .with_analytics(detect_anomalies=True, track_waste=True)
)

llm = ChatOpenAI(callbacks=[copilot])

# Make calls...
for i in range(50):
    result = llm.invoke(f"Task {i}")

# Get analytics
from token_copilot.plugins.analytics import AnalyticsPlugin
analytics = copilot._plugin_manager.get_plugins(AnalyticsPlugin)[0]

# Check for anomalies
anomalies = analytics.get_anomalies(minutes=60)
for anomaly in anomalies:
    print(f"[{anomaly.severity}] {anomaly.message}")
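
The README does not say how anomalies are detected. One common approach, shown here purely as an illustration, is to flag calls whose cost sits several standard deviations above the mean (a z-score test). The `find_cost_anomalies` function and the threshold of 3.0 are assumptions, not the Analytics plugin's algorithm.

```python
from statistics import mean, pstdev


def find_cost_anomalies(costs, threshold=3.0):
    """Return (index, cost, z_score) for calls far above the mean cost."""
    if len(costs) < 2:
        return []
    mu = mean(costs)
    sigma = pstdev(costs)
    if sigma == 0:               # all calls cost the same: nothing stands out
        return []
    anomalies = []
    for index, cost in enumerate(costs):
        z = (cost - mu) / sigma
        if z > threshold:
            anomalies.append((index, cost, z))
    return anomalies


# Nineteen ordinary calls and one unusually expensive call (made-up numbers).
costs = [0.002] * 19 + [0.050]
for index, cost, z in find_cost_anomalies(costs):
    print(f"[high] call {index} cost ${cost:.3f} (z-score {z:.1f})")
```

A z-score test is easy to reason about but needs a reasonable number of samples; with very few calls a single outlier can never exceed the threshold.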

Adaptive (Budget-Based Adjustments)

from token_copilot import TokenCoPilot
from token_copilot.plugins.adaptive import AdaptivePlugin

copilot = (TokenCoPilot(budget_limit=100.00)
    .with_adaptive()
)

adaptive = copilot._plugin_manager.get_plugins(AdaptivePlugin)[0]

# Get current budget tier
tier_info = adaptive.get_tier_info()
print(f"Budget tier: {tier_info['tier_name']}")  # abundant, comfortable, constrained, critical
print(f"Remaining: ${tier_info['remaining']:.2f}")

# Operations automatically adjust based on tier
# - abundant: high quality, max tokens
# - comfortable: balanced
# - constrained: conservative
# - critical: minimal usage
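
The four tier names suggest a mapping from the remaining share of the budget to a tier. The sketch below shows one way such a mapping might look. The cut-offs (50%, 25%, 10%) and the `max_tokens` values are illustrative assumptions; the README does not state the thresholds the Adaptive plugin uses.

```python
def budget_tier(remaining, limit):
    """Map the remaining share of a budget to a tier name.

    The cut-offs (50%, 25%, 10%) are illustrative assumptions.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    fraction = max(remaining, 0.0) / limit
    if fraction > 0.50:
        return "abundant"
    if fraction > 0.25:
        return "comfortable"
    if fraction > 0.10:
        return "constrained"
    return "critical"


# Hypothetical per-tier settings a caller might apply to each request.
TIER_SETTINGS = {
    "abundant": {"max_tokens": 4096},
    "comfortable": {"max_tokens": 2048},
    "constrained": {"max_tokens": 1024},
    "critical": {"max_tokens": 256},
}

for remaining in (80.0, 40.0, 20.0, 5.0):
    tier = budget_tier(remaining, limit=100.0)
    print(f"${remaining:.2f} left -> {tier}, {TIER_SETTINGS[tier]}")
```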

🌐 Azure OpenAI Support

Full support with automatic cost calculation:

from token_copilot import TokenCoPilot
from langchain_openai import AzureChatOpenAI
import os

# Configure Azure OpenAI
llm = AzureChatOpenAI(
    azure_deployment="gpt-4o-mini",
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY")
)

# Use with token-copilot (costs tracked automatically)
copilot = TokenCoPilot(budget_limit=10.00)
response = llm.invoke("Hello!", config={"callbacks": [copilot]})

print(f"Cost: ${copilot.cost:.6f}")
print(f"Tokens: {copilot.tokens:,}")

Environment Setup (.env file):

AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4o-mini

Supported Models:

  • ✅ gpt-4o-mini (all versions)
  • ✅ gpt-4o (all versions)
  • ✅ gpt-4-turbo (all versions)
  • ✅ gpt-3.5-turbo (all versions)

🎯 LangGraph Support

Works seamlessly with LangGraph workflows:

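from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_openai import ChatOpenAI
from token_copilot import TokenCoPilot

copilot = TokenCoPilot(budget_limit=10.00)
llm = ChatOpenAI(model="gpt-4o-mini")

# Minimal single-node agent for illustration; swap in your own state and nodes
def agent_node(state: MessagesState):
    # Callbacks passed via config at invoke time are expected to reach this call
    return {"messages": [llm.invoke(state["messages"])]}

# Create graph
builder = StateGraph(MessagesState)
builder.add_node("agent", agent_node)
builder.add_edge(START, "agent")
builder.add_edge("agent", END)
graph = builder.compile()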

# Run with cost tracking
result = graph.invoke(
    {"messages": [("user", "Hello")]},
    config={"callbacks": [copilot]}
)

print(f"Total cost: ${copilot.cost:.4f}")
print(f"Total tokens: {copilot.tokens:,}")

📚 API Reference

TokenCoPilot Class

copilot = TokenCoPilot(
    budget_limit=100.00,           # Optional: Budget limit in USD
    budget_period="total",         # "total", "daily", "monthly", "per_user", "per_org"
    on_budget_exceeded="raise"     # "raise", "warn", "ignore"
)

Properties:

  • copilot.cost - Total cost in USD
  • copilot.tokens - Total tokens used
  • copilot.budget_limit - Current budget limit

Core Methods:

  • get_total_cost() - Get total cost
  • get_total_tokens() - Get total tokens
  • get_stats() - Get summary statistics (dict)
  • get_remaining_budget(metadata=None) - Get remaining budget
  • to_dataframe() - Export to pandas DataFrame

Builder Methods:

  • .with_persistence(backend, session_id) - Add persistence
  • .with_analytics(detect_anomalies=True) - Add analytics
  • .with_streaming(webhook_url=...) - Add streaming
  • .with_adaptive() - Add adaptive operations
  • .with_forecasting(forecast_hours=48) - Add forecasting
  • .build() - Finalize (optional)
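
Forecasting when a budget will run out can be as simple as extrapolating the recent spending rate. The sketch below uses a linear burn rate. The `hours_until_exhausted` helper and its input format are hypothetical, and this is not necessarily the method behind `.with_forecasting()`.

```python
def hours_until_exhausted(spend_log, budget_limit):
    """Estimate hours until the budget runs out from a linear burn rate.

    spend_log is a list of (elapsed_hours, cumulative_cost) samples in
    ascending time order. Returns None when no spending trend is visible.
    """
    if len(spend_log) < 2:
        return None
    (t0, c0), (t1, c1) = spend_log[0], spend_log[-1]
    if t1 <= t0 or c1 <= c0:
        return None                      # no elapsed time or no new spend
    burn_rate = (c1 - c0) / (t1 - t0)    # USD per hour
    remaining = budget_limit - c1
    if remaining <= 0:
        return 0.0                       # already exhausted
    return remaining / burn_rate


# Made-up samples: $10 spent over the first 4 hours of a $100 budget.
log = [(0.0, 0.0), (2.0, 4.0), (4.0, 10.0)]
eta = hours_until_exhausted(log, budget_limit=100.0)
print(f"Projected exhaustion in {eta:.1f} hours")
```

An estimate like this could then be compared against a horizon such as the 48 hours in `forecast_hours=48` to decide whether to raise an alert.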

Pricing Utilities

from token_copilot import get_model_config, calculate_cost, list_models

# Get model configuration
config = get_model_config("gpt-4o-mini")
print(config.input_cost_per_1m)   # Cost per 1M input tokens
print(config.output_cost_per_1m)  # Cost per 1M output tokens

# Calculate cost
cost = calculate_cost("gpt-4o-mini", input_tokens=1000, output_tokens=500)
print(f"Cost: ${cost:.6f}")

# List all supported models
models = list_models()  # Returns 19+ model IDs
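
Per-million-token pricing translates into a call cost with one line of arithmetic. The sketch below works through it with placeholder prices. `cost_from_tokens` is a hypothetical helper and the prices are assumptions chosen for easy arithmetic, not the real rates for any model.

```python
def cost_from_tokens(input_tokens, output_tokens,
                     input_cost_per_1m, output_cost_per_1m):
    """Convert token counts to USD using per-million-token prices."""
    return (input_tokens * input_cost_per_1m
            + output_tokens * output_cost_per_1m) / 1_000_000


# Placeholder prices in USD per 1M tokens (assumed for illustration only).
INPUT_PRICE = 0.50
OUTPUT_PRICE = 2.00

# 1000 input tokens and 500 output tokens:
# (1000 * 0.50 + 500 * 2.00) / 1,000,000 = 0.0015 USD
cost = cost_from_tokens(1000, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"Cost: ${cost:.6f}")
```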

Direct Tracker Usage (Without LangChain)

from token_copilot.tracking import MultiTenantTracker

tracker = MultiTenantTracker()
entry = tracker.track(
    model="gpt-4o-mini",
    input_tokens=100,
    output_tokens=50,
    metadata={"user_id": "user_123"}
)

print(f"Cost: ${entry.cost:.6f}")
print(f"Total: ${tracker.get_total_cost():.6f}")

🏭 Factory Presets

Pre-configured setups for common scenarios:

from token_copilot.presets import basic, development, production, enterprise

# Basic - Just cost tracking
copilot = basic(budget_limit=10.00)

# Development - With logging and anomaly detection
copilot = development(budget_limit=50.00, detect_anomalies=True)

# Production - Full monitoring with alerts
copilot = production(
    budget_limit=1000.00,
    webhook_url="https://monitoring.example.com",
    detect_anomalies=True,
    enable_forecasting=True
)

# Enterprise - All features enabled
copilot = enterprise(
    budget_limit=10000.00,
    kafka_brokers=["kafka:9092"],
    otlp_endpoint="http://collector:4318",
    enable_all=True
)

🔍 Real-World Example

Complete chatbot with cost tracking:

from token_copilot import TokenCoPilot, BudgetExceededError
from langchain_openai import ChatOpenAI

def chatbot():
    copilot = TokenCoPilot(budget_limit=5.00, on_budget_exceeded="warn")
    llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[copilot])

    print("Chatbot started! (type 'quit' to exit)")

    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == 'quit':
            break

        # Check budget
        if copilot.get_remaining_budget() <= 0:
            print("Budget exhausted!")
            break

        try:
            response = llm.invoke(user_input)
            print(f"Bot: {response.content}")
            print(f"Cost this turn: ${copilot.tracker.get_last_cost():.6f}")
        except BudgetExceededError:
            print("Budget limit reached!")
            break

    # Final stats
    stats = copilot.get_stats()
    print("\n📊 Session Summary:")
    print(f"  Total turns: {stats['total_calls']}")
    print(f"  Total cost: ${stats['total_cost']:.4f}")
    print(f"  Avg cost/turn: ${stats['avg_cost_per_call']:.4f}")

if __name__ == "__main__":
    chatbot()

❓ FAQ

Q: Does this work with streaming responses?
A: Costs are currently tracked after a response completes. Support for streamed responses is planned for v1.0.3. (The Streaming plugin is a separate feature: it sends cost events to external systems.)

Q: Can I use it without LangChain?
A: Yes! Use MultiTenantTracker directly (see API Reference above).

Q: How accurate is the cost tracking?
A: It uses official pricing from OpenAI and Anthropic, which is updated regularly, and is 100% accurate for supported models.

Q: Which usage pattern should I use?
A: Pick by use case:

  • Getting started: Minimal or Factory presets
  • Production: Builder or Production preset
  • Reusable code: Decorators or Context managers

Q: Can I create custom plugins?
A: Yes! Extend the Plugin base class:

from token_copilot import TokenCoPilot
from token_copilot.core import Plugin

class MyPlugin(Plugin):
    def on_cost_tracked(self, model, tokens, cost, metadata):
        print(f"Custom logic: ${cost:.6f}")

copilot = TokenCoPilot()
copilot.add_plugin(MyPlugin())

📚 Documentation

Full Documentation

📖 Complete Documentation (Single Page HTML) - Open in your browser for a comprehensive guide with all features, examples, and API reference.


🤝 Contributing

Contributions are welcome! We appreciate all contributions, from bug reports to new features.

How to Contribute

  1. Report Bugs - Open an issue with details
  2. Suggest Features - Share your ideas in discussions
  3. Submit PRs - Fork, create a branch, and submit a pull request
  4. Improve Docs - Help make documentation better
  5. Share Examples - Contribute real-world usage examples

Development Setup

# Clone the repository
git clone https://github.com/scionoftech/token-copilot.git
cd token-copilot

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
python test_core.py

Guidelines

  • Write clear commit messages
  • Add tests for new features
  • Update documentation as needed
  • Follow existing code style
  • Be respectful and constructive

All contributions, big or small, are appreciated!

📝 License

MIT License - see LICENSE file.


🙏 Acknowledgments

Built with ❤️ by Sai Kumar Yava

If you find this useful, please ⭐ star the repo!


🚀 Quick Start Checklist

  • Install: pip install token-copilot
  • Import: from token_copilot import TokenCoPilot
  • Create: copilot = TokenCoPilot(budget_limit=10.00)
  • Use: llm = ChatOpenAI(callbacks=[copilot])
  • Track: print(f"Cost: ${copilot.cost:.4f}")
  • You're done! 🎉

Need help? Open an issue or check the documentation.

Download files

Source Distribution

token_copilot-1.0.2.tar.gz (134.9 kB)

Uploaded Source

Built Distribution

token_copilot-1.0.2-py3-none-any.whl (99.6 kB)

Uploaded Python 3

File details

Details for the file token_copilot-1.0.2.tar.gz.

File metadata

  • Download URL: token_copilot-1.0.2.tar.gz
  • Upload date:
  • Size: 134.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for token_copilot-1.0.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 989dcd6c22f142d009aff706144e7178d56f52217fb6749fe4b129b5a5249049 |
| MD5 | c5c87709268359aeeab7c1a0165184ab |
| BLAKE2b-256 | f349bc31e59ef09d0a0112b6344c78f8c2ccc2495af068cecbf87d97cc5fc297 |

File details

Details for the file token_copilot-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: token_copilot-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 99.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for token_copilot-1.0.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 680478b42be71d2ae94c975e94c73951d743a86b046e287fa1b793dbe6e5d3b6 |
| MD5 | 6698f614fd87df9bff6247ba2d37e1ce |
| BLAKE2b-256 | 94e359a12dd8381548ff8b090b9371069f3e757962400b2a4f527befb1200d6d |
