Track LLM costs across OpenAI, Anthropic, LangChain, and 2000+ models with zero code changes

AgentCost SDK

Zero-friction LLM cost tracking for OpenAI, Anthropic, and LangChain applications.

Installation

pip install agentcost

Or install from source:

cd agentcost-sdk
pip install -e .

Quick Start

from agentcost import track_costs

# 2 lines to add cost tracking!
track_costs.init(api_key="your_api_key", project_id="my-project")

# OpenAI — automatically tracked
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}])

# Anthropic — automatically tracked
from anthropic import Anthropic
client = Anthropic()
message = client.messages.create(model="claude-3-5-sonnet-20241022", max_tokens=100, messages=[{"role": "user", "content": "Hello!"}])

# LangChain — automatically tracked
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")
response = llm.invoke("Hello!")

Features

  • Zero Code Changes: Monkey-patches the OpenAI, Anthropic, and LangChain clients, so existing call sites work as-is after the one-time init() call
  • Automatic Tracking: Captures all create(), invoke(), ainvoke(), stream(), astream() calls
  • Accurate Tokens: Uses tiktoken for precise token counting
  • Real-Time Costs: Calculates costs using up-to-date model pricing
  • Batched Sending: Efficient network usage (size-based + time-based batching)
  • Rate Limiting: Built-in rate limiter to protect your backend
  • Local Mode: Test without a backend
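
The monkey-patching idea behind the first two bullets can be illustrated with a minimal sketch. This is not AgentCost's actual implementation: FakeClient, patch_create, and captured are hypothetical names, and the wrapper records only latency and success rather than tokens or cost.

```python
import functools
import time

captured = []  # hypothetical in-memory event sink


class FakeClient:
    """Stand-in for an LLM client; not a real SDK class."""

    def create(self, model, prompt):
        return {"model": model, "text": prompt.upper()}


def patch_create(cls):
    """Replace cls.create with a wrapper that records latency and outcome."""
    original = cls.create

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call yields one event.
            captured.append({
                "model": kwargs.get("model"),
                "latency_ms": (time.perf_counter() - start) * 1000,
                "success": error is None,
                "error": error,
            })

    cls.create = wrapper


patch_create(FakeClient)
result = FakeClient().create(model="demo-model", prompt="hello")
```

Because the class attribute itself is replaced, code that already calls create() needs no edits, which is the sense in which patching allows "zero code changes".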

Configuration

track_costs.init(
    # Required for cloud mode
    api_key="sk_...",
    project_id="my-project",

    # Optional settings
    base_url="https://api.agentcost.tech",   # Your backend URL
    batch_size=10,                          # Events before auto-flush
    flush_interval=5.0,                     # Seconds between flushes
    debug=True,                             # Enable debug logging
    default_agent_name="my-agent",          # Default agent tag
    local_mode=False,                       # Store locally (no backend)
    enabled=True,                           # Enable/disable tracking

    # Custom pricing (overrides defaults)
    custom_pricing={
        "my-custom-model": {"input": 0.001, "output": 0.002}
    },

    # Global metadata (attached to all events)
    global_metadata={
        "environment": "production",
        "version": "1.0.0"
    }
)
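
To make the interaction between batch_size and flush_interval concrete, here is a rough sketch of size-based plus time-based batching. The Batcher class is hypothetical and is not the SDK's batcher; in particular, it checks elapsed time only when an event is added, whereas a real implementation would more likely use a background timer.

```python
import time


class Batcher:
    """Hypothetical batcher: flushes when full or when the interval has elapsed."""

    def __init__(self, send, batch_size=10, flush_interval=5.0, clock=time.monotonic):
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock
        self._pending = []
        self._last_flush = clock()

    def add(self, event):
        self._pending.append(event)
        too_many = len(self._pending) >= self._batch_size
        too_old = self._clock() - self._last_flush >= self._flush_interval
        if too_many or too_old:
            self.flush()

    def flush(self):
        # Send whatever is pending as one batch, then restart the timer.
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()


sent = []
batcher = Batcher(sent.append, batch_size=3, flush_interval=60.0)
for i in range(7):
    batcher.add({"n": i})
batcher.flush()  # send the remainder, as a shutdown hook would
```

With batch_size=3 and seven events, two full batches are sent automatically and the final explicit flush sends the one leftover event.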

Agent Tagging

Tag LLM calls by agent for granular analytics:

# Option 1: Set default agent
track_costs.set_agent_name("router-agent")

# Option 2: Context manager (recommended)
with track_costs.agent("technical-agent"):
    llm.invoke("How do I fix this?")  # Tagged as "technical-agent"

with track_costs.agent("billing-agent"):
    llm.invoke("What's my balance?")  # Tagged as "billing-agent"
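
One plausible way to scope a tag to a with block is a context variable that is set on entry and restored on exit. The sketch below assumes that design; agent, record_call, and _current_agent are illustrative names and do not describe the SDK's internals.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical holder for the active agent tag.
_current_agent = contextvars.ContextVar("current_agent", default="default-agent")


@contextmanager
def agent(name):
    """Tag everything inside the with block, then restore the previous tag."""
    token = _current_agent.set(name)
    try:
        yield
    finally:
        _current_agent.reset(token)


def record_call(prompt):
    """Stand-in for a tracked LLM call: returns the event it would emit."""
    return {"agent_name": _current_agent.get(), "prompt": prompt}


with agent("technical-agent"):
    inner = record_call("How do I fix this?")
outer = record_call("Hello")
```

A context variable, unlike a plain global, keeps tags separate across threads and asyncio tasks, which matters when several agents run concurrently.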

Metadata

Attach custom metadata for filtering/grouping:

# Persistent metadata
track_costs.add_metadata("user_id", "user_123")
track_costs.add_metadata("tenant_id", "acme_corp")

# Temporary metadata (context manager)
with track_costs.metadata(conversation_id="conv_456", step="routing"):
    llm.invoke("Route this query")

Local Testing

Test without running a backend:

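from agentcost import track_costs
track_costs.init(local_mode=True, debug=True)

# Make LLM calls
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")
llm.invoke("Hello!")
llm.invoke("World!")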

# Retrieve captured events
events = track_costs.get_local_events()
for event in events:
    print(f"Model: {event['model']}")
    print(f"Tokens: {event['total_tokens']}")
    print(f"Cost: ${event['cost']:.6f}")
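
Captured events are plain dictionaries, so they can be aggregated with ordinary Python. The sketch below sums calls, tokens, and cost per model; the sample events are made up and stand in for the list returned by get_local_events(), and summarize is a hypothetical helper.

```python
from collections import defaultdict

# Sample events using the documented keys (values are invented for illustration).
events = [
    {"model": "gpt-4", "total_tokens": 230, "cost": 0.0093},
    {"model": "gpt-4", "total_tokens": 100, "cost": 0.0040},
    {"model": "gpt-4o", "total_tokens": 500, "cost": 0.0030},
]


def summarize(events):
    """Group events by model and total their calls, tokens, and cost."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
    for event in events:
        row = totals[event["model"]]
        row["calls"] += 1
        row["tokens"] += event["total_tokens"]
        row["cost"] += event["cost"]
    return dict(totals)


summary = summarize(events)
for model, row in sorted(summary.items()):
    print(f"{model}: {row['calls']} calls, {row['tokens']} tokens, ${row['cost']:.6f}")
```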

Streaming Support

Streaming calls are automatically tracked:

# Sync streaming
for chunk in llm.stream("Tell me a story"):
    print(chunk.content, end="")
# Event recorded after stream completes

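# Async streaming (async for must run inside a coroutine)
import asyncio

async def stream_story():
    async for chunk in llm.astream("Tell me a story"):
        print(chunk.content, end="")
    # Event recorded after stream completes
asyncio.run(stream_story())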
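
Recording the event only after the stream completes can be done by wrapping the stream in a generator that passes chunks through and emits one event when iteration ends. This is a sketch of that pattern under stated assumptions, not the SDK's code: track_stream, fake_stream, and recorded are hypothetical, and output size is measured in characters rather than tokens.

```python
import time

recorded = []  # hypothetical event sink


def track_stream(chunks, model):
    """Yield chunks unchanged, then record one event when the stream ends."""
    start = time.perf_counter()
    pieces = []
    try:
        for chunk in chunks:
            pieces.append(chunk)
            yield chunk
    finally:
        # Also runs if the consumer stops early and the generator is closed.
        recorded.append({
            "model": model,
            "streaming": True,
            "output_chars": sum(len(p) for p in pieces),
            "latency_ms": (time.perf_counter() - start) * 1000,
        })


def fake_stream():
    """Stand-in for an LLM stream of text chunks."""
    yield from ["Once ", "upon ", "a time"]


text = "".join(track_stream(fake_stream(), model="demo-model"))
```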

Event Structure

Each tracked event contains:

{
    "agent_name": "my-agent",
    "model": "gpt-4",
    "input_tokens": 150,
    "output_tokens": 80,
    "total_tokens": 230,
    "cost": 0.0093,            # USD, real-time calculated
    "latency_ms": 1234,        # Measured latency
    "timestamp": "2026-01-23T10:30:45.123Z",
    "success": True,
    "error": None,
    "streaming": False,
    "metadata": {"conversation_id": "conv_456"}
}
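
The sample cost above is consistent with rates expressed in USD per 1,000 tokens, which can be checked with a small calculation. Treat the details as assumptions: estimate_cost is a hypothetical helper, the input rate of 0.03 comes from the pricing example elsewhere in this README, and the output rate of 0.06 is an assumed value chosen for illustration.

```python
def estimate_cost(input_tokens, output_tokens, price, per_tokens=1000):
    """Cost in USD, assuming `price` holds USD rates per `per_tokens` tokens."""
    input_cost = input_tokens * price["input"] / per_tokens
    output_cost = output_tokens * price["output"] / per_tokens
    return input_cost + output_cost


# Assumed rates: input matches the README's pricing example; output is a guess.
gpt4_price = {"input": 0.03, "output": 0.06}

cost = estimate_cost(150, 80, gpt4_price)
print(f"${cost:.4f}")  # $0.0093
```

Here 150 input tokens contribute 0.0045 and 80 output tokens contribute 0.0048, which together give the 0.0093 shown in the event.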

Dynamic Pricing (Real-Time Updates)

The SDK automatically fetches the latest pricing from the backend. This means:

  • No code changes when model prices change
  • Pricing is cached for 24 hours, so the backend is not queried on every call
  • Falls back to built-in defaults if backend is unavailable
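
The caching and fallback behavior in the bullets above can be sketched as a small time-to-live cache. Everything here is illustrative: PricingCache, DEFAULT_PRICING, and flaky_fetch are hypothetical names, the default rates are placeholders, and the SDK's real cache may behave differently (for example in how it treats stale data).

```python
import time

DEFAULT_PRICING = {"gpt-4": {"input": 0.03, "output": 0.06}}  # placeholder defaults
CACHE_TTL_SECONDS = 24 * 60 * 60


class PricingCache:
    """Fetch pricing at most once per TTL; fall back to defaults on failure."""

    def __init__(self, fetch, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._pricing = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        fresh = self._fetched_at is not None and now - self._fetched_at < self._ttl
        if not fresh:
            try:
                self._pricing = self._fetch()
                self._fetched_at = now
            except Exception:
                # Backend unavailable: keep stale data if any, else use defaults.
                if self._pricing is None:
                    self._pricing = dict(DEFAULT_PRICING)
        return self._pricing


calls = []


def flaky_fetch():
    """Simulated backend that fails on the first request only."""
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("backend down")
    return {"gpt-4": {"input": 0.025, "output": 0.05}}


fake_now = [0.0]  # controllable clock for the demo
cache = PricingCache(flaky_fetch, ttl=10, clock=lambda: fake_now[0])
first = cache.get()   # fetch fails, defaults are returned
second = cache.get()  # fetch succeeds, result is cached
third = cache.get()   # still fresh, no fetch
fake_now[0] = 11.0
fourth = cache.get()  # TTL expired, fetch again
```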

How It Works

# SDK automatically fetches pricing from backend
track_costs.init(
    api_key="...",
    project_id="...",
    base_url="http://localhost:8000",  # If running locally
)

# Prices are fetched once and cached
# GET http://localhost:8000/v1/pricing → {"pricing": {"gpt-4": {"input": 0.03, ...}}}

Manually Update Pricing

from agentcost.cost_calculator import refresh_pricing, update_pricing

# Force refresh from backend
refresh_pricing()

# Or manually set pricing (doesn't require backend)
update_pricing({
    "my-custom-model": {"input": 0.001, "output": 0.002}
})

Backend Pricing API

# Get all pricing
curl http://localhost:8000/v1/pricing

# Get specific model
curl http://localhost:8000/v1/pricing/gpt-4

# Update pricing (admin)
curl -X POST http://localhost:8000/v1/pricing \
  -H "Content-Type: application/json" \
  -d '{"gpt-4": {"input": 0.025, "output": 0.05}}'
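
For callers who prefer Python over curl, the update request can be built with the standard library. This sketch only constructs the request and does not send it; build_pricing_update is a hypothetical helper, and any authentication the admin endpoint may require is not shown because the README does not describe it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same local backend as the curl examples


def build_pricing_update(pricing, base_url=BASE_URL):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(pricing).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/pricing",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pricing_update({"gpt-4": {"input": 0.025, "output": 0.05}})
# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```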

Supported Models (2000+)

The SDK supports 2000+ models across 45+ providers through dynamic pricing sync with the backend. Model pricing is refreshed automatically when the backend's price list changes.

Provider          Models
OpenAI            gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-3.5-turbo, o1, o1-mini
Anthropic         claude-3-opus/sonnet/haiku, claude-3.5-sonnet/haiku, claude-4-opus
Google            gemini-pro, gemini-1.5-pro/flash, gemini-2.0-flash
Groq              llama-3.1-8b/70b, llama-3.2-3b, llama-3.3-70b, mixtral-8x7b
DeepSeek          deepseek-chat, deepseek-coder, deepseek-reasoner
Cohere            command, command-light, command-r, command-r-plus
Mistral           mistral-small/medium/large
Together AI       llama-3-70b/8b-chat, meta-llama models
Replicate         Various open-source models
OpenRouter        Aggregated models from multiple providers
Perplexity        pplx models
xAI               Grok models
Amazon            Amazon Nova, Titan models
Azure             Azure OpenAI models
AWS               Bedrock models (Claude, Llama, Mistral)
Anyscale          Anyscale endpoints
Cerebras          Cerebras models
Cloudflare        Workers AI models
Databricks        DBRX, Meta Llama models
DeepInfra         Various hosted models
Fireworks         Fireworks AI models
Hyperbolic        Hyperbolic models
Jina AI           Embedding models
Lambda            Lambda models
MiniMax           MiniMax models
Moonshot          Moonshot models
Sambanova         Samba models
Voyage            Embedding models
IBM               watsonx models
AI21              AI21 Labs models
Aleph Alpha       Aleph Alpha models
Novita            Novita hosted models
Gradient AI       Gradient endpoints
Dashscope         Dashscope models (Alibaba)
Friendliai        Friendliai models
GMI               GMI models
Llamagate         Llamagate models
Morph             Morph models
NLP Cloud         NLP Cloud endpoints
Nscale            Nscale models
Oracle            OCI generative models
OVHCloud          OVHCloud models
Vercel            Vercel AI Gateway, v0 models
Weights & Biases  Wandb models
Zai               Zai models

Note: The full list of 2000+ models is dynamically loaded from the backend. Run track_costs.init() with a valid API key to access all supported models.

Statistics

stats = track_costs.get_stats()
print(f"Events sent: {stats['batcher']['events_sent']}")
print(f"Batches sent: {stats['batcher']['batches_sent']}")

Graceful Shutdown

track_costs.flush()     # Send pending events
track_costs.shutdown()  # Full shutdown

License

MIT License
