LLM Cost Tracker

Track LLM API costs per request. Know where your tokens go.

Zero dependencies. Pure Python. Works with any LLM provider.

pip install llm-cost-tracker

Quickstart

from llm_cost_tracker import CostTracker

tracker = CostTracker("./llm_costs.db")

# Track an OpenAI call
tracker.record(
    prompt_tokens=847,
    completion_tokens=234,
    model="gpt-4o-mini",
    provider="openai",
)

# Track an Anthropic call
tracker.record(
    prompt_tokens=1200,
    completion_tokens=890,
    model="claude-3-5-sonnet",
    provider="anthropic",
)

# Track a request you handled locally (no LLM call)
tracker.record(
    prompt_tokens=0,
    completion_tokens=0,
    model="gpt-4o-mini",
    provider="openai",
    route="local",
    prompt_text="where is the login function defined",
    intent="code_lookup",
)

# See where your money is going
report = tracker.report(window="7d")
print(f"Total cost: ${report['total_cost_usd']:.4f}")
print(f"Total tokens: {report['total_tokens']:,}")
print(f"Requests: {report['total_requests']}")
print(f"Local vs external: {report['local_count']} / {report['external_count']}")
print(f"Estimated savings: ${report['total_saved_full_modeled_usd']:.4f}")
print(f"Cost by model: {report['cost_by_model']}")

# The important part — how much are you wasting?
print("\n--- Waste Analysis ---")
print(f"Avoidable external requests: {report['avoidable_external_requests']}")
print(f"Money wasted on unnecessary LLM calls: ${report['avoidable_cost_usd']:.4f}")
print(f"Additional savings from model downgrades: ${report['potential_model_downgrade_savings_usd']:.4f}")
print(report['optimization_summary'])

What it tracks

Every call to tracker.record() stores:

  • Tokens used — prompt + completion, per request
  • Cost in USD — calculated from built-in pricing tables (40+ models)
  • Route — was this handled locally or sent to an LLM?
  • Counterfactual savings — if you handled it locally, how much did you save vs sending it to the LLM?
  • Model, provider, intent, session — slice your costs any way you want
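
The cost and counterfactual figures reduce to simple arithmetic over per-1M-token prices. The following is a minimal sketch, not the library's internals: it assumes the gpt-4o-mini prices from the pricing lookup example ($0.15 input, $0.60 output per 1M tokens), and the function names and the 4-characters-per-token estimate are illustrative assumptions.

```python
def cost_usd(prompt_tokens, completion_tokens, input_per_1m, output_per_1m):
    """Cost of one request, given prices in USD per 1M tokens."""
    return (prompt_tokens * input_per_1m + completion_tokens * output_per_1m) / 1_000_000


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // chars_per_token)


# Assumed gpt-4o-mini prices: $0.15 input / $0.60 output per 1M tokens.
INPUT_PRICE, OUTPUT_PRICE = 0.15, 0.60

# The first Quickstart request: 847 prompt tokens, 234 completion tokens.
external_cost = cost_usd(847, 234, INPUT_PRICE, OUTPUT_PRICE)

# Counterfactual for a locally handled request: what the prompt alone
# would have cost had it been sent to the model.
prompt = "where is the login function defined"
saved_prompt_only = cost_usd(estimate_tokens(prompt), 0, INPUT_PRICE, OUTPUT_PRICE)

print(f"External cost: ${external_cost:.8f}")
print(f"Saved (prompt only): ${saved_prompt_only:.8f}")
```

A fuller estimate would also model the completion the LLM would have produced, which is presumably the difference between the prompt-only and full-modeled savings fields.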

Reports

# Last 7 days, grouped by model
report = tracker.report(window="7d", group_by="model")

# Last 24 hours, specific session
report = tracker.report(window="1d", session_key="user-123")

# All time
report = tracker.report()

Report fields:

  • total_requests, total_cost_usd, total_tokens
  • total_prompt_tokens, total_completion_tokens
  • local_count, external_count
  • total_saved_prompt_only_usd, total_saved_full_modeled_usd
  • requests_by_route, cost_by_model, cost_by_provider
  • tokens_by_model, savings_by_intent
  • avoidable_external_requests — requests sent to LLM that didn't need one
  • avoidable_cost_usd — money wasted on those unnecessary calls
  • avoidable_percent — what % of your external calls were avoidable
  • potential_model_downgrade_savings_usd — savings from using cheaper models
  • optimization_summary — human-readable summary of waste found
  • breakdown (when group_by is specified)
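
One way to read these fields is as plain aggregates over the recorded rows. The sketch below works under stated assumptions: the record dicts with `model`, `route`, `cost_usd`, and an `avoidable` flag are hypothetical stand-ins for whatever the library stores, the route value `"external"` is assumed, and `avoidable_percent` is taken to be avoidable external requests divided by all external requests.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate hypothetical request records into report-style fields."""
    cost_by_model = defaultdict(float)
    external = [r for r in records if r["route"] != "local"]
    for r in external:
        cost_by_model[r["model"]] += r["cost_usd"]
    avoidable = [r for r in external if r["avoidable"]]
    return {
        "total_requests": len(records),
        "local_count": len(records) - len(external),
        "external_count": len(external),
        "cost_by_model": dict(cost_by_model),
        "avoidable_external_requests": len(avoidable),
        "avoidable_cost_usd": sum(r["cost_usd"] for r in avoidable),
        # Guard against division by zero when nothing went external.
        "avoidable_percent": 100.0 * len(avoidable) / len(external) if external else 0.0,
    }


records = [
    {"model": "gpt-4o-mini", "route": "external", "cost_usd": 0.002, "avoidable": True},
    {"model": "gpt-4o-mini", "route": "external", "cost_usd": 0.004, "avoidable": False},
    {"model": "claude-3-5-sonnet", "route": "external", "cost_usd": 0.016, "avoidable": False},
    {"model": "gpt-4o-mini", "route": "local", "cost_usd": 0.0, "avoidable": False},
]
summary = summarize(records)
print(summary["cost_by_model"])
print(f"{summary['avoidable_percent']:.1f}% of external calls were avoidable")
```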

Snapshots (for dashboards & cron jobs)

# Capture a daily snapshot
snapshot = tracker.capture_snapshot(window_hours=24, job_name="daily-cost-report")
print(f"Net savings: ${snapshot['net_savings_conservative_usd']:.4f}")

# View recent snapshots
for s in tracker.snapshots(limit=7):
    print(
        f"{s['job_name']}: saved ${s['saved_full_modeled_usd']:.4f}, "
        f"spent ${s['external_cost_usd']:.4f}"
    )

Built-in pricing (40+ models)

Pricing is built in for OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek models. Prices are in USD per 1M tokens and are matched automatically by model name.

from llm_cost_tracker import lookup_pricing

inp, out, source = lookup_pricing("gpt-4o-mini")
print(f"Input: ${inp}/1M tokens, Output: ${out}/1M tokens")
# Input: $0.15/1M tokens, Output: $0.6/1M tokens

Custom pricing:

from llm_cost_tracker.pricing import DEFAULT_PRICING
DEFAULT_PRICING["my-custom-model"] = (1.00, 3.00)  # input, output per 1M tokens
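
How names are matched is not spelled out here. One plausible approach, sketched as an assumption rather than the library's algorithm, is a longest-prefix match, so that a dated name such as `gpt-4o-mini-2024-07-18` resolves to the `gpt-4o-mini` entry. The table `PRICING` and the function `match_pricing` are hypothetical; only the gpt-4o-mini and custom-model prices come from the examples above.

```python
PRICING = {
    "gpt-4o": (2.50, 10.00),          # illustrative values, not taken from the library
    "gpt-4o-mini": (0.15, 0.60),      # matches the lookup_pricing example
    "my-custom-model": (1.00, 3.00),  # the custom entry from above
}


def match_pricing(model, table=PRICING):
    """Return (input, output, matched_key), or None for unknown models."""
    name = model.lower()
    candidates = [key for key in table if name.startswith(key)]
    if not candidates:
        return None
    best = max(candidates, key=len)  # longest prefix wins
    inp, out = table[best]
    return inp, out, best


print(match_pricing("gpt-4o-mini-2024-07-18"))
print(match_pricing("unknown-model"))
```

Picking the longest prefix matters because `gpt-4o` is itself a prefix of `gpt-4o-mini`; a first-match rule could silently price the mini model at the larger model's rate.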

Integration examples

With OpenAI

import openai
from llm_cost_tracker import CostTracker

client = openai.OpenAI()
tracker = CostTracker("./costs.db")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

tracker.record(
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
    model="gpt-4o-mini",
    provider="openai",
)

With Anthropic

import anthropic
from llm_cost_tracker import CostTracker

client = anthropic.Anthropic()
tracker = CostTracker("./costs.db")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

tracker.record(
    prompt_tokens=response.usage.input_tokens,
    completion_tokens=response.usage.output_tokens,
    model="claude-sonnet-4-20250514",
    provider="anthropic",
)

With LiteLLM

import litellm
from llm_cost_tracker import CostTracker

tracker = CostTracker("./costs.db")

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

tracker.record(
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
    model="gpt-4o-mini",
    provider="openai",
)
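
The three examples differ only in where the token counts live: `prompt_tokens` and `completion_tokens` for OpenAI and LiteLLM, `input_tokens` and `output_tokens` for Anthropic. A small helper can hide that difference. In this sketch `extract_usage` is a hypothetical name, and the `SimpleNamespace` objects stand in for real SDK responses so the code runs without network access.

```python
from types import SimpleNamespace


def extract_usage(response):
    """Return (prompt_tokens, completion_tokens) from either usage shape."""
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):  # OpenAI / LiteLLM field names
        return usage.prompt_tokens, usage.completion_tokens
    if hasattr(usage, "input_tokens"):  # Anthropic field names
        return usage.input_tokens, usage.output_tokens
    raise ValueError("response.usage has no recognized token fields")


# Stand-ins for SDK responses, using the token counts from the Quickstart.
openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=847, completion_tokens=234))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=1200, output_tokens=890))

print(extract_usage(openai_like))
print(extract_usage(anthropic_like))
```

The returned pair can be passed as the `prompt_tokens` and `completion_tokens` arguments of `tracker.record()`, alongside the model and provider.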

How it works

  • SQLite database — all data stored locally in a single file. No external services.
  • Zero dependencies — pure Python stdlib. No numpy, no pandas, no requests.
  • WAL mode — concurrent reads while writing. Safe for multi-threaded apps.
  • Built-in pricing — 40+ models with auto-matching. Falls back gracefully for unknown models.
  • Counterfactual tracking — when you handle a request locally, it estimates what the LLM call would have cost, so you can see real savings.
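
The storage approach can be sketched with the standard library alone. The table name and columns below are assumptions for illustration, not the library's actual schema; only the `sqlite3` module and the `journal_mode=WAL` pragma are standard.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; in-memory databases do not use it.
db_path = os.path.join(tempfile.mkdtemp(), "llm_costs_demo.db")
conn = sqlite3.connect(db_path)

# Switch to write-ahead logging so readers are not blocked by a writer.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute(
    """CREATE TABLE IF NOT EXISTS requests (
           id INTEGER PRIMARY KEY,
           model TEXT NOT NULL,
           prompt_tokens INTEGER NOT NULL,
           completion_tokens INTEGER NOT NULL,
           cost_usd REAL NOT NULL
       )"""
)
rows = [
    ("gpt-4o-mini", 847, 234, 0.00026745),
    ("gpt-4o-mini", 100, 50, 0.000045),
]
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO requests (model, prompt_tokens, completion_tokens, cost_usd) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )

total_tokens, total_cost = conn.execute(
    "SELECT SUM(prompt_tokens + completion_tokens), SUM(cost_usd) FROM requests"
).fetchone()
print(mode, total_tokens, f"{total_cost:.8f}")
conn.close()
```

Note that a `sqlite3` connection is bound to the thread that created it by default, so in a multi-threaded app each thread would typically open its own connection to the same file.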

License

MIT
