Project description

LLM Cost Tracker

Track LLM API costs per request. Know where your tokens go.

Zero dependencies. Pure Python. Works with any LLM provider.

pip install llm-cost-tracker

Quickstart

from llm_cost_tracker import CostTracker

tracker = CostTracker("./llm_costs.db")

# Track an OpenAI call
tracker.record(
    prompt_tokens=847,
    completion_tokens=234,
    model="gpt-4o-mini",
    provider="openai",
)

# Track an Anthropic call
tracker.record(
    prompt_tokens=1200,
    completion_tokens=890,
    model="claude-3-5-sonnet",
    provider="anthropic",
)

# Track a request you handled locally (no LLM call)
tracker.record(
    prompt_tokens=0,
    completion_tokens=0,
    model="gpt-4o-mini",
    provider="openai",
    route="local",
    prompt_text="where is the login function defined",
    intent="code_lookup",
)

# See where your money is going
report = tracker.report(window="7d")
print(f"Total cost: ${report['total_cost_usd']:.4f}")
print(f"Total tokens: {report['total_tokens']:,}")
print(f"Requests: {report['total_requests']}")
print(f"Local vs external: {report['local_count']} / {report['external_count']}")
print(f"Estimated savings: ${report['total_saved_full_modeled_usd']:.4f}")
print(f"Cost by model: {report['cost_by_model']}")

# The important part — how much are you wasting?
print("\n--- Waste Analysis ---")
print(f"Avoidable external requests: {report['avoidable_external_requests']}")
print(f"Money wasted on unnecessary LLM calls: ${report['avoidable_cost_usd']:.4f}")
print(f"Additional savings from model downgrades: ${report['potential_model_downgrade_savings_usd']:.4f}")
print(report['optimization_summary'])

What it tracks

Every call to tracker.record() stores:

  • Tokens used — prompt + completion, per request
  • Cost in USD — calculated from built-in pricing tables (40+ models)
  • Route — was this handled locally or sent to an LLM?
  • Counterfactual savings — if you handled it locally, how much did you save vs sending it to the LLM?
  • Model, provider, intent, session — slice your costs any way you want
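The USD figure is simple arithmetic over the pricing table: tokens times the per-million rate, summed per direction. A minimal sketch of that calculation (the rates here are the built-in gpt-4o-mini prices shown further down; this is the arithmetic, not the library's internals):

```python
# Cost = tokens * (USD per 1M tokens) / 1_000_000, computed per direction.
INPUT_RATE = 0.15   # USD per 1M prompt tokens (gpt-4o-mini)
OUTPUT_RATE = 0.60  # USD per 1M completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE) / 1_000_000

print(f"${request_cost(847, 234):.6f}")  # the first quickstart call → $0.000267
```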

Reports

# Last 7 days, grouped by model
report = tracker.report(window="7d", group_by="model")

# Last 24 hours, specific session
report = tracker.report(window="1d", session_key="user-123")

# All time
report = tracker.report()

Report fields:

  • total_requests, total_cost_usd, total_tokens
  • total_prompt_tokens, total_completion_tokens
  • local_count, external_count
  • total_saved_prompt_only_usd, total_saved_full_modeled_usd
  • requests_by_route, cost_by_model, cost_by_provider
  • tokens_by_model, savings_by_intent
  • avoidable_external_requests — requests sent to LLM that didn't need one
  • avoidable_cost_usd — money wasted on those unnecessary calls
  • avoidable_percent — what % of your external calls were avoidable
  • potential_model_downgrade_savings_usd — savings from using cheaper models
  • optimization_summary — human-readable summary of waste found
  • breakdown (when group_by is specified)
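The two savings fields differ in what they model: `total_saved_prompt_only_usd` counts only the prompt tokens you avoided sending, while `total_saved_full_modeled_usd` also prices the completion the LLM would have generated. A sketch of the distinction (semantics inferred from the field names, not taken from the library's code; rates are the gpt-4o-mini prices):

```python
RATE_IN, RATE_OUT = 0.15, 0.60  # USD per 1M tokens

def saved_prompt_only(prompt_tokens: int) -> float:
    # Conservative estimate: only the input you didn't send.
    return prompt_tokens * RATE_IN / 1_000_000

def saved_full_modeled(prompt_tokens: int, est_completion_tokens: int) -> float:
    # Also models the output the avoided call would have produced.
    return saved_prompt_only(prompt_tokens) + est_completion_tokens * RATE_OUT / 1_000_000

print(f"{saved_full_modeled(2000, 500):.4f}")  # → 0.0006
```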

Snapshots (for dashboards & cron jobs)

# Capture a daily snapshot
snapshot = tracker.capture_snapshot(window_hours=24, job_name="daily-cost-report")
print(f"Net savings: ${snapshot['net_savings_conservative_usd']:.4f}")

# View recent snapshots
for s in tracker.snapshots(limit=7):
    print(f"{s['job_name']}: saved ${s['saved_full_modeled_usd']:.4f}, spent ${s['external_cost_usd']:.4f}")
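For the cron-job use case, a crontab entry that captures a snapshot at midnight could look like this (the interpreter and database paths are illustrative; this is a config fragment, not library output):

```shell
# m h dom mon dow  command — capture a daily cost snapshot at 00:00
0 0 * * * /usr/bin/python3 -c "from llm_cost_tracker import CostTracker; CostTracker('/var/data/llm_costs.db').capture_snapshot(window_hours=24, job_name='daily-cost-report')"
```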

Built-in pricing (40+ models)

Pricing is built in for OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek models. Prices are USD per 1M tokens and auto-matched by model name.

from llm_cost_tracker import lookup_pricing

inp, out, source = lookup_pricing("gpt-4o-mini")
print(f"Input: ${inp}/1M tokens, Output: ${out}/1M tokens")
# Input: $0.15/1M tokens, Output: $0.6/1M tokens

Custom pricing:

from llm_cost_tracker.pricing import DEFAULT_PRICING
DEFAULT_PRICING["my-custom-model"] = (1.00, 3.00)  # input, output per 1M tokens
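"Auto-matched by model name" typically reduces to an exact hit first, then the longest pricing key that prefixes the model string, so dated variants resolve to their base model. A plausible sketch of such matching (not the library's actual implementation; the three-entry table stands in for DEFAULT_PRICING):

```python
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def match_pricing(model: str):
    # Exact match first; otherwise the longest key that prefixes the name,
    # so "gpt-4o-mini-2024-07-18" hits "gpt-4o-mini", not "gpt-4o".
    if model in PRICING:
        return PRICING[model]
    prefixes = [k for k in PRICING if model.startswith(k)]
    return PRICING[max(prefixes, key=len)] if prefixes else None

print(match_pricing("gpt-4o-mini-2024-07-18"))  # → (0.15, 0.6)
```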

Integration examples

With OpenAI

import openai
from llm_cost_tracker import CostTracker

client = openai.OpenAI()
tracker = CostTracker("./costs.db")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

tracker.record(
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
    model="gpt-4o-mini",
    provider="openai",
)

With Anthropic

import anthropic
from llm_cost_tracker import CostTracker

client = anthropic.Anthropic()
tracker = CostTracker("./costs.db")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

tracker.record(
    prompt_tokens=response.usage.input_tokens,
    completion_tokens=response.usage.output_tokens,
    model="claude-sonnet-4-20250514",
    provider="anthropic",
)

With LiteLLM

import litellm
from llm_cost_tracker import CostTracker

tracker = CostTracker("./costs.db")

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

tracker.record(
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
    model="gpt-4o-mini",
    provider="openai",
)
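All three integrations follow the same shape: read two token counts off the response and pass them to record(). A tiny helper (hypothetical, not part of the library) keeps call sites short for any OpenAI-style client; note that Anthropic names the usage fields input_tokens/output_tokens instead:

```python
def track_openai_style(tracker, response, model: str, provider: str) -> None:
    # Assumes response.usage.prompt_tokens / .completion_tokens,
    # as returned by the OpenAI SDK and LiteLLM.
    tracker.record(
        prompt_tokens=response.usage.prompt_tokens,
        completion_tokens=response.usage.completion_tokens,
        model=model,
        provider=provider,
    )
```

With that in place, the LiteLLM example above collapses to `track_openai_style(tracker, response, "gpt-4o-mini", "openai")`.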

How it works

  • SQLite database — all data stored locally in a single file. No external services.
  • Zero dependencies — pure Python stdlib. No numpy, no pandas, no requests.
  • WAL mode — concurrent reads while writing. Safe for multi-threaded apps.
  • Built-in pricing — 40+ models with auto-matching. Falls back gracefully for unknown models.
  • Counterfactual tracking — when you handle a request locally, it estimates what the LLM call would have cost, so you can see real savings.
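Enabling WAL is a single pragma on the SQLite connection; presumably the tracker does the equivalent of this when it opens its file (a stdlib-only sketch, not the library's code):

```python
import os
import sqlite3
import tempfile

# Open a database file and switch its journal to write-ahead logging,
# which lets readers proceed while a write is in flight.
path = os.path.join(tempfile.mkdtemp(), "costs.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # → wal
```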

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

llm_costlog-0.1.0.tar.gz (14.7 kB)

Uploaded Source

Built Distribution

llm_costlog-0.1.0-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file llm_costlog-0.1.0.tar.gz.

File metadata

  • Download URL: llm_costlog-0.1.0.tar.gz
  • Upload date:
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for llm_costlog-0.1.0.tar.gz
Algorithm Hash digest
SHA256 583babe14cd29cddf61d7c716613a1567d577ea6be51d9943d745bcf451c633c
MD5 907ba3143edd6fe8d10f13b0f4f084be
BLAKE2b-256 236968e64edd21754a92e471f335d7c849cb6d1373de726767925f1f53aa7a04


File details

Details for the file llm_costlog-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llm_costlog-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for llm_costlog-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0b8b12b8c39d161dc59e41f02f3083885e883ebe212c98998163ade5078094b9
MD5 4994238a897bf9d750feadbd5e7be6a4
BLAKE2b-256 421a9de1c1c123f95dd8a22ecadaee45d246e45a0e87c8581b7228b9898ef237

