Track, visualize, and optimize LLM API spending. Two lines of code, zero config.

LLM Cost Profiler

Find the money you're burning on LLM APIs. Two lines of code, zero config, instant visibility.

LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls

By Feature:
  summarizer         $412.80  (48.7%)  ████████████████████
  chatbot            $203.11  (24.0%)  ████████████
  classifier          $89.40  (10.5%)  █████
  content_gen         $78.22   (9.2%)  ████
  extraction          $41.50   (4.9%)  ██
  untagged            $22.29   (2.6%)  █

Warnings:
  ⚠ summarizer: 34% of calls are retries ($140.15 wasted)
  ⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
  ⚠ classifier: using gpt-4o but output is always <10 tokens (cheaper model works)

I ran this on my own project and found $1,240/month in waste — duplicate calls that should be cached, an expensive model doing a job a cheap one handles fine, and retry loops burning money on failures. All fixable in an afternoon.


Setup — 2 lines, 30 seconds

pip install llm-spend-profiler

from llm_cost_profiler import wrap
from openai import OpenAI

client = wrap(OpenAI())  # that's it. everything is tracked now.

Your code works exactly as before. Every API call is silently logged to a local SQLite database. If logging fails for any reason, it fails silently — your app is never affected.

Works with Anthropic too:

from anthropic import Anthropic
client = wrap(Anthropic())

And async clients:

from openai import AsyncOpenAI
client = wrap(AsyncOpenAI())
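
The "How it works" section describes the wrapper as a transparent proxy that never lets a logging failure reach your app. The sketch below shows one way that pattern could look for a synchronous client. It is not the library's implementation: `RecordingProxy`, the `sink` callback, and the `Fake*` classes are hypothetical names used only for illustration, and async clients would need an awaiting variant.

```python
import time


class RecordingProxy:
    """Hypothetical stand-in for the object wrap() returns (not the library's code)."""

    def __init__(self, target, sink, path=""):
        self._target = target
        self._sink = sink    # callable receiving one dict per completed call
        self._path = path    # dotted path so far, e.g. "chat.completions"

    def __getattr__(self, name):
        # Called only for names the proxy itself lacks, so SDK attributes pass through.
        attr = getattr(self._target, name)
        path = f"{self._path}.{name}" if self._path else name
        if callable(attr):
            return self._recorded(attr, path)
        if hasattr(attr, "__dict__"):
            return RecordingProxy(attr, self._sink, path)  # nested namespace
        return attr

    def _recorded(self, func, path):
        def call(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)  # SDK exceptions propagate untouched
            try:
                self._sink({
                    "method": path,
                    "model": kwargs.get("model"),
                    "seconds": time.perf_counter() - start,
                })
            except Exception:
                pass  # a logging failure must never reach the caller
            return result
        return call


# Fake client standing in for an SDK object, so the sketch runs offline.
class FakeCompletions:
    def create(self, **kwargs):
        return {"model": kwargs["model"], "text": "ok"}


class FakeChat:
    completions = FakeCompletions()


class FakeClient:
    chat = FakeChat()


records = []
client = RecordingProxy(FakeClient(), records.append)
response = client.chat.completions.create(model="gpt-4o-mini", messages=[])


def broken_sink(record):
    raise RuntimeError("disk full")


# Even with a sink that always raises, the caller still gets its response.
quiet = RecordingProxy(FakeClient(), broken_sink)
still_ok = quiet.chat.completions.create(model="gpt-4o-mini", messages=[])
```

Because `__getattr__` forwards everything it does not recognize, the wrapped object keeps the SDK's attribute layout, which is what lets existing call sites stay unchanged.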

What you get

llmcost report — Where your money goes

llmcost report           # last 7 days
llmcost report --days 30 # last 30 days

Shows total spend, breakdown by feature and model, and automatic warnings about retry waste, context bloat, and overpriced model usage.

llmcost hotspots — Which lines of code cost the most

Top Cost Hotspots:
  1. features/summarizer.py:47   summarize_doc()    $412.80/week   4,201 calls  ████████████████████
  2. api/chat.py:123             handle_message()   $203.11/week   3,892 calls  ██████████
  3. pipeline/classify.py:34     classify_text()     $89.40/week   2,847 calls  ████

Auto-detected from the Python call stack. No manual annotation needed.
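
As a rough illustration of how a call site can be recovered from the stack, the sketch below uses the standard library's `traceback.extract_stack`. The function name `find_call_site` and the `skip_dirs` parameter are assumptions made for this example; the profiler's own frame-filtering rules are not documented here.

```python
import os
import traceback


def find_call_site(skip_dirs=()):
    """Return (file, line, function) for the nearest frame outside skip_dirs."""
    stack = traceback.extract_stack()[:-1]  # drop find_call_site's own frame
    skipped = [os.path.abspath(d) for d in skip_dirs]
    for frame in reversed(stack):           # walk from the innermost frame outward
        path = os.path.abspath(frame.filename)
        if not any(path.startswith(d) for d in skipped):
            return os.path.basename(path), frame.lineno, frame.name
    return None


def summarize_doc():
    # In a real profiler this lookup would run inside the wrapped SDK call.
    return find_call_site()


site = summarize_doc()
```

A real profiler would presumably pass its own package directory and the SDK's directory as `skip_dirs`, so that the reported frame is the first one in application code.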

llmcost compare — What changed

Week-over-Week Comparison:
  Total: $847.32 → was $623.10 (+36% ⚠)

  Biggest increases:
    summarizer: +$180 (+77%) — call volume doubled
    chatbot: +$44 (+28%) — avg tokens per call increased

llmcost optimize — What to fix and how much you'll save

LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)

  #1 CACHE — classifier.py:34                        [SAVE $310/mo]
     85% of calls are exact duplicates (723 of 847/week)
     → Add @cache decorator
     Confidence: HIGH

  #2 RETRY FIX — content_gen.py:112                   [SAVE $180/mo]
     28% retry rate from JSON parse errors
     → Fix prompt to return raw JSON
     Confidence: HIGH

  #3 MODEL DOWNGRADE — classifier.py:34               [SAVE $71/mo]
     Output is always <10 tokens, one of 5 fixed labels
     → Switch gpt-4o to gpt-4o-mini
     Confidence: MEDIUM

  #4 CONTEXT BLOAT — chatbot.py:123                   [SAVE $155/mo]
     Avg 3,200 input tokens, growing over conversation
     → Truncate history to last 5 messages
     Confidence: MEDIUM

Five analyses: cache detection, retry waste, model downgrade suggestions, context bloat detection, batching opportunities.
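
To make the cache-detection idea concrete, here is a minimal sketch of counting exact-duplicate requests by hashing a canonical form of each one. The helper names `request_fingerprint` and `duplicate_rate` are hypothetical, and the optimizer may well define "duplicate" differently (for example, by including sampling parameters).

```python
import hashlib
import json
from collections import Counter


def request_fingerprint(model, messages):
    """Stable hash of a request; key order is normalized so equal requests match."""
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def duplicate_rate(calls):
    """Fraction of calls that repeat an earlier, identical request."""
    counts = Counter(request_fingerprint(model, msgs) for model, msgs in calls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total


hello = [{"role": "user", "content": "Classify: hello"}]
other = [{"role": "user", "content": "Classify: goodbye"}]
calls = [("gpt-4o", hello), ("gpt-4o", hello), ("gpt-4o", hello), ("gpt-4o", other)]
rate = duplicate_rate(calls)  # 2 of the 4 calls repeat an earlier request
```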

llmcost dashboard — Visual dashboard

llmcost dashboard  # opens http://127.0.0.1:8177

Dark-themed local web dashboard with:

  • Cost summary cards and feature treemap
  • Spend timeline chart (daily/hourly)
  • Model usage breakdown
  • Hotspots table
  • Optimization waterfall chart

Auto-refreshes every 30 seconds. Single HTML file, no npm, no build step.


Tag your calls

Group costs by feature, customer, environment — whatever matters to you:

from llm_cost_profiler import tag

with tag(feature="summarizer", customer="acme_corp"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this..."}]
    )

Tags nest. Inner tags merge with outer tags.
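
One plausible way to get nesting and merging is a context manager backed by `contextvars`, sketched below. This is an illustration only: `tag_sketch` and `current_tags` are hypothetical names, and the choice that an inner tag overrides an outer tag with the same key is an assumption the text above does not confirm.

```python
import contextvars
from contextlib import contextmanager

# The default dict is never mutated; each level stores a fresh merged copy.
_active_tags = contextvars.ContextVar("active_tags", default={})


@contextmanager
def tag_sketch(**new_tags):
    """Merge new_tags over the enclosing tags for the duration of the block."""
    merged = {**_active_tags.get(), **new_tags}  # assumption: inner keys win
    token = _active_tags.set(merged)
    try:
        yield merged
    finally:
        _active_tags.reset(token)  # restore the outer tags on exit


def current_tags():
    return dict(_active_tags.get())


with tag_sketch(feature="summarizer"):
    with tag_sketch(customer="acme_corp"):
        inner = current_tags()
    outer = current_tags()
after = current_tags()
```

Using a context variable rather than a module-level global keeps tags separate across threads and asyncio tasks, which matters if the same process serves several customers at once.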

Cache responses

Stop paying for duplicate calls:

from llm_cost_profiler import cache

@cache(ttl=3600)  # cache for 1 hour
def classify_text(text):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify: {text}"}]
    )

classify_text("hello")  # first call hits the API; the response is cached
classify_text("hello")  # repeat call is served from the cache, no API request
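
For readers curious about the mechanism, the sketch below shows how a time-to-live cache decorator can work in general. It is not the library's `cache`: the name `ttl_cache`, the injectable `clock` parameter, and the in-memory dict are assumptions for illustration, and this version only handles hashable arguments.

```python
import functools
import time


def ttl_cache(ttl, clock=time.monotonic):
    """Cache results per argument set for ttl seconds (hashable arguments only)."""
    def decorator(func):
        store = {}  # key -> (time stored, value)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = clock()
            hit = store.get(key)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]          # still fresh: skip the underlying call
            value = func(*args, **kwargs)
            store[key] = (now, value)
            return value
        return wrapper
    return decorator


# A fake clock makes expiry observable without sleeping.
fake_now = [0.0]
api_calls = []


@ttl_cache(ttl=3600, clock=lambda: fake_now[0])
def classify(text):
    api_calls.append(text)  # stands in for a paid API request
    return text.upper()


first = classify("hello")
second = classify("hello")       # within the TTL, served from the cache
calls_before_expiry = len(api_calls)
fake_now[0] = 3601.0             # move past the one-hour TTL
third = classify("hello")        # expired, so the function runs again
calls_after_expiry = len(api_calls)
```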

Store prompts (optional)

Enable prompt storage for deeper optimization analysis:

client = wrap(OpenAI(), store_prompts=True)

Disabled by default for privacy. When enabled, the optimizer can detect near-duplicate prompts and analyze what causes retry failures.


How it works

  • Wrapper: Transparent proxy pattern — intercepts SDK method calls without monkey-patching. Your client object behaves identically.
  • Storage: SQLite with WAL mode at ~/.llmcost/data.db. Thread-safe. All data stays local.
  • Pricing: Built-in table for OpenAI and Anthropic models. Prefix-matching handles versioned model names automatically.
  • Call site detection: Walks the Python call stack to find the file and line that triggered each API call.
  • Zero dependencies: Only uses the Python standard library. The OpenAI/Anthropic SDKs are detected at runtime, not required at install time.
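
The prefix-matching idea in the Pricing bullet can be sketched as a longest-prefix lookup. Everything here is illustrative: the `PRICES_PER_MILLION` table, the function names, and especially the dollar figures are placeholders, not the library's data or any provider's real prices.

```python
# Placeholder rates in USD per 1M tokens: (input, output). NOT real prices.
PRICES_PER_MILLION = {
    "gpt-4o": (10.0, 20.0),
    "gpt-4o-mini": (1.0, 2.0),
}


def lookup_price(model):
    """Return the rates for the longest table key that prefixes the model name."""
    matches = [key for key in PRICES_PER_MILLION if model.startswith(key)]
    if not matches:
        return None
    # Longest match wins, so a "gpt-4o-mini-..." snapshot is not priced as "gpt-4o".
    return PRICES_PER_MILLION[max(matches, key=len)]


def call_cost(model, input_tokens, output_tokens):
    """Cost of one call in USD, or None when the model is not in the table."""
    price = lookup_price(model)
    if price is None:
        return None
    input_rate, output_rate = price
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


mini_rates = lookup_price("gpt-4o-mini-2024-07-18")
cost = call_cost("gpt-4o-mini-2024-07-18", 1_000_000, 500_000)
```

Picking the longest matching prefix is the detail that matters: a plain first-match lookup could charge a mini-model snapshot at the larger model's rate.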

Requirements

  • Python 3.9+
  • No required dependencies
  • Optional: openai and/or anthropic SDKs

License

MIT


Download files

Source Distribution

llm_spend_profiler-0.1.0.tar.gz (35.7 kB)

Uploaded Source

Built Distribution

llm_spend_profiler-0.1.0-py3-none-any.whl (32.1 kB)

Uploaded Python 3

File details

Details for the file llm_spend_profiler-0.1.0.tar.gz.

File metadata

  • Download URL: llm_spend_profiler-0.1.0.tar.gz
  • Upload date:
  • Size: 35.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for llm_spend_profiler-0.1.0.tar.gz
Algorithm Hash digest
SHA256 53af34b1239cc9d58a0afe40b4572c5983bb7fe00182349577d78e0a4affc8c6
MD5 2625e429aedec757c126c6d6c334eaa6
BLAKE2b-256 e350973869dc88f153db4622a51d0c58f5117718e67ea9ca7d8f49e97afca863

File details

Details for the file llm_spend_profiler-0.1.0-py3-none-any.whl.

File hashes

Hashes for llm_spend_profiler-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 41ac1190b5af4871c7466f086aa2c47f786d8d438ab0efc914a6aa8f4180c164
MD5 7d268c0ed98bb06870689ba50070d9e5
BLAKE2b-256 a7dd94e1c3f9a652ea94d4f663f4e0ef0443e7760853d931c4ad7faa3de3a59c
