
cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero-config, no proxy.


llmspy 🔥

You're spending $800/month on LLMs. Which function is burning it?

Find out in one line. No proxy. No signup. No traffic rerouting.

PyPI version · Tests · Python 3.10+ · License: MIT · Zero dependencies

pip install tokenspy

The Problem

You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.

def run_pipeline(query):
    docs = fetch_and_summarize(query)    # ← costs $600?
    entities = extract_entities(docs)    # ← or this one?
    return generate_report(entities)     # ← or this one?

Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.

llmspy takes 1 line. No proxy. No signup. Runs entirely on your machine.


The Fix

import llmspy

@llmspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
llmspy.report()

Output

╔══════════════════════════════════════════════════════════════════╗
║  llmspy cost report                                              ║
║  total: $0.0523 · 18,734 tokens · 3 calls                        ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  fetch_and_summarize      $0.038  ████████████░░░░  73%          ║
║    └─ gpt-4o              $0.038  ████████████░░░░  73%          ║
║       └─ 12,000 tokens                                           ║
║                                                                  ║
║  generate_report          $0.011  ████░░░░░░░░░░░░  21%          ║
║    └─ gpt-4o              $0.011  ████░░░░░░░░░░░░  21%          ║
║       └─ 3,600 tokens                                            ║
║                                                                  ║
║  extract_entities         $0.003  █░░░░░░░░░░░░░░░   6%          ║
║    └─ gpt-4o-mini         $0.003  █░░░░░░░░░░░░░░░   6%          ║
║       └─ 3,134 tokens                                            ║
║                                                                  ║
╠══════════════════════════════════════════════════════════════════╣
║  Optimization hints                                              ║
║                                                                  ║
║  🔴 fetch_and_summarize [gpt-4o]                                 ║
║     Switch to gpt-4o-mini: 94% cheaper (~$540/month savings)     ║
║                                                                  ║
║  🟡 fetch_and_summarize [gpt-4o]                                 ║
║     Avg input: 12,000 tokens. Trim context or limit retrieval.   ║
╚══════════════════════════════════════════════════════════════════╝

Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.


Quick Start

Decorator (most common)

import llmspy

@llmspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content

summarize_docs(my_docs)
llmspy.report()                 # prints flame graph to terminal
llmspy.report(format="html")    # writes llmspy_report.html, opens in browser

Context Manager

with llmspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,   # required by the Anthropic SDK
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:   {s.cost_str}")    # "$0.0012"
print(f"Tokens: {s.tokens}")      # 3,240
print(f"Calls:  {s.calls}")       # 1

Programmatic Access

data = llmspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model":    {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }
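
Since stats() returns a plain dict, custom views are ordinary Python. For example, the single most expensive function and its share of total spend:

data = llmspy.stats()
top_fn, top_cost = max(data["by_function"].items(), key=lambda kv: kv[1])
share = top_cost / data["total_cost_usd"]
print(f"{top_fn}: ${top_cost:.3f} ({share:.0%} of spend)")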

Persistent Tracking Across Sessions

# In your app startup:
llmspy.init(persist=True)   # saves to ~/.llmspy/usage.db

# Decorate as normal โ€” costs accumulate across restarts
@llmspy.profile
def my_agent(query):
    ...

How It Works

llmspy monkey-patches the SDK client in-process, the same wrap-and-delegate technique that mocking and instrumentation libraries use:

Your Code
    │
    ├── @llmspy.profile ──────────────────────────────── sets active function
    │
    └── openai_client.chat.completions.create(...)
                │
                └── llmspy interceptor (in-process monkey-patch)
                        ├── calls original SDK method
                        ├── reads response.usage (tokens)
                        ├── looks up cost in built-in pricing table
                        ├── records: function · model · tokens · cost · duration
                        └── returns response UNCHANGED to your code

llmspy.report() → renders flame graph from recorded data

No proxy server. No HTTP interception. No environment variables. No configuration.

Your code runs exactly as before. llmspy just watches and keeps score.
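
For intuition, here is a minimal self-contained sketch of the wrap-and-delegate pattern (not llmspy's actual source). It uses a fake client so it runs without any SDK installed:

# Illustrative only: shows how an in-process monkey-patch can record
# token usage. FakeCompletions stands in for a real SDK client.
import functools
import time

class FakeUsage:
    total_tokens = 1200

class FakeResponse:
    usage = FakeUsage()

class FakeCompletions:
    def create(self, **kwargs):
        return FakeResponse()

records = []

def patch_create(completions, active_fn="<unknown>"):
    original = completions.create          # keep a reference to the real method

    @functools.wraps(original)
    def wrapper(**kwargs):
        start = time.perf_counter()
        response = original(**kwargs)      # call through to the SDK
        records.append({
            "function": active_fn,
            "model": kwargs.get("model"),
            "tokens": response.usage.total_tokens,
            "duration_s": time.perf_counter() - start,
        })
        return response                    # hand back the response unchanged

    completions.create = wrapper           # in-process monkey-patch

client = FakeCompletions()
patch_create(client, active_fn="summarize_docs")
client.create(model="gpt-4o", messages=[])
print(records)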


HTML Flame Graph

llmspy.report(format="html")

Opens a self-contained HTML file in your browser (zero JS dependencies, pure SVG):

┌────────────────────────────────────────────────────────────────┐
│  llmspy - Total: $0.0523  (18,734 tokens)                      │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  fetch_and_summarize  ████████████████████████████████  73%    │
│  generate_report      ████████████                      21%    │
│  extract_entities     ████                               6%    │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐   │
│  │ Model        │  Cost   │   %   │  Input  │  Output     │   │
│  │ gpt-4o       │ $0.049  │  94%  │  15,600 │   4,200     │   │
│  │ gpt-4o-mini  │ $0.003  │   6%  │   3,134 │     500     │   │
│  └────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────┘

Supported Providers

Automatically detected; nothing to configure:

Provider    Package                     Intercepted
OpenAI      openai>=1.0                 chat.completions.create (sync + async)
Anthropic   anthropic>=0.30             messages.create (sync + async)
Google      google-generativeai>=0.7    generate_content
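
Mixed-provider pipelines need no extra setup: every intercepted call is attributed to the enclosing profiled function and lands in the same report. A sketch, assuming both SDKs are installed and API keys are set in the environment:

import llmspy
import openai
import anthropic

openai_client = openai.OpenAI()
anthropic_client = anthropic.Anthropic()

@llmspy.profile
def cross_provider(query: str) -> tuple[str, str]:
    a = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content
    b = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": query}],
    ).content[0].text
    return a, b

cross_provider("Summarize Q3 earnings")
llmspy.report()   # one report covering both providers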

Built-in Pricing Table

30+ models, updated Feb 2026. No API call needed.

Model                Input $/1M    Output $/1M
claude-opus-4-6      $15.00        $75.00
claude-sonnet-4-6    $3.00         $15.00
claude-haiku-4-5     $0.80         $4.00
gpt-4o               $2.50         $10.00
gpt-4o-mini          $0.15         $0.60
o1                   $15.00        $60.00
gemini-1.5-pro       $1.25         $5.00
gemini-1.5-flash     $0.075        $0.30

→ Full pricing table
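
Per-call cost is just this table applied to the usage counts. Plain arithmetic, independent of llmspy (the PRICES dict and call_cost helper below are illustrative, not part of the package):

# Cost of one gpt-4o call: 12,000 input tokens + 800 output tokens,
# priced per the table above.
PRICES = {"gpt-4o": (2.50, 10.00)}   # (input $/1M, output $/1M)

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

print(f"${call_cost('gpt-4o', 12_000, 800):.4f}")   # $0.0380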


API Reference

Symbol                         Description
@llmspy.profile                Decorator: profile all LLM calls inside the function
llmspy.session(name)           Context manager: profile calls in a with block
llmspy.report()                Print text flame graph to terminal
llmspy.report(format="html")   Write + open HTML flame graph in browser
llmspy.stats()                 Return full breakdown as a dict
llmspy.reset()                 Clear all recorded calls
llmspy.init(persist=True)      Enable SQLite persistence across sessions
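
These pieces compose into simple experiments. A sketch of an A/B cost measurement (measure is an illustrative helper; run_pipeline is the profiled entry point from earlier):

def measure(prompt: str) -> float:
    llmspy.reset()                             # isolate this run
    run_pipeline(prompt)                       # any @llmspy.profile'd entry point
    return llmspy.stats()["total_cost_usd"]

baseline = measure("Analyze Q3 earnings")
trimmed = measure("Analyze Q3 earnings (brief)")
print(f"baseline ${baseline:.4f} vs trimmed ${trimmed:.4f}")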

Comparison

                           Langfuse    Helicone    LiteLLM Proxy    llmspy
Requires proxy / gateway   ✅ yes      ✅ yes      ✅ yes           ❌ no
Requires signup            ✅ yes      ✅ yes      ❌ no            ❌ no
Local-first                ❌ no       ❌ no       ⚡ partial       ✅ yes
Zero dependencies          ❌ no       ❌ no       ❌ no            ✅ yes
Flame graph output         ❌ no       ❌ no       ❌ no            ✅ yes
@decorator API             ❌ no       ❌ no       ❌ no            ✅ yes
Optimization hints         ❌ no       ⚡ partial  ❌ no            ✅ yes
Works offline              ❌ no       ❌ no       ⚡ partial       ✅ yes

Roadmap

  • Streaming response support (stream=True)
  • Token budget alerts: @llmspy.profile(budget_usd=0.10)
  • LangChain / LangGraph integration
  • CLI: llmspy history, llmspy report
  • GitHub Actions annotation (cost diff per PR)
  • Cost comparison across git commits

Contributing

git clone https://github.com/pinakimishra95/llm-cost-profiler
cd llm-cost-profiler
pip install -e ".[dev]"
pytest tests/                # 59 tests, ~0.1s

Issues and PRs welcome, especially for new provider support and updated pricing.


License

MIT © Pinaki Mishra. See LICENSE.


Star this repo if you're tired of mystery LLM invoices. ⭐

GitHub · PyPI · Issues
