
cProfile for LLMs — find which function is burning your AI budget. Flame graph output, zero-config, no proxy.


tokenspy 🔥

You're spending $800/month on LLMs. Which function is burning it?

Find out in one line. No proxy. No signup. No traffic rerouting.

Python 3.10+ · MIT license · zero dependencies

pip install tokenspy

The Problem

You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.

def run_pipeline(query):
    docs = fetch_and_summarize(query)    # ← costs $600?
    entities = extract_entities(docs)    # ← or this one?
    return generate_report(entities)     # ← or this one?

Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.

tokenspy takes 1 line. No proxy. No signup. Runs entirely on your machine.


The Fix

import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
tokenspy.report()

Output

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘  tokenspy cost report                                                  โ•‘
โ•‘  total: $0.0523  ยท  18,734 tokens  ยท  3 calls                       โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘                                                                      โ•‘
โ•‘  fetch_and_summarize      $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%             โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%            โ•‘
โ•‘       โ””โ”€ 12,000 tokens                                               โ•‘
โ•‘                                                                      โ•‘
โ•‘  generate_report          $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘       โ””โ”€ 3,600 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ•‘  extract_entities         $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘    โ””โ”€ gpt-4o-mini          $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘       โ””โ”€ 3,134 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘  Optimization hints                                                  โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐Ÿ”ด fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Switch to gpt-4o-mini โ€” 94% cheaper  (~$540/month savings)      โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐ŸŸก fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Avg input: 12,000 tokens. Trim context or limit retrieval.       โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.


Quick Start

Decorator (most common)

import tokenspy

@tokenspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content

summarize_docs(my_docs)
tokenspy.report()            # prints flame graph to terminal
tokenspy.report(format="html")   # writes tokenspy_report.html, opens in browser

Context Manager

with tokenspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:   {s.cost_str}")    # "$0.0012"
print(f"Tokens: {s.tokens}")      # 3,240
print(f"Calls:  {s.calls}")       # 1

Programmatic Access

data = tokenspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model":    {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }

Persistent Tracking Across Sessions

# In your app startup:
tokenspy.init(persist=True)   # saves to ~/.tokenspy/usage.db

# Decorate as normal — costs accumulate across restarts
@tokenspy.profile
def my_agent(query):
    ...
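The on-disk format isn't documented here beyond "saves to ~/.tokenspy/usage.db", so treat this as a hypothetical sketch of what accumulating call records in SQLite could look like (table name and columns are assumptions, not tokenspy's actual schema):

```python
# Hypothetical sketch of SQLite-backed persistence; the `calls` table and its
# columns are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")   # real code would open ~/.tokenspy/usage.db
conn.execute("""
    CREATE TABLE IF NOT EXISTS calls (
        ts        REAL,     -- timestamp of the call
        function  TEXT,     -- profiled function name
        model     TEXT,     -- LLM model used
        tokens    INTEGER,  -- total tokens for the call
        cost_usd  REAL      -- computed cost
    )
""")
conn.execute("INSERT INTO calls VALUES (0.0, 'my_agent', 'gpt-4o', 1200, 0.012)")
conn.commit()

# Totals survive process restarts because each call is a row on disk.
total = conn.execute("SELECT SUM(cost_usd) FROM calls").fetchone()[0]
print(total)  # 0.012
```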

How It Works

tokenspy monkey-patches the SDK client in-process — the same wrap-and-delegate technique mocking libraries like responses and VCR.py use to intercept client calls:

Your Code
    │
    ├── @tokenspy.profile ──────────────────────────── sets active function
    │
    └── openai_client.chat.completions.create(...)
                │
                └── tokenspy interceptor (in-process monkey-patch)
                        ├── calls original SDK method
                        ├── reads response.usage (tokens)
                        ├── looks up cost in built-in pricing table
                        ├── records: function · model · tokens · cost · duration
                        └── returns response UNCHANGED to your code

tokenspy.report() → renders flame graph from recorded data

No proxy server. No HTTP interception. No environment variables. No configuration.

Your code runs exactly as before. tokenspy just watches and keeps score.
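The interceptor step above can be sketched in a few lines. This is not tokenspy's actual source; `FakeClient` and `FakeResponse` are stand-ins for a real SDK client, but the wrap-record-return pattern is the one the diagram describes:

```python
# Minimal sketch of an in-process monkey-patch interceptor.
# FakeClient/FakeResponse are invented stand-ins for a real SDK.
import functools

RECORDS = []

class FakeResponse:
    def __init__(self, prompt_tokens, completion_tokens):
        self.usage = {"prompt_tokens": prompt_tokens,
                      "completion_tokens": completion_tokens}

class FakeClient:
    def create(self, **kwargs):
        # Pretend the API returned a response carrying usage data.
        return FakeResponse(prompt_tokens=100, completion_tokens=20)

def install_interceptor(client):
    original = client.create            # keep a reference to the real method

    @functools.wraps(original)
    def wrapper(**kwargs):
        response = original(**kwargs)   # call the original SDK method
        RECORDS.append(response.usage)  # record token usage
        return response                 # return the response unchanged

    client.create = wrapper             # patch the method in place

client = FakeClient()
install_interceptor(client)
resp = client.create(model="gpt-4o")
print(RECORDS)  # [{'prompt_tokens': 100, 'completion_tokens': 20}]
```

The caller gets the exact same response object it would have gotten without the patch; only the side channel (`RECORDS`) changes.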


HTML Flame Graph

tokenspy.report(format="html")

Opens a self-contained HTML file in your browser — zero JS dependencies, pure SVG:

┌──────────────────────────────────────────────────────────────────┐
│  tokenspy — Total: $0.0523  (18,734 tokens)                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  fetch_and_summarize  ████████████████████████████████  73%      │
│  generate_report      ████████████                      21%      │
│  extract_entities     ████                               6%      │
│                                                                  │
│  ┌────────────────────────────────────────────────────┐          │
│  │ Model          │  Cost   │   %   │  Input │ Output │          │
│  │ gpt-4o         │ $0.049  │  94%  │ 15,600 │  4,200 │          │
│  │ gpt-4o-mini    │ $0.003  │   6%  │  3,134 │    500 │          │
│  └────────────────────────────────────────────────────┘          │
└──────────────────────────────────────────────────────────────────┘

Supported Providers

Automatically detected — nothing to configure:

Provider    Package                     Intercepted
OpenAI      openai>=1.0                 chat.completions.create (sync + async)
Anthropic   anthropic>=0.30             messages.create (sync + async)
Google      google-generativeai>=0.7    generate_content

Built-in Pricing Table

30+ models, updated Feb 2026. No API call needed.

Model                Input $/1M    Output $/1M
claude-opus-4-6      $15.00        $75.00
claude-sonnet-4-6    $3.00         $15.00
claude-haiku-4-5     $0.80         $4.00
gpt-4o               $2.50         $10.00
gpt-4o-mini          $0.15         $0.60
o1                   $15.00        $60.00
gemini-1.5-pro       $1.25         $5.00
gemini-1.5-flash     $0.075        $0.30

→ Full pricing table
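The arithmetic behind a per-call cost follows directly from the table: tokens times the per-million rate, for input and output separately. An illustrative calculation using the gpt-4o rows above (the function name is ours, not tokenspy's API):

```python
# Cost of a single call from per-million-token prices (rates copied from the
# table above; `call_cost` is an illustrative helper, not a tokenspy function).
PRICING = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def call_cost(model, input_tokens, output_tokens):
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 12,000 input + 1,000 output tokens on gpt-4o:
print(call_cost("gpt-4o", 12_000, 1_000))  # 0.04
```

This also makes the 94% hint from the report concrete: the same tokens on gpt-4o-mini cost 0.15/2.50 of the input rate, i.e. 94% less.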


API Reference

Symbol                            Description
@tokenspy.profile                 Decorator — profile all LLM calls inside the function
tokenspy.session(name)            Context manager — profile calls in a with block
tokenspy.report()                 Print text flame graph to terminal
tokenspy.report(format="html")    Write + open HTML flame graph in browser
tokenspy.stats()                  Return full breakdown as a dict
tokenspy.reset()                  Clear all recorded calls
tokenspy.init(persist=True)       Enable SQLite persistence across sessions
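Because tokenspy.stats() returns a plain dict (shape shown under "Programmatic Access"), downstream logic like "which function costs the most" is ordinary Python. A sample dict stands in here so the snippet runs without the library:

```python
# Finding the costliest function from a stats()-shaped dict.
# `stats` is a hard-coded sample matching the documented return shape.
stats = {
    "total_cost_usd": 0.042,
    "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
}

worst = max(stats["by_function"], key=stats["by_function"].get)
print(worst)  # summarize_docs
```

The same pattern works for "by_model", or for alerting when "total_cost_usd" crosses a threshold.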

Comparison

                           Langfuse    Helicone    LiteLLM Proxy    tokenspy
Requires proxy / gateway   ✅ yes      ✅ yes      ✅ yes           ❌ no
Requires signup            ✅ yes      ✅ yes      ❌ no            ❌ no
Local-first                ❌ no       ❌ no       ⚡ partial       ✅ yes
Zero dependencies          ❌ no       ❌ no       ❌ no            ✅ yes
Flame graph output         ❌ no       ❌ no       ❌ no            ✅ yes
@decorator API             ❌ no       ❌ no       ❌ no            ✅ yes
Optimization hints         ❌ no       ⚡ partial  ❌ no            ✅ yes
Works offline              ❌ no       ❌ no       ⚡ partial       ✅ yes

Roadmap

  • Streaming response support (stream=True)
  • Token budget alerts: @tokenspy.profile(budget_usd=0.10)
  • LangChain / LangGraph integration
  • CLI: tokenspy history, tokenspy report
  • GitHub Actions annotation (cost diff per PR)
  • Cost comparison across git commits

Contributing

git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy
pip install -e ".[dev]"
pytest tests/                # 59 tests, ~0.1s

Issues and PRs welcome — especially for new provider support and updated pricing.


License

MIT © Pinaki Mishra. See LICENSE.


Star this repo if you're tired of mystery LLM invoices. ⭐

GitHub · PyPI · Issues
