
cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero-config, no proxy.


tokenspy 🔥

You're spending $800/month on LLMs. Which function is burning it?

Find out in one line. No proxy. No signup. No traffic rerouting.

PyPI version · Tests · Python 3.10+ · License: MIT · Zero dependencies

pip install tokenspy

The Problem

You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.

def run_pipeline(query):
    docs = fetch_and_summarize(query)    # ← costs $600?
    entities = extract_entities(docs)    # ← or this one?
    return generate_report(entities)     # ← or this one?

Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.

tokenspy takes 1 line. No proxy. No signup. Runs entirely on your machine.


The Fix

import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
tokenspy.report()

Output

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘  tokenspy cost report                                                  โ•‘
โ•‘  total: $0.0523  ยท  18,734 tokens  ยท  3 calls                       โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘                                                                      โ•‘
โ•‘  fetch_and_summarize      $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%             โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%            โ•‘
โ•‘       โ””โ”€ 12,000 tokens                                               โ•‘
โ•‘                                                                      โ•‘
โ•‘  generate_report          $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘       โ””โ”€ 3,600 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ•‘  extract_entities         $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘    โ””โ”€ gpt-4o-mini          $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘       โ””โ”€ 3,134 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘  Optimization hints                                                  โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐Ÿ”ด fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Switch to gpt-4o-mini โ€” 94% cheaper  (~$540/month savings)      โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐ŸŸก fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Avg input: 12,000 tokens. Trim context or limit retrieval.       โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.


Quick Start

Decorator (most common)

import tokenspy

@tokenspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content

summarize_docs(my_docs)
tokenspy.report()                 # prints flame graph to terminal
tokenspy.report(format="html")    # writes tokenspy_report.html, opens in browser

Context Manager

with tokenspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:   {s.cost_str}")    # "$0.0012"
print(f"Tokens: {s.tokens}")      # 3,240
print(f"Calls:  {s.calls}")       # 1

Programmatic Access

data = tokenspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model":    {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }

Persistent Tracking Across Sessions

# In your app startup:
tokenspy.init(persist=True)   # saves to ~/.tokenspy/usage.db

# Decorate as normal โ€” costs accumulate across restarts
@tokenspy.profile
def my_agent(query):
    ...

How It Works

tokenspy monkey-patches the SDK client in-process, the same wrapping technique test tools like unittest.mock and VCR.py use to intercept calls:

Your Code
    │
    ├── @tokenspy.profile ──────────────────────────── sets active function
    │
    └── openai_client.chat.completions.create(...)
                │
                └── tokenspy interceptor (in-process monkey-patch)
                        ├── calls original SDK method
                        ├── reads response.usage (tokens)
                        ├── looks up cost in built-in pricing table
                        ├── records: function · model · tokens · cost · duration
                        └── returns response UNCHANGED to your code

tokenspy.report() → renders flame graph from recorded data

No proxy server. No HTTP interception. No environment variables. No configuration.

Your code runs exactly as before. tokenspy just watches and keeps score.
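The interception step can be sketched in plain Python. This is an illustration of the general wrapping technique, not tokenspy's actual implementation; `FakeClient`, `FakeResponse`, and `records` are stand-ins for the real SDK client and tokenspy's internal store:

```python
import functools
import time

class FakeResponse:
    """Stand-in for an SDK response carrying a usage block."""
    def __init__(self):
        self.usage = {"total_tokens": 120}

class FakeClient:
    """Stand-in for an SDK client method like chat.completions.create."""
    def create(self, **kwargs):
        return FakeResponse()

records = []  # stand-in for tokenspy's recorded-call store

def instrument(client, method_name):
    """Replace client.<method_name> with a wrapper that records usage."""
    original = getattr(client, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        response = original(*args, **kwargs)   # call the real SDK method
        records.append({
            "model": kwargs.get("model"),
            "tokens": response.usage["total_tokens"],
            "duration_s": time.perf_counter() - start,
        })
        return response                        # returned unchanged to the caller
    setattr(client, method_name, wrapper)

client = FakeClient()
instrument(client, "create")
client.create(model="gpt-4o", messages=[])
print(records[0]["model"], records[0]["tokens"])  # gpt-4o 120
```

The caller never sees the wrapper: the response object comes back exactly as the SDK produced it.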


HTML Flame Graph

tokenspy.report(format="html")

Opens a self-contained HTML file in your browser (zero JS dependencies, pure SVG):

┌──────────────────────────────────────────────────────────────────┐
│  tokenspy — Total: $0.0523  (18,734 tokens)                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  fetch_and_summarize  ████████████████████████████████  73%      │
│  generate_report      ████████████                      21%      │
│  extract_entities     ████                               6%      │
│                                                                  │
│  ┌────────────────────────────────────────────────────────┐      │
│  │ Model          │  Cost   │  %    │ Input  │ Output     │      │
│  │ gpt-4o         │ $0.049  │  94%  │ 15,600 │ 4,200      │      │
│  │ gpt-4o-mini    │ $0.003  │   6%  │  3,134 │   500      │      │
│  └────────────────────────────────────────────────────────┘      │
└──────────────────────────────────────────────────────────────────┘

Supported Providers

Automatically detected; nothing to configure:

Provider     Package                     Intercepted
OpenAI       openai>=1.0                 chat.completions.create (sync + async)
Anthropic    anthropic>=0.30             messages.create (sync + async)
Google       google-generativeai>=0.7    generate_content

Built-in Pricing Table

30+ models, updated Feb 2026. No API call needed.

Model                Input $/1M    Output $/1M
claude-opus-4-6      $15.00        $75.00
claude-sonnet-4-6    $3.00         $15.00
claude-haiku-4-5     $0.80         $4.00
gpt-4o               $2.50         $10.00
gpt-4o-mini          $0.15         $0.60
o1                   $15.00        $60.00
gemini-1.5-pro       $1.25         $5.00
gemini-1.5-flash     $0.075        $0.30

→ Full pricing table
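The arithmetic behind each line of the cost report follows directly from this table: tokens divided by one million, times the per-million rate, input and output priced separately. A quick sketch (rates copied from the table above; tokenspy ships its own table, so treat these constants as illustrative):

```python
# Illustrative rates in $ per 1M tokens, copied from the table above.
PRICES = {
    "gpt-4o":      (2.50, 10.00),   # (input rate, output rate)
    "gpt-4o-mini": (0.15, 0.60),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens / 1M * per-million rate, per direction."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

print(f"${call_cost('gpt-4o', 12_000, 800):.4f}")   # $0.0380
```

12,000 input tokens plus 800 output tokens on gpt-4o comes to $0.038, which is exactly the `fetch_and_summarize` line in the sample report.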


API Reference

Symbol                            Description
@tokenspy.profile                 Decorator: profile all LLM calls inside the function
tokenspy.session(name)            Context manager: profile calls in a with block
tokenspy.report()                 Print text flame graph to terminal
tokenspy.report(format="html")    Write and open HTML flame graph in browser
tokenspy.stats()                  Return full breakdown as a dict
tokenspy.reset()                  Clear all recorded calls
tokenspy.init(persist=True)       Enable SQLite persistence across sessions

Comparison

                          Langfuse    Helicone    LiteLLM Proxy   tokenspy
Requires proxy/gateway    ✅ yes      ✅ yes      ✅ yes          ❌ no
Requires signup           ✅ yes      ✅ yes      ❌ no           ❌ no
Local-first               ❌ no       ❌ no       ⚡ partial      ✅ yes
Zero dependencies         ❌ no       ❌ no       ❌ no           ✅ yes
Flame graph output        ❌ no       ❌ no       ❌ no           ✅ yes
@decorator API            ❌ no       ❌ no       ❌ no           ✅ yes
Optimization hints        ❌ no       ⚡ partial  ❌ no           ✅ yes
Works offline             ❌ no       ❌ no       ⚡ partial      ✅ yes

Roadmap

  • Streaming response support (stream=True)
  • Token budget alerts: @tokenspy.profile(budget_usd=0.10)
  • LangChain / LangGraph integration
  • CLI: tokenspy history, tokenspy report
  • GitHub Actions annotation (cost diff per PR)
  • Cost comparison across git commits

Contributing

git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy
pip install -e ".[dev]"
pytest tests/                # 59 tests, ~0.1s

Issues and PRs welcome, especially for new provider support and updated pricing.


License

MIT © Pinaki Mishra. See LICENSE.


Star this repo if you're tired of mystery LLM invoices. ⭐

GitHub · PyPI · Issues
