
cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero-config, no proxy.


tokenspy 🔥

You're spending $800/month on LLMs. Which function is burning it?

Find out in one line. No proxy. No signup. No traffic rerouting.

PyPI version · Tests · Python 3.10+ · License: MIT · Zero dependencies

pip install tokenspy

The Problem

You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.

def run_pipeline(query):
    docs = fetch_and_summarize(query)    # ← costs $600?
    entities = extract_entities(docs)    # ← or this one?
    return generate_report(entities)     # ← or this one?

Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.

tokenspy takes 1 line. No proxy. No signup. Runs entirely on your machine.


The Fix

import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
tokenspy.report()

Output

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘  tokenspy cost report                                                โ•‘
โ•‘  total: $0.0523  ยท  18,734 tokens  ยท  3 calls                       โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘                                                                      โ•‘
โ•‘  fetch_and_summarize      $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%             โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%            โ•‘
โ•‘       โ””โ”€ 12,000 tokens                                               โ•‘
โ•‘                                                                      โ•‘
โ•‘  generate_report          $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘       โ””โ”€ 3,600 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ•‘  extract_entities         $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘    โ””โ”€ gpt-4o-mini          $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘       โ””โ”€ 3,134 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘  Optimization hints                                                  โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐Ÿ”ด fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Switch to gpt-4o-mini โ€” 94% cheaper  (~$540/month savings)      โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐ŸŸก fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Avg input: 12,000 tokens. Trim context or limit retrieval.       โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.


Quick Start

Decorator (most common)

import tokenspy

@tokenspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content

summarize_docs(my_docs)
tokenspy.report()           # prints flame graph to terminal
tokenspy.report("html")    # writes tokenspy_report.html, opens in browser

Context Manager

with tokenspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:   {s.cost_str}")    # "$0.0012"
print(f"Tokens: {s.tokens}")      # 3,240
print(f"Calls:  {s.calls}")       # 1

Streaming (works automatically)

@tokenspy.profile
def stream_response(query):
    # stream=True is fully supported; no changes needed
    for chunk in openai_client.chat.completions.create(
        model="gpt-4o", messages=[...], stream=True
    ):
        print(chunk.choices[0].delta.content or "", end="")

stream_response("Summarize this")
tokenspy.report()   # tokens + cost captured after stream completes

Budget Alerts

# Warn if a single invocation costs more than $0.10
@tokenspy.profile(budget_usd=0.10)
def my_agent(query): ...

# Raise an exception instead of just warning
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query): ...
UserWarning: [tokenspy] Budget exceeded in my_agent: $0.1423 > $0.1000
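
With on_exceeded="raise", an over-budget call surfaces as an ordinary exception you can catch. A minimal sketch, assuming BudgetExceededError is importable from the top-level tokenspy package (the API reference below lists the name; the exact import path may differ):

import tokenspy

@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query):
    ...   # LLM calls as usual

try:
    strict_agent("Summarize the full 10-K filing")
except tokenspy.BudgetExceededError as exc:      # assumed top-level import path
    print(f"Skipping expensive path: {exc}")     # e.g. fall back to a cheaper model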

Programmatic Access

data = tokenspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model":    {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }
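
Because stats() is just a dict, it slots into ordinary assertions. A hedged sketch of a cost-ceiling check in a test suite, using only the keys shown above (run_pipeline is the profiled function from "The Fix"; the $0.10 ceiling is arbitrary):

import tokenspy

def test_pipeline_cost_stays_under_budget():
    tokenspy.reset()                        # start from a clean slate
    run_pipeline("Analyze Q3 earnings")     # the @tokenspy.profile'd function from "The Fix"

    data = tokenspy.stats()
    worst = max(data["by_function"], key=data["by_function"].get)
    assert data["total_cost_usd"] < 0.10, (
        f"pipeline cost ${data['total_cost_usd']:.4f}; worst offender: {worst}"
    )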

Persistent Tracking Across Sessions

# In your app startup:
tokenspy.init(persist=True)            # saves to ~/.tokenspy/usage.db
tokenspy.init(persist=True, track_git=True)  # also tags each call with git SHA

@tokenspy.profile
def my_agent(query): ...

CLI

After running with persist=True, inspect your usage from the terminal:

# Show recent call history
tokenspy history --limit 20

# Print cost report from saved data
tokenspy report
tokenspy report --format html

# Diff two runs (e.g. before and after a refactor)
tokenspy compare --db before.db --db after.db

# Compare costs between two git commits
tokenspy compare --commit abc123 --commit def456 --db ~/.tokenspy/usage.db

Timestamp            Function               Model                      Cost   Tokens       ms
─────────────────────────────────────────────────────────────────────────────────────────────
2026-02-26 09:14:33  run_agent              gpt-4o                   $0.0523   18734      842
2026-02-26 09:14:41  summarize_docs         claude-haiku-4-5         $0.0012    3240      210

LangChain / LangGraph

No proxy, no SDK swap; just add a callback:

from tokenspy.integrations.langchain import TokenspyCallbackHandler

# With any chain
chain.invoke(prompt, config={"callbacks": [TokenspyCallbackHandler()]})

# At model construction time
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", callbacks=[TokenspyCallbackHandler()])

# Works with LangGraph agents too (same callback system)

pip install tokenspy[langchain]

GitHub Actions โ€” Cost Diff Per PR

Catch cost regressions before they merge:

# In your CI test script:
from tokenspy.ci import annotate_cost_diff
annotate_cost_diff("current_run.db", "baseline.db")

Outputs GitHub Actions annotations:

::warning title=tokenspy cost regression::fetch_and_summarize: cost increased by $0.0312 (62.4%)

And writes a Markdown table to the job summary:

Function              Cost     vs Baseline
fetch_and_summarize   $0.0812  ▲ 62.4%
extract_entities      $0.0031  ▼ 2.1%

How It Works

tokenspy instruments the provider SDKs in-process by monkey-patching their client methods; nothing leaves your machine:

Your Code
    │
    ├── @tokenspy.profile ────────────────────────────── sets active function
    │
    └── openai_client.chat.completions.create(...)
                │
                └── tokenspy interceptor (in-process monkey-patch)
                        ├── calls original SDK method
                        ├── reads response.usage (tokens)
                        ├── looks up cost in built-in pricing table
                        ├── records: function · model · tokens · cost · duration
                        └── returns response UNCHANGED to your code

tokenspy.report() → renders flame graph from recorded data

No proxy server. No HTTP interception. No environment variables. No configuration.

Your code runs exactly as before. tokenspy just watches and keeps score.
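
To make the diagram concrete, here is a stripped-down illustration of the interception idea (not tokenspy's actual source): wrap the SDK method, call through to the original, read response.usage, and hand the response back untouched.

import time
from openai import OpenAI

client = OpenAI()
_original_create = client.chat.completions.create    # keep a handle to the real method

def _measured_create(*args, **kwargs):
    start = time.perf_counter()
    response = _original_create(*args, **kwargs)      # call the real SDK method
    elapsed_ms = (time.perf_counter() - start) * 1000
    usage = response.usage                            # prompt_tokens / completion_tokens / total_tokens
    print(f"{kwargs.get('model')}: {usage.total_tokens} tokens in {elapsed_ms:.0f} ms")
    return response                                   # returned unchanged to the caller

# Illustrative only: tokenspy's real interceptor also handles async, streaming,
# cost lookup, and restoring the original method.
client.chat.completions.create = _measured_create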


Supported Providers

Automatically detected โ€” nothing to configure:

Provider    Package                     Intercepted
OpenAI      openai>=1.0                 chat.completions.create (sync + async + streaming)
Anthropic   anthropic>=0.30             messages.create (sync + async + streaming)
Google      google-generativeai>=0.7    generate_content
LangChain   langchain-core>=0.2         Callback handler (any model/provider)

Built-in Pricing Table

30+ models, updated Feb 2026. No API call needed.

Model               Input $/1M   Output $/1M
claude-opus-4-6        $15.00        $75.00
claude-sonnet-4-6       $3.00        $15.00
claude-haiku-4-5        $0.80         $4.00
gpt-4o                  $2.50        $10.00
gpt-4o-mini             $0.15         $0.60
o1                     $15.00        $60.00
gemini-1.5-pro          $1.25         $5.00
gemini-1.5-flash        $0.075        $0.30

→ Full pricing table
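
Per-call cost follows directly from these per-million rates: tokens / 1,000,000 × price, summed over input and output. As a rough check against the report above (the 11,000/1,000 input/output split is an assumed illustration):

input_tokens, output_tokens = 11_000, 1_000    # assumed split of the 12,000-token call
input_price, output_price = 2.50, 10.00        # gpt-4o, $ per 1M tokens

cost = (input_tokens / 1_000_000 * input_price
        + output_tokens / 1_000_000 * output_price)
print(f"${cost:.4f}")                          # $0.0375, close to the $0.038 shown above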


API Reference

Symbol                                                      Description
@tokenspy.profile                                           Decorator: profile all LLM calls inside the function
@tokenspy.profile(budget_usd=0.10)                          Decorator with cost budget alert
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")     Raise BudgetExceededError if exceeded
tokenspy.session(name)                                      Context manager: profile calls in a with block
tokenspy.report()                                           Print text flame graph to terminal
tokenspy.report(format="html")                              Write + open HTML flame graph in browser
tokenspy.stats()                                            Return full breakdown as a dict
tokenspy.reset()                                            Clear all recorded calls
tokenspy.init(persist=True)                                 Enable SQLite persistence across sessions
tokenspy.init(track_git=True)                               Tag each call with the current git commit SHA
TokenspyCallbackHandler                                     LangChain/LangGraph callback handler
tokenspy history                                            CLI: show recent call history
tokenspy report                                             CLI: render cost report
tokenspy compare                                            CLI: diff two DBs or two git commits
tokenspy annotate                                           CLI: emit GitHub Actions cost annotations

Comparison

                             Langfuse     Helicone     LiteLLM Proxy   tokenspy
Requires proxy / gateway     ✅ yes       ✅ yes       ✅ yes          ❌ no
Requires signup              ✅ yes       ✅ yes       ❌ no           ❌ no
Local-first                  ❌ no        ❌ no        ⚡ partial      ✅ yes
Zero dependencies            ❌ no        ❌ no        ❌ no           ✅ yes
Flame graph output           ❌ no        ❌ no        ❌ no           ✅ yes
@decorator API               ❌ no        ❌ no        ❌ no           ✅ yes
Streaming support            ✅ yes       ✅ yes       ✅ yes          ✅ yes
Budget alerts                ⚡ partial   ⚡ partial   ❌ no           ✅ yes
LangChain integration        ✅ yes       ✅ yes       ✅ yes          ✅ yes
CLI history/report           ❌ no        ❌ no        ❌ no           ✅ yes
GitHub Actions cost diff     ❌ no        ❌ no        ❌ no           ✅ yes
Git commit cost tracking     ❌ no        ❌ no        ❌ no           ✅ yes
Optimization hints           ❌ no        ⚡ partial   ❌ no           ✅ yes
Works offline                ❌ no        ❌ no        ⚡ partial      ✅ yes

Contributing

git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy
pip install -e ".[dev]"
pytest tests/    # 100 tests, ~0.2s

Issues and PRs welcome, especially for new provider support and updated pricing.


License

MIT © Pinaki Mishra. See LICENSE.


Star this repo if you're tired of mystery LLM invoices. ⭐

GitHub · PyPI · Issues
