cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero config, no proxy.
tokenspy 🔥
You're spending $800/month on LLMs. Which function is burning it?
Find out in one line. No proxy. No signup. No traffic rerouting.
pip install tokenspy
The Problem
You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.
```python
def run_pipeline(query):
    docs = fetch_and_summarize(query)      # ← costs $600?
    entities = extract_entities(docs)      # ← or this one?
    return generate_report(entities)       # ← or this one?
```
Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.
tokenspy takes 1 line. No proxy. No signup. Runs entirely on your machine.
The Fix
```python
import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
tokenspy.report()
```
Output
```
┌──────────────────────────────────────────────────────────────────────┐
│ tokenspy cost report                                                 │
│ total: $0.0523 · 18,734 tokens · 3 calls                             │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│ fetch_and_summarize   $0.038  ████████████░░░░  73%                  │
│   └─ gpt-4o           $0.038  ████████████░░░░  73%                  │
│      └─ 12,000 tokens                                                │
│                                                                      │
│ generate_report       $0.011  ███░░░░░░░░░░░░░  21%                  │
│   └─ gpt-4o           $0.011  ███░░░░░░░░░░░░░  21%                  │
│      └─ 3,600 tokens                                                 │
│                                                                      │
│ extract_entities      $0.003  █░░░░░░░░░░░░░░░   6%                  │
│   └─ gpt-4o-mini      $0.003  █░░░░░░░░░░░░░░░   6%                  │
│      └─ 3,134 tokens                                                 │
│                                                                      │
├──────────────────────────────────────────────────────────────────────┤
│ Optimization hints                                                   │
│                                                                      │
│ 🔴 fetch_and_summarize [gpt-4o]                                      │
│    Switch to gpt-4o-mini → 94% cheaper (~$540/month savings)         │
│                                                                      │
│ 🟡 fetch_and_summarize [gpt-4o]                                      │
│    Avg input: 12,000 tokens. Trim context or limit retrieval.        │
└──────────────────────────────────────────────────────────────────────┘
```
Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.
Quick Start
Decorator (most common)
```python
import tokenspy

@tokenspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}],
    ).choices[0].message.content

summarize_docs(my_docs)

tokenspy.report()        # prints flame graph to terminal
tokenspy.report("html")  # writes tokenspy_report.html, opens in browser
```
Context Manager
```python
with tokenspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": query}],
    )

print(f"Cost: {s.cost_str}")   # "$0.0012"
print(f"Tokens: {s.tokens}")   # 3,240
print(f"Calls: {s.calls}")     # 1
```
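A session like this can be modeled as a small accumulator object handed out by a context manager. The sketch below is illustrative only: the `Session` class and its `record()` method are invented for the demo, not tokenspy's actual internals.

```python
from contextlib import contextmanager
from dataclasses import dataclass

# Hypothetical stand-in for a profiling session accumulator.
@dataclass
class Session:
    name: str
    cost: float = 0.0
    tokens: int = 0
    calls: int = 0

    def record(self, cost: float, tokens: int) -> None:
        # The interceptor would call this once per observed LLM response.
        self.cost += cost
        self.tokens += tokens
        self.calls += 1

    @property
    def cost_str(self) -> str:
        return f"${self.cost:.4f}"

@contextmanager
def session(name: str):
    # Hand out an accumulator for the duration of the with-block.
    s = Session(name)
    yield s

with session("research_task") as s:
    s.record(cost=0.0012, tokens=3240)

print(s.cost_str)  # $0.0012
```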
Streaming (works automatically)
```python
@tokenspy.profile
def stream_response(query):
    # stream=True is fully supported; no changes needed
    for chunk in openai_client.chat.completions.create(
        model="gpt-4o", messages=[...], stream=True
    ):
        print(chunk.choices[0].delta.content or "", end="")

stream_response("Summarize this")
tokenspy.report()  # tokens and cost captured after stream completes
```
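Capturing usage from a stream without disturbing the caller comes down to wrapping the chunk iterator: yield every chunk through unchanged, and report the tally only once the stream is exhausted. A rough sketch of that idea (the dict chunk shape and per-chunk token counts are assumptions for this demo, not tokenspy's real mechanism):

```python
def profile_stream(chunks, record):
    # Pass each chunk through untouched; tally as we go.
    total = 0
    for chunk in chunks:
        total += chunk["tokens"]  # tokens attributed to this chunk
        yield chunk               # the caller sees the stream unchanged
    record(total)                 # fires only after the stream completes

seen = []
out = list(profile_stream([{"tokens": 3}, {"tokens": 5}], seen.append))
print(len(out), seen)  # 2 [8]
```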
Budget Alerts
```python
# Warn if a single invocation costs more than $0.10
@tokenspy.profile(budget_usd=0.10)
def my_agent(query): ...

# Raise an exception instead of just warning
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query): ...
```

```
UserWarning: [tokenspy] Budget exceeded in my_agent: $0.1423 > $0.1000
```
Programmatic Access
```python
data = tokenspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model": {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }
```
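Because `stats()` returns a plain dict, post-processing needs no tokenspy API at all. For example, ranking functions by spend and computing each one's share of the total (the dict below is hand-written sample data mirroring the shape above, not a live capture):

```python
data = {
    "total_cost_usd": 0.042,
    "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
}

# Sort functions by cost, most expensive first.
ranked = sorted(data["by_function"].items(), key=lambda kv: kv[1], reverse=True)
for fn, cost in ranked:
    share = cost / data["total_cost_usd"]
    print(f"{fn}: ${cost:.3f} ({share:.0%})")
# summarize_docs: $0.038 (90%)
# generate_report: $0.004 (10%)
```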
Persistent Tracking Across Sessions
```python
# In your app startup:
tokenspy.init(persist=True)                  # saves to ~/.tokenspy/usage.db
tokenspy.init(persist=True, track_git=True)  # also tags each call with git SHA

@tokenspy.profile
def my_agent(query): ...
```
CLI
After running with persist=True, inspect your usage from the terminal:
```bash
# Show recent call history
tokenspy history --limit 20

# Print cost report from saved data
tokenspy report
tokenspy report --format html

# Diff two runs (e.g. before and after a refactor)
tokenspy compare --db before.db --db after.db

# Compare costs between two git commits
tokenspy compare --commit abc123 --commit def456 --db ~/.tokenspy/usage.db
```

```
Timestamp            Function        Model             Cost     Tokens  ms
──────────────────────────────────────────────────────────────────────────
2026-02-26 09:14:33  run_agent       gpt-4o            $0.0523  18734   842
2026-02-26 09:14:41  summarize_docs  claude-haiku-4-5  $0.0012  3240    210
```
LangChain / LangGraph
No proxy and no SDK swap; just add a callback:
```python
from tokenspy.integrations.langchain import TokenspyCallbackHandler

# With any chain
chain.invoke(prompt, config={"callbacks": [TokenspyCallbackHandler()]})

# At model construction time
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", callbacks=[TokenspyCallbackHandler()])

# Works with LangGraph agents too; same callback system
```

```bash
pip install tokenspy[langchain]
```
GitHub Actions โ Cost Diff Per PR
Catch cost regressions before they merge:
```python
# In your CI test script:
from tokenspy.ci import annotate_cost_diff

annotate_cost_diff("current_run.db", "baseline.db")
```

Outputs GitHub Actions annotations:

```
::warning title=tokenspy cost regression::fetch_and_summarize: cost increased by $0.0312 (62.4%)
```
And writes a Markdown table to the job summary:
| Function            | Cost    | vs Baseline |
|---|---|---|
| fetch_and_summarize | $0.0812 | ▲ 62.4%     |
| extract_entities    | $0.0031 | ▼ 2.1%      |
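The regression check itself reduces to a per-function percentage diff. A hedged sketch of that logic, not `annotate_cost_diff`'s real internals (the `cost_warnings` helper and threshold are invented, though the sample numbers reproduce the 62.4% annotation above):

```python
def cost_warnings(current, baseline, threshold_pct=10.0):
    # Compare per-function cost between a baseline run and the current run;
    # emit a GitHub Actions annotation when the increase crosses the threshold.
    lines = []
    for fn, cost in current.items():
        base = baseline.get(fn)
        if not base:
            continue  # new function: nothing to diff against
        pct = (cost - base) / base * 100
        if pct > threshold_pct:
            lines.append(
                "::warning title=tokenspy cost regression::"
                f"{fn}: cost increased by ${cost - base:.4f} ({pct:.1f}%)"
            )
    return lines

print(cost_warnings({"fetch_and_summarize": 0.0812},
                    {"fetch_and_summarize": 0.0500})[0])
# ::warning title=tokenspy cost regression::fetch_and_summarize: cost increased by $0.0312 (62.4%)
```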
How It Works
tokenspy monkey-patches the SDK client in-process, the same in-process instrumentation approach used by tools like line_profiler:
```
Your Code
  │
  ├── @tokenspy.profile ───────────────────────── sets active function
  │
  └── openai_client.chat.completions.create(...)
        │
        └── tokenspy interceptor (in-process monkey-patch)
              ├── calls original SDK method
              ├── reads response.usage (tokens)
              ├── looks up cost in built-in pricing table
              ├── records: function · model · tokens · cost · duration
              └── returns response UNCHANGED to your code

tokenspy.report() → renders flame graph from recorded data
```
No proxy server. No HTTP interception. No environment variables. No configuration.
Your code runs exactly as before. tokenspy just watches and keeps score.
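The interception step can be shown in a few lines of plain Python. This is an illustration of the technique only, run against a fake client object (everything here is invented for the demo): the wrapper records usage from the response, then hands the response back untouched.

```python
records = []

class FakeCompletions:
    # Stand-in for an SDK endpoint object like client.chat.completions.
    def create(self, **kwargs):
        return {"usage": {"total_tokens": 42}, "text": "hello"}

def install_interceptor(endpoint):
    original = endpoint.create
    def wrapper(**kwargs):
        response = original(**kwargs)                      # call the real method
        records.append(response["usage"]["total_tokens"])  # keep score
        return response                                    # unchanged for the caller
    endpoint.create = wrapper

client = FakeCompletions()
install_interceptor(client)
resp = client.create(model="gpt-4o")
print(records, resp["text"])  # [42] hello
```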
Supported Providers
Automatically detected โ nothing to configure:
| Provider  | Package                  | Intercepted |
|---|---|---|
| OpenAI    | openai>=1.0              | chat.completions.create (sync + async + streaming) |
| Anthropic | anthropic>=0.30          | messages.create (sync + async + streaming) |
| Google    | google-generativeai>=0.7 | generate_content |
| LangChain | langchain-core>=0.2      | Callback handler (any model/provider) |
Built-in Pricing Table
30+ models, updated Feb 2026. No API call needed.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o1 | $15.00 | $60.00 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
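Cost follows directly from this table: input and output tokens are priced separately, per million. A worked example against the gpt-4o row (the `call_cost` helper is purely for illustration):

```python
def call_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    # Each side of the call is billed at its own per-million-token rate.
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# gpt-4o: $2.50/1M input, $10.00/1M output
cost = call_cost(10_000, 2_000, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0450
```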
API Reference
| Symbol | Description |
|---|---|
| `@tokenspy.profile` | Decorator: profile all LLM calls inside the function |
| `@tokenspy.profile(budget_usd=0.10)` | Decorator with cost budget alert |
| `@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")` | Raise `BudgetExceededError` if exceeded |
| `tokenspy.session(name)` | Context manager: profile calls in a `with` block |
| `tokenspy.report()` | Print text flame graph to terminal |
| `tokenspy.report(format="html")` | Write and open HTML flame graph in browser |
| `tokenspy.stats()` | Return full breakdown as a dict |
| `tokenspy.reset()` | Clear all recorded calls |
| `tokenspy.init(persist=True)` | Enable SQLite persistence across sessions |
| `tokenspy.init(track_git=True)` | Tag each call with current git commit SHA |
| `TokenspyCallbackHandler` | LangChain/LangGraph callback handler |
| `tokenspy history` | CLI: show recent call history |
| `tokenspy report` | CLI: render cost report |
| `tokenspy compare` | CLI: diff two DBs or two git commits |
| `tokenspy annotate` | CLI: emit GitHub Actions cost annotations |
Comparison
| | Langfuse | Helicone | LiteLLM Proxy | tokenspy |
|---|---|---|---|---|
| Requires proxy / gateway | ✅ yes | ✅ yes | ✅ yes | ❌ no |
| Requires signup | ✅ yes | ✅ yes | ❌ no | ❌ no |
| Local-first | ❌ no | ❌ no | ⚡ partial | ✅ yes |
| Zero dependencies | ❌ no | ❌ no | ❌ no | ✅ yes |
| Flame graph output | ❌ no | ❌ no | ❌ no | ✅ yes |
| `@decorator` API | ❌ no | ❌ no | ❌ no | ✅ yes |
| Streaming support | ✅ yes | ✅ yes | ✅ yes | ✅ yes |
| Budget alerts | ⚡ partial | ⚡ partial | ❌ no | ✅ yes |
| LangChain integration | ✅ yes | ✅ yes | ✅ yes | ✅ yes |
| CLI history/report | ❌ no | ❌ no | ❌ no | ✅ yes |
| GitHub Actions cost diff | ❌ no | ❌ no | ❌ no | ✅ yes |
| Git commit cost tracking | ❌ no | ❌ no | ❌ no | ✅ yes |
| Optimization hints | ❌ no | ⚡ partial | ❌ no | ✅ yes |
| Works offline | ❌ no | ❌ no | ⚡ partial | ✅ yes |
Contributing
```bash
git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy
pip install -e ".[dev]"
pytest tests/  # 100 tests, ~0.2s
```
Issues and PRs welcome, especially for new provider support and updated pricing.
License
MIT © Pinaki Mishra. See LICENSE.
File details
Details for the file tokenspy-0.1.3.tar.gz.

File metadata
- Download URL: tokenspy-0.1.3.tar.gz
- Size: 35.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5c27eed500f0b27313d4825bd54ec549535dac33d1932a8d2015c991f06a37c |
| MD5 | b1346fffa374e03c18a5fe4c1964c81d |
| BLAKE2b-256 | 86b7a3fc82c78f87f5efd442af33c4bc67f877bff43b1d4dbea25a189098edd3 |
Provenance
The following attestation bundles were made for tokenspy-0.1.3.tar.gz:

Publisher: publish.yml on pinakimishra95/tokenspy
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspy-0.1.3.tar.gz
- Subject digest: f5c27eed500f0b27313d4825bd54ec549535dac33d1932a8d2015c991f06a37c
- Sigstore transparency entry: 995625229
- Permalink: pinakimishra95/tokenspy@f55bcd94642f536857da00cc13e3f3de6b9d9f79
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/pinakimishra95
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f55bcd94642f536857da00cc13e3f3de6b9d9f79
- Trigger Event: release
File details
Details for the file tokenspy-0.1.3-py3-none-any.whl.

File metadata
- Download URL: tokenspy-0.1.3-py3-none-any.whl
- Size: 31.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 221cc53114f0be86ca3ca92ae9ffedba7400a63c336f3f258433554e3edc1ef7 |
| MD5 | 54058657db254cc3711cb8c5ae1e136f |
| BLAKE2b-256 | 116f59559632e1ed1cad56d6ab9d81234088b2c90fd6e6b5f9834f02d6c1986d |
Provenance
The following attestation bundles were made for tokenspy-0.1.3-py3-none-any.whl:

Publisher: publish.yml on pinakimishra95/tokenspy
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspy-0.1.3-py3-none-any.whl
- Subject digest: 221cc53114f0be86ca3ca92ae9ffedba7400a63c336f3f258433554e3edc1ef7
- Sigstore transparency entry: 995625264
- Permalink: pinakimishra95/tokenspy@f55bcd94642f536857da00cc13e3f3de6b9d9f79
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/pinakimishra95
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f55bcd94642f536857da00cc13e3f3de6b9d9f79
- Trigger Event: release