cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero-config, no proxy.
Project description
llmspy 🔥
You're spending $800/month on LLMs. Which function is burning it?
Find out in one line. No proxy. No signup. No traffic rerouting.
pip install tokenspy
The Problem
You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.
def run_pipeline(query):
    docs = fetch_and_summarize(query)      # ← costs $600?
    entities = extract_entities(docs)      # ← or this one?
    return generate_report(entities)       # ← or this one?
Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.
llmspy takes 1 line. No proxy. No signup. Runs entirely on your machine.
The Fix
import llmspy
@llmspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)
run_pipeline("Analyze Q3 earnings")
llmspy.report()
Output
────────────────────────────────────────────────────────────────────
 llmspy cost report
 total: $0.0523 · 18,734 tokens · 3 calls
────────────────────────────────────────────────────────────────────

 fetch_and_summarize    $0.038   ████████████░░░░  73%
   └─ gpt-4o            $0.038   ████████████░░░░  73%
      └─ 12,000 tokens

 generate_report        $0.011   ███░░░░░░░░░░░░░  21%
   └─ gpt-4o            $0.011   ███░░░░░░░░░░░░░  21%
      └─ 3,600 tokens

 extract_entities       $0.003   █░░░░░░░░░░░░░░░   6%
   └─ gpt-4o-mini       $0.003   █░░░░░░░░░░░░░░░   6%
      └─ 3,134 tokens

────────────────────────────────────────────────────────────────────
 Optimization hints

 🔴 fetch_and_summarize [gpt-4o]
    Switch to gpt-4o-mini → 94% cheaper (~$540/month savings)

 🟡 fetch_and_summarize [gpt-4o]
    Avg input: 12,000 tokens. Trim context or limit retrieval.
────────────────────────────────────────────────────────────────────
Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.
Quick Start
Decorator (most common)
import llmspy
@llmspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content
summarize_docs(my_docs)
llmspy.report() # prints flame graph to terminal
llmspy.report("html") # writes llmspy_report.html, opens in browser
Context Manager
with llmspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": query}]
    )
print(f"Cost: {s.cost_str}") # "$0.0012"
print(f"Tokens: {s.tokens}") # 3,240
print(f"Calls: {s.calls}") # 1
Programmatic Access
data = llmspy.stats()
# {
# "total_cost_usd": 0.042,
# "total_tokens": 15000,
# "total_calls": 3,
# "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
# "by_model": {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
# "calls": [...],
# }
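Everything in the report is available programmatically, so you can gate a test or CI job on spend. A small sketch using only the keys shown above (the $0.05 threshold is just an example):

import llmspy

data = llmspy.stats()

# Fail the run if this session spent more than expected
assert data["total_cost_usd"] < 0.05, f"LLM spend too high: ${data['total_cost_usd']:.4f}"

# Most expensive functions first
for fn, cost in sorted(data["by_function"].items(), key=lambda kv: -kv[1]):
    print(f"{fn:<25} ${cost:.4f}")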
Persistent Tracking Across Sessions
# In your app startup:
llmspy.init(persist=True) # saves to ~/.llmspy/usage.db
# Decorate as normal; costs accumulate across restarts
@llmspy.profile
def my_agent(query):
    ...
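With persistence enabled, a later process (say a weekly cron job) can read the accumulated totals back. A sketch, assuming stats() reflects the persisted database once init(persist=True) has run:

import llmspy

llmspy.init(persist=True)   # re-attach to ~/.llmspy/usage.db (assumption: prior runs are loaded)
totals = llmspy.stats()
print(f"Accumulated spend: ${totals['total_cost_usd']:.2f} over {totals['total_calls']} calls")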
How It Works
llmspy monkey-patches the SDK client in-process; no separate process, proxy, or network hop is involved:
Your Code
  │
  ├── @llmspy.profile ─────────────────────── sets active function
  │
  └── openai_client.chat.completions.create(...)
        │
        └── llmspy interceptor (in-process monkey-patch)
              ├── calls original SDK method
              ├── reads response.usage (tokens)
              ├── looks up cost in built-in pricing table
              ├── records: function · model · tokens · cost · duration
              └── returns response UNCHANGED to your code

llmspy.report() → renders flame graph from recorded data
No proxy server. No HTTP interception. No environment variables. No configuration.
Your code runs exactly as before. llmspy just watches and keeps score.
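For intuition, here is the general shape of such an interceptor. This is a simplified, hypothetical sketch, not llmspy's actual code: record_usage and _calls are made-up names, and a real implementation also has to handle async clients, streaming, and per-provider response shapes.

import functools
import time

_calls = []  # recorded usage events

def record_usage(create_fn, function_name="unknown"):
    """Wrap an SDK 'create' method so every call records model, tokens, and duration."""
    @functools.wraps(create_fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        response = create_fn(*args, **kwargs)      # call the original SDK method
        usage = getattr(response, "usage", None)   # OpenAI-style responses expose .usage
        _calls.append({
            "function": function_name,
            "model": kwargs.get("model"),
            "input_tokens": getattr(usage, "prompt_tokens", 0),
            "output_tokens": getattr(usage, "completion_tokens", 0),
            "duration_s": time.perf_counter() - start,
        })
        return response                            # pass the response through unchanged
    return wrapper

# Hypothetical manual use (llmspy installs its interceptor automatically):
# client.chat.completions.create = record_usage(client.chat.completions.create, "summarize_docs")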
HTML Flame Graph
llmspy.report(format="html")
Opens a self-contained HTML file in your browser (zero JS dependencies, pure SVG):
───────────────────────────────────────────────────────────
 llmspy · Total: $0.0523 (18,734 tokens)
───────────────────────────────────────────────────────────

 fetch_and_summarize  ████████████████████████████████  73%
 generate_report      █████████                         21%
 extract_entities     ███                                 6%

 ┌─────────────┬────────┬─────┬────────┬────────┐
 │ Model       │ Cost   │  %  │ Input  │ Output │
 ├─────────────┼────────┼─────┼────────┼────────┤
 │ gpt-4o      │ $0.049 │ 94% │ 15,600 │  4,200 │
 │ gpt-4o-mini │ $0.003 │  6% │  3,134 │    500 │
 └─────────────┴────────┴─────┴────────┴────────┘
───────────────────────────────────────────────────────────
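To make the "pure SVG" point concrete: a chart like the one above can be emitted with nothing but string formatting. A rough, hypothetical sketch (not the library's actual renderer); the data literal mirrors by_function from llmspy.stats():

by_function = {"fetch_and_summarize": 0.038, "generate_report": 0.011, "extract_entities": 0.003}

total = sum(by_function.values())
rows = []
for i, (name, cost) in enumerate(sorted(by_function.items(), key=lambda kv: -kv[1])):
    width = 400 * cost / total                      # bar length proportional to share of spend
    y = 10 + i * 28
    rows.append(f'<text x="10" y="{y + 14}" font-size="12">{name}</text>')
    rows.append(f'<rect x="190" y="{y}" width="{width:.0f}" height="18" fill="#e4572e"/>')
    rows.append(f'<text x="{195 + width:.0f}" y="{y + 14}" font-size="12">{cost / total:.0%}</text>')

svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="640" height="{10 + len(by_function) * 28}">'
       + "".join(rows) + "</svg>")
with open("llm_cost_chart.svg", "w") as f:
    f.write(svg)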
Supported Providers
Automatically detected; nothing to configure:

| Provider | Package | Intercepted |
|---|---|---|
| OpenAI | openai>=1.0 | chat.completions.create (sync + async) |
| Anthropic | anthropic>=0.30 | messages.create (sync + async) |
| Google | google-generativeai>=0.7 | generate_content |
Built-in Pricing Table
30+ models, updated Feb 2026. No API call needed.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o1 | $15.00 | $60.00 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
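Cost is computed as input tokens times the input rate plus output tokens times the output rate, both per million tokens. For example, the fetch_and_summarize call from the sample report, with a hypothetical split of its 12,000 tokens:

# gpt-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens (from the table above)
input_tokens, output_tokens = 10_900, 1_100   # assumed split of the 12,000 tokens
cost = input_tokens / 1e6 * 2.50 + output_tokens / 1e6 * 10.00
print(f"${cost:.3f}")   # $0.038, matching the report above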
API Reference
| Symbol | Description |
|---|---|
| @llmspy.profile | Decorator: profile all LLM calls inside the function |
| llmspy.session(name) | Context manager: profile calls in a with block |
| llmspy.report() | Print text flame graph to terminal |
| llmspy.report(format="html") | Write + open HTML flame graph in browser |
| llmspy.stats() | Return full breakdown as a dict |
| llmspy.reset() | Clear all recorded calls |
| llmspy.init(persist=True) | Enable SQLite persistence across sessions |
Comparison
| | Langfuse | Helicone | LiteLLM Proxy | llmspy |
|---|---|---|---|---|
| Requires proxy / gateway | yes | yes | yes | no |
| Requires signup | yes | yes | no | no |
| Local-first | no | no | partial | yes |
| Zero dependencies | no | no | no | yes |
| @decorator API | no | no | no | yes |
| Flame graph output | no | no | no | yes |
| Optimization hints | no | partial | no | yes |
| Works offline | no | no | partial | yes |
Roadmap
- Streaming response support (stream=True)
- Token budget alerts: @llmspy.profile(budget_usd=0.10)
- LangChain / LangGraph integration
- CLI: llmspy history, llmspy report
- GitHub Actions annotation (cost diff per PR)
- Cost comparison across git commits
Contributing
git clone https://github.com/pinakimishra95/llm-cost-profiler
cd llm-cost-profiler
pip install -e ".[dev]"
pytest tests/ # 59 tests, ~0.1s
Issues and PRs welcome, especially for new provider support and updated pricing.
License
MIT © Pinaki Mishra. See LICENSE.
Download files
File details
Details for the file tokenspy-0.1.0.tar.gz.
File metadata
- Download URL: tokenspy-0.1.0.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f2194c6341dc9f9e6dc3ad66593539344fc4a698431c80dd6c3ccede94fd2c8 |
| MD5 | 423cc5c500d3aada2cdab04723f6dedf |
| BLAKE2b-256 | a74123e32e5ba265658cab1ccfa8bc05c3c8bf6f5ea954a3e55043e3c90eb0bc |
Provenance
The following attestation bundles were made for tokenspy-0.1.0.tar.gz:
Publisher: publish.yml on pinakimishra95/llm-cost-profiler

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspy-0.1.0.tar.gz
- Subject digest: 6f2194c6341dc9f9e6dc3ad66593539344fc4a698431c80dd6c3ccede94fd2c8
- Sigstore transparency entry: 995194184
- Sigstore integration time:
- Permalink: pinakimishra95/llm-cost-profiler@5fcf1c0cbf0f2520f7b3468251cf8962e6c3a8c7
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pinakimishra95
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5fcf1c0cbf0f2520f7b3468251cf8962e6c3a8c7
- Trigger Event: release
File details
Details for the file tokenspy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tokenspy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 606245eb70fb572cc03e8e7a9862cb7646799ab93efba25c6506ef35aafcc63e |
| MD5 | 6d1a04132e388d37e92f12c6be0d6660 |
| BLAKE2b-256 | 966b06869d7176177786c6fe0e6675a8f5618f576e669a889eed37a0d7f86a6c |
Provenance
The following attestation bundles were made for tokenspy-0.1.0-py3-none-any.whl:
Publisher: publish.yml on pinakimishra95/llm-cost-profiler

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspy-0.1.0-py3-none-any.whl
- Subject digest: 606245eb70fb572cc03e8e7a9862cb7646799ab93efba25c6506ef35aafcc63e
- Sigstore transparency entry: 995194208
- Sigstore integration time:
- Permalink: pinakimishra95/llm-cost-profiler@5fcf1c0cbf0f2520f7b3468251cf8962e6c3a8c7
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pinakimishra95
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5fcf1c0cbf0f2520f7b3468251cf8962e6c3a8c7
- Trigger Event: release