
cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero-config, no proxy.


tokenspy 🔥

You're spending $800/month on LLMs. Which function is burning it?

Find out in one line. No proxy. No signup. No traffic rerouting.

PyPI version · Tests · Python 3.10+ · License: MIT · Zero dependencies

pip install tokenspy

The Problem

You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.

def run_pipeline(query):
    docs = fetch_and_summarize(query)    # ← costs $600?
    entities = extract_entities(docs)    # ← or this one?
    return generate_report(entities)     # ← or this one?

Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.

tokenspy takes 1 line. No proxy. No signup. Runs entirely on your machine.


The Fix

import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
tokenspy.report()

Output

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘  tokenspy cost report                                                โ•‘
โ•‘  total: $0.0523  ยท  18,734 tokens  ยท  3 calls                       โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘                                                                      โ•‘
โ•‘  fetch_and_summarize      $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%             โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.038  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘  73%            โ•‘
โ•‘       โ””โ”€ 12,000 tokens                                               โ•‘
โ•‘                                                                      โ•‘
โ•‘  generate_report          $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘    โ””โ”€ gpt-4o               $0.011  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  21%            โ•‘
โ•‘       โ””โ”€ 3,600 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ•‘  extract_entities         $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘    โ””โ”€ gpt-4o-mini          $0.003  โ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   6%            โ•‘
โ•‘       โ””โ”€ 3,134 tokens                                                โ•‘
โ•‘                                                                      โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘  Optimization hints                                                  โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐Ÿ”ด fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Switch to gpt-4o-mini โ€” 94% cheaper  (~$540/month savings)      โ•‘
โ•‘                                                                      โ•‘
โ•‘  ๐ŸŸก fetch_and_summarize [gpt-4o]                                     โ•‘
โ•‘     Avg input: 12,000 tokens. Trim context or limit retrieval.       โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.


Quick Start

Decorator (most common)

import tokenspy

@tokenspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content

summarize_docs(my_docs)
tokenspy.report()           # prints flame graph to terminal
tokenspy.report("html")    # writes tokenspy_report.html, opens in browser

Context Manager

with tokenspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:   {s.cost_str}")    # "$0.0012"
print(f"Tokens: {s.tokens}")      # 3,240
print(f"Calls:  {s.calls}")       # 1

Streaming (works automatically)

@tokenspy.profile
def stream_response(query):
    # stream=True is fully supported; no changes needed
    for chunk in openai_client.chat.completions.create(
        model="gpt-4o", messages=[...], stream=True
    ):
        print(chunk.choices[0].delta.content or "", end="")

stream_response("Summarize this")
tokenspy.report()   # tokens + cost captured after stream completes

Budget Alerts

# Warn if a single invocation costs more than $0.10
@tokenspy.profile(budget_usd=0.10)
def my_agent(query): ...

# Raise an exception instead of just warning
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query): ...
UserWarning: [tokenspy] Budget exceeded in my_agent: $0.1423 > $0.1000
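
With on_exceeded="raise", an over-budget call surfaces as an ordinary exception you can catch. A minimal sketch, assuming BudgetExceededError is importable from the top-level tokenspy package (the API reference below lists the name; the exact import path may differ):

import tokenspy

@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query):
    ...   # LLM calls as usual

try:
    strict_agent("Summarize the full 10-K filing")
except tokenspy.BudgetExceededError as exc:      # assumed top-level import path
    print(f"Skipping expensive path: {exc}")     # e.g. fall back to a cheaper model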

Programmatic Access

data = tokenspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model":    {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }
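
Because stats() is just a dict, it slots into ordinary assertions. A hedged sketch of a cost-ceiling check in a test suite, using only the keys shown above (run_pipeline is the profiled function from "The Fix"; the $0.10 ceiling is arbitrary):

import tokenspy

def test_pipeline_cost_stays_under_budget():
    tokenspy.reset()                        # start from a clean slate
    run_pipeline("Analyze Q3 earnings")     # the @tokenspy.profile'd function from "The Fix"

    data = tokenspy.stats()
    worst = max(data["by_function"], key=data["by_function"].get)
    assert data["total_cost_usd"] < 0.10, (
        f"pipeline cost ${data['total_cost_usd']:.4f}; worst offender: {worst}"
    )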

Persistent Tracking Across Sessions

# In your app startup:
tokenspy.init(persist=True)            # saves to ~/.tokenspy/usage.db
tokenspy.init(persist=True, track_git=True)  # also tags each call with git SHA

@tokenspy.profile
def my_agent(query): ...

CLI

After running with persist=True, inspect your usage from the terminal:

# Show recent call history
tokenspy history --limit 20

# Print cost report from saved data
tokenspy report
tokenspy report --format html

# Diff two runs (e.g. before and after a refactor)
tokenspy compare --db before.db --db after.db

# Compare costs between two git commits
tokenspy compare --commit abc123 --commit def456 --db ~/.tokenspy/usage.db

Timestamp            Function               Model                      Cost   Tokens       ms
─────────────────────────────────────────────────────────────────────────────────────────────
2026-02-26 09:14:33  run_agent              gpt-4o                   $0.0523   18734      842
2026-02-26 09:14:41  summarize_docs         claude-haiku-4-5         $0.0012    3240      210

LangChain / LangGraph

No proxy, no SDK swap; just add a callback:

from tokenspy.integrations.langchain import TokenspyCallbackHandler

# With any chain
chain.invoke(prompt, config={"callbacks": [TokenspyCallbackHandler()]})

# At model construction time
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", callbacks=[TokenspyCallbackHandler()])

# Works with LangGraph agents too (same callback system)

pip install tokenspy[langchain]

GitHub Actions โ€” Cost Diff Per PR

Catch cost regressions before they merge:

# In your CI test script:
from tokenspy.ci import annotate_cost_diff
annotate_cost_diff("current_run.db", "baseline.db")

Outputs GitHub Actions annotations:

::warning title=tokenspy cost regression::fetch_and_summarize: cost increased by $0.0312 (62.4%)

And writes a Markdown table to the job summary:

Function              Cost     vs Baseline
fetch_and_summarize   $0.0812  ▲ 62.4%
extract_entities      $0.0031  ▼ 2.1%

How It Works

tokenspy instruments the provider SDKs in-process by monkey-patching their client methods; nothing leaves your machine:

Your Code
    │
    ├── @tokenspy.profile ────────────────────────────── sets active function
    │
    └── openai_client.chat.completions.create(...)
                │
                └── tokenspy interceptor (in-process monkey-patch)
                        ├── calls original SDK method
                        ├── reads response.usage (tokens)
                        ├── looks up cost in built-in pricing table
                        ├── records: function · model · tokens · cost · duration
                        └── returns response UNCHANGED to your code

tokenspy.report() → renders flame graph from recorded data

No proxy server. No HTTP interception. No environment variables. No configuration.

Your code runs exactly as before. tokenspy just watches and keeps score.
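
To make the diagram concrete, here is a stripped-down illustration of the interception idea (not tokenspy's actual source): wrap the SDK method, call through to the original, read response.usage, and hand the response back untouched.

import time
from openai import OpenAI

client = OpenAI()
_original_create = client.chat.completions.create    # keep a handle to the real method

def _measured_create(*args, **kwargs):
    start = time.perf_counter()
    response = _original_create(*args, **kwargs)      # call the real SDK method
    elapsed_ms = (time.perf_counter() - start) * 1000
    usage = response.usage                            # prompt_tokens / completion_tokens / total_tokens
    print(f"{kwargs.get('model')}: {usage.total_tokens} tokens in {elapsed_ms:.0f} ms")
    return response                                   # returned unchanged to the caller

# Illustrative only: tokenspy's real interceptor also handles async, streaming,
# cost lookup, and restoring the original method.
client.chat.completions.create = _measured_create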


Supported Providers

Automatically detected โ€” nothing to configure:

Provider    Package                     Intercepted
OpenAI      openai>=1.0                 chat.completions.create (sync + async + streaming)
Anthropic   anthropic>=0.30             messages.create (sync + async + streaming)
Google      google-generativeai>=0.7    generate_content
LangChain   langchain-core>=0.2         Callback handler (any model/provider)

Built-in Pricing Table

30+ models, updated Feb 2026. No API call needed.

Model               Input $/1M   Output $/1M
claude-opus-4-6        $15.00        $75.00
claude-sonnet-4-6       $3.00        $15.00
claude-haiku-4-5        $0.80         $4.00
gpt-4o                  $2.50        $10.00
gpt-4o-mini             $0.15         $0.60
o1                     $15.00        $60.00
gemini-1.5-pro          $1.25         $5.00
gemini-1.5-flash        $0.075        $0.30

→ Full pricing table
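
Per-call cost follows directly from these per-million rates: tokens / 1,000,000 × price, summed over input and output. As a rough check against the report above (the 11,000/1,000 input/output split is an assumed illustration):

input_tokens, output_tokens = 11_000, 1_000    # assumed split of the 12,000-token call
input_price, output_price = 2.50, 10.00        # gpt-4o, $ per 1M tokens

cost = (input_tokens / 1_000_000 * input_price
        + output_tokens / 1_000_000 * output_price)
print(f"${cost:.4f}")                          # $0.0375, close to the $0.038 shown above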


API Reference

Symbol                                                      Description
@tokenspy.profile                                           Decorator: profile all LLM calls inside the function
@tokenspy.profile(budget_usd=0.10)                          Decorator with cost budget alert
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")     Raise BudgetExceededError if exceeded
tokenspy.session(name)                                      Context manager: profile calls in a with block
tokenspy.report()                                           Print text flame graph to terminal
tokenspy.report(format="html")                              Write + open HTML flame graph in browser
tokenspy.stats()                                            Return full breakdown as a dict
tokenspy.reset()                                            Clear all recorded calls
tokenspy.init(persist=True)                                 Enable SQLite persistence across sessions
tokenspy.init(track_git=True)                               Tag each call with the current git commit SHA
TokenspyCallbackHandler                                     LangChain/LangGraph callback handler
tokenspy history                                            CLI: show recent call history
tokenspy report                                             CLI: render cost report
tokenspy compare                                            CLI: diff two DBs or two git commits
tokenspy annotate                                           CLI: emit GitHub Actions cost annotations

Comparison

                             Langfuse     Helicone     LiteLLM Proxy   tokenspy
Requires proxy / gateway     ✅ yes       ✅ yes       ✅ yes          ❌ no
Requires signup              ✅ yes       ✅ yes       ❌ no           ❌ no
Local-first                  ❌ no        ❌ no        ⚡ partial      ✅ yes
Zero dependencies            ❌ no        ❌ no        ❌ no           ✅ yes
Flame graph output           ❌ no        ❌ no        ❌ no           ✅ yes
@decorator API               ❌ no        ❌ no        ❌ no           ✅ yes
Streaming support            ✅ yes       ✅ yes       ✅ yes          ✅ yes
Budget alerts                ⚡ partial   ⚡ partial   ❌ no           ✅ yes
LangChain integration        ✅ yes       ✅ yes       ✅ yes          ✅ yes
CLI history/report           ❌ no        ❌ no        ❌ no           ✅ yes
GitHub Actions cost diff     ❌ no        ❌ no        ❌ no           ✅ yes
Git commit cost tracking     ❌ no        ❌ no        ❌ no           ✅ yes
Optimization hints           ❌ no        ⚡ partial   ❌ no           ✅ yes
Works offline                ❌ no        ❌ no        ⚡ partial      ✅ yes

Contributing

git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy
pip install -e ".[dev]"
pytest tests/    # 100 tests, ~0.2s

Issues and PRs welcome, especially for new provider support and updated pricing.


License

MIT © Pinaki Mishra. See LICENSE.


Star this repo if you're tired of mystery LLM invoices. ⭐

GitHub · PyPI · Issues
