
cProfile for LLMs: find which function is burning your AI budget. Flame graph output, zero-config, no proxy.


llmspy 🔥

You're spending $800/month on LLMs. Which function is burning it?

Find out in one line. No proxy. No signup. No traffic rerouting.

PyPI version · Tests · Python 3.10+ · License: MIT · Zero dependencies

pip install tokenspy

The Problem

You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.

def run_pipeline(query):
    docs = fetch_and_summarize(query)    # ← costs $600?
    entities = extract_entities(docs)    # ← or this one?
    return generate_report(entities)     # ← or this one?

Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.

llmspy takes 1 line. No proxy. No signup. Runs entirely on your machine.


The Fix

import llmspy

@llmspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
llmspy.report()

Output

╔══════════════════════════════════════════════════════════════════╗
║  llmspy cost report                                              ║
║  total: $0.0523 · 18,734 tokens · 3 calls                        ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  fetch_and_summarize      $0.038  ████████████░░░░  73%          ║
║    └─ gpt-4o              $0.038  ████████████░░░░  73%          ║
║       └─ 12,000 tokens                                           ║
║                                                                  ║
║  generate_report          $0.011  ████░░░░░░░░░░░░  21%          ║
║    └─ gpt-4o              $0.011  ████░░░░░░░░░░░░  21%          ║
║       └─ 3,600 tokens                                            ║
║                                                                  ║
║  extract_entities         $0.003  █░░░░░░░░░░░░░░░   6%          ║
║    └─ gpt-4o-mini         $0.003  █░░░░░░░░░░░░░░░   6%          ║
║       └─ 3,134 tokens                                            ║
║                                                                  ║
╠══════════════════════════════════════════════════════════════════╣
║  Optimization hints                                              ║
║                                                                  ║
║  🔴 fetch_and_summarize [gpt-4o]                                 ║
║     Switch to gpt-4o-mini: 94% cheaper (~$540/month savings)     ║
║                                                                  ║
║  🟡 fetch_and_summarize [gpt-4o]                                 ║
║     Avg input: 12,000 tokens. Trim context or limit retrieval.   ║
╚══════════════════════════════════════════════════════════════════╝

Now you know: fetch_and_summarize is burning 73% of your budget. Fix that one function, cut your bill by $540/month.


Quick Start

Decorator (most common)

import llmspy

@llmspy.profile
def summarize_docs(docs: list[str]) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(docs)}]
    ).choices[0].message.content

summarize_docs(my_docs)
llmspy.report()                 # prints flame graph to terminal
llmspy.report(format="html")    # writes llmspy_report.html, opens in browser

Context Manager

with llmspy.session("research_task") as s:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,   # required by the Anthropic SDK
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:   {s.cost_str}")    # "$0.0012"
print(f"Tokens: {s.tokens}")      # 3,240
print(f"Calls:  {s.calls}")       # 1

Programmatic Access

data = llmspy.stats()
# {
#   "total_cost_usd": 0.042,
#   "total_tokens": 15000,
#   "total_calls": 3,
#   "by_function": {"summarize_docs": 0.038, "generate_report": 0.004},
#   "by_model":    {"gpt-4o": 0.040, "gpt-4o-mini": 0.002},
#   "calls": [...],
# }
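
Since stats() returns a plain dict, custom views are ordinary Python. For example, the single most expensive function and its share of total spend:

data = llmspy.stats()
top_fn, top_cost = max(data["by_function"].items(), key=lambda kv: kv[1])
share = top_cost / data["total_cost_usd"]
print(f"{top_fn}: ${top_cost:.3f} ({share:.0%} of spend)")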

Persistent Tracking Across Sessions

# In your app startup:
llmspy.init(persist=True)   # saves to ~/.llmspy/usage.db

# Decorate as normal โ€” costs accumulate across restarts
@llmspy.profile
def my_agent(query):
    ...

How It Works

llmspy monkey-patches the SDK client in-process, the same wrap-and-delegate technique that mocking and instrumentation libraries use:

Your Code
    │
    ├── @llmspy.profile ──────────────────────────────── sets active function
    │
    └── openai_client.chat.completions.create(...)
                │
                └── llmspy interceptor (in-process monkey-patch)
                        ├── calls original SDK method
                        ├── reads response.usage (tokens)
                        ├── looks up cost in built-in pricing table
                        ├── records: function · model · tokens · cost · duration
                        └── returns response UNCHANGED to your code

llmspy.report() → renders flame graph from recorded data

No proxy server. No HTTP interception. No environment variables. No configuration.

Your code runs exactly as before. llmspy just watches and keeps score.
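
For intuition, here is a minimal self-contained sketch of the wrap-and-delegate pattern (not llmspy's actual source). It uses a fake client so it runs without any SDK installed:

# Illustrative only: shows how an in-process monkey-patch can record
# token usage. FakeCompletions stands in for a real SDK client.
import functools
import time

class FakeUsage:
    total_tokens = 1200

class FakeResponse:
    usage = FakeUsage()

class FakeCompletions:
    def create(self, **kwargs):
        return FakeResponse()

records = []

def patch_create(completions, active_fn="<unknown>"):
    original = completions.create          # keep a reference to the real method

    @functools.wraps(original)
    def wrapper(**kwargs):
        start = time.perf_counter()
        response = original(**kwargs)      # call through to the SDK
        records.append({
            "function": active_fn,
            "model": kwargs.get("model"),
            "tokens": response.usage.total_tokens,
            "duration_s": time.perf_counter() - start,
        })
        return response                    # hand back the response unchanged

    completions.create = wrapper           # in-process monkey-patch

client = FakeCompletions()
patch_create(client, active_fn="summarize_docs")
client.create(model="gpt-4o", messages=[])
print(records)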


HTML Flame Graph

llmspy.report(format="html")

Opens a self-contained HTML file in your browser (zero JS dependencies, pure SVG):

┌────────────────────────────────────────────────────────────────┐
│  llmspy - Total: $0.0523  (18,734 tokens)                      │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  fetch_and_summarize  ████████████████████████████████  73%    │
│  generate_report      ████████████                      21%    │
│  extract_entities     ████                               6%    │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐   │
│  │ Model        │  Cost   │   %   │  Input  │  Output     │   │
│  │ gpt-4o       │ $0.049  │  94%  │  15,600 │   4,200     │   │
│  │ gpt-4o-mini  │ $0.003  │   6%  │   3,134 │     500     │   │
│  └────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────┘

Supported Providers

Automatically detected; nothing to configure:

Provider    Package                     Intercepted
OpenAI      openai>=1.0                 chat.completions.create (sync + async)
Anthropic   anthropic>=0.30             messages.create (sync + async)
Google      google-generativeai>=0.7    generate_content
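
Mixed-provider pipelines need no extra setup: every intercepted call is attributed to the enclosing profiled function and lands in the same report. A sketch, assuming both SDKs are installed and API keys are set in the environment:

import llmspy
import openai
import anthropic

openai_client = openai.OpenAI()
anthropic_client = anthropic.Anthropic()

@llmspy.profile
def cross_provider(query: str) -> tuple[str, str]:
    a = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content
    b = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": query}],
    ).content[0].text
    return a, b

cross_provider("Summarize Q3 earnings")
llmspy.report()   # one report covering both providers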

Built-in Pricing Table

30+ models, updated Feb 2026. No API call needed.

Model                Input $/1M    Output $/1M
claude-opus-4-6      $15.00        $75.00
claude-sonnet-4-6    $3.00         $15.00
claude-haiku-4-5     $0.80         $4.00
gpt-4o               $2.50         $10.00
gpt-4o-mini          $0.15         $0.60
o1                   $15.00        $60.00
gemini-1.5-pro       $1.25         $5.00
gemini-1.5-flash     $0.075        $0.30

→ Full pricing table
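
Per-call cost is just this table applied to the usage counts. Plain arithmetic, independent of llmspy (the PRICES dict and call_cost helper below are illustrative, not part of the package):

# Cost of one gpt-4o call: 12,000 input tokens + 800 output tokens,
# priced per the table above.
PRICES = {"gpt-4o": (2.50, 10.00)}   # (input $/1M, output $/1M)

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

print(f"${call_cost('gpt-4o', 12_000, 800):.4f}")   # $0.0380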


API Reference

Symbol                         Description
@llmspy.profile                Decorator: profile all LLM calls inside the function
llmspy.session(name)           Context manager: profile calls in a with block
llmspy.report()                Print text flame graph to terminal
llmspy.report(format="html")   Write + open HTML flame graph in browser
llmspy.stats()                 Return full breakdown as a dict
llmspy.reset()                 Clear all recorded calls
llmspy.init(persist=True)      Enable SQLite persistence across sessions
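
These pieces compose into simple experiments. A sketch of an A/B cost measurement (measure is an illustrative helper; run_pipeline is the profiled entry point from earlier):

def measure(prompt: str) -> float:
    llmspy.reset()                             # isolate this run
    run_pipeline(prompt)                       # any @llmspy.profile'd entry point
    return llmspy.stats()["total_cost_usd"]

baseline = measure("Analyze Q3 earnings")
trimmed = measure("Analyze Q3 earnings (brief)")
print(f"baseline ${baseline:.4f} vs trimmed ${trimmed:.4f}")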

Comparison

                           Langfuse    Helicone    LiteLLM Proxy    llmspy
Requires proxy / gateway   ✅ yes      ✅ yes      ✅ yes           ❌ no
Requires signup            ✅ yes      ✅ yes      ❌ no            ❌ no
Local-first                ❌ no       ❌ no       ⚡ partial       ✅ yes
Zero dependencies          ❌ no       ❌ no       ❌ no            ✅ yes
Flame graph output         ❌ no       ❌ no       ❌ no            ✅ yes
@decorator API             ❌ no       ❌ no       ❌ no            ✅ yes
Optimization hints         ❌ no       ⚡ partial  ❌ no            ✅ yes
Works offline              ❌ no       ❌ no       ⚡ partial       ✅ yes

Roadmap

  • Streaming response support (stream=True)
  • Token budget alerts: @llmspy.profile(budget_usd=0.10)
  • LangChain / LangGraph integration
  • CLI: llmspy history, llmspy report
  • GitHub Actions annotation (cost diff per PR)
  • Cost comparison across git commits

Contributing

git clone https://github.com/pinakimishra95/llm-cost-profiler
cd llm-cost-profiler
pip install -e ".[dev]"
pytest tests/                # 59 tests, ~0.1s

Issues and PRs welcome, especially for new provider support and updated pricing.


License

MIT © Pinaki Mishra. See LICENSE.


Star this repo if you're tired of mystery LLM invoices. ⭐

GitHub · PyPI · Issues
