Find the cache tax draining your AI bill: a cross-provider cache-TTL waste detector, a model-swap savings simulator, pricing-drift detection, and per-agent attribution. MCP server + CLI.
The scorch mark on your AI bill. Scorchmark finds the cache-rebuild waste, model-swap savings, and silent price hikes your provider dashboard won't show you — as an MCP tool + CLI.
Your provider dashboard tells you what you spent — a day late. It never tells you what you wasted: the cache rebuilds you re-paid for, the model that would have done the job for 40% less, the silent price hike. Scorchmark reads a cost log you already have and finds that waste — starting with the cache-TTL waste behind the documented $6,000 overnight burn, which no observability tool we surveyed detects.
It answers the questions the dashboard can't: where did the cache money go, which agent burned it, and what would a cheaper model have saved? Read-only and local — no proxy in your request path — as an MCP tool your agent can call mid-run, or a one-line CLI.
| ❌ Without Scorchmark | ✅ With it |
|---|---|
| You find out from the limit email, a day late | The agent sees a warn at wake-up #2, mid-run |
| Cache-TTL waste silently eats up to 90% of spend | detect_cache_waste flags it and prices the loss |
| No idea which agent burned the money | cost_by_agent attributes every dollar |
| "Should I have used a cheaper model?" stays unknowable | simulate_model_swap gives the exact percentage saved |
Why this exists
Someone left Claude Code looping overnight to check PRs and woke up to a $6,000 bill. Not a bug in their code. A cache-TTL change (1 hour down to 5 minutes) meant every 30-minute wake-up rebuilt an 800k-token history at the cache write rate instead of the cheap cache read. The dashboard showed nothing for days. The first warning was the limit email, after the money was gone.
The post got 1,400+ upvotes because everyone running unattended agents felt the cold sweat. And there was no tripwire — every tool, including the provider's own dashboard, is retrospective.
Replaying that incident's shape through these tools:
```text
ingested                      46 wake-ups   total spend $265
detect_cache_waste            45 rebuilds, $237 wasted (90% of spend)
check_budget($50 cap)         first 'warn' at wake-up #2 (burn $20/hr)
simulate_model_swap→Sonnet    $265 would have been $159 (save 40%)
```
90% of that spend was avoidable, and a $50 cap would have tripped on the second wake-up, not the 46th. (Absolute dollars scale with your context size and loop length; the waste fraction and the early catch are the point.)
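The arithmetic behind that replay can be sketched in a few lines. A request that arrives after the cache TTL has expired re-writes the cached prefix at the write rate instead of reading it at the read rate, and the difference is the waste. This is a minimal illustration under stated assumptions, not Scorchmark's detector: `WakeUp` and `cache_rebuild_waste` are illustrative names, the 5-minute TTL matches the story, the 1.25× write and 0.1× read multipliers come from the Anthropic row under Cross-provider cache economics, and the $5 per million input tokens is a placeholder price, so the dollar figure will not match the replay above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WakeUp:
    ts: datetime
    cache_write_tokens: int


def cache_rebuild_waste(rows, ttl=timedelta(minutes=5), input_usd_per_mtok=5.0,
                        write_mult=1.25, read_mult=0.10):
    """Return (rebuilds, wasted_usd) for time-ordered wake-ups.

    A wake-up is a rebuild when it writes the cache after a gap longer than
    the TTL. Its waste is the write cost minus what a cache read of the same
    tokens would have cost. The first wake-up's write is treated as necessary.
    """
    usd_per_token = input_usd_per_mtok / 1_000_000  # placeholder price
    rebuilds, wasted = 0, 0.0
    for prev, cur in zip(rows, rows[1:]):
        if cur.ts - prev.ts > ttl and cur.cache_write_tokens > 0:
            rebuilds += 1
            wasted += cur.cache_write_tokens * usd_per_token * (write_mult - read_mult)
    return rebuilds, wasted


# Hypothetical loop shaped like the story: 46 wake-ups, 30 min apart, 800k tokens each.
start = datetime(2026, 6, 21, 3, 0)
loop = [WakeUp(start + timedelta(minutes=30 * i), 800_000) for i in range(46)]
rebuilds, wasted = cache_rebuild_waste(loop)
print(rebuilds, round(wasted, 2))
```

With a 30-minute interval and a 5-minute TTL, every wake-up after the first counts as a rebuild, so the sketch also arrives at 45 rebuilds for 46 wake-ups.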
How it's different from Helicone / Langfuse / LangSmith
Those are good tools. They are also a different shape of thing.
| | Helicone | Langfuse / LangSmith | Scorchmark |
|---|---|---|---|
| Form factor | Proxy in your request path | SDK / OTel tracing + dashboard | MCP tool the agent calls |
| When you learn | Dashboard, after the call | Dashboard, after the run | In the loop, before the next call |
| Who acts on it | A human reading a chart | A human reading a trace | The agent itself |
| Cache-TTL waste | Not detected | Not detected | Detected and priced |
| In your critical path | Yes (all traffic routed) | No | No (read-only on your logs) |
| Setup | Swap base URL | Instrument SDK | Point it at a log file |
The wedge: those tools tell you what happened. Scorchmark tells the agent what's about to happen, in a form it can act on without a human in the loop. It adds nothing to your request path — it reads a cost log you already have.
And it is not a spend cap. Hard budget enforcement is a commodity now: Cloudflare AI Gateway, LiteLLM, and Portkey all block a request before it is sent. Scorchmark does the part they don't: it tells you where the money leaked and what to change. Cap your spend with a gateway; find the cache tax with this. It is also not a full observability platform; if you want flame-graph traces and prompt evals, run Langfuse. Scorchmark is the cost-intelligence layer, and it composes fine with both.
Quickstart
The CLI core is pure stdlib, so installing it pulls in no dependencies. The MCP server needs one optional extra:
```sh
uv sync --extra mcp  # MCP server deps (FastMCP)
uv run fastmcp run server.py  # stdio (Claude Desktop / Inspector)
uv run fastmcp run server.py --transport streamable-http --port 8000  # remote / MCPize
```
Add it to Claude Desktop / Cursor (claude_desktop_config.json or .cursor/mcp.json):
```json
{
  "mcpServers": {
    "scorchmark": {
      "command": "uv",
      "args": ["run", "fastmcp", "run", "/path/to/scorchmark/server.py"]
    }
  }
}
```
Then, in the loop (MCP tool calls, shown in Python-style syntax):
```python
ingest_run(open("examples/sample_cost_log.jsonl").read())
check_budget(monthly_cap_usd=100, reset_day=1)  # ok | warn | breach, with ETA to the reset
detect_cache_waste()  # the $6k pattern
simulate_model_swap(to_model="claude-haiku-4-5")  # exact per-row savings, cross-provider OK
```
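The check_budget outputs listed under Tools (spend, burn rate, projected spend to the next reset, ok/warn/breach, ETA to cap) can be approximated with a linear projection. A minimal sketch, assuming the burn rate is total spend divided by elapsed hours and using a hypothetical rule in which warn means the projection crosses the cap before the reset; `budget_status` is an illustrative name, and the real tool's thresholds and field names may differ:

```python
from datetime import datetime, timezone


def budget_status(spend_usd, first_ts, now, next_reset, cap_usd):
    """Linear projection of spend to the next reset (hypothetical thresholds)."""
    hours_elapsed = max((now - first_ts).total_seconds() / 3600, 1e-9)
    burn_per_hr = spend_usd / hours_elapsed
    hours_to_reset = max((next_reset - now).total_seconds() / 3600, 0.0)
    projected_usd = spend_usd + burn_per_hr * hours_to_reset
    if spend_usd >= cap_usd:
        status, eta_to_cap_hr = "breach", 0.0
    elif projected_usd > cap_usd:
        # Only reachable with a positive burn rate, so the division is safe.
        status, eta_to_cap_hr = "warn", (cap_usd - spend_usd) / burn_per_hr
    else:
        status, eta_to_cap_hr = "ok", None
    return {"status": status, "burn_per_hr": burn_per_hr,
            "projected_usd": projected_usd, "eta_to_cap_hr": eta_to_cap_hr}


# $10 spent in the first 30 minutes against a $50 cap: $20/hr, cap in 2 hours.
t0 = datetime(2026, 6, 21, 3, 0, tzinfo=timezone.utc)
now = datetime(2026, 6, 21, 3, 30, tzinfo=timezone.utc)
reset = datetime(2026, 7, 1, tzinfo=timezone.utc)
print(budget_status(10.0, t0, now, reset, 50.0)["status"])  # warn
```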
Try it in your terminal (no MCP client)
Run it on your own Claude Code usage — zero setup, a log you already have:
```sh
uvx --from scorchmark scorchmark report --claude-code
# reads ~/.claude/projects/**/*.jsonl directly and prices every request
```
It adapts Claude Code's session transcripts automatically, so no reformatting is needed. Or point it at any cost log:
```sh
uvx --from scorchmark scorchmark report mylog.jsonl --cap 100  # auto-detects the format
uv run scorchmark report examples/sample_cost_log.jsonl --cap 50  # from a clone
cat mylog.jsonl | uv run scorchmark swap - --to claude-haiku-4-5  # reads stdin
```
Subcommands: report (all checks), budget, cache-waste, by-agent, anomalies, swap.
Add --json for the raw result, --from {auto,scorchmark,claude-code} to force a format, or point
the log argument at a directory of .jsonl files. Same engine as the MCP server.
See it run
Real output from examples/sample_cost_log.jsonl — the live warn, the cache-waste dollars, and
the per-agent breakdown the provider dashboard never shows you (these are actual tool results, not
mockups):
Tools
Core
| Tool | What it does |
|---|---|
| ingest_run | Load a cost-log JSONL. Computes cost from the bundled pricing model when absent. |
| check_budget | Live tripwire: spend, burn rate, projected spend to the next reset, ok/warn/breach, ETA to cap. |
| find_spend_anomalies | Flags requests costing N× the agent's median (the loop-spike signature). |
| detect_cache_waste | Detects cache waste, modeling each provider's real economics (see below) and pricing it. |
| cost_by_agent | Per-agent attribution: cost, share, requests, average per request. |
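find_spend_anomalies is described as flagging requests that cost N× their agent's median. A minimal sketch of that rule, assuming rows are dicts carrying agent_id and cost_usd and a hypothetical default of N = 5; `spend_anomalies` is an illustrative name, not the tool's implementation:

```python
from collections import defaultdict
from statistics import median


def spend_anomalies(rows, factor=5.0):
    """Return rows costing more than `factor` times their agent's median cost."""
    costs_by_agent = defaultdict(list)
    for row in rows:
        costs_by_agent[row["agent_id"]].append(row["cost_usd"])
    medians = {agent: median(costs) for agent, costs in costs_by_agent.items()}
    return [row for row in rows
            if medians[row["agent_id"]] > 0
            and row["cost_usd"] > factor * medians[row["agent_id"]]]


rows = [{"agent_id": "pr-loop", "cost_usd": 1.0}] * 4 + [
    {"agent_id": "pr-loop", "cost_usd": 9.0},
    {"agent_id": "triage", "cost_usd": 2.0},
    {"agent_id": "triage", "cost_usd": 3.0},
]
print(spend_anomalies(rows))  # [{'agent_id': 'pr-loop', 'cost_usd': 9.0}]
```

The median, unlike the mean, is not pulled upward by the spike being tested, so a single runaway request cannot hide itself by raising its own baseline.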
Edge (not offered by any tool we surveyed)
| Tool | What it does |
|---|---|
| detect_spend_acceleration | Flags a burn rate that doubles across consecutive windows (runaway context growth). |
| simulate_model_swap | Recomputes every past request at another model's price for the row-exact savings. Cross-provider. |
| detect_pricing_drift | Snapshots provider rates and surfaces any silent change (the root cause of the $6k burn). |
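The recomputation simulate_model_swap describes can be sketched as pricing each logged row twice: once at its original model and once at the target. The model names and per-million-token prices below are placeholders, not Scorchmark's bundled pricing model, and `row_cost` and `simulate_swap` are illustrative names:

```python
# Placeholder prices in USD per million tokens.
PRICES = {
    "big-model":   {"input": 5.0, "output": 25.0, "cache_write": 6.25, "cache_read": 0.50},
    "small-model": {"input": 1.0, "output": 5.0,  "cache_write": 1.25, "cache_read": 0.10},
}


def row_cost(row, model):
    """Cost of one logged request if it had been billed at `model`'s prices."""
    p = PRICES[model]
    return (row["input_tokens"] * p["input"]
            + row["output_tokens"] * p["output"]
            + row.get("cache_write_tokens", 0) * p["cache_write"]
            + row.get("cache_read_tokens", 0) * p["cache_read"]) / 1_000_000


def simulate_swap(rows, to_model):
    """Return (actual_usd, simulated_usd, saved_pct) with token counts held fixed."""
    actual = sum(row_cost(r, r["model"]) for r in rows)
    simulated = sum(row_cost(r, to_model) for r in rows)
    saved_pct = 100 * (actual - simulated) / actual if actual else 0.0
    return actual, simulated, saved_pct


log = [{"model": "big-model", "input_tokens": 2000, "output_tokens": 1500,
        "cache_write_tokens": 800_000, "cache_read_tokens": 0}]
actual, simulated, saved = simulate_swap(log, "small-model")
print(f"${actual:.2f} -> ${simulated:.2f} (save {saved:.0f}%)")
```

Holding each row's token counts fixed is what makes the result row-exact, and it carries an assumption worth stating: that the cheaper model would have used the same number of tokens for the same work.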
Match (parity with the heavy gateways)
| Tool | What it does |
|---|---|
predict_rate_limit |
Projects ETA to a 429 from rate-limit headers, per dimension. |
detect_stuck_agent |
Flags an agent repeating the same tool call — the stuck-loop signature. |
build_alert_payload |
Turns any result into a Slack, ntfy, or PagerDuty webhook payload. |
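predict_rate_limit works from the rate_limit_* fields described under Log format. A minimal sketch for a single dimension (tokens), assuming consumption is the drop in rate_limit_remaining_tokens between two consecutive rows inside the same window, and that no 429 is predicted if rate_limit_reset_s arrives first; `eta_to_429` is an illustrative name:

```python
def eta_to_429(prev, cur):
    """Seconds until remaining tokens hit zero at the observed rate, else None.

    `prev` and `cur` are (ts_seconds, remaining_tokens, reset_s) tuples taken
    from two consecutive log rows.
    """
    t0, remaining0, _ = prev
    t1, remaining1, reset_s = cur
    elapsed = t1 - t0
    used = remaining0 - remaining1
    if elapsed <= 0 or used <= 0:
        return None  # no measurable consumption, or the window has reset
    tokens_per_s = used / elapsed
    eta_s = remaining1 / tokens_per_s
    return eta_s if eta_s < reset_s else None


# 2,000 tokens used in 10 s leaves 8,000: empty in 40 s, before the 50 s reset.
print(eta_to_429((0, 10_000, 60), (10, 8_000, 50)))  # 40.0
```

Running the same function on the *_requests fields gives the request dimension; whichever ETA is nearer is the one that triggers the 429.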
The MCP resource scorchmark://pricing/current exposes the curated cross-provider pricing model.
Log format
One JSON object per line, using the de facto schema that cost trackers already emit and that mirrors Claude's own usage fields (the example is wrapped here for readability):
```json
{"request_id": "r1", "ts": "2026-06-21T03:00:00Z", "provider": "anthropic",
 "model": "claude-opus-4-8", "agent_id": "pr-loop", "input_tokens": 2000,
 "output_tokens": 1500, "cache_write_tokens": 800000, "cache_read_tokens": 0}
```
ts accepts ISO-8601 (naive timestamps are read as UTC), epoch seconds, or epoch milliseconds.
cost_usd is optional and computed when absent. Anthropic's native cache_creation_input_tokens
and cache_read_input_tokens are accepted too. Two optional field groups unlock extra tools:
| Field group | Unlocks |
|---|---|
| rate_limit_remaining_tokens, rate_limit_limit_tokens, rate_limit_reset_s (and *_requests) | predict_rate_limit |
| tool_name, tool_args_hash | detect_stuck_agent |
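The ts rules and the Anthropic field aliases above can be applied in a normalization pass before analysis. A minimal sketch, assuming a numeric ts above 10^11 is epoch milliseconds (a heuristic, not necessarily the one Scorchmark uses); `parse_ts` and `normalize` are illustrative names:

```python
from datetime import datetime, timezone

# Anthropic's native usage names mapped onto the schema's cache fields.
ALIASES = {
    "cache_creation_input_tokens": "cache_write_tokens",
    "cache_read_input_tokens": "cache_read_tokens",
}


def parse_ts(value):
    """Accept ISO-8601 (naive read as UTC), epoch seconds, or epoch milliseconds."""
    if isinstance(value, (int, float)):
        # Heuristic: values above 1e11 are too large to be realistic epoch seconds.
        seconds = value / 1000 if value > 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def normalize(row):
    """Rename aliased fields and convert ts to an aware UTC datetime."""
    out = {ALIASES.get(key, key): val for key, val in row.items()}
    out["ts"] = parse_ts(out["ts"])
    return out


row = normalize({"ts": "2026-06-21T03:00:00", "cache_creation_input_tokens": 800_000})
print(row["ts"].isoformat(), row["cache_write_tokens"])  # 2026-06-21T03:00:00+00:00 800000
```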
Alerting
Set SCORCHMARK_WEBHOOK_URL to a JSON webhook (Slack, ntfy, PagerDuty), then call
check_budget(..., alert=True) to POST a payload on warn or breach. Or call
build_alert_payload(result) on any tool's output and route it yourself.
The webhook URL is yours to supply; if you self-host this for others, validate/allowlist it (an attacker-controlled URL is an SSRF vector).
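One way to follow that advice when self-hosting is to check the webhook's scheme and host against a fixed allowlist before any request is built. A minimal standard-library sketch; `is_allowed_webhook` and `build_alert_request` are illustrative names, not Scorchmark functions, the allowed hosts are examples, and the {"text": ...} body is the shape Slack incoming webhooks accept, not necessarily what build_alert_payload produces:

```python
import json
import urllib.request
from urllib.parse import urlparse

# Example allowlist: only these hosts may receive alerts.
ALLOWED_WEBHOOK_HOSTS = {"hooks.slack.com", "ntfy.sh"}


def is_allowed_webhook(url):
    """True only for https URLs whose parsed hostname is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_WEBHOOK_HOSTS


def build_alert_request(url, status, message):
    """Build, but do not send, a JSON POST to an allowlisted webhook."""
    if not is_allowed_webhook(url):
        raise ValueError(f"webhook host not allowlisted: {url!r}")
    body = json.dumps({"text": f"[scorchmark] {status}: {message}"}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


# Sending is one more line:
# urllib.request.urlopen(build_alert_request(url, "warn", "burn $20/hr"), timeout=10)
```

Checking the parsed hostname rather than a string prefix rejects look-alikes such as hooks.slack.com.evil.example. Note that urllib follows HTTP redirects by default, so a stricter version would also refuse them.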
Pricing
| Tier | Price | For |
|---|---|---|
| Free | $0 | find the cache tax on your own logs — the solo dev who got burned once |
| Pro | $19/mo | unattended loops: cache-waste + burn-acceleration early warning, webhook alerting |
| Team | $49/mo | per-agent attribution, model-swap savings simulator, pricing-drift, audit-trail export |
Catch one runaway loop and it has paid for itself many times over — and the free tier alone catches the $6k cache-TTL pattern.
Unlocking Pro/Team. The paid tools (detect_spend_acceleration, cost_by_agent,
simulate_model_swap, detect_pricing_drift, and webhook alerting) unlock with a signed license
key, verified offline (Ed25519 — no phone-home, so the no-outbound-calls guarantee holds):
```sh
pip install 'scorchmark[pro]'  # adds the offline verifier
export SCORCHMARK_LICENSE=SCM1.....  # the key from your purchase
scorchmark license  # confirm → tier: TEAM · active
```
Buy via MCPize (managed billing) or direct Stripe — the full get-paid setup is in PAYMENTS.md. The free tier needs none of this and stays pure-stdlib.
Security
Read-only, local-only, no credential storage, no outbound calls from core logic. Core modules have zero runtime dependencies beyond the Python standard library. See SECURITY.md.
Cross-provider cache economics
The three providers price caching three different ways, and detect_cache_waste models each —
this is why a generic "cache miss" tool gets the dollars wrong.
| Provider | Cache model | Where the waste is | Same slow loop* |
|---|---|---|---|
| Anthropic | Write premium (write = 1.25× input, read = 0.1×) | Re-paying the write premium when the loop interval exceeds the TTL | $237 |
| OpenAI | Automatic, no write premium (read = 0.1× input) | A stable prefix that never cache-hits, losing the ~90% read discount | $46 (gpt-5) |
| Google Gemini | Read discount plus hourly storage ($4.50/M-tok/hr Pro) | Missed discount, and storage billed on idle explicit caches | $46 (2.5-pro) |
*Same 46-wake-up, 800k-context, 30-min loop, priced per provider. The catastrophic version is Anthropic-specific — the write premium is what turned that loop into a $6k bill. On OpenAI/Gemini the identical loop "only" forfeits the read discount, which the detector reports honestly as a smaller number.
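The table's multipliers reduce to one formula per provider: waste per miss = prefix tokens × input price × (miss multiplier − read multiplier). A minimal sketch, assuming the multipliers in the table, a 0.1× read multiplier for Gemini (the table does not state one), and a placeholder input price; `CACHE_MODEL` and `waste_per_miss` are illustrative names, and Gemini's storage fee is left out, as the Accuracy note says it is not priced per row:

```python
# Cost multipliers relative to each model's input price, per the table above.
# "miss" is what an uncached prefix costs; "read" is what a cache hit costs.
CACHE_MODEL = {
    "anthropic": {"miss": 1.25, "read": 0.10},  # miss = cache write at the premium
    "openai":    {"miss": 1.00, "read": 0.10},  # miss = plain input, no premium
    "google":    {"miss": 1.00, "read": 0.10},  # read 0.1x is assumed; storage not modeled
}


def waste_per_miss(provider, prefix_tokens, input_usd_per_mtok):
    """Extra cost of one cache miss versus the cache read it should have been."""
    m = CACHE_MODEL[provider]
    return prefix_tokens * input_usd_per_mtok / 1_000_000 * (m["miss"] - m["read"])


# Same placeholder price for all three, to isolate the effect of the write premium.
for provider in CACHE_MODEL:
    print(provider, round(waste_per_miss(provider, 800_000, 5.0), 2))
```

At an identical input price, Anthropic's waste per miss is 1.15 / 0.9, roughly 1.28 times that of the other two; any larger gap in the table's dollar column would come from the models' different input prices.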
Accuracy
Anthropic, OpenAI, and Google rates in pricing.py were re-verified on 2026-06-22 against each
provider's official pricing page (platform.claude.com, developers.openai.com/api/docs/pricing,
ai.google.dev/gemini-api/docs/pricing) — every row confirmed, and the current OpenAI tiers
(incl. gpt-5.4-mini / -nano) added. The tables are the standard context tier; very-large-context
pricing (OpenAI >272K, Gemini >200K) and Gemini's hourly cache-storage fee are noted but not priced
per row, since the cost log carries no context-tier or cache-lifetime field. detect_pricing_drift
exists because providers change rates without notice — run it regularly.
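The comparison step of detect_pricing_drift can be sketched as a diff between two rate snapshots. A minimal sketch, assuming each snapshot is a JSON object mapping model name to rate name to USD per million tokens; the snapshot format, `pricing_drift`, and example-model are all hypothetical:

```python
import json


def pricing_drift(old_snapshot, new_snapshot):
    """Return (model, rate, old, new) for every rate added, removed, or changed."""
    changes = []
    for model in sorted(old_snapshot.keys() | new_snapshot.keys()):
        old_rates = old_snapshot.get(model, {})
        new_rates = new_snapshot.get(model, {})
        for rate in sorted(old_rates.keys() | new_rates.keys()):
            if old_rates.get(rate) != new_rates.get(rate):
                changes.append((model, rate, old_rates.get(rate), new_rates.get(rate)))
    return changes


old = json.loads('{"example-model": {"input": 5.0, "cache_read": 0.5}}')
new = json.loads('{"example-model": {"input": 5.0, "cache_read": 0.625}}')
print(pricing_drift(old, new))  # [('example-model', 'cache_read', 0.5, 0.625)]
```

A silent TTL cut like the one in the story is not a rate, so it would only surface this way if the snapshot recorded TTL alongside the prices.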
License
Code: MIT (see LICENSE) — the entire source, including the gated tools, is free to read, fork, and modify. This is honest open-core, not DRM.
Pro/Team license keys are a separate commercial purchase: a signed key (verified offline, Ed25519) that activates the paid tools in official builds and funds the maintained, cross-provider pricing model. Buying a key supports the project and gets you the official tier — the MIT license means you could edit the gate out, but the key is what keeps the pricing data and the audit-trail tier maintained. Keys are per-purchaser and non-transferable; sold with no warranty (the MIT terms govern the software itself). Buy via MCPize (managed billing) or direct Stripe — see PAYMENTS.md.
Download files
File details
Details for the file scorchmark-0.6.0.tar.gz.
File metadata
- Download URL: scorchmark-0.6.0.tar.gz
- Upload date:
- Size: 49.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9abe2582a19a91b99bf4a3d4f8508b10320416b7dcf7147ca190badfa83cdec7 |
| MD5 | e6c710e3a581b1d1242144694d2c66cb |
| BLAKE2b-256 | ab88893d8bb1cd5a21ee19aa1d5180416eae8448bb263be1a9cd04dc3203fb3a |
File details
Details for the file scorchmark-0.6.0-py3-none-any.whl.
File metadata
- Download URL: scorchmark-0.6.0-py3-none-any.whl
- Upload date:
- Size: 39.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e16a1aac3059cb1dffa5711ddaec2cce3b9186c996856b8761cd79b1fbbf9569 |
| MD5 | ec41f34fecc30dd9d7ae366fe03af2f6 |
| BLAKE2b-256 | f33009978c17e4481624e62b2a4fb28d14d6666441a52d683c532ea85556ee5a |