Find the cache tax draining your AI bill: a cross-provider cache-TTL-waste detector + model-swap savings simulator + pricing-drift + per-agent attribution. MCP server + CLI.

Scorchmark

The scorch mark on your AI bill. Scorchmark finds the cache-rebuild waste, model-swap savings, and silent price hikes your provider dashboard won't show you — as an MCP tool + CLI.

CI · License: MIT · MCP server · tests: 66 passing · runtime deps: 0

[Demo: Scorchmark catching runaway cache-rebuild spend live]

Your provider dashboard tells you what you spent — a day late. It never tells you what you wasted: the cache rebuilds you re-paid for, the model that would have done the job for 40% less, the silent price hike. Scorchmark reads a cost log you already have and finds that waste — starting with the cache-TTL waste behind the documented $6,000 overnight burn, which no observability tool we surveyed detects.

It answers the questions the dashboard can't: where did the cache money go, which agent burned it, and what would a cheaper model have saved? Read-only and local — no proxy in your request path — as an MCP tool your agent can call mid-run, or a one-line CLI.

| ❌ Without Scorchmark | ✅ With it |
|---|---|
| You find out from the limit email, a day late | The agent sees warn at wake-up #2, mid-run |
| Cache-TTL waste silently eats up to 90% of spend | detect_cache_waste flags it and prices the loss |
| No idea which agent burned the money | cost_by_agent attributes every dollar |
| "Should I have used a cheaper model?" stays unknowable | simulate_model_swap gives the exact saved % |

Why this exists

Someone left Claude Code looping overnight to check PRs and woke up to a $6,000 bill. Not a bug in their code. A cache-TTL change (1 hour down to 5 minutes) meant every 30-minute wake-up rebuilt an 800k-token history at the cache write rate instead of the cheap cache read. The dashboard showed nothing for days. The first warning was the limit email, after the money was gone.

The post got 1,400+ upvotes because everyone running unattended agents felt the cold sweat. And there was no tripwire — every tool, including the provider's own dashboard, is retrospective.

Replaying that incident's shape through these tools:

ingested 46 wake-ups          total spend $265
detect_cache_waste            45 rebuilds, $237 wasted (90% of spend)
check_budget($50 cap)         first 'warn' at wake-up #2 (burn $20/hr)
simulate_model_swap→Sonnet    $265 would have been $159 (save 40%)

90% of that spend was avoidable, and a $50 cap would have tripped on the second wake-up, not the 46th. (Absolute dollars scale with your context size and loop length; the waste fraction and the early catch are the point.)
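
The rebuild count in that replay follows from one rule: a wake-up that arrives after the cache TTL has lapsed pays the write rate again. Below is a minimal sketch of that rule, not Scorchmark's actual detector; the function name `count_rebuilds` and the `min_write_tokens` threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def count_rebuilds(rows, ttl_seconds=300, min_write_tokens=100_000):
    """Count wake-ups that re-wrote a large cache after the TTL had lapsed.

    Hypothetical helper: the threshold and field handling are illustrative.
    """
    rebuilds = 0
    prev_ts = None
    for row in sorted(rows, key=lambda r: r["ts"]):
        if prev_ts is not None:
            gap = (row["ts"] - prev_ts).total_seconds()
            # A gap longer than the TTL plus a large cache write means the
            # history was rebuilt instead of being read from cache.
            if gap > ttl_seconds and row["cache_write_tokens"] >= min_write_tokens:
                rebuilds += 1
        prev_ts = row["ts"]
    return rebuilds

# 46 wake-ups, 30 minutes apart, each writing an 800k-token history.
start = datetime(2026, 6, 21, 3, 0, 0)
wakeups = [
    {"ts": start + timedelta(minutes=30 * i), "cache_write_tokens": 800_000}
    for i in range(46)
]

print(count_rebuilds(wakeups, ttl_seconds=300))   # 45: every wake-up after the first
print(count_rebuilds(wakeups, ttl_seconds=3600))  # 0: a 1-hour TTL covers the 30-minute gap
```

The first wake-up has no predecessor, so it is treated as a legitimate initial write rather than a rebuild.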

How it's different from Helicone / Langfuse / LangSmith

Those are good tools. They are also a different shape of thing.

| | Helicone | Langfuse / LangSmith | Scorchmark |
|---|---|---|---|
| Form factor | Proxy in your request path | SDK / OTel tracing + dashboard | MCP tool the agent calls |
| When you learn | Dashboard, after the call | Dashboard, after the run | In the loop, before the next call |
| Who acts on it | A human reading a chart | A human reading a trace | The agent itself |
| Cache-TTL waste | Not detected | Not detected | Detected and priced |
| In your critical path | Yes (all traffic routed) | No | No (read-only on your logs) |
| Setup | Swap base URL | Instrument SDK | Point it at a log file |

The wedge: those tools tell you what happened. Scorchmark tells the agent what's about to happen, in a form it can act on without a human in the loop. It adds nothing to your request path — it reads a cost log you already have.

And it is not a spend cap. Hard budget enforcement is a commodity now — Cloudflare AI Gateway, LiteLLM, and Portkey all block-before-the-call. Scorchmark does the part they don't: it tells you where the money leaked and what to change. Cap your spend with a gateway; find the cache tax with this. It is also not a full observability platform — if you want flame-graph traces and prompt evals, run Langfuse. Scorchmark is the cost-intelligence layer, and it composes fine with both.

Quickstart

The CLI core is pure stdlib, so installing it pulls in no dependencies. The MCP server adds one extra:

uv sync --extra mcp                                                   # MCP server deps (FastMCP)
uv run fastmcp run server.py                                          # stdio (Claude Desktop / Inspector)
uv run fastmcp run server.py --transport streamable-http --port 8000  # remote / MCPize

Add it to Claude Desktop / Cursor (claude_desktop_config.json or .cursor/mcp.json):

{
  "mcpServers": {
    "scorchmark": {
      "command": "uv",
      "args": ["run", "fastmcp", "run", "/path/to/scorchmark/server.py"]
    }
  }
}

Then, in the loop:

ingest_run(open("examples/sample_cost_log.jsonl").read())
check_budget(monthly_cap_usd=100, reset_day=1)   # ok | warn | breach, with ETA to the reset
detect_cache_waste()                             # the $6k pattern
simulate_model_swap(to_model="claude-haiku-4-5") # exact per-row savings, cross-provider OK
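
The idea behind a row-exact swap simulation can be sketched in a few lines: re-price every logged request at the target model's rates and compare totals. This is an illustration only, not the tool's implementation. The model names and per-million-token prices below are made up, and cache tokens are ignored to keep the sketch short.

```python
# Hypothetical prices in USD per million tokens; not real provider rates.
PRICES = {
    "model-big": {"input": 10.0, "output": 50.0},
    "model-small": {"input": 6.0, "output": 30.0},
}

def row_cost(row, model):
    """Price one logged request at the given model's rates."""
    p = PRICES[model]
    return (row["input_tokens"] * p["input"]
            + row["output_tokens"] * p["output"]) / 1_000_000

def sketch_model_swap(rows, to_model):
    """Compare what the rows cost with what they would cost on to_model."""
    actual = sum(row_cost(r, r["model"]) for r in rows)
    swapped = sum(row_cost(r, to_model) for r in rows)
    saved_pct = 0.0 if actual == 0 else 100.0 * (actual - swapped) / actual
    return {"actual_usd": actual, "swapped_usd": swapped, "saved_pct": saved_pct}

rows = [
    {"model": "model-big", "input_tokens": 2000, "output_tokens": 1500},
    {"model": "model-big", "input_tokens": 10000, "output_tokens": 500},
]
result = sketch_model_swap(rows, "model-small")
print(round(result["saved_pct"], 1))  # 40.0 with the made-up prices above
```

Pricing each row separately, rather than scaling the total, is what keeps the result exact when rows mix models or have very different input/output ratios.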

Try it in your terminal (no MCP client)

Run it on your own Claude Code usage — zero setup, a log you already have:

uvx --from scorchmark scorchmark report --claude-code
# reads ~/.claude/projects/**/*.jsonl directly and prices every request

That command adapts Claude Code's session transcripts automatically, with no reformatting needed. Or point it at any cost log:

uvx --from scorchmark scorchmark report mylog.jsonl --cap 100   # auto-detects the format
uv run scorchmark report examples/sample_cost_log.jsonl --cap 50          # from a clone
cat mylog.jsonl | uv run scorchmark swap - --to claude-haiku-4-5          # reads stdin

Subcommands: report (all checks), budget, cache-waste, by-agent, anomalies, swap. Add --json for the raw result, --from {auto,scorchmark,claude-code} to force a format, or point the log argument at a directory of .jsonl files. Same engine as the MCP server.

See it run

Real output from examples/sample_cost_log.jsonl — the live warn, the cache-waste dollars, and the per-agent breakdown the provider dashboard never shows you (these are actual tool results, not mockups):

[Screenshot: Live budget tripwire, warning mid-run]

[Screenshot: Cache-TTL waste detection]

[Screenshot: Per-agent attribution]

Tools

Core

| Tool | What it does |
|---|---|
| ingest_run | Load a cost-log JSONL. Computes cost from the bundled pricing model when absent. |
| check_budget | Live tripwire: spend, burn rate, projected spend to the next reset, ok/warn/breach, ETA to cap. |
| find_spend_anomalies | Flags requests costing N× the agent's median, the loop-spike signature. |
| detect_cache_waste | Detects cache waste, modeling each provider's real economics (see below) and pricing it. |
| cost_by_agent | Per-agent attribution: cost, share, requests, average per request. |
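
The "N× the agent's median" rule is easy to picture with a short sketch. It assumes each row already carries `cost_usd`; the function name and the default multiple of 5 are illustrative choices, not values taken from Scorchmark.

```python
from collections import defaultdict
from statistics import median

def sketch_spend_anomalies(rows, multiple=5.0):
    """Return request_ids whose cost is at least `multiple` times the agent's median."""
    costs_by_agent = defaultdict(list)
    for r in rows:
        costs_by_agent[r["agent_id"]].append(r["cost_usd"])

    flagged = []
    for r in rows:
        med = median(costs_by_agent[r["agent_id"]])
        # Skip agents whose median is zero so the comparison stays meaningful.
        if med > 0 and r["cost_usd"] >= multiple * med:
            flagged.append(r["request_id"])
    return flagged

rows = [
    {"request_id": "r1", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r2", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r3", "agent_id": "pr-loop", "cost_usd": 0.12},
    {"request_id": "r4", "agent_id": "pr-loop", "cost_usd": 2.00},
    {"request_id": "r5", "agent_id": "triage", "cost_usd": 1.00},
    {"request_id": "r6", "agent_id": "triage", "cost_usd": 1.10},
]
print(sketch_spend_anomalies(rows))  # ['r4']
```

Comparing against a per-agent median rather than a global one keeps an expensive but steady agent from masking a spike in a cheap one.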

Edge (not offered by any tool we surveyed)

| Tool | What it does |
|---|---|
| detect_spend_acceleration | Flags a burn rate that doubles across consecutive windows (runaway context growth). |
| simulate_model_swap | Recomputes every past request at another model's price for the row-exact savings. Cross-provider. |
| detect_pricing_drift | Snapshots provider rates and surfaces any silent change, the root cause of the $6k burn. |
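
A burn rate that "doubles across consecutive windows" can be checked by bucketing spend into fixed windows and looking for a streak of doublings. The sketch below assumes `ts` is already epoch seconds and `cost_usd` is present; the window size, factor, and streak length are hypothetical defaults, not the tool's.

```python
from collections import defaultdict

def spend_per_window(rows, window_s=3600):
    """Sum cost into fixed-size time windows, filling empty windows with 0."""
    buckets = defaultdict(float)
    for r in rows:
        buckets[int(r["ts"] // window_s)] += r["cost_usd"]
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [buckets.get(i, 0.0) for i in range(first, last + 1)]

def is_accelerating(window_spend, factor=2.0, streak=2):
    """True if spend grew by at least `factor` for `streak` windows in a row."""
    run = 0
    for prev, cur in zip(window_spend, window_spend[1:]):
        run = run + 1 if prev > 0 and cur >= factor * prev else 0
        if run >= streak:
            return True
    return False

rows = [
    {"ts": 0, "cost_usd": 1.0},
    {"ts": 3600, "cost_usd": 2.0},
    {"ts": 7200, "cost_usd": 2.5},
    {"ts": 7300, "cost_usd": 2.0},
]
windows = spend_per_window(rows)
print(windows)                   # [1.0, 2.0, 4.5]
print(is_accelerating(windows))  # True: 1.0 -> 2.0 -> 4.5
```

Requiring a streak rather than a single doubling avoids flagging one legitimately expensive hour.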

Match (parity with the heavy gateways)

| Tool | What it does |
|---|---|
| predict_rate_limit | Projects ETA to a 429 from rate-limit headers, per dimension. |
| detect_stuck_agent | Flags an agent repeating the same tool call, the stuck-loop signature. |
| build_alert_payload | Turns any result into a Slack, ntfy, or PagerDuty webhook payload. |
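
The stuck-loop signature amounts to the same `(tool_name, tool_args_hash)` pair appearing several times in a row for one agent. Here is a minimal sketch under that reading; the function name and the threshold of 3 are assumptions, and rows are assumed to be in chronological order.

```python
def sketch_stuck_agents(rows, threshold=3):
    """Return agents that made the same tool call `threshold` times in a row."""
    last_call = {}  # agent_id -> (tool_name, tool_args_hash) of the previous call
    run_len = {}    # agent_id -> length of the current run of identical calls
    stuck = set()
    for r in rows:
        agent = r["agent_id"]
        call = (r["tool_name"], r["tool_args_hash"])
        if last_call.get(agent) == call:
            run_len[agent] += 1
        else:
            last_call[agent] = call
            run_len[agent] = 1
        if run_len[agent] >= threshold:
            stuck.add(agent)
    return sorted(stuck)

rows = [
    {"agent_id": "pr-loop", "tool_name": "get_pr", "tool_args_hash": "h1"},
    {"agent_id": "triage", "tool_name": "get_pr", "tool_args_hash": "h1"},
    {"agent_id": "pr-loop", "tool_name": "get_pr", "tool_args_hash": "h1"},
    {"agent_id": "triage", "tool_name": "get_pr", "tool_args_hash": "h2"},
    {"agent_id": "pr-loop", "tool_name": "get_pr", "tool_args_hash": "h1"},
    {"agent_id": "triage", "tool_name": "get_pr", "tool_args_hash": "h1"},
]
print(sketch_stuck_agents(rows))  # ['pr-loop']
```

Hashing the arguments lets the check compare calls without storing or exposing the arguments themselves.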

Resource scorchmark://pricing/current exposes the curated cross-provider pricing model.

Log format

One JSON object per line — the de-facto schema the cost trackers, and Claude's own usage fields, already emit:

{"request_id": "r1", "ts": "2026-06-21T03:00:00Z", "provider": "anthropic",
 "model": "claude-opus-4-8", "agent_id": "pr-loop", "input_tokens": 2000,
 "output_tokens": 1500, "cache_write_tokens": 800000, "cache_read_tokens": 0}

ts accepts ISO-8601 (naive timestamps are read as UTC), epoch seconds, or epoch milliseconds. cost_usd is optional and computed when absent. Anthropic's native cache_creation_input_tokens and cache_read_input_tokens are accepted too. Two optional field groups unlock extra tools:

| Field group | Unlocks |
|---|---|
| rate_limit_remaining_tokens, rate_limit_limit_tokens, rate_limit_reset_s (and *_requests) | predict_rate_limit |
| tool_name, tool_args_hash | detect_stuck_agent |
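
If you preprocess logs yourself, the three accepted `ts` shapes can be normalized to timezone-aware UTC roughly as follows. This is a sketch, not Scorchmark's parser: the `parse_ts` name is made up, and the cutoff used to tell epoch seconds from epoch milliseconds is a heuristic assumption.

```python
from datetime import datetime, timezone

def parse_ts(value):
    """Normalize ISO-8601 text, epoch seconds, or epoch milliseconds to UTC."""
    if isinstance(value, (int, float)):
        # Heuristic (assumption): values above 1e11 are treated as milliseconds.
        seconds = value / 1000.0 if value > 1e11 else float(value)
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions of fromisoformat() do not accept a trailing "Z".
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # Naive timestamps are read as UTC, matching the rule stated above.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

expected = datetime(2026, 6, 21, 3, 0, 0, tzinfo=timezone.utc)
epoch_s = int(expected.timestamp())

print(parse_ts("2026-06-21T03:00:00Z") == expected)  # True
print(parse_ts("2026-06-21T03:00:00") == expected)   # True
print(parse_ts(epoch_s) == expected)                 # True
print(parse_ts(epoch_s * 1000) == expected)          # True
```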

Alerting

Set SCORCHMARK_WEBHOOK_URL to a JSON webhook (Slack, ntfy, PagerDuty), then call check_budget(..., alert=True) to POST a payload on warn or breach. Or call build_alert_payload(result) on any tool's output and route it yourself.

The webhook URL is yours to supply; if you self-host this for others, validate/allowlist it (an attacker-controlled URL is an SSRF vector).
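
One way to do that validation when self-hosting is a strict host allowlist applied before any POST. The sketch below is an assumption about how you might wire it, not part of Scorchmark; the hosts listed are examples and should be replaced with the endpoints you actually trust.

```python
from urllib.parse import urlsplit

# Example allowlist (assumption): replace with the hosts you actually use.
ALLOWED_WEBHOOK_HOSTS = {"hooks.slack.com", "ntfy.sh", "events.pagerduty.com"}

def is_allowed_webhook(url, allowed_hosts=ALLOWED_WEBHOOK_HOSTS):
    """Accept only https URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme != "https" or host is None:
        return False
    # Reject userinfo tricks such as https://hooks.slack.com@evil.example/
    if parts.username or parts.password:
        return False
    return host.lower() in allowed_hosts

print(is_allowed_webhook("https://hooks.slack.com/services/T000/B000/XXX"))  # True
print(is_allowed_webhook("http://hooks.slack.com/services/T000/B000/XXX"))   # False
print(is_allowed_webhook("https://169.254.169.254/latest/meta-data"))        # False
print(is_allowed_webhook("https://hooks.slack.com@evil.example/x"))          # False
```

An exact-match allowlist is deliberately stricter than a blocklist of private IP ranges. It does not cover redirects, so the HTTP client that sends the POST should also refuse to follow them.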

Pricing

| Tier | Price | For |
|---|---|---|
| Free | $0 | find the cache tax on your own logs (the solo dev who got burned once) |
| Pro | $19/mo | unattended loops: cache-waste + burn-acceleration early warning, webhook alerting |
| Team | $49/mo | per-agent attribution, model-swap savings simulator, pricing-drift, audit-trail export |

Catch one runaway loop and it has paid for itself many times over — and the free tier alone catches the $6k cache-TTL pattern.

Unlocking Pro/Team. The paid tools (detect_spend_acceleration, cost_by_agent, simulate_model_swap, detect_pricing_drift, and webhook alerting) unlock with a signed license key, verified offline (Ed25519 — no phone-home, so the no-outbound-calls guarantee holds):

pip install 'scorchmark[pro]'      # adds the offline verifier
export SCORCHMARK_LICENSE=SCM1.....       # the key from your purchase
scorchmark license                        # confirm → tier: TEAM · active

Buy via MCPize (managed billing) or direct Stripe; the full payment setup is in PAYMENTS.md. The free tier needs none of this and stays pure-stdlib.

Security

Read-only, local-only, no credential storage, no outbound calls from core logic. Core modules have zero runtime dependencies beyond the Python standard library. See SECURITY.md.

Cross-provider cache economics

The three providers price caching three different ways, and detect_cache_waste models each — this is why a generic "cache miss" tool gets the dollars wrong.

| Provider | Cache model | Where the waste is | Same slow loop* |
|---|---|---|---|
| Anthropic | Write premium (write = 1.25× input, read = 0.1×) | Re-paying the write premium when the loop interval exceeds the TTL | $237 |
| OpenAI | Automatic, no write premium (read = 0.1× input) | A stable prefix that never cache-hits, losing the ~90% read discount | $46 (gpt-5) |
| Google Gemini | Read discount plus hourly storage ($4.50/M-tok/hr Pro) | Missed discount, and storage billed on idle explicit caches | $46 (2.5-pro) |

*Same 46-wake-up, 800k-context, 30-min loop, priced per provider. The catastrophic version is Anthropic-specific — the write premium is what turned that loop into a $6k bill. On OpenAI/Gemini the identical loop "only" forfeits the read discount, which the detector reports honestly as a smaller number.
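
The difference between the two cost shapes can be worked through with the multipliers from the table. The sketch below assumes waste means "what the miss cost, minus what a cache hit would have cost", and uses a made-up input rate of $10 per million tokens for both cases, so the outputs illustrate the mechanism rather than reproduce the dollar figures above. Gemini's hourly storage fee is left out.

```python
def miss_waste_usd(tokens, input_per_mtok, miss_mult, read_mult):
    """Cost of one cache miss beyond what a cache hit would have cost.

    miss_mult is what the provider bills on a miss relative to plain input:
    1.25 where writes carry a premium, 1.0 where caching is automatic.
    """
    paid = tokens * input_per_mtok * miss_mult / 1_000_000
    ideal = tokens * input_per_mtok * read_mult / 1_000_000
    return paid - ideal

TOKENS = 800_000
INPUT_RATE = 10.0  # hypothetical USD per million input tokens

# Write-premium model: a miss re-pays 1.25x input instead of 0.1x.
write_premium = miss_waste_usd(TOKENS, INPUT_RATE, miss_mult=1.25, read_mult=0.1)

# Automatic model: a miss pays plain input (1.0x) instead of 0.1x.
missed_discount = miss_waste_usd(TOKENS, INPUT_RATE, miss_mult=1.0, read_mult=0.1)

print(round(write_premium, 2))    # 9.2 per rebuild
print(round(missed_discount, 2))  # 7.2 per missed hit
```

At equal base rates the write premium adds a quarter of the input price to every miss; the larger gap in the table also reflects the different per-model input rates.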

Accuracy

Anthropic, OpenAI, and Google rates in pricing.py were re-verified on 2026-06-22 against each provider's official pricing page (platform.claude.com, developers.openai.com/api/docs/pricing, ai.google.dev/gemini-api/docs/pricing) — every row confirmed, and the current OpenAI tiers (incl. gpt-5.4-mini / -nano) added. The tables are the standard context tier; very-large-context pricing (OpenAI >272K, Gemini >200K) and Gemini's hourly cache-storage fee are noted but not priced per row, since the cost log carries no context-tier or cache-lifetime field. detect_pricing_drift exists because providers change rates without notice — run it regularly.

License

Code: MIT (see LICENSE) — the entire source, including the gated tools, is free to read, fork, and modify. This is honest open-core, not DRM.

Pro/Team license keys are a separate commercial purchase: a signed key (verified offline, Ed25519) that activates the paid tools in official builds and funds the maintained, cross-provider pricing model. Buying a key supports the project and gets you the official tier — the MIT license means you could edit the gate out, but the key is what keeps the pricing data and the audit-trail tier maintained. Keys are per-purchaser and non-transferable; sold with no warranty (the MIT terms govern the software itself). Buy via MCPize (managed billing) or direct Stripe — see PAYMENTS.md.
