scorchmark

Find the cache tax draining your AI bill: a cross-provider cache-TTL-waste detector + model-swap savings simulator + pricing-drift + per-agent attribution. MCP server + CLI.

Scorchmark

The scorch mark on your AI bill. Scorchmark finds the cache-rebuild waste, model-swap savings, and silent price hikes your provider dashboard won't show you — as an MCP tool + CLI.

License: MIT | MCP server | tests: 66 passing | runtime deps: 0

Scorchmark catching runaway cache-rebuild spend live

Your provider dashboard tells you what you spent — a day late. It never tells you what you wasted: the cache rebuilds you re-paid for, the model that would have done the job for 40% less, the silent price hike. Scorchmark reads a cost log you already have and finds that waste — starting with the cache-TTL waste behind the documented $6,000 overnight burn, which no observability tool we surveyed detects.

It answers the questions the dashboard can't: where did the cache money go, which agent burned it, and what would a cheaper model have saved? Read-only and local — no proxy in your request path — as an MCP tool your agent can call mid-run, or a one-line CLI.

❌ Without Scorchmark	✅ With it
You find out from the limit email, a day late	The agent sees `warn` at wake-up #2, mid-run
Cache-TTL waste silently eats up to 90% of spend	`detect_cache_waste` flags it and prices the loss
No idea which agent burned the money	`cost_by_agent` attributes every dollar
"Should I have used a cheaper model?" stays unknowable	`simulate_model_swap` gives the exact saved %

Why this exists

Someone left Claude Code looping overnight to check PRs and woke up to a $6,000 bill. Not a bug in their code. A cache-TTL change (1 hour down to 5 minutes) meant every 30-minute wake-up rebuilt an 800k-token history at the cache write rate instead of the cheap cache read. The dashboard showed nothing for days. The first warning was the limit email, after the money was gone.

The post got 1,400+ upvotes because everyone running unattended agents felt the cold sweat. And there was no tripwire — every tool, including the provider's own dashboard, is retrospective.

Replaying that incident's shape through these tools:

ingested 46 wake-ups          total spend $265
detect_cache_waste            45 rebuilds, $237 wasted (90% of spend)
check_budget($50 cap)         first 'warn' at wake-up #2 (burn $20/hr)
simulate_model_swap→Sonnet    $265 would have been $159 (save 40%)

90% of that spend was avoidable, and a $50 cap would have tripped on the second wake-up, not the 46th. (Absolute dollars scale with your context size and loop length; the waste fraction and the early catch are the point.)
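The rebuild count in that replay can be reproduced with a few lines. This is a minimal sketch of the idea, not Scorchmark's detector: `count_rebuilds` is a hypothetical helper, `ts` is assumed to be epoch seconds, and the 300-second default mirrors the 5-minute TTL from the incident.

```python
def count_rebuilds(rows, ttl_s=300):
    """Count wake-ups that re-wrote the cache after the TTL had lapsed.

    Assumes each row carries `ts` in epoch seconds plus the
    cache_write_tokens / cache_read_tokens fields from the log format.
    """
    rebuilds = 0
    prev_ts = None
    for row in sorted(rows, key=lambda r: r["ts"]):
        # The cache can only have expired if there was an earlier request.
        expired = prev_ts is not None and row["ts"] - prev_ts > ttl_s
        rewrote = (row.get("cache_write_tokens", 0) > 0
                   and row.get("cache_read_tokens", 0) == 0)
        if expired and rewrote:
            rebuilds += 1
        prev_ts = row["ts"]
    return rebuilds


# 46 wake-ups, 30 minutes apart, each re-writing an 800k-token prefix.
wakeups = [
    {"ts": i * 1800, "cache_write_tokens": 800_000, "cache_read_tokens": 0}
    for i in range(46)
]
rebuilds_at_5_min_ttl = count_rebuilds(wakeups, ttl_s=300)
```

The first wake-up is a legitimate cache write, so 46 wake-ups yield 45 rebuilds, which matches the replay. With `ttl_s=3600` the same 30-minute gaps no longer count as expired.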

How it's different from Helicone / Langfuse / LangSmith

Those are good tools. They are also a different shape of thing.

	Helicone	Langfuse / LangSmith	Scorchmark
Form factor	Proxy in your request path	SDK / OTel tracing + dashboard	MCP tool the agent calls
When you learn	Dashboard, after the call	Dashboard, after the run	In the loop, before the next call
Who acts on it	A human reading a chart	A human reading a trace	The agent itself
Cache-TTL waste	Not detected	Not detected	Detected and priced
In your critical path	Yes (all traffic routed)	No	No (read-only on your logs)
Setup	Swap base URL	Instrument SDK	Point it at a log file

The wedge: those tools tell you what happened. Scorchmark tells the agent what's about to happen, in a form it can act on without a human in the loop. It adds nothing to your request path — it reads a cost log you already have.

And it is not a spend cap. Hard budget enforcement is a commodity now — Cloudflare AI Gateway, LiteLLM, and Portkey all block-before-the-call. Scorchmark does the part they don't: it tells you where the money leaked and what to change. Cap your spend with a gateway; find the cache tax with this. It is also not a full observability platform — if you want flame-graph traces and prompt evals, run Langfuse. Scorchmark is the cost-intelligence layer, and it composes fine with both.

Quickstart

The CLI core is pure standard library, so installing it pulls in no dependencies. The MCP server needs one optional extra:

uv sync --extra mcp                                                   # MCP server deps (FastMCP)
uv run fastmcp run server.py                                          # stdio (Claude Desktop / Inspector)
uv run fastmcp run server.py --transport streamable-http --port 8000  # remote / MCPize

Add it to Claude Desktop / Cursor (claude_desktop_config.json or .cursor/mcp.json):

{
  "mcpServers": {
    "scorchmark": {
      "command": "uv",
      "args": ["run", "fastmcp", "run", "/path/to/scorchmark/server.py"]
    }
  }
}

Then, in the loop:

ingest_run(open("examples/sample_cost_log.jsonl").read())
check_budget(monthly_cap_usd=100, reset_day=1)   # ok | warn | breach, with ETA to the reset
detect_cache_waste()                             # the $6k pattern
simulate_model_swap(to_model="claude-haiku-4-5") # exact per-row savings, cross-provider OK
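To make the `ok | warn | breach` result concrete, here is a rough sketch of how a tripwire of this kind could decide. It is an assumption about the general shape, not the logic inside `check_budget`: the `budget_status` function, the `BudgetStatus` fields, and the rule "warn when the projected spend at reset reaches the cap" are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetStatus:
    status: str                     # "ok", "warn", or "breach"
    burn_usd_per_hr: float
    projected_usd: float            # spend projected to the next reset
    eta_to_cap_hr: Optional[float]  # None when nothing is being spent


def budget_status(spend_usd, elapsed_hr, hours_to_reset, cap_usd):
    """Classify current spend against a cap (illustrative thresholds)."""
    burn = spend_usd / elapsed_hr if elapsed_hr > 0 else 0.0
    projected = spend_usd + burn * hours_to_reset
    if spend_usd >= cap_usd:
        return BudgetStatus("breach", burn, projected, 0.0)
    eta = (cap_usd - spend_usd) / burn if burn > 0 else None
    status = "warn" if projected >= cap_usd else "ok"
    return BudgetStatus(status, burn, projected, eta)


# $10 spent in the first half hour against a $50 cap, 24 hours to reset.
early = budget_status(spend_usd=10.0, elapsed_hr=0.5,
                      hours_to_reset=24.0, cap_usd=50.0)
```

Here `early.status` is `"warn"` with a burn of $20/hr and two hours left before the cap, even though only a fifth of the budget is gone. That is the point of projecting forward instead of waiting for the breach.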

Try it in your terminal (no MCP client)

Run it on your own Claude Code usage — zero setup, a log you already have:

uvx --from scorchmark scorchmark report --claude-code
# reads ~/.claude/projects/**/*.jsonl directly and prices every request

That auto-adapts Claude Code's session transcripts — no reformatting. Or point it at any cost log:

uvx --from scorchmark scorchmark report mylog.jsonl --cap 100   # auto-detects the format
uv run scorchmark report examples/sample_cost_log.jsonl --cap 50          # from a clone
cat mylog.jsonl | uv run scorchmark swap - --to claude-haiku-4-5          # reads stdin

Subcommands: report (all checks), budget, cache-waste, by-agent, anomalies, swap. Add --json for the raw result, --from {auto,scorchmark,claude-code} to force a format, or point the log argument at a directory of .jsonl files. Same engine as the MCP server.

See it run

Real output from examples/sample_cost_log.jsonl — the live warn, the cache-waste dollars, and the per-agent breakdown the provider dashboard never shows you (these are actual tool results, not mockups):

Live budget tripwire — warns mid-run

Cache-TTL waste detection

Per-agent attribution

Tools

Core

Tool	What it does
`ingest_run`	Load a cost-log JSONL. Computes cost from the bundled pricing model when absent.
`check_budget`	Live tripwire: spend, burn rate, projected spend to the next reset, `ok`/`warn`/`breach`, ETA to cap.
`find_spend_anomalies`	Flags requests costing N× the agent's median — the loop-spike signature.
`detect_cache_waste`	Detects cache waste, modeling each provider's real economics (see below) and pricing it.
`cost_by_agent`	Per-agent attribution: cost, share, requests, average per request.
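The "N× the agent's median" rule behind `find_spend_anomalies` is simple enough to sketch. The helper below is hypothetical, the default factor of 5 is an arbitrary illustration, and rows are assumed to already carry `cost_usd`.

```python
from collections import defaultdict
from statistics import median


def spend_anomalies(rows, factor=5.0):
    """Return rows costing at least `factor` times their agent's median."""
    costs_by_agent = defaultdict(list)
    for row in rows:
        costs_by_agent[row["agent_id"]].append(row["cost_usd"])
    medians = {agent: median(costs) for agent, costs in costs_by_agent.items()}
    return [
        row for row in rows
        if medians[row["agent_id"]] > 0
        and row["cost_usd"] >= factor * medians[row["agent_id"]]
    ]


rows = [
    {"request_id": "r1", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r2", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r3", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r4", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r5", "agent_id": "pr-loop", "cost_usd": 2.00},
]
flagged = spend_anomalies(rows)
```

Using the per-agent median rather than a global mean keeps one expensive agent from masking a spike in a cheap one.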

Edge (not offered by any tool we surveyed)

Tool	What it does
`detect_spend_acceleration`	Flags a burn rate that doubles across consecutive windows (runaway context growth).
`simulate_model_swap`	Recomputes every past request at another model's price for the row-exact savings. Cross-provider.
`detect_pricing_drift`	Snapshots provider rates and surfaces any silent change — the root cause of the $6k burn.
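The row-by-row repricing that `simulate_model_swap` describes can be illustrated as follows. This is a sketch under loud assumptions: the model names and per-million-token rates in `PRICES` are made-up placeholders, not real provider prices, and `simulate_swap` is a hypothetical stand-in for the real tool.

```python
# Hypothetical USD per million tokens. NOT real provider rates.
PRICES = {
    "model-big": {"input": 10.0, "output": 50.0,
                  "cache_write": 12.5, "cache_read": 1.0},
    "model-small": {"input": 2.0, "output": 10.0,
                    "cache_write": 2.5, "cache_read": 0.2},
}

# Log field -> rate key, following the log format's token fields.
TOKEN_FIELDS = {
    "input_tokens": "input",
    "output_tokens": "output",
    "cache_write_tokens": "cache_write",
    "cache_read_tokens": "cache_read",
}


def row_cost(row, model):
    """Price one logged request at the given model's rates."""
    rates = PRICES[model]
    total = sum(row.get(field, 0) * rates[kind]
                for field, kind in TOKEN_FIELDS.items())
    return total / 1_000_000


def simulate_swap(rows, to_model):
    """Reprice every row at `to_model` and report the saving."""
    actual = sum(row_cost(r, r["model"]) for r in rows)
    swapped = sum(row_cost(r, to_model) for r in rows)
    saved_pct = 100 * (actual - swapped) / actual if actual else 0.0
    return {"actual_usd": actual, "swapped_usd": swapped,
            "saved_pct": saved_pct}


history = [{"model": "model-big", "input_tokens": 1_000_000,
            "output_tokens": 100_000}]
result = simulate_swap(history, to_model="model-small")
```

Because every row is repriced with its own token mix, the saving reflects the actual workload rather than a headline price ratio.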

Match (parity with the heavy gateways)

Tool	What it does
`predict_rate_limit`	Projects ETA to a 429 from rate-limit headers, per dimension.
`detect_stuck_agent`	Flags an agent repeating the same tool call — the stuck-loop signature.
`build_alert_payload`	Turns any result into a Slack, ntfy, or PagerDuty webhook payload.
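As an illustration of the stuck-loop signature, the sketch below counts consecutive identical tool calls per agent using the `tool_name` and `tool_args_hash` fields from the log format. The function name and the threshold of three repeats are assumptions, not the behavior of `detect_stuck_agent`.

```python
def stuck_agents(rows, min_repeats=3):
    """Map agent_id -> longest run of identical consecutive tool calls,
    for agents whose run reaches `min_repeats`."""
    last = {}      # agent_id -> (signature, current run length)
    flagged = {}
    for row in rows:
        if "tool_name" not in row:
            continue  # row has no tool-call fields
        signature = (row["tool_name"], row.get("tool_args_hash"))
        agent = row["agent_id"]
        prev_signature, run = last.get(agent, (None, 0))
        run = run + 1 if signature == prev_signature else 1
        last[agent] = (signature, run)
        if run >= min_repeats:
            flagged[agent] = max(flagged.get(agent, 0), run)
    return flagged


calls = [
    {"agent_id": "a", "tool_name": "read_file", "tool_args_hash": "h1"},
    {"agent_id": "b", "tool_name": "read_file", "tool_args_hash": "h1"},
    {"agent_id": "a", "tool_name": "read_file", "tool_args_hash": "h1"},
    {"agent_id": "b", "tool_name": "grep", "tool_args_hash": "h2"},
    {"agent_id": "a", "tool_name": "read_file", "tool_args_hash": "h1"},
    {"agent_id": "b", "tool_name": "read_file", "tool_args_hash": "h1"},
]
stuck = stuck_agents(calls)
```

Agent `a` repeats the same call three times in a row and is flagged; agent `b` alternates tools and is not.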

Resource scorchmark://pricing/current exposes the curated cross-provider pricing model.

Log format

One JSON object per line, following the de facto schema that cost trackers and Claude's own usage fields already emit:

{"request_id": "r1", "ts": "2026-06-21T03:00:00Z", "provider": "anthropic",
 "model": "claude-opus-4-8", "agent_id": "pr-loop", "input_tokens": 2000,
 "output_tokens": 1500, "cache_write_tokens": 800000, "cache_read_tokens": 0}

ts accepts ISO-8601 (naive timestamps are read as UTC), epoch seconds, or epoch milliseconds. cost_usd is optional and computed when absent. Anthropic's native cache_creation_input_tokens and cache_read_input_tokens are accepted too. Two optional field groups unlock extra tools:

Field group	Unlocks
`rate_limit_remaining_tokens`, `rate_limit_limit_tokens`, `rate_limit_reset_s` (and `*_requests`)	`predict_rate_limit`
`tool_name`, `tool_args_hash`	`detect_stuck_agent`
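The three accepted `ts` shapes can be normalized along these lines. This is a sketch, not the parser Scorchmark ships: `parse_ts` is a hypothetical helper, and the cutoff that treats numbers above 1e11 as milliseconds is an assumed heuristic.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Normalize ISO-8601 text, epoch seconds, or epoch milliseconds to UTC."""
    if isinstance(value, (int, float)):
        # Assumed heuristic: values this large cannot be plausible seconds.
        seconds = value / 1000 if value > 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    text = value.strip()
    if text.endswith("Z"):
        # fromisoformat() only accepts a trailing "Z" on Python 3.11+.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # naive means UTC
    return parsed.astimezone(timezone.utc)


expected = datetime(2026, 6, 21, 3, 0, tzinfo=timezone.utc)
epoch_s = int(expected.timestamp())
same_instant = [
    parse_ts("2026-06-21T03:00:00Z"),
    parse_ts("2026-06-21T03:00:00"),
    parse_ts(epoch_s),
    parse_ts(epoch_s * 1000),
]
```

All four inputs resolve to the same UTC instant, so downstream gap and burn-rate calculations never mix time zones or units.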

Alerting

Set SCORCHMARK_WEBHOOK_URL to a JSON webhook (Slack, ntfy, PagerDuty), then call check_budget(..., alert=True) to POST a payload on warn or breach. Or call build_alert_payload(result) on any tool's output and route it yourself.

The webhook URL is yours to supply; if you self-host this for others, validate/allowlist it (an attacker-controlled URL is an SSRF vector).
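If you do self-host, an allowlist check before any POST might look like the sketch below. It uses only the standard library; the helper names are hypothetical and the hosts in `ALLOWED_WEBHOOK_HOSTS` are example values to replace with the ones you trust.

```python
import json
import urllib.request
from urllib.parse import urlsplit

# Example allowlist only; replace with the hosts you actually trust.
ALLOWED_WEBHOOK_HOSTS = {"hooks.slack.com", "ntfy.sh", "events.pagerduty.com"}


def validate_webhook_url(url, allowed_hosts=ALLOWED_WEBHOOK_HOSTS):
    """Reject anything that is not https to an allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("webhook URL must use https")
    if parts.hostname not in allowed_hosts:
        raise ValueError(f"webhook host not allowlisted: {parts.hostname!r}")
    return url


def build_webhook_request(url, payload):
    """Build (but do not send) a JSON POST to a validated webhook URL."""
    return urllib.request.Request(
        validate_webhook_url(url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alert(url, payload, timeout_s=5.0):
    """Send the alert and return the HTTP status code."""
    request = build_webhook_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

Checking the host up front blocks the obvious cases, such as a URL pointing at a cloud metadata address. It does not cover redirects, which `urllib` follows by default, so a stricter deployment would also restrict those.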

Pricing

Tier	Price	For
Free	$0	find the cache tax on your own logs — the solo dev who got burned once
Pro	$19/mo	unattended loops: cache-waste + burn-acceleration early warning, webhook alerting
Team	$49/mo	per-agent attribution, model-swap savings simulator, pricing-drift, audit-trail export

Catch one runaway loop and it has paid for itself many times over — and the free tier alone catches the $6k cache-TTL pattern.

Unlocking Pro/Team. The paid tools (detect_spend_acceleration, cost_by_agent, simulate_model_swap, detect_pricing_drift, and webhook alerting) unlock with a signed license key, verified offline (Ed25519 — no phone-home, so the no-outbound-calls guarantee holds):

pip install 'scorchmark[pro]'      # adds the offline verifier
export SCORCHMARK_LICENSE=SCM1.....       # the key from your purchase
scorchmark license                        # confirm → tier: TEAM · active

Buy via MCPize (managed billing) or direct Stripe — the full get-paid setup is in PAYMENTS.md. The free tier needs none of this and stays pure-stdlib.

Security

Read-only, local-only, no credential storage, no outbound calls from core logic. Core modules have zero runtime dependencies beyond the Python standard library. See SECURITY.md.

Cross-provider cache economics

The three providers price caching three different ways, and detect_cache_waste models each — this is why a generic "cache miss" tool gets the dollars wrong.

Provider	Cache model	Where the waste is	Same slow loop*
Anthropic	Write premium (write = 1.25× input, read = 0.1×)	Re-paying the write premium when the loop interval exceeds the TTL	$237
OpenAI	Automatic, no write premium (read = 0.1× input)	A stable prefix that never cache-hits, losing the ~90% read discount	$46 (gpt-5)
Google Gemini	Read discount plus hourly storage ($4.50/M-tok/hr Pro)	Missed discount, and storage billed on idle explicit caches	$46 (2.5-pro)

*Same 46-wake-up, 800k-context, 30-min loop, priced per provider. The catastrophic version is Anthropic-specific — the write premium is what turned that loop into a $6k bill. On OpenAI/Gemini the identical loop "only" forfeits the read discount, which the detector reports honestly as a smaller number.
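The difference between the pricing models comes down to which multiplier is lost on each miss. The sketch below uses the multipliers and storage rate from the table above; the function names are hypothetical, the input price of $1 per million tokens is a placeholder chosen to expose the ratio, and the table does not state Gemini's read multiplier, so `read_mult` is a parameter.

```python
WRITE_MULT = 1.25  # Anthropic cache write, relative to input price
READ_MULT = 0.10   # cache read, relative to input price


def write_premium_waste(mtok, input_usd_per_mtok, rebuilds):
    """Anthropic-style: each rebuild pays the write rate where a read would do."""
    return rebuilds * mtok * input_usd_per_mtok * (WRITE_MULT - READ_MULT)


def missed_discount_waste(mtok, input_usd_per_mtok, misses, read_mult=READ_MULT):
    """No write premium: each miss pays full input instead of the read rate."""
    return misses * mtok * input_usd_per_mtok * (1.0 - read_mult)


def idle_storage_usd(mtok, hours, usd_per_mtok_hr=4.50):
    """Hourly storage billed on an explicit cache that sits idle."""
    return mtok * hours * usd_per_mtok_hr


# Same loop shape, placeholder input price of $1 per million tokens.
premium = write_premium_waste(mtok=0.8, input_usd_per_mtok=1.0, rebuilds=45)
discount = missed_discount_waste(mtok=0.8, input_usd_per_mtok=1.0, misses=45)
storage = idle_storage_usd(mtok=0.8, hours=10)
```

At an equal input price the write-premium loss is only about 1.28× the missed-discount loss, so the much wider gap in the table's dollar column reflects the different base prices of the models involved as well as the multipliers.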

Accuracy

Anthropic, OpenAI, and Google rates in pricing.py were re-verified on 2026-06-22 against each provider's official pricing page (platform.claude.com, developers.openai.com/api/docs/pricing, ai.google.dev/gemini-api/docs/pricing) — every row confirmed, and the current OpenAI tiers (incl. gpt-5.4-mini / -nano) added. The tables are the standard context tier; very-large-context pricing (OpenAI >272K, Gemini >200K) and Gemini's hourly cache-storage fee are noted but not priced per row, since the cost log carries no context-tier or cache-lifetime field. detect_pricing_drift exists because providers change rates without notice — run it regularly.

License

Code: MIT (see LICENSE) — the entire source, including the gated tools, is free to read, fork, and modify. This is honest open-core, not DRM.

Pro/Team license keys are a separate commercial purchase: a signed key (verified offline, Ed25519) that activates the paid tools in official builds and funds the maintained, cross-provider pricing model. Buying a key supports the project and gets you the official tier — the MIT license means you could edit the gate out, but the key is what keeps the pricing data and the audit-trail tier maintained. Keys are per-purchaser and non-transferable; sold with no warranty (the MIT terms govern the software itself). Buy via MCPize (managed billing) or direct Stripe — see PAYMENTS.md.

Release history

0.6.1 (Jun 29, 2026)

0.6.0 (Jun 28, 2026, this version)

Download files

Source Distribution

scorchmark-0.6.0.tar.gz (49.8 kB)

Uploaded Jun 28, 2026 Source

Built Distribution

scorchmark-0.6.0-py3-none-any.whl (39.5 kB)

Uploaded Jun 28, 2026 Python 3

File details

Details for the file scorchmark-0.6.0.tar.gz.

File metadata

Download URL: scorchmark-0.6.0.tar.gz
Upload date: Jun 28, 2026
Size: 49.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for scorchmark-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`9abe2582a19a91b99bf4a3d4f8508b10320416b7dcf7147ca190badfa83cdec7`
MD5	`e6c710e3a581b1d1242144694d2c66cb`
BLAKE2b-256	`ab88893d8bb1cd5a21ee19aa1d5180416eae8448bb263be1a9cd04dc3203fb3a`

File details

Details for the file scorchmark-0.6.0-py3-none-any.whl.

File metadata

Download URL: scorchmark-0.6.0-py3-none-any.whl
Upload date: Jun 28, 2026
Size: 39.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for scorchmark-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e16a1aac3059cb1dffa5711ddaec2cce3b9186c996856b8761cd79b1fbbf9569`
MD5	`ec41f34fecc30dd9d7ae366fe03af2f6`
BLAKE2b-256	`f33009978c17e4481624e62b2a4fb28d14d6666441a52d683c532ea85556ee5a`
