llmdoctor
Static analysis for LLM cost leaks. Catch the bugs that show up on next month's invoice — before they ship.
A single stuck agent session can cost $1,000–$5,000 in tokens. A misplaced `cache_control` marker turns every cached call into a cache write. Output without `max_tokens` can blow $50 on one ramble. `llmdoctor` finds these patterns in your code in seconds — before they ship.
What it does
llmdoctor reads your Python source and reports the LLM-cost bugs most likely to hurt you in production. It works on:
- Direct SDKs — `anthropic.Anthropic()` and `openai.OpenAI()`
- LangChain — `ChatAnthropic`, `ChatOpenAI`, `AgentExecutor`
It runs in seconds. No runtime dependency. No telemetry. No agents. No code execution. Just a CLI you point at a path.
pip install llmdoctor
llmdoctor doctor src/
That's the whole UX.
Why this exists
LLM bills in 2026 surprise teams in three ways:
| Surprise | Typical cause | What llmdoctor catches |
|---|---|---|
| Cache hit rate stays at 0% | Dynamic content placed before the `cache_control` marker silently invalidates the prefix on every call | TS001 |
| One ramble blows $50 | No `max_tokens` cap on an OpenAI call or LangChain wrapper | TS010, TS101 |
| Single agent session burns $1k–$5k | `AgentExecutor(max_iterations=None)` lets a stuck loop run unbounded | TS103 |
These bugs are easy to write, hard to spot at runtime (the cost shows up days later in billing dashboards), and trivially detectable in source.
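Here is the first of those in miniature (a sketch: the model name, prompt, and messages are illustrative, and an `ANTHROPIC_API_KEY` is assumed to be set):

```python
# Sketch of the TS001 anti-pattern. Prompt caching matches on an exact
# byte prefix; interpolating a timestamp before the cache_control marker
# means the marked block is never identical across calls, so every call
# is billed as a cache write and never a cache read.
import datetime
import anthropic

STATIC_PROMPT = "You are a support agent for Acme. " * 200  # long, stable
client = anthropic.Anthropic()

# BAD: dynamic content sits inside the cached block.
client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": f"Current time: {datetime.datetime.now()}\n{STATIC_PROMPT}",
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Where is my order?"}],
)

# GOOD: cache the stable prefix; append dynamic content after it.
client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {"type": "text", "text": STATIC_PROMPT,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": f"Current time: {datetime.datetime.now()}"},
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)
```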
Quickstart
# install
pip install llmdoctor
# scan
llmdoctor doctor . # current directory
llmdoctor doctor src/agent.py # single file
llmdoctor doctor . --json # CI-friendly output
llmdoctor doctor . --fail-on HIGH # exit 1 in CI on any HIGH finding
Example output
╭─ llmdoctor doctor ─────────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/ │
│ Found 3 issue(s) · 2 HIGH · 1 MEDIUM │
│ Estimated potential savings: ~$340/month (rough estimate) │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ [HIGH] TS103 AgentExecutor with max_iterations=None — single session
can cost $1k+ in tokens ─╮
│ file: src/agent_factory.py:23 │
│ code: agent = AgentExecutor(agent=llm, tools=tools, max_iterations=None) │
│ why: Setting max_iterations=None removes the loop cap. If the │
│ agent's stop condition fails to trigger, the agent runs │
│ forever — racking up $1,000–$5,000 in tokens per session in │
│ documented 2026 incidents. │
│ fix: Set max_iterations=15 (the LangChain default) or higher if │
│ your agent genuinely needs more depth. Always pair with │
│ max_execution_time=... for a wall-clock cap. │
╰──────────────────────────────────────────────────────────────────────────╯
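The fix from that finding, as code (a sketch: `llm` and `tools` stand in for whatever your factory already builds):

```python
# Bounded version of the flagged line: cap both iterations and wall-clock
# time so a failed stop condition cannot run up an open-ended token bill.
from langchain.agents import AgentExecutor

agent = AgentExecutor(
    agent=llm,                # your agent runnable, as before
    tools=tools,              # your tool list, as before
    max_iterations=15,        # LangChain's default loop cap
    max_execution_time=120,   # seconds; the wall-clock backstop
)
```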
Every finding includes: the exact line, the why, the concrete fix, and a rough cost estimate with the assumptions printed inline. We never quote a dollar number without showing how we got it.
Check catalog (v0.2.0)
| Code | Severity | Surface | Catches |
|---|---|---|---|
| TS001 | HIGH | Anthropic SDK | Dynamic content placed before cache_control — silently invalidates the prompt cache on every call |
| TS003 | MEDIUM | Anthropic SDK | Long static system prompt with no cache_control — missed cache opportunity |
| TS010 | HIGH | OpenAI SDK | chat.completions.create() without max_tokens — output cost unbounded |
| TS011 | MEDIUM | OpenAI / Anthropic SDK | max_tokens > 8000 — likely a copy-paste default |
| TS020 | MEDIUM | OpenAI / Anthropic SDK | Premium model (Opus, GPT-5, GPT-4-Turbo) on a tiny prompt where a cheaper tier would match quality |
| TS101 | HIGH | LangChain | ChatOpenAI() instantiated without max_tokens — every downstream .invoke() inherits unbounded output |
| TS102 | MEDIUM | LangChain | ChatOpenAI / ChatAnthropic with max_tokens > 8000 |
| TS103 | HIGH | LangChain | AgentExecutor(max_iterations=None) — explicitly unbounded agent loop |
| TS104 | MEDIUM | LangChain | AgentExecutor(max_iterations > 50) — the "I bumped the cap as a workaround" anti-pattern |
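To make the two HIGH output-cap checks concrete, each fix is a single keyword (a sketch: model names are illustrative and an `OPENAI_API_KEY` is assumed to be set):

```python
from langchain_openai import ChatOpenAI
from openai import OpenAI

# TS010 fix: cap the direct SDK call.
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
    max_tokens=512,  # output cost is now bounded
)

# TS101 fix: cap at construction so every downstream .invoke()
# inherits the bound.
llm = ChatOpenAI(model="gpt-4o", max_tokens=512)
```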
How llmdoctor compares
We're not competing with runtime observability tools — we're shift-left. Run llmdoctor in CI; run an observability tool in prod.
| | llmdoctor | Helicone, Langfuse, OpenLLMetry | Mem0, Letta | LLMLingua |
|---|---|---|---|---|
| Where | static, in CI | runtime proxy / SDK | runtime, in agent loop | runtime, prompt rewrite |
| When it fires | before deploy | after each call | per session | per call |
| Catches | bugs in source | metrics, traces, costs | memory drift | token bloat |
| Network | none | required | required | required |
| Adds latency? | no — only runs in CI | yes (~10–50ms) | yes | yes |
| Best for | shift-left cost gates | observability dashboards | persistent agent memory | aggressive token compression |
Use llmdoctor and your observability tool. They cover different failure modes.
Cost estimates: how to read them
Estimates are heuristic, not invoice predictions. Each finding prints its assumptions inline:
estimate: ~$135.00/month (assuming: 3000-token system prompt,
100 calls/day, 30-day month, 0.1× cache-read pricing)
Treat the dollar number as order of magnitude. The value llmdoctor delivers is the finding and the fix; the estimate is there to make it actionable. If your traffic is 10× ours, multiply. If it's 1/10, divide. The pricing table is pinned inside the installed package at llmdoctor/pricing.py and was verified against provider pages on 2026-04-30.
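Here is one way that arithmetic composes, spelled out (a sketch: the $15-per-million-token input price is an assumed premium-tier rate, not a value read from `llmdoctor/pricing.py`):

```python
# Rebuild the shape of the estimate above from its printed assumptions.
PROMPT_TOKENS = 3_000          # static system prompt, per call
CALLS_PER_DAY = 100
DAYS = 30
INPUT_USD_PER_MTOK = 15.00     # assumed premium-tier input price
CACHE_READ_MULTIPLIER = 0.1    # cache reads billed at 0.1x input price

monthly_tokens = PROMPT_TOKENS * CALLS_PER_DAY * DAYS   # 9,000,000
uncached = monthly_tokens / 1e6 * INPUT_USD_PER_MTOK    # $135.00/month
cached = uncached * CACHE_READ_MULTIPLIER               # $13.50/month
print(f"prefix cost if it never caches: ~${uncached:.2f}/month")
print(f"prefix cost if it always hits:  ~${cached:.2f}/month")
```

Swap any assumption and the number moves linearly, which is why the assumptions are printed with every finding.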
Self-audit
We built llmdoctor knowing a measurement tool with a bug is worse than no tool. Before publishing we ran a four-pass audit:
- Correctness — 11 weird AST shapes (async, `**kwargs`, walrus, multi-target, augmented assigns, mocked clients, deeply nested calls). Zero false positives. Two of those shapes are sketched below.
- Input safety — fixed five concrete bugs before shipping: a UTF-8 BOM crash, OOM on a 6 MB generated `.py`, `ValueError` from `ast.parse` on binary content, `RecursionError` on minified code, plus rich-markup injection through filenames.
- Security — no `eval`, no `exec`, no network calls, no telemetry. `socket`/`requests`/`httpx`/`urllib` not imported anywhere. Verified.
- Honest false negatives — limitations like bound-method assignment, multi-target assigns, mock clients, LiteLLM/OpenRouter/raw-HTTP, and TypeScript codebases are documented in source so the tool never silently lies about coverage.
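Two of those shapes, for the curious (illustrative snippets, not lifted from the test suite):

```python
import anthropic  # assumes ANTHROPIC_API_KEY is set for construction

# Walrus binding: the client is created inside an expression.
if (client := anthropic.Anthropic()):
    ...

# Multi-target assignment: one constructor call bound to two names.
primary = backup = anthropic.Anthropic()
```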
30 tests, all passing in CI. The audit and the test suite run on every release; results are summarised in this README so a reader on PyPI can verify what was checked without needing access to the source.
What we don't do (yet)
- ✅ Catch direct SDK + LangChain config bugs in Python source
- ❌ Patch your code automatically (we report; you fix)
- ❌ Run your code (static analysis only — safe on closed-source)
- ❌ Measure live traffic (that's the runtime sidecar, on the roadmap)
- ❌ TypeScript / JavaScript (Python only today)
- ❌ LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions
- ❌ Phone home, ship telemetry, or collect usage data — ever
If your codebase doesn't import anthropic, openai, langchain_anthropic, or langchain_openai directly, llmdoctor will produce zero findings. That's a feature, not a bug.
Roadmap
| Version | Theme | What |
|---|---|---|
| ✅ 0.1.0 | direct SDK | TS001/003/010/011/020 — Anthropic + OpenAI direct calls |
| ✅ 0.2.0 | LangChain | TS101/102/103/104 — ChatOpenAI, ChatAnthropic, AgentExecutor |
| 0.3.0 | LlamaIndex | Anthropic, OpenAI, ReActAgent from llama_index.* |
| 0.4.0 | retry storms + tool dup | TS030 (retry without budget), TS040 (tool-definition repetition) |
| 0.5.0 | runtime sidecar | optional Python wrapper that reads `cache_read_input_tokens` from live responses, surfacing cache drift before billing does (see the sketch below this table) |
| 1.0 | TypeScript | @anthropic-ai/sdk + openai (Node) — the same checks for the JS ecosystem |
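For the 0.5.0 sidecar, the signal already exists in live responses today; a minimal sketch of what it would watch (not shipped code; model name and prompt are illustrative):

```python
import anthropic

LONG_STATIC_PROMPT = "You are a support agent for Acme. " * 200
client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

resp = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system=[{"type": "text", "text": LONG_STATIC_PROMPT,
             "cache_control": {"type": "ephemeral"}}],
    messages=[{"role": "user", "content": "ping"}],
)

# The first call writes the cache; steady-state traffic should then show
# nonzero cache reads. Zero here means the cached prefix is drifting.
if resp.usage.cache_read_input_tokens == 0:
    print("warning: prompt cache not hitting")
```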
Why we built it
LLM bills in 2026 are increasingly metered, and the bugs that drive them aren't visible until the bill arrives. Engineers writing prompt caches, configuring LangChain agents, or tweaking max_tokens make small mistakes that quietly translate into 10×-or-more cost penalties — the kind that don't show up until the next finance review.
Existing tools catch this after the call. We catch it before the deploy. Static analysis is the right place for this because the bugs have specific syntactic shapes — exactly what AST tooling is built for.
FAQ
Does it run my code?
No. We use `ast.parse`, never `eval`/`exec`/`compile(..., 'exec')`. Safe to run on closed-source repositories.
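A toy illustration of the parse-and-walk approach (not llmdoctor's actual implementation):

```python
import ast

SOURCE = '''
resp = client.chat.completions.create(model="gpt-4o", messages=msgs)
'''

# Parse the source into an AST; the code is never imported or executed.
tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "create"
        and not any(kw.arg == "max_tokens" for kw in node.keywords)
    ):
        print(f"line {node.lineno}: .create() call without max_tokens")
```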
Does it phone home?
No. Zero network calls anywhere in the codebase. No telemetry, no usage stats, no opt-in beacon. This is a hard requirement for OSS releases — we'd consider it a breaking change to add.
Will it false-positive on my mocked tests?
Probably not — we exercise mock-client patterns in our test suite. If you hit one, suppress the line with `# llmdoctor: ignore TS001` (or `ignore ALL`) and email us the snippet so we can lock in a regression test.
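In context, suppression looks like this (a sketch; the surrounding code is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Deliberately uncapped in this one-off batch script:
resp = client.chat.completions.create(  # llmdoctor: ignore TS010
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
)
```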
My CI uses LiteLLM / OpenRouter / a custom wrapper. Will it catch anything?
Not yet — those wrappers don't go through anthropic.Anthropic() or langchain_* constructors. Adapter checks are planned per framework. Vote on what to support next via the maintainer email below.
Get in touch
Issue tracker is currently private while we stabilize 0.x. To suggest a check, report a false positive, or share a real-world cost-leak you'd like us to detect:
We aim to respond to actionable bug reports within a few business days.
License
MIT. The full license text is bundled inside the installed package
(llmdoctor-<version>.dist-info/licenses/LICENSE).
Built with measurement-honesty over feature-breadth. The self-audit lives in this README — not behind a click-through — because a measurement tool that's wrong is worse than no tool at all.