llmdoctor
Find LLM cost leaks before your bill does.
`llmdoctor doctor` is a static analyzer for Python code that calls Anthropic or OpenAI. It catches the patterns that quietly burn money in production:
- Prompt-cache placement bugs that invalidate the cache on every call (the bug claude-mem itself shipped, their issue #1890)
- Missing `max_tokens` caps, where output tokens cost 3–10× input
- Premium models (Opus, GPT-5) used for tiny prompts where a cheaper model would produce indistinguishable output
- Large static system prompts left uncached
It's an advisor, not a runtime patcher. It reads your code, prints findings with rough cost-impact estimates, and exits.
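For instance, the cache-placement bug is easy to write and hard to spot. Here is an illustrative sketch of the anti-pattern, built as plain dicts so nothing is sent anywhere; the variable names are made up and this is not output from the tool:

```python
# Anti-pattern: dynamic content appears BEFORE the block carrying
# cache_control, so the cacheable prefix changes on every request.
user_query = "What's my order status?"  # varies per call

system_blocks = [
    # Dynamic text first: the prefix hashed for the prompt cache now
    # differs on every call, so the cache never hits.
    {"type": "text", "text": f"User said: {user_query}"},
    # The cache marker arrives after the dynamic block, too late to help.
    {
        "type": "text",
        "text": "You are a support agent. <long static instructions>",
        "cache_control": {"type": "ephemeral"},
    },
]

# Everything up to and including the cache_control block forms the cache
# key; any per-request text inside that span invalidates it.
prefix = "".join(block["text"] for block in system_blocks)
print("prefix varies per request:", user_query in prefix)
```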
Install
```
pip install llmdoctor
# or run without installing:
pipx run llmdoctor doctor .
```
Usage
```
llmdoctor doctor .                 # scan the current directory
llmdoctor doctor src/agent.py     # scan a single file
llmdoctor doctor . --json         # machine-readable output for CI / other tools
llmdoctor doctor . --fail-on HIGH # exit 1 if any HIGH-severity issue is found
```
What it looks like
```
╭─ llmdoctor doctor ─────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                   │
│ Found 3 issue(s) · 2 HIGH · 1 MEDIUM                            │
│ Estimated potential savings: ~$340/month (rough estimate)       │
╰─────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS001 Dynamic content before cache_control invalidates the cache ─╮
│ file: src/agent.py:42                                                     │
│ code: {"type": "text", "text": f"User said: {user_query}"},               │
│ why: System block at index 0 contains dynamic content but appears         │
│      BEFORE the first block with cache_control. ...                       │
│ fix: Move static content BEFORE the cache_control marker. Move            │
│      dynamic content into the messages array.                             │
│ estimate: ~$135.00/month (assuming: 3000-token system prompt, 100         │
│           calls/day, 30-day month, 0.1× cache-read pricing)               │
│ docs: https://docs.anthropic.com/.../prompt-caching                       │
╰───────────────────────────────────────────────────────────────────────────╯
```
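The fix that finding describes can be sketched like this (an illustrative payload assuming the Anthropic Messages API's system-block format; the names are made up): static content stays before the cache marker, per-request content moves into the messages array.

```python
user_query = "What's my order status?"  # varies per call

# Only static instructions live in the system blocks, with cache_control
# on the last static block: the cached prefix is identical on every call.
system_blocks = [
    {
        "type": "text",
        "text": "You are a support agent. <long static instructions>",
        "cache_control": {"type": "ephemeral"},
    },
]

# Per-request content moves into the messages array, after the cached span.
messages = [{"role": "user", "content": f"User said: {user_query}"}]

prefix = "".join(block["text"] for block in system_blocks)
print("prefix stable across requests:", user_query not in prefix)
```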
Checks shipped in 0.1.0
| Code | Severity | What it catches |
|---|---|---|
| `TS001` | HIGH | Dynamic content placed before a `cache_control` marker (silently invalidates the prompt cache). |
| `TS003` | MEDIUM | Large static system prompt without `cache_control` (missed cache opportunity). |
| `TS010` | HIGH | OpenAI call with no `max_tokens` / `max_completion_tokens` (output cost unbounded). |
| `TS011` | MEDIUM | `max_tokens` set suspiciously high (likely a copy-pasted default that enables runaway completions). |
| `TS020` | MEDIUM | Premium model (Opus, GPT-5, GPT-4 Turbo, GPT-4o) on a tiny static prompt where a cheaper tier would likely match quality. |
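To make TS010 and TS011 concrete, here are the keyword arguments of an uncapped versus a capped OpenAI call, built as plain dicts so no request is sent; the model name and cap value are arbitrary examples, not recommendations from the tool:

```python
# TS010: no output cap, so a single runaway completion can emit tokens
# up to the model's limit, and output tokens cost a multiple of input.
uncapped = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}

# Fixed: an explicit cap bounds the worst-case output spend per call.
# Sizing the cap to the task (not a huge copy-pasted default) also
# keeps the call clear of TS011.
capped = {**uncapped, "max_tokens": 300}

print("uncapped has max_tokens:", "max_tokens" in uncapped)
print("capped limit:", capped["max_tokens"])
```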
How cost estimates are calculated
Estimates are heuristic, not invoice predictions. Each issue prints its assumptions (e.g. "100 calls/day, 30-day month, 3000-token system prompt"). Treat the numbers as order-of-magnitude. The tool's value is the finding and the fix; the dollar number is the attention-grabber.
The pricing table lives in `src/llmdoctor/pricing.py` (verified 2026-04-30). Submit a PR if a model is missing or a price has moved.
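The arithmetic behind such an estimate is simple enough to sketch. The input price below is an assumed placeholder for illustration, not a value from the tool's pricing table; with it, the uncached monthly cost of a 3000-token prompt at 100 calls/day comes to $135, and serving cache hits at 0.1× input pricing recovers most of that:

```python
# Rough monthly saving from caching a static system prompt.
# The figures mirror the assumptions a finding prints; the per-token
# price is an assumed placeholder, not llmdoctor's pricing table.
prompt_tokens = 3_000        # static system prompt size
calls_per_day = 100
days_per_month = 30
price_per_mtok = 15.00       # assumed input price, USD per million tokens
cache_read_multiplier = 0.1  # cache reads billed at ~0.1x the input price

monthly_tokens = prompt_tokens * calls_per_day * days_per_month
uncached_cost = monthly_tokens / 1_000_000 * price_per_mtok
cached_cost = uncached_cost * cache_read_multiplier

print(f"uncached: ${uncached_cost:.2f}/month")
print(f"saving:   ${uncached_cost - cached_cost:.2f}/month")
```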
What this tool deliberately does NOT do (yet)
- It does not patch your code. It reports, you fix.
- It does not run your code. Static analysis only — safe on closed-source repos.
- It does not measure live traffic. That's a different product (the SDK, coming next). The doctor is the first wedge.
- It does not check JavaScript / TypeScript. Python only in 0.1.0.
- It does not flag retry-storm patterns yet (planned: TS030).
- It does not detect tool-definition duplication across calls (planned: TS040).
If your codebase doesn't import `anthropic` or `openai` directly (e.g. you use LangChain, LiteLLM, or call the HTTP API yourself), the doctor will produce no findings. Adapter checks for those frameworks are a next step.
Self-audit
Before publishing, we audited the doctor itself for the failure categories most likely to cost a measurement tool its credibility: checker correctness on edge cases, input safety (BOMs, huge files, binary content, recursion bombs), reporter safety (markup injection), and basic security threat modelling. Five concrete bugs were caught and fixed before 0.1.0; eight intentional false negatives are documented with rationale.
Full report: AUDIT.md.
Development
```
git clone https://github.com/Shahriyar-Khan27/llm-doctor
cd llm-doctor
pip install -e ".[dev]"
pytest
```
License
MIT.
Why we built this
We were scoping a broader LLM-cost optimization SDK and surveyed the landscape: LLMLingua-family compression, GPTCache-style semantic caching, Mem0 / Letta / claude-mem memory frameworks, and Anthropic's prompt caching. One finding kept resurfacing as the single highest-leverage gap: prompt-cache placement bugs are everywhere, mostly invisible, and cost serious money. Even a competent OSS project like claude-mem shipped one to production (their issue #1890). Runtime tools catch this only after weeks of wasted spend; static analysis catches the whole class in seconds.
So before building the bigger SDK, we shipped the diagnostic. That's
llmdoctor.
Download files
File details
Details for the file llmdoctor-0.1.1.tar.gz.
File metadata
- Download URL: llmdoctor-0.1.1.tar.gz
- Size: 31.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0427a346ef4ab21d0ab5ae640167720f61e88c70e03d74232e788e07793872a3` |
| MD5 | `8cc839b4f46cc7932a6c95e7e3d2c85c` |
| BLAKE2b-256 | `2dec922491dd9563b8d2868667187c2e14858485eb9672d4f31aff0256ffa953` |
File details
Details for the file llmdoctor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llmdoctor-0.1.1-py3-none-any.whl
- Size: 22.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a218b8eae5f7c52506834fd6dc3555c2d1f72e631a40528a6350eed34bd45926` |
| MD5 | `52ad43b7cd51ee862c9617e386384399` |
| BLAKE2b-256 | `1d04108c8c1ca39084d4bad81322a4e945ad4c28ebf06598445bd3ebb2855bcb` |