llmdoctor
Find LLM cost leaks before your bill does.
`llmdoctor doctor` is a static analyzer for Python code that calls Anthropic or OpenAI. It catches the patterns that quietly burn money in production:
- Prompt-cache placement bugs that invalidate the cache on every call (the bug claude-mem itself shipped, their issue #1890)
- Missing `max_tokens` caps, where output tokens cost 3–10× input
- Premium models (Opus, GPT-5) used for tiny prompts where a cheaper model would produce indistinguishable output
- Large static system prompts left uncached
It's an advisor, not a runtime patcher. It reads your code, prints findings with rough cost-impact estimates, and exits.
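For instance, the cache-placement bug is easy to write and hard to spot. Here is an illustrative sketch of the anti-pattern, built as plain dicts so nothing is sent anywhere; the variable names are made up and this is not output from the tool:

```python
# Anti-pattern: dynamic content appears BEFORE the block carrying
# cache_control, so the cacheable prefix changes on every request.
user_query = "What's my order status?"  # varies per call

system_blocks = [
    # Dynamic text first: the prefix hashed for the prompt cache now
    # differs on every call, so the cache never hits.
    {"type": "text", "text": f"User said: {user_query}"},
    # The cache marker arrives after the dynamic block, too late to help.
    {
        "type": "text",
        "text": "You are a support agent. <long static instructions>",
        "cache_control": {"type": "ephemeral"},
    },
]

# Everything up to and including the cache_control block forms the cache
# key; any per-request text inside that span invalidates it.
prefix = "".join(block["text"] for block in system_blocks)
print("prefix varies per request:", user_query in prefix)
```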
Install
```
pip install llmdoctor
# or run without installing:
pipx run llmdoctor doctor .
```
Usage
```
llmdoctor doctor .                 # scan the current directory
llmdoctor doctor src/agent.py     # scan a single file
llmdoctor doctor . --json         # machine-readable output for CI / other tools
llmdoctor doctor . --fail-on HIGH # exit 1 if any HIGH-severity issue is found
```
What it looks like
```
╭─ llmdoctor doctor ─────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                   │
│ Found 3 issue(s) · 2 HIGH · 1 MEDIUM                            │
│ Estimated potential savings: ~$340/month (rough estimate)       │
╰─────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS001 Dynamic content before cache_control invalidates the cache ─╮
│ file: src/agent.py:42                                                     │
│ code: {"type": "text", "text": f"User said: {user_query}"},               │
│ why: System block at index 0 contains dynamic content but appears         │
│      BEFORE the first block with cache_control. ...                       │
│ fix: Move static content BEFORE the cache_control marker. Move            │
│      dynamic content into the messages array.                             │
│ estimate: ~$135.00/month (assuming: 3000-token system prompt, 100         │
│           calls/day, 30-day month, 0.1× cache-read pricing)               │
│ docs: https://docs.anthropic.com/.../prompt-caching                       │
╰───────────────────────────────────────────────────────────────────────────╯
```
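The fix that finding describes can be sketched like this (an illustrative payload assuming the Anthropic Messages API's system-block format; the names are made up): static content stays before the cache marker, per-request content moves into the messages array.

```python
user_query = "What's my order status?"  # varies per call

# Only static instructions live in the system blocks, with cache_control
# on the last static block: the cached prefix is identical on every call.
system_blocks = [
    {
        "type": "text",
        "text": "You are a support agent. <long static instructions>",
        "cache_control": {"type": "ephemeral"},
    },
]

# Per-request content moves into the messages array, after the cached span.
messages = [{"role": "user", "content": f"User said: {user_query}"}]

prefix = "".join(block["text"] for block in system_blocks)
print("prefix stable across requests:", user_query not in prefix)
```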
Checks shipped in 0.1.0
| Code | Severity | What it catches |
|---|---|---|
| `TS001` | HIGH | Dynamic content placed before a `cache_control` marker (silently invalidates the prompt cache). |
| `TS003` | MEDIUM | Large static system prompt without `cache_control` (missed cache opportunity). |
| `TS010` | HIGH | OpenAI call with no `max_tokens` / `max_completion_tokens` (output cost unbounded). |
| `TS011` | MEDIUM | `max_tokens` set suspiciously high (likely a copy-pasted default that enables runaway completions). |
| `TS020` | MEDIUM | Premium model (Opus, GPT-5, GPT-4 Turbo, GPT-4o) on a tiny static prompt where a cheaper tier would likely match quality. |
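To make TS010 and TS011 concrete, here are the keyword arguments of an uncapped versus a capped OpenAI call, built as plain dicts so no request is sent; the model name and cap value are arbitrary examples, not recommendations from the tool:

```python
# TS010: no output cap, so a single runaway completion can emit tokens
# up to the model's limit, and output tokens cost a multiple of input.
uncapped = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}

# Fixed: an explicit cap bounds the worst-case output spend per call.
# Sizing the cap to the task (not a huge copy-pasted default) also
# keeps the call clear of TS011.
capped = {**uncapped, "max_tokens": 300}

print("uncapped has max_tokens:", "max_tokens" in uncapped)
print("capped limit:", capped["max_tokens"])
```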
How cost estimates are calculated
Estimates are heuristic, not invoice predictions. Each issue prints its assumptions (e.g. "100 calls/day, 30-day month, 3000-token system prompt"). Treat the numbers as order-of-magnitude. The tool's value is the finding and the fix; the dollar number is the attention-grabber.
The pricing table lives in `src/llmdoctor/pricing.py` (verified 2026-04-30). Submit a PR if a model is missing or a price has moved.
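The arithmetic behind such an estimate is simple enough to sketch. The input price below is an assumed placeholder for illustration, not a value from the tool's pricing table; with it, the uncached monthly cost of a 3000-token prompt at 100 calls/day comes to $135, and serving cache hits at 0.1× input pricing recovers most of that:

```python
# Rough monthly saving from caching a static system prompt.
# The figures mirror the assumptions a finding prints; the per-token
# price is an assumed placeholder, not llmdoctor's pricing table.
prompt_tokens = 3_000        # static system prompt size
calls_per_day = 100
days_per_month = 30
price_per_mtok = 15.00       # assumed input price, USD per million tokens
cache_read_multiplier = 0.1  # cache reads billed at ~0.1x the input price

monthly_tokens = prompt_tokens * calls_per_day * days_per_month
uncached_cost = monthly_tokens / 1_000_000 * price_per_mtok
cached_cost = uncached_cost * cache_read_multiplier

print(f"uncached: ${uncached_cost:.2f}/month")
print(f"saving:   ${uncached_cost - cached_cost:.2f}/month")
```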
What this tool deliberately does NOT do (yet)
- It does not patch your code. It reports, you fix.
- It does not run your code. Static analysis only — safe on closed-source repos.
- It does not measure live traffic. That's a different product (the SDK, coming next). The doctor is the first wedge.
- It does not check JavaScript / TypeScript. Python only in 0.1.0.
- It does not flag retry-storm patterns yet (planned: TS030).
- It does not detect tool-definition duplication across calls (planned: TS040).
If your codebase doesn't import `anthropic` or `openai` directly (e.g. you use LangChain, LiteLLM, or call the HTTP API yourself), the doctor will produce no findings. Adapter checks for those frameworks are a next step.
Self-audit
Before publishing, we audited the doctor itself for the failure categories most likely to cost a measurement tool its credibility: checker correctness on edge cases, input safety (BOMs, huge files, binary content, recursion bombs), reporter safety (markup injection), and basic security threat modelling. Five concrete bugs were caught and fixed before 0.1.0; eight intentional false negatives are documented with rationale.
Full report: AUDIT.md.
Development
```
git clone https://github.com/Shahriyar-Khan27/llm-doctor
cd llm-doctor
pip install -e ".[dev]"
pytest
```
License
MIT.
Why we built this
We were scoping a broader LLM-cost optimization SDK and surveyed the landscape: LLMLingua-family compression, GPTCache-style semantic caching, Mem0 / Letta / claude-mem memory frameworks, and Anthropic's prompt caching. One finding kept resurfacing as the single highest-leverage gap: prompt-cache placement bugs are everywhere, mostly invisible, and cost serious money. Even a competent OSS project like claude-mem shipped one to production (their issue #1890). Runtime tools catch this only after weeks of wasted spend; static analysis catches the whole class in seconds.
So before building the bigger SDK, we shipped the diagnostic. That's
llmdoctor.
Download files
File details
Details for the file llmdoctor-0.1.1.tar.gz.
File metadata
- Download URL: llmdoctor-0.1.1.tar.gz
- Size: 31.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0427a346ef4ab21d0ab5ae640167720f61e88c70e03d74232e788e07793872a3` |
| MD5 | `8cc839b4f46cc7932a6c95e7e3d2c85c` |
| BLAKE2b-256 | `2dec922491dd9563b8d2868667187c2e14858485eb9672d4f31aff0256ffa953` |
File details
Details for the file llmdoctor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llmdoctor-0.1.1-py3-none-any.whl
- Size: 22.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a218b8eae5f7c52506834fd6dc3555c2d1f72e631a40528a6350eed34bd45926` |
| MD5 | `52ad43b7cd51ee862c9617e386384399` |
| BLAKE2b-256 | `1d04108c8c1ca39084d4bad81322a4e945ad4c28ebf06598445bd3ebb2855bcb` |