
llmdoctor

Static analysis for LLM cost leaks. Catch the bugs that show up on next month's invoice — before they ship.


A single stuck agent session can cost $1,000–$5,000 in tokens. A misplaced cache_control marker turns every cached call into a cache write. A call without max_tokens can blow $50 on one ramble.

llmdoctor finds these patterns in your code in seconds — before they ship.


What it does

llmdoctor reads your Python source and reports the LLM-cost bugs most likely to hurt you in production. It works on:

  • Direct SDKs: anthropic.Anthropic() and openai.OpenAI()
  • LangChain: ChatAnthropic, ChatOpenAI, AgentExecutor

It runs in seconds. No runtime dependency. No telemetry. No agents. No code execution. Just a CLI you point at a path.

pip install llmdoctor
llmdoctor doctor src/

That's the whole UX.


Why this exists

LLM bills in 2026 surprise teams in three ways:

  • Cache hit rate stays at 0%: dynamic content placed before the cache_control marker silently invalidates the prefix on every call (TS001)
  • One ramble blows $50: no max_tokens cap on an OpenAI call or LangChain wrapper (TS010, TS101)
  • A single agent session burns $1k–$5k: AgentExecutor(max_iterations=None) lets a stuck loop run unbounded (TS103)

These bugs are easy to write, hard to spot at runtime (the cost shows up days later in billing dashboards), and trivially detectable in source.
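
Here is what the TS001 shape looks like in source. A minimal sketch assuming the Anthropic prompt-caching API; STATIC_SYSTEM_PROMPT and question are placeholders, and the model id is illustrative:

import anthropic
from datetime import date

STATIC_SYSTEM_PROMPT = "..."  # imagine ~3,000 tokens of stable instructions
client = anthropic.Anthropic()

def ask(question: str) -> str:
    # BAD (TS001): the dynamic date block sits before the cache_control
    # marker, so the cacheable prefix changes on every call and each
    # request pays a cache write instead of a 0.1x cache read.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=1024,
        system=[
            {"type": "text", "text": f"Today is {date.today()}."},  # dynamic!
            {"type": "text", "text": STATIC_SYSTEM_PROMPT,
             "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

# FIX: keep the static prompt first and byte-identical across calls, and
# move every dynamic block after the last cache_control marker.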


Quickstart

# install
pip install llmdoctor

# scan
llmdoctor doctor .                  # current directory
llmdoctor doctor src/agent.py       # single file
llmdoctor doctor . --json           # CI-friendly output
llmdoctor doctor . --fail-on HIGH   # exit 1 in CI on any HIGH finding

Example output

╭─ llmdoctor doctor ─────────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                           │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                                  │
│ Estimated potential savings: ~$340/month  (rough estimate)              │
╰─────────────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS103 AgentExecutor with max_iterations=None — single session
            can cost $1k+ in tokens                                       ─╮
│   file:  src/agent_factory.py:23                                         │
│   code:  agent = AgentExecutor(agent=llm, tools=tools, max_iterations=None) │
│   why:   Setting max_iterations=None removes the loop cap. If the         │
│          agent's stop condition fails to trigger, the agent runs          │
│          forever — racking up $1,000–$5,000 in tokens per session in      │
│          documented 2026 incidents.                                       │
│   fix:   Set max_iterations=15 (the LangChain default) or higher if       │
│          your agent genuinely needs more depth. Always pair with          │
│          max_execution_time=... for a wall-clock cap.                     │
╰──────────────────────────────────────────────────────────────────────────╯

Every finding includes: the exact line, the why, the concrete fix, and a rough cost estimate with the assumptions printed inline. We never quote a dollar number without showing how we got it.
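
Applied to that finding, the fix is a two-keyword change. A sketch, assuming agent and tools are whatever your factory already builds:

from langchain.agents import AgentExecutor

# Bounded agent loop: an iteration cap plus a wall-clock backstop, so a
# failed stop condition ends the session instead of the budget.
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=15,        # LangChain's default cap; raise deliberately if needed
    max_execution_time=120,   # seconds
)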


Check catalog (v0.2.0)

  • TS001 · HIGH · Anthropic SDK: dynamic content placed before cache_control, silently invalidating the prompt cache on every call
  • TS003 · MEDIUM · Anthropic SDK: long static system prompt with no cache_control; a missed cache opportunity
  • TS010 · HIGH · OpenAI SDK: chat.completions.create() without max_tokens; output cost unbounded
  • TS011 · MEDIUM · OpenAI / Anthropic SDK: max_tokens > 8000, likely a copy-paste default
  • TS020 · MEDIUM · OpenAI / Anthropic SDK: premium model (Opus, GPT-5, GPT-4-Turbo) on a tiny prompt where a cheaper tier would match quality
  • TS101 · HIGH · LangChain: ChatOpenAI() instantiated without max_tokens; every downstream .invoke() inherits unbounded output
  • TS102 · MEDIUM · LangChain: ChatOpenAI / ChatAnthropic with max_tokens > 8000
  • TS103 · HIGH · LangChain: AgentExecutor(max_iterations=None), an explicitly unbounded agent loop
  • TS104 · MEDIUM · LangChain: AgentExecutor(max_iterations > 50), the "I bumped the cap as a workaround" anti-pattern
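
To make two of the HIGH checks concrete, a hedged before/after sketch (the model ids and the msgs list are placeholders):

from openai import OpenAI
from langchain_openai import ChatOpenAI

client = OpenAI()
msgs = [{"role": "user", "content": "summarize this"}]  # placeholder

# TS010 flags this: no max_tokens, so one rambling completion is unbounded.
client.chat.completions.create(model="gpt-4o", messages=msgs)

# Capped output passes.
client.chat.completions.create(model="gpt-4o", messages=msgs, max_tokens=512)

# TS101 flags this: every downstream .invoke() inherits unbounded output.
llm = ChatOpenAI(model="gpt-4o")

# Capped at construction passes.
llm = ChatOpenAI(model="gpt-4o", max_tokens=512)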

How llmdoctor compares

We're not competing with runtime observability tools — we're shift-left. Run llmdoctor in CI; run an observability tool in prod.

  • llmdoctor: static analysis, in CI. Fires before deploy, catches bugs in source, needs no network, and adds no runtime latency (it only runs in CI). Best for shift-left cost gates.
  • Helicone, Langfuse, OpenLLMetry: runtime proxy / SDK. Fire after each call, capture metrics, traces, and costs, require network access, and add ~10–50 ms of latency. Best for observability dashboards.
  • Mem0, Letta: runtime, inside the agent loop. Fire per session, catch memory drift, require network access, and add latency. Best for persistent agent memory.
  • LLMLingua: runtime prompt rewriting. Fires per call, targets token bloat, requires network access, and adds latency. Best for aggressive token compression.

Use llmdoctor and your observability tool. They cover different failure modes.


Cost estimates: how to read them

Estimates are heuristic, not invoice predictions. Each finding prints its assumptions inline:

estimate: ~$135.00/month  (assuming: 3000-token system prompt,
                          100 calls/day, 30-day month, 0.1× cache-read pricing)

Treat the dollar number as an order of magnitude. The value llmdoctor delivers is the finding and the fix; the estimate is there to make it actionable. If your traffic is 10× the assumed volume, multiply; if it's a tenth, divide. The pricing table is pinned inside the installed package at llmdoctor/pricing.py and was verified against provider pricing pages on 2026-04-30.
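
The arithmetic behind an estimate like the one above is deliberately simple. A sketch that reproduces its shape, assuming a $15/MTok premium-tier input price (the shipped numbers live in llmdoctor/pricing.py):

# All constants are the assumptions the finding prints inline.
PROMPT_TOKENS = 3_000          # static system prompt, in tokens
CALLS_PER_DAY = 100
DAYS_PER_MONTH = 30
PRICE_PER_MTOK = 15.00         # assumed input price, USD per million tokens
CACHE_READ_MULTIPLIER = 0.1    # cache reads billed at 0.1x input price

monthly_tokens = PROMPT_TOKENS * CALLS_PER_DAY * DAYS_PER_MONTH  # 9,000,000
uncached = monthly_tokens / 1_000_000 * PRICE_PER_MTOK           # $135.00/month
cached = uncached * CACHE_READ_MULTIPLIER                        # $13.50/month
print(f"uncached ~${uncached:.2f}/mo vs cached ~${cached:.2f}/mo")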


Self-audit

We built llmdoctor knowing a measurement tool with a bug is worse than no tool. Before publishing we ran a four-pass audit:

  • Correctness — 11 weird AST shapes (async, **kwargs, walrus, multi-target, augmented assigns, mocked clients, deeply nested calls). Zero false positives.
  • Input safety — fixed five concrete bugs before shipping: UTF-8 BOM crash, OOM on 6 MB generated .py, ValueError from ast.parse on binary content, RecursionError on minified code, plus rich-markup injection through filenames.
  • Security — no eval, no exec, no network calls, no telemetry. socket/requests/httpx/urllib not imported anywhere. Verified.
  • Honest false negatives — limitations like bound-method assignment, multi-target assigns, mock clients, LiteLLM/OpenRouter/raw-HTTP, and TypeScript codebases are documented in source so the tool never silently lies about coverage.

30 tests, all passing in CI. The audit and the test suite run on every release; results are summarised in this README so a reader on PyPI can verify what was checked without digging through the source.


What we don't do (yet)

  • ✅ Catch direct SDK + LangChain config bugs in Python source
  • ❌ Patch your code automatically (we report; you fix)
  • ❌ Run your code (static analysis only — safe on closed-source)
  • ❌ Measure live traffic (that's the runtime sidecar, on the roadmap)
  • ❌ TypeScript / JavaScript (Python only today)
  • ❌ LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions
  • ❌ Phone home, ship telemetry, or collect usage data — ever

If your codebase doesn't import anthropic, openai, langchain_anthropic, or langchain_openai directly, llmdoctor will produce zero findings. That's a feature, not a bug.


Roadmap

  • 0.1.0 (direct SDK): TS001/003/010/011/020 for Anthropic + OpenAI direct calls
  • 0.2.0 (LangChain): TS101/102/103/104 for ChatOpenAI, ChatAnthropic, AgentExecutor
  • 0.3.0 (LlamaIndex): Anthropic, OpenAI, ReActAgent from llama_index.*
  • 0.4.0 (retry storms + tool duplication): TS030 (retry without budget), TS040 (tool-definition repetition)
  • 0.5.0 (runtime sidecar): optional Python wrapper that reads cache_read_input_tokens from live responses, surfacing cache drift before billing does
  • 1.0 (TypeScript): @anthropic-ai/sdk + openai (Node), the same checks for the JS ecosystem

Why we built it

LLM bills in 2026 are increasingly metered, and the bugs that drive them aren't visible until the bill arrives. Engineers writing prompt caches, configuring LangChain agents, or tweaking max_tokens make small mistakes that quietly translate into 10×-or-more cost penalties — the kind that don't show up until the next finance review.

Existing tools catch this after the call. We catch it before the deploy. Static analysis is the right place for this because the bugs have specific syntactic shapes — exactly what AST tooling is built for.
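
As a toy illustration of that claim (this is not llmdoctor's actual checker), a dozen lines of stdlib ast already flag the TS010 shape:

import ast
import sys

class MaxTokensCheck(ast.NodeVisitor):
    # Flag any *.create(...) call that passes model= but no max_tokens=.
    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Attribute) and node.func.attr == "create":
            keywords = {kw.arg for kw in node.keywords}
            if "model" in keywords and "max_tokens" not in keywords:
                print(f"line {node.lineno}: create() without max_tokens")
        self.generic_visit(node)

source = open(sys.argv[1], encoding="utf-8").read()
MaxTokensCheck().visit(ast.parse(source))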


FAQ

Does it run my code? No. We use ast.parse, never eval/exec/compile(... 'exec'). Safe to run on closed-source repositories.

Does it phone home? No. Zero network calls anywhere in the codebase. No telemetry, no usage stats, no opt-in beacon. This is a hard requirement for OSS releases — we'd consider it a breaking change to add.

Will it false-positive on my mocked tests? Probably not — we exercise mock-client patterns in our test suite. If you hit one, suppress the line with # llmdoctor: ignore TS001 (or ignore ALL) and email us with the snippet so we can lock in a regression test.
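
For instance, a deliberate fixture can be silenced inline (a hypothetical test line):

response = client.chat.completions.create(  # llmdoctor: ignore TS010
    model="gpt-4o",
    messages=[{"role": "user", "content": "fixture"}],
)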

My CI uses LiteLLM / OpenRouter / a custom wrapper. Will it catch anything? Not yet — those wrappers don't go through anthropic.Anthropic() or langchain_* constructors. Adapter checks are planned per framework. Vote on what to support next via the maintainer email below.


Get in touch

The issue tracker is currently private while we stabilize 0.x. To suggest a check, report a false positive, or share a real-world cost leak you'd like us to detect:

📧 issues.llmdoctor@gmail.com

We aim to respond to actionable bug reports within a few business days.


License

MIT. The full license text is bundled inside the installed package (llmdoctor-<version>.dist-info/licenses/LICENSE).

Built with measurement-honesty over feature-breadth. The self-audit lives in this README — not behind a click-through — because a measurement tool that's wrong is worse than no tool at all.
