
llmdoctor

Static analysis for LLM cost leaks. Catch the bugs that show up on next month's invoice — before they ship.


A single stuck agent session can cost $1,000–$5,000 in tokens. A misplaced cache_control marker turns every cached call into a cache write. A call without max_tokens can blow $50 on one ramble.

llmdoctor finds these patterns in your code in seconds — before they ship.


What it does

llmdoctor reads your Python source and reports the LLM-cost bugs most likely to hurt you in production. It works on:

  • Direct SDKs: anthropic.Anthropic() and openai.OpenAI()
  • LangChain: ChatAnthropic, ChatOpenAI, AgentExecutor

It runs in seconds. No runtime dependency. No telemetry. No agents. No code execution. Just a CLI you point at a path.

pip install llmdoctor
llmdoctor doctor src/

That's the whole UX.


Why this exists

LLM bills in 2026 surprise teams in three ways:

  • Cache hit rate stays at 0%: dynamic content placed before the cache_control marker silently invalidates the prefix on every call (TS001)
  • One ramble blows $50: no max_tokens cap on an OpenAI call or LangChain wrapper (TS010, TS101)
  • A single agent session burns $1k–$5k: AgentExecutor(max_iterations=None) lets a stuck loop run unbounded (TS103)

These bugs are easy to write, hard to spot at runtime (the cost shows up days later in billing dashboards), and trivially detectable in source.
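
Here is what the TS001 shape looks like in source. A minimal sketch assuming the Anthropic prompt-caching API; STATIC_SYSTEM_PROMPT and question are placeholders, and the model id is illustrative:

import anthropic
from datetime import date

STATIC_SYSTEM_PROMPT = "..."  # imagine ~3,000 tokens of stable instructions
client = anthropic.Anthropic()

def ask(question: str) -> str:
    # BAD (TS001): the dynamic date block sits before the cache_control
    # marker, so the cacheable prefix changes on every call and each
    # request pays a cache write instead of a 0.1x cache read.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=1024,
        system=[
            {"type": "text", "text": f"Today is {date.today()}."},  # dynamic!
            {"type": "text", "text": STATIC_SYSTEM_PROMPT,
             "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

# FIX: keep the static prompt first and byte-identical across calls, and
# move every dynamic block after the last cache_control marker.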


Quickstart

# install
pip install llmdoctor

# scan
llmdoctor doctor .                  # current directory
llmdoctor doctor src/agent.py       # single file
llmdoctor doctor . --json           # CI-friendly output
llmdoctor doctor . --fail-on HIGH   # exit 1 in CI on any HIGH finding

Example output

╭─ llmdoctor doctor ─────────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                           │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                                  │
│ Estimated potential savings: ~$340/month  (rough estimate)              │
╰─────────────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS103 AgentExecutor with max_iterations=None — single session
            can cost $1k+ in tokens                                       ─╮
│   file:  src/agent_factory.py:23                                         │
│   code:  agent = AgentExecutor(agent=llm, tools=tools, max_iterations=None) │
│   why:   Setting max_iterations=None removes the loop cap. If the         │
│          agent's stop condition fails to trigger, the agent runs          │
│          forever — racking up $1,000–$5,000 in tokens per session in      │
│          documented 2026 incidents.                                       │
│   fix:   Set max_iterations=15 (the LangChain default) or higher if       │
│          your agent genuinely needs more depth. Always pair with          │
│          max_execution_time=... for a wall-clock cap.                     │
╰──────────────────────────────────────────────────────────────────────────╯

Every finding includes: the exact line, the why, the concrete fix, and a rough cost estimate with the assumptions printed inline. We never quote a dollar number without showing how we got it.
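
Applied to that finding, the fix is a two-keyword change. A sketch, assuming agent and tools are whatever your factory already builds:

from langchain.agents import AgentExecutor

# Bounded agent loop: an iteration cap plus a wall-clock backstop, so a
# failed stop condition ends the session instead of the budget.
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=15,        # LangChain's default cap; raise deliberately if needed
    max_execution_time=120,   # seconds
)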


Check catalog (v0.2.0)

  • TS001 · HIGH · Anthropic SDK: dynamic content placed before cache_control, silently invalidating the prompt cache on every call
  • TS003 · MEDIUM · Anthropic SDK: long static system prompt with no cache_control; a missed cache opportunity
  • TS010 · HIGH · OpenAI SDK: chat.completions.create() without max_tokens; output cost unbounded
  • TS011 · MEDIUM · OpenAI / Anthropic SDK: max_tokens > 8000, likely a copy-paste default
  • TS020 · MEDIUM · OpenAI / Anthropic SDK: premium model (Opus, GPT-5, GPT-4-Turbo) on a tiny prompt where a cheaper tier would match quality
  • TS101 · HIGH · LangChain: ChatOpenAI() instantiated without max_tokens; every downstream .invoke() inherits unbounded output
  • TS102 · MEDIUM · LangChain: ChatOpenAI / ChatAnthropic with max_tokens > 8000
  • TS103 · HIGH · LangChain: AgentExecutor(max_iterations=None), an explicitly unbounded agent loop
  • TS104 · MEDIUM · LangChain: AgentExecutor(max_iterations > 50), the "I bumped the cap as a workaround" anti-pattern
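
To make two of the HIGH checks concrete, a hedged before/after sketch (the model ids and the msgs list are placeholders):

from openai import OpenAI
from langchain_openai import ChatOpenAI

client = OpenAI()
msgs = [{"role": "user", "content": "summarize this"}]  # placeholder

# TS010 flags this: no max_tokens, so one rambling completion is unbounded.
client.chat.completions.create(model="gpt-4o", messages=msgs)

# Capped output passes.
client.chat.completions.create(model="gpt-4o", messages=msgs, max_tokens=512)

# TS101 flags this: every downstream .invoke() inherits unbounded output.
llm = ChatOpenAI(model="gpt-4o")

# Capped at construction passes.
llm = ChatOpenAI(model="gpt-4o", max_tokens=512)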

How llmdoctor compares

We're not competing with runtime observability tools — we're shift-left. Run llmdoctor in CI; run an observability tool in prod.

  • llmdoctor: static analysis, in CI. Fires before deploy, catches bugs in source, needs no network, and adds no runtime latency (it only runs in CI). Best for shift-left cost gates.
  • Helicone, Langfuse, OpenLLMetry: runtime proxy / SDK. Fire after each call, capture metrics, traces, and costs, require network access, and add ~10–50 ms of latency. Best for observability dashboards.
  • Mem0, Letta: runtime, inside the agent loop. Fire per session, catch memory drift, require network access, and add latency. Best for persistent agent memory.
  • LLMLingua: runtime prompt rewriting. Fires per call, targets token bloat, requires network access, and adds latency. Best for aggressive token compression.

Use llmdoctor and your observability tool. They cover different failure modes.


Cost estimates: how to read them

Estimates are heuristic, not invoice predictions. Each finding prints its assumptions inline:

estimate: ~$135.00/month  (assuming: 3000-token system prompt,
                          100 calls/day, 30-day month, 0.1× cache-read pricing)

Treat the dollar number as an order of magnitude. The value llmdoctor delivers is the finding and the fix; the estimate is there to make it actionable. If your traffic is 10× the assumed volume, multiply; if it's a tenth, divide. The pricing table is pinned inside the installed package at llmdoctor/pricing.py and was verified against provider pricing pages on 2026-04-30.
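
The arithmetic behind an estimate like the one above is deliberately simple. A sketch that reproduces its shape, assuming a $15/MTok premium-tier input price (the shipped numbers live in llmdoctor/pricing.py):

# All constants are the assumptions the finding prints inline.
PROMPT_TOKENS = 3_000          # static system prompt, in tokens
CALLS_PER_DAY = 100
DAYS_PER_MONTH = 30
PRICE_PER_MTOK = 15.00         # assumed input price, USD per million tokens
CACHE_READ_MULTIPLIER = 0.1    # cache reads billed at 0.1x input price

monthly_tokens = PROMPT_TOKENS * CALLS_PER_DAY * DAYS_PER_MONTH  # 9,000,000
uncached = monthly_tokens / 1_000_000 * PRICE_PER_MTOK           # $135.00/month
cached = uncached * CACHE_READ_MULTIPLIER                        # $13.50/month
print(f"uncached ~${uncached:.2f}/mo vs cached ~${cached:.2f}/mo")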


Self-audit

We built llmdoctor knowing a measurement tool with a bug is worse than no tool. Before publishing we ran a four-pass audit:

  • Correctness — 11 weird AST shapes (async, **kwargs, walrus, multi-target, augmented assigns, mocked clients, deeply nested calls). Zero false positives.
  • Input safety — fixed five concrete bugs before shipping: UTF-8 BOM crash, OOM on 6 MB generated .py, ValueError from ast.parse on binary content, RecursionError on minified code, plus rich-markup injection through filenames.
  • Security — no eval, no exec, no network calls, no telemetry. socket/requests/httpx/urllib not imported anywhere. Verified.
  • Honest false negatives — limitations like bound-method assignment, multi-target assigns, mock clients, LiteLLM/OpenRouter/raw-HTTP, and TypeScript codebases are documented in source so the tool never silently lies about coverage.

30 tests, all passing in CI. The audit and the test suite run on every release; results are summarised in this README so a reader on PyPI can verify what was checked without digging through the source.


What we don't do (yet)

  • ✅ Catch direct SDK + LangChain config bugs in Python source
  • ❌ Patch your code automatically (we report; you fix)
  • ❌ Run your code (static analysis only — safe on closed-source)
  • ❌ Measure live traffic (that's the runtime sidecar, on the roadmap)
  • ❌ TypeScript / JavaScript (Python only today)
  • ❌ LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions
  • ❌ Phone home, ship telemetry, or collect usage data — ever

If your codebase doesn't import anthropic, openai, langchain_anthropic, or langchain_openai directly, llmdoctor will produce zero findings. That's a feature, not a bug.


Roadmap

  • 0.1.0 (direct SDK): TS001/003/010/011/020 for Anthropic + OpenAI direct calls
  • 0.2.0 (LangChain): TS101/102/103/104 for ChatOpenAI, ChatAnthropic, AgentExecutor
  • 0.3.0 (LlamaIndex): Anthropic, OpenAI, ReActAgent from llama_index.*
  • 0.4.0 (retry storms + tool duplication): TS030 (retry without budget), TS040 (tool-definition repetition)
  • 0.5.0 (runtime sidecar): optional Python wrapper that reads cache_read_input_tokens from live responses, surfacing cache drift before billing does
  • 1.0 (TypeScript): @anthropic-ai/sdk + openai (Node), the same checks for the JS ecosystem

Why we built it

LLM bills in 2026 are increasingly metered, and the bugs that drive them aren't visible until the bill arrives. Engineers writing prompt caches, configuring LangChain agents, or tweaking max_tokens make small mistakes that quietly translate into 10×-or-more cost penalties — the kind that don't show up until the next finance review.

Existing tools catch this after the call. We catch it before the deploy. Static analysis is the right place for this because the bugs have specific syntactic shapes — exactly what AST tooling is built for.
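
As a toy illustration of that claim (this is not llmdoctor's actual checker), a dozen lines of stdlib ast already flag the TS010 shape:

import ast
import sys

class MaxTokensCheck(ast.NodeVisitor):
    # Flag any *.create(...) call that passes model= but no max_tokens=.
    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Attribute) and node.func.attr == "create":
            keywords = {kw.arg for kw in node.keywords}
            if "model" in keywords and "max_tokens" not in keywords:
                print(f"line {node.lineno}: create() without max_tokens")
        self.generic_visit(node)

source = open(sys.argv[1], encoding="utf-8").read()
MaxTokensCheck().visit(ast.parse(source))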


FAQ

Does it run my code? No. We use ast.parse, never eval/exec/compile(... 'exec'). Safe to run on closed-source repositories.

Does it phone home? No. Zero network calls anywhere in the codebase. No telemetry, no usage stats, no opt-in beacon. This is a hard requirement for OSS releases — we'd consider it a breaking change to add.

Will it false-positive on my mocked tests? Probably not — we exercise mock-client patterns in our test suite. If you hit one, suppress the line with # llmdoctor: ignore TS001 (or ignore ALL) and email us with the snippet so we can lock in a regression test.
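
For instance, a deliberate fixture can be silenced inline (a hypothetical test line):

response = client.chat.completions.create(  # llmdoctor: ignore TS010
    model="gpt-4o",
    messages=[{"role": "user", "content": "fixture"}],
)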

My CI uses LiteLLM / OpenRouter / a custom wrapper. Will it catch anything? Not yet — those wrappers don't go through anthropic.Anthropic() or langchain_* constructors. Adapter checks are planned per framework. Vote on what to support next via the maintainer email below.


Get in touch

The issue tracker is currently private while we stabilize 0.x. To suggest a check, report a false positive, or share a real-world cost leak you'd like us to detect:

📧 issues.llmdoctor@gmail.com

We aim to respond to actionable bug reports within a few business days.


License

MIT. The full license text is bundled inside the installed package (llmdoctor-<version>.dist-info/licenses/LICENSE).

Built with measurement-honesty over feature-breadth. The self-audit lives in this README — not behind a click-through — because a measurement tool that's wrong is worse than no tool at all.
