
Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.

Project description

llmdoctor

Find LLM cost leaks before your bill does.

llmdoctor doctor is a static analyzer for Python code that calls Anthropic or OpenAI. It catches the patterns that quietly burn money in production:

  • Prompt-cache placement bugs that invalidate the cache on every call (the bug claude-mem itself shipped — their issue #1890)
  • Missing max_tokens caps where output tokens cost 3–10× input
  • Premium models (Opus, GPT-5) used for tiny prompts where a cheaper model would produce indistinguishable output
  • Large static system prompts left uncached

It's an advisor, not a runtime patcher. It reads your code, prints findings with rough cost-impact estimates, and exits.
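
For example, the cache-placement pattern (the first bullet above) typically looks like this in an Anthropic Messages call. This is an illustrative sketch, not code from llmdoctor; the model name and helper names are made up:

import anthropic

client = anthropic.Anthropic()

def ask_buggy(user_query: str, static_instructions: str) -> str:
    # BUG (TS001 territory): the dynamic block sits before the block carrying
    # cache_control, so the cached prefix changes on every call and never hits.
    response = client.messages.create(
        model="claude-opus-4-5",  # illustrative model name
        max_tokens=1024,
        system=[
            {"type": "text", "text": f"Context: {user_query}"},
            {
                "type": "text",
                "text": static_instructions,  # large, identical on every call
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[{"role": "user", "content": user_query}],
    )
    return response.content[0].text

def ask_fixed(user_query: str, static_instructions: str) -> str:
    # FIX: static content before (and including) the cache_control marker;
    # dynamic content moves into the messages array.
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": static_instructions,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[{"role": "user", "content": f"Context: {user_query}"}],
    )
    return response.content[0].text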

Install

pip install llmdoctor
# or no-install:
pipx run llmdoctor doctor .

Usage

llmdoctor doctor .              # scan current directory
llmdoctor doctor src/agent.py   # scan one file
llmdoctor doctor . --json       # for CI / piping into other tools
llmdoctor doctor . --fail-on HIGH   # exit 1 if any HIGH-severity issue
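
If you want to post-process the report in CI rather than rely on the exit code alone, a hypothetical wrapper (not shipped with llmdoctor; the JSON schema is treated as opaque here) could look like:

import json
import subprocess
import sys

# Run the doctor and capture its JSON report.
proc = subprocess.run(
    ["llmdoctor", "doctor", ".", "--json"],
    capture_output=True,
    text=True,
)

# Pretty-print whatever the doctor emitted; the exact schema isn't documented
# on this page, so no field names are assumed.
if proc.stdout.strip():
    print(json.dumps(json.loads(proc.stdout), indent=2))

# Propagate the doctor's exit code; add --fail-on HIGH above if findings
# should fail the build.
sys.exit(proc.returncode)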

What it looks like

╭─ llmdoctor doctor ─────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                    │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                           │
│ Estimated potential savings: ~$340/month  (rough estimate)       │
╰──────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS001 Dynamic content before cache_control invalidates the cache ─╮
│   file:  src/agent.py:42                                                  │
│   code:  {"type": "text", "text": f"User said: {user_query}"},            │
│   why:   System block at index 0 contains dynamic content but appears     │
│          BEFORE the first block with cache_control. ...                   │
│   fix:   Move static content BEFORE the cache_control marker. Move        │
│          dynamic content into the messages array.                         │
│   estimate: ~$135.00/month  (assuming: 3000-token system prompt, 100      │
│             calls/day, 30-day month, 0.1× cache-read pricing)             │
│   docs:  https://docs.anthropic.com/.../prompt-caching                    │
╰───────────────────────────────────────────────────────────────────────────╯

Checks shipped in 0.1.0

Code   Severity  What it catches
TS001  HIGH      Dynamic content placed before a cache_control marker (silently invalidates the prompt cache).
TS003  MEDIUM    Large static system prompt without cache_control (missed cache opportunity).
TS010  HIGH      OpenAI call with no max_tokens / max_completion_tokens (output cost unbounded).
TS011  MEDIUM    max_tokens set suspiciously high (likely a copy-paste default that enables runaway completions).
TS020  MEDIUM    Premium model (Opus, GPT-5, GPT-4-Turbo, GPT-4o) on a tiny static prompt where a cheaper tier would likely match quality.
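
For concreteness, here is roughly the shape of code the OpenAI-side checks look at. A sketch only; the model name, prompts, and token budgets are illustrative, and llmdoctor never executes any of this:

from openai import OpenAI

client = OpenAI()

# TS010: no max_tokens / max_completion_tokens, so output cost is unbounded.
risky = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this incident report."}],
)

# Fixed: cap the completion so a runaway response can't blow the budget.
capped = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this incident report."}],
    max_tokens=300,
)

# TS020: a premium model on a tiny, static prompt where a cheaper tier would
# likely produce indistinguishable output.
overkill = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Reply with exactly YES or NO."}],
    max_tokens=5,
)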

How cost estimates are calculated

Estimates are heuristic, not invoice predictions. Each issue prints its assumptions (e.g. "100 calls/day, 30-day month, 3000-token system prompt"). Treat the numbers as order-of-magnitude. The tool's value is the finding and the fix; the dollar number is the attention-grabber.
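
As an illustration of the arithmetic only (not the tool's exact formula; the $15-per-million-token input price below is our assumption, and the real table lives in src/llmdoctor/pricing.py), the ~$135/month figure from the example finding above can be reproduced like this:

# Back-of-envelope only; ignores the one-time cache-write premium.
tokens_per_call = 3_000            # static system prompt size
calls_per_day = 100
days_per_month = 30
input_price_per_mtok = 15.00       # assumed premium-model input price, $/1M tokens
cache_read_multiplier = 0.1        # cached reads billed at ~10% of input price

monthly_tokens = tokens_per_call * calls_per_day * days_per_month   # 9,000,000
uncached_cost = monthly_tokens / 1_000_000 * input_price_per_mtok   # ~$135.00
cached_cost = uncached_cost * cache_read_multiplier                 # ~$13.50
print(f"uncached ~ ${uncached_cost:.2f}/mo, cached ~ ${cached_cost:.2f}/mo")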

Pricing table is in src/llmdoctor/pricing.py — verified 2026-04-30. Submit a PR if a model is missing or the price moves.

What this tool deliberately does NOT do (yet)

  • It does not patch your code. It reports, you fix.
  • It does not run your code. Static analysis only — safe on closed-source repos.
  • It does not measure live traffic. That's a different product (the SDK, coming next). The doctor is the first wedge.
  • It does not check JavaScript / TypeScript. Python only in 0.1.0.
  • It does not flag retry-storm patterns yet (planned: TS030).
  • It does not detect tool-definition duplication across calls (planned: TS040).

If your codebase doesn't import anthropic or openai directly (e.g. you use LangChain, LiteLLM, or hit the HTTP API), the doctor will produce no findings. Adapter checks for those frameworks are a next step.

Self-audit

Before publishing, we audited the doctor itself for the categories of failure most likely to make a measurement tool lose credibility: checker correctness on edge cases, input safety (BOMs, huge files, binary content, recursion bombs), reporter safety (markup injection), and basic security threat modelling. Five concrete bugs were caught and fixed before 0.1.0; eight intentional false negatives are documented with rationale.

Full report: AUDIT.md.

Development

git clone https://github.com/Shahriyar-Khan27/llm-doctor
cd llm-doctor
pip install -e ".[dev]"
pytest

License

MIT.

Why we built this

We were scoping a broader LLM-cost optimization SDK and surveyed the landscape: LLMLingua-family compression, GPTCache-style semantic caching, Mem0 / Letta / claude-mem memory frameworks, and Anthropic's prompt caching. One finding kept resurfacing as the single highest-leverage gap: prompt-cache placement bugs are everywhere, mostly invisible, and cost serious money. Even a competent OSS project like claude-mem shipped one to production (their issue #1890). Runtime tools catch this only after weeks of wasted spend; static analysis catches the whole class in seconds.

So before building the bigger SDK, we shipped the diagnostic. That's llmdoctor.

Download files

Source Distribution

llmdoctor-0.1.1.tar.gz (31.3 kB)

Built Distribution

llmdoctor-0.1.1-py3-none-any.whl (22.0 kB, Python 3)

File details

llmdoctor-0.1.1.tar.gz
  • Size: 31.3 kB
  • Uploaded via: twine/6.2.0 on CPython/3.14.4 (Trusted Publishing: no)
  • SHA256:      0427a346ef4ab21d0ab5ae640167720f61e88c70e03d74232e788e07793872a3
  • MD5:         8cc839b4f46cc7932a6c95e7e3d2c85c
  • BLAKE2b-256: 2dec922491dd9563b8d2868667187c2e14858485eb9672d4f31aff0256ffa953

llmdoctor-0.1.1-py3-none-any.whl
  • Size: 22.0 kB
  • Uploaded via: twine/6.2.0 on CPython/3.14.4 (Trusted Publishing: no)
  • SHA256:      a218b8eae5f7c52506834fd6dc3555c2d1f72e631a40528a6350eed34bd45926
  • MD5:         52ad43b7cd51ee862c9617e386384399
  • BLAKE2b-256: 1d04108c8c1ca39084d4bad81322a4e945ad4c28ebf06598445bd3ebb2855bcb
