
Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.

Project description

llmdoctor

A static analyzer for Python codebases that detects LLM cost-leak patterns before deployment.



Overview

llmdoctor reads Python source code and reports configuration patterns that have been observed to cause disproportionate token consumption in production LLM deployments. Each finding includes the affected source location, an explanation of the cost mechanism, a recommended remediation, and a heuristic monthly cost estimate based on a stated traffic profile.

The tool supports two integration surfaces:

  • The official Anthropic and OpenAI Python SDKs (anthropic.Anthropic, openai.OpenAI).
  • The LangChain framework (langchain_anthropic.ChatAnthropic, langchain_openai.ChatOpenAI, langchain.agents.AgentExecutor).

llmdoctor performs no code execution, issues no network requests, and emits no telemetry. It is intended for use in code review and continuous integration pipelines.


Installation

python -m pip install llmdoctor

Requires Python 3.9 or later.

On Windows machines with multiple Python installs, use the launcher to pick a specific interpreter:

py -m pip install llmdoctor          # latest installed Python
py -3.11 -m pip install llmdoctor    # Python 3.11 explicitly

The python -m pip form (and py -m pip on Windows) is preferred over a bare pip install because it guarantees the package is installed into the same interpreter you intend to run the tool with, and it surfaces interpreter problems visibly instead of silently installing into a different Python.


Usage

llmdoctor doctor .                  # scan the current directory
llmdoctor doctor src/agent.py       # scan a single file
llmdoctor doctor . --json           # emit JSON for downstream tooling
llmdoctor doctor . --fail-on HIGH   # exit non-zero if any HIGH finding

The --fail-on flag is intended for CI integration. The accepted values are HIGH, MEDIUM, LOW, and INFO. The exit code is 1 if any finding at or above the specified severity is present, and 0 otherwise.


Example output

╭── llmdoctor doctor ───────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                         │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                                │
│ Estimated potential savings: ~$340/month  (heuristic)                 │
╰───────────────────────────────────────────────────────────────────────╯

╭── [HIGH] TS103  AgentExecutor with max_iterations=None ───────────────╮
│   file:  src/agent_factory.py:23                                      │
│   code:  agent = AgentExecutor(agent=llm, tools=tools,                │
│                                max_iterations=None)                   │
│   why:   max_iterations=None disables the loop cap. If the agent's    │
│          stop condition fails to trigger, the loop runs unbounded.    │
│          Reported per-session cost in 2026 incidents: $1,000-$5,000.  │
│   fix:   Set max_iterations to a finite value (LangChain default is   │
│          15). Pair with max_execution_time for a wall-clock cap.      │
╰───────────────────────────────────────────────────────────────────────╯

Each finding includes the source location, the cost mechanism, a remediation, and a cost estimate with the assumptions printed inline. The tool does not state cost figures without disclosing the assumptions used to derive them.


Check reference

Each check is identified by a stable TSnnn code (TS = "token-saving"). The numbering is structural rather than chronological: the 0xx-series fires on direct Anthropic / OpenAI SDK calls, and the 1xx-series fires on LangChain wrappers. Pairs like TS010 ↔ TS101 are the same shape of bug detected at the two surfaces.

The checks cluster into four failure modes: prompt-cache placement (TS001 / TS003), output caps (TS010 / TS011 / TS101 / TS102), model overspecification (TS020), and agent loop runaway (TS103 / TS104).

Code Severity Surface Description
TS001 HIGH Anthropic SDK Dynamic content placed before a cache_control marker; invalidates the prompt cache on every call.
TS003 MEDIUM Anthropic SDK Long static system prompt with no cache_control marker; pays full input cost on every call when 0.1× cache-read pricing was available.
TS010 HIGH OpenAI SDK chat.completions.create() without max_tokens / max_completion_tokens; output cost unbounded on a single ramble.
TS011 MEDIUM OpenAI / Anthropic SDK max_tokens above 8000; suspected copy-paste default, rarely matched by actual response length.
TS020 MEDIUM OpenAI / Anthropic SDK Premium-tier model (Opus, GPT-5, GPT-4-Turbo, GPT-4o) on a call whose static prompt is short enough that a cheaper tier (Haiku, GPT-4o-mini) is likely to match quality at a fraction of the cost.
TS101 HIGH LangChain ChatOpenAI() instantiated without max_tokens. All downstream .invoke() calls inherit unbounded output.
TS102 MEDIUM LangChain ChatOpenAI / ChatAnthropic with max_tokens above 8000; same copy-paste-default risk as TS011, inherited by every downstream .invoke().
TS103 HIGH LangChain AgentExecutor instantiated with max_iterations=None; a single stuck session has cost $1,000–$5,000 in tokens in documented 2026 incidents.
TS104 MEDIUM LangChain AgentExecutor with max_iterations above 50 — the "bumped the cap as a workaround" anti-pattern. At ~$0.10/turn, 200 iterations is ~$20 per stuck session.
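As a concrete sketch of the cache-placement pair (TS001 / TS003), the payloads below contrast a cache-busting block ordering with the fixed one. The content-block shape follows Anthropic's prompt-caching format; the prompt text and date are invented for illustration, and no API call is made.

```python
# TS001 in miniature: the prompt cache covers the byte-identical prefix up to
# the cache_control marker, so dynamic text placed before it misses every time.

STATIC_GUIDE = "You are a support agent. Follow the refund policy strictly."

def build_system(user_date: str, fixed: bool) -> list[dict]:
    static_block = {
        "type": "text",
        "text": STATIC_GUIDE,
        "cache_control": {"type": "ephemeral"},  # cache everything up to here
    }
    dynamic_block = {"type": "text", "text": f"Today is {user_date}."}
    if fixed:
        # Fix: dynamic content comes AFTER the cached prefix, so the prefix
        # stays identical across calls and cache reads hit at 0.1x input price.
        return [static_block, dynamic_block]
    # Bug (TS001): dynamic text precedes the cache_control marker, so the
    # would-be cached prefix changes on every call and every call is a miss.
    return [dynamic_block, static_block]

buggy = build_system("2026-05-01", fixed=False)
good = build_system("2026-05-01", fixed=True)
```

Either list would be passed as the system parameter of an Anthropic messages call; llmdoctor flags the buggy ordering without executing anything.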

Suppression

To disable a specific check on a given line, append a comment in the following form:

client.chat.completions.create(...)  # llmdoctor: ignore TS010

To disable all checks on a given line, use # llmdoctor: ignore ALL.

The suppression scope is per-line. Multiple codes may be specified in a single comment, separated by commas.
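As an illustration of the comma-separated form, the snippet below extracts the codes from the documented comment syntax. The parsing code is a toy for illustration, not llmdoctor's own implementation.

```python
import re

line = 'llm = ChatOpenAI(model="gpt-4o", max_tokens=16000)  # llmdoctor: ignore TS101, TS102'

# Toy extraction of the suppressed codes from the documented per-line
# comment form; llmdoctor's real parser may differ.
match = re.search(r"#\s*llmdoctor:\s*ignore\s+(.+)$", line)
codes = [code.strip() for code in match.group(1).split(",")]
print(codes)  # ['TS101', 'TS102']
```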


Comparison with adjacent tooling

llmdoctor operates statically and is complementary to runtime tools. The intended usage pattern is to run llmdoctor in continuous integration and run an observability tool in production.

Tool category When it runs Catches Network
llmdoctor static (CI) cost-leak patterns in source None
Helicone, Langfuse, OpenLLMetry runtime proxy / SDK metrics, traces, costs Required
Mem0, Letta runtime, agent loop memory drift Required
LLMLingua runtime, prompt rewrite token bloat Required

Cost estimate methodology

Cost estimates are heuristic and intended as order-of-magnitude indicators.

Formula. Each estimate is computed as:

monthly_usd  =  tokens_per_call × calls_per_day × 30 × $/Mtok ÷ 1,000,000

The savings figure for each finding is the difference between the projected monthly cost of the bug and the projected cost after the fix.

Default traffic profile. 100 calls per day across a 30-day month, with a 3000-token system prompt where applicable. Per-check overrides:

  • TS001 / TS003 (cache misuse): 3000-token system prompt × 100 calls/day; savings = full input cost − cache-read cost (priced at 0.1× input).
  • TS011 / TS102 (high max_tokens): assumes 30% of calls produce 30% of the cap, compared against a 2048-token baseline.
  • TS020 (premium model on tiny prompt): 200 input + 200 output tokens × 1000 calls/day, compared against the cheaper-tier alternative (Haiku or gpt-4o-mini).
  • TS010 / TS101 / TS103 / TS104: no monthly figure. These are unbounded-cost bugs where the relevant unit is per-incident, not monthly average — the per-session dollar callout is surfaced in the finding's explanation instead of a $ field.
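Applying the formula to a TS003 finding under the default traffic profile gives a worked number. The $3.00/Mtok input rate below is an assumed placeholder for illustration, not a quoted provider price.

```python
# Worked example of the estimate formula:
# monthly_usd = tokens_per_call * calls_per_day * 30 * usd_per_mtok / 1,000,000

def monthly_usd(tokens_per_call: int, calls_per_day: int, usd_per_mtok: float) -> float:
    return tokens_per_call * calls_per_day * 30 * usd_per_mtok / 1_000_000

full = monthly_usd(3000, 100, 3.00)          # uncached: every call pays full input price
cached = monthly_usd(3000, 100, 3.00 * 0.1)  # cache reads priced at 0.1x input
savings = full - cached

print(f"${full:.2f} -> ${cached:.2f}, saving ${savings:.2f}/month")  # $27.00 -> $2.70, saving $24.30/month
```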

Pricing data. Per-million-token rates are bundled with the package and were verified against provider pages on 2026-04-30. The table covers the Claude 4 family (Opus, Sonnet, Haiku) and the OpenAI commercial line (GPT-5, GPT-4-Turbo, GPT-4o, GPT-4o-mini). Anthropic cache reads are priced at 0.1× input; OpenAI's automatic prompt cache is reflected where the provider exposes it.

The assumptions used in each estimate are printed inline with the finding. Estimates are not invoice predictions; users with traffic substantially above or below the default profile should scale accordingly.

If the model name in a call cannot be resolved to a known entry, no cost estimate is produced. This behavior is deliberate: the tool reports a finding without a dollar figure rather than emitting a guess.


Capabilities and limitations

Capability Status
Detect direct-SDK and LangChain configuration bugs in Python source Supported
Apply automatic fixes to source Not supported
Execute or import the analyzed code Not supported
Measure live traffic, cache hit rates, or response usage Planned
Analyze TypeScript or JavaScript source Planned
Recognize LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions Not supported
Issue network requests, ship telemetry, or collect usage data Not implemented

If a codebase does not import anthropic, openai, langchain_anthropic, or langchain_openai directly, llmdoctor will produce no findings. This is by design; the tool's matching is intentionally conservative.


Roadmap

Version Status Scope
0.1.0 Released Direct-SDK checks: TS001, TS003, TS010, TS011, TS020.
0.2.0 Released LangChain adapter: TS101, TS102, TS103, TS104.
0.3.0 Planned LlamaIndex adapter for Anthropic, OpenAI, and ReActAgent.
0.4.0 Planned TS030 (retry without budget); TS040 (tool-definition repetition).
0.5.0 Planned Optional runtime sidecar reading cache_read_input_tokens from live API responses.
1.0.0 Planned TypeScript and Node.js support across all check classes.

Frequently asked questions

Does the tool execute analyzed code? No. The tool uses ast.parse exclusively. Analyzed code is never imported, executed, or compiled.
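A minimal illustration of that static approach, using only the standard library ast module: parse the source, walk the tree, and flag AgentExecutor calls with max_iterations=None. This is a simplified sketch, not llmdoctor's own detection code.

```python
import ast

SOURCE = """
from langchain.agents import AgentExecutor
agent = AgentExecutor(agent=llm, tools=tools, max_iterations=None)
"""

def find_unbounded_executors(source: str) -> list[int]:
    """Return line numbers of AgentExecutor(..., max_iterations=None) calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "AgentExecutor"):
            for kw in node.keywords:
                if (kw.arg == "max_iterations"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is None):
                    findings.append(node.lineno)  # analyzed source is never executed
    return findings

print(find_unbounded_executors(SOURCE))  # prints [3]
```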

Does the tool make network requests? No. The package contains no network-related imports. No telemetry, usage reporting, or version-check beacon is implemented.

Is LiteLLM, OpenRouter, or a custom wrapper supported? Not in the current release. Adapter modules for additional frameworks are planned. The maintainer welcomes specific patterns observed in production code.


Contact

The issue tracker is private during the 0.x release series. To report a bug, suggest a check, or share a real-world cost-leak pattern, contact the maintainer at:

issues.llmdoctor@gmail.com

The maintainer aims to respond to actionable bug reports within a few business days.


License

MIT. The full license text is bundled with the installed package at llmdoctor-<version>.dist-info/licenses/LICENSE.

Project details


Download files

Download the file for your platform.

Source Distribution

llmdoctor-0.2.5.tar.gz (39.6 kB)

Uploaded Source

Built Distribution


llmdoctor-0.2.5-py3-none-any.whl (28.1 kB)

Uploaded Python 3

File details

Details for the file llmdoctor-0.2.5.tar.gz.

File metadata

  • Download URL: llmdoctor-0.2.5.tar.gz
  • Upload date:
  • Size: 39.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmdoctor-0.2.5.tar.gz
Algorithm Hash digest
SHA256 ee5c23f8b381fdd34e143a631bbd323b2c8e07c65c78b164d285ac156977f6a3
MD5 631ecb9cd01bbef43717678d56c208cd
BLAKE2b-256 e3f39d6b2b99d54e7a641ef16363f8ee4c92ed1c2c1370e8d1f502933f73a8a3


File details

Details for the file llmdoctor-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: llmdoctor-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmdoctor-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 92ef75314be532cfe0e4db134bc9d1d0195b602888de63eead144eae3c43b4ca
MD5 a6273b24428f9e61926dc0cea2b0e46c
BLAKE2b-256 6b1456cdfdbadcc1a3ef9eb272fc9b455bcd258d800c36c012ed9e6f000a4240

