Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.
llmdoctor
A static analyzer for Python codebases that detects LLM cost-leak patterns before deployment.
Overview
llmdoctor reads Python source code and reports configuration patterns
that have been observed to cause disproportionate token consumption in
production LLM deployments. Each finding includes the affected source
location, an explanation of the cost mechanism, a recommended remediation,
and a heuristic monthly cost estimate based on a stated traffic profile.
The tool supports two integration surfaces:
- The official Anthropic and OpenAI Python SDKs (`anthropic.Anthropic`, `openai.OpenAI`).
- The LangChain framework (`langchain_anthropic.ChatAnthropic`, `langchain_openai.ChatOpenAI`, `langchain.agents.AgentExecutor`).
llmdoctor performs no code execution, issues no network requests, and
emits no telemetry. It is intended for use in code review and continuous
integration pipelines.
Installation
pip install llmdoctor
Requires Python 3.9 or later.
Usage
llmdoctor doctor . # scan the current directory
llmdoctor doctor src/agent.py # scan a single file
llmdoctor doctor . --json # emit JSON for downstream tooling
llmdoctor doctor . --fail-on HIGH # exit non-zero if any HIGH finding
The --fail-on flag is intended for CI integration. The accepted values
are HIGH, MEDIUM, LOW, and INFO. The exit code is 1 if any
finding at or above the specified severity is present, and 0 otherwise.
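The exit-code rule can be stated precisely in a few lines. This is an illustrative sketch of the documented behavior, not llmdoctor's actual implementation; the severity names match the accepted `--fail-on` values.

```python
# Ascending severity order, matching the documented --fail-on values.
SEVERITIES = ["INFO", "LOW", "MEDIUM", "HIGH"]

def exit_code(finding_severities, fail_on):
    """Return 1 if any finding is at or above the --fail-on threshold, else 0."""
    threshold = SEVERITIES.index(fail_on)
    return int(any(SEVERITIES.index(s) >= threshold for s in finding_severities))

print(exit_code(["MEDIUM", "LOW"], "HIGH"))     # 0: no HIGH finding present
print(exit_code(["HIGH", "MEDIUM"], "MEDIUM"))  # 1: findings at/above MEDIUM
```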
Example output
╭── llmdoctor doctor ───────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/ │
│ Found 3 issue(s) · 2 HIGH · 1 MEDIUM │
│ Estimated potential savings: ~$340/month (heuristic) │
╰───────────────────────────────────────────────────────────────────────╯
╭── [HIGH] TS103 AgentExecutor with max_iterations=None ───────────────╮
│ file: src/agent_factory.py:23 │
│ code: agent = AgentExecutor(agent=llm, tools=tools, │
│ max_iterations=None) │
│ why: max_iterations=None disables the loop cap. If the agent's │
│ stop condition fails to trigger, the loop runs unbounded. │
│ Reported per-session cost in 2026 incidents: $1,000-$5,000. │
│ fix: Set max_iterations to a finite value (LangChain default is │
│ 15). Pair with max_execution_time for a wall-clock cap. │
╰───────────────────────────────────────────────────────────────────────╯
Each finding includes the source location, the cost mechanism, a remediation, and a cost estimate with the assumptions printed inline. The tool does not state cost figures without disclosing the assumptions used to derive them.
Check reference
The checks cluster into four failure modes: prompt-cache placement (TS001 / TS003), output caps (TS010 / TS011 / TS101 / TS102), model overspecification (TS020), and agent loop runaway (TS103 / TS104).
| Code | Severity | Surface | Description |
|---|---|---|---|
| TS001 | HIGH | Anthropic SDK | Dynamic content placed before a cache_control marker; invalidates the prompt cache on every call. |
| TS003 | MEDIUM | Anthropic SDK | Long static system prompt with no cache_control marker; pays full input cost on every call when 0.1× cache-read pricing was available. |
| TS010 | HIGH | OpenAI SDK | chat.completions.create() without max_tokens / max_completion_tokens; output cost unbounded on a single ramble. |
| TS011 | MEDIUM | OpenAI / Anthropic SDK | max_tokens above 8000; suspected copy-paste default, rarely matched by actual response length. |
| TS020 | MEDIUM | OpenAI / Anthropic SDK | Premium-tier model (Opus, GPT-5, GPT-4-Turbo, GPT-4o) on a call whose static prompt is short enough that a cheaper tier (Haiku, GPT-4o-mini) is likely to match quality at a fraction of the cost. |
| TS101 | HIGH | LangChain | ChatOpenAI() instantiated without max_tokens. All downstream .invoke() calls inherit unbounded output. |
| TS102 | MEDIUM | LangChain | ChatOpenAI / ChatAnthropic with max_tokens above 8000; same copy-paste-default risk as TS011, inherited by every downstream .invoke(). |
| TS103 | HIGH | LangChain | AgentExecutor instantiated with max_iterations=None; a single stuck session has cost $1,000–$5,000 in tokens in documented 2026 incidents. |
| TS104 | MEDIUM | LangChain | AgentExecutor with max_iterations above 50 — the "bumped the cap as a workaround" anti-pattern. At ~$0.10/turn, 200 iterations is ~$20 per stuck session. |
Suppression
To disable a specific check on a given line, append a comment in the following form:
client.chat.completions.create(...) # llmdoctor: ignore TS010
To disable all checks on a given line, use # llmdoctor: ignore ALL.
The suppression scope is per-line. Multiple codes may be specified in a single comment, separated by commas.
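The per-line, comma-separated semantics can be sketched with a small parser. The helper below is hypothetical (not part of llmdoctor's public API); it only mirrors the suppression comment format documented above.

```python
import re

# Matches "# llmdoctor: ignore TS010, TS011" or "# llmdoctor: ignore ALL".
_IGNORE = re.compile(r"#\s*llmdoctor:\s*ignore\s+([A-Z0-9, ]+)")

def suppressed_codes(line: str) -> set[str]:
    """Return the set of check codes suppressed on this source line."""
    m = _IGNORE.search(line)
    if not m:
        return set()
    return {code.strip() for code in m.group(1).split(",") if code.strip()}

line = 'client.chat.completions.create(...)  # llmdoctor: ignore TS010, TS011'
print(sorted(suppressed_codes(line)))  # ['TS010', 'TS011']
```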
Comparison with adjacent tooling
llmdoctor operates statically and is complementary to runtime tools.
The intended usage pattern is to run llmdoctor in continuous integration
and run an observability tool in production.
| Tool category | When it runs | Catches | Network |
|---|---|---|---|
| llmdoctor | static (CI) | cost-leak patterns in source | None |
| Helicone, Langfuse, OpenLLMetry | runtime proxy / SDK | metrics, traces, costs | Required |
| Mem0, Letta | runtime, agent loop | memory drift | Required |
| LLMLingua | runtime, prompt rewrite | token bloat | Required |
Cost estimate methodology
Cost estimates are heuristic and intended as order-of-magnitude indicators.
Formula. Each estimate is computed as:
monthly_usd = tokens_per_call × calls_per_day × 30 × $/Mtok ÷ 1,000,000
The savings figure for each finding is the difference between the projected monthly cost of the bug and the projected cost after the fix.
Default traffic profile. 100 calls per day across a 30-day month, with a 3000-token system prompt where applicable. Per-check overrides:
- TS001 / TS003 (cache misuse): 3000-token system prompt × 100 calls/day; savings = full input cost − 0.1× cache-read cost.
- TS011 / TS102 (high max_tokens): assumes 30% of calls produce 30% of the cap, compared against a 2048-token baseline.
- TS020 (premium model on tiny prompt): 200 input + 200 output tokens × 1000 calls/day, compared against the cheaper-tier alternative (Haiku or gpt-4o-mini).
- TS010 / TS101 / TS103 / TS104: no monthly figure. These are unbounded-cost bugs where the relevant unit is per-incident, not a monthly average; the per-session dollar callout is surfaced in the finding's explanation instead of a `$` field.
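The formula and the TS003 profile above can be worked through directly. The $3/Mtok input rate below is an assumed Sonnet-class price for illustration, not a figure shipped with llmdoctor.

```python
# Worked instance of the documented formula:
#   monthly_usd = tokens_per_call * calls_per_day * 30 * usd_per_mtok / 1,000,000

def monthly_usd(tokens_per_call, calls_per_day, usd_per_mtok):
    return tokens_per_call * calls_per_day * 30 * usd_per_mtok / 1_000_000

# TS003 profile: 3000-token static system prompt, 100 calls/day.
# Assumed input rate: $3.00/Mtok; cache reads at the documented 0.1x.
full = monthly_usd(3000, 100, 3.00)
cached = monthly_usd(3000, 100, 3.00 * 0.1)
print(f"${full:.2f} -> ${cached:.2f}, savings ${full - cached:.2f}/month")
# $27.00 -> $2.70, savings $24.30/month
```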
Pricing data. Per-million-token rates are bundled with the package and were verified against provider pages on 2026-04-30. The table covers the Claude 4 family (Opus, Sonnet, Haiku) and the OpenAI commercial line (GPT-5, GPT-4-Turbo, GPT-4o, GPT-4o-mini). Anthropic cache reads are priced at 0.1× input; OpenAI's automatic prompt cache is reflected where the provider exposes it.
The assumptions used in each estimate are printed inline with the finding. Estimates are not invoice predictions; users with traffic substantially above or below the default profile should scale accordingly.
If the model name in a call cannot be resolved to a known entry, no cost estimate is produced. This behavior is deliberate: the tool reports a finding without a dollar figure rather than emit a guess.
Pre-publication audit
The following pre-publication audit was performed on v0.2.0. The full test suite (30 tests) passes in continuous integration on every release.
Checker correctness. A battery of AST shapes was exercised as fixtures
that must not produce findings: async calls, **kwargs unpacking,
walrus expressions, multi-target assignments, annotated assignments,
augmented assignments, empty system arrays, bound-method assignment,
mock clients, and deeply nested calls. No false positives were
observed.
Input safety. Five concrete defects were resolved before release:
UTF-8 BOM crash on files written by Windows Notepad; out-of-memory
risk on multi-megabyte generated .py files (resolved with a 5 MB
size cap configurable via LLMDOCTOR_MAX_FILE_BYTES); unhandled
ValueError from ast.parse on certain binary content; unhandled
RecursionError from the visitor on minified source; and rich-markup
injection through user-controlled file paths and code snippets.
Security review. The package contains no usage of eval, exec,
or compile(..., 'exec'). The package does not import socket,
requests, httpx, or urllib. No telemetry or usage reporting is
implemented or planned.
Documented limitations. Bound-method assignment to local variables, multi-target assignments, mock clients with realistic-looking model names, calls through arbitrary wrapper functions, LiteLLM, OpenRouter, raw HTTP, and TypeScript codebases are not currently detected. Each limitation is documented in source so the tool does not silently overstate coverage.
Capabilities and limitations
| Capability | Status |
|---|---|
| Detect direct-SDK and LangChain configuration bugs in Python source | Supported |
| Apply automatic fixes to source | Not supported |
| Execute or import the analyzed code | Not supported |
| Measure live traffic, cache hit rates, or response usage | Planned |
| Analyze TypeScript or JavaScript source | Planned |
| Recognize LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions | Not supported |
| Issue network requests, ship telemetry, or collect usage data | Not implemented |
If a codebase does not import anthropic, openai, langchain_anthropic,
or langchain_openai directly, llmdoctor will produce no findings.
This is by design; the tool's matching is intentionally conservative.
Roadmap
| Version | Status | Scope |
|---|---|---|
| 0.1.0 | Released | Direct-SDK checks: TS001, TS003, TS010, TS011, TS020. |
| 0.2.0 | Released | LangChain adapter: TS101, TS102, TS103, TS104. |
| 0.3.0 | Planned | LlamaIndex adapter for Anthropic, OpenAI, and ReActAgent. |
| 0.4.0 | Planned | TS030 (retry without budget); TS040 (tool-definition repetition). |
| 0.5.0 | Planned | Optional runtime sidecar reading cache_read_input_tokens from live API responses. |
| 1.0.0 | Planned | TypeScript and Node.js support across all check classes. |
Frequently asked questions
Does the tool execute analyzed code?
No. The tool uses ast.parse exclusively. Analyzed code is never
imported, executed, or compiled.
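The purely static approach can be illustrated with the standard-library `ast` module. This is a minimal sketch in the spirit of TS010 (not llmdoctor's actual implementation): it flags `chat.completions.create(...)` calls that pass neither `max_tokens` nor `max_completion_tokens`, without ever importing or running the analyzed source.

```python
import ast

SOURCE = """
resp = client.chat.completions.create(model="gpt-4o", messages=msgs)
"""

def find_unbounded_create(source: str) -> list[int]:
    """Line numbers of .create(...) calls with no output-token cap."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "create"
                and not any(kw.arg in ("max_tokens", "max_completion_tokens")
                            for kw in node.keywords)):
            hits.append(node.lineno)
    return hits

print(find_unbounded_create(SOURCE))  # [2]
```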
Does the tool make network requests?
No. The package contains no network-related imports. No telemetry, usage reporting, or version-check beacon is implemented.
How are mocked clients in test code handled?
Mock-client patterns are exercised in the test suite as fixtures that
must not produce false positives. If a false positive does occur,
suppress the finding with a # llmdoctor: ignore <CODE> comment and
report the case to the maintainer (see Contact below).
Is LiteLLM, OpenRouter, or a custom wrapper supported?
Not in the current release. Adapter modules for additional frameworks are planned. The maintainer welcomes reports of specific patterns observed in production code.
Contact
The issue tracker is private during the 0.x release series. To report a bug, suggest a check, or share a real-world cost-leak pattern, contact the maintainer at:
The maintainer aims to respond to actionable bug reports within a few business days.
License
MIT. The full license text is bundled with the installed package at
llmdoctor-<version>.dist-info/licenses/LICENSE.
File details
Details for the file llmdoctor-0.2.2.tar.gz.
File metadata
- Download URL: llmdoctor-0.2.2.tar.gz
- Upload date:
- Size: 39.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `34b91bc5618561a0f2bbd34c6a70f0a745955a3ef061005f344f9ddd9bf30523` |
| MD5 | `7bc2cb63e1b206d3b6efc85ef52ecc53` |
| BLAKE2b-256 | `688862487946c18fd722e7dc37b0703596310a25e32209bc5c5a346058c7afe3` |
File details
Details for the file llmdoctor-0.2.2-py3-none-any.whl.
File metadata
- Download URL: llmdoctor-0.2.2-py3-none-any.whl
- Upload date:
- Size: 28.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7960a35822fb3fc2d8d83c68f2d450d29f46ff740fda4fe9a8e4c92e9bfd9281` |
| MD5 | `c2c2bc40fdb0be9203f9550a5d57af7a` |
| BLAKE2b-256 | `16dad9cef06956b22975892094d386adeedcc2430c62ba3844f0a823bc841613` |