
Catch LLM cost changes in code review. Infracost for LLM spend.


tokentoll

Prevent LLM cost regressions before production.

Python 3.10+ · License: MIT

tokentoll is a CI gate for LLM cost. It statically analyzes Python (JS/TS planned) for LLM API calls, scores every pull request against a policy you control, and posts a PASS/WARN/FAIL verdict directly on the PR. Optionally, it fails the workflow when the policy is violated, so cost regressions cannot be merged.


The verdict comment

When a PR violates your policy, tokentoll posts a comment with the verdict and a list of blocking findings. If fail_on_policy_violation is enabled, it also exits non-zero so the check fails. Example:

## tokentoll verdict: FAIL

**Blocking findings (2):**

- `src/agent.py:42` - per-call cost grew 15.0x (threshold 5x)
- total monthly delta +$812.00 exceeds budget $250.00

> Required action: revert the regression, raise the threshold in `.tokentoll.yml`, or add an exemption.

When the PR is clean, the verdict is PASS and the comment shows only the cost delta table. When no policy is configured, tokentoll posts an informational delta comment with no verdict.

Quick start (60 seconds)

Add .github/workflows/tokentoll.yml:

name: tokentoll
on:
  pull_request:
    paths:
      - "**.py"

permissions:
  contents: read
  pull-requests: write

jobs:
  cost-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: Jwrede/tokentoll@v0.7.0
        with:
          fail-on-policy-violation: true

Then add .tokentoll.yml to your repo root:

budgets:
  max_monthly_delta_usd: 250
  max_callsite_monthly_usd: 100
  max_relative_increase: 5.0

policies:
  block_unknown_models: true
  fail_on_policy_violation: true

Future PRs receive a verdict comment. PRs that exceed the thresholds fail the workflow.

For SHA-pinned installs and minimal-permissions setups, see docs/github-action.md. For the full policy schema, see docs/policy.md. For the security posture, see docs/security.md.

What it detects

| SDK | Patterns | Status |
|---|---|---|
| OpenAI | chat.completions.create, responses.create | Supported |
| Anthropic | messages.create, messages.stream | Supported |
| Google GenAI | models.generate_content | Supported |
| LiteLLM | completion, acompletion | Supported |
| LangChain | ChatOpenAI, ChatAnthropic, init_chat_model | Supported |
| Zhipu AI | ZhipuAiClient, ZhipuAI (GLM models) | Supported |
| JS/TS SDKs | OpenAI Node, Anthropic, Vercel AI SDK, LangChain.js | Planned (v0.8) |

Policy rules

The policy block in .tokentoll.yml controls when a PR fails:

| Rule | Trigger |
|---|---|
| budgets.max_monthly_delta_usd | total estimated monthly delta exceeds the threshold |
| budgets.max_callsite_monthly_usd | any new or changed call site exceeds the threshold |
| budgets.max_relative_increase | per-call cost for any modified call site grows by more than this multiplier |
| policies.block_unknown_models | any new or modified call site uses an unpriced or unresolved model |
| policies.fail_on_policy_violation | tokentoll diff exits 1 on FAIL (CI gate behavior) |

Each rule is independent. Leave a field unset to disable that rule. Full reference in docs/policy.md.
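The rules above can be sketched as a small evaluator. This is a hypothetical illustration that assumes the diff has already produced per-call-site numbers for new or changed call sites; CallSiteDelta and evaluate are invented names, and the sketch only produces PASS or FAIL because the conditions that yield WARN are not described here.

```python
from dataclasses import dataclass


@dataclass
class CallSiteDelta:
    """Hypothetical per-call-site diff result; field names are illustrative."""
    location: str            # e.g. "src/agent.py:42"
    old_per_call_usd: float  # 0.0 for a newly added call site
    new_per_call_usd: float
    new_monthly_usd: float
    model_priced: bool = True


def evaluate(deltas: list[CallSiteDelta], total_monthly_delta_usd: float,
             budgets: dict, policies: dict) -> tuple[str, list[str]]:
    """Return (verdict, blocking findings). Each rule is checked independently
    and skipped when its key is missing, mirroring "leave a field unset"."""
    findings: list[str] = []
    max_site = budgets.get("max_callsite_monthly_usd")
    max_ratio = budgets.get("max_relative_increase")
    for d in deltas:
        if max_site is not None and d.new_monthly_usd > max_site:
            findings.append(f"{d.location} - monthly cost ${d.new_monthly_usd:.2f} exceeds ${max_site:.2f}")
        # A ratio is only meaningful for modified call sites with a prior cost.
        if max_ratio is not None and d.old_per_call_usd > 0:
            ratio = d.new_per_call_usd / d.old_per_call_usd
            if ratio > max_ratio:
                findings.append(f"{d.location} - per-call cost grew {ratio:.1f}x (threshold {max_ratio:g}x)")
        if policies.get("block_unknown_models") and not d.model_priced:
            findings.append(f"{d.location} - unpriced or unresolved model")
    max_total = budgets.get("max_monthly_delta_usd")
    if max_total is not None and total_monthly_delta_usd > max_total:
        findings.append(f"total monthly delta +${total_monthly_delta_usd:.2f} exceeds budget ${max_total:.2f}")
    return ("FAIL" if findings else "PASS"), findings
```

With the budgets from the quick-start .tokentoll.yml, a call site whose per-call cost goes from $0.0002 to $0.003 plus a total delta of $812 yields the same two blocking findings as the example verdict comment.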

CLI

pip install tokentoll

# Scan current directory for LLM API calls and their costs
tokentoll scan .

# Show cost impact of your last commit
tokentoll diff HEAD~1

# Compare two refs and fail on policy violation
tokentoll diff main..HEAD --fail-on-policy-violation

Subcommands:

tokentoll scan [PATH...] [--format table|json|markdown] [--calls-per-month N] [--config PATH]
tokentoll diff [REF] [--base REF] [--head REF] [--format table|json|markdown|github-comment]
               [--config PATH] [--fail-on-policy-violation]
tokentoll update    # refresh bundled pricing data from LiteLLM

Configuration

.tokentoll.yml lives in the repo root and is auto-discovered. Beyond the policy block:

# Per-SDK defaults for dynamic (runtime-resolved) model names
default_models:
  openai: gpt-4o-mini
  anthropic: claude-3-haiku-20240307

# Assumed monthly call volume per call site (used for dollar estimates)
calls_per_month: 5000

# Skip cost estimation for dynamic models entirely.
# Default false: dynamic calls are priced against the per-SDK default.
skip_dynamic_models: false

# Default excludes (tests/, examples/, docs/, cookbook/, benchmarks/, evals/,
# scripts/, notebooks/) are applied automatically. Opt out with:
use_default_excludes: false

# Additional excludes (prefix or glob)
exclude:
  - "*_test.py"
  - vendor/

# Per-path overrides (longest prefix match)
overrides:
  - path: src/agents/
    default_model: gpt-4o
    calls_per_month: 10000
  - path: src/azure/
    skip_dynamic_models: true

Resolution order for dynamic model defaults: default_models (per-SDK) > default_model (generic) > built-in SDK defaults.
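A minimal sketch of the two lookups described above, assuming overrides are plain dicts with a path key: select_override picks the override with the longest matching prefix, and resolve_default_model applies the stated precedence. Both function names are hypothetical, and how an override's default_model interacts with a top-level default_models entry is not specified here, so the sketch keeps the two steps separate.

```python
from typing import Optional


def select_override(file_path: str, overrides: list[dict]) -> Optional[dict]:
    """Return the override whose `path` is the longest prefix of file_path,
    or None when no override matches (hypothetical helper)."""
    matches = [o for o in overrides if file_path.startswith(o["path"])]
    return max(matches, key=lambda o: len(o["path"]), default=None)


def resolve_default_model(sdk: str, default_models: dict[str, str],
                          default_model: Optional[str],
                          builtin_defaults: dict[str, str]) -> Optional[str]:
    """Apply the documented order for a dynamic model name:
    per-SDK default_models > generic default_model > built-in SDK default."""
    if sdk in default_models:
        return default_models[sdk]
    if default_model is not None:
        return default_model
    return builtin_defaults.get(sdk)
```

Longest-prefix matching means a broad override such as src/ and a narrower one such as src/agents/ can coexist: files under src/agents/ get the narrower settings, everything else under src/ gets the broad ones.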

Security

tokentoll requires no API keys, sends no telemetry, and runs entirely inside your CI environment. Pricing data ships with the package and updates from LiteLLM on demand. For the recommended permission set, SHA pinning, and fork PR risk, see docs/security.md.

MCP server


tokentoll ships an MCP (Model Context Protocol) server so Claude Code and other MCP hosts can check the cost impact of LLM code changes from inside an agent conversation:

pip install tokentoll[mcp]
claude mcp add --transport stdio tokentoll -- tokentoll-mcp

Two tools are exposed: scan (estimate costs across a path) and diff (compare two refs). Both return JSON.

How it works

  Source code (.py)
        |
        v
  +-------------+     +------------------+
  | AST scanner |---->| SDK detectors    |
  | (ast.parse) |     | OpenAI, Anthropic|
  +-------------+     | Google, LiteLLM, |
                       | LangChain, Zhipu |
                       +------------------+
                              |
                              v
                       +------------------+
                       | Pricing engine   |
                       | 2200+ models     |
                       +------------------+
                              |
                              v
                       +------------------+
                       | Diff engine      |
                       | (old vs new)     |
                       +------------------+
                              |
                              v
                       +------------------+
                       | Policy evaluator |
                       | PASS/WARN/FAIL   |
                       +------------------+
                              |
                              v
                       +------------------+
                       | PR comment / CLI |
                       | output           |
                       +------------------+
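The diff engine stage in the diagram can be illustrated with a hypothetical diff_costs function that compares estimated monthly cost per call site between two refs. It assumes each call site already has a key that stays stable across refs (the key format in the docstring is invented); line numbers alone shift between commits, so a real implementation needs something sturdier.

```python
def diff_costs(old: dict[str, float], new: dict[str, float]) -> tuple[dict[str, float], float]:
    """Compare estimated monthly USD per call site between a base and head ref.

    Keys are hypothetical stable call-site identifiers (e.g. "src/agent.py:run#0").
    Returns (delta per added, removed, or changed site; total monthly delta).
    """
    deltas: dict[str, float] = {}
    for site in old.keys() | new.keys():
        # A missing side counts as $0: added sites are pure increases,
        # removed sites are pure decreases.
        delta = new.get(site, 0.0) - old.get(site, 0.0)
        if delta != 0.0:
            deltas[site] = delta
    return deltas, sum(deltas.values())
```

Unchanged call sites drop out of the result, so the per-site budget rules only ever see new or changed call sites.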

A multi-pass constant propagation engine resolves model names through variable assignments, os.getenv() fallbacks, function defaults, class attributes, constructor arguments, dict literals, and **kwargs unpacking, so real-world code with indirection still produces useful estimates.

Pricing data

Pricing is bundled and works offline. To refresh from LiteLLM:

tokentoll update

Coverage: 300+ models across OpenAI, Anthropic, Google, AWS Bedrock, Azure, and more, plus 2200+ entries from LiteLLM's combined catalog.

Limitations

  • Static analysis only. Models loaded from databases or remote config cannot be resolved; tokentoll falls back to the configured per-SDK default and marks the call site as (default).
  • Token estimates use a characters/4 heuristic unless tiktoken is installed (pip install tokentoll[tiktoken]).
  • Monthly estimates assume uniform call volume per call site. Override per-project with calls_per_month or per-path with overrides.
  • Python only in v0.7. JS/TS support is the focus of v0.8.
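The token and volume assumptions listed above combine into a simple per-call-site estimate. The sketch below assumes prices quoted in USD per million tokens, an output token count supplied by the caller, and rounding up to a whole token; the function names and the prices in the usage example are hypothetical, not tokentoll's pricing data.

```python
import math


def estimate_tokens(text: str) -> int:
    """characters/4 heuristic; rounding up is an assumption of this sketch."""
    return math.ceil(len(text) / 4)


def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_usd_per_mtok: float, output_usd_per_mtok: float,
                     calls_per_month: int) -> float:
    """Per-call cost times a uniform monthly call volume for one call site.
    Prices are USD per 1,000,000 tokens (hypothetical inputs)."""
    per_call_micro_usd = input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return per_call_micro_usd * calls_per_month / 1_000_000
```

For example, a 2,000-character prompt estimates to 500 input tokens; with 250 output tokens at hypothetical prices of $1 and $4 per million tokens, one call costs $0.0015 and 5,000 calls per month come to $7.50.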

Roadmap

  • v0.8: JS/TS support (OpenAI Node SDK, Anthropic, Vercel AI SDK, LangChain.js) via tree-sitter
  • v0.9: Public demo repo with a known-failing PR, gpt-researcher case study, expanded adoption section
  • Future: Context-aware call frequency inference (FastAPI routes versus scripts versus loops)

License

MIT


Download files


Source Distribution

tokentoll-0.7.0.tar.gz (180.1 kB)

Uploaded Source

Built Distribution


tokentoll-0.7.0-py3-none-any.whl (71.9 kB)

Uploaded Python 3

File details

Details for the file tokentoll-0.7.0.tar.gz.

File metadata

  • Download URL: tokentoll-0.7.0.tar.gz
  • Upload date:
  • Size: 180.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for tokentoll-0.7.0.tar.gz
Algorithm Hash digest
SHA256 bf6dde5e511ea6a26e3724a345a6c1dc8e212c81d45cab98b6d072235d084680
MD5 ba585663127cd7869eff2a25419144ef
BLAKE2b-256 92ca068a4d1efa41fbb2522599e6838e2fd63fe55f5ed6f5db0a67f8864488ae


File details

Details for the file tokentoll-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: tokentoll-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 71.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for tokentoll-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3711ea537150eab67f828ab52ca785a856f712fed6fd1903f7e98b9ee4a3505a
MD5 d943ab7a748bfd2cc284789dc0382342
BLAKE2b-256 6bafe3b616d25e746e740817e961fcfbc712693f59283444914575c1d18f05b2

