llm-toll

A lightweight, drop-in Python decorator to track costs, monitor token usage, and enforce budget and rate limits for LLM API calls.

Overview

llm_toll is a developer tool for local prototyping and small-scale production scripts. Wrap a function with @track_costs to log token usage automatically, calculate the cost of each call in USD from the built-in pricing tables, and halt execution when a configured budget or local rate limit is breached.
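To make the cost and budget logic concrete, here is a minimal sketch of how a per-call cost could be derived from a pricing table and checked against a cap. It does not reproduce llm_toll's internals: the names `PRICES_PER_MILLION`, `SketchBudgetExceeded`, `call_cost`, and `check_budget` are hypothetical, the prices are placeholders rather than real provider pricing, and the strict "greater than" comparison is an assumption.

```python
# Hypothetical prices in USD per one million tokens; NOT real provider pricing.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.00, "output": 8.00},
}


class SketchBudgetExceeded(Exception):
    """Stand-in for the library's BudgetExceededError (illustration only)."""


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call using the pricing table above."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def check_budget(spent_so_far: float, max_budget: float) -> None:
    """Raise if the cumulative spend already exceeds the cap (assumed strict)."""
    if spent_so_far > max_budget:
        raise SketchBudgetExceeded(
            f"spent ${spent_so_far:.4f} of ${max_budget:.2f}"
        )


cost = call_cost("example-model", input_tokens=1_000, output_tokens=500)
print(f"{cost:.6f}")  # prints 0.006000
```

A decorator built on these two functions would call `check_budget` before running the wrapped function and add `call_cost(...)` to the running total afterwards.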

Features

  • Drop-In Decorator — Minimal code intrusion. Just add @track_costs above any function making an LLM call.
  • Multi-Provider Support — Built-in pricing matrices for OpenAI, Anthropic, Gemini, and general OpenAI-compatible endpoints.
  • Hard Budget Caps — Prevents functions from executing if the cumulative cost exceeds a defined threshold.
  • Rate Limiting — Local enforcement of RPM and TPM to prevent HTTP 429 errors.
  • Local Persistence — SQLite-backed usage tracking across multiple script runs and days.
  • Cost Reporting — Clean, color-coded terminal summary of cost per call and total session cost.
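The local persistence feature can be pictured with a small SQLite sketch. This is an illustration under stated assumptions, not the library's actual schema: the table layout, the function names (`open_store`, `record_cost`, `spent_this_month`), and the idea of bucketing spend by calendar month are all hypothetical. An in-memory database is used so the example runs without touching disk; a file path would let totals persist across script runs.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the usage store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        "project TEXT, month TEXT, cost_usd REAL)"
    )
    return conn


def record_cost(conn, project: str, cost_usd: float, now=None) -> None:
    """Append one call's cost, tagged with its calendar month (YYYY-MM)."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO usage (project, month, cost_usd) VALUES (?, ?, ?)",
        (project, now.strftime("%Y-%m"), cost_usd),
    )
    conn.commit()


def spent_this_month(conn, project: str, now=None) -> float:
    """Sum the project's cost for the month that contains `now`."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0.0) FROM usage "
        "WHERE project = ? AND month = ?",
        (project, now.strftime("%Y-%m")),
    ).fetchone()
    return row[0]


conn = open_store()
january = datetime(2025, 1, 15, tzinfo=timezone.utc)
february = datetime(2025, 2, 1, tzinfo=timezone.utc)
record_cost(conn, "my_scraper", 0.25, now=january)
record_cost(conn, "my_scraper", 0.50, now=january)
print(spent_this_month(conn, "my_scraper", now=january))   # prints 0.75
print(spent_this_month(conn, "my_scraper", now=february))  # prints 0.0
```

Because the month is part of each row, a new month starts from zero without deleting history.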

Quick Start

Installation

pip install llm-toll
# or, with uv
uv add llm-toll

Basic Usage (Auto-detect)

When the wrapped function returns a response from a supported SDK, the decorator infers the model and token counts from that response object.

from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track_costs(project="my_scraper", max_budget=2.00, reset="monthly")
def generate_summary(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )
    return response  # The decorator parses usage from this object
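As a rough picture of what "parsing the usage" from a response can involve, the sketch below reads the model name and token counts by attribute. It relies on the field names the SDKs expose on their response objects (OpenAI chat completions report `usage.prompt_tokens` and `usage.completion_tokens`; Anthropic messages report `usage.input_tokens` and `usage.output_tokens`). The function name `sketch_extract_usage` is hypothetical, and `SimpleNamespace` objects stand in for real responses so the example runs offline. How llm_toll itself does this is not shown here.

```python
from types import SimpleNamespace


def sketch_extract_usage(response):
    """Return (model, input_tokens, output_tokens) from an SDK-style response."""
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):
        # OpenAI-style field names
        return response.model, usage.prompt_tokens, usage.completion_tokens
    # Anthropic-style field names
    return response.model, usage.input_tokens, usage.output_tokens


# Stand-ins for real SDK responses (illustration only)
openai_like = SimpleNamespace(
    model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=5),
)
anthropic_like = SimpleNamespace(
    model="claude-sonnet-4-20250514",
    usage=SimpleNamespace(input_tokens=7, output_tokens=3),
)

print(sketch_extract_usage(openai_like))     # prints ('gpt-4o', 12, 5)
print(sketch_extract_usage(anthropic_like))  # prints ('claude-sonnet-4-20250514', 7, 3)
```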

Advanced Usage (Rate Limits & Explicit Models)

For custom setups or raw API requests, pass the model and rate limits explicitly, and supply extract_usage so the decorator knows how to read token counts from your function's return value.

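from llm_toll import track_costs

@track_costs(
    model="claude-sonnet-4-20250514",
    rate_limit="50/min",      # requests per minute
    tpm_limit="40000/min",    # tokens per minute
    # Maps the function's return value to (input tokens, output tokens)
    extract_usage=lambda res: (res["in_tokens"], res["out_tokens"]),
)
def custom_anthropic_call(prompt):
    # Custom logic here. The return value must provide the "in_tokens" and
    # "out_tokens" keys that extract_usage reads.
    pass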
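Local rate limiting of the kind configured above can be sketched with a sliding window over recent call timestamps. This is one common technique, not necessarily the one llm_toll uses. The names `parse_limit` and `SlidingWindowLimiter` are hypothetical, and only the "min" unit appears in the examples above; the "sec" and "hour" units are assumptions added for illustration. The clock is injectable so the behavior can be demonstrated without sleeping.

```python
import time
from collections import deque


def parse_limit(spec: str) -> tuple[int, float]:
    """Parse a string such as "50/min" into (max_calls, window_seconds)."""
    count, unit = spec.split("/")
    # Only "min" is taken from the examples; "sec" and "hour" are assumptions.
    windows = {"sec": 1.0, "min": 60.0, "hour": 3600.0}
    return int(count), windows[unit]


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of the given length."""

    def __init__(self, spec: str, clock=time.monotonic):
        self.max_calls, self.window = parse_limit(spec)
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True


fake_time = [0.0]  # mutable fake clock for the demo
limiter = SlidingWindowLimiter("2/min", clock=lambda: fake_time[0])
print([limiter.try_acquire() for _ in range(3)])  # prints [True, True, False]
fake_time[0] = 60.0
print(limiter.try_acquire())  # prints True
```

A tokens-per-minute limit could follow the same pattern by storing (timestamp, token count) pairs and summing the counts inside the window.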

Streaming Support

The decorator automatically detects streaming responses (generators). Cost is tracked after the stream is fully consumed.

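from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track_costs(project="my_app", max_budget=5.00)
def stream_response(text):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        stream=True,
        stream_options={"include_usage": True},  # recommended for accurate counts
    )

for chunk in stream_response("Hello"):
    # With include_usage, the final chunk carries usage and an empty choices
    # list, and delta.content can be None, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# Cost is logged automatically after the stream completes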

Note: For accurate token counts with OpenAI streaming, pass stream_options={"include_usage": True}. Without it, output tokens are estimated using a character-based heuristic.
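The "track after the stream is consumed" behavior can be sketched as a generator that passes chunks through unchanged and reports usage once iteration finishes. This is an illustration, not the library's implementation: `track_stream` and `on_complete` are hypothetical names, and the fallback ratio of roughly four characters per token is an assumption standing in for whatever character-based heuristic llm_toll applies. `SimpleNamespace` objects mimic OpenAI-style chunks so the example runs offline.

```python
from types import SimpleNamespace

CHARS_PER_TOKEN = 4  # assumed ratio for the fallback estimate


def track_stream(stream, on_complete):
    """Yield chunks unchanged; call on_complete(output_tokens) at the end."""
    usage = None
    chars = 0
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # present on the final chunk with include_usage
        for choice in chunk.choices:
            if choice.delta.content:
                chars += len(choice.delta.content)
        yield chunk
    # Runs only when the consumer exhausts the stream
    if usage is not None:
        on_complete(usage.completion_tokens)
    else:
        # Round up so short non-empty outputs count as at least one token
        on_complete((chars + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN)


def text_chunk(text):
    """Build a stand-in for an OpenAI-style streaming chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)


usage_chunk = SimpleNamespace(
    choices=[], usage=SimpleNamespace(completion_tokens=2)
)

reported = []
for _ in track_stream([text_chunk("Hel"), text_chunk("lo"), usage_chunk],
                      reported.append):
    pass
print(reported)  # prints [2]

estimated = []
for _ in track_stream([text_chunk("Hello wor")], estimated.append):
    pass
print(estimated)  # prints [3]
```

One consequence of this design is that a consumer that stops iterating early never triggers the final report, which matches the statement that cost is tracked only after the stream is fully consumed.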

Supported Providers

| Provider | SDK Auto-Parsing | Streaming Support | Custom Model Overrides |
| --- | --- | --- | --- |
| OpenAI | Yes (openai client) | Yes (chunk calculation) | Yes |
| Anthropic | Yes (anthropic client) | Yes | Yes |
| Google Gemini | Yes (google-genai client) | Yes | Yes |
| Local/Ollama | No (cost is $0) | N/A | Rate limiting only |

Error Handling

from llm_toll.exceptions import BudgetExceededError, LocalRateLimitError

try:
    result = generate_summary("some text")
except BudgetExceededError as e:
    print(f"Budget exceeded: {e}")
except LocalRateLimitError as e:
    print(f"Rate limit hit: {e}")

Development

# Install dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint & format
uv run ruff check .
uv run ruff format .

# Type check
uv run mypy src/llm_toll

License

MIT License — see LICENSE for details.
