llm-toll
A lightweight, drop-in Python decorator to track costs, monitor token usage, and enforce budget and rate limits for LLM API calls.
Overview
llm_toll is a developer tool for local prototyping and small-scale production scripts. Wrapping a function with @track_costs automatically logs token usage, calculates the cost of the run in USD, and halts execution if the budget or rate limit set on the decorator is breached.
Features
- Drop-In Decorator: Minimal code intrusion. Just add @track_costs above any function making an LLM call.
- Multi-Provider Support: Built-in pricing matrices for OpenAI, Anthropic, Gemini, and general OpenAI-compatible endpoints.
- Hard Budget Caps: Prevents functions from executing if the cumulative cost exceeds a defined threshold.
- Rate Limiting: Local enforcement of requests-per-minute (RPM) and tokens-per-minute (TPM) limits to prevent HTTP 429 errors.
- Local Persistence: SQLite-backed usage tracking across multiple script runs and days.
- Cost Reporting: Clean, color-coded terminal summary of cost per call and total session cost.
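To make the pricing-matrix idea concrete, here is a minimal sketch of how token counts might be turned into a USD cost. The `PRICES` table, the `ModelPrice` type, the `estimate_cost` function, and the rates shown are all hypothetical placeholders; they are not llm_toll internals or real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens (placeholder figures, not real prices)."""

    input_per_million: float
    output_per_million: float


# Hypothetical pricing matrix keyed by model name.
PRICES = {
    "example-model": ModelPrice(input_per_million=2.00, output_per_million=8.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for one call, given its token counts."""
    price = PRICES[model]
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


cost = estimate_cost("example-model", input_tokens=1_500, output_tokens=500)
print(f"${cost:.6f}")  # $0.007000
```

Input and output tokens are priced separately because providers typically charge different rates for each.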
Quick Start
Installation
pip install llm-toll
# or, with uv
uv add llm-toll
Basic Usage (Auto-detect)
When you use a standard SDK, the decorator infers the model and token counts from the response object.
from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track_costs(project="my_scraper", max_budget=2.00, reset="monthly")
def generate_summary(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )
    return response  # The decorator parses token usage from this object
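The `project`, `max_budget`, and `reset="monthly"` arguments imply a cumulative spend that survives between runs and starts over each month. The sketch below shows one way such a check could work with SQLite. The table layout and every function name here are assumptions made for illustration, not the schema or API that llm_toll actually uses.

```python
import sqlite3
from datetime import date


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a usage store; ':memory:' keeps the sketch self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        "project TEXT NOT NULL, period TEXT NOT NULL, cost_usd REAL NOT NULL)"
    )
    return conn


def current_period(today: date) -> str:
    # A monthly reset can be modeled by keying spend on the year and month.
    return today.strftime("%Y-%m")


def record(conn: sqlite3.Connection, project: str, period: str, cost_usd: float) -> None:
    conn.execute("INSERT INTO usage VALUES (?, ?, ?)", (project, period, cost_usd))
    conn.commit()


def spent(conn: sqlite3.Connection, project: str, period: str) -> float:
    row = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0.0) FROM usage "
        "WHERE project = ? AND period = ?",
        (project, period),
    ).fetchone()
    return row[0]


def within_budget(conn: sqlite3.Connection, project: str, period: str, max_budget: float) -> bool:
    # The call is allowed as long as cumulative spend has not exceeded the cap.
    return spent(conn, project, period) <= max_budget


conn = open_store()
march = current_period(date(2025, 3, 15))
record(conn, "my_scraper", march, 1.50)
record(conn, "my_scraper", march, 0.75)
print(within_budget(conn, "my_scraper", march, max_budget=2.00))  # False (2.25 spent)

april = current_period(date(2025, 4, 1))
print(within_budget(conn, "my_scraper", april, max_budget=2.00))  # True (new period)
```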
Advanced Usage (Rate Limits & Explicit Models)
For custom setups or raw API requests, users can explicitly state the model and rate limits.
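from llm_toll import track_costs

@track_costs(
    model="claude-sonnet-4-20250514",
    rate_limit="50/min",
    tpm_limit="40000/min",
    # Map the function's return value to (input tokens, output tokens)
    extract_usage=lambda res: (res["in_tokens"], res["out_tokens"]),
)
def custom_anthropic_call(prompt):
    # Custom request logic goes here; return a value that extract_usage can read
    pass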
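Local enforcement of a limit such as "50/min" can be pictured as a sliding-window counter. The following is a minimal sketch under that assumption; `parse_limit` and `SlidingWindowLimiter` are hypothetical names, the "sec" and "hour" units are illustrative extras, and the algorithm llm_toll actually uses may differ. The `weight` argument lets the same class count tokens instead of requests.

```python
import time
from collections import deque


def parse_limit(spec: str) -> tuple[int, float]:
    """Parse a spec such as "50/min" into (max_units, window_seconds)."""
    count, _, unit = spec.partition("/")
    windows = {"sec": 1.0, "min": 60.0, "hour": 3600.0}  # only "min" appears in the README
    return int(count), windows[unit]


class SlidingWindowLimiter:
    def __init__(self, spec: str, clock=time.monotonic):
        self.max_units, self.window = parse_limit(spec)
        self.clock = clock
        self.events = deque()  # (timestamp, weight) pairs inside the window

    def try_acquire(self, weight: int = 1) -> bool:
        """Record `weight` units if they fit in the current window."""
        now = self.clock()
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        used = sum(w for _, w in self.events)
        if used + weight > self.max_units:
            return False
        self.events.append((now, weight))
        return True


# A fake clock makes the behavior easy to follow without sleeping.
fake_now = [0.0]
limiter = SlidingWindowLimiter("2/min", clock=lambda: fake_now[0])
print(limiter.try_acquire())  # True
print(limiter.try_acquire())  # True
print(limiter.try_acquire())  # False (window is full)
fake_now[0] = 60.0
print(limiter.try_acquire())  # True (earlier events have expired)
```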
Streaming Support
The decorator automatically detects streaming responses (generators). Cost is tracked after the stream is fully consumed.
from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()

@track_costs(project="my_app", max_budget=5.00)
def stream_response(text):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        stream=True,
        stream_options={"include_usage": True},  # recommended for accurate counts
    )

for chunk in stream_response("Hello"):
    # The final usage-only chunk has no choices, and some deltas carry no text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# Cost is logged automatically after the stream completes
Note: For accurate token counts with OpenAI streaming, pass stream_options={"include_usage": True}. Without it, output tokens are estimated using a character-based heuristic.
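The README does not say which character-based heuristic is used. As an illustration only, the sketch below assumes roughly four characters per token, a common rule of thumb for English text; `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names, and the real estimate may be computed differently.

```python
import math

CHARS_PER_TOKEN = 4  # assumed ratio; a rough rule of thumb, not llm_toll's value


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from text length when no usage data is available."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


# Simulate the text pieces collected from a stream.
chunks = ["Hello", ", ", "world", "!"]
streamed_text = "".join(chunks)
print(estimate_tokens(streamed_text))  # 4 (13 characters, rounded up)
```

Because any such ratio varies by language and tokenizer, the estimate is only an approximation, which is why passing include_usage is recommended.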
Supported Providers
| Provider | SDK Auto-Parsing | Streaming Support | Custom Model Overrides |
|---|---|---|---|
| OpenAI | Yes (openai client) | Yes (chunk calculation) | Yes |
| Anthropic | Yes (anthropic client) | Yes | Yes |
| Google Gemini | Yes (google-genai client) | Yes | Yes |
| Local/Ollama | No (Cost is $0) | N/A | Rate limiting only |
Error Handling
from llm_toll.exceptions import BudgetExceededError, LocalRateLimitError

try:
    result = generate_summary("some text")
except BudgetExceededError as e:
    print(f"Budget exceeded: {e}")
except LocalRateLimitError as e:
    print(f"Rate limit hit: {e}")
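A local rate-limit error is usually transient, so one option is to retry with a growing delay rather than just printing it. The helper below is a sketch of that pattern, not part of llm_toll. `call_with_retry` is a hypothetical name, and the `LocalRateLimitError` class defined here is only a stand-in so the example runs on its own; in real code you would import it from llm_toll.exceptions instead.

```python
import time


class LocalRateLimitError(Exception):
    """Stand-in for llm_toll.exceptions.LocalRateLimitError (for this sketch only)."""


def call_with_retry(func, *args, retries=3, delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying with exponential backoff on LocalRateLimitError."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except LocalRateLimitError:
            if attempt == retries:
                raise  # out of retries: let the caller handle it
            sleep(delay * 2**attempt)


# Demo: a function that fails twice before succeeding.
attempts = []


def flaky(prompt):
    attempts.append(prompt)
    if len(attempts) < 3:
        raise LocalRateLimitError("local limit reached")
    return f"ok: {prompt}"


waits = []  # record the requested delays instead of actually sleeping
print(call_with_retry(flaky, "hi", delay=0.5, sleep=waits.append))  # ok: hi
print(waits)  # [0.5, 1.0]
```

A budget error is deliberately not retried here, since waiting does not bring spend back under the cap.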
Development
# Install dependencies
uv sync --extra dev
# Run tests
uv run pytest
# Lint & format
uv run ruff check .
uv run ruff format .
# Type check
uv run mypy src/llm_toll
License
MIT License — see LICENSE for details.