Lightweight drop-in Python decorator to track costs, monitor token usage, and enforce budget/rate limits for LLM API calls

Maintainers

FelipeMorandini

llm-toll

A lightweight, drop-in Python decorator to track costs, monitor token usage, and enforce budget and rate limits for LLM API calls.

Overview

llm_toll is a developer tool designed for local prototyping and small-scale production scripts. Wrapping a function with @track_costs automatically logs token usage, calculates the cost of each call in USD, and halts execution when a configured budget or local rate limit is breached.

Features

Drop-In Decorator: Minimal code intrusion. Add @track_costs above any function that makes an LLM call.
Multi-Provider Support: Built-in pricing tables for OpenAI, Anthropic, Gemini, and general OpenAI-compatible endpoints.
Hard Budget Caps: Stops functions from executing once the cumulative cost exceeds a defined threshold.
Rate Limiting: Local enforcement of requests-per-minute (RPM) and tokens-per-minute (TPM) limits to help avoid HTTP 429 errors.
Local Persistence: SQLite-backed usage tracking across multiple script runs and days.
Cost Reporting: Clean, color-coded terminal summary of cost per call and total session cost.

Quick Start

Installation

pip install llm-toll
# or, with uv
uv add llm-toll

Basic Usage (Auto-detect)

When you use a standard provider SDK, the decorator infers the model and token counts from the response object.

from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track_costs(project="my_scraper", max_budget=2.00, reset="monthly")
def generate_summary(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}]
    )
    return response  # The decorator parses token usage from this object
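
To make the mechanics concrete, here is a minimal sketch of how a cost-tracking decorator of this kind might work. It is not llm_toll's implementation: the names track_costs_sketch, BudgetExceeded, and PRICES are hypothetical, the prices are made up for illustration, and a plain dict stands in for an SDK response object.

```python
import functools

# Hypothetical prices in USD per 1M tokens, for illustration only.
PRICES = {"example-model": {"input": 2.00, "output": 8.00}}


class BudgetExceeded(Exception):
    """Stand-in for the library's BudgetExceededError."""


def track_costs_sketch(max_budget):
    """Track cumulative cost and refuse to run once max_budget is used up."""
    state = {"spent": 0.0}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check the budget before making the (potentially paid) call.
            if state["spent"] >= max_budget:
                raise BudgetExceeded(
                    f"spent ${state['spent']:.4f} of ${max_budget:.2f}"
                )
            response = func(*args, **kwargs)
            # Read usage from the response and price it per million tokens.
            price = PRICES[response["model"]]
            cost = (
                response["input_tokens"] * price["input"]
                + response["output_tokens"] * price["output"]
            ) / 1_000_000
            state["spent"] += cost
            return response

        wrapper.spent = lambda: state["spent"]  # expose the running total
        return wrapper

    return decorator


@track_costs_sketch(max_budget=0.01)
def fake_call(text):
    # Pretend API call: 1,000 input tokens and 500 output tokens.
    return {"model": "example-model", "input_tokens": 1000, "output_tokens": 500}
```

In this sketch the budget check happens before the call, so a call that starts under budget is allowed to finish even if it pushes the total over the cap; the next call is the one that gets blocked.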

Advanced Usage (Rate Limits & Explicit Models)

For custom setups or raw API requests, set the model and rate limits explicitly and pass an extract_usage callable that returns a (model, input_tokens, output_tokens) tuple.

from llm_toll import track_costs

@track_costs(
    model="claude-sonnet-4-20250514",
    rate_limit=50,       # max 50 requests per minute
    tpm_limit=40000,     # max 40k tokens per minute
    extract_usage=lambda res: (res['model'], res['in_tokens'], res['out_tokens'])
)
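def custom_anthropic_call(prompt):
    # Custom logic here. The return value is passed to extract_usage, so it
    # must be a dict with 'model', 'in_tokens', and 'out_tokens' keys.
    pass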

Rate limits use a sliding-window algorithm. When a limit is reached, LocalRateLimitError is raised with a retry_after attribute indicating how long to wait.
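
For readers unfamiliar with the technique, a sliding-window limiter can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: SlidingWindowLimiter and RateLimited are hypothetical names, and retry_after is assumed to be in seconds.

```python
import time
from collections import deque


class RateLimited(Exception):
    """Stand-in for LocalRateLimitError; retry_after is in seconds here."""

    def __init__(self, retry_after):
        super().__init__(f"retry after {retry_after:.2f}s")
        self.retry_after = retry_after


class SlidingWindowLimiter:
    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.timestamps = deque()

    def acquire(self):
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # The oldest request leaves the window at timestamps[0] + window.
            raise RateLimited(self.timestamps[0] + self.window - now)
        self.timestamps.append(now)
```

Unlike a fixed window that resets on the minute, the window here always covers the most recent 60 seconds, so a burst at the end of one minute still counts against the start of the next.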

Streaming Support

The decorator automatically detects streaming responses (generators). Cost is tracked after the stream is fully consumed.

from llm_toll import track_costs

@track_costs(project="my_app", max_budget=5.00)
def stream_response(text):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        stream=True,
        stream_options={"include_usage": True},  # recommended for accurate counts
    )

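for chunk in stream_response("Hello"):
    # With include_usage, the final chunk carries usage and has no choices;
    # delta.content can also be None, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")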
# Cost is logged automatically after the stream completes

Note: For accurate token counts with OpenAI streaming, pass stream_options={"include_usage": True}. Without it, output tokens are estimated using a character-based heuristic.
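
The "track after the stream is consumed" behaviour and the character-based fallback can be sketched with a wrapping generator. This is an assumption-laden illustration: track_stream and on_complete are hypothetical names, chunks are plain strings rather than SDK objects, and the 4-characters-per-token ratio is a common rule of thumb, not necessarily the heuristic llm_toll uses.

```python
def track_stream(chunks, on_complete, chars_per_token=4):
    """Yield chunks unchanged; report estimated output tokens at the end."""
    total_chars = 0
    for chunk in chunks:
        total_chars += len(chunk)
        yield chunk
    # Runs only once the caller has exhausted the stream.
    estimated_tokens = -(-total_chars // chars_per_token)  # ceiling division
    on_complete(estimated_tokens)
```

Because the callback fires after the loop, nothing is reported in this sketch if the caller stops iterating early.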

Async Support

The decorator auto-detects async functions and async generators — no changes needed:

from openai import AsyncOpenAI
from llm_toll import track_costs

client = AsyncOpenAI()  # async client, so create() can be awaited

@track_costs(project="my_app", max_budget=5.00)
async def async_chat(text):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}]
    )
    return response

@track_costs(project="my_app")
async def async_stream(text):
    stream = await client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": text}],
        stream=True, stream_options={"include_usage": True},
    )
    async for chunk in stream:
        yield chunk

SQLite operations run in a thread pool (asyncio.to_thread) so the event loop is never blocked.
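
The pattern described above can be sketched with the standard library alone. The table name usage, its columns, and the log_usage helper are hypothetical and chosen for illustration; llm_toll's actual schema may differ.

```python
import asyncio
import os
import sqlite3
import tempfile


def log_usage(db_path, project, cost):
    # Blocking SQLite work. The connection is opened and closed inside the
    # worker thread, since sqlite3 connections are bound to one thread by default.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS usage (project TEXT, cost REAL)"
            )
            conn.execute("INSERT INTO usage VALUES (?, ?)", (project, cost))
            return conn.execute(
                "SELECT SUM(cost) FROM usage WHERE project = ?", (project,)
            ).fetchone()[0]
    finally:
        conn.close()


async def main(db_path):
    # asyncio.to_thread runs the blocking function in the default thread
    # pool, so the event loop stays free to run other tasks meanwhile.
    await asyncio.to_thread(log_usage, db_path, "my_app", 0.25)
    return await asyncio.to_thread(log_usage, db_path, "my_app", 0.5)


with tempfile.TemporaryDirectory() as tmp:
    total = asyncio.run(main(os.path.join(tmp, "usage.db")))
```

asyncio.to_thread requires Python 3.9 or later.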

Supported Providers

Provider	SDK Auto-Parsing	Streaming Support	Custom Model Overrides
OpenAI	Yes (`openai` client)	Yes (chunk calculation)	Yes
Anthropic	Yes (`anthropic` client)	Yes	Yes
Google Gemini	Yes (`google-genai` client)	Yes	Yes
Local/Ollama	Via OpenAI-compat API	N/A	Rate limiting only ($0 cost)

Local/Ollama Models

Local models (ollama/, local/, llama.cpp/ prefixes) are tracked at $0 cost. Rate limiting still applies — useful for managing local GPU resources.

from llm_toll import track_costs

@track_costs(
    model="ollama/llama3",
    rate_limit=10,       # limit local GPU to 10 RPM
    extract_usage=lambda r: ("ollama/llama3", r["prompt_tokens"], r["completion_tokens"])
)
def local_inference(prompt):
    # Ollama call here
    pass

Tip: Ollama exposes an OpenAI-compatible API, so if you point the openai client at http://localhost:11434/v1, usage is parsed automatically.

LiteLLM Integration

Track costs automatically across all LiteLLM calls — no decorator needed:

import litellm
from llm_toll import LiteLLMCallback

litellm.callbacks = [LiteLLMCallback(project="my-app", max_budget=10.0)]

# All litellm completions are now tracked automatically
response = litellm.completion(model="gpt-4o", messages=[{"role": "user", "content": "Hi"}])

The callback can be used alongside the @track_costs decorator; use whichever approach fits your codebase.

LangChain Integration

Track costs across all LLM calls in a LangChain chain or agent:

from langchain_openai import ChatOpenAI
from llm_toll import LangChainCallback

handler = LangChainCallback(project="my-chain", max_budget=10.0)
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])

Budget is checked before each LLM call (on_llm_start), and usage is logged after (on_llm_end).

Error Handling

from llm_toll.exceptions import BudgetExceededError, LocalRateLimitError

try:
    result = generate_summary("some text")
except BudgetExceededError as e:
    print(f"Budget exceeded: {e}")
except LocalRateLimitError as e:
    print(f"Rate limit hit: {e}")
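
Since LocalRateLimitError carries a retry_after attribute, a rate-limit hit can be retried rather than just reported. The sketch below is one way to do that; call_with_retry is a hypothetical helper, the exception class is a local stand-in so the example runs without llm_toll installed, and retry_after is assumed to be in seconds.

```python
import time


class LocalRateLimitError(Exception):
    """Local stand-in; in real code import this from llm_toll.exceptions."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(func, *args, max_attempts=3, sleep=time.sleep):
    """Call func, waiting retry_after between attempts on rate-limit errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except LocalRateLimitError as e:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(e.retry_after)  # assumes retry_after is in seconds
```

A budget error is deliberately not retried here: waiting does not bring the spend back under the cap.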

CLI Dashboard

View costs and usage from the terminal:

# Show cost summary across all projects
llm-toll --stats

# Filter by project or model
llm-toll --stats --project my_scraper
llm-toll --stats --model gpt-4o

# Reset a project's budget counter
llm-toll --reset --project my_scraper

# Export usage logs to CSV
llm-toll --export csv > usage.csv
llm-toll --export csv --project my_scraper --output report.csv

# Update pricing from remote source
llm-toll --update-pricing

# Launch web dashboard with charts and analytics
llm-toll --dashboard
llm-toll --dashboard --port 9000

The web dashboard shows cost trends, project/model breakdowns, and budget utilization in your browser, at http://127.0.0.1:8050 by default (use --port to change the port).

Pricing can also be updated programmatically:

from llm_toll import update_pricing

update_pricing()  # fetches latest pricing, caches for 24h
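
The 24-hour caching mentioned in the comment above follows a familiar time-to-live pattern. The sketch below shows the idea in memory only; TTLCache and its fetch callable are hypothetical, and how llm_toll actually stores its cached pricing is not specified here.

```python
import time


class TTLCache:
    """Cache one fetched value and refetch it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=24 * 60 * 60, clock=time.time):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self.value = None
        self.fetched_at = None

    def get(self):
        now = self.clock()
        # Fetch on first use, or when the cached value is older than the TTL.
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.value = self.fetch()
            self.fetched_at = now
        return self.value
```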

PostgreSQL Backend (Team-Wide Tracking)

For team-wide cost visibility, use a shared PostgreSQL database:

pip install "llm-toll[postgres]"  # quoted so shells like zsh don't expand the brackets

# Set via environment variable (all @track_costs decorators auto-connect)
export LLM_TOLL_STORE_URL=postgresql://user:pass@host/llm_costs

Or configure programmatically:

from llm_toll import create_store, set_store

store = create_store(url="postgresql://user:pass@host/llm_costs")
set_store(store)

The CLI also supports --store-url:

llm-toll --stats --store-url postgresql://user:pass@host/llm_costs

The PostgreSQL backend uses connection pooling and row-level locking for safe concurrent budget enforcement across multiple processes and machines.
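
Concurrent budget enforcement comes down to making the "read total, check cap, write new total" sequence atomic. The sketch below illustrates that idea using SQLite as a stand-in so it runs anywhere: BEGIN IMMEDIATE takes a database-wide write lock, whereas a PostgreSQL implementation would typically lock only the project's row with SELECT ... FOR UPDATE inside a transaction. The budgets table and try_spend helper are hypothetical.

```python
import sqlite3


def try_spend(conn, project, cost, max_budget):
    """Atomically add cost to a project's total unless it would exceed the cap."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT spent FROM budgets WHERE project = ?", (project,)
        ).fetchone()
        spent = row[0] if row else 0.0
        if spent + cost > max_budget:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT OR REPLACE INTO budgets (project, spent) VALUES (?, ?)",
            (project, spent + cost),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE budgets (project TEXT PRIMARY KEY, spent REAL)")
```

Without the lock, two processes could both read the same total, both pass the check, and together overshoot the budget.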

Development

# Install dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint & format
uv run ruff check .
uv run ruff format .

# Type check
uv run mypy src/llm_toll

License

MIT License — see LICENSE for details.

Release history

Version	Release date
0.13.0 (this version)	Mar 23, 2026
0.12.0	Mar 23, 2026
0.11.0	Mar 23, 2026
0.10.0	Mar 23, 2026
0.9.0	Mar 22, 2026
0.8.0	Mar 22, 2026
0.7.0	Mar 22, 2026
0.6.0	Mar 22, 2026
0.5.0	Mar 21, 2026
0.4.0	Mar 21, 2026
0.3.0	Mar 21, 2026
0.2.0	Mar 21, 2026
0.1.0	Mar 21, 2026

Download files

Source Distribution

llm_toll-0.13.0.tar.gz (161.2 kB)

Uploaded Mar 23, 2026 Source

Built Distribution

llm_toll-0.13.0-py3-none-any.whl (41.2 kB)

Uploaded Mar 23, 2026 Python 3

File details

Details for the file llm_toll-0.13.0.tar.gz.

File metadata

Download URL: llm_toll-0.13.0.tar.gz
Upload date: Mar 23, 2026
Size: 161.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_toll-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`edd61bc2e44acb8fb2824bf3177ab152b8302b47ad9a73d44d8ce3464378411f`
MD5	`6ef3ce95a05e8f65fd750a66870437b9`
BLAKE2b-256	`22ce7896971931aee207e97750f90232916a86e4ab7a29bcda450ff7bd863892`

Provenance

The following attestation bundles were made for llm_toll-0.13.0.tar.gz:

Publisher: release.yml on FelipeMorandini/llm-toll

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_toll-0.13.0.tar.gz
- Subject digest: edd61bc2e44acb8fb2824bf3177ab152b8302b47ad9a73d44d8ce3464378411f
- Sigstore transparency entry: 1162131890
- Sigstore integration time: Mar 23, 2026
Source repository:
- Permalink: FelipeMorandini/llm-toll@366d01e36499338ff79139422d6e7e0be648431d
- Branch / Tag: refs/tags/v0.13.0
- Owner: https://github.com/FelipeMorandini
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@366d01e36499338ff79139422d6e7e0be648431d
- Trigger Event: push

File details

Details for the file llm_toll-0.13.0-py3-none-any.whl.

File metadata

Download URL: llm_toll-0.13.0-py3-none-any.whl
Upload date: Mar 23, 2026
Size: 41.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_toll-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ef12866f39c324b6d5fc18e6b845e0ba44a775e73adef813dfa531f868f576f3`
MD5	`73b5e23999a2843779e088e2f8349092`
BLAKE2b-256	`8043945c4cb6f6f97ec4f35297ccefc1001f2338fe85c5de1c15637c44f0f530`

Provenance

The following attestation bundles were made for llm_toll-0.13.0-py3-none-any.whl:

Publisher: release.yml on FelipeMorandini/llm-toll

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_toll-0.13.0-py3-none-any.whl
- Subject digest: ef12866f39c324b6d5fc18e6b845e0ba44a775e73adef813dfa531f868f576f3
- Sigstore transparency entry: 1162131983
- Sigstore integration time: Mar 23, 2026
Source repository:
- Permalink: FelipeMorandini/llm-toll@366d01e36499338ff79139422d6e7e0be648431d
- Branch / Tag: refs/tags/v0.13.0
- Owner: https://github.com/FelipeMorandini
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@366d01e36499338ff79139422d6e7e0be648431d
- Trigger Event: push
