# shekel

LLM cost tracking and budget enforcement for Python. One line. Zero config.

```python
with budget(max_usd=1.00):
    run_my_agent()  # raises BudgetExceededError if spend exceeds $1.00
```
## The problem

LLM agent loops can burn money fast. A retry bug, an infinite loop, an unexpectedly expensive prompt — and you wake up to a $47 bill. shekel stops that from happening by letting you set a hard spending cap around any block of code.
## Install

```shell
pip install shekel[openai]      # OpenAI
pip install shekel[anthropic]   # Anthropic
pip install shekel[all]         # Both
pip install shekel[all-models]  # Both + tokencost (400+ model pricing)
```
## Usage

### Enforce a budget

```python
from shekel import budget, BudgetExceededError

try:
    with budget(max_usd=1.00) as b:
        run_my_agent()
    print(f"Done. Spent ${b.spent:.4f}")
except BudgetExceededError as e:
    print(e)
    # Budget of $1.00 exceeded ($1.0023 spent)
    # Last call: gpt-4o — 512 input + 1024 output tokens
    # Tip: Increase max_usd or add warn_at=0.8 to get an early warning next time.
```
### Get a warning before the limit hits

```python
def on_warning(spent: float, limit: float) -> None:
    print(f"Warning: ${spent:.4f} of ${limit:.2f} used")

with budget(max_usd=1.00, warn_at=0.8, on_exceed=on_warning) as b:
    run_my_agent()
```

Without a callback, shekel emits a standard `warnings.warn` warning automatically.
### Fall back to a cheaper model instead of raising

```python
with budget(max_usd=0.50, fallback="gpt-4o-mini") as b:
    run_my_agent()  # switches to gpt-4o-mini once $0.50 is hit, keeps going

print(f"Switched model: {b.model_switched}")    # True
print(f"Switched at: ${b.switched_at_usd:.4f}")
print(f"Fallback cost: ${b.fallback_spent:.4f}")
```

A `hard_cap` (default: `max_usd * 2`) stops runaway spending on the fallback model.
### Decorator

```python
from shekel import with_budget

@with_budget(max_usd=0.10)
def call_llm():
    client.chat.completions.create(...)

# Fresh budget on every call
call_llm()
call_llm()
```

Works with async functions too.
### Track spend across multiple runs (persistent budget)

```python
session = budget(max_usd=5.00, persistent=True)

with session:
    run_agent_step_1()

with session:
    run_agent_step_2()

print(f"Total session cost: ${session.spent:.4f}")
session.reset()  # clear for next session
```
### Track spend without enforcing a limit

```python
with budget() as b:
    run_my_agent()

print(f"That run cost: ${b.spent:.4f}")
```
### Spend summary

```python
with budget(max_usd=2.00) as b:
    run_my_agent()

print(b.summary())
# ┌─────────────────────────────────┐
# │ shekel spend report             │
# ├─────────────────────────────────┤
# │ total spent: $0.1234            │
# │ limit:       $2.00              │
# │ remaining:   $1.8766            │
# ├──────────────┬──────────────────┤
# │ model        │ cost             │
# ├──────────────┼──────────────────┤
# │ gpt-4o       │ $0.1234          │
# └──────────────┴──────────────────┘
```
### Async support

```python
async with budget(max_usd=1.00) as b:
    await run_my_async_agent()
```
### Custom / unlisted model pricing

```python
with budget(max_usd=1.00, price_per_1k_tokens={"input": 0.001, "output": 0.003}):
    run_my_agent()
```

Or install `shekel[all-models]` for automatic pricing of 400+ models via tokencost.
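Per-call cost under a `price_per_1k_tokens` override works out as simple arithmetic. The sketch below shows the pricing formula, not shekel's actual code; the function name is illustrative:

```python
# Sketch: derive one call's cost from token counts and per-1k prices.
def call_cost(input_tokens: int, output_tokens: int, price_per_1k: dict) -> float:
    return (input_tokens / 1000) * price_per_1k["input"] + \
           (output_tokens / 1000) * price_per_1k["output"]

# 512 input + 1024 output tokens at the override prices above:
print(call_cost(512, 1024, {"input": 0.001, "output": 0.003}))  # ≈ $0.0036
```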
### Works with LangGraph, CrewAI, and everything else

shekel intercepts at the SDK level — it works with any framework that uses OpenAI or Anthropic under the hood.

```python
# LangGraph
with budget(max_usd=2.00, warn_at=0.8) as b:
    result = app.invoke({"input": "..."})
print(f"Graph run: ${b.spent:.4f}")

# CrewAI
with budget(max_usd=5.00) as b:
    crew.kickoff()

# Autogen, LlamaIndex, raw SDK — same pattern
with budget(max_usd=0.50) as b:
    for _ in range(100):
        client.chat.completions.create(...)  # stops when budget hit
```
## API reference

### `budget(...)` / `with_budget(...)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_usd` | `float \| None` | `None` | Hard spend cap in USD. `None` = track only. |
| `warn_at` | `float \| None` | `None` | Fraction of the limit (0.0–1.0) at which to warn. |
| `on_exceed` | `Callable[[float, float], None] \| None` | `None` | Callback fired at the `warn_at` threshold. Receives `(spent, limit)`. |
| `price_per_1k_tokens` | `dict \| None` | `None` | Override pricing: `{"input": 0.001, "output": 0.003}`. |
| `fallback` | `str \| None` | `None` | Model to switch to when `max_usd` is hit. Same provider only. |
| `on_fallback` | `Callable[[float, float, str], None] \| None` | `None` | Callback on fallback switch. Receives `(spent, limit, fallback_model)`. |
| `hard_cap` | `float \| None` | `max_usd * 2` | Absolute ceiling when fallback is active. |
| `persistent` | `bool` | `False` | If `True`, spend accumulates across multiple `with` blocks. |
### `budget` properties

| Property | Type | Description |
|---|---|---|
| `b.spent` | `float` | Total USD spent so far. |
| `b.remaining` | `float \| None` | USD remaining, or `None` in track-only mode. |
| `b.limit` | `float \| None` | Configured `max_usd`, or `None`. |
| `b.model_switched` | `bool` | `True` if the fallback model was activated. |
| `b.switched_at_usd` | `float \| None` | Spend level at which fallback was triggered. |
| `b.fallback_spent` | `float` | Cost accumulated on the fallback model. |
### `BudgetExceededError`

| Attribute | Type | Description |
|---|---|---|
| `e.spent` | `float` | Total spend when the limit was hit. |
| `e.limit` | `float` | The configured `max_usd`. |
| `e.model` | `str` | Model name from the call that triggered the error. |
| `e.tokens` | `dict` | `{"input": N, "output": N}` from the last call. |
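A minimal sketch of an exception carrying these attributes, matching the message format shown under "Enforce a budget". This is illustrative only; the real class ships with shekel:

```python
# Sketch of an exception exposing the documented attributes (not shekel's class).
class BudgetExceededError(Exception):
    def __init__(self, spent: float, limit: float, model: str, tokens: dict):
        self.spent, self.limit, self.model, self.tokens = spent, limit, model, tokens
        super().__init__(
            f"Budget of ${limit:.2f} exceeded (${spent:.4f} spent)\n"
            f"Last call: {model} — {tokens['input']} input + {tokens['output']} output tokens"
        )

e = BudgetExceededError(1.0023, 1.00, "gpt-4o", {"input": 512, "output": 1024})
print(e.spent, e.model)  # 1.0023 gpt-4o
```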
## Supported models

10 models are bundled with zero extra dependencies:
| Model | Input / 1k | Output / 1k |
|---|---|---|
| gpt-4o | $0.00250 | $0.01000 |
| gpt-4o-mini | $0.000150 | $0.000600 |
| o1 | $0.01500 | $0.06000 |
| o1-mini | $0.00300 | $0.01200 |
| gpt-3.5-turbo | $0.000500 | $0.001500 |
| claude-3-5-sonnet-20241022 | $0.00300 | $0.01500 |
| claude-3-haiku-20240307 | $0.000250 | $0.001250 |
| claude-3-opus-20240229 | $0.01500 | $0.07500 |
| gemini-1.5-flash | $0.0000750 | $0.000300 |
| gemini-1.5-pro | $0.00125 | $0.00500 |
For any other model, either pass `price_per_1k_tokens` or install `shekel[all-models]` for automatic pricing of 400+ models.
## How it works

- **Monkey-patching** — on context entry, shekel wraps `openai.ChatCompletions.create` and `anthropic.Messages.create` at the class level. Your code calls the real SDK; shekel intercepts the response, reads token counts, and records the cost. Original methods are restored on exit.
- **ContextVar isolation** — each `budget()` context stores its counter in a `contextvars.ContextVar`. Two concurrent agent runs (threads or async tasks) never share a budget counter.
- **Ref-counted patching** — nested `budget()` contexts patch only once and unpatch cleanly on the last exit.
- **Zero config** — no API keys, no environment variables, no external services.
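The patch-and-restore mechanism can be illustrated with a toy client. This is purely a sketch of the technique under stated assumptions: `FakeCompletions` and `toy_budget` are made-up names standing in for the SDK classes and shekel's context manager, and the hard-coded prices are gpt-4o's bundled rates:

```python
import contextvars

# Toy stand-in for an SDK class whose method gets patched at the class level.
class FakeCompletions:
    def create(self, **kwargs):
        return {"usage": {"input_tokens": 500, "output_tokens": 1000}}

# Per-context counter: threads and async tasks each see their own value.
_spent = contextvars.ContextVar("spent")

class toy_budget:
    """Sketch of class-level patching with restore-on-exit."""
    def __enter__(self):
        self._token = _spent.set(0.0)
        self._orig = FakeCompletions.create
        def patched(inner_self, **kwargs):
            resp = self._orig(inner_self, **kwargs)  # call the real method
            usage = resp["usage"]
            cost = (usage["input_tokens"] / 1000 * 0.0025
                    + usage["output_tokens"] / 1000 * 0.01)
            _spent.set(_spent.get() + cost)          # record in this context only
            return resp
        FakeCompletions.create = patched             # patch the class, not an instance
        return self
    def __exit__(self, *exc):
        FakeCompletions.create = self._orig          # restore the original method
        self.spent = _spent.get()
        _spent.reset(self._token)
        return False

with toy_budget() as b:
    FakeCompletions().create(model="gpt-4o")
print(f"${b.spent:.4f}")
```

Because the counter lives in a `ContextVar` rather than a module-level global, two budgets running in separate threads or asyncio tasks accumulate independently, which is the isolation property described above.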
## Contributing

See CONTRIBUTING.md.

## License

MIT