Pre-flight LLM cost estimation and budget enforcement
llm-budget
Stop overpaying for LLM API calls. One line of code, automatic model selection, same results.
One function. Auto-selects the cheapest model that fits your task. Tracks every dollar.
Before / After
# BEFORE: You always use the expensive model. $9/day for 1,000 extraction calls.
response = client.messages.create(
model="claude-sonnet-4-20250514", # $0.009/call
messages=[{"role": "user", "content": "Extract dates from this text..."}],
max_tokens=1024,
)
# AFTER: llm-budget picks the cheapest model that can handle the task.
response = llm_budget.smart_route(
client, "Extract dates from this text...",
task_type="extraction",
)
# -> Used claude-3-5-haiku ($0.003/call). 67% cheaper. Same result.
# -> $3/day instead of $9/day. That's $180/month saved.
smart_route() returns the same response object as the native SDK, so response.content[0].text (Anthropic), response.choices[0].message.content (OpenAI), and response.usage all work unchanged. Your existing parsing code stays as it is; the only change is the one call you swap.
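The savings figures in the Before / After example follow from simple arithmetic. The sketch below reproduces them from the per-call prices and call volume quoted above; the 30-day month is an assumption, and the function name is purely illustrative (it is not part of llm-budget).

```python
# Per-call prices and call volume taken from the Before / After example.
CALLS_PER_DAY = 1_000
SONNET_COST_PER_CALL = 0.009  # USD, figure quoted for claude-sonnet-4
HAIKU_COST_PER_CALL = 0.003   # USD, figure quoted for claude-3-5-haiku
DAYS_PER_MONTH = 30           # assumption: a 30-day month


def daily_cost(cost_per_call: float, calls_per_day: int) -> float:
    """Total daily spend for a fixed per-call cost."""
    return cost_per_call * calls_per_day


before = daily_cost(SONNET_COST_PER_CALL, CALLS_PER_DAY)
after = daily_cost(HAIKU_COST_PER_CALL, CALLS_PER_DAY)
saving_pct = (before - after) / before * 100
monthly_saving = (before - after) * DAYS_PER_MONTH

print(f"Before: ${before:.2f}/day, after: ${after:.2f}/day")
print(f"Saving: {saving_pct:.0f}% per call, ${monthly_saving:.2f}/month")
```

Plugging in your own call volume and per-call costs gives a quick sanity check on whether routing is worth adopting for a given workload.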
Install
pip install llm-budget
With provider SDKs:
pip install llm-budget[openai] # OpenAI integration
pip install llm-budget[anthropic] # Anthropic integration
pip install llm-budget[langchain] # LangChain integration
pip install llm-budget[crewai] # CrewAI integration
pip install llm-budget[all] # Everything
How It Works
llm-budget knows the pricing, capability tier, and quality score of 20 models across 7 providers. When you tell it what kind of task you're doing, it picks the cheapest model that won't compromise quality.
| Task Type | What It Means | Model Tier Selected |
|---|---|---|
| `extraction` | Parsing, dates, JSON, structured output | Efficient (cheapest) |
| `simple` | Formatting, translation, classification | Efficient |
| `moderate` | Summarization, Q&A, general tasks | Efficient / Mid |
| `creative` | Writing, brainstorming, marketing copy | Frontier |
| `coding` | Code generation, debugging, review | Frontier / Reasoning |
| `complex` | Multi-step reasoning, deep analysis | Frontier |
| `math` | Calculations, proofs, science | Reasoning |
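One way to picture the selection rule is "filter by tier, then take the cheapest". The sketch below is not llm-budget's actual implementation: the `Model` class, `CATALOG`, and `pick_model` are hypothetical names, the tier assignments are inferred from the examples in this README, and the per-call costs are the illustrative figures quoted in it. It also ignores which provider your client belongs to, which real routing would have to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: str             # "efficient", "mid", "frontier", or "reasoning"
    cost_per_call: float  # USD, illustrative


# Hypothetical catalog; costs are the example figures quoted in this README.
CATALOG = [
    Model("claude-3-5-haiku", "efficient", 0.003),
    Model("claude-sonnet-4", "frontier", 0.009),
    Model("gpt-4o-mini", "efficient", 0.0004),
    Model("gpt-4o", "frontier", 0.006),
]

# Task type -> acceptable tiers, transcribed from the table above.
TIERS_FOR_TASK = {
    "extraction": {"efficient"},
    "simple": {"efficient"},
    "moderate": {"efficient", "mid"},
    "creative": {"frontier"},
    "coding": {"frontier", "reasoning"},
    "complex": {"frontier"},
    "math": {"reasoning"},
}


def pick_model(task_type: str, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest catalog model whose tier is allowed for the task."""
    allowed = TIERS_FOR_TASK[task_type]
    candidates = [m for m in catalog if m.tier in allowed]
    if not candidates:
        raise LookupError(f"no model in catalog for task type {task_type!r}")
    return min(candidates, key=lambda m: m.cost_per_call)


print(pick_model("extraction").name)
print(pick_model("complex").name)
```

Keeping the task-to-tier mapping as plain data makes the policy easy to audit and adjust without touching the selection logic.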
Quick Start: smart_route() (Recommended)
One function. Auto-selects model, makes the API call, tracks cost.
Anthropic
from anthropic import Anthropic
import llm_budget
client = Anthropic()
# Extraction task -> llm-budget picks haiku (cheap, good enough)
response = llm_budget.smart_route(
client, "Extract all dates from: 'Started March 15. Deadline June 30.'",
task_type="extraction",
)
print(response.content[0].text) # Works exactly like client.messages.create()
# Complex analysis -> llm-budget picks sonnet (frontier, worth the cost)
response = llm_budget.smart_route(
client, "Analyze the trade-offs between microservices and monoliths",
task_type="complex",
)
print(response.content[0].text)
# Check what you spent
print(f"Total: ${llm_budget.get_tracker().get_spend('today'):.4f}")
OpenAI
from openai import OpenAI
import llm_budget
client = OpenAI()
# Simple task -> picks gpt-4o-mini ($0.0004/call instead of $0.006 with gpt-4o)
response = llm_budget.smart_route(
client, "Translate this to French: 'Hello, how are you?'",
task_type="simple",
)
print(response.choices[0].message.content) # Same as client.chat.completions.create()
# Coding task -> picks gpt-4o or o3-mini (needs reasoning capability)
response = llm_budget.smart_route(
client, "Write a Python function to merge two sorted linked lists",
task_type="coding",
)
Full smart_route() API
response = llm_budget.smart_route(
client, # OpenAI() or Anthropic() client
"Your prompt here", # String or messages list
task_type="moderate", # simple/moderate/complex/coding/math/creative/extraction
budget=5.00, # Daily budget in USD (influences model selection when low)
model="gpt-4o", # Explicit override (skips routing, still tracks cost)
max_tokens=1024, # Max output tokens
temperature=0.7, # Forwarded to the API
system="Be helpful", # Forwarded to the API (Anthropic)
)
Key behaviors:
- `task_type` controls which models are eligible. "extraction" picks cheap models. "complex" picks powerful ones.
- `budget` makes routing budget-aware. When remaining budget is low, it favors cheaper models.
- `model` overrides routing entirely (but still tracks cost).
- All extra kwargs (`temperature`, `system`, `top_p`, etc.) are forwarded to the underlying API call.
- Returns the exact same response as the native SDK. No wrappers, no transformations.
- Cost is automatically recorded to the tracker after every call.
More Control: Decorators
@cost_aware -- Auto-swap models per call
Wrap your existing function. The decorator intercepts the model parameter and swaps it to a cheaper suitable model based on task_type.
import llm_budget
from anthropic import Anthropic
client = Anthropic()
@llm_budget.cost_aware(budget=5.00)
def my_step(task, model="claude-sonnet-4-20250514", task_type="moderate"):
return client.messages.create(
model=model,
messages=[{"role": "user", "content": task}],
max_tokens=1024,
)
# Decorator auto-swaps to haiku for extraction (saves 67%):
result = my_step("Extract dates from this invoice", task_type="extraction")
# Keeps sonnet for complex reasoning (worth the cost):
result = my_step("Analyze quarterly revenue trends", task_type="complex")
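To make the "intercepts the model parameter" idea concrete, here is a minimal sketch of a decorator that rewrites a `model` argument based on `task_type`. It is not the library's `cost_aware` implementation: `swap_model` and `CHEAPER_MODEL_FOR_TASK` are hypothetical, the mapping is hard-coded rather than derived from pricing data, and the decorated function returns the model name instead of calling an API.

```python
import functools
import inspect

# Hypothetical mapping: task type -> cheaper model considered good enough.
CHEAPER_MODEL_FOR_TASK = {
    "extraction": "claude-3-5-haiku-20241022",
    "simple": "claude-3-5-haiku-20241022",
}


def swap_model(func):
    """Replace the `model` argument when the task type has a cheaper option."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind the call so defaults and positional arguments are visible by name.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        task_type = bound.arguments.get("task_type", "moderate")
        cheaper = CHEAPER_MODEL_FOR_TASK.get(task_type)
        if cheaper is not None:
            bound.arguments["model"] = cheaper
        return func(*bound.args, **bound.kwargs)

    return wrapper


@swap_model
def my_step(task, model="claude-sonnet-4-20250514", task_type="moderate"):
    # Stand-in for a real API call: report which model would be used.
    return model


print(my_step("Extract dates from this invoice", task_type="extraction"))
print(my_step("Analyze quarterly revenue trends", task_type="complex"))
```

Binding through `inspect.signature` means the swap works whether the caller passes `model` positionally, by keyword, or relies on the default.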
@budget -- Hard budget enforcement
Pre-flight cost estimation + hard limits. Blocks calls that would exceed budget.
import openai
from llm_budget import budget, BudgetExceeded
@budget(
max_cost=5.00, # $5 daily limit
period="daily", # Resets daily
on_exceed="raise", # "raise" | "warn" | "skip" | "downgrade:gpt-4o-mini"
alert_at=0.8, # Warn at 80% usage
track_model="gpt-4o",
)
def my_llm_call(model, messages):
return openai.chat.completions.create(model=model, messages=messages)
try:
result = my_llm_call(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])
except BudgetExceeded as e:
print(f"Budget exceeded: {e}")
if e.suggested_model:
print(f"Try cheaper model: {e.suggested_model}")
on_exceed modes:
| Mode | Behavior |
|---|---|
| `"raise"` | Raises BudgetExceeded exception |
| `"warn"` | Logs warning, proceeds anyway |
| `"skip"` | Returns None, skips the API call |
| `"downgrade:gpt-4o-mini"` | Auto-swaps to cheaper model, proceeds |
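The four modes can be read as a small dispatch on the `on_exceed` string. The following is a rough sketch of that control flow under stated assumptions, not the library's code: `enforce`, `BudgetExceededError`, and `fake_call` are hypothetical stand-ins, and the estimated cost and remaining budget are passed in directly rather than computed.

```python
import warnings


class BudgetExceededError(Exception):
    """Stand-in for the library's BudgetExceeded exception."""


def enforce(on_exceed, estimated_cost, remaining, call, model):
    """Run `call(model)` subject to the policy named by `on_exceed`."""
    if estimated_cost <= remaining:
        return call(model)  # within budget: proceed unchanged
    if on_exceed == "raise":
        raise BudgetExceededError(
            f"estimated ${estimated_cost:.4f} exceeds remaining ${remaining:.4f}"
        )
    if on_exceed == "warn":
        warnings.warn("budget exceeded; proceeding anyway")
        return call(model)
    if on_exceed == "skip":
        return None  # the API call is never made
    if on_exceed.startswith("downgrade:"):
        cheaper_model = on_exceed.split(":", 1)[1]
        return call(cheaper_model)
    raise ValueError(f"unknown on_exceed mode: {on_exceed!r}")


def fake_call(model):
    # Stand-in for an API call: report which model was used.
    return f"called {model}"


print(enforce("skip", 0.01, 0.005, fake_call, "gpt-4o"))
print(enforce("downgrade:gpt-4o-mini", 0.01, 0.005, fake_call, "gpt-4o"))
```

Note that with "skip" the caller must be prepared to receive None, so downstream code should check the result before using it.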
Pre-Flight Cost Estimation
Know what you'll spend before you spend it. No API key required.
from llm_budget import estimate, compare
# Estimate a single call
est = estimate(
messages=[{"role": "user", "content": "Summarize this 10-page document..."}],
model="gpt-4o",
expected_output_tokens=500,
)
print(est)
# gpt-4o: ~42 input + ~500 output tokens = $0.005105
print(est.estimated_cost) # 0.005105
print(est.input_cost) # 0.000105
print(est.output_cost) # 0.005000
print(est.input_tokens) # 42
print(est.output_tokens) # 500
# Compare across models (prints a sorted table)
results = compare(
messages="Explain quantum computing in simple terms",
models=["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514", "claude-3-5-haiku-20241022", "deepseek-chat"],
)
# Output:
# Model Input Tok Est. Output Est. Cost
# ---------------------------------------------------------------
# deepseek-chat 8 ~8 $0.000006
# gpt-4o-mini 8 ~8 $0.000006
# claude-3-5-haiku-20241022 8 ~8 $0.000048
# ...
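The numbers in the estimate output above are consistent with a plain "tokens times price per token" calculation. The sketch below reproduces them; the rates of $2.50 per million input tokens and $10.00 per million output tokens are inferred from the example output (42 tokens for $0.000105, 500 tokens for $0.005000), not read from the library's registry, and `estimate_cost` is an illustrative name.

```python
# Assumed rates in USD per million tokens, inferred from the example output.
INPUT_USD_PER_MTOK = 2.50
OUTPUT_USD_PER_MTOK = 10.00


def estimate_cost(input_tokens: int, output_tokens: int):
    """Return (input_cost, output_cost, total) in USD."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


input_cost, output_cost, total = estimate_cost(42, 500)
print(f"{input_cost:.6f} + {output_cost:.6f} = {total:.6f}")
```

Because output tokens are priced higher than input tokens here, the `expected_output_tokens` guess dominates the estimate for short prompts.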
Spend Tracking
Middleware (auto-tracking, zero code changes)
Wrap your client once. Every API call is automatically tracked.
from openai import OpenAI
from llm_budget import track_openai, get_tracker
client = track_openai(OpenAI()) # Wrap once
# Use normally -- cost is recorded automatically
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}],
)
# Check your spend
tracker = get_tracker()
print(f"Today: ${tracker.get_spend('today'):.4f}")
print(f"This week: ${tracker.get_spend('this_week'):.4f}")
print(f"This month: ${tracker.get_spend('this_month'):.4f}")
print(tracker.get_spend_breakdown("today")) # {"gpt-4o": 0.0042, "gpt-4o-mini": 0.0001}
Works with Anthropic too:
from anthropic import Anthropic
from llm_budget import track_anthropic
client = track_anthropic(Anthropic())
Manual tracking
from llm_budget import Tracker
tracker = Tracker() # SQLite-backed, persists across sessions
tracker = Tracker(db_path="my_app.db") # Custom database path
tracker.record(
model="gpt-4o",
input_tokens=100,
output_tokens=50,
cost_usd=0.00075,
)
tracker.get_spend("today") # Total spend today
tracker.get_spend("this_month", model="gpt-4o") # Spend on gpt-4o this month
tracker.get_spend_breakdown("this_week") # {"gpt-4o": 3.21, "gpt-4o-mini": 0.14}
tracker.get_history(last_n=20) # Recent call records
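For readers curious what a SQLite-backed spend log might look like, here is a minimal sketch using only the standard library. The table layout, the `record` and `spend_breakdown` helpers, and the in-memory database are all assumptions made for illustration; the library's actual schema is not documented here and may differ.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database for the demo; a real tracker would persist to a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (ts TEXT, model TEXT, input_tokens INTEGER, "
    "output_tokens INTEGER, cost_usd REAL)"
)


def record(model, input_tokens, output_tokens, cost_usd):
    """Append one API call to the log with a UTC timestamp."""
    timestamp = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
        (timestamp, model, input_tokens, output_tokens, cost_usd),
    )


def spend_breakdown():
    """Total cost per model as a {model: usd} dict."""
    rows = conn.execute("SELECT model, SUM(cost_usd) FROM calls GROUP BY model")
    return dict(rows.fetchall())


record("gpt-4o", 100, 50, 0.00075)
record("gpt-4o", 100, 50, 0.00075)
record("gpt-4o-mini", 100, 50, 0.00005)  # illustrative cost
print(spend_breakdown())
```

Storing a timestamp per row is what makes period queries such as "today" or "this week" possible with a simple WHERE clause on `ts`.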
Agent Tools (for AI agents that self-optimize)
Give your agent three tools. It calls them to decide which model to use, check budget, and compare options. This is the most powerful mode -- the agent itself makes cost-quality decisions.
OpenAI Agent
from openai import OpenAI
from llm_budget import cost_tools_openai, cost_context, cost_prompt
client = OpenAI()
tools = cost_tools_openai() # 3 tools: estimate_cost, compare_models, check_budget
system = cost_prompt(daily_budget=5.00, strategy="balanced")
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "system", "content": system}, ...],
tools=tools,
)
# When the agent calls a tool:
for tool_call in response.choices[0].message.tool_calls or []:
result = cost_context(tool_call) # Auto-detects format, returns string
Anthropic Agent
from anthropic import Anthropic
from llm_budget import cost_tools_anthropic, cost_context, cost_prompt
client = Anthropic()
tools = cost_tools_anthropic()
system = cost_prompt(monthly_budget=50.00, strategy="quality_first")
response = client.messages.create(
model="claude-sonnet-4-20250514",
system=system,
tools=tools,
messages=[...],
)
for block in response.content:
if block.type == "tool_use":
result = cost_context(block) # Auto-detects Anthropic format
The Three Agent Tools
| Tool | What It Does | When to Call |
|---|---|---|
| `estimate_cost` | Pre-flight cost + capability check + alternatives | Before every LLM call |
| `compare_models` | Rank models by cost-quality for a specific task type | When choosing which model to use |
| `check_budget` | Live spend, remaining budget, projections, strategy advice | Periodically, or when budget is tight |
Strategy Presets
# Balanced (default) -- smart tradeoffs
prompt = cost_prompt(strategy="balanced", daily_budget=5.00)
# Cost-first -- minimize spending, cheapest viable models
prompt = cost_prompt(strategy="cost_first", daily_budget=2.00)
# Quality-first -- best models, only save on trivial tasks
prompt = cost_prompt(strategy="quality_first", monthly_budget=100.00)
MCP Server
Expose the tools via Model Context Protocol for Claude Desktop, Cursor, etc.
pip install llm-budget[mcp]
llm-budget serve-mcp
Or in your MCP config:
{
"mcpServers": {
"llm-budget": {
"command": "python",
"args": ["-m", "llm_budget.mcp_server"]
}
}
}
Framework Integrations
LangChain
Budget-aware callback handler that plugs into any LangChain LLM.
pip install llm-budget[langchain] langchain-anthropic # or langchain-openai
from langchain_anthropic import ChatAnthropic
from llm_budget.integrations.langchain import BudgetCallbackHandler
# Tracking only (no enforcement)
handler = BudgetCallbackHandler()
llm = ChatAnthropic(model="claude-sonnet-4-20250514", callbacks=[handler])
result = llm.invoke("Explain quantum computing")
print(f"Cost: ${handler.total_cost:.6f}")
print(f"Tokens: {handler.total_tokens}")
With budget enforcement:
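from langchain_anthropic import ChatAnthropic
from llm_budget.integrations.langchain import budget_callback
from llm_budget import BudgetExceeded
questions = ["Explain quantum computing"]  # example input; replace with your own prompts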
with budget_callback(max_cost=1.00, period="daily", on_exceed="raise") as cb:
llm = ChatAnthropic(model="claude-sonnet-4-20250514", callbacks=[cb])
try:
for question in questions:
result = llm.invoke(question)
except BudgetExceeded:
print(f"Budget hit after {cb.call_count} calls (${cb.total_cost:.4f})")
Works with any LangChain LLM: ChatOpenAI, ChatAnthropic, ChatGoogleGenerativeAI, etc.
CrewAI
Budget hooks that register globally with CrewAI's hook system.
pip install llm-budget[crewai] crewai
from crewai import Crew
from llm_budget.integrations.crewai import track_crewai
from llm_budget import BudgetExceeded
# researcher and writer are CrewAI agents defined elsewhere in your code
with track_crewai(max_cost=10.00, period="daily") as hooks:
crew = Crew(agents=[researcher, writer], tasks=[...])
try:
result = crew.kickoff()
except BudgetExceeded:
print(f"Budget hit: ${hooks.total_cost:.4f}")
print(f"Crew run cost: ${hooks.total_cost:.4f} ({hooks.call_count} LLM calls)")
Or register hooks manually:
from llm_budget.integrations.crewai import BudgetHooks
hooks = BudgetHooks(max_cost=10.00, period="daily")
hooks.register()
# ... run agents ...
hooks.unregister()
CLI
# Check your spend
$ llm-budget status
LLM Budget Status
==================================================
Daily $0.45
Weekly $3.21
Monthly $12.54
Total $47.89
Breakdown by model (monthly):
gpt-4o $9.80
gpt-4o-mini $2.74
# Compare costs across models
$ llm-budget compare "Explain quantum computing" --models gpt-4o,gpt-4o-mini,claude-sonnet-4-20250514
# Estimate a single call
$ llm-budget estimate "Write a Python tutorial" --model gpt-4o --output-tokens 500
# List all 20 supported models with pricing
$ llm-budget models
$ llm-budget models --provider anthropic
# Update pricing from LiteLLM (stays current)
$ llm-budget update-prices
# Show recent API call history
$ llm-budget history --last 20
Supported Models
20 models across 7 providers. Pricing auto-updates via llm-budget update-prices.
| Provider | Models | Tier |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini, o3-mini | Frontier / Efficient / Reasoning |
| Anthropic | claude-sonnet-4, claude-opus-4, claude-3.5-haiku, claude-3.5-sonnet | Frontier / Efficient |
| DeepSeek | deepseek-chat, deepseek-reasoner | Efficient / Reasoning |
| Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash | Frontier / Efficient |
| Meta | llama-3.1-70b, llama-3.1-8b | Mid / Efficient |
| Mistral | mistral-large, mistral-small | Mid / Efficient |
Capability Tiers
| Tier | Quality Score | Cost Range (per call) | Best For |
|---|---|---|---|
| Reasoning | 88-98 | $0.001 - $0.01 | Math, proofs, science, multi-step logic |
| Frontier | 88-96 | $0.003 - $0.02 | Complex analysis, coding, creative writing |
| Mid | 75-78 | $0.001 - $0.003 | General tasks, moderate reasoning |
| Efficient | 55-73 | $0.0001 - $0.003 | Extraction, formatting, translation, Q&A |
API Reference
Smart Routing
- `smart_route(client, task, *, task_type, budget, model, max_tokens, **api_kwargs)` -- One-call routing + tracking
- `cost_aware(budget, task_type, provider, tracker)` -- Decorator for auto model selection
Cost Estimation
- `estimate(messages, model, expected_output_tokens)` -- Pre-flight cost estimate
- `compare(messages, models, expected_output_tokens)` -- Compare costs across models
- `count_tokens(text_or_messages, model)` -- Count tokens
Budget Enforcement
- `@budget(max_cost, period, on_exceed, alert_at, track_model)` -- Decorator
- `BudgetEnforcer(tracker)` -- Programmatic enforcement
- `BudgetPolicy(max_cost, period, on_exceed, alert_at)` -- Policy config
- `BudgetExceeded` -- Exception (has `.suggested_model`, `.remaining_budget`)
Spend Tracking
- `Tracker(db_path=None)` -- SQLite spend tracker
- `get_tracker()` -- Default tracker singleton
- `tracker.record(model, input_tokens, output_tokens, cost_usd)` -- Record a call
- `tracker.get_spend(period, model)` -- Query spend
- `tracker.get_spend_breakdown(period)` -- Spend by model
- `tracker.get_history(last_n, model)` -- Recent records
- `track_openai(client, tracker)` -- Auto-tracking middleware for OpenAI
- `track_anthropic(client, tracker)` -- Auto-tracking middleware for Anthropic
Model Intelligence
- `recommend_models(prompt, task_type, budget_remaining, available_models)` -- Get model recommendations
- `is_model_suitable(model, task_type)` -- Check if model fits task; returns `(bool, reason)`
- `get_capability_summary(model)` -- One-line model summary
- `get_pricing(model)` -- Get `ModelPricing` object
- `get_registry()` -- Access the full pricing registry
Agent Tools
- `cost_tools_openai()` -- Tool schemas in OpenAI function calling format
- `cost_tools_anthropic()` -- Tool schemas in Anthropic tool_use format
- `cost_tools()` -- Framework-agnostic tool definitions
- `cost_context(tool_call, tracker, budget_config)` -- Universal tool handler (auto-detects format)
- `cost_prompt(daily_budget, weekly_budget, monthly_budget, strategy)` -- System prompt generator
Framework Integrations
LangChain (from llm_budget.integrations.langchain import ...):
- `BudgetCallbackHandler(max_cost, period, on_exceed, alert_at, tracker, model)` -- LangChain callback handler
- `budget_callback(...)` -- Context manager that creates + yields a handler
CrewAI (from llm_budget.integrations.crewai import ...):
- `BudgetHooks(max_cost, period, on_exceed, alert_at, tracker, model, default_model)` -- CrewAI hook class
- `track_crewai(...)` -- Context manager that creates + registers hooks
Feature Comparison
| Feature | llm-budget | TokenCost | LiteLLM |
|---|---|---|---|
| Automatic model routing | Yes (`smart_route`) | No | No |
| Pre-flight cost estimation | Yes | No | No |
| Budget enforcement | Yes (decorator) | No | No |
| Auto-downgrade on budget | Yes | No | No |
| Agent cost tools | Yes (3 tools) | No | No |
| LangChain integration | Yes (callback handler) | No | No |
| CrewAI integration | Yes (hooks) | No | No |
| MCP server | Yes | No | No |
| Standalone (no SDK required) | Yes | Yes | No |
| Multi-model comparison | Yes | No | Partial |
| Local SQLite tracking | Yes | No | Yes (proxy) |
| Zero cloud dependency | Yes | Yes | No |
| CLI | Yes | No | Yes |
Contributing
llm-budget is open source under the MIT license. Contributions are welcome!
git clone https://github.com/aman-source/llm-budget.git
cd llm-budget
pip install -e ".[dev]"
pytest
Areas where we'd love help:
- Framework integrations -- Add support for LlamaIndex, AutoGen, or other frameworks
- Async support -- LangChain `ainvoke()` and streaming response tracking
- Provider coverage -- Add pricing for new models and providers
- Bug fixes -- See open issues
Running tests:
pytest # All tests (260+)
pytest --cov=llm_budget -q # With coverage (target: 90%+)
pytest tests/test_langchain_integration.py -v # Specific module
All framework integrations (LangChain, CrewAI, MCP) use optional dependencies with mocked tests -- you don't need to install the frameworks to run the test suite.
License
MIT