LLM budget enforcement and cost tracking. Zero config — with budget(max_usd=1.00): run_agent(). Works with LangGraph, CrewAI, raw OpenAI/Anthropic/Gemini.

shekel

LLM budget enforcement and cost tracking for Python. One line. Zero config.

with budget(max_usd=1.00):
    run_my_agent()  # raises BudgetExceededError if spend exceeds $1.00

I spent $47 debugging a LangGraph retry loop. The agent kept failing, LangGraph kept retrying, and OpenAI kept charging — all while I slept. I built shekel so you don't have to learn that lesson yourself.


⚡️ What's New in v0.2.7: OpenTelemetry Metrics Integration

Shekel now exposes LLM cost and budget lifecycle data via OpenTelemetry — filling the gap the OTel GenAI spec leaves around cost and budget metrics.

pip install shekel[otel]

from shekel import budget
from shekel.otel import ShekelMeter

meter = ShekelMeter()  # uses global MeterProvider; silent no-op if OTel absent

with budget(max_usd=1.00, name="workflow") as b:
    run_my_agent()

meter.unregister()

Eight instruments are on by default, and two token counters are opt-in:

Instrument                       Type           What it tracks
shekel.llm.cost_usd              Counter        Cost per LLM call (tagged by model & provider)
shekel.llm.calls_total           Counter        Call count per model
shekel.llm.tokens_input_total    Counter        Input tokens (opt-in)
shekel.llm.tokens_output_total   Counter        Output tokens (opt-in)
shekel.budget.exits_total        Counter        Budget exits by status=completed|exceeded|warned
shekel.budget.cost_usd           UpDownCounter  Cumulative spend per budget
shekel.budget.utilization        Histogram      0.0–1.0 utilization on exit
shekel.budget.spend_rate         Histogram      USD/second spend rate
shekel.budget.fallbacks_total    Counter        Fallback model activations
shekel.budget.autocaps_total     Counter        Child budget auto-cap events
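
As a rough illustration of what the two histogram values represent, the sketch below derives them from a budget's spend, limit, and elapsed time. The function names and the clamping to 1.0 are assumptions made for this example, not shekel's internals.

```python
def utilization(spent_usd: float, limit_usd: float) -> float:
    """Fraction of the limit consumed, clamped to the 0.0-1.0 range (assumed)."""
    if limit_usd <= 0:
        raise ValueError("limit_usd must be positive")
    return max(0.0, min(spent_usd / limit_usd, 1.0))


def spend_rate(spent_usd: float, elapsed_seconds: float) -> float:
    """Average spend in USD per second over the budget's lifetime."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return spent_usd / elapsed_seconds


print(utilization(0.5, 1.0))   # half of the limit used
print(spend_rate(1.2, 60.0))   # USD per second over one minute
```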

Two new ObservabilityAdapter events are also available for custom integrations: on_budget_exit and on_autocap.

📖 OTel Integration Guide

Previous: Native Gemini & HuggingFace Support (v0.2.6)

Zero-config budget enforcement for Google Gemini and HuggingFace Inference API — same with budget(): pattern, no changes needed.

# Google Gemini
import google.genai as genai
from shekel import budget

client = genai.Client(api_key="...")
with budget(max_usd=1.00) as b:
    response = client.models.generate_content(model="gemini-2.0-flash", contents="...")
print(f"Cost: ${b.spent:.4f}")

Extensible Provider Architecture (v0.2.5)

Add any LLM provider without touching shekel core:

from shekel.providers.base import ADAPTER_REGISTRY, ProviderAdapter

class MyProviderAdapter(ProviderAdapter):
    @property
    def name(self) -> str:
        return "myprovider"

    def install_patches(self) -> None: ...
    def extract_tokens(self, response) -> tuple: ...
    # ... and 4 more methods

ADAPTER_REGISTRY.register(MyProviderAdapter())

with budget(max_usd=10.00):
    response = my_provider_client.call()  # Shekel tracks cost

✅ Comprehensive Integration Test Suite

274 integration tests across 7 real providers, run in CI with real API keys:

Provider       Tests  Coverage
OpenAI         26     Sync, async, streaming, budget enforcement, callbacks, fallback, multi-turn
Anthropic      24     Sync, async, streaming, budget enforcement, callbacks, multi-turn
Groq           30     Custom pricing, nested budgets, streaming, concurrent calls, rate limiting
Google Gemini  42     Multi-turn, streaming, JSON mode, function calling, token accuracy
HuggingFace    12     Sync, streaming, custom pricing, budget enforcement
LangGraph      14     Multi-node graphs, conditional edges, budget propagation
Ollama         38     Local inference, streaming, nested budgets

✨ Core Features

🌳 Nested Budgets

Enforce independent spend limits per workflow stage with automatic rollup:

with budget(max_usd=10.00, name="workflow") as workflow:
    with budget(max_usd=2.00, name="research"):
        sources = search_papers()      # $0.80

    with budget(max_usd=5.00, name="analysis"):
        insights = analyze(sources)    # $3.50

    final = polish(insights)           # $0.60

print(workflow.tree())
# workflow: $4.90 / $10.00
#   research: $0.80 / $2.00
#   analysis: $3.50 / $5.00

Why you'll love this:

  • 🎯 Per-stage budgets — Cap each phase independently
  • 🔒 Auto-capping — Child budgets can't exceed parent's remaining
  • 📊 Cost attribution — See exactly where money was spent
  • 🌳 Visual tree — Debug complex workflows instantly

📖 Nested Budgets Guide

🔭 Langfuse Integration

See exactly where your budget is going and when it breaks. Circuit-break events, budget hierarchy, and per-call spend stream to Langfuse automatically:

from langfuse import Langfuse
from shekel.integrations import AdapterRegistry
from shekel.integrations.langfuse import LangfuseAdapter

lf = Langfuse(public_key="...", secret_key="...")
adapter = LangfuseAdapter(client=lf, trace_name="my-app")
AdapterRegistry.register(adapter)

with budget(max_usd=10.00, name="agent") as b:
    run_agent()  # Costs flow to Langfuse automatically!

What you get:

  • ⚠️ Circuit break events — Captured in Langfuse the moment a budget is exceeded
  • 🔄 Fallback annotations — Model switches recorded with timing and cost
  • 🌳 Nested budget hierarchy — Child budgets map to child spans
  • 💰 Per-call spend streaming — See cumulative cost after every LLM call

📖 Langfuse Integration Guide


Install

pip install shekel[openai]       # OpenAI
pip install shekel[anthropic]    # Anthropic
pip install shekel[gemini]       # Google Gemini (google-genai SDK)
pip install shekel[huggingface]  # HuggingFace Inference API
pip install shekel[langfuse]     # Langfuse (budget visibility and circuit-break events)
pip install shekel[litellm]      # LiteLLM (budget enforcement across 100+ providers)
pip install shekel[otel]         # OpenTelemetry metrics (ShekelMeter)
pip install shekel[all]          # All providers + Langfuse + OTel
pip install shekel[all-models]   # All above + tokencost (400+ model pricing)
pip install shekel[cli]          # CLI tools (shekel estimate, shekel models)

Quick Start

Simple Budget Enforcement

from openai import OpenAI
from shekel import budget, BudgetExceededError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Enforce a hard cap
try:
    with budget(max_usd=1.00, warn_at=0.8) as b:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}]
        )
    print(f"Spent ${b.spent:.4f}")
except BudgetExceededError as e:
    print(f"Budget exceeded: ${e.spent:.2f} > ${e.limit:.2f}")

Track Without Limits

# Track spend without enforcing a limit
with budget() as b:
    run_my_agent()
print(f"Cost: ${b.spent:.4f}")

Fallback to Cheaper Model

# Switch to gpt-4o-mini at 80% of budget instead of raising
with budget(max_usd=0.50, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"}) as b:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )

if b.model_switched:
    print(f"Switched to {b.fallback['model']} at ${b.switched_at_usd:.4f}")

Accumulating Sessions

# Budget variables accumulate across multiple uses
session = budget(max_usd=5.00, name="session")

with session:
    run_step_1()  # Spends $1.50

with session:
    run_step_2()  # Spends $2.00

print(f"Session total: ${session.spent:.2f}")  # $3.50

🌳 Nested Budgets

Perfect for multi-stage agents, research workflows, and production AI pipelines.

Real-World Example: AI Research Agent

from shekel import budget

def research_agent(topic: str, max_budget: float = 10.0):
    """Research agent with per-stage budget control."""
    
    with budget(max_usd=max_budget, name="research_agent") as agent:
        # Phase 1: Web search ($2 budget)
        with budget(max_usd=2.00, name="web_search") as search:
            results = search_web(topic)
            if search.spent > 1.50:
                print("⚠️  Search phase used 75% of budget")
        
        # Phase 2: Content analysis ($5 budget)
        with budget(max_usd=5.00, name="analysis") as analysis:
            key_points = extract_insights(results)
            themes = identify_themes(key_points)
        
        # Phase 3: Report generation ($3 budget)
        with budget(max_usd=3.00, name="report_gen") as report:
            draft = generate_report(themes)
            final = refine_report(draft)
    
    # Print cost breakdown
    print(agent.tree())
    return final

# Run the agent
report = research_agent("AI safety alignment", max_budget=15.0)

Auto-Capping: Smart Budget Management

with budget(max_usd=10.00, name="workflow") as workflow:
    # Spend $7 on initial processing
    process_data()  # Spends $7.00
    
    # Child wants $5, but only $3 left
    # Shekel automatically caps child to $3!
    with budget(max_usd=5.00, name="final_step") as step:
        print("Requested: $5.00")
        print(f"Actual limit: ${step.limit:.2f}")  # $3.00 (auto-capped!)
        generate_output()  # Won't exceed $3
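
The auto-capping arithmetic above can be sketched in a few lines. This is a minimal illustration that assumes the effective child limit is the smaller of the requested limit and the parent's remaining budget; `auto_capped_limit` is a hypothetical helper, not part of shekel's API.

```python
def auto_capped_limit(requested_usd: float,
                      parent_limit_usd: float,
                      parent_spent_usd: float) -> float:
    """Return the limit a child budget would get under a capped parent."""
    # The parent can never hand out more than it has left.
    parent_remaining = max(parent_limit_usd - parent_spent_usd, 0.0)
    return min(requested_usd, parent_remaining)


# Mirrors the example above: $10 parent, $7 spent, child asks for $5.
print(auto_capped_limit(5.00, 10.00, 7.00))  # 3.0
```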

Hierarchical Cost Attribution

with budget(max_usd=50.00, name="production_pipeline") as pipeline:
    with budget(max_usd=10.00, name="ingestion"):
        ingest_data()
    
    with budget(max_usd=20.00, name="processing"):
        with budget(max_usd=8.00, name="validation"):
            validate_data()
        
        with budget(max_usd=12.00, name="transformation"):
            transform_data()
    
    with budget(max_usd=15.00, name="output"):
        generate_report()

# Detailed breakdown
print(f"Total: ${pipeline.spent:.2f}")
print(f"Direct spend: ${pipeline.spent_direct:.2f}")
print(f"Child spend: ${pipeline.spent_by_children:.2f}")
print("\nFull tree:")
print(pipeline.tree())
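
To make the relationship between spent, spent_direct, and spent_by_children concrete, here is a small standalone model of a budget tree. The `Node` class and the dollar amounts are invented for illustration; it only mirrors the property names documented in this README and is not shekel's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    limit: Optional[float] = None
    spent_direct: float = 0.0          # spend recorded on this node itself
    children: List["Node"] = field(default_factory=list)

    @property
    def spent_by_children(self) -> float:
        # Each child's total already includes its own descendants.
        return sum(child.spent for child in self.children)

    @property
    def spent(self) -> float:
        return self.spent_direct + self.spent_by_children

    def tree(self, indent: int = 0) -> str:
        cap = f" / ${self.limit:.2f}" if self.limit is not None else ""
        lines = [f"{'  ' * indent}{self.name}: ${self.spent:.2f}{cap}"]
        lines += [child.tree(indent + 1) for child in self.children]
        return "\n".join(lines)


pipeline = Node("pipeline", limit=50.0, spent_direct=0.25, children=[
    Node("ingestion", limit=10.0, spent_direct=1.5),
    Node("processing", limit=20.0, children=[
        Node("validation", limit=8.0, spent_direct=2.0),
        Node("transformation", limit=12.0, spent_direct=4.0),
    ]),
])
print(pipeline.tree())
```

The root total is its direct spend plus everything its descendants spent, which is why a parent's total can exceed the sum of the limits it used directly.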

Track-Only Children

# Parent enforces budget, but track children without limits
with budget(max_usd=20.00, name="workflow") as workflow:
    # This child has no limit (max_usd=None)
    with budget(max_usd=None, name="exploration"):
        explore_options()  # Tracked but unlimited
    
    # This child is limited
    with budget(max_usd=5.00, name="finalization"):
        finalize()

print(f"Exploration cost: ${workflow.children[0].spent:.2f}")
print(f"Total cost: ${workflow.spent:.2f}")

Advanced Features

Async Support

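from openai import AsyncOpenAI
from shekel import budget

client = AsyncOpenAI()

async def main():  # async with must run inside a coroutine
    async with budget(max_usd=1.00) as b:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}]
        )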

Full async support — async with budget(...) works for both top-level and nested budgets.

Decorator Pattern

from shekel import with_budget

@with_budget(max_usd=0.10)
def call_llm(prompt: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )

Custom Pricing

# Override model pricing
with budget(
    max_usd=1.00,
    price_per_1k_tokens={"input": 0.001, "output": 0.003}
) as b:
    call_custom_model()

Spend Summary

with budget(max_usd=2.00) as b:
    run_my_agent()

print(b.summary())
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# shekel spend summary
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Total: $1.2450 / $2.00 (62%)
# 
# gpt-4o: $1.2450 (5 calls)
#   Input:  45.2k tokens → $0.1130
#   Output: 11.3k tokens → $1.1320
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

CLI

# Estimate cost before running
shekel estimate --model gpt-4o --input-tokens 1000 --output-tokens 500
# Model:          gpt-4o
# Input tokens:   1,000
# Output tokens:  500
# Estimated cost: $0.007500

# List all bundled models with pricing
shekel models
shekel models --provider openai
shekel models --provider anthropic
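
The estimate shown above follows from simple per-1k-token arithmetic. The sketch below reproduces it with the gpt-4o prices from the Supported Models table; `estimate_cost` is a hypothetical helper written for this example, not the CLI's actual code.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_1k: float, output_per_1k: float) -> float:
    """Cost in USD given token counts and per-1k-token prices."""
    return ((input_tokens / 1000) * input_per_1k
            + (output_tokens / 1000) * output_per_1k)


# gpt-4o: $0.00250 per 1k input tokens, $0.01000 per 1k output tokens
cost = estimate_cost(1000, 500, input_per_1k=0.0025, output_per_1k=0.01)
print(f"Estimated cost: ${cost:.6f}")  # Estimated cost: $0.007500
```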

API Reference

budget(...)

Parameter            Type             Default  Description
max_usd              float | None     None     Hard spend cap in USD. None = track only.
name                 str | None       None     Budget name. Required for nested budgets.
warn_at              float | None     None     Fraction of limit (0.0–1.0) at which to call on_warn.
on_warn              Callable | None  None     Callback at warn_at threshold. Receives (spent, limit).
fallback             dict | None      None     Switch model at threshold: {"at_pct": 0.8, "model": "gpt-4o-mini"}. Same provider only.
on_fallback          Callable | None  None     Callback on fallback switch. Receives (spent, limit, fallback_model).
max_llm_calls        int | None       None     Hard cap on number of LLM API calls.
price_per_1k_tokens  dict | None      None     Override pricing: {"input": 0.001, "output": 0.003}.
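
The warn_at and fallback thresholds are both fractions of the limit. The toy tracker below sketches one way such thresholds could behave: the warning fires once, and the fallback flag flips when spend crosses its fraction. `ThresholdTracker` and its fire-once behavior are assumptions for illustration; the hard cap itself is not modeled here.

```python
from typing import Callable, List, Optional


class ThresholdTracker:
    """Toy spend tracker: warns once at warn_at, flags a switch at fallback_at."""

    def __init__(self, max_usd: float,
                 warn_at: Optional[float] = None,
                 on_warn: Optional[Callable[[float, float], None]] = None,
                 fallback_at: Optional[float] = None) -> None:
        self.max_usd = max_usd
        self.warn_at = warn_at
        self.on_warn = on_warn
        self.fallback_at = fallback_at
        self.spent = 0.0
        self.warned = False
        self.model_switched = False

    def record(self, cost_usd: float) -> None:
        self.spent += cost_usd
        used = self.spent / self.max_usd  # fraction of the limit consumed
        if self.warn_at is not None and not self.warned and used >= self.warn_at:
            self.warned = True  # fire the callback only once (assumed)
            if self.on_warn is not None:
                self.on_warn(self.spent, self.max_usd)
        if self.fallback_at is not None and used >= self.fallback_at:
            self.model_switched = True


messages: List[str] = []
tracker = ThresholdTracker(
    max_usd=1.00, warn_at=0.8, fallback_at=0.9,
    on_warn=lambda spent, limit: messages.append(f"{spent} of {limit}"),
)
for cost in (0.50, 0.25, 0.125, 0.0625):
    tracker.record(cost)
```

The callback receives (spent, limit), matching the on_warn description in the table.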

Properties

Property           Type            Description
spent              float           Total USD spent (includes children).
remaining          float | None    USD remaining (based on effective limit).
limit              float | None    Effective limit (auto-capped if nested).
name               str | None      Budget name.
calls_used         int             Number of LLM API calls made so far.
calls_remaining    int | None      Calls remaining before max_llm_calls is hit.
parent             Budget | None   Parent budget, or None if root.
children           list[Budget]    List of child budgets.
active_child       Budget | None   Currently active child.
full_name          str             Hierarchical path (e.g., "workflow.research").
spent_direct       float           Direct spend on this budget (excluding children).
spent_by_children  float           Sum of all child spend.
model_switched     bool            True if fallback was activated.
switched_at_usd    float | None    Spend level when fallback triggered.
fallback_spent     float           Cost incurred on the fallback model.

Methods

Method          Returns  Description
summary()       str      Formatted spend summary with model breakdown.
summary_data()  dict     Structured spend data as dictionary.
tree()          str      Visual hierarchy of the budget tree.
reset()         None     Reset spend tracking (only outside context).

BudgetExceededError

Attribute  Description
spent      Total spend when limit was hit.
limit      The configured max_usd.
model      Model that triggered the error.
tokens     {"input": N, "output": N} from the last call.

Supported Models

Model                       Input / 1k  Output / 1k
gpt-4o                      $0.00250    $0.01000
gpt-4o-mini                 $0.000150   $0.000600
o1                          $0.01500    $0.06000
o1-mini                     $0.00300    $0.01200
gpt-3.5-turbo               $0.000500   $0.001500
claude-3-5-sonnet-20241022  $0.00300    $0.01500
claude-3-haiku-20240307     $0.000250   $0.001250
claude-3-opus-20240229      $0.01500    $0.07500
gemini-1.5-flash            $0.0000750  $0.000300
gemini-1.5-pro              $0.00125    $0.00500

Versioned model names resolve automatically — gpt-4o-2024-08-06 maps to gpt-4o.
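
The README does not say how versioned names are resolved. One plausible approach is a longest-prefix match against the known base names, sketched below. `KNOWN_MODELS` and `resolve_model` are hypothetical names for this example, and the matching rule is an assumption.

```python
from typing import Iterable, Optional

# A few base names taken from the table above.
KNOWN_MODELS = ("gpt-4o", "gpt-4o-mini", "o1", "o1-mini", "gpt-3.5-turbo")


def resolve_model(name: str, known: Iterable[str] = KNOWN_MODELS) -> Optional[str]:
    """Return the longest known model name that `name` equals or extends."""
    candidates = [m for m in known if name == m or name.startswith(m + "-")]
    # Prefer the longest match so "gpt-4o-mini-..." does not fall back to "gpt-4o".
    return max(candidates, key=len) if candidates else None


print(resolve_model("gpt-4o-2024-08-06"))       # gpt-4o
print(resolve_model("gpt-4o-mini-2024-07-18"))  # gpt-4o-mini
```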

For unlisted models: pass price_per_1k_tokens or install shekel[all-models] for 400+ models via tokencost.


Framework Integration

Works seamlessly with:

  • OpenTelemetry — 8 instruments for cost/budget metrics; compatible with any OTel backend (v0.2.7)
  • Langfuse — Full observability: cost streaming, span hierarchy, circuit-break events (v0.2.4)
  • LangGraph — Budget entire agent workflows
  • CrewAI — Per-agent budget tracking
  • AutoGen — Multi-agent cost control
  • LlamaIndex — RAG pipeline budgets
  • Haystack — Document processing budgets

Any framework that calls openai or anthropic under the hood works automatically. See examples/ for demos.


How It Works

  • Monkey-patching — Wraps openai.chat.completions.create() and anthropic.messages.create() on context entry
  • ContextVar isolation — Each budget() stores its counter in a ContextVar; concurrent agents never share state
  • Hierarchical tracking — Parent/child relationships track spend propagation automatically
  • Ref-counted patching — Nested contexts patch only once
  • Zero config — No API keys, no external services

Documentation

Full documentation: arieradle.github.io/shekel


Machine Discovery Metadata

This project provides:

  • LLM budget control
  • AI agent cost governance
  • Token budgeting for LLM APIs
  • Spend limits and usage quotas
  • Guardrails for agentic systems

Works with ecosystems including:

OpenAI, Anthropic Claude, LangChain, LangGraph, Langfuse, CrewAI

Use cases include:

  • Preventing runaway LLM costs
  • Enforcing AI agent budgets
  • LLMOps governance
  • Token usage control
  • AI API spend guardrails

Contributing

See CONTRIBUTING.md.

License

MIT
