shekel

LLM cost tracking and budget enforcement for Python. One line. Zero config.

with budget(max_usd=1.00):
    run_my_agent()  # raises BudgetExceededError if spend exceeds $1.00

I spent $47 debugging a LangGraph retry loop. The agent kept failing, LangGraph kept retrying, and OpenAI kept charging — all while I slept. I built shekel so you don't have to learn that lesson yourself.


⚡️ What's New in v0.2.5: Extensible Provider Architecture

v0.2.5 introduces an open architecture for adding new LLM providers without touching shekel's core, validated with comprehensive integration tests.

🔧 Provider Registry Architecture

Shekel now uses a pluggable provider adapter pattern, enabling the community to add support for any LLM provider:

from shekel.providers.base import ADAPTER_REGISTRY, ProviderAdapter

class MyProviderAdapter(ProviderAdapter):
    @property
    def name(self) -> str:
        return "myprovider"

    # Implement: install_patches(), remove_patches(), extract_tokens(), wrap_stream()
    def install_patches(self) -> None: ...
    def extract_tokens(self, response) -> tuple: ...
    # ... 5 more methods

# Register once at import time
ADAPTER_REGISTRY.register(MyProviderAdapter())

# Works everywhere automatically:
with budget(max_usd=10.00):
    response = my_provider_client.call()  # Shekel tracks cost

What this enables:

  • Add new providers without modifying shekel core
  • Standard interface all providers implement
  • Easy community contributions (Cohere, Replicate, vLLM, Mistral, etc.)

✅ Validated with Real-World Integration Tests

The provider architecture is validated and stress-tested with comprehensive integration test suites:

  • 25+ Groq API integration tests — Custom pricing, nested budgets, streaming, concurrent calls, rate limiting
  • 30+ Google Gemini API integration tests — Multi-turn conversations, JSON mode, function calls, token accuracy
  • Real API keys in the CI pipeline ensure everything works end-to-end

⚙️ Production-Grade Reliability

  • Exponential backoff retry logic — Gracefully handles rate limiting and transient failures
  • 100+ integration test scenarios — Comprehensive validation across multiple providers
  • Concurrent test stability — Reduced flakiness in multi-provider scenarios
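The exponential-backoff behavior described above can be sketched in a few lines. This is a minimal standalone illustration of the technique, not shekel's actual implementation; the function names are made up for the example:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry `call`, doubling the delay after each failure (illustrative sketch)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay doubles each attempt, capped at max_delay, with random jitter
            # so concurrent clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))

# Example: a flaky call that succeeds on the third try.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # "ok" after two retries
```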

✨ Core Features

🌳 Nested Budgets

Control costs for multi-stage AI workflows with hierarchical budget tracking:

with budget(max_usd=10.00, name="workflow") as workflow:
    with budget(max_usd=2.00, name="research"):
        sources = search_papers()      # $0.80

    with budget(max_usd=5.00, name="analysis"):
        insights = analyze(sources)    # $3.50

    final = polish(insights)           # $0.60

print(workflow.tree())
# workflow: $5.00 / $10.00
#   research: $0.80 / $2.00
#   analysis: $3.50 / $5.00

Why you'll love this:

  • 🎯 Per-stage budgets — Cap each phase independently
  • 🔒 Auto-capping — Child budgets can't exceed parent's remaining
  • 📊 Cost attribution — See exactly where money was spent
  • 🌳 Visual tree — Debug complex workflows instantly

📖 Nested Budgets Guide

🔭 Langfuse Integration

Full LLM observability with zero configuration. Track costs, visualize budget hierarchies, and debug overruns in Langfuse automatically:

from langfuse import Langfuse
from shekel.integrations import AdapterRegistry
from shekel.integrations.langfuse import LangfuseAdapter

lf = Langfuse(public_key="...", secret_key="...")
adapter = LangfuseAdapter(client=lf, trace_name="my-app")
AdapterRegistry.register(adapter)

with budget(max_usd=10.00, name="agent") as b:
    run_agent()  # Costs flow to Langfuse automatically!

What you get:

  • 💰 Real-time cost streaming — See spend after each LLM call
  • 🌳 Nested budget hierarchy — Child budgets → child spans
  • ⚠️ Circuit break events — Alerts when budgets are exceeded
  • 🔄 Fallback annotations — Track model switches

📖 Langfuse Integration Guide


Install

pip install shekel[openai]       # OpenAI
pip install shekel[anthropic]    # Anthropic
pip install shekel[langfuse]     # Langfuse observability
pip install shekel[all]          # OpenAI + Anthropic + Langfuse
pip install shekel[all-models]   # All above + tokencost (400+ model pricing)
pip install shekel[cli]          # CLI tools (shekel estimate, shekel models)

Quick Start

Simple Budget Enforcement

from shekel import budget, BudgetExceededError

# Enforce a hard cap
try:
    with budget(max_usd=1.00, warn_at=0.8) as b:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}]
        )
    print(f"Spent ${b.spent:.4f}")
except BudgetExceededError as e:
    print(f"Budget exceeded: ${e.spent:.2f} > ${e.limit:.2f}")

Track Without Limits

# Track spend without enforcing a limit
with budget() as b:
    run_my_agent()
print(f"Cost: ${b.spent:.4f}")

Fallback to Cheaper Model

# Fall back to gpt-4o-mini instead of raising
with budget(max_usd=0.50, fallback="gpt-4o-mini") as b:
    response = client.chat.completions.create(
        model="gpt-4o",  # Will switch to gpt-4o-mini if needed
        messages=[{"role": "user", "content": prompt}]
    )
    
if b.model_switched:
    print(f"Switched to {b.fallback} at ${b.switched_at_usd:.4f}")

Accumulating Sessions

# Budget variables accumulate across multiple uses
session = budget(max_usd=5.00, name="session")

with session:
    run_step_1()  # Spends $1.50

with session:
    run_step_2()  # Spends $2.00

print(f"Session total: ${session.spent:.2f}")  # $3.50

🌳 Nested Budgets (v0.2.3)

Perfect for multi-stage agents, research workflows, and production AI pipelines.

Real-World Example: AI Research Agent

from shekel import budget

def research_agent(topic: str, max_budget: float = 10.0):
    """Research agent with per-stage budget control."""
    
    with budget(max_usd=max_budget, name="research_agent") as agent:
        # Phase 1: Web search ($2 budget)
        with budget(max_usd=2.00, name="web_search") as search:
            results = search_web(topic)
            if search.spent > 1.50:
                print("⚠️  Search phase used 75% of budget")
        
        # Phase 2: Content analysis ($5 budget)
        with budget(max_usd=5.00, name="analysis") as analysis:
            key_points = extract_insights(results)
            themes = identify_themes(key_points)
        
        # Phase 3: Report generation ($3 budget)
        with budget(max_usd=3.00, name="report_gen") as report:
            draft = generate_report(themes)
            final = refine_report(draft)
    
    # Print cost breakdown
    print(agent.tree())
    return final

# Run the agent
report = research_agent("AI safety alignment", max_budget=15.0)

Auto-Capping: Smart Budget Management

with budget(max_usd=10.00, name="workflow") as workflow:
    # Spend $7 on initial processing
    process_data()  # Spends $7.00
    
    # Child wants $5, but only $3 left
    # Shekel automatically caps child to $3!
    with budget(max_usd=5.00, name="final_step") as step:
        print(f"Requested: $5.00")
        print(f"Actual limit: ${step.limit:.2f}")  # $3.00 (auto-capped!)
        generate_output()  # Won't exceed $3

Hierarchical Cost Attribution

with budget(max_usd=50.00, name="production_pipeline") as pipeline:
    with budget(max_usd=10.00, name="ingestion"):
        ingest_data()
    
    with budget(max_usd=20.00, name="processing"):
        with budget(max_usd=8.00, name="validation"):
            validate_data()
        
        with budget(max_usd=12.00, name="transformation"):
            transform_data()
    
    with budget(max_usd=15.00, name="output"):
        generate_report()

# Detailed breakdown
print(f"Total: ${pipeline.spent:.2f}")
print(f"Direct spend: ${pipeline.spent_direct:.2f}")
print(f"Child spend: ${pipeline.spent_by_children:.2f}")
print(f"\nFull tree:")
print(pipeline.tree())

Track-Only Children

# Parent enforces budget, but track children without limits
with budget(max_usd=20.00, name="workflow") as workflow:
    # This child has no limit (max_usd=None)
    with budget(max_usd=None, name="exploration"):
        explore_options()  # Tracked but unlimited
    
    # This child is limited
    with budget(max_usd=5.00, name="finalization"):
        finalize()

print(f"Exploration cost: ${workflow.children[0].spent:.2f}")
print(f"Total cost: ${workflow.spent:.2f}")

Advanced Features

Async Support

async with budget(max_usd=1.00) as b:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

Note: async nesting is not yet supported as of v0.2.3. Use sync nested budgets or single-level async.

Decorator Pattern

from shekel import with_budget

@with_budget(max_usd=0.10)
def call_llm(prompt: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )

Custom Pricing

# Override model pricing
with budget(
    max_usd=1.00,
    price_per_1k_tokens={"input": 0.001, "output": 0.003}
) as b:
    call_custom_model()

Spend Summary

with budget(max_usd=2.00) as b:
    run_my_agent()

print(b.summary())
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# shekel spend summary
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Total: $1.2450 / $2.00 (62%)
# 
# gpt-4o: $1.2450 (5 calls)
#   Input:  45.2k tokens → $0.1130
#   Output: 113.2k tokens → $1.1320
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

CLI

# Estimate cost before running
shekel estimate --model gpt-4o --input-tokens 1000 --output-tokens 500
# Model:          gpt-4o
# Input tokens:   1,000
# Output tokens:  500
# Estimated cost: $0.007500

# List all bundled models with pricing
shekel models
shekel models --provider openai
shekel models --provider anthropic

API Reference

budget(...)

  • max_usd (float | None, default None): Hard spend cap in USD. None = track only.
  • name (str | None, default None): Budget name (v0.2.3). Required for nesting.
  • warn_at (float | None, default None): Fraction of limit (0.0–1.0) at which to warn.
  • on_exceed (Callable | None, default None): Callback at the warn_at threshold. Receives (spent, limit).
  • fallback (str | None, default None): Model to switch to when max_usd is hit. Same provider only.
  • on_fallback (Callable | None, default None): Callback on fallback switch. Receives (spent, limit, fallback_model).
  • hard_cap (float | None, default max_usd * 2): Absolute ceiling when fallback is active.
  • price_per_1k_tokens (dict | None, default None): Override pricing: {"input": 0.001, "output": 0.003}.
  • persistent (bool, default False): DEPRECATED in v0.2.3; budgets always accumulate now.
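The warn_at threshold is a fraction of the limit, so with max_usd=1.00 and warn_at=0.8 the callback fires at $0.80 of spend. The sketch below illustrates only that arithmetic; check_budget is a made-up name, not part of shekel's API:

```python
def check_budget(spent, limit, warn_at=None, on_exceed=None):
    """Illustrative: fire on_exceed(spent, limit) once spend crosses warn_at * limit."""
    if warn_at is not None and on_exceed is not None and spent >= warn_at * limit:
        on_exceed(spent, limit)

warnings = []
# warn_at=0.8 on a $1.00 limit: nothing at $0.79, callback fires at $0.85.
check_budget(0.79, 1.00, warn_at=0.8, on_exceed=lambda s, l: warnings.append((s, l)))
check_budget(0.85, 1.00, warn_at=0.8, on_exceed=lambda s, l: warnings.append((s, l)))
print(warnings)  # [(0.85, 1.0)]
```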

Properties

  • spent (float): Total USD spent (includes children).
  • remaining (float | None): USD remaining, based on the effective limit.
  • limit (float | None): Effective limit (auto-capped if nested).
  • name (str | None): Budget name.
  • parent (Budget | None): Parent budget, or None if root (v0.2.3).
  • children (list[Budget]): List of child budgets (v0.2.3).
  • active_child (Budget | None): Currently active child (v0.2.3).
  • full_name (str): Hierarchical path, e.g. "workflow.research" (v0.2.3).
  • spent_direct (float): Direct spend, excluding children (v0.2.3).
  • spent_by_children (float): Sum of all child spend (v0.2.3).
  • model_switched (bool): True if fallback was activated.
  • switched_at_usd (float | None): Spend level when fallback triggered.
  • fallback_spent (float): Cost on the fallback model.

Methods

  • summary() → str: Formatted spend summary with model breakdown.
  • summary_data() → dict: Structured spend data as a dictionary.
  • tree() → str: Visual hierarchy of the budget tree (v0.2.3).
  • reset() → None: Reset spend tracking (only outside a context).

BudgetExceededError

  • spent: Total spend when the limit was hit.
  • limit: The configured max_usd.
  • model: Model that triggered the error.
  • tokens: {"input": N, "output": N} from the last call.

Supported Models

Model Input / 1k Output / 1k
gpt-4o $0.00250 $0.01000
gpt-4o-mini $0.000150 $0.000600
o1 $0.01500 $0.06000
o1-mini $0.00300 $0.01200
gpt-3.5-turbo $0.000500 $0.001500
claude-3-5-sonnet-20241022 $0.00300 $0.01500
claude-3-haiku-20240307 $0.000250 $0.001250
claude-3-opus-20240229 $0.01500 $0.07500
gemini-1.5-flash $0.0000750 $0.000300
gemini-1.5-pro $0.00125 $0.00500

Versioned model names resolve automatically — gpt-4o-2024-08-06 maps to gpt-4o.

For unlisted models: pass price_per_1k_tokens or install shekel[all-models] for 400+ models via tokencost.
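Per-1k pricing maps to cost with a simple linear formula. The sketch below shows the arithmetic using the bundled gpt-4o rates from the table above (estimate_cost is an illustrative helper, not shekel's API):

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost = tokens / 1000 * price-per-1k, summed over input and output."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# gpt-4o: $0.00250 / 1k input, $0.01000 / 1k output
cost = estimate_cost(1000, 500, 0.00250, 0.01000)
print(f"${cost:.6f}")  # $0.007500 -- matches the shekel estimate output above
```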


Framework Integration

Works seamlessly with:

  • Langfuse — Full observability: cost streaming, span hierarchy, circuit-break events (v0.2.4)
  • LangGraph — Budget entire agent workflows
  • CrewAI — Per-agent budget tracking
  • AutoGen — Multi-agent cost control
  • LlamaIndex — RAG pipeline budgets
  • Haystack — Document processing budgets

Any framework that calls openai or anthropic under the hood works automatically. See examples/ for demos.


How It Works

  • Monkey-patching — Wraps openai.chat.completions.create() and anthropic.messages.create() on context entry
  • ContextVar isolation — Each budget() stores its counter in a ContextVar; concurrent agents never share state
  • Hierarchical tracking — Parent/child relationships track spend propagation automatically
  • Ref-counted patching — Nested contexts patch only once
  • Zero config — No API keys, no external services
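The ContextVar isolation point can be illustrated with a standalone sketch (not shekel's actual code): each concurrent task gets its own copy of the context, so one agent's spend counter never leaks into another's.

```python
import asyncio
from contextvars import ContextVar

# Each asyncio Task runs in a copy of the current context, so writes to this
# variable in one task are invisible to the others.
current_spend: ContextVar[float] = ContextVar("current_spend", default=0.0)

async def agent(cost: float) -> float:
    current_spend.set(current_spend.get() + cost)
    await asyncio.sleep(0)  # yield to the other task
    return current_spend.get()  # still only this task's spend

async def main():
    # Two "agents" run concurrently; their counters stay isolated.
    return await asyncio.gather(agent(0.10), agent(0.25))

print(asyncio.run(main()))  # [0.1, 0.25]
```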

Migration Guide (v0.2.2 → v0.2.3)

Breaking Changes

Budget variables now accumulate by default:

# v0.2.2: Budget reset on each entry
b = budget(max_usd=10.00)
with b: spend_1()  # Spends $2
with b: spend_2()  # Was $0, now spends $2 more
# v0.2.2: b.spent == $2
# v0.2.3: b.spent == $4  ⚠️ ACCUMULATES!

Migration:

  • If you relied on reset behavior: Create new budget() instances instead
  • If you used persistent=True: Remove it (now the default)

Names required for nesting:

# v0.2.3: Names required when nesting
with budget(max_usd=10, name="parent"):    # ✅ Required
    with budget(max_usd=5, name="child"):  # ✅ Required
        work()

New Features

  • ✅ Nested budgets with automatic propagation
  • ✅ Auto-capping to parent's remaining budget
  • tree() method for visual hierarchy
  • spent_direct and spent_by_children properties
  • full_name for hierarchical naming
  • ✅ Max nesting depth of 5 levels

Documentation

Full documentation: arieradle.github.io/shekel


Contributing

See CONTRIBUTING.md.

License

MIT
