Skip to main content

LLM cost reservation ledger — pre-flight reserve, commit, release.

Project description

Snipz

An LLM cost reservation ledger for Python. Cap your spend per user, per tenant, per feature — and never overshoot, even under concurrent load.

Status: v0.2.x — pre-1.0. The engine is feature-complete; the head-to-head benchmark holds the cap on real Postgres at 1000 concurrent reservations while LiteLLM BudgetManager and Shekel overshoot by 20×. API may shift before v1.0; pin snipz>=0.2,<0.3 to allow patches and forbid breaks.

async with await budget.reserve(Scope("user", "u_42"), Decimal("10")) as r:
    response = await call_anthropic(...)
    await r.observe(price(response))   # exit auto-commits at observed cost
# on exception: auto-release; the cap is never overshot

Why

Every team building LLM features rebuilds cost guardrails from scratch. Existing libraries (LiteLLM BudgetManager, Shekel) follow an estimate-then-record pattern — they check the cap, run the call, then log the spend. Under concurrent load this lets two requests both pass a cap check at $4.95 of a $5.00 cap and both run, blowing the cap. The benchmark below shows them overshooting a $5.00 cap by 20× at 1000 concurrent — spending $100.00 instead of $5.00 — not a typo.

Snipz is a reservation ledger: every call holds budget inside a transaction with SELECT … FOR UPDATE (or BEGIN IMMEDIATE on SQLite) before the LLM runs, commits the actual cost on success, releases on failure. The cap-check and the ledger insert are a single atomic step. The cap is never overshot.


Head-to-head correctness benchmark

The proof: 1000 concurrent reservations of $0.10 each against a $5.00 cap. Same workload, three backends, side-by-side.

Cap-correctness comparison — Snipz vs. estimate-then-record competitors
=======================================================================
  Concurrency: 1000
  Cap:         $5.00
  Per-req:     $0.10

  Cap     [########################################] $5.00

  Snipz                 [########################################                                        ] $5.00    — ok (held)
  LiteLLM BudgetManager [################################################################################] $100.00  — OVERSHOT by $95.00
  Shekel                [################################################################################] $100.00  — OVERSHOT by $95.00

  Headline claim reproduced: Snipz held the cap; LiteLLM BudgetManager, Shekel overshot.
Backend Successes Final spend Cap held? Duration
Snipz (Postgres) 50 / 1000 $5.00 yes 3.6s
LiteLLM BudgetManager 1000 / 1000 $100.00 (20× cap) no 0.03s
Shekel 1000 / 1000 $100.00 (20× cap) no 0.03s

Snipz is ~120× slower per attempt — because it actually does the work: open a transaction, take a row lock, sum the ledger, check the cap, insert if OK, commit. The competitors are fast because they skip the lock entirely. Two concurrent callers both read current_cost=0.00, both pass the check, both write — at 1000 concurrent on a $5 cap, every single attempt commits.

The benchmark uses a 1 ms simulated LLM-call gap between cap-check and cost-record. Real LLM calls are 100–2000 ms — the race window in production is 100–2000× larger than the simulation. This is the conservative number.

Reproduce in one command (needs Docker + the bench-competitors extra):

pip install "snipz[bench-competitors]"
uv run python -m benchmarks.competitor_comparison --concurrency 1000

Or run just Snipz's single-backend cap-correctness benchmark (the same numbers, no competitors):

uv run python benchmarks/cap_correctness.py --testcontainers-postgres --concurrency 1000

Quickstart

import asyncio
from decimal import Decimal
from snipz import Budget, Scope

async def main():
    budget = Budget("snipz.db")                                # or "postgresql://..."
    await budget.migrate()
    await budget.set_limit(Scope("user", "u_42"), Decimal("500"))   # $5/month cap

    async with await budget.reserve(Scope("user", "u_42"), Decimal("10")) as r:
        response = await call_anthropic(...)
        await r.observe(price_from_usage(response.usage))
        # exit auto-commits at observed cost; auto-releases on exception

    await budget.close()

asyncio.run(main())

What you can rely on:

  • Atomic cap-check. Two concurrent reserves at the cap → one wins, one raises BudgetExceededError. Verified by the benchmark above.
  • Idempotent retries. Pass request_id="..." to reserve(); parallel retries with the same id converge on one ledger row.
  • Streaming-aware. r.observe(actual) updates the in-flight cost mid-stream; the cap-check formula uses MAX(actual, estimated) so concurrent requests see the true running total.
  • Late-commit safety. If your call takes longer than the TTL, the sweeper releases the row; a subsequent commit() still settles cleanly and fires on_overrun.
  • Multi-scope. Reserve against [user_scope, tenant_scope, feature_scope] in one call — all caps checked atomically, atomic rollback on any failure.

Install

pip install snipz                       # core: SQLite, async
pip install snipz[postgres]             # + asyncpg for Postgres
pip install snipz[openai]               # + tiktoken for exact OpenAI token counts
pip install snipz[bench-competitors]    # + litellm, shekel to reproduce the head-to-head benchmark

What's in the box

What Where What it's for
Budget, Reservation, Scope from snipz import ... The async engine — reserve / commit / release / observe
Sync wrapper (experimental) from snipz.sync import Budget For sync codebases — background event loop, raises if called from inside an async loop
Pricing from snipz import Pricing Vendored price book (Pricing.default()) + DB overrides (Pricing.with_backend(...))
Estimators from snipz.estimators import AnthropicEstimator, OpenAIEstimator, FallbackEstimator Pre-flight token counters; OpenAI is exact via tiktoken
@budget.guard budget.guard(scope=..., estimate=..., actual=...) Decorator that wraps an async LLM call with the full reserve/observe/commit/release lifecycle
Hooks budget.on_reserved, on_committed, on_released, on_overrun Plug-in points for metrics, alerting, audit logs
Sweeper snipz sweep [--interval N] CLI or snipz.sweep.sweep_loop() Background job that releases expired reservations
snipz update-pricing CLI Refresh the vendored pricing.toml from LiteLLM upstream

All async/sync surfaces share the same engine and correctness guarantees.


Deep dives


Development

uv sync                          # install all deps + .venv
uv run pytest                    # 140 tests against SQLite
uv run pytest --postgres         # + 15 Postgres integration tests (needs Docker)
uv run ruff check src/ tests/ benchmarks/
uv run mypy src/snipz benchmarks/

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snipz-0.2.0.tar.gz (62.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

snipz-0.2.0-py3-none-any.whl (55.2 kB view details)

Uploaded Python 3

File details

Details for the file snipz-0.2.0.tar.gz.

File metadata

  • Download URL: snipz-0.2.0.tar.gz
  • Upload date:
  • Size: 62.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for snipz-0.2.0.tar.gz
Algorithm Hash digest
SHA256 a95ec16944a681422679633ef828bfe57d8d7aec5ef3dd3249a92cc0297aaa03
MD5 55ec5d88d0dcd3ce894ca286c1d67b12
BLAKE2b-256 237714339c4d378e689d35855278560216de687f6aba055814f3bec5dc97704c

See more details on using hashes here.

File details

Details for the file snipz-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: snipz-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 55.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for snipz-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a54425dc310c35e753f478fb640105e21a6dfa7bd1caf6268d0bb4e5d132e4d6
MD5 c204fb1cce509d44a1883cfceb7f0154
BLAKE2b-256 857f9e2e40a44bc50016c021d833364859c58f74745460a2185dd0be6a15c0bd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page