LLM cost reservation ledger — pre-flight reserve, commit, release.
Snipz
An LLM cost reservation ledger for Python. Cap your spend per user, per tenant, per feature — and never overshoot, even under concurrent load.
Status: v0.2.x, pre-1.0. The engine is feature-complete. In the head-to-head benchmark, Snipz holds the cap on real Postgres at 1000 concurrent reservations, while LiteLLM `BudgetManager` and Shekel overshoot it by 20×. The API may shift before v1.0; pin `snipz>=0.2,<0.3` to allow patches and forbid breaking changes.
async with await budget.reserve(Scope("user", "u_42"), Decimal("10")) as r:
    response = await call_anthropic(...)
    await r.observe(price(response))  # exit auto-commits at observed cost
# on exception: auto-release; the cap is never overshot
Why
Every team building LLM features ends up rebuilding cost guardrails from scratch. Existing libraries (LiteLLM `BudgetManager`, Shekel) follow an estimate-then-record pattern: they check the cap, run the call, then log the spend. Under concurrent load, two requests can both pass the cap check when $4.95 of a $5.00 cap has been spent, both run, and together blow through the cap. The benchmark below shows them overshooting a $5.00 cap by 20× at 1000 concurrent requests, spending $100.00 instead of $5.00. Not a typo.
Snipz is a reservation ledger: every call reserves budget inside a transaction that uses `SELECT … FOR UPDATE` (or `BEGIN IMMEDIATE` on SQLite) before the LLM runs, then commits the actual cost on success or releases the hold on failure. The cap-check and the ledger insert are a single atomic step, so the cap is never overshot.
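To make the reserve/commit/release cycle concrete, here is a minimal sketch of the SQLite variant using only the standard-library `sqlite3` module. It is not Snipz's schema or code. The `ledger` table, the `BudgetExceeded` exception and the `connect`/`reserve`/`commit`/`release` helpers are hypothetical names chosen for illustration. What it shows is the ordering that matters: the write lock (`BEGIN IMMEDIATE`) is taken before the running total is read, so the check and the insert cannot be interleaved by another writer.

```python
import sqlite3
from decimal import Decimal

# Hypothetical schema for illustration; not Snipz's actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ledger (
    id     INTEGER PRIMARY KEY,
    scope  TEXT NOT NULL,
    amount TEXT NOT NULL,  -- Decimal stored as text, summed in Python to avoid float rounding
    status TEXT NOT NULL   -- 'reserved' | 'committed' | 'released'
)
"""


class BudgetExceeded(Exception):
    """Raised when a reservation would push a scope past its cap."""


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the driver in autocommit mode, so the explicit
    # BEGIN IMMEDIATE / COMMIT / ROLLBACK below are the only transaction control.
    conn = sqlite3.connect(path, isolation_level=None, timeout=30.0)
    conn.execute(SCHEMA)
    return conn


def reserve(conn: sqlite3.Connection, scope: str, amount: Decimal, cap: Decimal) -> int:
    # Take the write lock *before* reading the total: other writers now wait,
    # so nobody can slip in between the cap check and the insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT amount FROM ledger WHERE scope = ? AND status != 'released'",
            (scope,),
        ).fetchall()
        spent = sum((Decimal(a) for (a,) in rows), Decimal(0))
        if spent + amount > cap:
            raise BudgetExceeded(f"{scope}: {spent} + {amount} > {cap}")
        cur = conn.execute(
            "INSERT INTO ledger (scope, amount, status) VALUES (?, ?, 'reserved')",
            (scope, str(amount)),
        )
        conn.execute("COMMIT")
        return cur.lastrowid
    except BaseException:
        conn.execute("ROLLBACK")
        raise


def commit(conn: sqlite3.Connection, reservation_id: int, actual: Decimal) -> None:
    # Success: replace the held estimate with the observed cost.
    conn.execute(
        "UPDATE ledger SET amount = ?, status = 'committed' "
        "WHERE id = ? AND status = 'reserved'",
        (str(actual), reservation_id),
    )


def release(conn: sqlite3.Connection, reservation_id: int) -> None:
    # Failure: the hold stops counting against the cap.
    conn.execute(
        "UPDATE ledger SET status = 'released' WHERE id = ? AND status = 'reserved'",
        (reservation_id,),
    )
```

Each thread or process opens its own connection. With 100 workers each reserving 10 against a cap of 500, exactly 50 inserts succeed, because the losers block on `BEGIN IMMEDIATE` and then see the updated total. On Postgres the same shape would use a row lock (`SELECT … FOR UPDATE`) instead of a database-wide write lock.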
Head-to-head correctness benchmark
The proof: 1000 concurrent reservations of $0.10 each against a $5.00 cap. Same workload, three backends, side-by-side.
Cap-correctness comparison — Snipz vs. estimate-then-record competitors
=======================================================================
Concurrency: 1000
Cap: $5.00
Per-req: $0.10
Cap [########################################] $5.00
Snipz [######################################## ] $5.00 — ok (held)
LiteLLM BudgetManager [################################################################################] $100.00 — OVERSHOT by $95.00
Shekel [################################################################################] $100.00 — OVERSHOT by $95.00
Headline claim reproduced: Snipz held the cap; LiteLLM BudgetManager, Shekel overshot.
| Backend | Successes | Final spend | Cap held? | Duration |
|---|---|---|---|---|
| Snipz (Postgres) | 50 / 1000 | $5.00 | yes | 3.6s |
| LiteLLM BudgetManager | 1000 / 1000 | $100.00 (20× cap) | no | 0.03s |
| Shekel | 1000 / 1000 | $100.00 (20× cap) | no | 0.03s |
The Snipz run takes ~120× longer (3.6 s vs. 0.03 s) because it actually does the work: open a transaction, take a row lock, sum the ledger, check the cap, insert if OK, commit. The competitors are fast because they skip the lock entirely. Two concurrent callers both read `current_cost=0.00`, both pass the check, and both write; at 1000 concurrent requests against a $5 cap, every single attempt commits.
The benchmark uses a 1 ms simulated LLM-call gap between cap-check and cost-record. Real LLM calls take 100–2000 ms, so the race window in production is 100–2000× wider than in the simulation. The overshoot measured here is the conservative number.
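The race described above can be reproduced in-process with a toy asyncio sketch. This is not the benchmark's code. A plain dict stands in for the ledger, `asyncio.sleep(0.001)` stands in for the 1 ms LLM-call gap, and the function names are hypothetical. The first strategy checks, waits, then records. The second checks and holds in one step before waiting. In this toy, atomicity comes only from there being no `await` between check and hold. Snipz relies on a database lock instead, so the guarantee also holds across processes.

```python
import asyncio
from decimal import Decimal

CAP = Decimal("5.00")
COST = Decimal("0.10")
N = 1000


async def estimate_then_record(state: dict) -> bool:
    if state["spent"] + COST > CAP:  # 1. check against spend recorded so far
        return False
    await asyncio.sleep(0.001)       # 2. simulated LLM call: every other task runs its check now
    state["spent"] += COST           # 3. record only after the call returns
    return True


async def reserve_first(state: dict) -> bool:
    if state["spent"] + COST > CAP:  # check and hold with no await in between,
        return False
    state["spent"] += COST           # so no other task can interleave here
    await asyncio.sleep(0.001)       # the call runs against budget that is already held
    return True


async def run(strategy) -> tuple[int, Decimal]:
    state = {"spent": Decimal("0")}
    results = await asyncio.gather(*(strategy(state) for _ in range(N)))
    return sum(results), state["spent"]


naive = asyncio.run(run(estimate_then_record))
held = asyncio.run(run(reserve_first))
print(naive)  # (1000, Decimal('100.00'))
print(held)   # (50, Decimal('5.00'))
```

All 1000 naive tasks run their cap check before any of them reaches the "record" step, which is the same 20× overshoot the table reports.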
Reproduce it (needs Docker and the `bench-competitors` extra):
pip install "snipz[bench-competitors]"
uv run python -m benchmarks.competitor_comparison --concurrency 1000
Or run just Snipz's single-backend cap-correctness benchmark (the same numbers, no competitors):
uv run python benchmarks/cap_correctness.py --testcontainers-postgres --concurrency 1000
Quickstart
import asyncio
from decimal import Decimal
from snipz import Budget, Scope
async def main():
    budget = Budget("snipz.db")  # or "postgresql://..."
    await budget.migrate()
    await budget.set_limit(Scope("user", "u_42"), Decimal("500"))  # $5/month cap
    try:
        async with await budget.reserve(Scope("user", "u_42"), Decimal("10")) as r:
            response = await call_anthropic(...)  # your own LLM call
            await r.observe(price_from_usage(response.usage))  # your own pricing helper
        # exit auto-commits at observed cost; auto-releases on exception
    finally:
        await budget.close()
asyncio.run(main())
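The `async with await budget.reserve(...)` shape works because `reserve()` is a coroutine that returns an async context manager. The following is a toy, single-process sketch of that commit-on-success / release-on-exception behaviour. `ToyBudget`, `ToyReservation` and the local `BudgetExceededError` are hypothetical stand-ins, not Snipz classes. Settling at the estimate when `observe()` was never called is an assumption made for the sketch.

```python
import asyncio
from decimal import Decimal
from typing import Optional


class BudgetExceededError(Exception):
    """Local stand-in for the error raised when a reserve would cross the cap."""


class ToyBudget:
    """Hypothetical in-memory stand-in for snipz.Budget, for illustration only."""

    def __init__(self, cap: Decimal) -> None:
        self.cap = cap
        self.held = Decimal(0)       # estimates of in-flight reservations
        self.committed = Decimal(0)  # settled spend

    async def reserve(self, estimate: Decimal) -> "ToyReservation":
        # No await between the check and the hold, so asyncio cannot interleave them.
        if self.committed + self.held + estimate > self.cap:
            raise BudgetExceededError(f"{estimate} would exceed cap {self.cap}")
        self.held += estimate
        return ToyReservation(self, estimate)


class ToyReservation:
    def __init__(self, budget: ToyBudget, estimate: Decimal) -> None:
        self.budget = budget
        self.estimate = estimate
        self.actual: Optional[Decimal] = None

    async def observe(self, actual: Decimal) -> None:
        self.actual = actual  # latest known cost of the call

    async def __aenter__(self) -> "ToyReservation":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.budget.held -= self.estimate  # the hold is dropped either way
        if exc_type is None:
            # Success: charge the observed cost (assumption: fall back to the estimate).
            self.budget.committed += self.actual if self.actual is not None else self.estimate
        return False  # never swallow the caller's exception
```

Returning `False` from `__aexit__` keeps the original exception propagating after the release, which matches the "auto-release on exception" contract described above.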
What you can rely on:
- Atomic cap-check. Two concurrent reserves at the cap → one wins, one raises `BudgetExceededError`. Verified by the benchmark above.
- Idempotent retries. Pass `request_id="..."` to `reserve()`; parallel retries with the same id converge on one ledger row.
- Streaming-aware. `r.observe(actual)` updates the in-flight cost mid-stream; the cap-check formula uses `MAX(actual, estimated)` so concurrent requests see the true running total.
- Late-commit safety. If your call takes longer than the TTL, the sweeper releases the row; a subsequent `commit()` still settles cleanly and fires `on_overrun`.
- Multi-scope. Reserve against `[user_scope, tenant_scope, feature_scope]` in one call; all caps are checked atomically, with atomic rollback on any failure.
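The streaming-aware and multi-scope guarantees both reduce to how the running total is computed and checked. Below is a pure-Python sketch of one way to do it. In-flight rows count as `max(actual, estimated)`, committed rows count at their actual cost, and released rows count as zero. A new reservation is admitted only if it fits every scope. `LedgerRow`, `running_total` and `fits_all` are hypothetical names. The per-status rules are an assumption based on the description above, not Snipz's SQL.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class LedgerRow:
    status: str                       # 'reserved' (in flight), 'committed', or 'released'
    estimated: Decimal                # amount held at reserve() time
    actual: Optional[Decimal] = None  # latest observed cost, if any


def row_cost(row: LedgerRow) -> Decimal:
    actual = row.actual if row.actual is not None else Decimal(0)
    if row.status == "reserved":
        # In flight: count whichever is larger, the hold or what has streamed so far.
        return max(actual, row.estimated)
    if row.status == "committed":
        return actual
    return Decimal(0)  # released rows no longer count against the cap


def running_total(rows: Iterable[LedgerRow]) -> Decimal:
    return sum((row_cost(r) for r in rows), Decimal(0))


def fits_all(
    ledgers: Mapping[str, Sequence[LedgerRow]],
    caps: Mapping[str, Decimal],
    amount: Decimal,
) -> bool:
    # Multi-scope: the reservation is admitted only if it fits *every* scope's cap.
    return all(running_total(ledgers.get(s, ())) + amount <= cap for s, cap in caps.items())
```

An in-flight stream that has already overrun its estimate (for example, 12 observed against 10 held) is counted at 12. Concurrent callers therefore see its true cost before it commits.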
Install
pip install snipz # core: SQLite, async
pip install "snipz[postgres]" # + asyncpg for Postgres
pip install "snipz[openai]" # + tiktoken for exact OpenAI token counts
pip install "snipz[bench-competitors]" # + litellm, shekel to reproduce the head-to-head benchmark
What's in the box
| What | Where | What it's for |
|---|---|---|
| `Budget`, `Reservation`, `Scope` | `from snipz import ...` | The async engine: reserve / commit / release / observe |
| Sync wrapper (experimental) | `from snipz.sync import Budget` | For sync codebases: runs a background event loop; raises if called from inside an async loop |
| `Pricing` | `from snipz import Pricing` | Vendored price book (`Pricing.default()`) + DB overrides (`Pricing.with_backend(...)`) |
| Estimators | `from snipz.estimators import AnthropicEstimator, OpenAIEstimator, FallbackEstimator` | Pre-flight token counters; OpenAI is exact via tiktoken |
| `@budget.guard` | `budget.guard(scope=..., estimate=..., actual=...)` | Decorator that wraps an async LLM call with the full reserve/observe/commit/release lifecycle |
| Hooks | `budget.on_reserved`, `on_committed`, `on_released`, `on_overrun` | Plug-in points for metrics, alerting, audit logs |
| Sweeper | `snipz sweep [--interval N]` CLI or `snipz.sweep.sweep_loop()` | Background job that releases expired reservations |
| `snipz update-pricing` | CLI | Refresh the vendored `pricing.toml` from LiteLLM upstream |
All async/sync surfaces share the same engine and correctness guarantees.
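The sweeper and the late-commit path fit together as follows. A minimal SQLite sketch using the standard-library `sqlite3` module is below. It is not the `snipz sweep` implementation. The `ledger` columns, the separate `'expired'` status (used so a TTL sweep is distinguishable from a release-on-failure), and the `sweep_expired`/`settle` helpers are all hypothetical. The sweeper frees stale holds. A commit that arrives afterwards is still recorded at its real cost and reported through an overrun callback.

```python
import sqlite3
from decimal import Decimal
from typing import Callable, Optional

# Hypothetical schema for illustration; not Snipz's actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ledger (
    id         INTEGER PRIMARY KEY,
    amount     TEXT NOT NULL,  -- Decimal stored as text
    status     TEXT NOT NULL,  -- 'reserved' | 'committed' | 'released' | 'expired'
    expires_at REAL NOT NULL   -- unix time after which a 'reserved' hold is stale
)
"""


def sweep_expired(conn: sqlite3.Connection, now: float) -> int:
    """Expire every in-flight hold whose TTL has passed; return how many were expired."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE ledger SET status = 'expired' "
            "WHERE status = 'reserved' AND expires_at <= ?",
            (now,),
        )
    return cur.rowcount


def settle(
    conn: sqlite3.Connection,
    row_id: int,
    actual: Decimal,
    on_overrun: Optional[Callable[[int, Decimal], None]] = None,
) -> None:
    """Commit the observed cost, even if the sweeper already expired the hold."""
    with conn:
        # Late path: a swept row is re-charged at its real cost.
        late = conn.execute(
            "UPDATE ledger SET amount = ?, status = 'committed' "
            "WHERE id = ? AND status = 'expired'",
            (str(actual), row_id),
        ).rowcount == 1
        if not late:
            on_time = conn.execute(
                "UPDATE ledger SET amount = ?, status = 'committed' "
                "WHERE id = ? AND status = 'reserved'",
                (str(actual), row_id),
            ).rowcount == 1
            if not on_time:
                raise ValueError(f"row {row_id} is not an open or expired reservation")
    if late and on_overrun is not None:
        on_overrun(row_id, actual)  # spend landed after its hold was dropped
```

A periodic loop would simply call `sweep_expired(conn, time.time())` every N seconds.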
Deep dives
- `snipz.md`: positioning, competitor analysis, build phases
- `architecture.md`: layered architecture, schema, full decision log
- `snipz-protocol.md`: wire protocol spec (DRAFT, comments open)
- `scenarios.md`: concurrency walkthroughs
Development
uv sync # install all deps + .venv
uv run pytest # 140 tests against SQLite
uv run pytest --postgres # + 15 Postgres integration tests (needs Docker)
uv run ruff check src/ tests/ benchmarks/
uv run mypy src/snipz benchmarks/
License
MIT
Download files
File details
Details for the file snipz-0.2.0.tar.gz.
File metadata
- Download URL: snipz-0.2.0.tar.gz
- Upload date:
- Size: 62.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a95ec16944a681422679633ef828bfe57d8d7aec5ef3dd3249a92cc0297aaa03 |
| MD5 | 55ec5d88d0dcd3ce894ca286c1d67b12 |
| BLAKE2b-256 | 237714339c4d378e689d35855278560216de687f6aba055814f3bec5dc97704c |
File details
Details for the file snipz-0.2.0-py3-none-any.whl.
File metadata
- Download URL: snipz-0.2.0-py3-none-any.whl
- Upload date:
- Size: 55.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a54425dc310c35e753f478fb640105e21a6dfa7bd1caf6268d0bb4e5d132e4d6 |
| MD5 | c204fb1cce509d44a1883cfceb7f0154 |
| BLAKE2b-256 | 857f9e2e40a44bc50016c021d833364859c58f74745460a2185dd0be6a15c0bd |