Budget: pre-flight cost caps, spend attribution, and circuit-breakers for LLM calls.

Project description

cendor-tokenguard

Stop runaway LLM bills, and get per-feature / per-user cost attribution for free. One decorator, one context manager. No dashboard, no account, no infra.

Caught a $40 runaway loop before it ran away — and told you which feature spent the rest.

Install: pip install cendor-tokenguard

from openai import OpenAI

from cendor.core import instrument
from cendor.tokenguard import budget, track, report

openai_client = OpenAI()                        # reads OPENAI_API_KEY from the environment
client = instrument(openai_client)              # wrap once; tokenguard subscribes, never patches

@budget(usd=0.50, on_exceed="downgrade", downgrade={"gpt-4o": "gpt-4o-mini"})
def answer(q: str) -> str:
    with track(feature="support_bot", user_id="alice"):   # ambient attribution, zero bookkeeping
        resp = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": q}])
        return resp.choices[0].message.content

for row in report(group_by=["feature", "user_id"]):       # where did the money go?
    print(row["tags"], row["usd"], row["calls"])
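
The pre-flight decision behind on_exceed="downgrade" can be illustrated without the library. The sketch below is not tokenguard's implementation: the `preflight` function, the model names, and the price table are hypothetical, the prices are made-up numbers, and falling back to "block" when even the cheaper model would exceed the cap is an assumption.

```python
# Illustrative only: a toy pre-flight gate, not tokenguard's actual code.
# PRICES_PER_1K holds made-up numbers (USD per 1K tokens) for hypothetical models.
PRICES_PER_1K = {"big-model": 0.010, "small-model": 0.001}


def preflight(model, est_tokens, spent_usd, cap_usd, downgrade=None):
    """Decide, before the call is made, whether to allow, downgrade, or block it."""
    downgrade = downgrade or {}
    projected = spent_usd + est_tokens / 1000 * PRICES_PER_1K[model]
    if projected <= cap_usd:
        return "allow", model
    cheaper = downgrade.get(model)
    if cheaper is not None:
        cheaper_projected = spent_usd + est_tokens / 1000 * PRICES_PER_1K[cheaper]
        if cheaper_projected <= cap_usd:
            return "downgrade", cheaper
    # Assumption: with no affordable alternative, the call is refused.
    return "block", model


print(preflight("big-model", 10_000, 0.45, 0.50, {"big-model": "small-model"}))
```

The point of the sketch is the ordering: the projected spend is checked against the cap before any request is sent, so the over-budget call never runs.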

Highlights

  • Pre-flight circuit breaker: on_exceed="block" raises before an over-budget call runs; "downgrade" reroutes to a cheaper model pre-flight; "truncate" degrades; "raise" stops a runaway loop; or call your own function.
  • Reasoning models, handled — you can't predict a thinking model's hidden reasoning pre-flight, so on_exceed="clamp" injects the provider's own token ceiling (max_completion_tokens/max_tokens) sized to the remaining budget — the call is capped server-side instead of overspending. report() breaks out reasoning_tokens, and the cumulative gate enforces on exact usage (which already includes reasoning). See docs/tokenguard.md → Reasoning models.
  • Decorator and context manager — budgets nest (an inner downgrade never masks an outer hard cap); config is validated at creation (a typo'd on_exceed or a map-less downgrade is a ValueError, never a silent no-op).
  • Cost attribution, free: track(feature=…, user_id=…) tags ambient spend via contextvars (sync + async); report(group_by=[…]) shows where the money went, reasoning tokens included.
  • Cost as a test assertion: report().assert_under(usd=0.05, feature="search").
  • Pre-flight projection: estimate(model, messages) prices a call without making it.
  • Durable + bounded — pluggable use_sink(tokenguard.sinks.SQLiteSink / OTelSink); FIFO-bounded in-memory buffer (configure(max_records=…), dropped()).
  • No silent USD blind spots — a call whose model isn't in the price table records $0, so a USD cap can't bite. tokenguard warns once per model (UnpricedModelWarning) and counts these in unpriced_calls() / report()'s unpriced_calls; configure(on_unpriced="raise") makes on_exceed="block" reject them. A token cap is unaffected — tokens are counted regardless of price.
  • Thread-safe, with one caveat — the spend buffer and SQLiteSink are lock-guarded for concurrent emits, but budgets/tags are ContextVar-based: asyncio tasks inherit them, a plain threading.Thread does not (carry them with contextvars.copy_context()).
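
The thread caveat in the last bullet comes from how contextvars behaves, and it can be reproduced with the standard library alone. In this sketch `feature_tag` is a hypothetical stand-in for the library's ambient tags; only `contextvars` and `threading` are real APIs here.

```python
import contextvars
import threading

# Hypothetical stand-in for ambient attribution tags.
feature_tag = contextvars.ContextVar("feature_tag", default=None)

seen = {}


def worker(key):
    # Record whichever tag is visible in the context this function runs in.
    seen[key] = feature_tag.get()


feature_tag.set("support_bot")

# Plain thread: on standard CPython builds it starts with a fresh context,
# so the tag set above is typically not visible inside it.
plain = threading.Thread(target=worker, args=("plain",))

# Carry the caller's context into the thread explicitly.
ctx = contextvars.copy_context()
copied = threading.Thread(target=ctx.run, args=(worker, "copied"))

for t in (plain, copied):
    t.start()
for t in (plain, copied):
    t.join()

print(seen)
```

The thread started through `ctx.run` sees "support_bot"; the plain thread usually sees the default instead, which is why spend recorded from it would lose its tags.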

Streaming timing — post-flight raise/truncate fire when a stream is consumed, not when it's launched (the call is accounted once the chunk iterator drains). A loop that launches many streams before draining them can overspend — drain each stream before the next, or use a pre-flight mode (block/downgrade/clamp), which is unaffected.
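
The difference between the two loop shapes can be shown with a toy model. This is a sketch under stated assumptions, not tokenguard's code: `fake_stream`, the ledger, the cap, and the per-stream cost are all hypothetical, and the only property borrowed from the text above is that a stream is accounted once its iterator drains.

```python
# Illustrative only: a toy stream whose cost is recorded when the iterator drains.
ledger = {"usd": 0.0}
CAP_USD = 0.10          # hypothetical cap
COST_PER_STREAM = 0.04  # hypothetical cost of one streamed call


def fake_stream():
    """Yield chunks; account for the call only after the last chunk."""
    yield from ("chunk-1", "chunk-2")
    ledger["usd"] += COST_PER_STREAM  # runs only once the consumer drains the stream


def over_cap():
    return ledger["usd"] >= CAP_USD


def launch_all_then_drain(n):
    # Risky shape: nothing is accounted at launch, so the check never trips in time.
    streams = []
    for _ in range(n):
        if over_cap():
            break
        streams.append(fake_stream())
    for s in streams:
        list(s)  # drain
    return len(streams)


def drain_each(n):
    # Safer shape: each stream is drained (and accounted) before the next check.
    launched = 0
    for _ in range(n):
        if over_cap():
            break
        list(fake_stream())
        launched += 1
    return launched
```

With these numbers, launching five streams before draining lets all five through, while draining each stream first stops the loop after the third. The third call still crosses the cap, because a post-flight check only sees spend after it has happened.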

Wrap-around — it rides the call you already make. Offline and standalone — bundled prices, no account.

See docs/tokenguard.md · CHANGELOG. Part of the Cendor stack — github.com/cendorhq/Cendor. Powered by PowerAI Labs. Apache-2.0; provided "as is", without warranty — use at your own risk (LICENSE §7–8).

Download files

Source Distribution

cendor_tokenguard-1.0.0.tar.gz (25.0 kB view details)

Uploaded Source

Built Distribution

cendor_tokenguard-1.0.0-py3-none-any.whl (19.4 kB view details)

Uploaded Python 3

File details

Details for the file cendor_tokenguard-1.0.0.tar.gz.

File metadata

  • Download URL: cendor_tokenguard-1.0.0.tar.gz
  • Upload date:
  • Size: 25.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cendor_tokenguard-1.0.0.tar.gz
Algorithm Hash digest
SHA256 2da5baf17e8a1691d4598b936231658f35e5401e9baa57deb477b2e536691858
MD5 133e76e594c10f71a49e2a96f10166c6
BLAKE2b-256 24c068867dfa3833a67bf1f1a506e6e2e2d4c4e5371242be4fd07f77d0351c3a

Provenance

The following attestation bundles were made for cendor_tokenguard-1.0.0.tar.gz:

Publisher: release.yml on cendorhq/Cendor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cendor_tokenguard-1.0.0-py3-none-any.whl.

File hashes

Hashes for cendor_tokenguard-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 455f10d963a8f14d116ff3f0463cce66f598f597f7aef2be1e72b38affd34d51
MD5 ebffd40baf801e95a5942edf9e53e4a4
BLAKE2b-256 f3ba98a7920a8f94c45154db782de4d5fc38344999fefec484aed9a127679e21

Provenance

The following attestation bundles were made for cendor_tokenguard-1.0.0-py3-none-any.whl:

Publisher: release.yml on cendorhq/Cendor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
