Static budget certificates for LLM-agent workflows (LangGraph / CrewAI / OpenAI Agents SDK). Backed by a machine-checked (Lean 4) cost-soundness theorem.
Project description
costwright
Static budget certificates for LLM-agent workflows. Point it at your LangGraph / CrewAI / OpenAI-Agents-SDK repo and it tells you — before anything runs — the worst-case budget ceiling of every workflow graph, where that ceiling comes from, and when it is effectively vacuous.
Backed by a machine-checked (Lean 4) cost-soundness theorem: A Potential-Based Calculus for Resource Bounds of LLM-Agent Workflows (repo) — well-typed ⟹ spend ≤ declared budget, on every trace. costwright is the static frontend of that calculus. It never executes your code (pure AST analysis, zero dependencies).
pip install costwright
costwright check .
costwright — budget certificate check (schema costwright.v1)
▲ src/agents/researcher.py :42 default_dependent ≤1000 supersteps (framework_default)
‼ scripts/run_forever.py :17 runaway while-true-driver
✗ src/agents/fanout.py :88 non_certifiable send-fanout
12 graph units | ✓ 3 certifiable | ▲ 7 default-dependent | ✗ 1 non-certifiable | ‼ 1 runaway
⚠ 6 unit(s) rely on a framework default of ≥1000 supersteps (LangGraph ≥1.0.6) —
that budget ceiling is effectively vacuous. Set recursion_limit explicitly.
Why
We measured 254 real agent workflows from 45 public production-grade repos:
- 88% of cyclic workflows rely on a framework default for their only budget bound — or have none at all.
- LangGraph's modern default (
recursion_limit) is 1000 supersteps (it was 25 before v1.0.6). A "protected" workflow can take a thousand turns before the framework stops it. - Only 12% of LLM calls in real code carry an explicit token cap.
A runtime budget tracker tells you what you spent — after you spent it. A static certificate tells you the worst case before you deploy, rejects the runaway pattern at check time, and the math behind the bound is machine-checked, not vibes.
Methodology disclosure: dataset is ~80% LangGraph (204/254 units; CrewAI and Agents-SDK
samples are small); units cluster ~5.6 per repo; public-GitHub visibility bias applies; LangGraph
version assumed ≥1.0.6 (default 1000) where undeclared; false-reject 0 was measured on N=3
surviving rejects. Full experiment: spec, frozen dataset, and audit trail in this repo's
specs/ and results/.
Commands
costwright check [path]
Maps every workflow graph to the typed-budget calculus and reports, per graph unit:
| Category | Meaning |
|---|---|
certifiable |
bound is explicit in your code — real certificate |
default_dependent |
bound exists only via a framework default (often vacuous) |
non_certifiable |
uses a construct outside the calculus (Send fan-out, interrupts, hierarchical manager, dynamic goto, or a subgraph too dynamic to bound — see scope below) |
runaway |
genuinely unbounded (while True driver, max_turns=None, astronomically large limit) |
parse_error |
the analyzer could not reconstruct the graph (reported, never silent) |
--json→ stablecostwright.v1output for CI/tooling.--fail-on reject|default-dependent|non-certifiable→ exit 1 on policy violation (default: never fails — findings are warnings; strictness is opt-in).- Exit codes:
0ran fine,1policy violated,2infrastructure error. Never anything else.
costwright caps [path]
Finds LLM constructors without a token cap and tells you the right parameter for that provider (it is parameter-specific, not provider-specific — verified against primary provider docs):
- OpenAI Responses →
max_output_tokens(bounds reasoning+output) - Azure/OpenAI reasoning (Chat) →
max_completion_tokens(reasoning tokens are inside it) - Anthropic standard →
max_tokens(withbudget_tokens < max_tokens); interleaved/adaptive thinking degrades the ceiling (budget may exceedmax_tokens) — flagged - Gemini →
max_output_tokensplusthinking_budget(thinking is billed as output but NOT bounded bymaxOutputTokens) — flagged when missing
--patch out.diff emits a unified diff (git apply out.diff). costwright never edits your files.
costwright fuse (experimental)
Bundles a costwright cost certificate with an eleata-verify
risk certificate (a per-output selective-risk SLA) into one tamper-evident costwright.fusion.v1
audit record per agent run — answering both "will it blow my budget?" and "can I trust this
output, or send it to a human?"
costwright check . --json > cost.json # the cost certificate
# ... your eleata-verify step writes risk.json (VerifyResult.to_dict()) ...
costwright fuse --cost cost.json --risk risk.json --run-id "$RUN_ID"
It is the cartesian product of two independently-scoped certificates — NOT a composed
guarantee: it does not assert the agent is "safe", has no aggregate "both passed" boolean, and does
not claim a bounded budget preserves the risk SLA. The non-interference theorem (budget ⊥ risk,
Hoare-style) that would justify composition is unproven / future work. The bundler is pure stdlib
and never imports eleata-verify (costwright's zero-dependency core is intact). See
docs/FUSION.md, docs/NON-INTERFERENCE.md, and the
runnable examples/fusion_demo.py.
GitHub Action
- uses: hernaninverso/costwright/action@v0.1
with:
fail-on: reject # reject | default-dependent | non-certifiable | none
JSON schema
costwright.v1 is frozen: closed enums (category, provenance, aggregation), stable unit_id,
line spans only (no source code in output — CI-log safe). New fields are additive in v2.
The signature field is reserved (null in OSS output): signed, independently verifiable
certificates are a separate service — contact below.
What costwright does NOT do (exact scope)
- It does not bound
Sendfan-out, interrupts/human-in-the-loop, CrewAI hierarchical mode, or dynamicgoto— those map tonon_certifiable, never to a fake bound. Conservative by construction: when in doubt, no certificate. - Nested subgraphs (
graph.add_node("x", inner.compile())) ARE composed since0.2.0— but only when the inner graph is fully visible: a straight-line sequence of directadd_nodecalls on a uniquely-bound, non-imported localStateGraph. Any way a static reader can't pin down the inner graph's identity or node count — built in a loop/comprehension, bound more than once, passed into a helper, imported,from x import *,add_sequence, aRetryPolicy/error_handler— fails closed (non_certifiable), never a smaller composed number. Dynamic rebinding (globals(),setattr,exec, monkeypatching) is unobservable to any static analyzer and voids the certificate, as with every type checker. - The bound is worst-case (deliberately over-approximate). Tightening it is roadmap.
- A token-level dollar bound additionally needs caps on every call (
costwright caps) and a per-provider billing ceiling — see §3.2 of the paper for where that holds and degrades.
Theory
The calculus (7 constructs, affine potential in the style of Hofmann–Jost AARA), its cost-soundness
theorem, the relational↔functional equivalence, the termination measure, and the affine
no-double-spend layer are machine-checked in Lean 4 (no sorry; axioms [propext, Quot.sound]):
hernaninverso/typed-resources.
Certification / enterprise
Signed budget certificates (Ed25519, independently verifiable), a live per-provider billing-ceiling feed (providers change semantics monthly — your certificate can silently invalidate), and org-level CI policy are a paid layer on top of this OSS tool. Contact: hernaninverso@gmail.com.
License
Apache-2.0.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file costwright-0.2.15.tar.gz.
File metadata
- Download URL: costwright-0.2.15.tar.gz
- Upload date:
- Size: 77.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.5 {"installer":{"name":"uv","version":"0.10.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0e3e61940e51dad906e3681c750fcb51093da410e3c7e7093bb0acb057555368
|
|
| MD5 |
73cc6a5e77af510568f1ec6484ae951a
|
|
| BLAKE2b-256 |
b511f4bef11e9da73a6cc3405cb05a60ec625c7189e771de68aa80ec850bb58f
|
File details
Details for the file costwright-0.2.15-py3-none-any.whl.
File metadata
- Download URL: costwright-0.2.15-py3-none-any.whl
- Upload date:
- Size: 76.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.5 {"installer":{"name":"uv","version":"0.10.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3a80cfcb2675a3566da79f4a952f0bc7040c32f4ea0390fbed2fd96ef5205c26
|
|
| MD5 |
3e8a3b3c720047e99bc4bbbd513560c7
|
|
| BLAKE2b-256 |
004e2bfe19eaa8f2efd16a4e79370e2098ea0cb47581787334a8365017d7f2d6
|