Static budget certificates for LLM-agent workflows (LangGraph / CrewAI / OpenAI Agents SDK). Backed by a machine-checked (Lean 4) cost-soundness theorem.

These details have not been verified by PyPI

Project links

Project description

costwright

Static budget certificates for LLM-agent workflows. Point it at your LangGraph / CrewAI / OpenAI-Agents-SDK repo and it tells you — before anything runs — the worst-case budget ceiling of every workflow graph, where that ceiling comes from, and when it is effectively vacuous.

Backed by a machine-checked (Lean 4) cost-soundness theorem: A Potential-Based Calculus for Resource Bounds of LLM-Agent Workflows (repo) — well-typed ⟹ spend ≤ declared budget, on every trace. costwright is the static frontend of that calculus. It never executes your code (pure AST analysis, zero dependencies).

pip install costwright
costwright check .

costwright — budget certificate check  (schema costwright.v1)

  ▲ src/agents/researcher.py :42   default_dependent  ≤1000 supersteps (framework_default)
  ‼ scripts/run_forever.py   :17   runaway            while-true-driver
  ✗ src/agents/fanout.py     :88   non_certifiable    send-fanout

  12 graph units | ✓ 3 certifiable | ▲ 7 default-dependent | ✗ 1 non-certifiable | ‼ 1 runaway

  ⚠ 6 unit(s) rely on a framework default of ≥1000 supersteps (LangGraph ≥1.0.6) —
    that budget ceiling is effectively vacuous. Set recursion_limit explicitly.

Why

We measured 254 real agent workflows from 45 public production-grade repos:

88% of cyclic workflows rely on a framework default for their only budget bound — or have none at all.
LangGraph's modern default (recursion_limit) is 1000 supersteps (it was 25 before v1.0.6). A "protected" workflow can take a thousand turns before the framework stops it.
Only 12% of LLM calls in real code carry an explicit token cap.

A runtime budget tracker tells you what you spent — after you spent it. A static certificate tells you the worst case before you deploy, rejects the runaway pattern at check time, and the math behind the bound is machine-checked, not vibes.

_{Methodology disclosure: dataset is ~80% LangGraph (204/254 units; CrewAI and Agents-SDK
samples are small); units cluster ~5.6 per repo; public-GitHub visibility bias applies; LangGraph
version assumed ≥1.0.6 (default 1000) where undeclared; false-reject 0 was measured on N=3
surviving rejects. Full experiment: spec, frozen dataset, and audit trail in this repo's
specs/ and results/.}

Commands

`costwright check [path]`

Maps every workflow graph to the typed-budget calculus and reports, per graph unit:

Category	Meaning
`certifiable`	bound is explicit in your code — real certificate
`default_dependent`	bound exists only via a framework default (often vacuous)
`non_certifiable`	uses a construct outside the calculus (Send fan-out, interrupts, hierarchical manager, dynamic goto, or a subgraph too dynamic to bound — see scope below)
`runaway`	genuinely unbounded (`while True` driver, `max_turns=None`, astronomically large limit)
`parse_error`	the analyzer could not reconstruct the graph (reported, never silent)

--json → stable costwright.v1 output for CI/tooling.
--fail-on reject|default-dependent|non-certifiable → exit 1 on policy violation (default: never fails — findings are warnings; strictness is opt-in).
Exit codes: 0 ran fine, 1 policy violated, 2 infrastructure error. Never anything else.

`costwright caps [path]`

Finds LLM constructors without a token cap and tells you the right parameter for that provider (it is parameter-specific, not provider-specific — verified against primary provider docs):

OpenAI Responses → max_output_tokens (bounds reasoning+output)
Azure/OpenAI reasoning (Chat) → max_completion_tokens (reasoning tokens are inside it)
Anthropic standard → max_tokens (with budget_tokens < max_tokens); interleaved/adaptive thinking degrades the ceiling (budget may exceed max_tokens) — flagged
Gemini → max_output_tokens plus thinking_budget (thinking is billed as output but NOT bounded by maxOutputTokens) — flagged when missing

--patch out.diff emits a unified diff (git apply out.diff). costwright never edits your files.

`costwright fuse` (experimental)

Bundles a costwright cost certificate with an eleata-verify risk certificate (a per-output selective-risk SLA) into one tamper-evident costwright.fusion.v1 audit record per agent run — answering both "will it blow my budget?" and "can I trust this output, or send it to a human?"

costwright check . --json > cost.json                 # the cost certificate
# ... your eleata-verify step writes risk.json (VerifyResult.to_dict()) ...
costwright fuse --cost cost.json --risk risk.json --run-id "$RUN_ID"

It is the cartesian product of two independently-scoped certificates — NOT a composed guarantee: it does not assert the agent is "safe", has no aggregate "both passed" boolean, and does not claim a bounded budget preserves the risk SLA. The non-interference theorem (budget ⊥ risk, Hoare-style) that would justify composition is unproven / future work. The bundler is pure stdlib and never imports eleata-verify (costwright's zero-dependency core is intact). See docs/FUSION.md, docs/NON-INTERFERENCE.md, and the runnable examples/fusion_demo.py.

GitHub Action

- uses: hernaninverso/costwright/action@v0.1
  with:
    fail-on: reject        # reject | default-dependent | non-certifiable | none

JSON schema

costwright.v1 is frozen: closed enums (category, provenance, aggregation), stable unit_id, line spans only (no source code in output — CI-log safe). New fields are additive in v2. The signature field is reserved (null in OSS output): signed, independently verifiable certificates are a separate service — contact below.

What costwright does NOT do (exact scope)

It does not bound Send fan-out, interrupts/human-in-the-loop, CrewAI hierarchical mode, or dynamic goto — those map to non_certifiable, never to a fake bound. Conservative by construction: when in doubt, no certificate.
Nested subgraphs (graph.add_node("x", inner.compile())) ARE composed since 0.2.0 — but only when the inner graph is fully visible: a straight-line sequence of direct add_node calls on a uniquely-bound, non-imported local StateGraph. Any way a static reader can't pin down the inner graph's identity or node count — built in a loop/comprehension, bound more than once, passed into a helper, imported, from x import *, add_sequence, a RetryPolicy/error_handler — fails closed (non_certifiable), never a smaller composed number. Dynamic rebinding (globals(), setattr, exec, monkeypatching) is unobservable to any static analyzer and voids the certificate, as with every type checker.
The bound is worst-case (deliberately over-approximate). Tightening it is roadmap.
A token-level dollar bound additionally needs caps on every call (costwright caps) and a per-provider billing ceiling — see §3.2 of the paper for where that holds and degrades.

Theory

The calculus (7 constructs, affine potential in the style of Hofmann–Jost AARA), its cost-soundness theorem, the relational↔functional equivalence, the termination measure, and the affine no-double-spend layer are machine-checked in Lean 4 (no sorry; axioms [propext, Quot.sound]): hernaninverso/typed-resources.

Certification / enterprise

Signed budget certificates (Ed25519, independently verifiable), a live per-provider billing-ceiling feed (providers change semantics monthly — your certificate can silently invalidate), and org-level CI policy are a paid layer on top of this OSS tool. Contact: hernaninverso@gmail.com.

License

Apache-2.0.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.2.15

Jun 14, 2026

0.2.14

Jun 14, 2026

0.2.13

Jun 14, 2026

0.2.12

Jun 14, 2026

0.2.11

Jun 14, 2026

0.2.10

Jun 14, 2026

This version

0.2.9

Jun 14, 2026

0.2.8

Jun 14, 2026

0.2.7

Jun 14, 2026

0.2.6

Jun 14, 2026

0.2.5

Jun 14, 2026

0.2.4

Jun 14, 2026

0.2.3

Jun 14, 2026

0.2.2

Jun 14, 2026

0.2.1

Jun 14, 2026

0.2.0

Jun 14, 2026

0.1.0

Jun 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

costwright-0.2.9.tar.gz (74.6 kB view details)

Uploaded Jun 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

costwright-0.2.9-py3-none-any.whl (72.9 kB view details)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file costwright-0.2.9.tar.gz.

File metadata

Download URL: costwright-0.2.9.tar.gz
Upload date: Jun 14, 2026
Size: 74.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.5 {"installer":{"name":"uv","version":"0.10.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for costwright-0.2.9.tar.gz
Algorithm	Hash digest
SHA256	`e518b8cba386325c25c59ef4c345801fa37c47ad6a8c8d7dc75e77cff7716db8`
MD5	`bb29a42eb7779cbbace5aeb718f748b0`
BLAKE2b-256	`086e39900b88acffe2562983045f2c300e8177f599edea05be3215188681ef9c`

See more details on using hashes here.

File details

Details for the file costwright-0.2.9-py3-none-any.whl.

File metadata

Download URL: costwright-0.2.9-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 72.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.5 {"installer":{"name":"uv","version":"0.10.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for costwright-0.2.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7db8f348e63d8a623f2ab3d7c845628ce56c1228bb45ca48039263dd5990feda`
MD5	`597206d2ffb60191a65409a91914d160`
BLAKE2b-256	`598e261692a55f31b784943ebf9704bfb1618a3b060fbd3ddb718a7d55a8e786`

See more details on using hashes here.

costwright 0.2.9

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

costwright

Why

Commands

`costwright check [path]`

`costwright caps [path]`

`costwright fuse` (experimental)

GitHub Action

JSON schema

What costwright does NOT do (exact scope)

Theory

Certification / enterprise

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

costwright 0.2.9

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

costwright

Why

Commands

costwright check [path]

costwright caps [path]

costwright fuse (experimental)

GitHub Action

JSON schema

What costwright does NOT do (exact scope)

Theory

Certification / enterprise

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`costwright check [path]`

`costwright caps [path]`

`costwright fuse` (experimental)