Skip to main content

Static budget certificates for LLM-agent workflows (LangGraph / CrewAI / OpenAI Agents SDK). Backed by a machine-checked (Lean 4) cost-soundness theorem.

Project description

costwright

PyPI Paper DOI

Static budget certificates for LLM-agent workflows. Point it at your LangGraph / CrewAI / OpenAI-Agents-SDK repo and it tells you — before anything runs — the worst-case budget ceiling of every workflow graph, where that ceiling comes from, and when it is effectively vacuous.

Backed by a machine-checked (Lean 4) cost-soundness theorem: A Potential-Based Calculus for Resource Bounds of LLM-Agent Workflows (repo) — well-typed ⟹ spend ≤ declared budget, on every trace. costwright is the static frontend of that calculus. It never executes your code (pure AST analysis, zero dependencies).

pip install costwright
costwright check .
costwright — budget certificate check  (schema costwright.v1)

  ▲ src/agents/researcher.py :42   default_dependent  ≤1000 supersteps (framework_default)
  ‼ scripts/run_forever.py   :17   runaway            while-true-driver
  ✗ src/agents/fanout.py     :88   non_certifiable    send-fanout

  12 graph units | ✓ 3 certifiable | ▲ 7 default-dependent | ✗ 1 non-certifiable | ‼ 1 runaway

  ⚠ 6 unit(s) rely on a framework default of ≥1000 supersteps (LangGraph ≥1.0.6) —
    that budget ceiling is effectively vacuous. Set recursion_limit explicitly.

Why

We measured 254 real agent workflows from 45 public production-grade repos:

  • 88% of cyclic workflows rely on a framework default for their only budget bound — or have none at all.
  • LangGraph's modern default (recursion_limit) is 1000 supersteps (it was 25 before v1.0.6). A "protected" workflow can take a thousand turns before the framework stops it.
  • Only 12% of LLM calls in real code carry an explicit token cap.

A runtime budget tracker tells you what you spent — after you spent it. A static certificate tells you the worst case before you deploy, rejects the runaway pattern at check time, and the math behind the bound is machine-checked, not vibes.

Methodology disclosure: dataset is ~80% LangGraph (204/254 units; CrewAI and Agents-SDK samples are small); units cluster ~5.6 per repo; public-GitHub visibility bias applies; LangGraph version assumed ≥1.0.6 (default 1000) where undeclared; false-reject 0 was measured on N=3 surviving rejects. Full experiment: spec, frozen dataset, and audit trail in this repo's specs/ and results/.

Commands

costwright check [path]

Maps every workflow graph to the typed-budget calculus and reports, per graph unit:

Category Meaning
certifiable bound is explicit in your code — real certificate
default_dependent bound exists only via a framework default (often vacuous)
non_certifiable uses a construct outside the calculus (Send fan-out, interrupts, hierarchical manager, dynamic goto, or a subgraph too dynamic to bound — see scope below)
runaway genuinely unbounded (while True driver, max_turns=None, astronomically large limit)
parse_error the analyzer could not reconstruct the graph (reported, never silent)
  • --json → stable costwright.v1 output for CI/tooling.
  • --fail-on reject|default-dependent|non-certifiable → exit 1 on policy violation (default: never fails — findings are warnings; strictness is opt-in).
  • Exit codes: 0 ran fine, 1 policy violated, 2 infrastructure error. Never anything else.

costwright caps [path]

Finds LLM constructors without a token cap and tells you the right parameter for that provider (it is parameter-specific, not provider-specific — verified against primary provider docs):

  • OpenAI Responses → max_output_tokens (bounds reasoning+output)
  • Azure/OpenAI reasoning (Chat) → max_completion_tokens (reasoning tokens are inside it)
  • Anthropic standard → max_tokens (with budget_tokens < max_tokens); interleaved/adaptive thinking degrades the ceiling (budget may exceed max_tokens) — flagged
  • Gemini → max_output_tokens plus thinking_budget (thinking is billed as output but NOT bounded by maxOutputTokens) — flagged when missing

--patch out.diff emits a unified diff (git apply out.diff). costwright never edits your files.

costwright fuse (experimental)

Bundles a costwright cost certificate with an eleata-verify risk certificate (a per-output selective-risk SLA) into one tamper-evident costwright.fusion.v1 audit record per agent run — answering both "will it blow my budget?" and "can I trust this output, or send it to a human?"

costwright check . --json > cost.json                 # the cost certificate
# ... your eleata-verify step writes risk.json (VerifyResult.to_dict()) ...
costwright fuse --cost cost.json --risk risk.json --run-id "$RUN_ID"

It is the cartesian product of two independently-scoped certificates — NOT a composed guarantee: it does not assert the agent is "safe", has no aggregate "both passed" boolean, and does not claim a bounded budget preserves the risk SLA. The non-interference theorem (budget ⊥ risk, Hoare-style) that would justify composition is unproven / future work. The bundler is pure stdlib and never imports eleata-verify (costwright's zero-dependency core is intact). See docs/FUSION.md, docs/NON-INTERFERENCE.md, and the runnable examples/fusion_demo.py.

GitHub Action

- uses: hernaninverso/costwright/action@v0.1
  with:
    fail-on: reject        # reject | default-dependent | non-certifiable | none

JSON schema

costwright.v1 is frozen: closed enums (category, provenance, aggregation), stable unit_id, line spans only (no source code in output — CI-log safe). New fields are additive in v2. The signature field is reserved (null in OSS output): signed, independently verifiable certificates are a separate service — contact below.

What costwright does NOT do (exact scope)

  • It does not bound Send fan-out, interrupts/human-in-the-loop, CrewAI hierarchical mode, or dynamic goto — those map to non_certifiable, never to a fake bound. Conservative by construction: when in doubt, no certificate.
  • Nested subgraphs (graph.add_node("x", inner.compile())) ARE composed since 0.2.0 — but only when the inner graph is fully visible: a straight-line sequence of direct add_node calls on a uniquely-bound, non-imported local StateGraph. Any way a static reader can't pin down the inner graph's identity or node count — built in a loop/comprehension, bound more than once, passed into a helper, imported, from x import *, add_sequence, a RetryPolicy/error_handler — fails closed (non_certifiable), never a smaller composed number. Dynamic rebinding (globals(), setattr, exec, monkeypatching) is unobservable to any static analyzer and voids the certificate, as with every type checker.
  • The bound is worst-case (deliberately over-approximate). Tightening it is roadmap.
  • A token-level dollar bound additionally needs caps on every call (costwright caps) and a per-provider billing ceiling — see §3.2 of the paper for where that holds and degrades.

Theory

The calculus (7 constructs, affine potential in the style of Hofmann–Jost AARA), its cost-soundness theorem, the relational↔functional equivalence, the termination measure, and the affine no-double-spend layer are machine-checked in Lean 4 (no sorry; axioms [propext, Quot.sound]): hernaninverso/typed-resources.

Certification / enterprise

Signed budget certificates (Ed25519, independently verifiable), a live per-provider billing-ceiling feed (providers change semantics monthly — your certificate can silently invalidate), and org-level CI policy are a paid layer on top of this OSS tool. Contact: hernaninverso@gmail.com.

License

Apache-2.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

costwright-0.2.9.tar.gz (74.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

costwright-0.2.9-py3-none-any.whl (72.9 kB view details)

Uploaded Python 3

File details

Details for the file costwright-0.2.9.tar.gz.

File metadata

  • Download URL: costwright-0.2.9.tar.gz
  • Upload date:
  • Size: 74.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.5 {"installer":{"name":"uv","version":"0.10.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for costwright-0.2.9.tar.gz
Algorithm Hash digest
SHA256 e518b8cba386325c25c59ef4c345801fa37c47ad6a8c8d7dc75e77cff7716db8
MD5 bb29a42eb7779cbbace5aeb718f748b0
BLAKE2b-256 086e39900b88acffe2562983045f2c300e8177f599edea05be3215188681ef9c

See more details on using hashes here.

File details

Details for the file costwright-0.2.9-py3-none-any.whl.

File metadata

  • Download URL: costwright-0.2.9-py3-none-any.whl
  • Upload date:
  • Size: 72.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.5 {"installer":{"name":"uv","version":"0.10.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for costwright-0.2.9-py3-none-any.whl
Algorithm Hash digest
SHA256 7db8f348e63d8a623f2ab3d7c845628ce56c1228bb45ca48039263dd5990feda
MD5 597206d2ffb60191a65409a91914d160
BLAKE2b-256 598e261692a55f31b784943ebf9704bfb1618a3b060fbd3ddb718a7d55a8e786

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page