Skip to main content

Static contract experiments for agent-authored Python

Project description

ClauZ3 — Static contracts for agent-authored Python

clauz3

Static contracts for agent-authored Python.

clauz3 experiments with a different permission surface for agents: the agent writes a small Python program, attaches a contract describing its trusted side effects, and a static prover checks the contract before anything runs.

The contract is the thing the user should be asked to accept. The program is still available for inspection, but a user should not have to read every branch to answer questions like:

  • Will this email anyone other than Bob?
  • Will this email the same person twice?
  • Will this withdraw more than $5 total?
  • Will this write outside a sandbox?

In one picture

A run, end to end. The user asks for a bill to be paid (capped); the agent writes a small program that looks up the outstanding balance and pays min(balance, $500); ClauZ3 proves the spend stays under the cap on every branch and that only the named account is touched; the user approves the guarantees, not the code; only then does the runtime execute it.

UML sequence diagram of a ClauZ3 run. Lifelines: User, Agent, ClauZ3 gate, Python runtime. The User asks the Agent to pay a card bill capped at $500. The Agent submits a program plus guarantees to the ClauZ3 gate as a single tool call. The prover proves spending stays under $500 in both branches and that only the card account is paid, then presents only the two guarantees to the User, who approves. ClauZ3 invokes the Python runtime to execute the program; the result returns up through the agent, which summarizes success in plain English.

Editable vector source: docs/assets/figure-sequence.svg.

Current Shape

Projects are split into three layers:

  • trusted roots such as tools/email/trusted/: small audited modules containing both side-effecting functions and reusable domain contracts. These are the environment contract: the prover trusts their signatures, @deal.has(...) markers, @deal.pre(...) preconditions, and @contract helper definitions.
  • agent-authored code: ordinary Python that imports trusted functions and adds @clauz3.guarantee(...) decorators to the function being proved.

Trusted calls bottom out into symbolic effect facts. For example:

@deal.pre(lambda addr, msg: "@" in addr)
@deal.has("trusted")
def send_email(addr: str, msg: str) -> None:
    ...

A call to send_email("bob@example.com", "hi") creates a fact whose relation name is send_email, whose fields are addr and msg, and whose condition is the branch condition under which the call is reachable.

Domain logic can query those generated facts:

from clauz3.spec import ContractSpec, contract, effect
from clauz3.spec import no_guarantees as core_no_guarantees


Email = effect("send_email")


@contract
def only(addresses: list[str]) -> ContractSpec:
    return Email.all(lambda e: e.addr in addresses)


@contract
def no_guarantees() -> ContractSpec:
    return core_no_guarantees()


@contract
def unique_recipients() -> ContractSpec:
    return Email.distinct(lambda e: e.addr)

Agent code then states guarantees in terms of that vocabulary:

import clauz3
from tools.email.trusted import contracts as emails
from tools.email.trusted.effects import send_email


@clauz3.guarantee(emails.only(["bob@example.com"]))
@clauz3.guarantee(emails.unique_recipients())
def main() -> None:
    send_email("bob@example.com", "hi")

Run the prover:

cd examples/email
uv run clauz3 prove \
  --trusted-root tools/email/trusted \
  cases/only_bob_pass.py

For programs that combine domains, either repeat --trusted-root or pass several roots with --trusted-roots:

uv run clauz3 prove plan.py \
  --trusted-roots tools/email/trusted tools/db/trusted

Installing a trusted layer

To start a new project from an existing one, copy its trusted tools/ layer with clauz3 install. It accepts a local path, a gh:org/repo shorthand, or a full git URL, and copies each tools/<domain>/ into the current directory:

# Local path
clauz3 install /path/to/repo/examples/email

# GitHub shorthand (clones into ~/.cache/clauz3/sources/<sha>/)
clauz3 install gh:normalform-ai/clauz3-tools-autolabs

# Pin to a tag, branch, or sha for reproducibility
clauz3 install gh:normalform-ai/clauz3-tools-autolabs@v0.3.1

# Full URL (HTTPS or SSH)
clauz3 install git@github.com:normalform-ai/clauz3-tools-assistant.git

Pass --into <dir> to target a different destination, --force to overwrite an existing layer, and --skills to also generate agents/skills/<domain>/SKILL.md describing the installed effects and contracts:

clauz3 install gh:normalform-ai/clauz3-tools-autolabs --into my-project --skills

Authentication for git remotes uses your existing git configuration (SSH key or credential helper); clauz3 does not manage credentials itself. The remote cache is keyed by the resolved commit sha; pinning to @<sha> hits a stable entry forever, while ref-pinned or HEAD installs re-resolve on each call and get a fresh cache entry whenever upstream moves. Override the cache location with CLAUZ3_CACHE.

The install itself is a filesystem copy. Signing — so a user can verify the installed layer matches what was attested upstream — is tracked in issue #47.

Standard library tools

clauz3 ships a small set of real (non-mock) tools under src/clauz3/stdlib. After pip install clauz3, install one into the current repo with the stdlib: scheme:

clauz3 install stdlib:filesystem
clauz3 install stdlib:grep

The bundled tools are located with importlib.resources, so this works the same from a source checkout or an installed wheel. (A bare clauz3 install filesystem also works as a shorthand.)

Tool Effects Headline contracts
filesystem read_file, write_file read_only, only_write_under, never_read_under, writes_at_most
grep grep (imports filesystem) only_read_under, never_read_under, searches_at_most, only_pattern

These differ from the examples/ layers in two ways: their effect bodies do real work (so the prover skips them but clauz3 run executes them), and their proof cases live under tests/cases/ with a tests/Justfile rather than at the top level. just stdlib proves them all, or run one library's suite directly with clauz3 test stdlib:filesystem (it invokes the library's tests/Justfile with just).

Examples

The repo currently has five worked examples:

Example Contracts What it demonstrates
examples/email emails.only, emails.none, emails.no_guarantees, emails.unique_recipients, emails.content_length_at_most allowlists, explicit absence of guarantees, absence of effects, pairwise uniqueness, string length bounds
examples/bank bank.max_spend, bank.only_account numeric aggregation, field equality, and fixed-bound loop unrolling over trusted calls
examples/email-from-db emails.addresses_from, db.only_table, db.only_where for-loops over trusted query returns, column-binding constraints
examples/text text.length_between, text.must_not_contain, text.no_regex_metacharacters, text.only_edit_under, text.edit_length_at_most string-length bounds, required/banned substrings, regex-metacharacter safety before sending, and file-edit path/size policies
examples/http http.host_only, http.no_posts a shared @deal.has marker spanning two trusted calls, url-prefix matching, and method-specific absence of effects

All examples use generic effect relations inferred from trusted function signatures. There is no email-specific, bank-specific, or database-specific logic in the core.

Status

This is experimental. The current prover vendors and extends deal-solver. Implemented pieces include:

  • clauz3.guarantee(...), a no-op runtime decorator consumed by the prover.
  • trusted-call fact recording at @deal.has(...) boundaries.
  • clauz3.spec.effect(...) for relation-style queries over trusted facts.
  • relation primitives: no_guarantees, all, empty, where, count, distinct, numeric sum, and shares_value.
  • stateful fluents (clauz3.fluent): trusted functions declare successor-state axioms with @effect(...), and contracts query the final valuation with Fluent.final.all(...) / Fluent.final[key] == value. This expresses order- and post-state-sensitive policies such as "every door is locked at the end" that the multiset relation language cannot. See docs/reference/fluents.md.
  • clauz3 prove for proving examples from the CLI.
  • clauz3 install for copying a trusted tools/ layer from a local path or a bundled stdlib tool (stdlib:filesystem, stdlib:grep), optionally generating agents/skills/<domain>/SKILL.md stubs.
  • clauz3 test for running a library's bundled tests/Justfile with just, resolving the source the same way as install.
  • clauz3 config for writing this repo's default Claude Code permissions (read-only tools plus the clauz3 CLI) to .claude/settings.json. Idempotent and the configuration counterpart to install.
  • clauz3 run for proving a complete inline program, submitting an approval request to an externally configured approval service, and executing main only after an approval receipt is returned.
  • clauz3 approval-service for starting a simple localhost FastAPI approval service with REST endpoints and a browser UI for user decisions. With --policy, a policy admin's rules can auto-approve or auto-reject a request by asking the prover whether the program entails the rule's contracts, falling back to a human otherwise. clauz3 policy-check dry-runs a policy against a program. See docs/todos/approval-policies.md.
  • clauz3 mock-approval-service for config-driven tests and local demos.
  • For-loops over list[Row]-returning trusted calls, with column-binding contracts via UserRow.email markers. See docs/explanation/symbolic-iteration.md.
  • Domain coverage policies declared in a trusted-root policy.py: contracts a domain recommendeds are flagged in the approval UI when a used domain is under-constrained, and contracts it marks required are conjoined into the proof and rejected if unmet. See docs/reference/coverage-policies.md.

Important limitations:

  • Contract helper functions are still executed as builder functions. Their lambda bodies are parsed as a small expression subset, but the helper body itself is not yet fully AST-validated.
  • Runtime receipt enforcement in trusted functions is not implemented yet; the first run slice requires a receipt before execution and exposes it to the process as CLAUZ3_APPROVAL_RECEIPT.
  • clauz3 run includes source checks and reduced builtins, but it is not a hardened Python or OS sandbox. The agent harness must still restrict execution to the clauz3 command path.
  • The relation language is intentionally small. Richer string predicates, richer aggregates, Datalog-style rules, and deterministic natural language summaries are future work.

More detail:

Development

uv sync --dev
just test

Built on

deal logo    Z3 logo

ClauZ3 layers on two upstream projects and would not exist without them:

  • deal — the Python runtime contract engine whose @deal.pre, @deal.post, and @deal.has decorators define the trusted-layer vocabulary. ClauZ3 reads the same decorators and discharges them statically; in runtime-only mode, deal enforces them at execution. See Static proof vs runtime for how the two layers relate.
  • Z3 — Microsoft Research's SMT solver, the back-end the prover compiles every guarantee into. The vendored deal-solver machinery is what bridges Python AST and Z3 constraints.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clauz3-0.1.4.tar.gz (536.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

clauz3-0.1.4-py3-none-any.whl (160.3 kB view details)

Uploaded Python 3

File details

Details for the file clauz3-0.1.4.tar.gz.

File metadata

  • Download URL: clauz3-0.1.4.tar.gz
  • Upload date:
  • Size: 536.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clauz3-0.1.4.tar.gz
Algorithm Hash digest
SHA256 b2582fb47b9a9776112501318ff6636746c59e9fb71ae8256806765e09d80ff9
MD5 bed287c4a414777b4824ecf1dfb9abea
BLAKE2b-256 f9a9614bd91e6c0e1e9d1a6be381da06155055d970586db940805f1dece76578

See more details on using hashes here.

Provenance

The following attestation bundles were made for clauz3-0.1.4.tar.gz:

Publisher: release.yaml on normalform-ai/clauz3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file clauz3-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: clauz3-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 160.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clauz3-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 18f281f1f502ab61b21cd8d03fe23966d7b38ec05e74f3aa68429cc05552244a
MD5 d07e5ca8b1ec25c3215e664c5ff0da2b
BLAKE2b-256 ad0c6c7b2d09885eccb0eff4ecfc35e59c347922071a11779f91a1803b58b093

See more details on using hashes here.

Provenance

The following attestation bundles were made for clauz3-0.1.4-py3-none-any.whl:

Publisher: release.yaml on normalform-ai/clauz3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page