The open trust layer for autonomous AI agents — deterministic gates, budget breakers, and tamper-evident action receipts.

Surety AI

Probabilistic agents. Verified actions.

The open execution boundary for autonomous AI agents. Agents may hallucinate; execution does not have to. Surety puts deterministic gates, hard exposure limits, and tamper-evident receipts between an agent's proposal and the real world.

CI · License: Apache-2.0 · Node >= 20 · Python >= 3.10 · Zero dependencies

Why · Quick start · Earned autonomy · Works with your stack · Evals · Production readiness · Examples · Strategy · Architecture · Spec · Roadmap


Why

In April 2026, a coding agent hit a credential mismatch in staging, found an API token in an unrelated file, and deleted a production database — and its backups — in nine seconds. Its post-incident summary: "I violated every principle I was given."

Principles stated in a prompt are not controls. The agent ecosystem has content guardrails (is this text safe?) and authorization (is this action permitted?), but almost nothing answers the questions that actually decide blast radius:

  • How much autonomy has this agent earned? Static human-in-the-loop doesn't scale — reviewers drown, then reflexively approve, and "oversight" becomes theater. Approving everything and approving nothing both fail.
  • What evidence does each decision leave? Post-incident forensics today rely on the agent's own logs — the defendant writes the police report.
  • What stops a runaway loop at 3 a.m.? A hard ceiling, not a polite instruction.

Surety is that layer. Four invariants, enforced in code:

  1. Rules decide, LLMs propose. Only deterministic rules allow an action. The same action, policy, and state produce the same decision — there is no prompt to inject.
  2. Trust is earned per action-type. Agents graduate SUPERVISED → PROBATIONARY → TRUSTED → BONDED on track record, and one rejection demotes instantly. An email track record grants nothing for refunds.
  3. Hard limits are hard. Daily action/spend ceilings in integer minor units. Checked at the gate, committed only after execution.
  4. Every decision leaves a receipt. Hash-chained, payload-hashed (never payload-stored), verifiable by a third party. An open spec, not a log format.
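
To make invariants 1 and 4 concrete, here is a minimal sketch of a deterministic gate that emits hash-chained receipts. It is not Surety's implementation or its receipt format: the `Receipt` fields, `decide`, `verifyChain`, and the `GENESIS` value are all hypothetical names chosen for illustration, and the Action Receipt spec remains the authority on the real layout. It only shows why storing a payload hash plus the previous receipt's hash makes later edits detectable.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shapes for illustration only; not Surety's types or receipt format.
type Action = { type: string; payload: Record<string, unknown> }
type Rule = { id: string; check: (a: Action) => boolean; reason: string }
type Receipt = {
  seq: number
  action_type: string
  payload_hash: string // hash of the payload; the payload itself is never stored
  allowed: boolean
  reasons: string[]
  prev_hash: string // hash of the previous receipt, or GENESIS for the first one
  hash: string // hash over every field above
}

const GENESIS = '0'.repeat(64) // assumed starting value for the chain
const sha256 = (s: string): string => createHash('sha256').update(s).digest('hex')

// Hash a receipt body as a fixed-order tuple, so field order cannot vary.
function hashBody(r: Omit<Receipt, 'hash'>): string {
  return sha256(JSON.stringify([r.seq, r.action_type, r.payload_hash, r.allowed, r.reasons, r.prev_hash]))
}

// Pure rule evaluation: the same rules and action always give the same decision.
function decide(rules: Rule[], action: Action, chain: Receipt[]): Receipt {
  const reasons = rules.filter((r) => !r.check(action)).map((r) => r.reason)
  const body = {
    seq: chain.length,
    action_type: action.type,
    // Simplification: a real format would canonicalize the payload before hashing.
    payload_hash: sha256(JSON.stringify(action.payload)),
    allowed: reasons.length === 0,
    reasons,
    prev_hash: chain.length > 0 ? chain[chain.length - 1].hash : GENESIS,
  }
  const receipt: Receipt = { ...body, hash: hashBody(body) }
  chain.push(receipt)
  return receipt
}

// A third party can recompute every hash without ever seeing a payload.
function verifyChain(chain: Receipt[]): boolean {
  let prev = GENESIS
  for (let i = 0; i < chain.length; i++) {
    const { hash, ...body } = chain[i]
    if (body.seq !== i || body.prev_hash !== prev || hashBody(body) !== hash) return false
    prev = hash
  }
  return true
}

const chain: Receipt[] = []
const rules: Rule[] = [
  {
    id: 'refund-ceiling',
    check: (a) => a.type !== 'payment.refund' || Number(a.payload.amount_minor) <= 5000,
    reason: 'Refunds above £50.00 require human approval',
  },
]
decide(rules, { type: 'payment.refund', payload: { invoice: 'INV-1042', amount_minor: 9900 } }, chain)
decide(rules, { type: 'payment.refund', payload: { invoice: 'INV-1043', amount_minor: 1200 } }, chain)
console.log(chain.map((r) => r.allowed), verifyChain(chain))
```

Flipping `allowed` on the first receipt after the fact makes `verifyChain` return false, because the stored `hash` no longer matches the recomputed one.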

The next phase closes the loop: require independent evidence for an action's assumptions, hold unresolved actions against an exposure budget, verify the real outcome, and grant more autonomy only from verified success. We call this outcome-bonded autonomy. See the product strategy and roadmap.

Surety's research direction adds calibrated foresight without weakening the deterministic boundary: forecasting and ML may require simulation, canaries, approval, or denial, but may never override failed invariants or hard limits. See reliability research.

Quick start

npm install @suretyainpm/suretyai        # Python: pip install suretyai

import { BondLimits, createGuard } from '@suretyainpm/suretyai'

const limits = new BondLimits({ max_actions_per_day: 100, max_spend_per_day_minor: 10_000 })

const guard = createGuard(
  [
    limits.rule(),
    {
      id: 'refund-ceiling',
      check: (a) => a.type !== 'payment.refund' || (a.payload.amount_minor as number) <= 5000,
      reason: 'Refunds above £50.00 require human approval',
    },
  ],
  { agent_id: 'billing-agent', chain: true }
)

const result = guard({ type: 'payment.refund', payload: { invoice: 'INV-1042', amount_minor: 9900 } })

result.allowed   // false — deterministically, every time
result.reasons   // ['Refunds above £50.00 require human approval']
result.receipt   // tamper-evident Action Receipt for your audit store

Earned autonomy

The full pipeline adds the trust ledger, human approval gates, and oversight-health monitoring:

import { ApprovalSignalHealth, TrustLedger, WebhookApprovalGate, createPipeline } from '@suretyainpm/suretyai'

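// `limits` is the BondLimits instance from Quick start and `refundCeiling` is the refund-ceiling rule object;
// APPROVALS_URL, `action`, and `execute` are supplied by your application.
const pipeline = createPipeline({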
  rules: [limits.rule(), refundCeiling],
  trust: new TrustLedger(),                                  // graduated autonomy
  approval: new WebhookApprovalGate({ url: APPROVALS_URL }), // human-in-the-loop
  health: new ApprovalSignalHealth(),                        // rubber-stamp detection
  limits,
  agent_id: 'billing-agent',
  chain: true,
})

const result = await pipeline.run(action)
// decision: 'auto_approved' | 'gate_approved' | 'gate_rejected' | 'gate_timeout' | 'policy_blocked'
if (result.allowed) { await execute(action); limits.record(action) }

What that buys you, measured (run it yourself):

| | Static HITL | No HITL | Surety graduated trust |
|---|---|---|---|
| Human decisions per 100 routine actions | 100 | 0 | 30, falling toward 0 |
| Rogue action stopped before execution | ✅ (if reviewer awake) | ❌ | ✅ rules + gate + limits |
| Reviewer fatigue → reflexive approval | guaranteed at scale | n/a | detected and flagged |
| Misbehavior consequence | none | none | instant demotion |
| Audit trail | app logs | agent's own logs | hash-chained receipts |

And trust is asymmetric, like it is with people: ~30 clean approvals to earn autonomy, one rejection to lose it.
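
The asymmetry can be sketched as a small per-action-type state machine. This is an illustration under stated assumptions, not the `TrustLedger` API: the level names come from this README, but the class name `TrustSketch`, the thresholds (10, 30, 100), and the choice to reset a rejected agent all the way to SUPERVISED are assumptions made for the example.

```typescript
// Level names are from the README; everything else here is hypothetical.
const LEVELS = ['SUPERVISED', 'PROBATIONARY', 'TRUSTED', 'BONDED'] as const
type Level = (typeof LEVELS)[number]

// Assumed clean-approval counts needed to reach each level (index-aligned with LEVELS).
const THRESHOLDS = [0, 10, 30, 100]

class TrustSketch {
  // One streak per (agent, action type) pair, so trust never transfers across action types.
  private streaks = new Map<string, number>()

  private key(agent: string, actionType: string): string {
    return JSON.stringify([agent, actionType])
  }

  level(agent: string, actionType: string): Level {
    const streak = this.streaks.get(this.key(agent, actionType)) ?? 0
    let current: Level = LEVELS[0]
    for (let i = 0; i < LEVELS.length; i++) {
      if (streak >= THRESHOLDS[i]) current = LEVELS[i]
    }
    return current
  }

  // Each clean approval extends the streak by one.
  recordApproval(agent: string, actionType: string): void {
    const k = this.key(agent, actionType)
    this.streaks.set(k, (this.streaks.get(k) ?? 0) + 1)
  }

  // Assumption: a single rejection wipes the streak, dropping the agent back to SUPERVISED.
  recordRejection(agent: string, actionType: string): void {
    this.streaks.set(this.key(agent, actionType), 0)
  }
}

const ledger = new TrustSketch()
for (let i = 0; i < 30; i++) ledger.recordApproval('billing-agent', 'payment.refund')
console.log(ledger.level('billing-agent', 'payment.refund')) // TRUSTED
console.log(ledger.level('billing-agent', 'email.send')) // SUPERVISED
ledger.recordRejection('billing-agent', 'payment.refund')
console.log(ledger.level('billing-agent', 'payment.refund')) // SUPERVISED
```

Keying the streak on the pair rather than on the agent alone is what makes a refund track record worthless for email, and vice versa.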

Works with your stack

One guard object; adapters for wherever your agents live. No framework lock-in, no rewrite:

// MCP — wrap any server's tool dispatcher
const safeTool = mcpGuard(guard, server.tool.bind(server))

// Claude Agent SDK — PreToolUse hook, sync, ~zero latency
const hook = claudePreToolUse(guard)

// OpenAI Agents SDK — input guardrail
new Agent({ inputGuardrails: [openaiGuardrail(guard)] })

Composes with — never replaces — the rest of the safety stack: content guardrails (LlamaFirewall, NeMo Guardrails) above, policy engines (OPA, Cedar) alongside. See where Surety sits.

Evals

Every claim above is backed by a reproducible eval — CI runs them on every push, so the README can't drift from the code. Reproduce with npm run eval:

| # | Claim | Result |
|---|---|---|
| E1 | Included adversarial corpus: case spoofing, string smuggling, type spoofing, credential laundering | 0/10 bypassed |
| E2 | Included canonicalization and collision cases (incl. the nested-key collision class — spec §3) | 6/6 correct |
| E3 | Graduated trust vs static HITL, 200 routine actions | 85% fewer human decisions |
| E4 | Oversight-health monitor: 4 rubber-stamp patterns + 1 healthy reviewer | 5/5 classified correctly |

Details and methodology: evals/.
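
For readers unfamiliar with why E2 matters, the sketch below shows one plausible reading of a nested-key collision: an encoding that flattens nested keys into dotted paths cannot tell `{"a.b": 1}` from `{"a": {"b": 1}}`, while a structure-preserving canonical form can. This is an assumption about what the collision class means, not a description of Surety's canonicalization; spec §3 is authoritative, and `canonicalize`, `flattenKeys`, and `digest` are hypothetical helpers.

```typescript
import { createHash } from 'node:crypto'

type Json = null | boolean | number | string | Json[] | { [k: string]: Json }

// Structure-preserving canonical form: sorted keys, nesting kept explicit.
function canonicalize(v: Json): string {
  if (v === null || typeof v !== 'object') return JSON.stringify(v)
  if (Array.isArray(v)) return '[' + v.map((item) => canonicalize(item)).join(',') + ']'
  const obj: { [k: string]: Json } = v
  const parts = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ':' + canonicalize(obj[k]))
  return '{' + parts.join(',') + '}'
}

// Deliberately flawed alternative: nested keys are joined with dots, which loses structure.
function flattenKeys(v: Json, prefix: string = ''): string[] {
  if (v === null || typeof v !== 'object' || Array.isArray(v)) {
    return [prefix + '=' + JSON.stringify(v)]
  }
  const obj: { [k: string]: Json } = v
  let out: string[] = []
  for (const k of Object.keys(obj).sort()) {
    out = out.concat(flattenKeys(obj[k], prefix === '' ? k : prefix + '.' + k))
  }
  return out
}

const digest = (v: Json): string => createHash('sha256').update(canonicalize(v)).digest('hex')

const flat: Json = { 'a.b': 1 }
const nested: Json = { a: { b: 1 } }

// The flawed encoding maps both payloads to the same string.
console.log(flattenKeys(flat).join('&') === flattenKeys(nested).join('&')) // true
// The canonical form keeps them apart, and ignores key insertion order.
console.log(digest(flat) === digest(nested)) // false
console.log(digest({ b: 1, a: 2 }) === digest({ a: 2, b: 1 })) // true
```

Because JSON string and number encodings differ, `"1"` and `1` also hash differently here, which is the same property a type-spoofing check relies on.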

The eval suite also includes a seeded comparative simulation that replays one labeled refund-action stream through unguarded execution, static HITL, and the real Surety guard. It reports both prevented loss and residual risk. Simulation validates mechanisms and hypotheses; field claims require independently labeled historical or shadow-mode traces from your own execution system.

Examples

Five runnable demos, no API keys needed — including the PocketOS incident replayed against Surety (3/3 destructive steps blocked, routine work unimpeded) and an agent earning its autonomy in 60 seconds. Index: examples/.

This repo dogfoods itself: the CI research agent runs under a Surety guard via a Claude Code PreToolUse hook — pushes to main blocked, workflow self-edits blocked, every tool call receipted (scripts/surety-hook.mjs).

Project status

v0.2 — core pipeline (guard, trust, gates, health, limits, receipts) shipped in TypeScript and Python with 80+ tests and the eval suite; MCP/Claude/OpenAI adapters shipped. Pre-1.0: API may still move; the Action Receipt spec is versioned independently. See the roadmap for what's next (receipt persistence, Slack gate, crewAI/LangGraph/pydantic-ai adapters, signed receipts).

Production readiness

Read this before putting Surety on a real money path — we'd rather you know the edges than discover them.

Ready now

  • The deterministic guard, bond-limit checks, canonical hashing, and receipt chaining are pure and side-effect-free — safe to run inline on a hot path.
  • TS and Python cores are at parity for guard / trust / limits / health; the full async pipeline and approval gates are TypeScript-only today.

Not ready yet — design around these

  • State is in-memory and single-process. TrustLedger, BondLimits, ApprovalSignalHealth, and the receipt chain live in process memory. Across concurrent workers or replicas you get per-process trust and limits — two workers can each approve up to the "daily" ceiling, and trust earned on one isn't seen by another. For now, run one guard instance behind a queue, or persist/reload manually via TrustLedger.export()/from(). Durable Postgres/Redis backends with atomic limits are the headline of Phase 2.
  • Budget commit is the caller's job. The pipeline checks limits but does not record them; call limits.record(action) after each successful execution, or the ceilings will not be enforced (see examples/02).
  • Approval is a prediction, not proof. Trust graduates on approvals; close the loop with trust.recordOutcome(success) so an approved-but-ineffective agent is demoted. Without outcome reporting, "trusted" means "approved," not "worked."
  • The bundled evals are internal simulations, not field evidence — they prove the code behaves as specified on synthetic workloads (the simulation deliberately leaves an ambiguous_intent class unblocked). Validate against your own traces before trusting the numbers.
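
The check-then-commit split described in the list above can be sketched as follows. This is a stand-in for illustration, not `BondLimits`: `DailyBudgetSketch` and `runGuarded` are hypothetical names, the daily reset is omitted, and the state is in-memory and single-process, so it has exactly the multi-worker limitation noted above.

```typescript
// Hypothetical stand-in for a daily ceiling; amounts are integer minor units (pence, cents).
class DailyBudgetSketch {
  private actions = 0
  private spendMinor = 0
  private readonly maxActions: number
  private readonly maxSpendMinor: number

  constructor(maxActions: number, maxSpendMinor: number) {
    this.maxActions = maxActions
    this.maxSpendMinor = maxSpendMinor
  }

  // Read-only: would this action still fit under both ceilings?
  check(amountMinor: number): boolean {
    if (!Number.isSafeInteger(amountMinor) || amountMinor < 0) return false
    return this.actions + 1 <= this.maxActions && this.spendMinor + amountMinor <= this.maxSpendMinor
  }

  // Commit: call only after the action has actually executed.
  record(amountMinor: number): void {
    this.actions += 1
    this.spendMinor += amountMinor
  }
}

// Check, execute, then record. If execute throws, nothing is recorded.
function runGuarded(budget: DailyBudgetSketch, amountMinor: number, execute: () => void): boolean {
  if (!budget.check(amountMinor)) return false
  execute()
  budget.record(amountMinor)
  return true
}

const budget = new DailyBudgetSketch(100, 10_000)
const outcomes = [4000, 4000, 4000].map((amount) => runGuarded(budget, amount, () => {}))
console.log(outcomes) // [true, true, false]: a third refund would bring the day's spend to 12_000
```

Dropping the `budget.record` call makes all three refunds pass, which is the failure mode the "budget commit" bullet warns about.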

In short: today Surety is production-ready as a single-instance decision boundary with manual state persistence. Distributed, durable, atomic state is next.

Documentation

  • Architecture & design decisions: The stack position, pipeline, and 11 design decisions with rationale
  • Reliability research: Deterministic assurance informed by calibrated forecasting and ML
  • Action Receipt spec v0.1: The vendor-neutral receipt format, which you can implement without Surety
  • Examples · Evals: Runnable demos and reproducible measurements
  • Roadmap: Phased plan with measurable exit criteria
  • Contributing · Security: How to help, how to report

License

Apache-2.0 — including its explicit patent grant: everything here is freely usable, forever.
