agentassert-abc

Formal behavioral specification and runtime enforcement for autonomous AI agents. Agent Behavioral Contracts (ABC).


AgentAssert
Formal Behavioral Contracts for AI Agents

AgentAssert is the formal behavioral specification and runtime enforcement engine for autonomous AI agents. Define what your agent must and must not do in a YAML contract, then enforce those rules at runtime with mathematical guarantees.

It is the only framework combining all 6 pillars of rigorous agent governance:

ContractSpec DSL -- YAML-based behavioral specification with 14 operators
Hard/Soft Constraints -- Formal separation with graduated enforcement and recovery
Drift Detection -- Jensen-Shannon Divergence for distributional behavioral analysis
(p, delta, k)-Satisfaction -- Probabilistic compliance guarantees with statistical bounds
Compositional Safety Proofs -- Formal bounds for multi-agent pipelines
Mathematical Stability -- Ornstein-Uhlenbeck dynamics with Lyapunov stability proof
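As an illustration of the drift-detection pillar, the following minimal sketch computes the Jensen-Shannon divergence between a baseline and a current distribution of agent behaviors using only the standard library. The function name `js_divergence` and the example action frequencies are hypothetical; this is not AgentAssert's internal implementation.

```python
import math


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns a value in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")

    def normalize(dist):
        total = sum(dist)
        if total <= 0:
            raise ValueError("distribution must have positive mass")
        return [x / total for x in dist]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0
        # because b is the mixture that includes a.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    p, q = normalize(p), normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical per-session action frequencies: [answer, clarify, escalate]
baseline = [0.70, 0.20, 0.10]
current = [0.40, 0.30, 0.30]
drift = js_divergence(baseline, current)
print(f"JSD drift: {drift:.3f}")
```

Because the divergence is symmetric and bounded, a single threshold can be applied to it regardless of which distribution is treated as the reference.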

Paper: Bhardwaj, V.P. (2026). AgentAssert: Formal Behavioral Contracts for Autonomous AI Agents. arXiv:2602.22302

Install

pip install "agentassert-abc[yaml,math]"

Requires Python 3.12+. Licensed under Elastic License 2.0.

Optional extras:

Extra	What it adds
`yaml`	YAML contract parsing (ruamel.yaml)
`math`	Drift detection, Theta computation (scipy, numpy)
`llm`	Recovery re-prompting (LiteLLM)
`otel`	OpenTelemetry metric export
`all`	Everything above

Quick Start -- 5 Minutes to Behavioral Contracts

import agentassert_abc as aa
from agentassert_abc.integrations.generic import GenericAdapter

# 1. Load a domain contract (12 included out of the box)
contract = aa.load("contracts/examples/ecommerce-product-recommendation.yaml")

# 2. Create an adapter
adapter = GenericAdapter(contract)

# 3. Monitor agent output on every turn
result = adapter.check({
    "output.pii_detected": False,
    "output.competitor_reference_detected": False,
    "output.sponsored_items_disclosed": True,
    "output.brand_tone_score": 0.85,
    "output.recommendation_relevance_score": 0.9,
})

print(f"Hard violations: {result.hard_violations}")
print(f"Soft violations: {result.soft_violations}")

# 4. Raise on critical violations
adapter.check_and_raise({
    "output.pii_detected": False,
    "output.competitor_reference_detected": False,
    "output.sponsored_items_disclosed": True,
    "output.brand_tone_score": 0.85,
    "output.recommendation_relevance_score": 0.9,
})

# 5. Get session reliability score (Theta)
summary = adapter.session_summary()
print(f"Reliability (Theta): {summary.theta:.3f}")
print(f"Deploy-ready: {summary.theta >= 0.90}")

Framework Integration

AgentAssert is plug-and-play with the major 2026 agent frameworks.

LangGraph -- Node Interception

import agentassert_abc as aa
from langgraph.graph import StateGraph, START, END
from agentassert_abc.exceptions import ContractBreachError
from agentassert_abc.integrations.langgraph import LangGraphAdapter

contract = aa.load("contracts/examples/customer-support.yaml")
adapter = LangGraphAdapter(contract)

# State, classify_fn, respond_fn, and initial_state are defined by your application
builder = StateGraph(State)
builder.add_node("classify", adapter.wrap_node(classify_fn))
builder.add_node("respond", adapter.wrap_node(respond_fn))
builder.add_edge(START, "classify")
builder.add_edge("classify", "respond")
builder.add_edge("respond", END)

graph = builder.compile()

try:
    result = graph.invoke(initial_state)
except ContractBreachError as e:
    print(f"Hard violation blocked: {e}")

print(f"Session Theta: {adapter.session_summary().theta:.3f}")

CrewAI -- Task Guardrails

import agentassert_abc as aa
from crewai import Agent, Task, Crew
from agentassert_abc.integrations.crewai import CrewAIAdapter

contract = aa.load("contracts/examples/research-assistant.yaml")
adapter = CrewAIAdapter(contract)

# `researcher` is a crewai Agent defined elsewhere in your application
# Guardrail rejects output on hard violations -- CrewAI retries automatically
research_task = Task(
    description="Research AI agent frameworks in 2026",
    expected_output="Cited report on top 5 frameworks",
    agent=researcher,
    guardrail=adapter.guardrail,
    guardrail_max_retries=3,
)

OpenAI Agents SDK -- Output Guardrails

import asyncio

import agentassert_abc as aa
from agents import Agent, Runner
from agentassert_abc.integrations.openai_agents import OpenAIAgentsAdapter

contract = aa.load("contracts/examples/healthcare-triage.yaml")
adapter = OpenAIAgentsAdapter(contract)

# TriageOutput is your application's structured output type
agent = Agent(
    name="triage-agent",
    instructions="You are a medical triage assistant.",
    output_guardrails=[adapter.output_guardrail],
    output_type=TriageOutput,
)


async def main() -> None:
    # Runner.run is a coroutine, so it must be awaited inside an async function
    result = await Runner.run(agent, "I have chest pain", hooks=adapter.run_hooks)
    print(f"Theta: {adapter.session_summary().theta:.3f}")


asyncio.run(main())

AgentContract-Bench -- 293 Scenarios, 12 Domains

AgentAssert ships with AgentContract-Bench, a benchmark suite of 293 scenarios across 12 real-world domains for testing contract enforcement accuracy.

Benchmark Results (v0.1.0)

Domain	Scenarios	Pass Rate	Hard P/R/F1	Soft P/R/F1
E-Commerce (Product)	50	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Financial Advisor	33	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Healthcare Triage	33	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
MCP Tool Server	28	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
RAG Agent	28	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Code Generation	23	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Customer Support	23	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
E-Commerce (CS)	15	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
E-Commerce (Order)	15	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Research Assistant	15	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Retail Shopping	15	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Telecom Support	15	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
Total	293	100%	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00

# Run benchmarks locally
python benchmarks/runner.py                     # All 293 scenarios
python benchmarks/runner.py --domain ecommerce  # Single domain
python benchmarks/runner.py --verbose           # Show details

Live LLM Benchmark -- Real Models, Real Contracts

We tested AgentAssert against 3 production LLMs on a 10-16 turn e-commerce session using the retail-shopping-assistant contract with real Azure AI Foundry endpoints:

Model	Turns	Hard Violations	Soft Violations	Theta	Mean Drift
GPT-5.3 (OpenAI)	16	0	11	0.688	0.034
Claude Sonnet 4.6 (Anthropic)	10	4	0	0.823	0.020
Mistral-Large-3 (Mistral)	10	5	0	0.813	0.025

Key findings:

GPT-5.3 achieved zero hard violations but exhibited soft quality drift (response completeness and latency)
Claude Sonnet 4.6 and Mistral-Large-3 triggered no-false-availability hard violations -- fabricating product availability without catalog access
All three models scored below the 0.90 Theta threshold for autonomous deployment, demonstrating why runtime behavioral contracts are essential

These results are consistent with the findings reported in arXiv:2602.22302. AgentAssert catches violations that traditional guardrails miss because it tracks behavioral drift over entire sessions, not just individual outputs.

Domain Contracts -- Ready to Use

12 production-ready contracts ship with AgentAssert in contracts/examples/:

Contract	Domain	Hard	Soft	Key Checks
`ecommerce-product-recommendation`	E-Commerce	7	8	PII, competitor mentions, sponsored disclosure
`ecommerce-order-management`	E-Commerce	7	8	Payment data, order accuracy, refund policy
`ecommerce-customer-service`	E-Commerce	7	8	Escalation, SLA, customer sentiment
`financial-advisor`	Finance	7	8	Regulatory compliance, risk disclosure, suitability
`healthcare-triage`	Healthcare	9	7	Medical safety, urgency detection, no diagnosis
`retail-shopping-assistant`	Retail	7	9	Availability, pricing accuracy, upsell limits
`telecom-customer-support`	Telecom	7	9	Plan accuracy, billing, cancellation handling
`code-generation`	Dev Tools	7	7	License compliance, security, test coverage
`research-assistant`	Research	6	7	Citation accuracy, source attribution, bias
`customer-support`	General	6	5	Tone, escalation, resolution quality
`mcp-tool-server`	MCP (2026)	6	5	Tool authorization, rate limits, output bounds
`rag-agent`	RAG (2026)	7	7	Hallucination, source grounding, retrieval quality

ContractSpec DSL

Define behavioral contracts in YAML:

contractspec: "0.1"
kind: agent
name: my-agent-contract
description: Behavioral contract for my agent
version: "1.0.0"

invariants:
  hard:
    - name: no-pii-leak
      description: Never expose personal information
      check:
        field: output.pii_detected
        equals: false

  soft:
    - name: tone-quality
      description: Maintain professional tone
      check:
        field: output.tone_score
        gte: 0.7
      recovery: fix-tone
      recovery_window: 2

recovery:
  strategies:
    - name: fix-tone
      type: inject_correction
      actions:
        - "Rewrite with professional tone"

satisfaction:
  p: 0.95
  delta: 0.1
  k: 3

14 operators: equals, not_equals, gt, gte, lt, lte, in, not_in, contains, not_contains, matches, exists, expr, between
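To make the check semantics concrete, here is a minimal sketch of how a single `check` block could be evaluated against a flat state dict. It covers only six of the comparison operators and assumes the obvious meaning for each; the `evaluate_check` helper and the `OPERATORS` table are hypothetical and do not reflect AgentAssert's actual evaluator.

```python
import operator
from typing import Any, Callable

# Assumed semantics for a subset of ContractSpec operators (illustrative only).
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def evaluate_check(check: dict[str, Any], state: dict[str, Any]) -> bool:
    """Return True when the state satisfies the check, False otherwise."""
    field = check["field"]
    if field not in state:
        # A missing field is treated as a failed check in this sketch.
        return False
    for name, expected in check.items():
        if name == "field":
            continue
        if name not in OPERATORS:
            raise ValueError(f"unsupported operator in this sketch: {name}")
        if not OPERATORS[name](state[field], expected):
            return False
    return True


state = {"output.pii_detected": False, "output.tone_score": 0.65}
hard_ok = evaluate_check({"field": "output.pii_detected", "equals": False}, state)
soft_ok = evaluate_check({"field": "output.tone_score", "gte": 0.7}, state)
print(hard_ok, soft_ok)
```

In this example the hard check passes because no PII was detected, while the soft tone check fails because 0.65 is below the 0.7 threshold.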

Writing Your Own Contract

Identify fields -- Examine your agent's output and list the fields that matter for safety and quality
Map to flat dict -- AgentAssert uses output.field_name as keys (e.g., {"output.safe": True})
Choose constraint type -- Hard for non-negotiable safety (violations halt execution), Soft for quality goals (violations trigger recovery)
Set satisfaction -- p = target compliance rate, delta = tolerance, k = max violations before alert
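The "map to flat dict" step can be sketched as a small helper that turns a nested agent output into dotted keys. The `flatten` function below is hypothetical and not part of AgentAssert; it also assumes that deeper dotted paths such as `output.citations.count` are acceptable keys, which the text above does not confirm.

```python
from typing import Any


def flatten(data: dict[str, Any], prefix: str = "output") -> dict[str, Any]:
    """Flatten a nested dict into dotted keys such as 'output.field_name'."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        dotted = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Recurse so nested fields become 'output.parent.child'.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


agent_output = {"safe": True, "tone_score": 0.82, "citations": {"count": 3}}
state = flatten(agent_output)
print(state)
```

The resulting dict has the same shape as the state passed to `adapter.check` in the Quick Start.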

SPRT Certification

Certify agents for production with 50-80% fewer test sessions using Sequential Probability Ratio Testing:

from agentassert_abc.certification.sprt import SPRTCertifier, SPRTDecision

certifier = SPRTCertifier(p0=0.85, p1=0.95, alpha=0.05, beta=0.10)
# session_results: an iterable of pass/fail outcomes, one per completed test session
for session_passed in session_results:
    result = certifier.update(session_passed)
    if result.decision != SPRTDecision.CONTINUE:
        print(f"Decision: {result.decision.value} after {result.sessions_used} sessions")
        break
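For readers unfamiliar with the technique, the sketch below shows the textbook form of Wald's sequential probability ratio test for pass/fail outcomes, using the same parameter names as the snippet above. The `wald_sprt` function is hypothetical and is not the internals of `SPRTCertifier`; the decision labels are also assumptions.

```python
import math
from typing import Iterable


def wald_sprt(
    outcomes: Iterable[bool],
    p0: float = 0.85,
    p1: float = 0.95,
    alpha: float = 0.05,
    beta: float = 0.10,
) -> tuple[str, int]:
    """Wald's SPRT for H0: pass rate = p0 versus H1: pass rate = p1 (p1 > p0).

    Returns (decision, sessions_used), where decision is "accept", "reject",
    or "continue" if the data ran out before either boundary was crossed.
    """
    upper = math.log((1 - beta) / alpha)   # cross this to accept H1
    lower = math.log(beta / (1 - alpha))   # cross this to accept H0
    llr = 0.0  # cumulative log-likelihood ratio
    used = 0
    for passed in outcomes:
        used += 1
        if passed:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept", used
        if llr <= lower:
            return "reject", used
    return "continue", used


print(wald_sprt([True] * 100))
print(wald_sprt([False] * 100))
```

The test stops as soon as the evidence is strong enough in either direction, which is where the savings in test sessions come from: with these parameters an unbroken run of passes is accepted after 26 sessions, and an unbroken run of failures is rejected after 3.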

Compositional Guarantees

Prove safety bounds for multi-agent pipelines:

from agentassert_abc.certification.composition import compose_guarantees

# Agent A (p=0.95) -> Agent B (p=0.98), handoff reliability 0.99
bound = compose_guarantees(p_a=0.95, p_b=0.98, p_h=0.99)
print(f"Pipeline bound: {bound:.3f}")  # p_{A+B} >= 0.921
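The figure in the comment above is consistent with a simple product bound, in which the pipeline complies only if both agents and the handoff do. The sketch below assumes that form; `product_bound` is a hypothetical helper, and the actual formula behind `compose_guarantees` may include additional terms.

```python
def product_bound(p_a: float, p_b: float, p_h: float) -> float:
    """Lower bound on pipeline compliance under an assumed product rule."""
    for name, p in (("p_a", p_a), ("p_b", p_b), ("p_h", p_h)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return p_a * p_b * p_h


bound = product_bound(p_a=0.95, p_b=0.98, p_h=0.99)
print(f"Assumed product bound: {bound:.5f}")
```

Under this assumption every added stage or handoff can only lower the bound, so long pipelines need high per-agent compliance to stay above a deployment threshold.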

How AgentAssert Differs

Dimension	AgentAssert	Guardrails AI	NeMo Guardrails	Microsoft AGT
Formal math (Theta, SPRT)	Yes	No	No	No
Session drift detection (JSD)	Yes	No	No	No
Compositional safety proofs	Yes	No	No	No
Hard/Soft constraint separation	Yes	Partial	No	No
Recovery re-prompting	Yes	Yes	Yes	No
Framework integrations	10 adapters	3	1 (LangChain)	2
Statistical certification (SPRT)	Yes	No	No	No
Benchmark suite	293 scenarios	No	No	No
Academic paper	arXiv:2602.22302	No	No	No

Examples

See examples/ for runnable demos:

Example	What It Shows
`01_basic_monitoring.py`	Simplest usage -- load, monitor, get Theta
`02_ecommerce_session.py`	Full e-commerce session from the paper
`03_drift_detection.py`	JSD-based behavioral drift over 20 turns
`04_sprt_certification.py`	SPRT statistical certification
`05_langgraph_middleware.py`	LangGraph StateGraph integration
`06_crewai_integration.py`	CrewAI task guardrails
`07_composition_pipeline.py`	Multi-agent compositional bounds
`08_mcp_tool_monitoring.py`	MCP tool server monitoring

Research Paper

"AgentAssert: Formal Behavioral Contracts for Autonomous AI Agents"

The theoretical foundations, formal proofs, and experimental validation are presented in an arXiv preprint covering all 6 pillars of the framework, with full mathematical treatment of the Reliability Index, drift dynamics, compositional guarantees, and SPRT certification.

Read the paper on arXiv (cs.AI + cs.SE): https://arxiv.org/abs/2602.22302

Cite This Work

@article{bhardwaj2026agentassert,
  title={AgentAssert: Formal Behavioral Contracts for Autonomous AI Agents},
  author={Bhardwaj, Varun Pratap},
  journal={arXiv preprint arXiv:2602.22302},
  year={2026},
  url={https://arxiv.org/abs/2602.22302}
}

Contributing

Contributions welcome. See CONTRIBUTING.md for setup instructions, coding standards, and submission guidelines.

License

Elastic License 2.0. See LICENSE for details.

Part of Qualixar -- AI Agent Reliability Engineering
A research initiative by Varun Pratap Bhardwaj

qualixar.com · varunpratap.com · arXiv:2602.22302 · agentassert.com

Release history

0.3.0 -- May 24, 2026
0.2.3 -- Apr 17, 2026
0.2.2 -- Apr 17, 2026
0.2.1 -- Apr 17, 2026
0.2.0 -- Apr 10, 2026
0.1.0 -- Apr 7, 2026 (this version)
0.0.1 -- Apr 6, 2026

Download files

File	Type	Size	Uploaded
agentassert_abc-0.1.0.tar.gz	Source distribution	142.3 kB	Apr 7, 2026
agentassert_abc-0.1.0-py3-none-any.whl	Built distribution (Python 3)	48.6 kB	Apr 7, 2026

Both files were uploaded with uv 0.11.1 from macOS, without Trusted Publishing.

Hashes for agentassert_abc-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`db648f23f937dddc63e21463420433890275c15e6369e133be5d78c2dfb1920e`
MD5	`0c4cabbe9af16448906021ef6c57989b`
BLAKE2b-256	`ee6cc66650e0c5c736c4e0b8bfcf16d4d740b4bb24da53dbb3765ccdc1038e24`

Hashes for agentassert_abc-0.1.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`2f5a8013439bf4cd2809c5bd72062e84acaa3460043af991c609c4483903fd77`
MD5	`49387c80a9d1bd0d916ef08f217e097d`
BLAKE2b-256	`20fd380c4b3736d78859dd16263ecd909c9239127e92550299f5878d68dd2cd1`
