Unopinionated contract-based verification for AI agents
The Problem
Typical deterministic software relies on tools like unit tests to ensure the code functions correctly. However, as we become more reliant on AI agents to do our work, we need a smarter and more efficient means of verifying that their output is correct. To address this, this library introduces a mental model known as the Agentic Contract Framework.
Its primary function is to produce a contract with a set of commitments before the agent's execution (this contract can be hardcoded, or dynamically generated by the agent itself). Each commitment on a contract has an attached verifier, which can be set by you, the developer. If you deem that a commitment can be deterministically verified, you are welcome to write a function for it (like a unit test). Otherwise, you can rely on the default semantic verifier, which uses another agent to verify the correctness of the output. All evaluations are collected and synced to the observability system of your choice.
Quick Start
from sworn import Contract, Commitment, DatadogObservability

# Define the contract with commitments
observer = DatadogObservability()
contract = Contract(
    observer=observer,
    commitments=[
        Commitment(
            name="no_harmful_content",
            terms="The agent must not produce harmful or offensive content"
        ),
        Commitment(
            name="stay_on_topic",
            terms="The agent must only discuss topics related to the user's query"
        )
    ]
)

# Decorate your tools
@contract.actuator
def send_message(content: str) -> dict:
    return {"status": "sent", "content": content}

# Run within an execution context
with contract.execution() as execution:
    # Run your agent (any framework)...
    send_message("Hello, world!")

    # Verify the execution
    results = execution.verify()
    print(results)
Progressive Hardening
Using this library is very much a process of continuous exploration: you observe your agents, determine their failure modes, and progressively "harden" your rules. If you discover that your agent commonly makes a particular mistake, you can simply create a commitment prohibiting that mistake and add a deterministic verifier to catch it 100% of the time, as sketched below. I would personally recommend starting with the default semantic verifier and understanding the failure modes of your agent in your domain first!
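As a sketch of what hardening might look like: suppose you notice the agent keeps emitting markdown-style links when it shouldn't. The exact verifier signature is not documented here, so the interface below (a callable that receives the execution and returns True on pass) is an assumption for illustration:

import re

# Illustrative sketch only: the real verifier signature may differ.
def no_markdown_links(execution) -> bool:
    # Assumes execution.format() returns the traced activity as text
    # (format() appears on the Execution class in the Architecture section).
    return re.search(r"\[.+?\]\(.+?\)", execution.format()) is None

commitment = Commitment(
    name="no_markdown_links",
    terms="The agent must not emit markdown-style links",
    verifier=no_markdown_links,
    semantic_sampling_rate=0.1,  # still spot-check for unknown failure modes
)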
Architecture
classDiagram
    Contract --> Commitment
    Contract --> Observer
    Contract ..> Execution : creates
    Execution --> Contract
    Commitment --> Verifier
    Commitment --> SemanticVerifier
    Observer <|-- DatadogObservability

    class Contract {
        +commitments: List~Commitment~
        +observer: Observer
        +execution() Execution
        +actuator(func) Callable
        +sensor(func) Callable
    }

    class Execution {
        +tool_calls: List~ToolCall~
        +verify() List~VerificationResult~
        +add_tool_call(ToolCall)
        +format() str
    }

    class Commitment {
        +name: str
        +terms: str
        +verifier: Callable
        +semantic_sampling_rate: float
    }

    class Observer {
        <<interface>>
        +capture_span()
        +submit_evaluation()
    }

    class DatadogObservability {
        +capture_span()
        +submit_evaluation()
    }

    class Verifier {
        <<deterministic>>
    }

    class SemanticVerifier {
        <<LLM-based>>
    }
Key Concepts
Contracts and Commitments
The main contribution of this library is the Contract class. A contract holds a set of commitments, and each commitment is an expectation of what the agent is supposed to deliver. It's like a freelance contract, but with your AI agent.
Verification Strategy
You build a contract by defining it and adding commitments to it, and each commitment can hold its own verifier. My approach is to be as critical as possible towards the output of the AI agent. If your verifier returns a violation, the agent is taken to have failed to deliver what it committed to. If your verifier returns a pass, the agent has merely cleared a deterministic test case and could still have other failure modes that we do not know of (after all, using this library is a process of exploration). In that case, the output is also run against a semantic verifier to check for these unknown failure modes.
Sampling Rate
If you are confident that the deterministic test case accounts for all failure modes, you can set semantic_sampling_rate to 0, meaning none of the agent executions for that particular commitment will be put through semantic verification (your deterministic verification will still run). If you are more cost-conscious, you can set it to a number between 0 and 1: the lower the number, the fewer semantic verifications are run and the lower the cost. At 1, the semantic verifier always runs whenever (1) no deterministic verifier exists or (2) your verifier returned a pass. The sketch below summarises this decision flow.
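Put together, the per-commitment decision flow looks roughly like this (a conceptual sketch, not the library's internal code; semantic_verify is a hypothetical stand-in for the LLM-based check):

import random

# Conceptual sketch of the per-commitment flow described above.
def verify_commitment(commitment, execution):
    if commitment.verifier is not None:
        if not commitment.verifier(execution):
            return "violation"  # deterministic failures are final
    # Verifier passed (or none exists): sample the LLM-based check
    if random.random() < commitment.semantic_sampling_rate:
        return semantic_verify(commitment.terms, execution)  # hypothetical helper
    return "pass"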
Framework Agnostic Design
This library is framework agnostic: it doesn't lock you into any agentic framework, and you can swap frameworks without changing anything about the contract. The library sets itself a clear boundary, namely everything before your agent starts running and everything after it finishes execution, which makes it independent of the execution itself. During the agent's runtime (outside that boundary), it traces tool calls by intercepting at the tool call level, meaning that if your agent calls Python functions, the library can trace them. Anything that can't be traced this way (like the agent's final output or reasoning) can be added manually using execution.add_tool_call(), as in the sketch below.
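For instance, recording the agent's final answer by hand might look roughly like this (whether ToolCall is importable from the package root, and which constructor arguments it takes, are assumptions on my part):

from sworn import ToolCall  # assumed import path

with contract.execution() as execution:
    final_answer = run_my_agent("Summarise the report")  # hypothetical agent entry point
    # The final output is not a traced Python call, so record it manually
    # (ToolCall's fields here are illustrative, not documented API):
    execution.add_tool_call(ToolCall(name="final_output", output=final_answer))
    results = execution.verify()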
Execution Context
You may need to capture some context so that you can access it within your verifier. To do this, use the add_context() method. The method currently accepts only a string, but I'm planning to add support for structured data so that deterministic verifiers can adapt to different contexts.
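A minimal sketch, assuming add_context() is called on the execution object (the prose above implies this but doesn't show it):

with contract.execution() as execution:
    # Make request context visible to verifiers (string-only for now)
    execution.add_context("user_locale=de-DE")
    send_message("Hallo, Welt!")
    results = execution.verify()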
Contract Coverage
Similar to how we compute test coverage of a codebase, it would also be interesting and useful to compute the coverage of your contracts over the agent's behaviour. Verifiers report the behaviours they enforce and cover, and the contract takes the complement of the union of these coverages to surface potential blind spots: behaviours that no commitment enforces. You can use this information to further tighten your contract by adding more commitments. The sketch below illustrates the idea.
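As set arithmetic, this is just the complement of a union (the behaviour labels below are made up for illustration and are not library API):

# Conceptual sketch of contract coverage, not library code.
agent_behaviours = {"tone", "topic", "formatting", "tool_use"}
verifier_coverages = [{"tone", "topic"}, {"formatting"}]

covered = set().union(*verifier_coverages)
blind_spots = agent_behaviours - covered  # behaviours no commitment enforces
print(blind_spots)  # {'tool_use'}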
Installation
pip install sworn
Set your environment variables:
# Required for the agent
export GEMINI_API_KEY=your_gemini_api_key
# Only needed if you use Datadog observability
export DD_LLMOBS_ENABLED=1
export DD_LLMOBS_ML_APP=your_app_name
export DD_LLMOBS_AGENTLESS_ENABLED=1
export DD_SITE=us5.datadoghq.com
export DD_API_KEY=your_datadog_api_key
export DD_ENV=development
export DD_SERVICE=sworn