Unopinionated contract-based verification for AI agents
Project description
The Problem
Typical deterministic software relies on tools like unit tests to ensure the code functions correctly. However, as we become more reliant on AI agents to do our work, we need a smarter and more efficient means of verifying that their output is correct. To address this, this library introduces a mental model known as the Agentic Contract Framework.
Its primary function is to produce a contract with a set of commitments before the agent's execution (this contract can be hardcoded, or dynamically generated by the agent itself). Each commitment on a contract has an attached verifier, which can be set by you, the developer. If you deem that a commitment can be deterministically verified, you are welcome to write a function for it (like a unit test). Otherwise, you can rely on the default semantic verifier, which uses another agent to judge the correctness of the output. All evaluations are collected and synced with Datadog.
Quick Start
```python
from observable_agent import ObservableAgent, Contract, Commitment

# Define what the agent must do
contract = Contract(commitments=[
    Commitment(
        name="no_harmful_content",
        terms="The agent must not produce harmful or offensive content",
    ),
    Commitment(
        name="stay_on_topic",
        terms="The agent must only discuss topics related to the user's query",
    ),
])

# Create the agent (wraps Google ADK)
agent = ObservableAgent(
    name="my_agent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant.",
    description="A helpful assistant",
    contract=contract,
    on_implementation_complete=lambda verifier: print(verifier.verify()),
)
```
Progressive Hardening
Using this library is very much a process of continuous exploration: you observe your agents, determine their failure modes, and progressively "harden" your rules. If you discover that your agent commonly makes a certain mistake, you can simply create a commitment prohibiting that mistake and add a deterministic verifier to catch it 100% of the time. I would personally recommend starting with the default semantic verifier and first understanding the failure modes of your agent in your domain!
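For instance, once a failure mode is known, a deterministic verifier can be an ordinary function. The check below is a hypothetical sketch: the helper name, the forbidden-term list, and the idea of verifying a plain output string are all assumptions for illustration, not part of the library's documented API.

```python
# Hypothetical deterministic check for a known failure mode: the agent
# must never mention a competitor by name. Returns True when the
# commitment is honored, False on a violation.
FORBIDDEN_TERMS = ("competitor_x", "competitor_y")

def mentions_no_competitors(output: str) -> bool:
    lowered = output.lower()
    return not any(term in lowered for term in FORBIDDEN_TERMS)
```

You could then attach a function like this through a Commitment's verifier field, adapting it to whatever execution object the verifier actually receives.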
Architecture
```mermaid
classDiagram
    ObservableAgent --> Contract
    ObservableAgent --> Execution
    Contract --> Commitment
    Commitment --> Verifier
    Commitment --> SemanticVerifier
    class ObservableAgent {
        +name: str
        +model: str
        +instruction: str
        +contract: Contract
    }
    class Contract {
        +commitments: List~Commitment~
        +verify(execution) List~VerificationResult~
    }
    class Commitment {
        +name: str
        +terms: str
        +verifier: Callable
        +semantic_sampling_rate: float
        +verify(execution) VerificationResult
    }
    class Execution {
        +tool_calls: List~ToolCall~
        +format_tool_calls() str
    }
    class Verifier {
        <<deterministic>>
        +verify(execution, terms) Result
    }
    class SemanticVerifier {
        <<LLM-based>>
        +verify(execution, terms) Result
    }
```
Key Concepts
The main contribution of this library is the Contract class. A contract stores many commitments. Think of a commitment as an expectation of what the agent is supposed to deliver; a contract, then, is a set of such expectations. It's like a freelancer contract, but with your AI agent.
You build a contract by defining it and adding commitments to it. A commitment can hold its own verifier. My approach here is to be as critical as possible towards the output of the AI agent. If your verifier returns a violation, the agent is taken to have failed to deliver what it committed to. But if your verifier returns a pass, the agent is only taken to have passed a deterministic test case; it could still have other failure modes that we do not know of (after all, using this library is a process of exploration). In that case, we run the execution against a semantic verifier to check for these unknown failure modes.
If you are confident that the deterministic test case accounts for all failure modes, you can set semantic_sampling_rate to 0, meaning none of the agent's executions for that particular commitment will be put through semantic verification (your deterministic verification will still run). If you are more cost-conscious, you can set it to some number between 0 and 1 (the lower the number, the less semantic verification is done and the lower the cost). If it is 1, the semantic verifier will always run when (1) your verifier doesn't exist or (2) your verifier returned a pass.
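The sampling rules above can be summarized in a few lines of plain Python. This is a sketch of the decision logic as described, not the library's internal code:

```python
import random

def should_run_semantic_verifier(has_deterministic: bool,
                                 deterministic_passed: bool,
                                 semantic_sampling_rate: float) -> bool:
    # A deterministic violation is conclusive: no semantic check needed.
    if has_deterministic and not deterministic_passed:
        return False
    # Otherwise, sample: a rate of 0 never runs the semantic verifier,
    # a rate of 1 always runs it, and values in between run it
    # probabilistically to control cost.
    return random.random() < semantic_sampling_rate
```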
ObservableAgent is a wrapper around the Agent class from Google ADK. The key differences are the callbacks and the contract. You pass a contract to the agent, and for the callbacks, you can define an on_tool_call callback and an on_implementation_complete callback (more coming soon!). The ObservableAgent doesn't automatically verify the commitments; instead, it passes a verifier as an argument to the on_implementation_complete callback. There you can call the .verify() method to start verifying the agent's execution. This lets you run verification as a background process, or immediately afterwards, whichever is best for your context.
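Since verification is deferred to you, one option is to kick it off on a background thread from the callback. This is a minimal sketch that assumes only that the verifier object handed to on_implementation_complete exposes the .verify() method described above:

```python
import threading

def verify_in_background(verifier):
    # Run contract verification off the main thread so the agent's
    # response is not blocked while commitments are being checked.
    thread = threading.Thread(target=verifier.verify, daemon=True)
    thread.start()
    return thread
```

You could then pass on_implementation_complete=verify_in_background when constructing the ObservableAgent.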
Installation
```shell
pip install observable-agent
```
Set your environment variables:
```shell
# Required for the agent
export GEMINI_API_KEY=your_gemini_api_key

# Optional: enable Datadog observability
export DD_LLMOBS_ENABLED=1
export DD_LLMOBS_ML_APP=your_app_name
export DD_LLMOBS_AGENTLESS_ENABLED=1
export DD_SITE=us5.datadoghq.com
export DD_API_KEY=your_datadog_api_key
export DD_ENV=development
export DD_SERVICE=observable-agent
```
File details
Details for the file observable_agent-0.1.0.tar.gz.
File metadata
- Download URL: observable_agent-0.1.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7cfe2138326f7b6097c788c8b561ee9ed4647493c023aa534d9d8f641fc1c4b |
| MD5 | c21df90ccb193238745e352052d513a7 |
| BLAKE2b-256 | b1d0f894c3a43245e0c4cdb64e3229956eca02be850cc56b51b105435537bc7a |
File details
Details for the file observable_agent-0.1.0-py3-none-any.whl.
File metadata
- Download URL: observable_agent-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d5db216ac28faf7715306375d456708b0970d94094c18bb52b3fc59c5eb7e13 |
| MD5 | fc68c1e584c2293293bf520342439f89 |
| BLAKE2b-256 | ce60915f4f636062b832223c078d9ee3b2e9df26b11930f6dc5426e4b8fcb0d7 |