Unopinionated contract-based verification for AI agents
Project description
The Problem
Typical deterministic software relies on tools like unit tests to ensure the code functions correctly. However, as we become more reliant on AI agents to do our work, we need a smarter and more efficient means of verifying that their output is correct. To address this, this library introduces a mental model known as the Agentic Contract Framework.
Its primary function is to produce a contract with a set of commitments before the agent's execution (this contract can be hardcoded, or dynamically generated by the agent itself). Each commitment on a contract has an attached verifier, which can be set by you, the developer. If you deem that a commitment can be deterministically verified, you are welcome to write a function for it (like a unit test). Otherwise, you can rely on the default semantic verifier, which uses another agent to judge the correctness of the output. All evaluations are collected and synced with Datadog.
Quick Start
```python
from observable_agent import ObservableAgent, Contract, Commitment

# Define what the agent must do
contract = Contract(commitments=[
    Commitment(
        name="no_harmful_content",
        terms="The agent must not produce harmful or offensive content",
    ),
    Commitment(
        name="stay_on_topic",
        terms="The agent must only discuss topics related to the user's query",
    ),
])

# Create the agent (wraps Google ADK)
agent = ObservableAgent(
    name="my_agent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant.",
    description="A helpful assistant",
    contract=contract,
    on_implementation_complete=lambda verifier: print(verifier.verify()),
)
```
Progressive Hardening
Using this library is very much a process of continuous exploration: you observe your agents, determine their failure modes, and progressively "harden" your rules. If you discover that your agent commonly makes a certain mistake, you can simply create a commitment prohibiting that mistake and add a deterministic verifier to catch it 100% of the time. I would personally recommend starting with the default semantic verifier and first understanding the failure modes of your agent in your domain!
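For instance, once a failure mode is known, a deterministic verifier can be an ordinary function. The check below is a hypothetical sketch: the helper name, the forbidden-term list, and the idea of verifying a plain output string are all assumptions for illustration, not part of the library's documented API.

```python
# Hypothetical deterministic check for a known failure mode: the agent
# must never mention a competitor by name. Returns True when the
# commitment is honored, False on a violation.
FORBIDDEN_TERMS = ("competitor_x", "competitor_y")

def mentions_no_competitors(output: str) -> bool:
    lowered = output.lower()
    return not any(term in lowered for term in FORBIDDEN_TERMS)
```

You could then attach a function like this through a Commitment's verifier field, adapting it to whatever execution object the verifier actually receives.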
Architecture
```mermaid
classDiagram
    ObservableAgent --> Contract
    ObservableAgent --> Execution
    Contract --> Commitment
    Commitment --> Verifier
    Commitment --> SemanticVerifier
    class ObservableAgent {
        +name: str
        +model: str
        +instruction: str
        +contract: Contract
    }
    class Contract {
        +commitments: List~Commitment~
        +verify(execution) List~VerificationResult~
    }
    class Commitment {
        +name: str
        +terms: str
        +verifier: Callable
        +semantic_sampling_rate: float
        +verify(execution) VerificationResult
    }
    class Execution {
        +tool_calls: List~ToolCall~
        +format_tool_calls() str
    }
    class Verifier {
        <<deterministic>>
        +verify(execution, terms) Result
    }
    class SemanticVerifier {
        <<LLM-based>>
        +verify(execution, terms) Result
    }
```
Key Concepts
The main contribution of this library is the Contract class. A contract stores many commitments. Think of a commitment as an expectation of what the agent is supposed to deliver; a contract, then, is a set of such expectations. It's like a freelancer contract, but with your AI agent.
You build a contract by defining it and adding commitments to it. A commitment can hold its own verifier. My approach here is to be as critical as possible towards the output of the AI agent. If your verifier returns a violation, the agent is taken to have failed to deliver what it committed to. But if your verifier returns a pass, the agent is only taken to have passed a deterministic test case; it could still have other failure modes that we do not know of (after all, using this library is a process of exploration). In that case, we run the execution against a semantic verifier to check for these unknown failure modes.
If you are confident that the deterministic test case accounts for all failure modes, you can set semantic_sampling_rate to 0, meaning none of the agent's executions for that particular commitment will be put through semantic verification (your deterministic verification will still run). If you are more cost-conscious, you can set it to some number between 0 and 1 (the lower the number, the less semantic verification is done and the lower the cost). If it is 1, the semantic verifier will always run when (1) your verifier doesn't exist or (2) your verifier returned a pass.
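The sampling rules above can be summarized in a few lines of plain Python. This is a sketch of the decision logic as described, not the library's internal code:

```python
import random

def should_run_semantic_verifier(has_deterministic: bool,
                                 deterministic_passed: bool,
                                 semantic_sampling_rate: float) -> bool:
    # A deterministic violation is conclusive: no semantic check needed.
    if has_deterministic and not deterministic_passed:
        return False
    # Otherwise, sample: a rate of 0 never runs the semantic verifier,
    # a rate of 1 always runs it, and values in between run it
    # probabilistically to control cost.
    return random.random() < semantic_sampling_rate
```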
ObservableAgent is a wrapper around the Agent class from Google ADK. The key differences are the callbacks and the contract. You pass a contract to the agent, and for the callbacks, you can define an on_tool_call callback and an on_implementation_complete callback (more coming soon!). The ObservableAgent doesn't automatically verify the commitments; instead, it passes a verifier as an argument to the on_implementation_complete callback. There you can call the .verify() method to start verifying the agent's execution. This lets you run verification as a background process, or immediately afterwards, whichever is best for your context.
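Since verification is deferred to you, one option is to kick it off on a background thread from the callback. This is a minimal sketch that assumes only that the verifier object handed to on_implementation_complete exposes the .verify() method described above:

```python
import threading

def verify_in_background(verifier):
    # Run contract verification off the main thread so the agent's
    # response is not blocked while commitments are being checked.
    thread = threading.Thread(target=verifier.verify, daemon=True)
    thread.start()
    return thread
```

You could then pass on_implementation_complete=verify_in_background when constructing the ObservableAgent.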
Installation
```shell
pip install observable-agent
```
Set your environment variables:
```shell
# Required for the agent
export GEMINI_API_KEY=your_gemini_api_key

# Optional: enable Datadog observability
export DD_LLMOBS_ENABLED=1
export DD_LLMOBS_ML_APP=your_app_name
export DD_LLMOBS_AGENTLESS_ENABLED=1
export DD_SITE=us5.datadoghq.com
export DD_API_KEY=your_datadog_api_key
export DD_ENV=development
export DD_SERVICE=observable-agent
```
File details
Details for the file observable_agent-0.1.0.tar.gz.
File metadata
- Download URL: observable_agent-0.1.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7cfe2138326f7b6097c788c8b561ee9ed4647493c023aa534d9d8f641fc1c4b |
| MD5 | c21df90ccb193238745e352052d513a7 |
| BLAKE2b-256 | b1d0f894c3a43245e0c4cdb64e3229956eca02be850cc56b51b105435537bc7a |
File details
Details for the file observable_agent-0.1.0-py3-none-any.whl.
File metadata
- Download URL: observable_agent-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d5db216ac28faf7715306375d456708b0970d94094c18bb52b3fc59c5eb7e13 |
| MD5 | fc68c1e584c2293293bf520342439f89 |
| BLAKE2b-256 | ce60915f4f636062b832223c078d9ee3b2e9df26b11930f6dc5426e4b8fcb0d7 |