Open-source CI contract testing for tool-using AI agents.

AgentGuard CI

Requires Python 3.11+.

AgentGuard CI helps teams block risky prompt, tool, and agent changes before they reach production. Define trace-level contracts, mock tools, run impacted tests only, enforce latency/token budgets, and fail CI when agent behavior regresses.

Why AgentGuard CI Exists

Final-answer evals are not enough for production agents. A response can look acceptable while the trace is unsafe: the wrong tool was called, a required policy lookup was skipped, a refund tool ran before confirmation, or token usage doubled.

AgentGuard CI focuses on deterministic engineering contracts:

Required and forbidden tool calls.
Tool call ordering.
Routing decisions.
Structured output schemas.
Latency, token, cost, and tool-call budgets.
Mocked tool results for safe offline CI.
Baseline comparison for regression prevention.

LLM-as-judge is available as an optional path, but deterministic assertions are the default.

Installation

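pip install agentguardci

The distribution is published on PyPI as agentguardci. The import package and the CLI command are both agentguard.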

For local development from this repository:

pip install -e ".[dev]"

OpenAI judge support is optional:

pip install "agentguardci[openai]"

Quick Start

agentguard init
agentguard test

Starter test:

suite: "calendar-agent"

tests:
  - id: "uses_calendar_tool"
    input: "Book a meeting tomorrow at 5 PM."
    assert:
      trace:
        must_call:
          - tool: "calendar.create_event"
      output:
        contains:
          - "meeting"

CLI

agentguard --help
agentguard --version
agentguard init
agentguard test
agentguard test --config agentguard.yml
agentguard test --suite calendar-agent
agentguard test --case uses_calendar_tool
agentguard test --tag smoke
agentguard test --changed-only --base origin/main
agentguard test --report json --report junit
agentguard test --fail-fast
agentguard test --update-baseline
agentguard test --strict
agentguard list
agentguard diff
agentguard baseline update
agentguard baseline list
agentguard validate
python -m agentguard --help

Exit codes:

0: all blocking tests passed.
1: at least one blocking test failed.
2: configuration error.
3: agent runtime error.
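A CI wrapper can use these codes to separate a behavior regression (1) from a broken setup or crashing agent (2, 3). For example, it could retry only on infrastructure errors. The following is a minimal sketch. describe_exit_code, is_infrastructure_failure, and run_agentguard are hypothetical helpers and are not part of AgentGuard.

```python
import subprocess

# Exit codes as documented for the agentguard CLI.
EXIT_CODES = {
    0: "all blocking tests passed",
    1: "at least one blocking test failed",
    2: "configuration error",
    3: "agent runtime error",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable meaning for an agentguard exit code."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def is_infrastructure_failure(code: int) -> bool:
    """Codes 2 and 3 indicate setup or runtime problems, not a behavior regression."""
    return code in (2, 3)


def run_agentguard(*args: str) -> int:
    """Run `agentguard test` with extra flags and print what the exit code means."""
    result = subprocess.run(["agentguard", "test", *args])
    print(f"agentguard exited with {result.returncode}: {describe_exit_code(result.returncode)}")
    return result.returncode


# Usage (hypothetical): code = run_agentguard("--report", "junit")
```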

Agent Entrypoint Contract

Your agent exposes a sync or async Python function that accepts an AgentRequest and returns an AgentResult:

from agentguard import AgentRequest, AgentResult, TraceEvent, Usage


async def run_agent(request: AgentRequest) -> AgentResult:
    trace = [
        TraceEvent(
            type="tool_call",
            name="calendar.create_event",
            input={"time": "5 PM"},
        )
    ]
    return AgentResult(
        output="Meeting booked for 5 PM.",
        trace=trace,
        usage=Usage(total_tokens=512, latency_ms=1200),
    )

Configure it in agentguard.yml:

version: "0.1"

project:
  name: "customer-support-agents"

agent:
  entrypoint: "my_package.agent:run_agent"
  timeout_seconds: 30

paths:
  tests: "agentguard-tests"
  baselines: ".agentguard/baselines"
  reports: ".agentguard/reports"

mocks:
  require_registered: false
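The entrypoint string uses a module:function form, and the function it names may be either sync or async. The sketch below is a rough illustration of what that contract implies for a test runner. resolve_entrypoint and call_entrypoint are hypothetical names, not AgentGuard's internals. Applying timeout_seconds in this way is also an assumption.

```python
import asyncio
import importlib
import inspect
from typing import Any, Callable


def resolve_entrypoint(spec: str) -> Callable[..., Any]:
    """Resolve a "module.path:function" string such as "my_package.agent:run_agent"."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"entrypoint must look like 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


async def call_entrypoint(func: Callable[..., Any], request: Any, timeout_seconds: float) -> Any:
    """Call a sync or async entrypoint, enforcing a timeout either way."""
    if inspect.iscoroutinefunction(func):
        return await asyncio.wait_for(func(request), timeout=timeout_seconds)
    # Sync agents run in a worker thread so the event loop can still enforce the timeout.
    return await asyncio.wait_for(asyncio.to_thread(func, request), timeout=timeout_seconds)


# Usage (hypothetical):
# agent = resolve_entrypoint("my_package.agent:run_agent")
# result = asyncio.run(call_entrypoint(agent, request, timeout_seconds=30))
```

Running sync agents in a worker thread gives both kinds of agent a single code path. Note that a timed-out thread keeps running in the background, because Python threads cannot be cancelled.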

Tool Mocking

In the MVP, AgentGuard does not monkeypatch tools. Instead, the agent opts in: it reads mocks from the request or calls get_mock.

mocks:
  orders.get_order:
    match_args:
      order_id: "ORD-123"
    output:
      order_id: "ORD-123"
      status: "delivered"

from agentguard import get_mock

# Inside the agent entrypoint, where request is the AgentRequest it received.
tool_output = get_mock(
    request,
    "orders.get_order",
    args={"order_id": "ORD-123"},
)
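The README does not spell out how match_args is compared with the call arguments, and the roadmap lists stricter mock argument matching for v0.4. The sketch below therefore assumes subset equality: every match_args key must be present in the call with an equal value, and extra arguments are ignored. match_mock is a hypothetical helper. It is not get_mock.

```python
from typing import Any, Mapping, Optional


def match_mock(
    mocks: Mapping[str, Mapping[str, Any]],
    tool: str,
    args: Mapping[str, Any],
) -> Optional[Any]:
    """Return the mocked output for `tool` if its match_args agree with `args`, else None."""
    mock = mocks.get(tool)
    if mock is None:
        return None
    expected = mock.get("match_args", {})
    # Assumed semantics: each expected key must be present with an equal value.
    if all(key in args and args[key] == value for key, value in expected.items()):
        return mock.get("output")
    return None


# The mocks block above, as it might look after YAML parsing.
MOCKS = {
    "orders.get_order": {
        "match_args": {"order_id": "ORD-123"},
        "output": {"order_id": "ORD-123", "status": "delivered"},
    }
}
```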

Assertions

Trace assertions:

assert:
  trace:
    must_call:
      - tool: "orders.get_order"
    must_not_call:
      - tool: "payments.issue_refund"
    ordered:
      - tool: "orders.get_order"
      - tool: "policy.lookup_refund_policy"
    max_tool_calls: 5

Output assertions:

assert:
  output:
    contains:
      - "confirmation"
    not_contains:
      - "refunded"
    json_schema:
      type: object
      required: ["category"]
      properties:
        category:
          type: string
          enum: ["billing", "technical", "account", "other"]
    jsonpath:
      - path: "$.category"
        equals: "billing"

Budget assertions:

assert:
  budgets:
    max_latency_ms: 5000
    max_total_tokens: 3000
    max_cost_usd: 0.05
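Each budget key caps one usage metric. The entrypoint example reports total_tokens and latency_ms, but the README does not say where cost_usd comes from, so this sketch treats it as an assumed input. check_budgets is hypothetical. It treats a limit as inclusive (a value equal to the limit passes), and it fails closed when a metric is missing. Both behaviors are design assumptions.

```python
# Budget key -> usage field it limits.
BUDGET_FIELDS = {
    "max_latency_ms": "latency_ms",
    "max_total_tokens": "total_tokens",
    "max_cost_usd": "cost_usd",  # assumed field; source of cost data not documented
}


def check_budgets(usage: dict[str, float], budgets: dict[str, float]) -> list[str]:
    """Return failure messages for every budget the usage exceeds."""
    failures = []
    for budget_key, limit in budgets.items():
        field = BUDGET_FIELDS.get(budget_key)
        if field is None:
            failures.append(f"unknown budget {budget_key}")
            continue
        value = usage.get(field)
        if value is None:
            # Fail closed: a budget that cannot be measured is treated as violated.
            failures.append(f"{field} not reported; cannot check {budget_key}")
        elif value > limit:
            failures.append(f"{field}={value} exceeds {budget_key}={limit}")
    return failures
```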

Impact-Aware Testing

Create agentguard-impact.yml:

mappings:
  - files:
      - "agents/refund/**"
      - "prompts/refund/**"
      - "tools/refund.py"
    tests:
      - "refund-agent"
      - "refund-regression"

Run impacted suites only:

agentguard test --changed-only --base origin/main

If none of the changed files match a mapping, AgentGuard falls back to smoke-tagged tests, when any exist.

Baseline Regression Testing

Store baselines for passing tests:

agentguard test --update-baseline

Enable baseline comparison:

assert:
  regression:
    compare_to_baseline: true
    allowed_output_similarity_drop: 0.1
    allowed_extra_tool_calls: 1
    allowed_latency_increase_pct: 30

Baselines are stored under:

.agentguard/baselines/{suite}/{test_id}.json
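The three thresholds can be read as tolerances relative to the stored run. For example, with allowed_latency_increase_pct: 30, a baseline of 1000 ms passes at 1250 ms (+25%) but fails at 1500 ms (+50%). The sketch below is hypothetical throughout. The baseline record keys (output, tool_calls, latency_ms) are assumed, and difflib's ratio stands in for whatever similarity metric AgentGuard actually uses.

```python
import difflib
from pathlib import Path


def baseline_path(root: str, suite: str, test_id: str) -> Path:
    """Build the documented {root}/{suite}/{test_id}.json location."""
    return Path(root) / suite / f"{test_id}.json"


def compare_to_baseline(
    baseline: dict,
    current: dict,
    allowed_output_similarity_drop: float = 0.1,
    allowed_extra_tool_calls: int = 1,
    allowed_latency_increase_pct: float = 30,
) -> list[str]:
    """Compare a run with its stored baseline and return failure messages.

    Both dicts use assumed keys: output (str), tool_calls (int), latency_ms (number).
    """
    failures = []

    # Stand-in metric: difflib's 0..1 ratio, where 1.0 means identical output.
    similarity = difflib.SequenceMatcher(None, baseline["output"], current["output"]).ratio()
    if 1.0 - similarity > allowed_output_similarity_drop:
        failures.append(
            f"output similarity {similarity:.2f} dropped more than {allowed_output_similarity_drop}"
        )

    extra_calls = current["tool_calls"] - baseline["tool_calls"]
    if extra_calls > allowed_extra_tool_calls:
        failures.append(f"{extra_calls} extra tool calls (allowed {allowed_extra_tool_calls})")

    # Percentage increase relative to the baseline; skipped if the baseline latency is zero.
    if baseline["latency_ms"] > 0:
        increase_pct = (current["latency_ms"] - baseline["latency_ms"]) / baseline["latency_ms"] * 100
        if increase_pct > allowed_latency_increase_pct:
            failures.append(
                f"latency up {increase_pct:.0f}% (allowed {allowed_latency_increase_pct}%)"
            )

    return failures
```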

Reports

AgentGuard prints a terminal report and can write machine-readable reports:

agentguard test --report json
agentguard test --report junit

Outputs:

.agentguard/reports/report.json
.agentguard/reports/junit.xml
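Because junit.xml follows the common JUnit XML shape, a later CI step can post-process it with the standard library alone. The sketch below lists test cases that contain a failure or error element. junit_failures is hypothetical, and the exact classname and name values that AgentGuard writes are not documented here.

```python
import xml.etree.ElementTree as ET


def junit_failures(xml_text: str) -> list[str]:
    """Return "classname.name" for every testcase with a <failure> or <error> child."""
    root = ET.fromstring(xml_text)
    failed = []
    # iter() searches recursively, so both <testsuite> and <testsuites> roots work.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            label = f"{case.get('classname', '')}.{case.get('name', '')}".lstrip(".")
            failed.append(label)
    return failed


# Usage (hypothetical):
# from pathlib import Path
# print(junit_failures(Path(".agentguard/reports/junit.xml").read_text()))
```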

GitHub Actions

name: AgentGuard CI

on:
  pull_request:
    branches: [main]

jobs:
  agentguard:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install agentguardci

      - name: Run impacted agent tests
        run: |
          agentguard test --changed-only --base origin/main --report junit

Branch protection setup:

Open repository settings.
Enable branch protection for main.
Require status checks before merging.
Select the agentguard job from the AgentGuard CI workflow as a required status check.

Optional LLM Judge

LLM judge tests are opt-in and should be used selectively because provider calls can be slower, more expensive, and less deterministic than trace contracts.

assert:
  llm_judge:
    enabled: true
    rubric: |
      Score whether the answer correctly explains the refund policy and does not invent
      unsupported exceptions.
    threshold: 0.85

Use the fake provider for deterministic local tests or configure OpenAI with OPENAI_API_KEY.
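A fake provider keeps judge tests deterministic because it returns a fixed score regardless of input. The pass/fail rule then reduces to a threshold comparison. The sketch below assumes scores in the range 0 to 1 and an inclusive threshold. fake_judge and judge_passes are hypothetical and are not AgentGuard's provider interface.

```python
from typing import Callable

# A judge takes (rubric, answer) and returns a score in [0, 1].
Judge = Callable[[str, str], float]


def fake_judge(score: float) -> Judge:
    """Deterministic stand-in: ignores its inputs and always returns `score`."""
    def judge(rubric: str, answer: str) -> float:
        return score
    return judge


def judge_passes(judge: Judge, rubric: str, answer: str, threshold: float) -> bool:
    """Pass when the judge's score meets the threshold (assumed inclusive)."""
    score = judge(rubric, answer)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"judge score {score} is outside 0..1")
    return score >= threshold
```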

Example

cd examples/simple_agent
agentguard test

The example agent uses a mocked calendar.create_event tool, and its tests validate the trace, output, latency, token budget, and tool-call budget.

Roadmap

v0.1 Alpha

YAML tests.
CLI runner.
Python agent entrypoint.
Trace assertions.
Output assertions.
Budgets.
Mocks.
Reports.
GitHub Actions example.

v0.2 Core Hardening

Public model contract cleanup.
Complete deterministic assertion coverage.
Strict validation mode.
Clear domain exceptions and failure messages.
Expanded unit coverage.

v0.3 Baselines and Reports

Versioned baseline artifacts.
Baseline diff improvements.
Stable JSON report schema.
Improved JUnit output.
Baseline CLI subcommands.

v0.4 Mocking and Impact

Stricter mock argument matching.
Registered mock failure mode.
Hardened changed-only behavior.
Smoke fallback rules.

v0.5 Adapters

Default custom Python adapter.
OpenAI Agents SDK adapter.
LangChain adapter.
Optional extras and adapter docs.

v1.0 Release Candidate

Frozen public models and JSON report schema.
Complete docs.
Clean install verification.
TestPyPI package validation.
Stable local and CI runtime behavior.

Contributing

Keep the open-source core CI-first, deterministic by default, and framework-agnostic. New integrations should preserve the same trace contract model instead of hiding behavior behind provider-specific abstractions.

Release

Release steps are documented in docs/RELEASE.md.

Documentation


This version: 0.1.0, released May 19, 2026.

Download files

Source distribution: agentguardci-0.1.0.tar.gz (40.5 kB), uploaded May 19, 2026.

Built distribution: agentguardci-0.1.0-py3-none-any.whl (45.2 kB), uploaded May 19, 2026, for Python 3.

File hashes

Both files were uploaded via twine/6.2.0 CPython/3.14.2, not via Trusted Publishing.

Hashes for agentguardci-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4efb951b12a54a5a94e631e459993641bbf14e8db2f2c4d46751f0202394c0b6`
MD5	`d9c5dccb6304610d9a82260125aea0e3`
BLAKE2b-256	`b0eaf8ddf06095167755a47be5875a7599c32f128a436b2aba70ca48c7282102`

Hashes for agentguardci-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab8a1bfdf74c139c8db3bfc92ecce0a73089cbe805865964d7fa88be3cdebb5c`
MD5	`e99e881b5add4117f165bab9b0e7ec77`
BLAKE2b-256	`0b7d4f1fac0c941f0de848c745b53de376c4d6f411f47de3f44268b38ab87d5a`
