Pytest-style behavioral regression testing for AI agents.

AgentCheck

AgentCheck is pytest for AI agents. Test behavior, not exact text.

Install from source today:

python -m pip install -e .

Once the package is published, the planned install command is:

pip install agentcheck-behavior

What It Does

AgentCheck helps you verify agent behavior such as:

  • which tools were used
  • whether tools were used in the expected order
  • whether the agent stayed within a step budget
  • whether the agent claimed success without tool evidence
  • whether behavior regressed against a saved baseline
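
To make the first three checks concrete, here is a minimal standalone sketch. It does not use AgentCheck itself: `RunTrace` and the helper functions are hypothetical names, and counting one tool call as one step is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RunTrace:
    """Hypothetical record of one agent run (not AgentCheck's real result type)."""

    tool_calls: list[str] = field(default_factory=list)  # tool names in call order
    final_output: str = ""


def tool_was_used(trace: RunTrace, name: str) -> bool:
    return name in trace.tool_calls


def tools_in_order(trace: RunTrace, expected: list[str]) -> bool:
    # Subsequence check: other tool calls may appear between the expected ones.
    remaining = iter(trace.tool_calls)
    return all(name in remaining for name in expected)


def within_step_budget(trace: RunTrace, max_steps: int) -> bool:
    # Assumption: one "step" is one tool call; a real framework may count differently.
    return len(trace.tool_calls) < max_steps


trace = RunTrace(
    tool_calls=["restaurant_search", "availability_check", "booking_tool"],
    final_output="Your table for 2 is booked.",
)
print(tool_was_used(trace, "booking_tool"))
print(tools_in_order(trace, ["restaurant_search", "booking_tool"]))
print(tools_in_order(trace, ["booking_tool", "restaurant_search"]))
print(within_step_budget(trace, 5))
```

None of these checks inspects the exact wording of the final answer, which is the point of testing behavior rather than text.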

Current Status

This repo already supports:

  • repeated-run behavioral tests with @agent_test(...)
  • local baseline and regression comparison
  • CLI commands: test, bless, compare, report
  • pytest integration
  • a plain Python adapter
  • an OpenAI Agents SDK adapter
  • live tests against real OpenAI agents in integration_examples/

Quick Start

python -m pip install -e .
python -m agentcheck.cli test examples

Minimal Example

from agentcheck import agent_test, expect
from examples.booking_agent import SimpleBookingAgent


# runs=5 repeats the test five times; agent_factory supplies the agent argument.
@agent_test(runs=5, agent_factory=SimpleBookingAgent)
def test_booking_agent(agent: SimpleBookingAgent):
    result = agent.run("Book a table for 2 tonight")

    # collect=True gathers failures so one run can report several at once.
    check = expect(result, collect=True)
    check.used_tool("restaurant_search")
    check.used_tool("booking_tool")
    check.steps_less_than(5)
    check.did_not_claim_confirmation_without_tool("booking_tool")
    # verify() reports the collected failures, if any.
    check.verify()
    return result
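
Repeated runs matter because an agent can pass on one run and fail on the next. The sketch below illustrates the idea with a seeded, deliberately flaky stand-in. `FlakyAgent` and `pass_rate` are hypothetical names, and creating a fresh agent for every run is an assumption about how a repeated-run harness might behave, not a description of AgentCheck's internals.

```python
import random
from typing import Callable


class FlakyAgent:
    """Hypothetical stand-in for a nondeterministic agent."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng

    def run(self, prompt: str) -> list[str]:
        # Occasionally skips the booking tool, as an LLM-driven agent might.
        calls = ["restaurant_search"]
        if self._rng.random() < 0.8:
            calls.append("booking_tool")
        return calls


def pass_rate(
    agent_factory: Callable[[], FlakyAgent],
    check: Callable[[list[str]], bool],
    runs: int,
) -> float:
    """Run a freshly built agent `runs` times and return the fraction that pass."""
    passed = 0
    for _ in range(runs):
        agent = agent_factory()  # assumption: a new agent per run avoids leaked state
        if check(agent.run("Book a table for 2 tonight")):
            passed += 1
    return passed / runs


rng = random.Random(0)  # seeded so the sketch is repeatable
rate = pass_rate(lambda: FlakyAgent(rng), lambda calls: "booking_tool" in calls, runs=5)
print(f"pass rate over 5 runs: {rate:.0%}")
```

A single run would report only pass or fail; the pass rate over several runs shows how often the behavior actually holds.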

Real Agent Testing

AgentCheck has been exercised against real OpenAI Agents SDK agents.

Use the included live suite:

python -m agentcheck.cli test integration_examples

or:

python -m pytest integration_examples -q

The included live tests cover:

  • a single-tool weather assistant
  • a multi-tool research assistant

Integration guide:

Included Demos

Passing local demo:

python -m agentcheck.cli test examples

Intentional failure demo:

python -m agentcheck.cli test regression_examples --fail-on-regression
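
The regression demo relies on comparing current behavior with a saved baseline. As a rough illustration of that idea only, the sketch below stores per-test pass rates as JSON and flags tests whose rate dropped. The file layout, the function names, and the 0.1 tolerance are all assumptions; AgentCheck's actual baseline format is not shown in this README.

```python
import json
import tempfile
from pathlib import Path


def save_baseline(path: Path, pass_rates: dict[str, float]) -> None:
    """Write per-test pass rates to a JSON file (hypothetical baseline format)."""
    path.write_text(json.dumps(pass_rates, indent=2, sort_keys=True))


def find_regressions(
    path: Path, current: dict[str, float], tolerance: float = 0.1
) -> dict[str, tuple[float, float]]:
    """Return tests whose pass rate dropped by more than `tolerance`."""
    baseline = json.loads(path.read_text())
    regressions = {}
    for name, old_rate in baseline.items():
        new_rate = current.get(name, 0.0)  # a missing test counts as fully failing
        if old_rate - new_rate > tolerance:
            regressions[name] = (old_rate, new_rate)
    return regressions


with tempfile.TemporaryDirectory() as tmp:
    baseline_file = Path(tmp) / "baseline.json"
    save_baseline(baseline_file, {"test_booking_agent": 1.0, "test_weather_agent": 0.8})
    regressions = find_regressions(
        baseline_file, {"test_booking_agent": 0.6, "test_weather_agent": 0.8}
    )

print(regressions)  # {'test_booking_agent': (1.0, 0.6)}
```

A tolerance keeps small run-to-run variation from being reported as a regression.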

Commands

  • python -m agentcheck.cli test <path>
  • python -m agentcheck.cli bless <path>
  • python -m agentcheck.cli compare
  • python -m agentcheck.cli report

Pytest

AgentCheck tests can also run through pytest:

python -m pytest examples -q
python -m pytest tests -q
python -m pytest integration_examples -q

Functions decorated with @agent_test(...) are collected as AgentCheck test items, and each item still performs its configured number of repeated runs.

Assertions

Current built-in assertions:

  • used_tool(...)
  • did_not_use_tool(...)
  • used_tools_in_order([...])
  • steps_less_than(...)
  • finished_successfully()
  • did_not_error()
  • final_output_contains(...)
  • final_output_does_not_contain(...)
  • did_not_claim_confirmation_without_tool(...)
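
The last assertion guards against an agent that announces success without having called the tool that would make the claim true. One way such a check could work is sketched below. The phrase list and the function name are hypothetical, and simple substring matching is an assumed heuristic rather than AgentCheck's documented logic.

```python
# Hypothetical phrases that signal a confirmation in the final answer.
CONFIRMATION_PHRASES = ("booked", "confirmed", "reservation is set")


def claimed_confirmation_without_tool(
    final_output: str, tool_calls: list[str], tool: str
) -> bool:
    """True when the output sounds like a confirmation but `tool` was never called."""
    text = final_output.lower()
    sounds_confirmed = any(phrase in text for phrase in CONFIRMATION_PHRASES)
    return sounds_confirmed and tool not in tool_calls


honest = claimed_confirmation_without_tool(
    "Your table is booked.", ["restaurant_search", "booking_tool"], "booking_tool"
)
unsupported = claimed_confirmation_without_tool(
    "Your table is booked.", ["restaurant_search"], "booking_tool"
)
print(honest, unsupported)  # False True
```

Substring matching is crude: a sentence such as "I could not get it booked" would also match, so a production check would need something more careful.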

Use fail-fast assertions when the first failure should stop the test:

expect(result).used_tool("restaurant_search")

Use collected assertions when you want one run to report multiple failures:

check = expect(result, collect=True)
check.used_tool("restaurant_search")
check.used_tool("booking_tool")
check.did_not_claim_confirmation_without_tool("booking_tool")
check.verify()
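
The difference between the two modes can be sketched with a small standalone class. `Checker` is a hypothetical miniature written for this illustration, not AgentCheck's implementation: in fail-fast mode the first failed check raises immediately, while in collect mode failures are stored until `verify()` is called.

```python
class Checker:
    """Hypothetical mini-version of an expect(...) object; not AgentCheck's code."""

    def __init__(self, tool_calls: list[str], collect: bool = False) -> None:
        self.tool_calls = tool_calls
        self.collect = collect
        self.failures: list[str] = []

    def used_tool(self, name: str) -> "Checker":
        if name not in self.tool_calls:
            self._fail(f"expected tool {name!r} to be used")
        return self

    def _fail(self, message: str) -> None:
        if self.collect:
            self.failures.append(message)  # remember it and keep going
        else:
            raise AssertionError(message)  # fail fast on the first problem

    def verify(self) -> None:
        # Raise once, listing every failure gathered in collect mode.
        if self.failures:
            raise AssertionError("; ".join(self.failures))


check = Checker(tool_calls=["calculator"], collect=True)
check.used_tool("restaurant_search")
check.used_tool("booking_tool")

report = ""
try:
    check.verify()
except AssertionError as exc:
    report = str(exc)
print(report)
```

With `collect=False`, the first `used_tool` call above would raise and the missing `booking_tool` would go unreported.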

Release Readiness

Before pushing or sharing the repo, use:
