Skip to main content

Pytest-style behavioral regression testing for AI agents.

Project description

AgentCheck

AgentCheck is pytest for AI agents. Test behavior, not exact text.

Install from PyPI:

pip install pygent-test

Install from source:

python -m pip install -e .

Optional framework extras:

pip install "pygent-test[langgraph]"
pip install "pygent-test[openai]"

Start Here

If you want the fastest local validation from a source checkout:

python -m pip install -e .
python -m agentcheck.cli test examples
python -m agentcheck.cli bless examples
python -m agentcheck.cli test regression_examples

That flow shows:

  • a healthy agent test passing
  • a suite-specific baseline being saved
  • an intentionally broken agent failing for clear behavioral reasons

What It Does

AgentCheck helps you verify agent behavior such as:

  • which tools were used
  • whether tools were used in the expected order
  • whether the agent stayed within a step budget
  • whether the agent claimed success without tool evidence
  • whether behavior regressed against a saved baseline

Current Status

This repo already supports:

  • repeated-run behavioral tests with @agent_test(...)
  • local baseline and regression comparison
  • CLI commands: test, bless, compare, report
  • pytest integration
  • a plain Python adapter
  • an OpenAI Agents SDK adapter
  • a LangGraph adapter
  • real live OpenAI agent tests in integration_examples/

Quick Start

python -m pip install -e .
python -m agentcheck.cli test examples

Expected result:

  • test_booking_agent
  • Passed: 5
  • Failed: 0
  • Success rate: 100.0%

Minimal Example

from agentcheck import agent_test, expect
from examples.booking_agent import SimpleBookingAgent


@agent_test(runs=5, agent_factory=SimpleBookingAgent)
def test_booking_agent(agent: SimpleBookingAgent):
    result = agent.run("Book a table for 2 tonight")

    check = expect(result, collect=True)
    check.used_tool("restaurant_search")
    check.used_tool("booking_tool")
    check.steps_less_than(5)
    check.did_not_claim_confirmation_without_tool("booking_tool")
    check.verify()
    return result

Real Agent Testing

AgentCheck has been exercised against:

  • real OpenAI Agents SDK agents
  • real local LangGraph graphs built with StateGraph

Use the included repo live suite:

python -m agentcheck.cli test integration_examples

or:

python -m pytest integration_examples -q

The included live tests cover:

  • a single-tool weather assistant
  • a multi-tool research assistant

LangGraph support is tested through the regular unit suite and normalizes the common invoke({"messages": [...]}) flow into AgentResult.

Run the local LangGraph example with:

python -m agentcheck.cli test framework_examples

If LangGraph dependencies are not installed yet:

pip install "pygent-test[langgraph]"

Documentation

Use these docs depending on what you need:

Included Demos

Passing local demo:

python -m agentcheck.cli test examples

Intentional failure demo:

python -m agentcheck.cli test regression_examples

Commands

  • python -m agentcheck.cli test <path>
  • python -m agentcheck.cli bless <path>
  • python -m agentcheck.cli compare
  • python -m agentcheck.cli report

Smoke Test

If you are working from a source checkout, run a quick end-to-end validation with:

python scripts/smoke_test.py

To include the live OpenAI integration tests:

python scripts/smoke_test.py --with-live

Every agentcheck test run also writes:

  • JSON report: .agentcheck/reports/latest.json
  • Markdown report: .agentcheck/reports/latest.md

Every agentcheck bless <path> stores a suite-specific baseline under .agentcheck/baselines/.

Baselines are guarded against unrelated suites. If the current suite and saved baseline suite do not match exactly, AgentCheck warns instead of comparing them. For older baseline files without suite metadata, it falls back to matching test names.

Pytest

AgentCheck tests can also run through pytest:

python -m pytest examples -q
python -m pytest tests -q
python -m pytest integration_examples -q

Decorated @agent_test(...) functions are collected as AgentCheck test items, and each item still runs its configured repeated-run behavior.

CI

AgentCheck already writes:

  • .agentcheck/reports/latest.json
  • .agentcheck/reports/latest.md

In GitHub Actions, if GITHUB_STEP_SUMMARY is present, the Markdown report is also published to the step summary automatically.

Example:

- name: Run AgentCheck examples
  run: python -m agentcheck.cli test examples --fail-on-regression

Minimal baseline-aware CI flow:

- name: Bless demo baseline
  run: python -m agentcheck.cli bless examples

- name: Run AgentCheck examples
  run: python -m agentcheck.cli test examples --fail-on-regression

Assertions

Current built-in assertions:

  • used_tool(...)
  • used_tool_times(...)
  • used_tool_at_least(...)
  • used_tool_at_most(...)
  • did_not_use_tool(...)
  • used_tools_in_order([...])
  • steps_less_than(...)
  • finished_successfully()
  • did_not_error()
  • final_output_contains(...)
  • final_output_does_not_contain(...)
  • did_not_claim_confirmation_without_tool(...)

Use fail-fast assertions:

expect(result).used_tool("restaurant_search")

Use collected assertions when you want one run to report multiple failures:

check = expect(result, collect=True)
check.used_tool("restaurant_search")
check.used_tool("booking_tool")
check.did_not_claim_confirmation_without_tool("booking_tool")
check.verify()

Roadmap

This is the first step.

Near-term priorities:

  • cleaner regression summaries
  • better onboarding for testing a real agent in under 5 minutes
  • more adapters based on actual user demand

Longer-term directions:

  • stronger regression analysis
  • better flakiness reporting
  • richer CI workflows
  • optional hosted features only if the core library proves valuable

For a more detailed breakdown, see ROADMAP.md.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pygent_test-0.1.3.tar.gz (22.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pygent_test-0.1.3-py3-none-any.whl (22.9 kB view details)

Uploaded Python 3

File details

Details for the file pygent_test-0.1.3.tar.gz.

File metadata

  • Download URL: pygent_test-0.1.3.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pygent_test-0.1.3.tar.gz
Algorithm Hash digest
SHA256 5688107fe0c13a2b667d9abd26197a45b348d6b4af83e07862c429f10bb4eb96
MD5 6e02b8567b544ba61c68b4118145cc77
BLAKE2b-256 1040c82418225676ca912b147111c3505cc82a05977c87f87ab7c3fd9832d21a

See more details on using hashes here.

Provenance

The following attestation bundles were made for pygent_test-0.1.3.tar.gz:

Publisher: publish-pypi.yml on ashutosh-rath02/pygent-test

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pygent_test-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pygent_test-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 22.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pygent_test-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 bc0d924d330f08d02e40f377f296469c2996cd7e1ac33d8b5d0eced78b5d9add
MD5 6871d7b23a917819ca19e0b976b13586
BLAKE2b-256 a4b730a55797ddedb61ed3c8064ae290a11e24993e00be61cdfd3bd6af90bfa4

See more details on using hashes here.

Provenance

The following attestation bundles were made for pygent_test-0.1.3-py3-none-any.whl:

Publisher: publish-pypi.yml on ashutosh-rath02/pygent-test

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page