Pytest-style behavioral regression testing for AI agents.
AgentCheck
AgentCheck is pytest for AI agents. Test behavior, not exact text.
Install from source today:
python -m pip install -e .
Planned published package install:
pip install pygent-test
What It Does
AgentCheck helps you verify agent behavior such as:
- which tools were used
- whether tools were used in the expected order
- whether the agent stayed within a step budget
- whether the agent claimed success without tool evidence
- whether behavior regressed against a saved baseline
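To make the checks above concrete, here is a minimal, self-contained sketch of how such behavioral checks could be expressed over a recorded run. The `Run` dataclass and both helper functions are hypothetical illustrations; they are not AgentCheck's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical record of one agent run (not AgentCheck's real type)."""
    tool_calls: list[str] = field(default_factory=list)  # tool names, in call order
    steps: int = 0
    final_output: str = ""


def used_in_order(run: Run, expected: list[str]) -> bool:
    """True if `expected` appears as a subsequence of the run's tool calls."""
    calls = iter(run.tool_calls)
    # `name in calls` advances the iterator, so order is enforced.
    return all(name in calls for name in expected)


def claimed_without_evidence(run: Run, phrase: str, tool: str) -> bool:
    """True if the output contains `phrase` although `tool` was never called."""
    return phrase.lower() in run.final_output.lower() and tool not in run.tool_calls


good = Run(["restaurant_search", "booking_tool"], steps=3,
           final_output="Your table is confirmed.")
bad = Run(["restaurant_search"], steps=2,
          final_output="Your table is confirmed.")

print(used_in_order(good, ["restaurant_search", "booking_tool"]))
print(good.steps < 5)  # step-budget check
print(claimed_without_evidence(bad, "confirmed", "booking_tool"))
```

Each check reads only the recorded tool calls, step count, and final output, so a test written this way keeps passing when the agent rewords its answer.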
Current Status
This repo already supports:
- repeated-run behavioral tests with @agent_test(...)
- local baseline and regression comparison
- CLI commands: test, bless, compare, report
- pytest integration
- a plain Python adapter
- an OpenAI Agents SDK adapter
- real live OpenAI agent tests in integration_examples/
Quick Start
python -m pip install -e .
python -m agentcheck.cli test examples
Minimal Example
from agentcheck import agent_test, expect
from examples.booking_agent import SimpleBookingAgent

@agent_test(runs=5, agent_factory=SimpleBookingAgent)
def test_booking_agent(agent: SimpleBookingAgent):
    result = agent.run("Book a table for 2 tonight")
    check = expect(result, collect=True)
    check.used_tool("restaurant_search")
    check.used_tool("booking_tool")
    check.steps_less_than(5)
    check.did_not_claim_confirmation_without_tool("booking_tool")
    check.verify()
    return result
Real Agent Testing
AgentCheck has been exercised against real OpenAI Agents SDK agents.
Use the included live suite:
python -m agentcheck.cli test integration_examples
or:
python -m pytest integration_examples -q
The included live tests cover:
- a single-tool weather assistant
- a multi-tool research assistant
Documentation
Use these docs depending on what you need:
- TECHNICAL_GUIDE.md: detailed developer guide covering architecture, assertions, adapters, and workflows
- REAL_WORLD_TESTING.md: real OpenAI Agents SDK testing setup and examples
Included Demos
Passing local demo:
python -m agentcheck.cli test examples
Intentional failure demo:
python -m agentcheck.cli test regression_examples --fail-on-regression
Commands
python -m agentcheck.cli test <path>
python -m agentcheck.cli bless <path>
python -m agentcheck.cli compare
python -m agentcheck.cli report
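The bless and compare names suggest a save-then-diff workflow around a stored baseline. The sketch below illustrates that general idea with a JSON file. The file layout, the function bodies, and the summary fields (`tools`, `pass_rate`) are assumptions made for illustration and may differ from what AgentCheck actually stores or reports.

```python
import json
import tempfile
from pathlib import Path


def bless(summary: dict, path: Path) -> None:
    """Save the current behavior summary as the accepted baseline (hypothetical)."""
    path.write_text(json.dumps(summary, indent=2, sort_keys=True))


def compare(summary: dict, path: Path) -> list[str]:
    """Return readable differences between `summary` and the saved baseline."""
    baseline = json.loads(path.read_text())
    diffs = []
    for key in sorted(set(baseline) | set(summary)):
        if baseline.get(key) != summary.get(key):
            diffs.append(f"{key}: {baseline.get(key)!r} -> {summary.get(key)!r}")
    return diffs


with tempfile.TemporaryDirectory() as tmp:
    baseline_path = Path(tmp) / "baseline.json"
    # Accept a known-good run, then compare a later run against it.
    bless({"tools": ["restaurant_search", "booking_tool"], "pass_rate": 1.0},
          baseline_path)
    regressions = compare({"tools": ["restaurant_search"], "pass_rate": 0.6},
                          baseline_path)

for line in regressions:
    print(line)
```

In a CI job, a non-empty list of differences would be the natural signal to fail the build.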
Smoke Test
Run a quick end-to-end validation with:
python scripts/smoke_test.py
To include the live OpenAI integration tests:
python scripts/smoke_test.py --with-live
Pytest
AgentCheck tests can also run through pytest:
python -m pytest examples -q
python -m pytest tests -q
python -m pytest integration_examples -q
Decorated @agent_test(...) functions are collected as AgentCheck test items, and each item still runs its configured repeated-run behavior.
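The repeated-run idea can be pictured with a small stand-alone sketch: run the same test body several times and summarize how many runs passed. The `repeat` decorator and its report dictionary are hypothetical and are not how `@agent_test` is implemented.

```python
import functools
import random


def repeat(runs: int):
    """Hypothetical decorator: call the test `runs` times and report the pass rate."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            failures = []
            for i in range(runs):
                try:
                    test_fn(*args, **kwargs)
                except AssertionError as exc:
                    failures.append(f"run {i + 1}: {exc}")
            return {"runs": runs, "passed": runs - len(failures), "failures": failures}
        return wrapper
    return decorator


rng = random.Random(0)  # seeded so the sketch is reproducible


@repeat(runs=5)
def check_flaky_agent():
    # Stand-in for an agent that skips a tool some of the time.
    tools = ["search", "book"] if rng.random() < 0.8 else ["search"]
    assert "book" in tools, "booking tool was not used"


report = check_flaky_agent()
print(f"{report['passed']}/{report['runs']} runs passed")
```

Reporting a pass count instead of a single pass/fail makes intermittent failures visible, which a one-shot test would miss most of the time.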
Assertions
Current built-in assertions:
- used_tool(...)
- did_not_use_tool(...)
- used_tools_in_order([...])
- steps_less_than(...)
- finished_successfully()
- did_not_error()
- final_output_contains(...)
- final_output_does_not_contain(...)
- did_not_claim_confirmation_without_tool(...)
Use fail-fast assertions:
expect(result).used_tool("restaurant_search")
Use collected assertions when you want one run to report multiple failures:
check = expect(result, collect=True)
check.used_tool("restaurant_search")
check.used_tool("booking_tool")
check.did_not_claim_confirmation_without_tool("booking_tool")
check.verify()
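The difference between the two modes can be pictured with a small stand-alone sketch. The `Checker` class below is hypothetical and only mimics the shape of the calls shown above; it is not AgentCheck's `expect` implementation.

```python
class Checker:
    """Hypothetical checker: raises at once, or collects failures until verify()."""

    def __init__(self, tools_used: list[str], collect: bool = False):
        self.tools_used = tools_used
        self.collect = collect
        self.failures: list[str] = []

    def used_tool(self, name: str) -> "Checker":
        if name not in self.tools_used:
            self._fail(f"expected tool {name!r} to be used")
        return self

    def _fail(self, message: str) -> None:
        if not self.collect:
            raise AssertionError(message)  # fail-fast mode stops here
        self.failures.append(message)      # collect mode keeps going

    def verify(self) -> None:
        """Raise one error listing every failure recorded in collect mode."""
        if self.failures:
            raise AssertionError("; ".join(self.failures))


check = Checker(tools_used=["restaurant_search"], collect=True)
check.used_tool("booking_tool")
check.used_tool("payment_tool")
try:
    check.verify()
except AssertionError as exc:
    collected_message = str(exc)
    print(collected_message)
```

In this sketch, forgetting to call `verify()` in collect mode would silently hide failures, which is the trade-off for seeing every failure from one run.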
Roadmap
This is the first step.
Near-term priorities:
- cleaner reports, starting with Markdown output
- a few more broadly useful assertions
- better onboarding for testing a real agent in under 5 minutes
- more adapters based on actual user demand
Longer-term directions:
- stronger regression analysis
- better flakiness reporting
- richer CI workflows
- optional hosted features only if the core library proves valuable