Pytest-style behavioral regression testing for AI agents.
Project description
AgentCheck
AgentCheck is pytest for AI agents. Test behavior, not exact text.
Install from PyPI:
pip install pygent-test
Install from source:
python -m pip install -e .
Optional framework extras:
pip install "pygent-test[langgraph]"
pip install "pygent-test[openai]"
What It Does
AgentCheck helps you verify agent behavior such as:
- which tools were used
- whether tools were used in the expected order
- whether the agent stayed within a step budget
- whether the agent claimed success without tool evidence
- whether behavior regressed against a saved baseline
Current Status
This repo already supports:
- repeated-run behavioral tests with
@agent_test(...) - local baseline and regression comparison
- CLI commands:
test,bless,compare,report - pytest integration
- a plain Python adapter
- an OpenAI Agents SDK adapter
- a LangGraph adapter
- real live OpenAI agent tests in
integration_examples/
Quick Start
python -m pip install -e .
python -m agentcheck.cli test examples
Minimal Example
from agentcheck import agent_test, expect
from examples.booking_agent import SimpleBookingAgent
@agent_test(runs=5, agent_factory=SimpleBookingAgent)
def test_booking_agent(agent: SimpleBookingAgent):
result = agent.run("Book a table for 2 tonight")
check = expect(result, collect=True)
check.used_tool("restaurant_search")
check.used_tool("booking_tool")
check.steps_less_than(5)
check.did_not_claim_confirmation_without_tool("booking_tool")
check.verify()
return result
Real Agent Testing
AgentCheck has been exercised against:
- real OpenAI Agents SDK agents
- real local LangGraph graphs built with
StateGraph
Use the included repo live suite:
python -m agentcheck.cli test integration_examples
or:
python -m pytest integration_examples -q
The included live tests cover:
- a single-tool weather assistant
- a multi-tool research assistant
LangGraph support is tested through the regular unit suite and normalizes the common
invoke({"messages": [...]}) flow into AgentResult.
Run the local LangGraph example with:
python -m agentcheck.cli test framework_examples
Documentation
Use these docs depending on what you need:
- TECHNICAL_GUIDE.md Detailed developer guide covering architecture, assertions, adapters, and workflows
- ADAPTER_GUIDE.md How adapters are structured and how to build a new one
- REAL_WORLD_TESTING.md Real OpenAI Agents SDK testing setup and examples
- ROADMAP.md What is done, what is next, and what is planned later
Included Demos
Passing local demo:
python -m agentcheck.cli test examples
Intentional failure demo:
python -m agentcheck.cli test regression_examples --fail-on-regression
Commands
python -m agentcheck.cli test <path>python -m agentcheck.cli bless <path>python -m agentcheck.cli comparepython -m agentcheck.cli report
Smoke Test
If you are working from a source checkout, run a quick end-to-end validation with:
python scripts/smoke_test.py
To include the live OpenAI integration tests:
python scripts/smoke_test.py --with-live
Every agentcheck test run also writes:
- JSON report:
.agentcheck/reports/latest.json - Markdown report:
.agentcheck/reports/latest.md
Every agentcheck bless <path> stores a suite-specific baseline under
.agentcheck/baselines/.
Baselines are guarded against unrelated suites. If the current suite and saved baseline suite do not match exactly, AgentCheck warns instead of comparing them. For older baseline files without suite metadata, it falls back to matching test names.
Pytest
AgentCheck tests can also run through pytest:
python -m pytest examples -q
python -m pytest tests -q
python -m pytest integration_examples -q
Decorated @agent_test(...) functions are collected as AgentCheck test items, and each item still runs its configured repeated-run behavior.
Assertions
Current built-in assertions:
used_tool(...)used_tool_times(...)used_tool_at_least(...)used_tool_at_most(...)did_not_use_tool(...)used_tools_in_order([...])steps_less_than(...)finished_successfully()did_not_error()final_output_contains(...)final_output_does_not_contain(...)did_not_claim_confirmation_without_tool(...)
Use fail-fast assertions:
expect(result).used_tool("restaurant_search")
Use collected assertions when you want one run to report multiple failures:
check = expect(result, collect=True)
check.used_tool("restaurant_search")
check.used_tool("booking_tool")
check.did_not_claim_confirmation_without_tool("booking_tool")
check.verify()
Roadmap
This is the first step.
Near-term priorities:
- cleaner regression summaries
- better onboarding for testing a real agent in under 5 minutes
- more adapters based on actual user demand
Longer-term directions:
- stronger regression analysis
- better flakiness reporting
- richer CI workflows
- optional hosted features only if the core library proves valuable
For a more detailed breakdown, see ROADMAP.md.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pygent_test-0.1.2.tar.gz.
File metadata
- Download URL: pygent_test-0.1.2.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4a25716f040439aad604277fd3d3c9d3067efd3d482130d12d3efe5cff0f79fa
|
|
| MD5 |
fb63c9081f88beadc11a572078222d1e
|
|
| BLAKE2b-256 |
f8d274735fda6cc1cc878c77ad33add5b62adff5d3d40314db8897abe5546c19
|
Provenance
The following attestation bundles were made for pygent_test-0.1.2.tar.gz:
Publisher:
publish-pypi.yml on ashutosh-rath02/pygent-test
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
pygent_test-0.1.2.tar.gz -
Subject digest:
4a25716f040439aad604277fd3d3c9d3067efd3d482130d12d3efe5cff0f79fa - Sigstore transparency entry: 1397983237
- Sigstore integration time:
-
Permalink:
ashutosh-rath02/pygent-test@cc64d118c73f1615ee64d1021e0584f6b398a686 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/ashutosh-rath02
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@cc64d118c73f1615ee64d1021e0584f6b398a686 -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file pygent_test-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pygent_test-0.1.2-py3-none-any.whl
- Upload date:
- Size: 22.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
41641c1038041728747b8d2a5cd0240b46c450886b928e633ed7dd628a829305
|
|
| MD5 |
dfc88add910a984cb98fb09421e8d505
|
|
| BLAKE2b-256 |
08f181b42c573f526416bbb7abe0350778b4010a05c527ddd2a921ae86155c57
|
Provenance
The following attestation bundles were made for pygent_test-0.1.2-py3-none-any.whl:
Publisher:
publish-pypi.yml on ashutosh-rath02/pygent-test
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
pygent_test-0.1.2-py3-none-any.whl -
Subject digest:
41641c1038041728747b8d2a5cd0240b46c450886b928e633ed7dd628a829305 - Sigstore transparency entry: 1397983251
- Sigstore integration time:
-
Permalink:
ashutosh-rath02/pygent-test@cc64d118c73f1615ee64d1021e0584f6b398a686 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/ashutosh-rath02
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@cc64d118c73f1615ee64d1021e0584f6b398a686 -
Trigger Event:
workflow_dispatch
-
Statement type: