AgentHarness

Open-source test harness for AI agents that take real-world actions.
Problem
Teams can observe agents in traces and score outputs with existing tools, but they lack a shared, pytest-friendly way to assert the sequence of tool calls, arguments, and safety properties on a run. AgentHarness provides trace-oriented assertions and a CLI so those checks can run in CI without calling real APIs by default. It is complementary to observability and LLM evaluation stacks.
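To make "trace-oriented assertions" concrete, here is a minimal, self-contained sketch of the idea: a trace is an ordered list of tool-call records, and an assertion checks a property of that ordering. The `ToolCall` class and the function body are illustrative stand-ins, not AgentHarness's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for a trace record; the real
# AgentHarness record types may use different names and fields.
@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)

def assert_called_before(trace, earlier_tool, later_tool):
    """Fail unless the first call to earlier_tool precedes the first call to later_tool."""
    tools = [c.tool for c in trace]
    assert earlier_tool in tools, f"{earlier_tool} was never called"
    assert later_tool in tools, f"{later_tool} was never called"
    assert tools.index(earlier_tool) < tools.index(later_tool), (
        f"{earlier_tool} must run before {later_tool}"
    )

trace = [ToolCall("lookup_order"), ToolCall("issue_refund", {"amount": 40})]
assert_called_before(trace, "lookup_order", "issue_refund")
```

Because the checks are plain assertions over recorded calls, they run against a mock trace in CI with no real API traffic.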
Install
Core package (editable, from repo root):

```shell
pip install -e "."
```

LangGraph adapter, dev tooling, and optional resource/cost helpers (matches CI and typical agent projects):

```shell
pip install -e ".[langgraph,dev]"
```
`[dev]` includes pytest-asyncio, typing stubs, and pulls in `[resource]` (tokencost) per pyproject.toml. Other extras: `[openai]`, `[anthropic]`, `[crewai]`, `[live]`, `[compliance]`, `[langfuse]`, `[arize]`, or `[all]`.
Quickstart
Behavioral checks use @scenario, the run fixture, and assertions on run.trace. For a trace that includes tool arguments (not only tool names), run the bundled LangGraph example test from the repo root:
```python
from agentharness import (
    assert_approval_gate,
    assert_arg_lte,
    assert_called_before,
    scenario,
)

@scenario("examples/01_customer_support_langgraph/scenarios/happy_path.yaml")
def test_happy_path(run):
    assert_called_before(run.trace, "lookup_order", "issue_refund")
    assert_arg_lte(run.trace, tool="issue_refund", arg="amount", value=100)
    assert_approval_gate(run.trace, tool="issue_refund")
```

```shell
pip install -e ".[langgraph,dev]"
python -m pytest examples/01_customer_support_langgraph/test_refund_agent.py::test_happy_path -q
```
The example package overrides the run fixture so YAML steps execute under LangGraph with recorded args. More detail: examples/01_customer_support_langgraph/README.md.
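Conceptually, the override swaps the default mock run for one that executes the scenario's steps and records each tool call, arguments included, as it happens. A framework-free sketch of that record-as-you-execute pattern (every name here is illustrative, not AgentHarness API):

```python
# Illustrative only: shows the recording pattern the example relies on.
# The real example wires this to LangGraph via a pytest fixture override.
class RecordingRun:
    def __init__(self):
        self.trace = []  # ordered (tool_name, args) records

    def call_tool(self, name, **args):
        self.trace.append((name, dict(args)))  # capture args, not just names
        return {"status": "ok"}  # a mock would return canned output here

run = RecordingRun()
run.call_tool("lookup_order", order_id="1234")
run.call_tool("issue_refund", amount=40, approved=True)
```

Because arguments are captured alongside tool names, assertions like `assert_arg_lte` and `assert_approval_gate` have something to inspect.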
What AgentHarness is not
- Not a monitoring or observability platform (use LangFuse or Arize Phoenix for that)
- Not a full LLMOps platform
- Not framework-specific (not a LangChain product)
- Not an LLM benchmark (not SWE-bench or WebArena)
- Not a replacement for LangSmith, DeepEval, or TruLens — complementary behavioral testing over traces
Available assertions
| Function | What it checks | Regulatory reference (from `REFS_*` in assertions/base.py) |
|---|---|---|
| `assert_called_before` | First occurrence of `earlier_tool` before first of `later_tool` | EU AI Act Article 9; NIST AI RMF TEVV Verify |
| `assert_call_count` | Tool appears exactly `expected` times in order | EU AI Act Article 9 |
| `assert_completion` | No ERROR status on tool spans / no errors on ToolCallRecords | EU AI Act Article 9; NIST AI RMF TEVV Validate |
| `assert_mutual_exclusion` | Two tools are not both invoked in the same run | EU AI Act Article 9 |
| `assert_arg_lte` | Every call to `tool` has `args[arg] <= value` (numeric) | EU AI Act Article 15; Colorado SB 24-205 |
| `assert_arg_pattern` | Args match a regex | EU AI Act Article 15 |
| `assert_arg_schema` | Args validate against a JSON Schema | EU AI Act Article 9; NIST AI RMF TEVV Verify |
| `assert_arg_not_contains` | Args do not contain forbidden substrings | EU AI Act Article 15; OWASP LLM06:2025 |
| `assert_approval_gate` | Each call to `tool` has `approved` or `approval_id` in args | EU AI Act Article 14; Colorado SB 24-205; OWASP LLM06:2025 |
| `assert_no_loop` | Tool call count for `tool` does not exceed `max_calls` | EU AI Act Article 9; OWASP LLM10:2025 |
| `assert_cost_under` | Estimated trace cost at most `max_usd` (via trace attrs and optional tokencost) | OWASP LLM10:2025; EU AI Act Article 9 |
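The argument and loop checks above reduce to simple predicates over recorded calls. A simplified sketch of two of them, written against a bare list of `(tool_name, args)` tuples rather than AgentHarness's real trace objects, whose signatures may differ:

```python
from collections import Counter

# Illustrative re-implementations over a simplified trace representation;
# the shipped assertions operate on AgentHarness trace/record objects.
def assert_arg_lte(trace, tool, arg, value):
    """Every call to `tool` must have a numeric args[arg] <= value."""
    for name, args in trace:
        if name == tool:
            assert args[arg] <= value, f"{tool}.{arg}={args[arg]} exceeds {value}"

def assert_no_loop(trace, tool, max_calls):
    """`tool` must not be called more than max_calls times in the run."""
    count = Counter(name for name, _ in trace)[tool]
    assert count <= max_calls, f"{tool} called {count} times (max {max_calls})"

trace = [("lookup_order", {"order_id": "1234"}), ("issue_refund", {"amount": 40})]
assert_arg_lte(trace, tool="issue_refund", arg="amount", value=100)
assert_no_loop(trace, tool="issue_refund", max_calls=1)
```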
CLI
```shell
agentharness run <scenario.yaml>
python -m agentharness run <scenario.yaml>
```

Both forms accept an optional `--mode mock|live` flag (default: `mock`).
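For reference, the command surface described above can be modeled with a few lines of argparse. This is a hypothetical sketch of the interface shape only; the real entry point lives in the agentharness package.

```python
import argparse

# Hypothetical mirror of the documented CLI surface: a `run` subcommand
# taking a scenario path and an optional --mode flag defaulting to mock.
def build_parser():
    parser = argparse.ArgumentParser(prog="agentharness")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="execute a scenario file")
    run.add_argument("scenario", help="path to a scenario YAML file")
    run.add_argument("--mode", choices=["mock", "live"], default="mock",
                     help="mock (default) avoids real API calls")
    return parser

args = build_parser().parse_args(["run", "happy_path.yaml", "--mode", "mock"])
```

Defaulting to `mock` keeps CI runs hermetic; `live` must be requested explicitly.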
Roadmap
Phase 0 (foundation, pytest plugin, assertions, LangGraph adapter,
CLI run, example agent) is complete. Phase 1 adds record/replay,
OpenAI and CrewAI adapters, multi-run statistical mode, and a PyPI
release. A public roadmap will be published before the Phase 1
launch.
License
Apache License 2.0 — see LICENSE and pyproject.toml.
Contributing
See CONTRIBUTING.md for workflow and review expectations.
File details
Details for the file pytest_agentharness-0.1.0a1.tar.gz.
File metadata
- Download URL: pytest_agentharness-0.1.0a1.tar.gz
- Upload date:
- Size: 47.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d713d359a2a62a7423d269655c245e3c617d120a344fccc85d1c98a093c04211` |
| MD5 | `603026c351e9fadef3866a2f36ced04a` |
| BLAKE2b-256 | `e864806896415484c11e5f4367f7a84206af2a87e2f0491a336c274c40f23c9c` |
File details
Details for the file pytest_agentharness-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: pytest_agentharness-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 60.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3351d926af206d85d9a3075da44f0e5a6537e51aaf1af58590e650ca8005def7` |
| MD5 | `bd7f4441879e996d7fc85a1c73ae50a3` |
| BLAKE2b-256 | `c7922e303f6ce7f905fe139bc2074b24526ff1139ae031a6248e87b54c351cc7` |