Open-source test harness for AI agents that take real-world actions.


Agent-Harness

Problem

Teams can observe agents in traces and score outputs with existing tools, but they lack a shared, pytest-friendly way to assert the sequence of tool calls, arguments, and safety properties on a run. Agent-Harness provides trace-oriented assertions and a CLI so those checks can run in CI without calling real APIs by default. It is complementary to observability and LLM evaluation stacks.
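To make "trace-oriented assertion" concrete, here is a minimal, library-independent sketch. It assumes a run's trace can be reduced to an ordered list of tool-call records. The `ToolCall` class, `first_index`, and `check_called_before` are hypothetical illustrations, not Agent-Harness's API. The ordering rule (the first call to one tool must precede the first call to another) follows the description of `assert_called_before` in the assertions table. This sketch treats a tool that never ran as a failure, which is an assumption about the edge case, not documented library behavior.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation in a run's trace."""
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


def first_index(trace: list[ToolCall], tool: str) -> int:
    """Position of the first call to `tool`; fails if the tool never ran (assumed behavior)."""
    for i, call in enumerate(trace):
        if call.tool == tool:
            return i
    raise AssertionError(f"{tool!r} was never called")


def check_called_before(trace: list[ToolCall], earlier: str, later: str) -> None:
    """Fail unless the first call to `earlier` precedes the first call to `later`."""
    i, j = first_index(trace, earlier), first_index(trace, later)
    if i >= j:
        raise AssertionError(
            f"{earlier!r} (index {i}) did not run before {later!r} (index {j})"
        )


# A recorded run: only the sequence of calls and their args, no real API traffic.
trace = [
    ToolCall("lookup_order", {"order_id": "A123"}),
    ToolCall("issue_refund", {"amount": 40, "approved": True}),
]
check_called_before(trace, "lookup_order", "issue_refund")  # passes silently
```

Because the check only reads the recorded sequence, it can run in CI without any network access.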

Install

Core package (editable, from repo root):

pip install -e "."

LangGraph adapter, dev tooling, and optional resource/cost helpers (matches CI and typical agent projects):

pip install -e ".[langgraph,dev]"

[dev] includes pytest-asyncio and typing stubs, and pulls in the [resource] extra (tokencost), per pyproject.toml. Other extras: [openai], [anthropic], [crewai], [live], [compliance], [langfuse], [arize], or [all].

PyPI (alpha): the published package is pytest-agentharness. To install the same extras used by CI and the LangGraph example:

pip install "pytest-agentharness[langgraph,dev]==0.1.0a2"

The GitHub repository name is Agent-Harness; the Python package import remains agentharness.

Quickstart

Behavioral checks use the @scenario decorator, the run fixture, and assertions on run.trace. To get a trace that includes tool arguments (not only tool names), use the bundled LangGraph example. Its happy-path test looks like this:

from agentharness import (
    assert_approval_gate,
    assert_arg_lte,
    assert_called_before,
    scenario,
)


@scenario("examples/01_customer_support_langgraph/scenarios/happy_path.yaml")
def test_happy_path(run):
    assert_called_before(run.trace, "lookup_order", "issue_refund")
    assert_arg_lte(run.trace, tool="issue_refund", arg="amount", value=100)
    assert_approval_gate(run.trace, tool="issue_refund")

Install the extras and run that test from the repo root:

pip install -e ".[langgraph,dev]"
python -m pytest examples/01_customer_support_langgraph/test_refund_agent.py::test_happy_path -q

The example package overrides the run fixture so YAML steps execute under LangGraph with recorded args. More detail: examples/01_customer_support_langgraph/README.md.

What Agent-Harness is not

Not a monitoring or observability platform (use Langfuse or Arize Phoenix for that)
Not a full LLMOps platform
Not framework-specific (not a LangChain product)
Not an LLM benchmark (not SWE-bench or WebArena)
Not a replacement for LangSmith, DeepEval, or TruLens; it provides complementary behavioral testing over traces

Available assertions

Function	What it checks	Regulatory reference (from `REFS_*` in `assertions/base.py`)
`assert_called_before`	First occurrence of `earlier_tool` before first of `later_tool`	EU AI Act Article 9; NIST AI RMF TEVV Verify
`assert_call_count`	Tool appears exactly `expected` times in order	EU AI Act Article 9
`assert_completion`	No ERROR status on tool spans / no errors on `ToolCallRecord`s	EU AI Act Article 9; NIST AI RMF TEVV Validate
`assert_mutual_exclusion`	Two tools are not both invoked in the same run	EU AI Act Article 9
`assert_arg_lte`	Every call to `tool` has `args[arg] <= value` (numeric)	EU AI Act Article 15; Colorado SB 24-205
`assert_arg_pattern`	Args match a regex	EU AI Act Article 15
`assert_arg_schema`	Args validate against a JSON Schema	EU AI Act Article 9; NIST AI RMF TEVV Verify
`assert_arg_not_contains`	Args do not contain forbidden substrings	EU AI Act Article 15; OWASP LLM06:2025
`assert_approval_gate`	Each call to `tool` has `approved` or `approval_id` in args	EU AI Act Article 14; Colorado SB 24-205; OWASP LLM06:2025
`assert_no_loop`	Tool call count for `tool` does not exceed `max_calls`	EU AI Act Article 9; OWASP LLM10:2025
`assert_cost_under`	Estimated trace cost at most `max_usd` (via trace attrs and optional tokencost)	OWASP LLM10:2025; EU AI Act Article 9
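Most rows in the table reduce to simple predicates over the recorded calls. The sketch below re-implements four of them (`assert_arg_lte`, `assert_approval_gate`, `assert_no_loop`, `assert_mutual_exclusion`) as plain Python, based only on the "What it checks" column. The `check_*` names, the dict-based trace shape, and the tool name `escalate_to_human` are hypothetical; the real functions may differ in error messages, in how missing arguments are handled, or in whether `approved=False` counts as approval (this sketch checks only for the key's presence, as the table describes).

```python
from typing import Any

# Hypothetical trace shape: an ordered list of {"tool": name, "args": {...}} dicts.
Trace = list[dict[str, Any]]


def calls_to(trace: Trace, tool: str) -> list[dict[str, Any]]:
    """All calls to `tool`, in trace order."""
    return [call for call in trace if call["tool"] == tool]


def check_arg_lte(trace: Trace, tool: str, arg: str, value: float) -> None:
    """Every call to `tool` must have a numeric args[arg] <= value."""
    for call in calls_to(trace, tool):
        if arg not in call["args"]:
            raise AssertionError(f"{tool} call is missing argument {arg!r}")
        actual = call["args"][arg]
        if actual > value:
            raise AssertionError(f"{tool}.{arg}={actual} exceeds {value}")


def check_approval_gate(trace: Trace, tool: str) -> None:
    """Every call to `tool` must carry `approved` or `approval_id` in its args."""
    for call in calls_to(trace, tool):
        if "approved" not in call["args"] and "approval_id" not in call["args"]:
            raise AssertionError(f"{tool} called without approval: {call['args']}")


def check_no_loop(trace: Trace, tool: str, max_calls: int) -> None:
    """`tool` may be called at most `max_calls` times in one run."""
    n = len(calls_to(trace, tool))
    if n > max_calls:
        raise AssertionError(f"{tool} called {n} times (max {max_calls})")


def check_mutual_exclusion(trace: Trace, tool_a: str, tool_b: str) -> None:
    """`tool_a` and `tool_b` must not both appear in the same run."""
    if calls_to(trace, tool_a) and calls_to(trace, tool_b):
        raise AssertionError(f"{tool_a} and {tool_b} were both called")


trace: Trace = [
    {"tool": "lookup_order", "args": {"order_id": "A123"}},
    {"tool": "issue_refund", "args": {"amount": 40, "approval_id": "apr-1"}},
]
check_arg_lte(trace, tool="issue_refund", arg="amount", value=100)
check_approval_gate(trace, tool="issue_refund")
check_no_loop(trace, tool="lookup_order", max_calls=3)
check_mutual_exclusion(trace, "issue_refund", "escalate_to_human")
```

Each predicate raises `AssertionError`, so under pytest a failing check reports like any other failed test.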

CLI

agentharness run <scenario.yaml>
python -m agentharness run <scenario.yaml>

The optional --mode flag accepts mock or live. The default is mock, so scenarios do not call real APIs unless you opt in.
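One way to picture the mock/live split: in both modes every tool call is recorded for later assertions, but only live mode invokes the real, side-effecting function. The `ToolRunner` class and `real_refund` stand-in below are hypothetical and do not describe how Agent-Harness implements --mode; they only illustrate why a mock default keeps CI runs free of real API calls.

```python
from typing import Any, Callable

ToolFn = Callable[..., Any]


class ToolRunner:
    """Hypothetical dispatcher: 'mock' returns canned results, 'live' calls real tools."""

    def __init__(self, mode: str, live_tools: dict[str, ToolFn], mocks: dict[str, Any]):
        if mode not in ("mock", "live"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.live_tools = live_tools
        self.mocks = mocks
        self.trace: list[tuple[str, dict[str, Any]]] = []  # recorded (tool, args) pairs

    def call(self, tool: str, **args: Any) -> Any:
        self.trace.append((tool, args))  # record the call in both modes
        if self.mode == "mock":
            return self.mocks[tool]  # canned response; nothing external is touched
        return self.live_tools[tool](**args)  # real side effect


def real_refund(amount: float, **_: Any) -> dict[str, Any]:
    """Stand-in for a side-effecting call such as a payments API."""
    raise RuntimeError("would hit a payments API")


runner = ToolRunner(
    "mock",
    live_tools={"issue_refund": real_refund},
    mocks={"issue_refund": {"status": "ok"}},
)
result = runner.call("issue_refund", amount=40, approved=True)
```

Since the recorded trace is the same shape in both modes, the same assertions can be reused when a scenario is later run live.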

Demo and screenshots

Agent-Harness is terminal-first (pytest plus the CLI); there is no separate web UI to demo. Step-by-step commands and a short live demo script are in docs/demo.md.

Pytest: example happy path (test_happy_path). [Screenshot: Pytest happy path]

CLI: agentharness run on the safety scenario (mock mode). [Screenshot: CLI pass]

Replay and diff: cassette comparison. [Screenshot: Replay and diff]
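The replay-and-diff step compares a recorded cassette with a fresh run. As a rough illustration only, assuming a cassette can be flattened to an ordered list of tool calls (the actual cassette format is not described in this README), the standard library's `difflib.unified_diff` is enough to show where two runs diverge. The `render` and `diff_runs` helpers are hypothetical.

```python
import difflib
import json
from typing import Any


def render(calls: list[dict[str, Any]]) -> list[str]:
    """One line per call, with args serialized deterministically so diffs are stable."""
    return [f"{c['tool']} {json.dumps(c['args'], sort_keys=True)}" for c in calls]


def diff_runs(recorded: list[dict[str, Any]], current: list[dict[str, Any]]) -> list[str]:
    """Unified diff of two tool-call sequences; an empty list means they match."""
    return list(
        difflib.unified_diff(
            render(recorded),
            render(current),
            fromfile="cassette",
            tofile="current",
            lineterm="",
        )
    )


recorded = [
    {"tool": "lookup_order", "args": {"order_id": "A123"}},
    {"tool": "issue_refund", "args": {"amount": 40, "approved": True}},
]
current = [
    {"tool": "lookup_order", "args": {"order_id": "A123"}},
    {"tool": "issue_refund", "args": {"amount": 400, "approved": True}},
]
for line in diff_runs(recorded, current):
    print(line)
```

Sorting keys before serializing matters: without it, two runs with identical arguments in a different dict order could produce spurious diff lines.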

Roadmap

Phase 0 (foundation, pytest plugin, assertions, LangGraph adapter, CLI run command, example agent) is complete, and version 0.1.0a2 is published on PyPI as an alpha under the name pytest-agentharness. Phase 1 continues with additional adapters (e.g. OpenAI, CrewAI), a multi-run statistical mode, and follow-on releases. A fuller public roadmap is planned before the Phase 1 launch milestone.

License

Apache License 2.0 — see LICENSE and pyproject.toml.

Contributing

See CONTRIBUTING.md for workflow and review expectations.

Release history

0.1.0a2 (pre-release, this version): Apr 20, 2026

0.1.0a1 (pre-release): Apr 20, 2026

Download files

Source Distribution

pytest_agentharness-0.1.0a2.tar.gz (47.5 kB)

Uploaded Apr 20, 2026 Source

Built Distribution

pytest_agentharness-0.1.0a2-py3-none-any.whl (60.7 kB)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file pytest_agentharness-0.1.0a2.tar.gz.

File metadata

Download URL: pytest_agentharness-0.1.0a2.tar.gz
Upload date: Apr 20, 2026
Size: 47.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pytest_agentharness-0.1.0a2.tar.gz
Algorithm	Hash digest
SHA256	`2e1ca6b0298e0ec63e27da5d2cd87018218dcbcc8fdc9f3dc44e52b591ae0dd2`
MD5	`d2bde6b39f00c64abe84e4d8b12c16c3`
BLAKE2b-256	`8f60238e8d5614c34aae61055c86d32e3a359347ed6adbe2327bdef0f0f222b9`

File details

Details for the file pytest_agentharness-0.1.0a2-py3-none-any.whl.

File metadata

Download URL: pytest_agentharness-0.1.0a2-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 60.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pytest_agentharness-0.1.0a2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f398eb343882f650b6a27f10d09d46e5cf63c9163d95a463900f16e224ddbfe9`
MD5	`b8b3b78b0f116f6530e91c28f2d91f74`
BLAKE2b-256	`3cde772ec4be49522a5e82a90ae829b1980107c449452852f5bb4e279b886441`
