Open-source test harness for AI agents that take real-world actions.

Agent-Harness

Problem

Teams can observe agents in traces and score outputs with existing tools, but they lack a shared, pytest-friendly way to assert the sequence of tool calls, arguments, and safety properties on a run. Agent-Harness provides trace-oriented assertions and a CLI so those checks can run in CI without calling real APIs by default. It is complementary to observability and LLM evaluation stacks.

Install

Core package (editable, from repo root):

pip install -e "."

LangGraph adapter, dev tooling, and optional resource/cost helpers (matches CI and typical agent projects):

pip install -e ".[langgraph,dev]"

[dev] includes pytest-asyncio and typing stubs, and pulls in [resource] (tokencost), per pyproject.toml. Other extras: [openai], [anthropic], [crewai], [live], [compliance], [langfuse], [arize], or [all].

PyPI (alpha): The published package is pytest-agentharness. To match CI and the LangGraph example:

pip install "pytest-agentharness[langgraph,dev]==0.1.0a2"

The GitHub repository name is Agent-Harness; the Python package import remains agentharness.

Quickstart

Behavioral checks use @scenario, the run fixture, and assertions on run.trace. For a trace that includes tool arguments (not only tool names), run the bundled LangGraph example test from the repo root:

from agentharness import (
    assert_approval_gate,
    assert_arg_lte,
    assert_called_before,
    scenario,
)


@scenario("examples/01_customer_support_langgraph/scenarios/happy_path.yaml")
def test_happy_path(run):
    assert_called_before(run.trace, "lookup_order", "issue_refund")
    assert_arg_lte(run.trace, tool="issue_refund", arg="amount", value=100)
    assert_approval_gate(run.trace, tool="issue_refund")

Install the extras, then run the test:

pip install -e ".[langgraph,dev]"
python -m pytest examples/01_customer_support_langgraph/test_refund_agent.py::test_happy_path -q

The example package overrides the run fixture so YAML steps execute under LangGraph with recorded args. More detail: examples/01_customer_support_langgraph/README.md.

What Agent-Harness is not

  • Not a monitoring or observability platform (use Langfuse or Arize Phoenix for that)
  • Not a full LLMOps platform
  • Not framework-specific (not a LangChain product)
  • Not an LLM benchmark (not SWE-bench or WebArena)
  • Not a replacement for LangSmith, DeepEval, or TruLens — complementary behavioral testing over traces

Available assertions

| Function | What it checks | Regulatory reference (from REFS_* in assertions/base.py) |
| --- | --- | --- |
| assert_called_before | First occurrence of earlier_tool before first of later_tool | EU AI Act Article 9; NIST AI RMF TEVV Verify |
| assert_call_count | Tool appears exactly expected times in order | EU AI Act Article 9 |
| assert_completion | No ERROR status on tool spans / no errors on ToolCallRecords | EU AI Act Article 9; NIST AI RMF TEVV Validate |
| assert_mutual_exclusion | Two tools are not both invoked in the same run | EU AI Act Article 9 |
| assert_arg_lte | Every call to tool has args[arg] <= value (numeric) | EU AI Act Article 15; Colorado SB 24-205 |
| assert_arg_pattern | Args match a regex | EU AI Act Article 15 |
| assert_arg_schema | Args validate against a JSON Schema | EU AI Act Article 9; NIST AI RMF TEVV Verify |
| assert_arg_not_contains | Args do not contain forbidden substrings | EU AI Act Article 15; OWASP LLM06:2025 |
| assert_approval_gate | Each call to tool has approved or approval_id in args | EU AI Act Article 14; Colorado SB 24-205; OWASP LLM06:2025 |
| assert_no_loop | Tool call count for tool does not exceed max_calls | EU AI Act Article 9; OWASP LLM10:2025 |
| assert_cost_under | Estimated trace cost at most max_usd (via trace attrs and optional tokencost) | OWASP LLM10:2025; EU AI Act Article 9 |
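To make the semantics of a few of these checks concrete, here is a minimal, self-contained sketch in plain Python. It is not the library's implementation: the ToolCall dataclass, the helper names, and the trace layout are hypothetical stand-ins, and details such as how a missing tool is handled, or whether the approval check looks only at key presence, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical stand-in for one recorded tool call (not the library's type)."""
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


def called_before(trace: list[ToolCall], earlier: str, later: str) -> bool:
    """True if the first `earlier` call precedes the first `later` call.

    Assumption: if either tool never appears, the check fails.
    """
    names = [call.tool for call in trace]
    if earlier not in names or later not in names:
        return False
    return names.index(earlier) < names.index(later)


def arg_lte(trace: list[ToolCall], tool: str, arg: str, value: float) -> bool:
    """True if every call to `tool` has a numeric args[arg] <= value."""
    return all(
        isinstance(call.args.get(arg), (int, float)) and call.args[arg] <= value
        for call in trace
        if call.tool == tool
    )


def approval_gate(trace: list[ToolCall], tool: str) -> bool:
    """True if every call to `tool` carries `approved` or `approval_id`.

    Assumption: only key presence is checked, not the value.
    """
    return all(
        "approved" in call.args or "approval_id" in call.args
        for call in trace
        if call.tool == tool
    )


def no_loop(trace: list[ToolCall], tool: str, max_calls: int) -> bool:
    """True if `tool` was called at most `max_calls` times."""
    return sum(1 for call in trace if call.tool == tool) <= max_calls


# A hand-written trace resembling the refund example from the Quickstart.
trace = [
    ToolCall("lookup_order", {"order_id": "A-1"}),
    ToolCall("issue_refund", {"amount": 40, "approval_id": "mgr-7"}),
]
```

With this trace, the ordering, bound, approval, and loop checks all hold; lowering the bound below 40 or removing approval_id would make the corresponding helper return False. The real assertions raise on failure and operate on run.trace rather than returning booleans over a hand-built list.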

CLI

agentharness run <scenario.yaml>
python -m agentharness run <scenario.yaml>

The optional --mode flag accepts mock or live (default: mock).
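If you want to drive the CLI from a Python script (for example, a custom CI step), a thin subprocess wrapper is one option. The sketch below is an assumption-laden illustration: build_run_command and run_scenario are hypothetical helpers, placing --mode after the scenario path is assumed to be accepted, and treating a zero exit code as a pass is the usual CLI convention rather than something this README states.

```python
import subprocess
import sys


def build_run_command(scenario_path: str, mode: str = "mock") -> list[str]:
    """Build the argv for `python -m agentharness run <scenario.yaml>`.

    Hypothetical helper; the position of --mode is an assumption.
    """
    if mode not in ("mock", "live"):
        raise ValueError(f"mode must be 'mock' or 'live', got {mode!r}")
    return [
        sys.executable, "-m", "agentharness",
        "run", scenario_path,
        "--mode", mode,
    ]


def run_scenario(scenario_path: str, mode: str = "mock") -> int:
    """Run one scenario and return the process exit code.

    Assumption: exit code 0 means the scenario passed.
    """
    completed = subprocess.run(build_run_command(scenario_path, mode), check=False)
    return completed.returncode
```

Keeping the default at mock mirrors the CLI default, so a script built on this wrapper does not call real APIs unless live is requested explicitly.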

Demo and screenshots

Agent-Harness is terminal-first (pytest + CLI); there is no separate web UI to demo. Step-by-step commands and a short live script are in docs/demo.md.

Pytest: example happy path (test_happy_path)

[Screenshot: Pytest happy path]

CLI: agentharness run on the safety scenario (mock)

[Screenshot: CLI pass]

Replay + diff: cassette comparison

[Screenshot: Replay and diff]

Roadmap

Phase 0 (foundation, pytest plugin, assertions, LangGraph adapter, CLI run, example agent) is complete. 0.1.0a2 is on PyPI as an alpha (pytest-agentharness). Phase 1 continues with additional adapters (e.g. OpenAI, CrewAI), multi-run statistical mode, and follow-on releases. A fuller public roadmap is planned before the Phase 1 launch milestone.

License

Apache License 2.0 — see LICENSE and pyproject.toml.

Contributing

See CONTRIBUTING.md for workflow and review expectations.
