A lightweight scaffold for evaluating agentic tasks with structured checks.

Agent Harness CLI

agent-harness-cli is a thin, dependency-free CLI for agentic task checks. It does not own domain logic. It runs user-defined check scripts, stores report JSON, and lets agents page through reports without dumping everything at once.

Install:

uv tool install agent-harness-cli

The core shape is:

Task spec + external check commands + report store + paginated viewer

Quick Start

Run checks for a task file from any workspace:

agent-harness run-checks --task task.json --report-id sample-report

The command prints a compact summary:

PASSED 2/2 checks
report_id: sample-report
report_path: reports/sample-report.json

Next:
  agent-harness view sample-report
  agent-harness view sample-report --failed-only

View a report one page at a time:

agent-harness view sample-report --page 1 --page-size 5
agent-harness view sample-report --failed-only
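
The paging behaviour behind --page, --page-size, and --failed-only can be pictured with a minimal sketch. This is not the package's implementation: the function name paginate, the return keys, and the assumption that a report holds a list of check results with a "passed" field are illustrative only.

```python
def paginate(checks, page=1, page_size=5, failed_only=False):
    """Return one page of check results (hypothetical helper, not the real viewer)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Optionally keep only failed checks before slicing into pages.
    items = [c for c in checks if not c["passed"]] if failed_only else list(checks)
    # Ceiling division; an empty report still has one (empty) page.
    total_pages = max(1, -(-len(items) // page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "items": items[start:start + page_size],
    }
```

Filtering before slicing keeps page numbers stable for an agent that only wants to walk through failures.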

Run tests:

uv run python -m unittest discover -s tests -p "test_*.py"

Build package distributions:

uv build

Project Layout

src/agent_harness_cli/
  runners/       Thin CLI implementations for run-checks and view.
skills/          Skill that teaches agents how to design check scripts.
schemas/         JSON schemas for tasks, check results, and reports.
tests/           Self-contained CLI tests.

Check Command Contract

Each task check declares a command:

{
  "name": "todo_markers",
  "command": ["{python}", "checks/todo_markers.py"],
  "severity": "warning",
  "config": {
    "patterns": ["TODO", "TBD"]
  }
}

The harness writes an input JSON file for each check and appends --input <path> to the command, unless the command already contains an {input} placeholder. It also replaces {python} with the path of the Python interpreter that is running the harness.

The input contains:

{
  "root": "project root provided by harness",
  "task_path": "task.json",
  "task": {},
  "check": {}
}
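
As a rough sketch of the placeholder rules described above, the following shows one way a command could be expanded. The helper names build_command and build_input are hypothetical, and the assumption that {input} is substituted with the input file path is inferred from the text rather than confirmed by the package source.

```python
import sys


def build_command(command, input_path, python=sys.executable):
    """Expand {python} and {input}; append --input <path> when {input} is absent."""
    has_input_placeholder = any("{input}" in part for part in command)
    expanded = [
        part.replace("{python}", python).replace("{input}", str(input_path))
        for part in command
    ]
    if not has_input_placeholder:
        expanded += ["--input", str(input_path)]
    return expanded


def build_input(root, task_path, task, check):
    """Assemble the input payload with the four keys listed above."""
    return {
        "root": str(root),
        "task_path": str(task_path),
        "task": task,
        "check": check,
    }
```

With the todo_markers check shown earlier, build_command would yield the interpreter path, checks/todo_markers.py, --input, and the input file path, in that order.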

Check Result Contract

Every check returns this shape:

{
  "check": "required_artifacts",
  "passed": true,
  "score": 1.0,
  "severity": "error",
  "summary": "All required artifacts exist.",
  "reasons": []
}

Failed checks should include specific reasons with evidence and a suggested fix.
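
To make both contracts concrete, here is a hedged sketch of what a user-owned todo_markers check could look like. The function names, the scanned file suffixes, and the use of plain strings for reasons are assumptions; the README does not specify the shape of a reason or how the result is handed back to the harness.

```python
import json
from pathlib import Path


def find_markers(root, patterns, suffixes=(".py", ".md")):
    """Collect one reason string per marker hit, with file and line as evidence."""
    reasons = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            for pattern in patterns:
                if pattern in line:
                    reasons.append(
                        f"{path.relative_to(root)}:{lineno} contains {pattern!r}; "
                        "resolve it or remove the marker."
                    )
    return reasons


def run_check(input_path):
    """Read the harness input file and return a result in the documented shape."""
    data = json.loads(Path(input_path).read_text(encoding="utf-8"))
    check = data["check"]
    patterns = check.get("config", {}).get("patterns", [])
    reasons = find_markers(data["root"], patterns)
    passed = not reasons
    return {
        "check": check.get("name", "todo_markers"),
        "passed": passed,
        "score": 1.0 if passed else 0.0,
        "severity": check.get("severity", "warning"),
        "summary": "No markers found." if passed else f"Found {len(reasons)} marker(s).",
        "reasons": reasons,
    }
```

A real script would parse --input from its arguments and emit the result of run_check as JSON, for example on standard output; that delivery channel is an assumption here.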

Design Notes

  • The PyPI distribution is agent-harness-cli; the installed command is agent-harness.
  • Prefer deterministic checks over LLM judges.
  • LLM judge checks can import agent_harness_cli.llm.codex_judge, which calls local codex exec and supports checklist-based judging.
  • Warnings guide an agent without blocking the run.
  • Error-level failures block the run.
  • Domain logic belongs in user-owned check scripts.
  • Use skills/harness-check-designer/SKILL.md when asking an agent to design a new check.
  • JSON is used for task and report files to avoid parser dependencies.
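
The severity rule in the notes above (warnings guide, error-level failures block) can be sketched as follows. The function name summarize_run and the returned keys are hypothetical; only the "passed" and "severity" fields come from the check result contract.

```python
def summarize_run(results):
    """Split failures by severity; only failed error-level checks block the run."""
    failed = [r for r in results if not r["passed"]]
    blocking = [r["check"] for r in failed if r["severity"] == "error"]
    warnings = [r["check"] for r in failed if r["severity"] == "warning"]
    return {
        "blocked": bool(blocking),
        "blocking_checks": blocking,
        "warning_checks": warnings,
        "passed_count": len(results) - len(failed),
        "total": len(results),
    }
```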

Publishing

The GitHub workflow at .github/workflows/publish.yml publishes on tags that match v*.*.*. The tag version must match [project].version without the leading v.

git tag v0.1.0
git push origin v0.1.0

Publishing uses PyPI Trusted Publishing with the pypi GitHub environment. Configure the PyPI project agent-harness-cli to trust this repository and the workflow file .github/workflows/publish.yml before pushing a release tag.

Download files

Source Distribution

agent_harness_cli-0.1.0.tar.gz (13.6 kB)

Built Distribution

agent_harness_cli-0.1.0-py3-none-any.whl (14.2 kB)

File details

Details for the file agent_harness_cli-0.1.0.tar.gz.

File metadata

  • Download URL: agent_harness_cli-0.1.0.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_harness_cli-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       1c16bd5a9de9e736f0891c26f3d0baca801cbf4e4d68876b1500fa6ec5aa54d2
MD5          f9373366d366d2582cf12b4495a56131
BLAKE2b-256  f15b826b321ff0afdbcc425c4ecab113512368ecf8495ca217b111371e3961f9

Provenance

The following attestation bundles were made for agent_harness_cli-0.1.0.tar.gz:

Publisher: publish.yml on Biaoo/agent-harness-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agent_harness_cli-0.1.0-py3-none-any.whl.

File hashes

Hashes for agent_harness_cli-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       6451c6171a2be9f5ab8fcca146086fe7e9aaf6d559c69f179d681157722c382f
MD5          e2fb8f45f954361d7630cf1b6916dfc1
BLAKE2b-256  349e317e2b6de93794d26c3fc86bd24092d7d4a5c7e3e99a4b198dfb9015b038

Provenance

The following attestation bundles were made for agent_harness_cli-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Biaoo/agent-harness-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
