A lightweight scaffold for evaluating agentic tasks with structured checks.

Agent Harness CLI

agent-harness-cli is a thin, dependency-free CLI for agentic task checks. It does not own domain logic. It runs user-defined check scripts, stores report JSON, and lets agents page through reports without dumping everything at once.

Install:

uv tool install agent-harness-cli

The core shape is:

Task spec + external check commands + report store + paginated viewer

Quick Start

Run checks for a task file from any workspace:

agent-harness run-checks --task task.json --report-id sample-report

The command prints a compact summary:

PASSED 2/2 checks
report_id: sample-report
report_path: reports/sample-report.json

Next:
  agent-harness view sample-report
  agent-harness view sample-report --failed-only

View a report one page at a time:

agent-harness view sample-report --page 1 --page-size 5
agent-harness view sample-report --failed-only
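
The paging behaviour behind `--page`, `--page-size`, and `--failed-only` can be pictured with a small sketch. This is not the package's implementation: the `paginate` function and the assumption that a report holds a flat list of check results with a `passed` field are illustrative only.

```python
import math


def paginate(results, page=1, page_size=5, failed_only=False):
    """Return one page of check results (hypothetical helper, not the real viewer)."""
    if failed_only:
        # Keep only results whose "passed" flag is false.
        results = [r for r in results if not r["passed"]]
    total_pages = max(1, math.ceil(len(results) / page_size))
    # Clamp the requested page into the valid range.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "items": results[start:start + page_size],
    }


# Twelve made-up results; every third one fails.
sample = [{"check": f"c{i}", "passed": i % 3 != 0} for i in range(12)]
print(paginate(sample, page=3, page_size=5))
print(paginate(sample, failed_only=True))
```

Filtering before slicing keeps page numbers stable for the failed-only view, so an agent can walk through failures without seeing passing checks in between.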

Run tests:

uv run python -m unittest discover -s tests -p "test_*.py"

Build package distributions:

uv build

Project Layout

src/agent_harness_cli/
  runners/       Thin CLI implementations for run-checks and view.
skills/          Skill that teaches agents how to design check scripts.
schemas/         JSON schemas for tasks, check results, and reports.
tests/           Self-contained CLI tests.

Check Command Contract

Each task check declares a command:

{
  "name": "todo_markers",
  "command": ["{python}", "checks/todo_markers.py"],
  "severity": "warning",
  "config": {
    "patterns": ["TODO", "TBD"]
  }
}

The harness writes an input JSON file and appends --input <path> to the command, unless the command already contains an {input} placeholder. It also replaces {python} with the path to the current Python interpreter.
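
The placeholder rules above can be sketched roughly as follows. The `expand_command` function is hypothetical, and substituting the input path for `{input}` is an assumption; the README only says that `--input` is not appended when `{input}` is present.

```python
import sys


def expand_command(command, input_path, python=sys.executable):
    """Expand a check command list (illustrative sketch, not the harness code)."""
    has_input_placeholder = any("{input}" in part for part in command)
    expanded = [
        # Assumption: {input} is replaced by the input file path.
        part.replace("{python}", python).replace("{input}", str(input_path))
        for part in command
    ]
    if not has_input_placeholder:
        # No explicit placeholder, so pass the path via --input.
        expanded += ["--input", str(input_path)]
    return expanded


print(expand_command(["{python}", "checks/todo_markers.py"], "/tmp/in.json"))
print(expand_command(["{python}", "check.py", "--payload={input}"], "/tmp/in.json"))
```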

The input contains:

{
  "root": "project root provided by harness",
  "task_path": "task.json",
  "task": {},
  "check": {}
}

Check Result Contract

Every check returns this shape:

{
  "check": "required_artifacts",
  "passed": true,
  "score": 1.0,
  "severity": "error",
  "summary": "All required artifacts exist.",
  "reasons": []
}

Failed checks should include specific reasons with evidence and a suggested fix.
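
As an illustration of both contracts, here is a minimal sketch of what a check script such as checks/todo_markers.py might look like. Several details are assumptions rather than documented behaviour: that the result JSON is printed to stdout, that the `check` field of the input mirrors the check declaration from the task file, that only `*.md` files are scanned, and that each reason is an object with `evidence` and `suggested_fix` keys.

```python
import argparse
import json
import tempfile
from pathlib import Path


def run_check(payload):
    """Build a result dict for a hypothetical todo_markers check."""
    check = payload.get("check", {})
    patterns = check.get("config", {}).get("patterns", ["TODO"])
    reasons = []
    for path in sorted(Path(payload["root"]).rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, 1):
            for pattern in patterns:
                if pattern in line:
                    # Assumed reason shape: evidence plus a suggested fix.
                    reasons.append({
                        "evidence": f"{path.name}:{lineno}: {line.strip()}",
                        "suggested_fix": f"Resolve or remove the {pattern} marker.",
                    })
    passed = not reasons
    return {
        "check": check.get("name", "todo_markers"),
        "passed": passed,
        "score": 1.0 if passed else 0.0,
        "severity": check.get("severity", "warning"),
        "summary": "No markers found." if passed else f"Found {len(reasons)} marker(s).",
        "reasons": reasons,
    }


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    args = parser.parse_args(argv)
    payload = json.loads(Path(args.input).read_text(encoding="utf-8"))
    result = run_check(payload)
    # Assumption: the harness collects the result from stdout.
    print(json.dumps(result, indent=2))
    return result


# Demo: build a throwaway workspace and input file, then run the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.md").write_text("intro\nTODO: write summary\n", encoding="utf-8")
    input_path = root / "input.json"
    input_path.write_text(json.dumps({
        "root": str(root),
        "task_path": "task.json",
        "task": {},
        "check": {
            "name": "todo_markers",
            "severity": "warning",
            "config": {"patterns": ["TODO", "TBD"]},
        },
    }), encoding="utf-8")
    demo_result = main(["--input", str(input_path)])
```

In a real check script, the demo at the bottom would be replaced by a call to `main(sys.argv[1:])`.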

Design Notes

  • The PyPI distribution is agent-harness-cli; the installed command is agent-harness.
  • Prefer deterministic checks over LLM judges.
  • LLM judge checks should own their model-call logic inside the user's check script or workspace.
  • Warnings guide an agent without blocking the run.
  • Error-level failures block the run.
  • Domain logic belongs in user-owned check scripts.
  • Use skills/harness-check-designer/SKILL.md when asking an agent to design a new check.
  • JSON is used for task and report files to avoid parser dependencies.
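
The severity rule in the notes above (warnings guide, errors block) can be expressed as a short sketch. The `run_is_blocked` and `warnings_for_agent` helpers are hypothetical names, and the results are assumed to follow the Check Result Contract.

```python
def run_is_blocked(results):
    """True if any failed check has severity "error" (illustrative helper)."""
    return any(
        not r["passed"] and r["severity"] == "error"
        for r in results
    )


def warnings_for_agent(results):
    """Summaries of failed warning-level checks, which do not block the run."""
    return [
        r["summary"]
        for r in results
        if not r["passed"] and r["severity"] == "warning"
    ]


results = [
    {"check": "required_artifacts", "passed": True, "severity": "error",
     "summary": "All required artifacts exist."},
    {"check": "todo_markers", "passed": False, "severity": "warning",
     "summary": "Found 2 marker(s)."},
]
print(run_is_blocked(results))
print(warnings_for_agent(results))
```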

Publishing

The GitHub workflow at .github/workflows/publish.yml publishes on tags that match v*.*.*. The tag version must match [project].version without the leading v.

git tag v0.1.1
git push origin v0.1.1

Publishing uses PyPI Trusted Publishing with the pypi GitHub environment. Configure the PyPI project agent-harness-cli to trust this repository and the workflow file .github/workflows/publish.yml before pushing a release tag.
