A lightweight scaffold for evaluating and controlling agentic tasks with structured checks and workflow state.
Agent Harness CLI
A practical reference framework for building acceptance loops around AI agent work.
agent-harness-cli helps you turn "the agent seems done" into executable,
inspectable feedback. The CLI runs project-owned check scripts, writes
machine-readable reports, gives Codex a paginated report surface it can use for
the next pass, and can control long-running workflows through explicit state.
What Is Agent Harness Engineering?
Agent Harness Engineering is the practice of engineering the system around an AI agent so the agent's work becomes constrained, observable, verifiable, and iteratively improvable through executable feedback loops.
In broader use, that system can include task specs, tools, runtime context, memory, safety controls, environments, evaluators, reports, and human handoff. This project is a focused reference framework for one practical surface of that idea: artifact acceptance loops for Codex workspaces.
Instead of relying on an agent's final message, you define:
- the artifact the agent must produce
- the checks that decide whether the artifact is acceptable
- the report format the agent can inspect
- the hook behavior that turns failures into the next agent prompt
agent-harness-cli provides the smallest reusable skeleton for that loop. Your
workspace owns the domain rules: checks, rubrics, fixtures, and optional LLM
judge calls.
Why It Matters
Agentic work often fails at the last mile: the artifact exists, but nobody has turned the acceptance criteria into a feedback loop the agent can actually use.
agent-harness-cli gives you a reference implementation of a project-local
acceptance loop:
- deterministic checks for objective requirements
- checklist or LLM-assisted checks for semantic requirements
- warning checks for guidance and error checks for blocking failures
- JSON reports for reproducibility and audit
- paginated report viewing for large outputs
- Codex Stop hooks that turn failed checks into continuation prompts
- workflow state control for multi-stage agentic tasks
The key idea is simple: a failed check should become useful context for the next agent pass.
Good Fit
Use this when an agent produces an artifact that can be checked, and you want a pattern your project can copy and adapt:
- project documents, proposals, specifications, and research notes
- code changes that need repository-specific acceptance checks
- generated data reports or analysis outputs
- writing tasks with strict length, structure, or quality requirements
- long-running agentic workflows where the agent should iterate after failures
The main design boundary is that domain logic stays in the workspace's own check scripts, while the CLI provides a consistent way to run those checks and inspect their results.
What This Is Not
agent-harness-cli is not trying to be a complete agent platform. It is not:
- an eval platform
- an agent runtime
- a general workflow engine
- a large built-in check library
- a replacement for project tests, linting, review, or domain judgment
The value is the reference pattern: define the artifact, run project-owned checks, write a report, and feed failures back into the next agent pass.
Related Projects
- Runnable example: Biaoo/agent-harness-cli-example
- This repository contains the CLI and two Codex skills: harness-check-designer for designing harnesses and harness-workflow-runner for operating existing workflow projects.
Install
Install the CLI with uv:
uv tool install agent-harness-cli
Or run the published package without a global install:
uvx --from agent-harness-cli agent-harness --help
Install Codex Skills
Install the designer skill when you want Codex to create or modify checks,
workflow specs, hooks, or project AGENTS.md files:
npx skills add Biaoo/agent-harness-cli --skill harness-check-designer -a codex
Install the runner skill when you want Codex to operate an existing workflow project from a short user task or idea:
npx skills add Biaoo/agent-harness-cli --skill harness-workflow-runner -a codex
For a global Codex skill install, add -g:
npx skills add Biaoo/agent-harness-cli --skill harness-check-designer -a codex -g
npx skills add Biaoo/agent-harness-cli --skill harness-workflow-runner -a codex -g
Use $harness-check-designer when designing the harness. Use
$harness-workflow-runner when running an existing workflow.
Build an Acceptance Loop
In normal use, configure the project so Codex runs the harness automatically when a turn stops. Manual CLI commands are mainly useful while developing or debugging checks.
- Write the artifact contract. Define what Codex should produce and where it should save it.
- Encode acceptance checks. Write focused scripts under your project, such as checks/check_required_sections.py.
- Describe the task. Put the artifact and checks in task.json.
- Wire the Stop hook. Run agent-harness run-checks when Codex attempts to finish.
- Continue on failure. Convert blocking failures into a Codex continuation decision so the agent keeps working.
- Inspect the report. Use agent-harness view while debugging or when an agent needs paginated report context.
Minimal task.json:
{
  "id": "proposal_review",
  "artifacts": [
    {
      "name": "proposal",
      "path": "proposal.md",
      "type": "markdown",
      "required": true
    }
  ],
  "checks": [
    {
      "name": "required_sections",
      "command": ["{python}", "checks/check_required_sections.py"],
      "severity": "error",
      "config": {
        "artifact": "proposal"
      }
    }
  ]
}
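As a rough illustration of how a project-owned check might read this task shape, the sketch below lists required artifacts whose files do not exist yet. The function name `missing_required_artifacts` is hypothetical, and treating a missing "required" key as optional is an assumption; neither is part of the CLI.

```python
import tempfile
from pathlib import Path

# Same shape as the minimal task.json above, trimmed to the artifacts list.
TASK = {
    "id": "proposal_review",
    "artifacts": [
        {"name": "proposal", "path": "proposal.md", "type": "markdown", "required": True}
    ],
}


def missing_required_artifacts(task: dict, root: Path) -> list:
    """Return the names of required artifacts whose files are absent under root."""
    missing = []
    for artifact in task.get("artifacts", []):
        # Assumption: an artifact without a "required" key is optional.
        if artifact.get("required", False) and not (root / artifact["path"]).is_file():
            missing.append(artifact["name"])
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(missing_required_artifacts(TASK, root))  # ['proposal']
    (root / "proposal.md").write_text("# Proposal\n", encoding="utf-8")
    print(missing_required_artifacts(TASK, root))  # []
```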
Minimal .codex/hooks.json:
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash \"$(git rev-parse --show-toplevel)/.codex/hooks/run-agent-harness-check.sh\"",
            "async": false,
            "timeout": 360,
            "statusMessage": "Running agent harness"
          }
        ]
      }
    ]
  }
}
See the runnable example for a complete Stop hook script: Biaoo/agent-harness-cli-example.
Codex Stop Hooks
agent-harness run-checks exits with 1 when blocking checks fail. That is
correct CLI behavior. In a Codex Stop hook, translate that result into a Codex
continuation decision:
{
  "decision": "block",
  "reason": "Agent harness found blocking failures. Update the artifact and rerun the harness."
}
For Stop hooks, this does not reject the turn. It tells Codex to continue with the reason as the next prompt. Use the generated report to give Codex a short, actionable continuation reason.
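The translation step can be sketched in Python as a small pure function. This is only an illustration: the name `stop_hook_decision` is hypothetical, treating every non-zero exit code as blocking is an assumption (the text above only specifies exit code 1), and printing the decision as JSON on stdout is likewise assumed rather than confirmed here. The runnable example's hook script remains the reference.

```python
import json
from typing import Optional


def stop_hook_decision(exit_code: int, failure_summary: str) -> Optional[dict]:
    """Map a run-checks exit code to a Stop hook decision.

    Returns None when nothing blocks, so the turn can end normally.
    """
    if exit_code == 0:
        return None
    # Assumption: any non-zero exit code is treated as a blocking failure.
    return {
        "decision": "block",
        "reason": (
            "Agent harness found blocking failures. "
            f"{failure_summary} "
            "Update the artifact and rerun the harness."
        ),
    }


decision = stop_hook_decision(1, "required_sections: missing 'Budget'.")
if decision is not None:
    print(json.dumps(decision))
```

Keeping the reason short and specific gives the agent something concrete to act on in the next pass.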
Manual Debugging
Use the CLI directly when developing or debugging checks:
agent-harness run-checks --task task.json --report-id sample-report
The command writes a report and prints a compact summary:
PASSED 2/2 checks
report_id: sample-report
report_path: reports/sample-report.json
Next:
agent-harness view sample-report
agent-harness view sample-report --failed-only
View the report one page at a time:
agent-harness view sample-report --page 1 --page-size 5
agent-harness view sample-report --failed-only
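To make the paging behavior concrete, here is a minimal sketch of how a list of check results could be sliced by page, page size, and a failed-only filter. It illustrates the idea only; the function `paginate` and the returned keys are hypothetical and do not describe the CLI's internals or its output format.

```python
from math import ceil


def paginate(results, page=1, page_size=5, failed_only=False):
    """Return one page of check results, optionally keeping only failures."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    items = [r for r in results if not r["passed"]] if failed_only else list(results)
    total_pages = max(1, ceil(len(items) / page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "items": items[start:start + page_size],
    }


# Twelve made-up results; every fourth one fails.
results = [{"check": f"check_{i}", "passed": i % 4 != 0} for i in range(12)]

first = paginate(results, page=1, page_size=5)
print(first["total_pages"], len(first["items"]))  # 3 5

failed = paginate(results, failed_only=True)
print([r["check"] for r in failed["items"]])  # ['check_0', 'check_4', 'check_8']
```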
Workflow Controller
For long-running tasks, define a workflow graph instead of a single final
task.json. The workflow controller validates the current active node, updates
state, writes reports, and returns hook JSON that keeps Codex working until the
workflow completes.
Common commands:
agent-harness validate-workflow --task workflows/research.json
agent-harness step --task workflows/research.json --hook-json
agent-harness status --state .agent-harness/state.json
agent-harness history --state .agent-harness/state.json
agent-harness options --state .agent-harness/state.json
agent-harness choose <transition-id> --state .agent-harness/state.json --reason "why this route is correct"
agent-harness approve <node-id> --state .agent-harness/state.json --reason "user approved"
agent-harness reject <node-id> --state .agent-harness/state.json --reason "user rejected"
Workflow node checks receive the normal check input plus workflow, state,
node, and artifacts context. See the runnable example and the full design
document: Workflow Controller Design.
How It Works
The core shape is:
task.json + external check commands + report store + paginated viewer
Each check is an ordinary command. The harness writes an input JSON file and
appends --input <path> unless the command already contains {input}. It also
replaces {python} with the current Python interpreter.
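The placeholder rules just described can be sketched as follows. This is not the harness's own code: `expand_command` is a hypothetical name, and substituting the input path for {input} when the placeholder is present is an inference from the description rather than something stated outright.

```python
import sys


def expand_command(command, input_path, python=sys.executable):
    """Expand {python} and {input}; append --input <path> if {input} is absent."""
    expanded = [
        # Assumption: {input} is replaced with the input file path.
        part.replace("{python}", python).replace("{input}", input_path)
        for part in command
    ]
    if not any("{input}" in part for part in command):
        expanded += ["--input", input_path]
    return expanded


print(expand_command(["{python}", "checks/todo_markers.py"], "/tmp/in.json", python="python3"))
# ['python3', 'checks/todo_markers.py', '--input', '/tmp/in.json']

print(expand_command(["{python}", "checks/custom.py", "--file={input}"], "/tmp/in.json", python="python3"))
# ['python3', 'checks/custom.py', '--file=/tmp/in.json']
```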
Task check example:
{
  "name": "todo_markers",
  "command": ["{python}", "checks/todo_markers.py"],
  "severity": "warning",
  "config": {
    "patterns": ["TODO", "TBD"]
  }
}
Input passed to each check:
{
  "root": "project root provided by harness",
  "task_path": "task.json",
  "task": {},
  "check": {}
}
Check result shape:
{
  "check": "required_artifacts",
  "passed": true,
  "score": 1.0,
  "severity": "error",
  "summary": "All required artifacts exist.",
  "reasons": []
}
Failed checks should include specific reasons with evidence and a suggested fix.
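As one possible shape for such a check, the sketch below looks for required Markdown headings and builds a result with the fields shown above. Several details are assumptions made for illustration: the function name, the list of required sections, and the keys inside each reason ("message", "evidence", "suggested_fix") are hypothetical, and how a check hands its result back to the harness is not covered here.

```python
import json


def check_required_sections(text, required, severity="error"):
    """Build a check result listing required headings missing from a Markdown text."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    }
    missing = [name for name in required if name.lower() not in headings]
    found = len(required) - len(missing)
    return {
        "check": "required_sections",
        "passed": not missing,
        "score": found / len(required) if required else 1.0,
        "severity": severity,
        "summary": (
            "All required sections exist."
            if not missing
            else f"{len(missing)} required section(s) missing."
        ),
        # Hypothetical reason keys: the source only asks for evidence and a fix.
        "reasons": [
            {
                "message": f"Section '{name}' is missing.",
                "evidence": f"No heading named '{name}' was found.",
                "suggested_fix": f"Add a '## {name}' section.",
            }
            for name in missing
        ],
    }


proposal = "# Proposal\n\n## Goals\nShip the loop.\n\n## Budget\nSmall.\n"
result = check_required_sections(proposal, ["Goals", "Risks"])
print(json.dumps(result, indent=2))
```

With this input the result reports one missing section ("Risks"), a score of 0.5, and a single reason that names the fix.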
Design Principles
- Keep the CLI thin and the workspace in control.
- Prefer deterministic checks over LLM judges.
- LLM judge checks keep their model-call logic inside the user's check script or workspace.
- Warning failures guide an agent without blocking the run.
- Error failures block handoff.
- Domain logic belongs in user-owned check scripts.
- JSON is used for task and report files to avoid parser dependencies.
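The warning and error rules above can be illustrated with a short sketch that separates blocking failures from advisory ones and derives an exit code. The helper names are hypothetical, and the sketch only mirrors the stated behavior (exit 1 when a blocking check fails); it is not the CLI's implementation.

```python
def blocking_failures(results):
    """Failed checks with severity "error"; these block handoff."""
    return [r for r in results if not r["passed"] and r["severity"] == "error"]


def advisory_failures(results):
    """Failed checks with severity "warning"; these guide without blocking."""
    return [r for r in results if not r["passed"] and r["severity"] == "warning"]


def exit_code(results):
    """Return 1 if any blocking failure exists, otherwise 0."""
    return 1 if blocking_failures(results) else 0


results = [
    {"check": "required_sections", "passed": True, "severity": "error"},
    {"check": "todo_markers", "passed": False, "severity": "warning"},
]
print(exit_code(results))  # 0

results.append({"check": "word_limit", "passed": False, "severity": "error"})
print(exit_code(results))  # 1
```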