
Deterministic quality gates for AI-assisted development


🦎 Agent Harness

Enforce. Enforce. Enforce.

Stop the slop. AI agents aren't sloppy; the human toolchains they inherited are.

44 rules · 5 stacks · <500ms · Zero config

PyPI Downloads CI License: Apache 2.0 Python 3.12+

Built with Claude Code Linted with ruff Policies: OPA / Rego

Quick Start · The Problem · Stacks · For AI Agents · Contributing

PyPI: Install as agentic-harness. The agent-harness name is reserved by an unrelated abandoned package (transfer pending). The CLI command is agent-harness.


Agent: writes Dockerfile, commits
  ↓
agent-harness lint → FAIL: no USER directive, no HEALTHCHECK, COPY . . before pip install
  ↓
Agent: reads errors, fixes all three, re-lints
  ↓
agent-harness lint → 10 passed, 0 failed (476ms)
  ↓
Agent: commits clean code. No human involved.

Dockerfiles without USER directives. Compose files without healthchecks. Secrets hardcoded in ENV. Dependency caches busted on every build. Coverage gates that don't exist. Formatters that never run.

This isn't the AI's fault. These are human-built toolchains with decades of accumulated slop: implicit defaults, silent failures, missing guardrails. Humans learned to work around them through tribal knowledge. AI agents don't have tribal knowledge. They just hit the wall.

Agent Harness is the wall that talks back. One CLI, deterministic feedback, actionable error messages. Every rule exists because an AI agent made that exact mistake, and will keep making it until something stops it.

$ agent-harness lint

  PASS  conftest-gitignore (39ms)
  PASS  conftest-json (0ms)
  PASS  yamllint (117ms)
  PASS  file-length (0ms)
  PASS  ruff:format (50ms)
  PASS  ruff:check (118ms)
  PASS  ty (109ms)
  PASS  conftest-python (43ms)

8 passed, 0 failed (476ms)
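
Because the output is line-oriented, an agent or wrapper script can read it mechanically. The following is a minimal Python sketch, assuming every check line has the shape shown above (`PASS` or `FAIL`, a check name, and a duration in milliseconds). The names `CheckResult` and `parse_lint_output` are hypothetical and not part of Agent Harness, and the shape of FAIL lines is an assumption since only PASS lines appear in the sample.

```python
import re
from typing import NamedTuple


class CheckResult(NamedTuple):
    status: str  # "PASS" or "FAIL"
    name: str    # check name, e.g. "ruff:check"
    millis: int  # reported duration in milliseconds


# Matches lines shaped like "  PASS  ruff:check (118ms)".
# Assumption: FAIL lines use the same layout as the PASS lines in the sample.
_CHECK_LINE = re.compile(r"^\s*(PASS|FAIL)\s+(\S+)\s+\((\d+)ms\)\s*$")


def parse_lint_output(text: str) -> list[CheckResult]:
    """Extract one CheckResult per check line, ignoring the summary line."""
    results = []
    for line in text.splitlines():
        match = _CHECK_LINE.match(line)
        if match:
            status, name, millis = match.groups()
            results.append(CheckResult(status, name, int(millis)))
    return results


SAMPLE = """\
  PASS  conftest-gitignore (39ms)
  PASS  conftest-json (0ms)
  PASS  yamllint (117ms)
  PASS  file-length (0ms)
  PASS  ruff:format (50ms)
  PASS  ruff:check (118ms)
  PASS  ty (109ms)
  PASS  conftest-python (43ms)

8 passed, 0 failed (476ms)
"""

results = parse_lint_output(SAMPLE)
print(len(results), "checks,", sum(r.millis for r in results), "ms")
# prints: 8 checks, 476 ms
```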

The Problem

AI agents are only as good as the feedback they get. Human toolchains give terrible feedback, or none at all.

| The slop | What the agent does | What the harness does |
|---|---|---|
| .gitignore is "optional" | Commits .env with real secrets | Policy catches it before commit |
| Dockerfile layer order is tribal knowledge | COPY . . before pip install, causing 5min rebuilds | Layer ordering policy enforces correct order |
| pytest.mark.untit silently selects nothing | Thinks tests pass (zero ran) | Strict markers policy catches the typo |
| Compose healthcheck is "recommended" | Deploy "succeeds," service is dead | Healthcheck policy fails the lint |
| Formatters exist but nobody runs them | Reformats differently each iteration | Formatter runs on every commit, enforcing consistency |

An agent can't act on "consider using healthchecks." It can act on "FAIL: services.api missing healthcheck - add healthcheck: block."

That's the difference between documentation and a harness. Documentation hopes. A harness enforces.

We'll build AI-first frameworks eventually. Until then, agents have to work with what humans built. Agent Harness makes that survivable.

Quick Start

# Install
uv tool install agentic-harness   # or: pip install agentic-harness

# Detect stacks + subprojects
agent-harness detect

# Set up configs and Makefile
agent-harness init

# Run all checks
agent-harness lint

# Auto-fix what's fixable, then lint
agent-harness fix

Stacks

Agent Harness auto-detects your project and activates the right checks. Zero config required.

Python

Detected by pyproject.toml, setup.py, requirements.txt

| Tool | What it checks |
|---|---|
| ruff | Linting + formatting (fastest Python linter) |
| ty | Type checking |
| conftest | pytest strict-markers, coverage >=90%, verbose output, ruff config |
| file-length | No file exceeds 500 lines |

JavaScript / TypeScript

Detected by package.json, tsconfig.json

| Tool | What it checks |
|---|---|
| Biome | Linting + formatting (single Rust-based tool, ~20x faster than ESLint) |
| Framework type checker | astro check, next lint, or tsc --noEmit (auto-detected) |
| conftest | engines field, type: "module", no wildcard * versions |

Docker

Detected by Dockerfile, docker-compose*.yml

| Tool | What it checks |
|---|---|
| hadolint | Dockerfile best practices (DL/SC rules) |
| conftest | Layer ordering, cache mounts, USER directive, HEALTHCHECK, secrets in ENV/ARG, base image pinning (discovers all Dockerfiles in tree) |
| conftest (compose) | Healthchecks, restart policies, image pinning, port binding, $$ escaping, no bind mounts, no inline configs |
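
To make three of these rules concrete (USER directive, HEALTHCHECK, and layer ordering), here is a rough Python illustration. It is not how Agent Harness implements them: the real checks run through hadolint and conftest Rego policies. The function name `check_dockerfile` is hypothetical, and the sketch deliberately ignores line continuations and multi-stage builds.

```python
def check_dockerfile(text: str) -> list[str]:
    """Return failure messages for three of the Dockerfile rules listed above."""
    # Keep non-empty, non-comment lines; continuation lines are not joined.
    lines = [ln.strip() for ln in text.splitlines()]
    instructions = [ln for ln in lines if ln and not ln.startswith("#")]
    keywords = [ln.split()[0].upper() for ln in instructions]

    failures = []
    if "USER" not in keywords:
        failures.append("no USER directive")
    if "HEALTHCHECK" not in keywords:
        failures.append("no HEALTHCHECK")

    # Index of the first "COPY . ." and of the first RUN that calls pip install.
    copy_all = next(
        (i for i, ln in enumerate(instructions)
         if keywords[i] == "COPY" and ln.split()[1:] == [".", "."]),
        None,
    )
    pip_install = next(
        (i for i, ln in enumerate(instructions)
         if keywords[i] == "RUN" and "pip install" in ln),
        None,
    )
    # Copying the whole tree first invalidates the dependency layer on every edit.
    if copy_all is not None and pip_install is not None and copy_all < pip_install:
        failures.append("COPY . . before pip install")
    return failures


BAD = """\
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
"""

for failure in check_dockerfile(BAD):
    print("FAIL:", failure)
```

Copying only requirements.txt before the pip install step, then copying the rest of the tree afterwards, is the ordering this sketch accepts.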

Dokploy

Detected by dokploy-network reference in compose files

| Tool | What it checks |
|---|---|
| conftest | traefik.enable=true on labeled services, dokploy-network for routed services |
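
The "Detected by" lines for each stack can be read as a small set of marker-file rules. The sketch below shows one way such detection could work in Python. The `MARKERS` table and `detect_stacks` function are hypothetical names, the sketch only looks at a single directory (the real tool also discovers subprojects), and it treats the Universal stack as always on.

```python
from pathlib import Path

# Marker files per stack, taken from the "Detected by" lines above.
MARKERS = {
    "python": ["pyproject.toml", "setup.py", "requirements.txt"],
    "javascript": ["package.json", "tsconfig.json"],
    "docker": ["Dockerfile"],
}


def detect_stacks(root: Path) -> list[str]:
    """Return the stacks whose marker files exist directly under root."""
    stacks = [
        name for name, files in MARKERS.items()
        if any((root / f).is_file() for f in files)
    ]
    # Compose files also activate the Docker stack.
    compose_files = sorted(root.glob("docker-compose*.yml"))
    if compose_files and "docker" not in stacks:
        stacks.append("docker")
    # Dokploy is detected by a dokploy-network reference in a compose file.
    if any("dokploy-network" in p.read_text(encoding="utf-8") for p in compose_files):
        stacks.append("dokploy")
    stacks.append("universal")  # always active
    return stacks


if __name__ == "__main__":
    print(detect_stacks(Path(".")))
```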

Universal

Always active on every project.

| Tool | What it checks |
|---|---|
| yamllint | YAML syntax, duplicate keys, truthy values |
| conftest | .gitignore completeness (stack-aware), JSON validity |
| file-length | Extension-aware: .py/.ts 500 lines, .astro/.vue 800 lines |
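
As an illustration of the extension-aware file-length rule, the following Python sketch walks a tree and reports files over their limit. The limits come from the table above; the names `LIMITS` and `oversized_files` are hypothetical, and the sketch does not apply the exclude list or any per-project overrides.

```python
from pathlib import Path

# Per-extension line limits, as listed for the file-length check.
LIMITS = {".py": 500, ".ts": 500, ".astro": 800, ".vue": 800}


def oversized_files(root: Path) -> list[tuple[str, int, int]]:
    """Return (relative path, line count, limit) for each file over its limit."""
    found = []
    for path in sorted(root.rglob("*")):
        limit = LIMITS.get(path.suffix)
        if limit is None or not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            count = sum(1 for _ in fh)
        if count > limit:
            found.append((path.relative_to(root).as_posix(), count, limit))
    return found


if __name__ == "__main__":
    for name, count, limit in oversized_files(Path(".")):
        print(f"FAIL: {name} has {count} lines (limit {limit})")
```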

Configuration

Zero config by default: stacks are auto-detected. Override with .agent-harness.yml:

stacks:
  - python
  - docker
  - javascript

exclude:
  - _archive/
  - vendor/

python:
  coverage_threshold: 95
  line_length: 140
  max_file_lines: 500

javascript:
  coverage_threshold: 80

docker:
  own_image_prefix: "ghcr.io/myorg/"
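
One plausible way to think about overrides is as a recursive merge of the file's values over built-in defaults. The Python sketch below shows that idea only; it is an assumption about behavior, not a description of how Agent Harness loads its config. The `DEFAULTS` values are inferred from the stack tables (90% coverage, 500-line files), and `merge_config` is a hypothetical helper operating on already-parsed dictionaries.

```python
from copy import deepcopy

# Assumed defaults, inferred from the Python stack table; not confirmed by the project.
DEFAULTS = {
    "exclude": [],
    "python": {"coverage_threshold": 90, "max_file_lines": 500},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied; nested dicts are merged key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# Values as they would look after parsing part of the YAML above.
overrides = {
    "exclude": ["_archive/", "vendor/"],
    "python": {"coverage_threshold": 95, "line_length": 140},
}
print(merge_config(DEFAULTS, overrides)["python"])
```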

Conftest Exceptions

Skip individual policies per file when legitimate:

docker:
  conftest_skip:
    scripts/autonomy/Dockerfile:
      - dockerfile.user        # runs as root intentionally
      - dockerfile.healthcheck # not a service

See SKILL.md for the full list of exception IDs.
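
Conceptually, a per-file skip list filters violations after the policies run. Here is a minimal Python sketch of that filtering, assuming each violation is a dictionary with `file` and `policy` keys. That shape and the `apply_skips` name are hypothetical; only the file path and exception IDs come from the example above.

```python
def apply_skips(violations: list[dict], conftest_skip: dict[str, list[str]]) -> list[dict]:
    """Drop violations whose policy ID is skipped for that specific file."""
    return [
        v for v in violations
        if v["policy"] not in conftest_skip.get(v["file"], [])
    ]


# Mirrors the conftest_skip block above.
conftest_skip = {
    "scripts/autonomy/Dockerfile": ["dockerfile.user", "dockerfile.healthcheck"],
}

# Hypothetical violations for two different Dockerfiles.
violations = [
    {"file": "scripts/autonomy/Dockerfile", "policy": "dockerfile.user"},
    {"file": "scripts/autonomy/Dockerfile", "policy": "dockerfile.healthcheck"},
    {"file": "Dockerfile", "policy": "dockerfile.user"},
]

for v in apply_skips(violations, conftest_skip):
    print("FAIL:", v["file"], v["policy"])
# prints: FAIL: Dockerfile dockerfile.user
```

The skip applies only to the named file, so the same policy still fails for every other Dockerfile in the tree.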

Commands

| Command | Description |
|---|---|
| agent-harness detect | Show detected stacks and subprojects |
| agent-harness init | Scaffold configs, Makefile, show tool availability |
| agent-harness init --apply | Apply auto-fixes and create missing config files |
| agent-harness lint | Run all checks; exits non-zero on failure |
| agent-harness fix | Auto-fix (ruff, biome), then lint |
| agent-harness security-audit | Scan working dir for vulnerable deps + leaked secrets |
| agent-harness security-audit-history | Deep scan full git history for leaked secrets |

For AI Agents

The feedback loop

Agent writes code
       ↓
agent-harness lint
       ↓
  ┌─ PASS → commit
  └─ FAIL → agent reads error → agent fixes → re-lint
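
The loop above can be sketched as a small Python driver. It relies on one documented behavior: `agent-harness lint` exits non-zero on failure. Everything else is an assumption for illustration: the `lint_until_clean` and `run_lint` names, the `fix` callback standing in for the agent, and the retry cap of five rounds.

```python
import subprocess
from typing import Callable


def run_lint() -> tuple[int, str]:
    """Run `agent-harness lint` and return (exit code, combined output)."""
    proc = subprocess.run(["agent-harness", "lint"], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def lint_until_clean(
    fix: Callable[[str], None],
    lint: Callable[[], tuple[int, str]] = run_lint,
    max_rounds: int = 5,
) -> bool:
    """Lint, hand failures to `fix`, and repeat; True once lint exits with 0."""
    for _ in range(max_rounds):
        code, output = lint()
        if code == 0:
            return True  # PASS: safe to commit
        fix(output)      # FAIL: the agent reads the errors and edits files
    return False         # still failing after max_rounds attempts
```

Passing `lint` as a parameter keeps the loop testable without the CLI installed, and the round cap stops an agent from retrying forever on a failure it cannot fix.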

Every error message is actionable. Every Rego policy has a structured comment:

# WHAT: What this rule checks
# WHY: Why it matters for AI agents
# WITHOUT IT: What breaks in practice
# FIX: How to resolve the violation

When a user challenges a rule

  1. Read the WHY block from the .rego file
  2. Explain the risk to the user
  3. If they still want to suppress, that's their call

The WHY exists because agents make these specific mistakes. It's the agent's argument.
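
Reading the WHY block can be automated, since the header uses fixed labels. The Python sketch below extracts the four fields from the text of a .rego file, assuming each label starts its own comment line and that unlabeled comment lines directly below continue the previous field. The `parse_policy_header` name and the sample policy text are hypothetical.

```python
import re

# One labeled comment line, e.g. "# WHY: Why it matters for AI agents".
_FIELD = re.compile(r"^#\s*(WHAT|WHY|WITHOUT IT|FIX):\s*(.*)$")


def parse_policy_header(rego_source: str) -> dict[str, str]:
    """Collect the WHAT/WHY/WITHOUT IT/FIX comment fields from Rego source text."""
    fields: dict[str, str] = {}
    current = None
    for line in rego_source.splitlines():
        stripped = line.strip()
        match = _FIELD.match(stripped)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current and stripped.startswith("#"):
            # Unlabeled comment line: treat it as a continuation of the last field.
            extra = stripped.lstrip("#").strip()
            if extra:
                fields[current] = (fields[current] + " " + extra).strip()
        else:
            current = None  # code or blank line ends the current field
    return fields


# Hypothetical policy header, written only to exercise the parser.
SAMPLE_POLICY = """\
# WHAT: Dockerfile must set a USER directive
# WHY: Agents rarely add one unprompted,
#      so containers end up running as root
# WITHOUT IT: A compromised process runs as root in the container
# FIX: Add a USER directive before CMD
package dockerfile.user
"""

print(parse_policy_header(SAMPLE_POLICY)["WHY"])
```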

Claude Code plugin

Agent Harness ships as a Claude Code plugin with guidance docs:

# Load as plugin
claude --plugin-dir /path/to/agent-harness

# Or add to your shell alias
alias c="claude --plugin-dir ~/path/to/agent-harness"

The plugin includes:

  • Skill: when to use, workflow, stack reference
  • Docker guidance: healthcheck recipes, migration patterns, config strategies
  • Python guidance: why each pyproject.toml knob matters

Architecture

┌──────────────────────────────────────┐
│  Framework (Django, FastAPI, Next.js)│  ← future
├──────────────────────────────────────┤
│  Stack (Python, JS/TS, Go)           │
├──────────────────────────────────────┤
│  Infrastructure (Docker, Dokploy)    │
├──────────────────────────────────────┤
│  Universal                           │  ← always active
└──────────────────────────────────────┘

Each layer composes on top of the previous. Adding a new stack = creating a new directory. Each check is a self-contained file with its own docstring, test, and single responsibility.

Tool Stack

Agent Harness orchestrates external tools; it doesn't embed them:

| Tool | Purpose | Fallback |
|---|---|---|
| conftest | Rego policy engine | Required |
| hadolint | Dockerfile linting | Required for Docker |
| ruff | Python linting + formatting | uv run fallback |
| ty | Python type checking | uv run fallback |
| Biome | JS/TS linting + formatting | npx fallback |
| yamllint | YAML validation | uv run fallback |
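
The fallback column suggests a simple resolution order: use the tool if it is on PATH, otherwise wrap it in `uv run` or `npx`, and fail for tools marked as required. The Python sketch below illustrates that order. The `resolve_command` name is hypothetical, and the exact fallback invocations (for example `uv run ruff` or `npx biome`) are assumptions, not confirmed command lines from the project.

```python
import shutil
from typing import Callable, Optional

# Launcher prefix used when a tool is not installed directly (from the table above).
FALLBACK_PREFIX = {
    "ruff": ["uv", "run"],
    "ty": ["uv", "run"],
    "yamllint": ["uv", "run"],
    "biome": ["npx"],
}


def resolve_command(
    tool: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix for a tool, or raise if a required tool is missing."""
    path = which(tool)
    if path:
        return [path]
    if tool in FALLBACK_PREFIX:
        # Assumed invocation shape: "<launcher> <tool>".
        return [*FALLBACK_PREFIX[tool], tool]
    # conftest and hadolint have no fallback in the table.
    raise FileNotFoundError(f"{tool} is required but was not found on PATH")


if __name__ == "__main__":
    print(resolve_command("ruff"))
```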

Requirements

  • Python 3.12+
  • conftest (required)
  • hadolint (for Docker projects)
  • Other tools auto-fallback via uv run or npx

Status

Actively developed. See PLANS.md for roadmap.

Current: 44 Rego deny rules, 5 stacks (Python, JavaScript, Docker, Dokploy, Universal), 201 Python tests, 109 Rego tests.

Contributing

See CONTRIBUTING.md.

Every Rego policy follows the WHAT/WHY/WITHOUT IT/FIX pattern. Every Python check has a self-documenting docstring. Adding a rule? Write the WHY first. If you can't articulate why an AI agent needs this specific check, it doesn't belong here.

License

Apache 2.0. See LICENSE.


🦎 Cold-blooded enforcement since mid 2025.

Built by Denis Tomilin at Agentic Engineering
