
The definitive standard for AI agent harness quality — 0–100 quantitative diagnostics (CE·AC·EM). Official tool by the author of 'Practical Harness Engineering'. Build with Harness, measure with HAchilles.


HAchilles 🦾

The definitive standard for AI agent harness quality measurement.

Requires Python 3.10+ · Code style: ruff · PRs welcome

0–100 quantitative diagnostics across 3 pillars · 15 items · 5 failure patterns

Official tool by the author of "Practical Harness Engineering" (실전 하네스 엔지니어링)

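Korean summary | HAchilles is an open-source CLI/API tool that quantitatively measures AI agent harness quality on a 0–100 scale. It diagnoses 15 items across three pillars: CE (Context Engineering), AC (Architectural Constraint), and EM (Entropy Management).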


Overview

HAchilles answers a question that no other tool asks:

"How well-engineered is the harness surrounding your AI agent?"

Industry data shows only 34% of AI agent projects succeed in production. The root cause in 80%+ of failures is not the model, not the API cost, and not the prompts in isolation — it is the harness: the context design, architectural constraints, and entropy management that define how reliably an AI agent operates.

HAchilles quantifies harness quality on a 0–100 scale using a 3-pillar, 15-item diagnostic framework, with automated detection of 5 critical failure patterns.


Quick Start

pip install hachilles
hachilles scan .

Example output:

╭──────────────────────────────────╮
│   HAchilles Diagnostic Report    │
│   /your/project  ·  Grade: A     │
╰──────────────────────────────────╯

  Score: 84 / 100

  Context Engineering    28 / 40   ████████████░░░
  Architectural Const    31 / 35   ███████████████░
  Entropy Management     25 / 25   ████████████████

  ⚠ Failure Patterns:  Context Drift (MEDIUM)

CI gate: Exit code 1 when score < 60 (Grade C or below).
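A minimal Python sketch of a CI step built only on this exit-code contract. It assumes the `hachilles` executable is on `PATH` and that a passing scan exits with 0, and it treats any other non-zero code as a tool failure rather than a quality failure. `harness_gate` and its `command` override are hypothetical helpers, not part of HAchilles.

```python
from __future__ import annotations

import subprocess


def harness_gate(project_path: str, command: list[str] | None = None) -> bool:
    """Run a HAchilles scan and return True if the CI gate passes.

    Documented contract: `hachilles scan` exits with code 1 when the score is
    below 60 (Grade C or below). ASSUMPTION: exit code 0 means the gate passed.
    """
    # `command` exists only so the gate can be exercised without HAchilles installed.
    cmd = command if command is not None else ["hachilles", "scan", project_path]
    result = subprocess.run(cmd, check=False)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other code (e.g. a CLI usage error) means the scan itself did not complete.
    raise RuntimeError(f"scan did not complete (exit code {result.returncode})")
```

In most CI systems a plain `hachilles scan .` step already fails the job on exit code 1, so a wrapper like this only pays off when you need to tell a low score apart from a crashed scan.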


3-Pillar Framework

| Pillar | Code | Weight | What It Measures |
|---|---|---|---|
| Context Engineering | CE | 40 pts | System prompt quality, context window management, tool definitions, few-shot examples, context consistency |
| Architectural Constraint | AC | 35 pts | Tool access control, loop prevention, output validation layers, fallback design, human-in-the-loop checkpoints |
| Entropy Management | EM | 25 pts | State management complexity, dependency control, error propagation isolation, observability, version drift prevention |

Each pillar contains 5 diagnostic items (CE-01 to CE-05, AC-01 to AC-05, EM-01 to EM-05), which are scored individually and aggregated into the pillar score.
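The roll-up from items to pillars can be sketched in Python. This is a hypothetical illustration, not the ScoreEngine: it assumes each item's maximum is an equal share of its pillar weight (8 points per CE item, 7 per AC item, 5 per EM item), which the README does not state.

```python
from __future__ import annotations

# Pillar weights as documented: CE 40, AC 35, EM 25 (total 100).
PILLAR_WEIGHTS = {"CE": 40, "AC": 35, "EM": 25}
ITEMS_PER_PILLAR = 5


def aggregate(item_scores: dict[str, float]) -> tuple[dict[str, float], float]:
    """Sum per-item scores into pillar totals and an overall 0-100 score.

    ASSUMPTION: items are keyed like "CE-01" and each item's maximum is an
    equal share of its pillar weight. HAchilles' real weighting may differ.
    """
    pillars = {code: 0.0 for code in PILLAR_WEIGHTS}
    for item_id, score in item_scores.items():
        code = item_id.split("-")[0]
        cap = PILLAR_WEIGHTS[code] / ITEMS_PER_PILLAR
        # Clamp each item into [0, cap] so one item cannot exceed its share.
        pillars[code] += min(max(score, 0.0), cap)
    return pillars, sum(pillars.values())
```

With CE items scored 8, 8, 4, 4, 4, AC items 7, 7, 7, 5, 5, and EM items all 5, the pillars come to 28, 31, and 25, matching the 84 / 100 report in the Quick Start.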


5 Failure Patterns

HAchilles detects and measures the risk level of 5 failure patterns identified across real-world AI agent projects:

| Pattern | Severity | Pillar | Description |
|---|---|---|---|
| Context Drift | 🔴 CRITICAL | CE | System prompt / context loses consistency over time due to accumulated ad-hoc changes |
| AI Slop | 🟠 HIGH | CE + AC | Agent produces plausible-sounding but valueless output due to underspecified tool definitions |
| Entropy Explosion | 🟡 MEDIUM | EM | Agent complexity grows uncontrollably until no one fully understands the system anymore |
| Over-Engineering Trap | 🟡 MEDIUM | AC + EM | System complexity far exceeds actual use cases, creating maintenance debt without ROI |
| 70-80 Wall | 🟢 LOW | All | Score plateaus in the 70–80 range; further improvement requires non-linear structural investment |
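As one way to picture the 70-80 Wall, here is a hypothetical Python heuristic over a series of saved scores (for example, values tracked with `--save-history`). The window of 3 scans and the inclusive 70–80 band are assumptions; HAchilles' own detection logic is not described in this README.

```python
from __future__ import annotations

WALL_LOW, WALL_HIGH = 70, 80


def hits_70_80_wall(history: list[float], window: int = 3) -> bool:
    """Hypothetical check: the last `window` saved scores all sit in 70-80.

    `history` is ordered oldest to newest. A run shorter than `window` is
    not treated as a plateau.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(history) < window:
        return False
    return all(WALL_LOW <= score <= WALL_HIGH for score in history[-window:])
```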

For Harness Plugin Users

If you built your agent team with revfactory/harness or harness-100, HAchilles is your natural next step.

| Tool | Role | Core Question |
|---|---|---|
| Harness plugin | Build agent teams | "What team structure should I create?" |
| HAchilles | Measure harness quality | "How well does the team I built actually perform?" |

They don't compete — they form a Build → Measure → Improve pipeline.

# After building with Harness, measure quality:
pip install hachilles
hachilles scan .
# → CE·AC·EM scores + improvement prescriptions

→ Full integration guide: docs/harness-integration.md


Installation

# CLI only (minimal)
pip install hachilles

# With web dashboard
pip install "hachilles[web]"

# With LLM analysis (Claude / GPT)
pip install "hachilles[web,llm]"

# Everything (dev + web + llm)
pip install "hachilles[all]"

Development setup:

git clone https://github.com/suhopark1/hachilles.git
cd hachilles
pip install -e ".[dev,web]"
pre-commit install
make test          # 611 tests must pass
hachilles scan .   # Self-audit must be S-Grade

Usage

CLI

# Scan current directory
hachilles scan .

# Scan a specific project
hachilles scan /path/to/your/project

# JSON output — for CI pipelines, scripts, dashboards
hachilles scan . --json

# Generate self-contained HTML report
hachilles scan . --html --out report.html

# LLM-powered over-engineering analysis (requires API key)
hachilles scan . --llm

# Save to history database & track trends over time
hachilles scan . --save-history
hachilles history .

# Auto-generate AGENTS.md from scan results
hachilles generate-agents .

Web Dashboard

hachilles serve              # http://localhost:8000
hachilles serve --port 9000  # custom port
hachilles serve --reload     # dev mode with auto-reload

Open http://localhost:8000 — React SPA with scan history, trend charts, and score breakdown.

REST API

# Scan via API
curl -X POST http://localhost:8000/api/v1/scan \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/project"}'

# Health check
curl http://localhost:8000/api/health

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health check + version |
| POST | /api/v1/scan | Scan a project, return full ScanResult |
| GET | /api/v1/history | Retrieve scan history (SQLite) |
| GET | /api/v1/compare | Compare two scan results |
| POST | /api/v1/generate-agents | Generate AGENTS.md |
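A minimal Python client for the scan endpoint, using only the standard library. It mirrors the curl call above (a POST with a JSON body containing `path`) and assumes the server runs at the default `http://localhost:8000`. The helper names and the 120-second timeout are illustrative choices, and the response is returned as a plain dict because the ScanResult field names are not documented here.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address of `hachilles serve`


def build_scan_request(project_path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /api/v1/scan with the JSON body shown in the curl example."""
    body = json.dumps({"path": project_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scan(project_path: str, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send the scan request and decode the JSON ScanResult into a dict."""
    request = build_scan_request(project_path, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Because the request carries a path rather than file contents, the path presumably has to exist on the machine (or container) running the server.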

Feature Matrix

| Feature | Description | Since |
|---|---|---|
| CLI scan | hachilles scan <path> with rich terminal output | v1.0 |
| JSON output | --json flag for CI/CD integration | v1.0 |
| HTML report | --html: self-contained report with SVG gauge, dark theme | v2.0 |
| AST analysis | Layer violation & circular dependency detection (AC-05) | v2.0 |
| LLM analysis | AI-powered over-engineering detection (--llm) | v2.0 |
| Scan history | SQLite-based history tracking & trend visualization | v2.0 |
| REST API | FastAPI, 5 endpoints, full OpenAPI spec | v3.0 |
| Web UI | React + TypeScript + Vite SPA | v3.0 |
| TypeScript analysis | ESLint, tsconfig, and test coverage detection | v3.0 |
| Plugin system | BaseAuditorPlugin: extend with custom diagnostic items | v3.0 |
| AGENTS.md generator | hachilles generate-agents, project-aware output | v3.0 |
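The plugin idea can be illustrated with a small, self-contained registry. Everything in this sketch is hypothetical: `AuditorPluginSketch`, `register`, `run_all`, and the `CUSTOM-01` item are stand-ins, and they do not reproduce the real `BaseAuditorPlugin` interface, whose method names and return types are not documented in this README.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class AuditorPluginSketch(ABC):
    """HYPOTHETICAL stand-in for a custom diagnostic item (not BaseAuditorPlugin)."""

    item_id: str
    max_points: float

    @abstractmethod
    def audit(self, project_root: Path) -> float:
        """Return a score between 0 and max_points for the project."""


_REGISTRY: dict[str, AuditorPluginSketch] = {}


def register(plugin: AuditorPluginSketch) -> AuditorPluginSketch:
    """Add a plugin instance, rejecting duplicate item IDs."""
    if plugin.item_id in _REGISTRY:
        raise ValueError(f"duplicate item id: {plugin.item_id}")
    _REGISTRY[plugin.item_id] = plugin
    return plugin


def run_all(project_root: Path) -> dict[str, float]:
    """Run every registered plugin and collect its score by item ID."""
    return {item_id: plugin.audit(project_root) for item_id, plugin in _REGISTRY.items()}


class HasAgentsMd(AuditorPluginSketch):
    """Example item: full points when the project has an AGENTS.md file."""

    item_id = "CUSTOM-01"
    max_points = 5.0

    def audit(self, project_root: Path) -> float:
        return self.max_points if (project_root / "AGENTS.md").is_file() else 0.0
```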

Grade Scale

| Grade | Score Range | Meaning |
|---|---|---|
| S | 90–100 | Harness engineering best practice. Industry benchmark setter. |
| A | 75–89 | Robust harness. Production-ready with minor improvements possible. |
| B | 60–74 | Functional harness. Several improvements recommended. |
| C | 40–59 | Risk level. Significant issues present; immediate action required. |
| D | 0–39 | Crisis level. Full harness redesign strongly recommended. |

Note: hachilles scan exits with code 1 for Grade C or below (score < 60), enabling CI gates.
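The grade table and the CI-gate rule fit in a few lines of Python. This is a sketch of the documented thresholds, not HAchilles' ScoreEngine; it assumes the table's integer boundaries apply as lower bounds (so a fractional 89.5 would still be A) and that a passing scan exits with 0.

```python
from __future__ import annotations

# Lower bound of each grade, checked from the top (Grade Scale table).
GRADE_FLOORS = [("S", 90), ("A", 75), ("B", 60), ("C", 40)]
CI_FAIL_BELOW = 60  # `hachilles scan` exits with 1 below this score


def grade_for(score: float) -> str:
    """Map a 0-100 score to S/A/B/C/D using the documented ranges."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    for grade, floor in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "D"  # 0-39


def expected_exit_code(score: float) -> int:
    """Exit code implied by the note above (0 assumed for a passing score)."""
    return 1 if score < CI_FAIL_BELOW else 0
```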


Architecture

HAchilles enforces a strict 9-layer unidirectional dependency:

models ← scanner ← auditors ← score ← prescriptions ← report ← cli / api

Reverse-direction imports are forbidden — enforced by pre-commit hooks and CI.
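A check in the spirit of this rule can be sketched with Python's `ast` module. It is an illustration, not the project's actual pre-commit hook: it ranks only the layers named in the chain above, ignores `llm`, `tracker`, and `plugins` (which the chain does not place), and skips relative imports.

```python
from __future__ import annotations

import ast

# Ranks follow the documented chain (lower rank = more foundational):
#   models ← scanner ← auditors ← score ← prescriptions ← report ← cli / api
LAYER_RANK = {
    "models": 0,
    "scanner": 1,
    "auditors": 2,
    "score": 3,
    "prescriptions": 4,
    "report": 5,
    "cli": 6,
    "api": 6,
}


def reverse_imports(source: str, own_layer: str, package: str = "hachilles") -> list[str]:
    """List the layers imported by `source` that rank above `own_layer`.

    Only absolute imports are inspected; relative imports are skipped here.
    """
    own_rank = LAYER_RANK[own_layer]
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == package:
                # `from hachilles import report` names the layer directly.
                modules = [f"{package}.{alias.name}" for alias in node.names]
            else:
                modules = [node.module]
        else:
            continue
        for module in modules:
            parts = module.split(".")
            if len(parts) >= 2 and parts[0] == package:
                # Unranked subpackages get -1 and are never reported.
                if LAYER_RANK.get(parts[1], -1) > own_rank:
                    violations.append(parts[1])
    return violations
```

For example, a module in `score/` that imports `hachilles.report` would be flagged, while its import of `hachilles.models` would not.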

src/hachilles/
├── models/          # ScanResult data model (no deps)
├── scanner/         # File-system + AST scanner
├── auditors/        # CE / AC / EM auditors (3 pillars)
├── score/           # ScoreEngine — 0–100 + grade
├── prescriptions/   # Per-item improvement guidance
├── report/          # Jinja2 HTML report generator
├── llm/             # LLM client + evaluator (optional)
├── tracker/         # SQLite history tracker
├── plugins/         # Plugin registry + base class
├── api/             # FastAPI app + routes
└── cli.py           # Click CLI entry point

→ Full architecture details: docs/architecture.md


Docker

# Build
docker build -t hachilles:3.0.0 .

# Run web dashboard
docker run -p 8000:8000 hachilles:3.0.0

# CLI scan via volume mount (read-only)
docker run --rm \
  -v /path/to/your/project:/workspace:ro \
  hachilles:3.0.0 hachilles scan /workspace

Development

make dev           # Install all dev dependencies
make lint          # ruff check + ruff format --check + mypy
make test          # Full test suite (611 tests)
make test-phase3   # Phase 3 (API + web) tests only
make web-build     # Build React frontend (Vite)
make serve         # Start web server (dev mode)
make build         # Build PyPI-ready distribution package
make clean         # Remove build artifacts

Self-audit before every commit:

hachilles scan .   # Must remain S-Grade (≥ 90 pts)

Roadmap

Track our progress and planned features:

v3.1 — Q2 2026

  • GitHub Actions native integration: hachilles-action for zero-config CI gates
  • Score badge generator — embed live HAchilles badge in any README
  • VS Code extension — inline harness quality indicators while coding
  • Baseline comparison: hachilles scan . --compare-baseline

v3.2 — Q3 2026

  • Team / multi-repo dashboard — aggregate scores across an organization
  • HAchilles Cloud (beta) — hosted scanning with history, trends, and team views
  • Automated prescription PRs — auto-generate fix PRs for common issues
  • Additional language support — TypeScript/JavaScript native scanner (beyond tsconfig detection)

v4.0 — Q4 2026

  • Real-time harness monitoring — watch mode with live score updates
  • Regression alerting — notify when score drops below threshold
  • Enterprise SSO / RBAC — multi-tenant access control
  • Benchmark registry — community-contributed harness quality benchmarks

💡 Have a feature idea? Open a Feature Request — community input directly shapes the roadmap.


Contributing

We welcome contributions of all kinds — bug reports, feature requests, documentation improvements, and code.

Quick guide:

# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/hachilles.git

# 2. Create a branch
git checkout -b feat/your-feature-name

# 3. Make changes, run checks
make lint && make test
hachilles scan .   # Must remain S-Grade

# 4. Open a Pull Request

→ Full contributing guide: CONTRIBUTING.md
→ Branch protection rules and workflow: docs/branch-protection.md

All PRs must maintain the S-Grade self-audit score (≥ 90 pts). HAchilles measures itself with itself.


Security Policy

⚠️ Please do NOT open a public GitHub issue for security vulnerabilities.

If you discover a security vulnerability in HAchilles, please report it privately by emailing:

📧 suhopark1@gmail.com

Include: a description of the vulnerability, steps to reproduce, potential impact, and your suggested fix if available.

We follow coordinated disclosure: we will acknowledge your report within 48 hours, assess it within 5 business days, and publish a fix before any public disclosure. You will be credited in the release notes (with your permission).

→ Full security policy: SECURITY.md


Community & Standards

| Resource | Description |
|---|---|
| STANDARDS.md | Public CE·AC·EM diagnostic criteria: the measurement specification |
| docs/whitepaper.md | Scoring algorithm, rationale, and research background |
| docs/architecture.md | 9-layer architecture and dependency rules |
| CONTRIBUTING.md | How to contribute code, docs, or diagnostic items |
| SECURITY.md | Vulnerability reporting policy and supported versions |
| CHANGELOG.md | Version history: v1.0.0 → v2.0.0 → v3.0.0 |
| GitHub Discussions | Questions, ideas, community Q&A (Korean welcome) |

GitHub Topics: harness-quality · harness-diagnostics · ai-agent · llm · context-engineering · fastapi · cli


License

Copyright 2026 Park Sung Hoon (박성훈) <suhopark1@gmail.com>

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

See LICENSE and NOTICE for full terms and third-party attributions.


HAchilles is itself a meta-example of harness engineering.

Run hachilles scan . on this repository. Current result: 100 pts · S-Grade

Build with Harness. Measure with HAchilles.



Download files


Source Distribution

hachilles-3.0.1.tar.gz (106.3 kB)

Built Distribution


hachilles-3.0.1-py3-none-any.whl (96.6 kB)

File details

Details for the file hachilles-3.0.1.tar.gz.

File metadata

  • Download URL: hachilles-3.0.1.tar.gz
  • Upload date:
  • Size: 106.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for hachilles-3.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7456ce1b290707b80b68e2c28991aaffaad8272a9a251e7d7bc22ae88fda99ca |
| MD5 | 5664c84cc8b3a3f90ed39d3235022199 |
| BLAKE2b-256 | dc579e5a95b148a3881a5befad209d149e297919fe7e6d03aa73dd1d9436fa87 |


File details

Details for the file hachilles-3.0.1-py3-none-any.whl.

File metadata

  • Download URL: hachilles-3.0.1-py3-none-any.whl
  • Upload date:
  • Size: 96.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for hachilles-3.0.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a56e29b00a543974989b41467364eb3ab02271ee44424f6aa70548d1d0f5897 |
| MD5 | ddf7af2e3923863842d7a1b8494e5532 |
| BLAKE2b-256 | 9194fa51df50d218a8667cbe2a486e2fd909d2fcee448c298a66f85c3215ab3e |
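A downloaded file can be checked against the published SHA256 digests with Python's standard `hashlib`. The two digests below are copied from the tables above; the helper names are illustrative.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published for the 3.0.1 release files.
EXPECTED_SHA256 = {
    "hachilles-3.0.1.tar.gz": "7456ce1b290707b80b68e2c28991aaffaad8272a9a251e7d7bc22ae88fda99ca",
    "hachilles-3.0.1-py3-none-any.whl": "7a56e29b00a543974989b41467364eb3ab02271ee44424f6aa70548d1d0f5897",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str | None = None) -> bool:
    """Compare against `expected`, or the published digest for this file name."""
    expected = expected or EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time in hash-checking mode: pin `hachilles==3.0.1 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt` (every dependency then needs a pinned hash as well).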

