The definitive standard for AI agent harness quality — 0–100 quantitative diagnostics (CE·AC·EM). Official tool by the author of 'Practical Harness Engineering'. Build with Harness, measure with HAchilles.

HAchilles 🦾

The definitive standard for AI agent harness quality measurement.

0–100 quantitative diagnostics across 3 pillars · 15 items · 5 failure patterns

Official tool by the author of "Practical Harness Engineering" (실전 하네스 엔지니어링)

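Summary | HAchilles is an open-source CLI/API tool that quantitatively measures AI agent harness quality on a 0–100 scale. It diagnoses 15 items across three pillars: CE (Context Engineering), AC (Architectural Constraint), and EM (Entropy Management).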

Overview
Quick Start
3-Pillar Framework
5 Failure Patterns
For Harness Plugin Users
Installation
Usage
Feature Matrix
Grade Scale
Architecture
Docker
Development
Roadmap
Contributing
Security Policy
Community & Standards
License

Overview

HAchilles answers a question that no other tool asks:

"How well-engineered is the harness surrounding your AI agent?"

Industry data shows only 34% of AI agent projects succeed in production. The root cause in 80%+ of failures is not the model, not the API cost, and not the prompts in isolation — it is the harness: the context design, architectural constraints, and entropy management that define how reliably an AI agent operates.

HAchilles quantifies harness quality on a 0–100 scale using a 3-pillar, 15-item diagnostic framework, with automated detection of 5 critical failure patterns.

Quick Start

pip install hachilles
hachilles scan .

╭──────────────────────────────────╮
│   HAchilles Diagnostic Report    │
│   /your/project  ·  Grade: A     │
╰──────────────────────────────────╯

  Score: 84 / 100

  Context Engineering    28 / 40   ████████████░░░
  Architectural Const    31 / 35   ███████████████░
  Entropy Management     25 / 25   ████████████████

  ⚠ Failure Patterns:  Context Drift (MEDIUM)

CI gate: Exit code 1 when score < 60 (Grade C or below).
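One way to use this gate from a Python-driven pipeline is to run the CLI as a subprocess and branch on its exit code. The following is a minimal sketch, not part of HAchilles: `run_gate` is a hypothetical helper name, it assumes `hachilles` is on `PATH`, and it treats any non-zero exit (not only a low score) as a failed gate.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with code 0.

    Hypothetical helper: only the exit code is inspected, so this works
    for `hachilles scan .` or any other command that signals failure
    through a non-zero exit status.
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


def gate_message(passed: bool) -> str:
    """Human-readable summary for pipeline logs."""
    return "harness gate passed" if passed else "harness gate failed"
```

In a pipeline script this might be called as `run_gate(["hachilles", "scan", "."])`, followed by `sys.exit(0 if passed else 1)` to propagate the result.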

3-Pillar Framework

Pillar	Code	Weight	What It Measures
Context Engineering	CE	40 pts	System prompt quality, context window management, tool definitions, few-shot examples, context consistency
Architectural Constraint	AC	35 pts	Tool access control, loop prevention, output validation layers, fallback design, human-in-the-loop checkpoints
Entropy Management	EM	25 pts	State management complexity, dependency control, error propagation isolation, observability, version drift prevention

Each pillar contains 5 diagnostic items (CE-01 to CE-05, AC-01 to AC-05, EM-01 to EM-05), scored individually and aggregated.
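The aggregation can be pictured as a sum of item scores per pillar, capped by the pillar weights in the table. The sketch below is illustrative only: the `aggregate` function and the per-item values are assumptions (the README does not state how points are split among items), chosen so the pillar totals match the Quick Start sample report.

```python
from collections import defaultdict
from typing import Dict

# Pillar maximums taken from the 3-pillar table.
PILLAR_MAX: Dict[str, int] = {"CE": 40, "AC": 35, "EM": 25}


def aggregate(item_scores: Dict[str, int]) -> Dict[str, int]:
    """Sum item scores keyed like 'CE-01' into per-pillar totals."""
    totals: Dict[str, int] = defaultdict(int)
    for item_id, score in item_scores.items():
        pillar = item_id.split("-")[0]
        if pillar not in PILLAR_MAX:
            raise ValueError(f"unknown pillar in item id {item_id!r}")
        totals[pillar] += score
    for pillar, total in totals.items():
        if total > PILLAR_MAX[pillar]:
            raise ValueError(f"{pillar} total {total} exceeds {PILLAR_MAX[pillar]}")
    return dict(totals)


# Hypothetical item scores; only the pillar totals come from the sample report.
sample_items = {
    "CE-01": 6, "CE-02": 6, "CE-03": 6, "CE-04": 5, "CE-05": 5,
    "AC-01": 7, "AC-02": 7, "AC-03": 6, "AC-04": 6, "AC-05": 5,
    "EM-01": 5, "EM-02": 5, "EM-03": 5, "EM-04": 5, "EM-05": 5,
}

pillars = aggregate(sample_items)
print(pillars)                # {'CE': 28, 'AC': 31, 'EM': 25}
print(sum(pillars.values()))  # 84
```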

5 Failure Patterns

HAchilles detects and measures the risk level of 5 failure patterns identified across real-world AI agent projects:

Pattern	Severity	Pillar	Description
Context Drift	🔴 CRITICAL	CE	System prompt / context loses consistency over time due to accumulated ad-hoc changes
AI Slop	🟠 HIGH	CE + AC	Agent produces plausible-sounding but valueless output due to underspecified tool definitions
Entropy Explosion	🟡 MEDIUM	EM	Agent complexity grows uncontrollably — no one fully understands the system anymore
Over-Engineering Trap	🟡 MEDIUM	AC + EM	System complexity far exceeds actual use cases, creating maintenance debt without ROI
70-80 Wall	🟢 LOW	All	Score plateaus in the 70–80 range; further improvement requires non-linear structural investment

For Harness Plugin Users

If you built your agent team with revfactory/harness or harness-100, HAchilles is your natural next step.

Tool	Role	Core Question
Harness plugin	Build agent teams	"What team structure should I create?"
HAchilles	Measure harness quality	"How well does the team I built actually perform?"

They don't compete — they form a Build → Measure → Improve pipeline.

# After building with Harness, measure quality:
pip install hachilles
hachilles scan .
# → CE·AC·EM scores + improvement prescriptions

→ Full integration guide: docs/harness-integration.md

Installation

# CLI only (minimal)
pip install hachilles

# With web dashboard
pip install "hachilles[web]"

# With LLM analysis (Claude / GPT)
pip install "hachilles[web,llm]"

# Everything (dev + web + llm)
pip install "hachilles[all]"

Development setup:

git clone https://github.com/suhopark1/hachilles.git
cd hachilles
pip install -e ".[dev,web]"
pre-commit install
make test          # 611 tests must pass
hachilles scan .   # Self-audit must be S-Grade

Usage

CLI

# Scan current directory
hachilles scan .

# Scan a specific project
hachilles scan /path/to/your/project

# JSON output — for CI pipelines, scripts, dashboards
hachilles scan . --json

# Generate self-contained HTML report
hachilles scan . --html --out report.html

# LLM-powered over-engineering analysis (requires API key)
hachilles scan . --llm

# Save to history database & track trends over time
hachilles scan . --save-history
hachilles history .

# Auto-generate AGENTS.md from scan results
hachilles generate-agents .

Web Dashboard

hachilles serve              # http://localhost:8000
hachilles serve --port 9000  # custom port
hachilles serve --reload     # dev mode with auto-reload

Open http://localhost:8000 — React SPA with scan history, trend charts, and score breakdown.

REST API

# Scan via API
curl -X POST http://localhost:8000/api/v1/scan \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/project"}'

# Health check
curl http://localhost:8000/api/health

Method	Endpoint	Description
`GET`	`/api/health`	Health check + version
`POST`	`/api/v1/scan`	Scan a project, return full ScanResult
`GET`	`/api/v1/history`	Retrieve scan history (SQLite)
`GET`	`/api/v1/compare`	Compare two scan results
`POST`	`/api/v1/generate-agents`	Generate AGENTS.md
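The same scan call can be made from Python with the standard library. This is a sketch under assumptions: `build_scan_request` and `send` are hypothetical helper names, the server is assumed to be running at the base URL you pass in, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request
from typing import Any


def build_scan_request(base_url: str, project_path: str) -> urllib.request.Request:
    """Build the POST /api/v1/scan request shown in the curl example."""
    body = json.dumps({"path": project_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `hachilles serve` running locally, `send(build_scan_request("http://localhost:8000", "/path/to/project"))` would perform the scan. Separating request construction from sending keeps the first half testable without a live server.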

Feature Matrix

Feature	Description	Version
CLI scan	`hachilles scan <path>` — rich terminal output	v1.0
JSON output	`--json` flag for CI/CD integration	v1.0
HTML report	`--html` — self-contained SVG gauge, dark theme	v2.0
AST analysis	Layer violation & circular dependency detection (AC-05)	v2.0
LLM analysis	AI-powered over-engineering detection (`--llm`)	v2.0
Scan history	SQLite-based history tracking & trend visualization	v2.0
REST API	FastAPI — 5 endpoints, full OpenAPI spec	v3.0
Web UI	React + TypeScript + Vite SPA	v3.0
TypeScript analysis	ESLint, tsconfig, test coverage deep detection	v3.0
Plugin system	`BaseAuditorPlugin` — extend with custom diagnostic items	v3.0
AGENTS.md generator	`hachilles generate-agents` — project-aware output	v3.0

Grade Scale

Grade	Score Range	Meaning
S	90 – 100	Harness engineering best practice. Industry benchmark setter.
A	75 – 89	Robust harness. Production-ready with minor improvements possible.
B	60 – 74	Functional harness. Several improvements recommended.
C	40 – 59	Risk level. Significant issues present — immediate action required.
D	0 – 39	Crisis level. Full harness redesign strongly recommended.

Note: hachilles scan exits with code 1 for Grade C or below (score < 60), enabling CI gates.
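The grade table and exit-code rule translate directly into a small lookup. The sketch below restates the thresholds from the table; the function names are hypothetical and this is not the HAchilles `ScoreEngine` itself.

```python
def score_to_grade(score: int) -> str:
    """Map a 0-100 score to a grade using the thresholds in the grade table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0..100, got {score}")
    if score >= 90:
        return "S"
    if score >= 75:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    return "D"


def exit_code_for(score: int) -> int:
    """Exit code per the CI gate rule: 1 when the score is below 60."""
    return 1 if score < 60 else 0


print(score_to_grade(84), exit_code_for(84))  # A 0
print(score_to_grade(59), exit_code_for(59))  # C 1
```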

Architecture

HAchilles enforces a strict 9-layer unidirectional dependency order:

models ← scanner ← auditors ← score ← prescriptions ← report ← cli / api

Reverse-direction imports are forbidden — enforced by pre-commit hooks and CI.

src/hachilles/
├── models/          # ScanResult data model (no deps)
├── scanner/         # File-system + AST scanner
├── auditors/        # CE / AC / EM auditors (3 pillars)
├── score/           # ScoreEngine — 0–100 + grade
├── prescriptions/   # Per-item improvement guidance
├── report/          # Jinja2 HTML report generator
├── llm/             # LLM client + evaluator (optional)
├── tracker/         # SQLite history tracker
├── plugins/         # Plugin registry + base class
├── api/             # FastAPI app + routes
└── cli.py           # Click CLI entry point

→ Full architecture details: docs/architecture.md
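The dependency rule above can be expressed as a rank comparison: a package may import only from packages at its own rank or lower in the chain. The sketch below is an illustration, not the project's actual pre-commit hook. It covers only the layers named in the chain, and placing `cli` and `api` at the same rank is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

# Rank of each layer in the chain; lower ranks are imported by higher ones.
LAYER_RANK: Dict[str, int] = {
    "models": 0,
    "scanner": 1,
    "auditors": 2,
    "score": 3,
    "prescriptions": 4,
    "report": 5,
    "cli": 6,  # assumption: cli and api share the top rank
    "api": 6,
}


def is_allowed_import(importer: str, imported: str) -> bool:
    """True when `importer` does not reach upward in the layer chain."""
    return LAYER_RANK[imported] <= LAYER_RANK[importer]


def find_violations(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (importer, imported) pairs that point in the reverse direction."""
    return [(src, dst) for src, dst in edges if not is_allowed_import(src, dst)]


edges = [("cli", "report"), ("scanner", "models"), ("score", "report")]
print(find_violations(edges))  # [('score', 'report')]
```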

Docker

# Build
docker build -t hachilles:3.0.0 .

# Run web dashboard
docker run -p 8000:8000 hachilles:3.0.0

# CLI scan via volume mount (read-only)
docker run --rm \
  -v /path/to/your/project:/workspace:ro \
  hachilles:3.0.0 hachilles scan /workspace

Development

make dev           # Install all dev dependencies
make lint          # ruff check + ruff format --check + mypy
make test          # Full test suite (611 tests)
make test-phase3   # Phase 3 (API + web) tests only
make web-build     # Build React frontend (Vite)
make serve         # Start web server (dev mode)
make build         # Build PyPI-ready distribution package
make clean         # Remove build artifacts

Self-audit before every commit:

hachilles scan .   # Must remain S-Grade (≥ 90 pts)

Roadmap

Track our progress and planned features:

v3.1 — Q2 2026

GitHub Actions native integration — hachilles-action for zero-config CI gates
Score badge generator — embed live HAchilles badge in any README
VS Code extension — inline harness quality indicators while coding
Baseline comparison — hachilles scan . --compare-baseline

v3.2 — Q3 2026

Team / multi-repo dashboard — aggregate scores across an organization
HAchilles Cloud (beta) — hosted scanning with history, trends, and team views
Automated prescription PRs — auto-generate fix PRs for common issues
Additional language support — TypeScript/JavaScript native scanner (beyond tsconfig detection)

v4.0 — Q4 2026

Real-time harness monitoring — watch mode with live score updates
Regression alerting — notify when score drops below threshold
Enterprise SSO / RBAC — multi-tenant access control
Benchmark registry — community-contributed harness quality benchmarks

💡 Have a feature idea? Open a Feature Request — community input directly shapes the roadmap.

Contributing

We welcome contributions of all kinds — bug reports, feature requests, documentation improvements, and code.

Quick guide:

# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/hachilles.git

# 2. Create a branch
git checkout -b feat/your-feature-name

# 3. Make changes, run checks
make lint && make test
hachilles scan .   # Must remain S-Grade

# 4. Open a Pull Request

→ Full contributing guide: CONTRIBUTING.md

→ Branch protection rules and workflow: docs/branch-protection.md

All PRs must maintain the S-Grade self-audit score (≥ 90 pts). HAchilles measures itself with itself.

Security Policy

⚠️ Please do NOT open a public GitHub issue for security vulnerabilities.

If you discover a security vulnerability in HAchilles, please report it privately by emailing:

📧 suhopark1@gmail.com

Include: a description of the vulnerability, steps to reproduce, potential impact, and your suggested fix if available.

We follow coordinated disclosure: we will acknowledge your report within 48 hours, assess it within 5 business days, and publish a fix before any public disclosure. You will be credited in the release notes (with your permission).

→ Full security policy: SECURITY.md

Community & Standards

Resource	Description
STANDARDS.md	Public CE·AC·EM diagnostic criteria — the measurement specification
docs/whitepaper.md	Scoring algorithm, rationale, and research background
docs/architecture.md	9-layer architecture and dependency rules
CONTRIBUTING.md	How to contribute code, docs, or diagnostic items
SECURITY.md	Vulnerability reporting policy and supported versions
CHANGELOG.md	Version history: v1.0.0 → v2.0.0 → v3.0.0
GitHub Discussions	Questions, ideas, community Q&A (Korean welcome)

GitHub Topics: harness-quality · harness-diagnostics · ai-agent · llm · context-engineering · fastapi · cli

License

Copyright 2026 Park Sung Hoon (박성훈) <suhopark1@gmail.com>

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

See LICENSE and NOTICE for full terms and third-party attributions.

HAchilles is itself a meta-example of harness engineering.

Run hachilles scan . on this repository. Current result: 100 pts · S-Grade ✅

Build with Harness. Measure with HAchilles.

This version: 3.0.1, released Apr 3, 2026

