Project-tailored Claude Code / Cursor / Codex harness generator. Grade-gated review + auto-fix, multi-reviewer consensus, edit-preserving block-merge upgrades, anti-rot crawlers, worktree isolation, version-drift detection, and a 3-layer ai-readiness rubric with ranked improvement actions.
harness-maker
A harness that knows your project — and stays that way.
Interview-shaped. Grade-gated. Self-evolving. Multi-IDE.
Why · How it fits · Quickstart · Features · How it works · Slash commands · Comparison · Configuration · Targets · FAQ · Roadmap · Deep dive
Why harness-maker?
Most AI coding setups start from a generic template and drift from day one — same reviewer set on every project, same prompts on every stack, same defaults that nobody ever revisits. harness-maker takes the opposite stance: the harness is shaped by your project, and it keeps that shape as the project moves.
| | Principle | What this means in practice |
|---|---|---|
| 🎯 | Personalized | Profiler reads 12+ stack/framework/CI signals. Interview locks 10+ dimensions. A Side experiment and a Production service get structurally different harnesses — different reviewer sets, different workflow stages, different security gates. No generic defaults silently shipped. |
| 🛡️ | Trusted | Every /hm:execute runs in a fresh worktree with a TDD loop. /hm:review doesn't just report — it applies consensus-passed fixes and re-reviews until grade ≥ A. Mechanical checks (lint/tests) gate the LLM reviewer before any token is spent. |
| 🌱 | Self-evolving | Hand-edit any agent, skill, CLAUDE.md — block-merge markers preserve your edits across --update. Memory accumulates project-specific patterns; recurring failures auto-propose new guardrails. |
| 🌀 | Anti-rot | Weekly crawl across 4 sources (Anthropic, GitHub releases, arXiv, OSV CVEs). Adaptive relevance filter learns from your accept/reject history. Always manual-confirmed — no silent auto-apply path exists. |
| 🎛️ | Multi-IDE | Claude Code + Cursor + Codex. One harness.yaml, three target-native renders. Existing Cursor rules / Aider config / Copilot instructions get absorbed on first run — no manual port. |
How it fits your project
SENSE → DECIDE → RENDER → EVOLVE
Four steps. Each is a question harness-maker answers for you — once, then continuously.
1. SENSE — What kind of project is this?
The profiler scans concrete signals before asking you anything. Every detection is tagged with a confidence (HIGH / MEDIUM / LOW) that decides whether the default is applied silently, prompted, or skipped.
| Signal | Detection source |
|---|---|
| Stack (12+) | pyproject.toml, package.json, Cargo.toml, go.mod, pubspec.yaml, … + extension counts |
| Scale | Total source files, monorepo depth, presence of apps/ or packages/ |
| Lifecycle | Git commit velocity, branch protection rules, CI workflow presence |
| Frameworks | React / Vue / FastAPI / Django / Tauri / Zephyr / Next.js / Tailwind / Pydantic … (dependency-parsed, not keyword-guessed) |
| Package manager | uv · poetry · pnpm · yarn · cargo · go mod · gradle · maven |
| CI provider | GitHub Actions · GitLab CI · CircleCI · Bitbucket Pipelines |
| Foreign AI config | Pre-existing .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/, .aider.conf.yml, .github/copilot-instructions.md |
Result: A 24-hour cached ProjectProfile you can inspect with harness-maker profile . --json before any interview runs.
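The confidence routing described in this step can be pictured with a minimal sketch. The `Detection` class, the axis names, and the returned action strings are hypothetical illustrations, not harness-maker's actual types; only the HIGH / MEDIUM / LOW behaviour comes from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Detection:
    axis: str  # hypothetical axis name, e.g. "package_manager"
    value: str
    confidence: Confidence


def route(detection: Detection) -> str:
    """Decide what the interview does with one detection."""
    if detection.confidence is Confidence.HIGH:
        return "apply-silently"  # default applied without asking
    if detection.confidence is Confidence.MEDIUM:
        return "prompt"  # shown as an explicit interview question
    return "skip"  # LOW: no default, the user decides


detections = [
    Detection("package_manager", "uv", Confidence.HIGH),
    Detection("framework", "FastAPI", Confidence.MEDIUM),
    Detection("ci_provider", "CircleCI", Confidence.LOW),
]
plan = {d.axis: route(d) for d in detections}
```

Keeping the bucket on the detection itself, rather than in the interview code, is one way to make a change in silent defaults visible to a regression test.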
2. DECIDE — What harness does this project need?
A short interview locks the dimensions that shape every downstream render. Re-runs silently reuse prior answers; explicit --reinterview re-prompts.
| Dimension | Choices | What it changes |
|---|---|---|
| Preset | Side · Production | Reviewer count (1 vs 5), workflow stage count, security gate depth, verify-required flag |
| Dev mode | task-driven · spec-driven | Whether SPEC stage is mandatory; whether plan stages chain into execute |
| Targets | claude-code · cursor · codex (multi-select) | Which IDE-native asset trees are rendered |
| Locale | en · ko · any tag | Interview text + user-facing error messages |
| Workflows | Fused sequences from atomic stages | Which /hm:<name> slash commands appear |
| Reviewers / skills | Preset defaults + overrides | Which agents and skills install |
| Ref folders | Path + glob pairs | Which external docs are searchable via refdocs-search skill |
| Sibling repos | Relative paths | Which adjacent repos share the same harness session |
| Second Brain | Obsidian vault path + project_id | Where cross-session memory writes |
| Recommended model | claude-opus-4-7 default | The model frontmatter on every generated agent |
Result: .claude/harness.yaml, a single source of truth that survives upgrades.
3. RENDER — How does this become a working harness?
One source tree, three IDE-native renders, all from a single harness.yaml:
.claude/ ← single source: agents, skills, commands/hm/, hooks, memory, observability
├── .cursor/ ← + Cursor-native rules, hooks, mcp.json (if cursor target)
├── .codex/ ← + Codex-native config.toml, agent TOMLs (if codex target)
├── AGENTS.md ← + Codex root instructions (if codex target)
└── .agents/skills/ ← + Codex skill paths (if codex target)
Layered on top of the base render:
- Domain packs: --add-domain python (or node, rust) grafts stack-specific standards, agents, and skills onto the harness. Custom domains scaffold as stubs.
- Foreign config absorption: pre-existing Cursor rules, Aider config, and Copilot instructions get LLM-translated into harness.yaml axes. @hm:harness:* inverted markers keep them synced on re-renders.
- Sibling repos: frontend + backend + library share one harness session via relative-path bindings in harness.yaml. Cross-machine portable.
- Ref folders + Second Brain: registered project docs + Obsidian vault notes become searchable from any stage.
Result: Every IDE you use sees the same agents, the same skills, the same workflows — natively. Zero manual porting.
4. EVOLVE — How does the harness stay useful?
Three independent feedback loops keep the harness aligned with your project as it grows.
A. Your edits survive upgrades.
Hand-tune agents/code-reviewer.md. Add a custom skill. Edit a CLAUDE.md section. All survive harness-maker make --update because content hashes per file plus @hm:user:* block-merge markers separate your edits from template-owned regions. @hm:harness:* inverted markers do the opposite (for foreign config: preserve outside, replace inside).
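A rough sketch of how marker-delimited blocks can carry user edits across a re-render. The HTML-comment marker syntax shown here is an assumption made for illustration; the real @hm:user:* marker format is not given in this README, and the content-hash side of the mechanism is left out.

```python
import re

# Hypothetical marker syntax; the real @hm:user:* format may differ.
BLOCK = re.compile(
    r"(<!-- @hm:user:(?P<name>[\w-]+):start -->\n)"
    r"(?P<body>.*?)"
    r"(<!-- @hm:user:(?P=name):end -->)",
    re.DOTALL,
)


def merge_user_blocks(existing: str, new_template: str) -> str:
    """Render new_template but keep the body of each user block found in existing."""
    kept = {m.group("name"): m.group("body") for m in BLOCK.finditer(existing)}

    def restore(match: re.Match) -> str:
        # Fall back to the template's own body when the user has no such block.
        body = kept.get(match.group("name"), match.group("body"))
        return match.group(1) + body + match.group(4)

    return BLOCK.sub(restore, new_template)


existing = (
    "# Reviewer\nold template text\n"
    "<!-- @hm:user:notes:start -->\nmy custom rule\n<!-- @hm:user:notes:end -->\n"
)
new_template = (
    "# Reviewer\nnew template text\n"
    "<!-- @hm:user:notes:start -->\ndefault notes\n<!-- @hm:user:notes:end -->\n"
)
merged = merge_user_blocks(existing, new_template)
```

In this sketch everything outside the markers comes from the new template and everything inside comes from the existing file; an inverted marker would swap those two roles.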
B. The harness learns your project.
- .claude/memory/wiki.md: patterns, conventions, gotchas. Each /hm:wrapup appends new entries.
- .claude/memory/failures.md: recurring mistakes, deduplicated by slug + count.
- .claude/memory/session/<date>.md: non-obvious decisions per workday.
- When any failure slug reaches count ≥ 3, wrapup writes a proposal to pending-proposals.md: a new skill, rule, or hook that would have prevented the recurrence. The user reviews and decides whether to ingest.
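The count-based trigger in the last item can be sketched as follows. The function name and the example slugs are hypothetical; only the threshold of 3 comes from the text.

```python
from collections import Counter

PROPOSAL_THRESHOLD = 3  # from the text: count >= 3 triggers a proposal


def slugs_needing_proposals(
    failure_slugs: list[str], threshold: int = PROPOSAL_THRESHOLD
) -> list[str]:
    """Return the failure slugs seen often enough to warrant a guardrail proposal."""
    counts = Counter(failure_slugs)
    return sorted(slug for slug, n in counts.items() if n >= threshold)


# Hypothetical slugs collected across several sessions.
seen = ["missing-await", "stale-mock", "missing-await", "missing-await"]
proposals = slugs_needing_proposals(seen)
```

Here only `missing-await` crosses the threshold, so only it would produce an entry for the user to review.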
C. Defaults adapt to your overrides.
Override telemetry (100% local) tracks every time you change a default. /hm:health scores three layers — detection→recommendation conversion, override stability, audit cadence — and surfaces a Bronze → Silver → Gold → Platinum tier with ranked action items. Drift the harness too far from your real usage and the audit will tell you exactly which default is wrong.
Result: The harness improves with the project, not against it. No silent re-litigation of decisions you've already made.
For the mechanics behind each step — full procedures, decision paths, internal invariants — see docs/HOW-IT-WORKS.md.
Table of Contents
- Why harness-maker?
- How it fits your project
- Quickstart
- Requirements
- Features
- How it works
- Slash commands the harness exposes
- How it compares
- Configuration
- Targets
- Reconcile rules
- Observability
- Marketplace
- FAQ
- Roadmap
- Development
- Contributing
- License
Quickstart
Universal Bootstrap Prompt
Paste into your AI agent — it detects whether you're in Claude Code, Cursor, or Codex, runs the right plugin install for that IDE, and bootstraps the harness end-to-end. One copy, every supported IDE.
Install harness-maker for this project and bootstrap the harness end-to-end.
You are an AI agent running inside one of: Claude Code, Cursor, or Codex CLI.
Detect which one (silently — do NOT ask me), run the matching plugin install,
then drive /harness-maker:make + /hm:health.
Step 1 — Detect the host IDE (silent):
- Claude Code → $CLAUDE_CODE is set, or ~/.claude/ exists, or you have
access to slash commands like /plugin
- Cursor → $CURSOR_SESSION is set, or ~/.cursor/ exists, or you have
access to Cursor's plugin manager
- Codex CLI → $CODEX_SESSION is set, or ~/.codex/ exists, or the
`codex` command is on PATH
Step 2 — Install harness-maker as a plugin (skip if `/harness-maker:make` is
already available):
IF Claude Code:
/plugin marketplace add Ecro/harness-maker
/plugin install harness-maker@harness-maker
(reload Claude Code if prompted)
IF Codex CLI:
codex plugin marketplace add Ecro/harness-maker
# marketplace add IS install for Codex — no separate install step
# then run /plugins inside Codex to enable
IF Cursor (public marketplace does not yet support direct GitHub install):
git clone https://github.com/Ecro/harness-maker.git \
~/.cursor/plugins/local/harness-maker
# tell me to reload the window (Ctrl+Shift+P → "Reload Window")
IF you can't tell which IDE (or none of the above), STOP and ask me which
IDE you're in.
Step 3 — Run /harness-maker:make and follow the interview:
• Confirm preset (Side / Production), dev mode, target IDEs, and locale.
• Accept the recommended defaults unless I object.
Step 4 — Run /hm:health after the harness renders:
Produces a 3-section dashboard at .claude/observability/dashboard.md
(structural / external-risks / personalization). Tell me the personalization
tier and any high-priority action items.
Conduct the conversation in the language of my first reply; default to English
if you cannot tell.
Manual install — pick one for your IDE
Claude Code (plugin marketplace)
/plugin marketplace add Ecro/harness-maker
/plugin install harness-maker@harness-maker
The marketplace metadata and the plugin both ship in this repo, so two commands cover discovery + install. Reload Claude Code; /harness-maker:make becomes available.
Codex CLI (plugin marketplace)
codex plugin marketplace add Ecro/harness-maker
# pin a specific release for reproducibility:
codex plugin marketplace add Ecro/harness-maker --ref v0.14.3
marketplace add is the install — there is no separate install step in Codex. Then run /plugins inside Codex to enable.
Cursor (Team marketplace import OR local symlink)
Cursor's public plugin marketplace is curated (cursor.com/marketplace/publish, submit-and-review). Direct GitHub install is still on the roadmap as of Cursor 2.5+. Two working paths today:
A. Team marketplace (Team / Enterprise plans):
Dashboard → Settings → Plugins → Team Marketplaces → Import → paste https://github.com/Ecro/harness-maker → push to team members.
B. Local dev install (community pattern):
git clone https://github.com/Ecro/harness-maker.git ~/.cursor/plugins/local/harness-maker
Reload Cursor (Ctrl+Shift+P → "Reload Window") to pick up the new plugin.
PyPI (no IDE plugin — CI, headless, scripts)
For environments where no IDE plugin path applies (CI scripts, headless servers, automation, or strong preference for a Python CLI tool):
uv tool install harness-maker # POSIX / macOS / WSL
# Native Windows / PowerShell:
irm https://astral.sh/uv/install.ps1 | iex
uv tool install harness-maker
Then:
cd your-project
harness-maker profile . --json
harness-maker make . --preset Production --locale en --targets claude-code,cursor
This path is fine for one-off renders but skips the in-IDE slash-command surface — /hm:health, /hm:plan, /hm:execute, etc. only appear after the IDE plugin loads them.
After the harness is rendered, re-run with flags to evolve it:
harness-maker make . --audit # score the existing .claude/ against the rubric
harness-maker make . --add NAME # graft on a single skill/agent/command
harness-maker make . --remove NAME # surgically remove one component
harness-maker make . --promote NAME # move an ad-hoc artifact into the harness
Requirements
| Dependency | Notes |
|---|---|
| Python 3.12+ and uv | Required wherever Claude Code runs against your project, even if your project's primary language is Rust, Node, or Go. Hooks invoke uv run python -m harness_maker.gates.*; without uv they are silent no-ops. |
| Claude Code CLI (plugin + hook support) | Optional for plugin mode (claude --plugin-dir /path/to/harness-maker). Not required — the harness-maker CLI can generate harnesses independently. |
| Cursor IDE 2.4+ (3.2+ recommended) | Optional. Reads .claude/agents/, .claude/skills/, and .claude/commands/hm/*.md natively (verified empirically in 0.6.2, re-confirmed 0.7.1 — see tests/cursor-compat/results-2026-05-08.md). Hooks render to a separate .cursor/hooks.json with Cursor-native schema; both files are emitted when targets includes cursor. Cursor 3.0+ adds native /worktree and /best-of-n which coexist safely with harness-maker's prefix-matched cleanup. |
| OpenAI Codex CLI | Optional. When targets includes codex, harness-maker renders Codex-native .codex/ config, AGENTS.md, and .agents/skills/ assets from the same workflow definitions. |
| Git | Required for worktree isolation (every /hm:execute and /hm:loop run). |
Features
Grouped by what they do for your project, not by component.
🎯 Personalization — fits your project
- Project-shaped harness. Profiler + 10-dimension interview produce structurally different harnesses for Side experiments vs Production services. Reviewer count, workflow stages, security gate depth, mechanical-check enforcement — all derived from your answers.
- 12+ stack detection. Python · Node · Rust · Java · Kotlin · Swift · Dart · Ruby · PHP · C# · Elixir · Scala · C/C++ · Zig · Haskell. Framework + package-manager + CI provider parsed from manifests, not keyword-guessed. 24h cache ceiling on manifest mtime.
- Confidence-bucketed defaults. Every detection declares HIGH/MEDIUM/LOW confidence. HIGH → silent default with # detected: provenance. MEDIUM → explicit interview prompt. LOW → skipped (you decide). Regression test guards against surprise silent-default changes between minor versions.
- Adaptive personalization tier. /hm:health scores three layers (detection→recommendation conversion, override stability, audit cadence) and reports Bronze → Silver → Gold → Platinum tier. Auto-proposes default changes when one axis has been overridden 5+ times. 100% local telemetry; nothing leaves the project.
- Domain packs. --add-domain python (or node, rust) grafts stack-specific standards, agents, and skills. Custom domains scaffold as stubs, so teams add their own without forking harness-maker.
- Single command, no subcommand sprawl. /harness-maker:make is the only entry point. Everything else is a flag (--audit, --add, --remove, --promote, --add-domain, --reinterview, --update).
🛡️ Trust — grade-gated work
- Grade-based auto-fix loop. /hm:review doesn't just report. It applies consensus-passed fixes → re-reviews (selectively, only reviewers whose scope was touched) → regrades, until grade ≥ threshold (default A) or max_review_rounds is exhausted. Weak-consensus and manual-only findings are never auto-applied.
- Mechanical pre-checks before any LLM token. Lint clean + tests green are enforced before reviewers spawn. First non-zero exit emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts. --no-auto-fix does not skip this gate.
- Conditional reviewer routing. .env change → security-reviewer. /perf/ → performance-reviewer. .tsx → ux-reviewer. Async/locking → concurrency-reviewer. 10× cheaper than fanning out to every reviewer on every diff.
- 2-pass redaction (+47pp precision). Pass 1 strips PR metadata so findings can't anchor on author/title. Pass 2 restores full context; reviewers must validate or drop each Pass 1 finding. Ablation-measured precision gain on anchoring-prone diffs.
- 7 security gates. secrets · permissions (settings.json over-grant) · hook-injection (AST scan for rm -rf, curl|sh, eval) · CVEs (OSV.dev) · hallucination (AST scan for non-existent imports) · prod-name guard · prompt-injection. Findings stay local in .claude/observability/security/.
- Privilege separation. Reviewer agents deny Write / Edit / interpreter Bash calls. Executor agents allow only .worktrees/** writes + paired Edit/Write denies on system paths. Defense-in-depth survives prompt injection, agent compromise, and tool_input poisoning.
- Worktree isolation per run. Every execute runs in a fresh git worktree. Failed runs preserve evidence; successful runs auto-cleanup with prefix-match (Cursor's own worktrees never touched).
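The path-based part of conditional reviewer routing might look roughly like this. The always-on `code-reviewer` and the exact matching rules are assumptions for illustration; the Async/locking → concurrency-reviewer route is omitted because it depends on diff content rather than file paths.

```python
from pathlib import PurePosixPath


def reviewers_for(changed_paths: list[str]) -> set[str]:
    """Pick reviewers from the paths touched by a diff."""
    reviewers = {"code-reviewer"}  # assumption: a general reviewer always runs
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.name == ".env" or path.name.startswith(".env."):
            reviewers.add("security-reviewer")
        if "perf" in path.parts:  # any /perf/ directory segment
            reviewers.add("performance-reviewer")
        if path.suffix == ".tsx":
            reviewers.add("ux-reviewer")
    return reviewers


selected = reviewers_for(["src/perf/bench.py", "web/App.tsx"])
```

A diff that touches only `README.md` would spawn just the general reviewer in this sketch, which is where the cost saving over always fanning out comes from.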
🌱 Self-evolving — grows with your project
- Block-merge preservation. Hand-tune any agent, skill, CLAUDE.md section. Survives --update because content hashes per file plus @hm:user:* markers separate your edits from template-owned regions. @hm:harness:* inverted markers do the opposite for foreign config absorption.
- Three-tier memory accumulation. wiki.md (patterns) · failures.md (recurring mistakes deduplicated by slug) · session/<date>.md (non-obvious decisions). Wrapup writes these automatically; every stage reads them automatically.
- Self-improving failure proposals. When a [fail:*] slug recurs 3× across sessions, wrapup writes a proposal to pending-proposals.md: a new skill, rule, or hook that would have prevented the recurrence. You review and decide whether to ingest.
- ADR system as binding execute constraints. Architecture Decision Records promoted during /hm:plan are hard constraints on /hm:execute. Conflicts surface as blockers, never silently proceed. Future sessions don't re-litigate settled decisions.
- Refdocs search. Register architecture docs, API specs, design docs in harness.yaml. refdocs-search skill gives lossless full-text search, with no chunking and no RAG index.
- Optional Second Brain. Obsidian vault integration. Allowlisted write folders by note type (decision · preference · failure · project · reference · journal). Cross-session memory survives plugin reinstalls and machine moves.
- Brownfield-safe upgrades. Reconciler hashes existing .claude/ and offers per-conflict keep/replace/both. Apply is ADD-only with timestamped backups. User edits never silently overwritten.
🌀 Anti-rot — survives the ecosystem
- 4-source weekly crawl. Anthropic blog/changelog · anthropics/claude-code GitHub releases · arXiv cs.SE/CL/CR · OSV.dev CVEs. Manifested as pending items in /hm:health.
- Adaptive relevance filter. Threshold starts at 0.7, adjusts ±0.05 based on your accept/reject history per source. Always manual-confirmed; no --auto-apply path exists.
- Unified health audit. /hm:health composites three orthogonal layers: Structural (70% deterministic + 25% LLM rubric + 5% cache diagnostic), External Risks (anti-rot pending queue), Personalization (Bronze→Platinum tier). One 3-section dashboard at .claude/observability/dashboard.md. No auto-apply ever (ADR-001).
- SessionStart drift reminder. Hook fires on every session open and warns if the running plugin version differs from the version that rendered the harness, so you notice when /plugin update needs a re-render. Detector compares against the latest cached plugin version (not just the imported __version__).
- Cache-miss classification. Prompt-cache diagnostic reports min_threshold · invalidation · ttl · first (cold start). 5% weight in AI-readiness distinguishes cold-start misses (benign) from structural misses (actionable).
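One way to picture the adaptive relevance threshold from the list above. The 0.7 start and the 0.05 step come from the text; the direction of each adjustment (accept lowers, reject raises) and the floor and ceiling values are assumptions made for this sketch.

```python
START_THRESHOLD = 0.7  # from the text
STEP = 0.05  # from the text


def adjust_threshold(
    threshold: float,
    accepted: bool,
    step: float = STEP,
    floor: float = 0.5,  # hypothetical lower bound
    ceiling: float = 0.95,  # hypothetical upper bound
) -> float:
    """Move a per-source threshold one step after a user decision."""
    # Assumption: accepting an item means the source is relevant, so let more through.
    moved = threshold - step if accepted else threshold + step
    return round(min(ceiling, max(floor, moved)), 2)


after_accept = adjust_threshold(START_THRESHOLD, accepted=True)
after_reject = adjust_threshold(START_THRESHOLD, accepted=False)
```

The clamp keeps a long run of rejections from pushing a source's threshold so high that nothing from it is ever proposed again.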
🎛️ Multi-IDE — one source, every IDE
- Three targets from one harness. Claude Code + Cursor (2.4+, 3.0+ recommended) + Codex CLI. Single .claude/ source tree; Cursor reads it natively plus its own .cursor/ for hooks and rules; Codex reads .codex/, AGENTS.md, and .agents/skills/. All rendered from one harness.yaml.
- Foreign AI config migration. Detects 6 known foreign configs (.cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, .github/copilot-instructions.md). LLM-translates them into harness.yaml axes. @hm:harness:* inverted markers keep them synced across re-renders.
- Sibling repos. Frontend + backend + library bind into one harness session via relative paths in harness.yaml. Cross-machine portable (relative paths survive git clone).
🔁 Workflow primitives — the rest of the toolchain
- Deep interview before every implementation. /hm:spec runs a 6-category interview (Intent → Outcomes → In-Scope Scenarios → Non-Goals → Constraints → Verification) scored for completeness. /hm:plan runs a 9-category interview (scope → architecture → contract → risk → testing → phasing → dependencies → failure handling → observability) in impact order. Every settled decision promotes to a binding ADR.
- Autoloop with adaptive interview + 4-gate convergence. /hm:loop runs time-and-iteration-bounded loops. autoloop-driver reads the goal, asks only what's missing, locks intensity + exit checklist, then requires mechanical checks + LLM judgment + regression comparison + 2-iter convergence streak before accepting completion.
- 3-tier context loading + compaction recovery. Hot tier (today's session) · Warm tier (failures + wiki first 60/40 lines) · Cold tier (git log / PLANs on demand). PreCompact hook flushes session before context compaction; next turn detects the marker and resumes from the last in-progress phase.
- Cross-process memory safety. .claude/memory/ writers serialise via re-entrant POSIX flock. Telemetry hooks append atomically via raw os.write() on O_APPEND (single-syscall, ≤PIPE_BUF) so concurrent Claude Code + Cursor sessions cannot interleave JSONL lines.
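The single-syscall append from the last item can be sketched as below. The function name and record fields are hypothetical, and this sketch does not enforce the PIPE_BUF size bound or take the flock used by the memory writers.

```python
import json
import os
import tempfile


def append_jsonl(path: str, record: dict) -> None:
    """Append one JSON line with a single os.write() on an O_APPEND descriptor."""
    line = (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, line)  # one syscall for the whole line
        if written != len(line):
            raise OSError("short write; the line may be truncated")
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "overrides.jsonl")
    append_jsonl(log_path, {"axis": "preset", "value": "Production"})
    append_jsonl(log_path, {"axis": "locale", "value": "ko"})
    with open(log_path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh]
```

Building the full line in memory first and handing it to one `os.write()` call avoids the partial lines that buffered file objects can produce when two processes write at once.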
For the complete mechanics — all procedures, decision paths, internal invariants — see docs/HOW-IT-WORKS.md.
How it works
flowchart TD
A["/harness-maker:make"] --> B["Profile\n(stack, scale, lifecycle)"]
B --> C["Interview\n(preset + 10 dims + targets)"]
C --> D["Synthesize\n(deterministic Blueprint)"]
D --> E["Render\n(Jinja2 + provenance frontmatter)"]
E --> F{Brownfield?}
F -- No --> G["Write .claude/ directly"]
F -- Yes --> H["Reconcile\n(hash-based keep/replace/both)"]
H --> G
G --> I["Extra targets?\nRender .cursor/ and/or .codex/ assets"]
I --> J["User runs /hm:* commands"]
J --> K["Weekly /hm:health (Step 2)\n4-source anti-rot crawl\n→ manual confirm"]
14 mechanisms (M1-M14) back every feature. See docs/ARCHITECTURE.md for the full breakdown including the privilege-separation model, security gate triggers, and reconcile invariants.
Since 0.12.0 the synthesis pipeline also threads through a typed Recommendation registry: every recommend_<axis>(profile, project_dir) function declares per-detection confidence (ADR-007), and the interview.py dispatcher routes by bucket. Detection results land in ~/.cache/harness-maker/profile-<repo-hash>.json with manifest-mtime + 24h-ceiling invalidation. Foreign AI config files detected at /hm:configure time can be imported into harness.yaml and re-rendered single-source with @hm:harness:* inverted block markers (ADR-009).
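The two invalidation conditions for the profile cache (manifest mtime and the 24h ceiling) can be combined as in this sketch. The function name and the plain-timestamp arguments are assumptions; how harness-maker actually stores and compares these values is not described here.

```python
CEILING_SECONDS = 24 * 60 * 60  # the 24h ceiling from the text


def cache_is_fresh(cached_at: float, manifest_mtimes: list[float], now: float) -> bool:
    """True when the cached profile is younger than 24h and no manifest changed since."""
    if now - cached_at > CEILING_SECONDS:
        return False  # ceiling reached: re-profile regardless of manifests
    # Any manifest modified after the cache was written invalidates it.
    return all(mtime <= cached_at for mtime in manifest_mtimes)


fresh = cache_is_fresh(cached_at=1000.0, manifest_mtimes=[900.0], now=2000.0)
stale_manifest = cache_is_fresh(cached_at=1000.0, manifest_mtimes=[1500.0], now=2000.0)
```

In practice the mtimes would come from `os.stat()` on files such as pyproject.toml or package.json; they are passed in as numbers here to keep the sketch self-contained.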
Slash commands the harness exposes
After install, the rendered harness exposes commands under /hm:*:
Atomic stages (always available)
| Command | Purpose |
|---|---|
| /hm:research | Gather facts, user-workflow signals, best practices, and options |
| /hm:spec | Write acceptance criteria from research |
| /hm:plan | Decompose spec into phases with exit criteria |
| /hm:execute | Implement with TDD + worktree isolation |
| /hm:review | Multi-reviewer consensus (mechanical pre-check → conditional routing) |
| /hm:wrapup | Clean, document, commit |
| /hm:verify | 6-check gate before completion |
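The mechanical pre-check that /hm:review runs before any reviewer spawns is a stop-on-first-failure loop. A minimal sketch, assuming commands are given as shell-style strings and split with `shlex`; the function name is hypothetical and only the MECHANICAL_BLOCK marker format comes from this README.

```python
import shlex
import subprocess
import sys
from typing import Optional


def run_mechanical_checks(commands: list[str]) -> Optional[str]:
    """Run commands in order; return the block marker for the first failure, else None."""
    for cmd in commands:
        result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            # Stop here: later commands and all LLM reviewers are skipped.
            return f"## MECHANICAL_BLOCK: {cmd} exit={result.returncode}"
    return None


# Stand-in commands that exit 0 and 3; a real list might hold lint and test commands.
ok = f"{shlex.quote(sys.executable)} -c 'raise SystemExit(0)'"
bad = f"{shlex.quote(sys.executable)} -c 'raise SystemExit(3)'"
marker = run_mechanical_checks([ok, bad, ok])
```

Because the loop returns on the first non-zero exit, the third command above never runs.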
Fused workflows (preset-generated, user-renameable)
Fused workflows combine atomic stages into a single command. The interview generates a starter set; you can add, remove, or rename them in harness.yaml.
| Preset | Default fused workflows |
|---|---|
| Side | /hm:plan-exec-rev · /hm:exec-rev · /hm:exec-rev-wrap (default) |
| Production | /hm:exec-rev-wrap-ver (default) · /hm:exec-rev-wrap · /hm:plan-exec-rev · /hm:exec-rev · /hm:res-spec-plan |
Utility commands
| Command | Purpose |
|---|---|
| /hm:loop "<goal>" | Autoloop driver: feature or improve mode, time/iter-bounded |
| /hm:ai-readiness | 3-layer readiness score + P0/P1/P2 ranked actions |
| /hm:personalization-audit | Composite-score rubric (Bronze/Silver/Gold/Platinum) from telemetry + harness.yaml + ProjectProfile. Reads .claude/observability/adaptive/overrides.jsonl; outputs ranked ActionItem list with evidence schema. ADR-011 (v0); calibration deferred to 30+ project sample. |
| /hm:refresh | Anti-rot crawl, manual confirm required |
How it compares
Other Claude Code harnesses pick a niche; harness-maker is the meta-tool that builds them — and then keeps them current.
| Project | Scope | What harness-maker adds |
|---|---|---|
| ohmyclaudecode | Curated commands/agents bundle | Project-tailored synthesis (preset + 10 override dims), brownfield reconcile, provenance frontmatter, anti-rot pipeline |
| superpowers | Powerful sub-agents and workflows | Single-command entry, AI-readiness scoring, worktree isolation by default, privilege-separated reviewer/executor |
| Archon | Knowledge-base + RAG-backed planning | Stack/scale/lifecycle profiler, atomic+fused workflow engine, conditional reviewer routing, 7 security gates |
| aider | Terminal pair programmer, LLM-agnostic | Claude Code / Cursor / Codex native assets; harness output is the runtime (not the session); anti-rot keeps it current |
| ouroboros | Autonomous self-bootstrapping AI software factory | Project-shaped interview (not one pipeline for all), grade-gated reviews with mechanical pre-checks, anti-rot pipeline, brownfield reconcile, multi-target support |
| Hand-rolled .claude/ | Full control, zero automation | Drift detection via provenance hash, weekly anti-rot crawl, AI-readiness scoring, worktree isolation, all without writing it yourself |
The core difference: harness-maker generates and owns the lifecycle of your .claude/ directory. A static tool gives you a starting point. harness-maker gives you a starting point that knows who it was created for and can be updated without losing your changes.
Configuration
The interview writes answers to .claude/harness.yaml. Key dimensions:
preset: Production # Side | Production
locale: en # en | ko | <any — unknown falls back to en>
dev_mode: spec-driven # spec-driven | task-driven
targets: # which runtimes to drive
- claude-code
- cursor
- codex
recommended_model: claude-opus-4-7
ref_folders:
- path: ../architecture-docs
glob: "**/*.{md,txt,pdf}"
second_brain:
enabled: true
backend: filesystem
project_id: my-app
vault_path: ../obsidian-vault
trusted_allowlist: true # configured write folders need no confirmation/backup
required_frontmatter: [type, created, updated, tags, links]
folders:
- path: Projects/my-app
read: true
write: true
note_types: [decision, preference, failure, project, reference, journal]
sibling_repos:
- ../backend
reviewers:
enabled: [code, security, performance, ux, concurrency]
routing: conditional # conditional | always-all
mechanical_checks: # pre-LLM stop-on-first gate (optional)
- ruff check .
- uv run pytest tests/unit -x -q
worktree:
scope: [execute, plan] # which stages run in a fresh worktree
cleanup: on_success # on_success | always | never
anti_rot:
enabled: true
threshold: 0.7 # adaptive — adjusts ±0.05 based on accept/reject ratio
context_lint:
strict: true # warn on overrun | block
memory:
files: [failures.md, wiki.md]
adaptive:
disable_telemetry: false # opt-out per ADR-005; 100% local capture
audit_session_threshold: 30 # SessionStart hint after N axis overrides
audit_days_threshold: 14 # SessionStart hint after N days without audit
All adaptive features are 100% local. tests/unit/test_no_network.py asserts no socket call during telemetry emit, audit, or SessionStart hook execution (ADR-005 positive obligation).
Run /harness-maker:make again and choose Update (same settings, pick up template improvements) or Full reconfigure to change any dimension.
Obsidian Second Brain
second_brain connects a Markdown/Obsidian vault as a typed knowledge graph for
stage-aware memory. hm-research, hm-plan, hm-review, and hm-wrapup use
different note types (reference, project, decision, preference,
failure, journal) instead of loading the whole vault. Writes are full
Markdown writes inside configured write: true folders only. Configure narrow
folders: the allowlist is trusted completely, with no per-write confirmation or
backup. For multiple projects sharing one vault, give every project a distinct
project_id and put writable folders under a path segment with the same value
(for example Projects/my-app). harness-maker rejects writable Second Brain
folders that do not include the configured project_id, which keeps one
project from writing into another project's note namespace.
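The project_id rule for writable folders can be sketched as a path-segment check. The function name is hypothetical, and the folder dictionaries simply mirror the `folders` entries of the configuration; the real validator may differ.

```python
from pathlib import PurePosixPath


def rejected_writable_folders(project_id: str, folders: list[dict]) -> list[str]:
    """Return writable folder paths that lack project_id as a path segment."""
    rejected = []
    for folder in folders:
        if not folder.get("write", False):
            continue  # read-only folders are not restricted by this rule
        if project_id not in PurePosixPath(folder["path"]).parts:
            rejected.append(folder["path"])
    return rejected


folders = [
    {"path": "Projects/my-app", "read": True, "write": True},
    {"path": "Projects/other", "read": True, "write": True},
    {"path": "Shared/reference", "read": True, "write": False},
]
rejected = rejected_writable_folders("my-app", folders)
```

Matching on whole path segments rather than substrings means a folder such as `Projects/my-app-v2` would not pass for `my-app`.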
Setup walkthrough
- At interview time: when prompted for vault_path, supply the absolute path to your Obsidian vault root (or a not-yet-created subfolder of it; the harness creates the subfolder on first write iff the parent has .obsidian/). The interview then asks for project_id and a writable folder, defaulting to 99_HM/{project_id}/ (matches the 99_*/01_* numeric-prefix organization style).
- Post-install adjustment: run /hm:configure to revisit Second Brain settings. The slash command dispatches to harness-maker configure-second-brain --check, which inspects state and returns guidance JSON so the slash command can prompt only when something is missing. To add a folder non-interactively, call harness-maker configure-second-brain --add-folder 99_HM/my-project/.
- Existing harnesses upgraded from a pre-fix release: the loader now tolerates folders: [] (logs a one-shot warning + remediation hint pointing at /hm:configure); writes raise SecondBrainError with the same hint instead of an obscure "not under a configured write folder" message.
Internals: the rendered harness.yaml carries a provenance frontmatter block;
all readers route through harness_maker.io_utils.load_harness_yaml (the
canonical multi-document-tolerant loader) to avoid the parser-strategy drift
that produced the original bug.
Targets
targets is a multi-select. Choose claude-code, cursor, codex, or any combination. Claude Code is the default; Cursor and Codex add runtime-specific assets while preserving the same preset, workflows, skills, reviewers, and safety model.
Cursor target
Run /harness-maker:make and pick targets: [cursor] or [claude-code, cursor] at the interview. The renderer adds:
- .cursor/rules/harness.mdc: always-on workflow rules with Cursor-legal frontmatter (description / globs: [] / alwaysApply: true).
- .cursor/hooks.json: Cursor-native hooks schema (lowercase camelCase keys, version: 1, flat {matcher, command}). Deliberately different from .claude/hooks/hooks.json (PascalCase, nested {hooks:[…], matcher}); each IDE reads only its own file. Don't try to collapse them, or Cursor will silently stop firing hooks. See tests/cursor-compat/results-2026-05-08.md for the kairos 0.5.7 forensic that proved this empirically.
- .cursor/mcp.json: Cursor MCP server config. Populated from harness.yaml.mcp_servers (0.6.2+); defaults to {"mcpServers": {}} when no servers are configured. Inner shape (command, args, env) is type-validated on parse with a warning log when entries are dropped.
.claude/agents/, .claude/skills/, and .claude/commands/hm/ are single-source — Cursor 2.4+ reads them natively (forensic-verified). Hooks are the only asset that requires per-IDE rendering because the schemas diverge by design.
Recommended model
harness.yaml.recommended_model defaults to claude-opus-4-7 and propagates to agent frontmatter. Cursor users may override model selection in their IDE. The harness does not rewrite prompts to be model-agnostic — <thinking> blocks and Claude-specific patterns are preserved deliberately.
Verification
Per-release Cursor compatibility is tracked in tests/cursor-compat/:
- MANUAL_CHECKLIST.md: A1–A4 (agent dispatch, hook fire, skill auto-discovery, slash command + Q&A loop) covering both IDEs
- RESULTS.md: PASS/FAIL/PARTIAL grid you fill while running the checklist
- results-2026-05-08.md: kairos 0.5.7 forensic that resolved Q-A (hooks discovery) and Q-B (commands discovery) without an IDE-driven manual run; future Cursor verifications append a new dated results-*.md
- fixture/: minimal .claude/ for opening directly in either IDE
Automated CI guards the dual-schema invariants regardless of manual fixture runs:
- test_cursor_hooks_uses_lowercase_native_schema: fails if .cursor/hooks.json accidentally adopts Claude PascalCase
- test_no_cursor_commands_rendered: fails if a future change starts emitting .cursor/commands/hm-*.md mirrors (Cursor reads .claude/commands/ natively)
- test_render_agents_have_structured_permissions_frontmatter: fails if any agent template loses its permissions.allow/deny block (Cursor 2.5+ subagent permission inheritance gap)
Codex target
Run /harness-maker:make and pick targets: [codex] or include codex with the other targets. The renderer adds:
- AGENTS.md: Codex's top-level project instruction file, rendered without YAML frontmatter so Codex displays clean instructions.
- .codex/config.toml and .codex/agents/*.toml: Codex-native configuration and agent registrations.
- .codex/hooks.json: Codex hook wiring, including Codex-specific permission events and file-edit tool matchers.
- .agents/skills/*/SKILL.md: the existing harness skills plus stage, workflow, and loop trigger skills in the layout Codex discovers.
Codex TOML files are rendered as pure TOML, not markdown-with-frontmatter. AGENTS.md uses block-merge markers so user additions survive re-renders, while .codex/*.toml files are regenerated from the selected target configuration.
Structured questions in Codex: Stage and loop skills instruct the agent to use the request_user_input tool for interview questions. This tool is available in Plan mode by default. To enable it in Code mode, add default_mode_request_user_input = true under [features] in .codex/config.toml. If the tool is unavailable at runtime, the agent falls back to asking in its response text.
Reconcile rules (re-rendering an existing harness)
Re-running /harness-maker:make on an existing harness uses a hash-driven KEEP rule: if a file's content_hash frontmatter matches the new template's hash, it's "ours" — safe to overwrite. If it differs (user-edited), it's "theirs" — kept.
Trade-off: when harness-maker bumps a template on a minor release, the existing file's hash no longer matches, so reconcile picks KEEP even if you didn't edit the file.
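The KEEP rule and its trade-off can be sketched in a few lines of Python. This follows the rule as literally stated (recorded content_hash compared with the new template's hash); the SHA-256 choice and the names `template_hash` and `reconcile` are assumptions, since the algorithm and internals are not specified here.

```python
import hashlib


def template_hash(template_body: str) -> str:
    # Assumption: SHA-256 over the UTF-8 body; the real algorithm may differ.
    return hashlib.sha256(template_body.encode("utf-8")).hexdigest()


def reconcile(recorded_hash: str, new_template_body: str) -> str:
    """'OVERWRITE' when the recorded hash equals the new template's hash, else 'KEEP'."""
    if recorded_hash == template_hash(new_template_body):
        return "OVERWRITE"
    return "KEEP"


v1 = "rule: run tests\n"
v2 = "rule: run tests and lint\n"   # template bumped in a later release
recorded = template_hash(v1)         # stamped when the file was first rendered

print(reconcile(recorded, v1))       # OVERWRITE: same template, file is "ours"
print(reconcile(recorded, v2))       # KEEP: hash differs although nobody edited the file
```

The second call is the trade-off in miniature: a template bump is indistinguishable from a user edit under a single-hash comparison.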
To pick up template updates after a version bump:
rm .claude/harness.yaml
/harness-maker:make
# (previous .claude/ is auto-backed up to .backup-<ISO>/)
.cursor/rules/*.mdc follow the same KEEP behavior. A future phase introduces a sidecar .hm-meta.yaml so harness-maker can hash-track Cursor assets without polluting Cursor frontmatter.
@hm:harness:* inverted markers (0.12.0) — for foreign-AI-config files we generate post-import (.cursor/rules/*.mdc, AGENTS.md, CLAUDE.md, etc.), content INSIDE @hm:harness:<id> markers is harness-owned (replaced on every render); content OUTSIDE is user-owned (byte-for-byte preserved). The block-merge parser dispatches per file extension: HTML comments for .md/.mdc, # @hm:harness: for .yml/.yaml, top-level _hm_harness JSON key for .json. 0.11.x files (frontmatter generated_by: harness-maker + zero @hm:harness:* markers) are treated as wholly harness-owned on first encounter post-upgrade and re-rendered into the new marker family (ADR-009 amendment).
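For the HTML-comment case (.md/.mdc), the inverted-marker behavior might look like the Python sketch below. The concrete marker spelling (an opening `<!-- @hm:harness:ID -->` and a closing `<!-- /@hm:harness:ID -->`) and the function `render_block` are assumptions for illustration; the actual parser's syntax may differ.

```python
import re


def render_block(text: str, block_id: str, new_body: str) -> str:
    """Replace only the content between the markers for block_id."""
    open_m = f"<!-- @hm:harness:{block_id} -->"    # assumed opening marker
    close_m = f"<!-- /@hm:harness:{block_id} -->"  # assumed closing marker
    pattern = re.compile(
        re.escape(open_m) + r".*?" + re.escape(close_m), re.DOTALL
    )
    replacement = f"{open_m}\n{new_body}\n{close_m}"
    # A function replacement avoids backslash-escape processing of new_body.
    return pattern.sub(lambda _m: replacement, text, count=1)


doc = (
    "my notes\n"
    "<!-- @hm:harness:rules -->\nold rules\n<!-- /@hm:harness:rules -->\n"
    "more notes\n"
)
print(render_block(doc, "rules", "new rules"))
```

Everything outside the marker pair passes through untouched, which is the property that lets user-owned text survive a re-render.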
Observability
All observability is 100% local — nothing is transmitted externally.
| File | Contents |
|---|---|
| .claude/observability/dashboard.md | AI-readiness score, dimension breakdown, ranked action items |
| .claude/observability/metrics-YYYY-MM-DD.jsonl | Per-turn telemetry (cache hit %, tool calls, durations), date-rotated daily (ADR-103, 0.7.1). Pre-0.7.1 metrics.jsonl is read as the trailing legacy shard. |
| .claude/observability/refresh/raw-*.jsonl | Anti-rot crawl evidence (accepted / rejected items) |
| .claude/observability/security/findings-*.jsonl | 7-gate security scan findings |
| .claude/observability/adaptive/overrides.jsonl | harness_yaml_override events with schema_version: 1, dual capture sites (/hm:configure exit primary + SessionStart secondary), dedup-keyed |
| .claude/observability/adaptive/last-audit.txt | Last /hm:personalization-audit run timestamp |
Run /hm:ai-readiness to regenerate the dashboard on demand.
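If you want to inspect the telemetry yourself, a small Python sketch like the following reads the date-rotated shards and treats a pre-0.7.1 metrics.jsonl as the last shard. The newest-first ordering and the helper names are assumptions; the record fields are left opaque because their schema is not described here.

```python
import json
import re
from pathlib import Path
from typing import Iterator

SHARD_RE = re.compile(r"^metrics-(\d{4}-\d{2}-\d{2})\.jsonl$")


def metric_shards(obs_dir: Path) -> list[Path]:
    """Dated shards newest first, then legacy metrics.jsonl if it exists."""
    dated = sorted(
        (p for p in obs_dir.iterdir() if SHARD_RE.match(p.name)),
        key=lambda p: p.name,  # ISO dates sort correctly as plain strings
        reverse=True,
    )
    legacy = obs_dir / "metrics.jsonl"
    return dated + ([legacy] if legacy.exists() else [])


def read_records(obs_dir: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, shard by shard."""
    for shard in metric_shards(obs_dir):
        with shard.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)
```

Typical use would be `list(read_records(Path(".claude/observability")))` from the project root.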
Marketplace
Marketplace manifests are present for each runtime:
- .claude-plugin/plugin.json: Claude Code plugin spec
- .cursor-plugin/plugin.json: Cursor Marketplace spec
- .codex-plugin/plugin.json: Codex plugin spec
Listings are pending. Until then, install locally:
# Any IDE — CLI install (recommended)
uv tool install harness-maker # from PyPI (when published)
uv tool install /path/to/harness-maker # pre-PyPI, from clone
harness-maker make . --targets claude-code,cursor,codex
# Claude Code — plugin mode (optional, enables /harness-maker:make command)
claude --plugin-dir /path/to/harness-maker
# Cursor — open the repo folder directly in Cursor as a workspace plugin
# Codex — render Codex-native assets in your project harness
harness-maker make . --targets codex
FAQ
Q: Why Python? My project is Rust / Node / Go.
The hooks (permission_gate, worktree_gate, telemetry) call uv run python -m harness_maker.* at PreToolUse / PostToolUse boundaries. uv and harness_maker need to be on your PATH, but only to run hooks; your project's toolchain and build system are untouched.
Q: Why does it require uv?
uv gives a hermetic, fast Python environment without polluting the system or your project's virtualenv. Hooks run in milliseconds without activating anything.
Q: Will harness-maker overwrite my hand-edits when I re-render?
No. Every generated file carries a content_hash in its provenance frontmatter. Re-render compares the new template's hash against the file on disk. If they differ — meaning you edited the file — it keeps yours. See Reconcile rules.
Q: What's the difference between Side and Production?
Side is lean: 1 reviewer (code), verify-before-completion optional, worktree scope [execute]. Production is thorough: 5 reviewers, verify required, worktree scope [execute, plan], security on high-finding = block. Both share the same anti-rot and caching defaults.
Q: Does anti-rot ever auto-apply?
Never. Every anti-rot item surfaces via a structured question (AskQuestion in Cursor, AskUserQuestion in Claude Code) in /hm:refresh. There is no --auto-apply flag and no plan to add one. The rationale: a wrong patch is worse than a stale harness.
Q: Can I use only Claude Code? Only Cursor? Only Codex?
Yes. targets is a multi-select at the interview. Claude Code uses .claude/; Cursor reuses most .claude/ assets and adds .cursor/; Codex adds AGENTS.md, .codex/, and .agents/skills/.
Q: Do my prompts or telemetry leave my machine?
No. Metrics, the dashboard, and security findings are written to .claude/observability/ locally. Anti-rot crawls read public sources (Anthropic blog, arXiv, GitHub, OSV.dev) but never upload anything.
Q: How do I pick up template improvements after a /plugin update?
Run /harness-maker:make → choose Update. For files where your hash matches the old template, the new version is applied. For files you edited (hash mismatch), yours is kept. To force a full refresh, rm .claude/harness.yaml and re-run.
Q: What are mechanical_checks and when should I use them?
Shell commands listed under reviewers.mechanical_checks in harness.yaml run at the start of every /hm:review — before any LLM reviewer spawns. The first command that exits non-zero emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts review immediately (CHANGES_REQUESTED). Use them for fast, deterministic gates (lint, type-check, unit test) that shouldn't waste LLM tokens when the basics are broken. The list is user-managed; harness-maker never populates it automatically. --no-auto-fix does not skip mechanical checks — they are a hard gate, not part of the fix loop.
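The gate semantics described above (run in order, stop at the first non-zero exit, emit a MECHANICAL_BLOCK line) can be sketched in Python as follows. The function name `run_mechanical_checks` and its return shape are hypothetical; only the message format is taken from the description.

```python
import subprocess


def run_mechanical_checks(commands: list[str]) -> tuple[bool, str]:
    """Run shell commands in order; halt at the first non-zero exit code."""
    for cmd in commands:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            # Later commands are never started once one check fails.
            return False, f"## MECHANICAL_BLOCK: {cmd} exit={result.returncode}"
    return True, ""


# Example list, as it might appear under reviewers.mechanical_checks:
# ok, message = run_mechanical_checks(["ruff check src/", "pytest -q"])
```

Ordering the cheapest check first keeps the failure path fast, since nothing after the first failing command runs.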
Q: Why doesn't harness-maker rewrite prompts to be model-agnostic?
The prompts are tuned for claude-opus-4-7 — <thinking> blocks, role framing, chain-of-thought structure. Rewriting for model-neutrality would degrade quality on the recommended model for hypothetical gains on others. Override recommended_model in harness.yaml if you want a different model; the prompts remain as-is.
Personalization Architecture
harness-maker 0.12.0 introduces three tracks of personalization depth:
- Detection Depth (Track A): harness_maker.profile.profile() detects stack (12+ languages), framework, package manager, and CI provider. Results land in ProjectProfile and feed the recommendation framework. Cache at ~/.cache/harness-maker/profile-<repo-hash>.json (manifest-mtime invalidation + 24h TTL).
- Foreign AI Config Migration (Track D): /hm:configure detects .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, and .github/copilot-instructions.md. With user confirmation, harness-maker imports the foreign config's intent into harness.yaml and re-generates the foreign file as part of the harness on every render. The @hm:harness:* block markers protect user customizations outside the marked regions.

  Cursor power-user constraint (ADR-003): single-source means harness re-generates .cursor/rules/ on every render. If you prefer Cursor-only ownership of those rules, the only opt-out today is to drop the cursor target from harness.yaml.targets. A dedicated opt-out flag is TODO for a future PLAN.
- Adaptive (Track B start): harness.yaml.adaptive.disable_telemetry: false (opt-out per ADR-005) enables override telemetry. /hm:personalization-audit scores your harness composite (0–100; Bronze < 40 < Silver < 65 < Gold < 85 ≤ Platinum) and surfaces ranked action items with evidence. SessionStart drift hint fires after 30 overrides or 14 days without an audit.
All adaptive features run 100% locally, with no network calls (asserted by tests/unit/test_no_network.py).
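The audit tiers and the drift-hint trigger reduce to two small functions. In this Python sketch the boundary handling at 40 and 65 (treated as the start of the higher tier, mirroring the stated 85 ≤ Platinum) and the reading of "after 30 overrides" as 30 or more are assumptions, as are the function names.

```python
from datetime import datetime, timedelta


def tier(score: float) -> str:
    """Map a 0-100 composite score to Bronze / Silver / Gold / Platinum."""
    if not 0 <= score <= 100:
        raise ValueError("score must be within 0-100")
    if score < 40:
        return "Bronze"
    if score < 65:   # assumption: exactly 40 counts as Silver
        return "Silver"
    if score < 85:   # assumption: exactly 65 counts as Gold
        return "Gold"
    return "Platinum"


def drift_hint_due(overrides_since_audit: int, last_audit: datetime, now: datetime) -> bool:
    """True once 30 overrides have accumulated or 14 days have passed without an audit."""
    return overrides_since_audit >= 30 or now - last_audit >= timedelta(days=14)
```

Passing `now` explicitly keeps the trigger easy to test without patching the clock.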
0.12.1 patch extended CACHED_MANIFESTS with literal filenames from STACK_GLOB_MANIFESTS (stack.yaml, package.yaml) so Haskell projects now properly invalidate the profile cache on those manifests' mtime bump.
Roadmap
Done (0.12.0–0.12.1):
- Track A (Detection Depth): 12+ stacks/frameworks/pkg-mgr/CI
- Track D (Foreign AI Config Migration): 6 config types, single-source re-render, @hm:harness:* markers, 0.11.x migration
- Track B-start (Adaptive): override telemetry, /hm:personalization-audit, SessionStart drift surface
Next (0.13.0 — Track B completion + Cursor opt-out):
- Track B-extra: B2 permission-frequency capture + B3 reviewer-signal aggregation
- harness.yaml.cursor.opt_out_render flag for Cursor power-users (ADR-003 documented constraint)
- github/spec-kit external e2e fixture vendoring (currently pytest.skip with TODO)
Future (0.14.0+ — Track C strategic expansion, ranked by user value):
- C2 privacy/regulated tier (HIPAA/PCI/GDPR — enterprise entry barrier)
- C1 team-profile axis (solo vs small-team vs large-team)
- C3 code-style detection (docstring conventions, naming patterns)
- C4 cross-project user-defaults
- C5 per-developer overlay in shared harness
Deferred-by-data:
- Rubric v0 calibration after 30+ projects accumulate /hm:personalization-audit runs (passive trigger)
Standing items:
- PyPI publish: remove the editable-from-clone requirement.
- Claude Code + Cursor + Codex Marketplace listings: submit all plugin manifests.
- .hm-meta.yaml sidecar for Cursor assets: enable hash-tracking of .cursor/rules/*.mdc without polluting Cursor frontmatter.
- User-configurable anti-rot repo list: harness.yaml.anti_rot.github_repos to track additional Claude Code ecosystem repos beyond the default.
- Demo screencast: record a first-install + /hm:loop session.
- Enterprise preset: stricter security gates, mandatory spec-driven dev mode.
Development
uv sync
uv run pytest # full suite
uv run ruff check src/ tests/ # lint
uv run ruff format src/ tests/ # format
uv run mypy --strict src/ # type check
bash .claude-verify.sh all # phase-by-phase exit criteria + final acceptance
Contributing
See docs/CONTRIBUTING.md for adding skills/agents/presets, test patterns, and the PR checklist (including the 4-file version bump invariant).
See docs/ARCHITECTURE.md for the 14 mechanisms (M1-M14) behind the system.
License
MIT — see LICENSE.