Skip to main content

Project-tailored Claude Code / Cursor / Codex harness generator. Grade-gated review + auto-fix, multi-reviewer consensus, edit-preserving block-merge upgrades, anti-rot crawlers, worktree isolation, version-drift detection, and a 3-layer ai-readiness rubric with ranked improvement actions.

Project description

harness-maker

harness-maker

License: MIT Python 3.12+ Claude Code plugin Cursor 2.4+ (3.2+ rec) Built with uv

English · 한국어

A harness that knows your project — and stays that way.

Interview-shaped. Grade-gated. Self-evolving. Multi-IDE.

Why · How it fits · Quickstart · Features · How it works · Slash commands · Comparison · Configuration · Targets · FAQ · Roadmap · Deep dive


Why harness-maker?

Most AI coding setups start from a generic template and drift from day one — same reviewer set on every project, same prompts on every stack, same defaults that nobody ever revisits. harness-maker takes the opposite stance: the harness is shaped by your project, and it keeps that shape as the project moves.

Principle What this means in practice
🎯 Personalized Profiler reads 12+ stack/framework/CI signals. Interview locks 10+ dimensions. A Side experiment and a Production service get structurally different harnesses — different reviewer sets, different workflow stages, different security gates. No generic defaults silently shipped.
🛡️ Trusted Every /hm:execute runs in a fresh worktree with a TDD loop. /hm:review doesn't just report — it applies consensus-passed fixes and re-reviews until grade ≥ A. Mechanical checks (lint/tests) gate the LLM reviewer before any token is spent.
🌱 Self-evolving Hand-edit any agent, skill, CLAUDE.md — block-merge markers preserve your edits across --update. Memory accumulates project-specific patterns; recurring failures auto-propose new guardrails.
🌀 Anti-rot Weekly crawl across 4 sources (Anthropic, GitHub releases, arXiv, OSV CVEs). Adaptive relevance filter learns from your accept/reject history. Always manual-confirmed — no silent auto-apply path exists.
🎛️ Multi-IDE Claude Code + Cursor + Codex. One harness.yaml, three target-native renders. Existing Cursor rules / Aider config / Copilot instructions get absorbed on first run — no manual port.

How it fits your project

SENSE  →  DECIDE  →  RENDER  →  EVOLVE

Four steps. Each is a question harness-maker answers for you — once, then continuously.

1. SENSE — What kind of project is this?

The profiler scans concrete signals before asking you anything. Every detection is tagged with a confidence (HIGH / MEDIUM / LOW) that decides whether the default is applied silently, prompted, or skipped.

Signal Detection source
Stack (12+) pyproject.toml, package.json, Cargo.toml, go.mod, pubspec.yaml, … + extension counts
Scale Total source files, monorepo depth, presence of apps/ or packages/
Lifecycle Git commit velocity, branch protection rules, CI workflow presence
Frameworks React / Vue / FastAPI / Django / Tauri / Zephyr / Next.js / Tailwind / Pydantic … (dependency-parsed, not keyword-guessed)
Package manager uv · poetry · pnpm · yarn · cargo · go mod · gradle · maven
CI provider GitHub Actions · GitLab CI · CircleCI · Bitbucket Pipelines
Foreign AI config Pre-existing .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/, .aider.conf.yml, .github/copilot-instructions.md

Result: A 24-hour cached ProjectProfile you can inspect with harness-maker profile . --json before any interview runs.


2. DECIDE — What harness does this project need?

A short interview locks the dimensions that shape every downstream render. Re-runs silently reuse prior answers; explicit --reinterview re-prompts.

Dimension Choices What it changes
Preset Side · Production Reviewer count (1 vs 5), workflow stage count, security gate depth, verify-required flag
Dev mode task-driven · spec-driven Whether SPEC stage is mandatory; whether plan stages chain into execute
Targets claude-code · cursor · codex (multi-select) Which IDE-native asset trees are rendered
Locale en · ko · any tag Interview text + user-facing error messages
Workflows Fused sequences from atomic stages Which /hm:<name> slash commands appear
Reviewers / skills Preset defaults + overrides Which agents and skills install
Ref folders Path + glob pairs Which external docs are searchable via refdocs-search skill
Sibling repos Relative paths Which adjacent repos share the same harness session
Second Brain Obsidian vault path + project_id Where cross-session memory writes
Recommended model claude-opus-4-7 default The model frontmatter on every generated agent

Result: .claude/harness.yaml — a single source of truth that survives upgrades.


3. RENDER — How does this become a working harness?

One source tree, three IDE-native renders, all from a single harness.yaml:

.claude/                 ← single source: agents, skills, commands/hm/, hooks, memory, observability
├── .cursor/             ← + Cursor-native rules, hooks, mcp.json     (if cursor target)
├── .codex/              ← + Codex-native config.toml, agent TOMLs    (if codex target)
├── AGENTS.md            ← + Codex root instructions                  (if codex target)
└── .agents/skills/      ← + Codex skill paths                        (if codex target)

Layered on top of the base render:

  • Domain packs--add-domain python (or node, rust) grafts stack-specific standards, agents, and skills onto the harness. Custom domains scaffold as stubs.
  • Foreign config absorption — pre-existing Cursor rules, Aider config, Copilot instructions get LLM-translated into harness.yaml axes. @hm:harness:* inverted markers keep them synced on re-renders.
  • Sibling repos — frontend + backend + library share one harness session via relative-path bindings in harness.yaml. Cross-machine portable.
  • Ref folders + Second Brain — registered project docs + Obsidian vault notes become searchable from any stage.

Result: Every IDE you use sees the same agents, the same skills, the same workflows — natively. Zero manual porting.


4. EVOLVE — How does the harness stay useful?

Three independent feedback loops keep the harness aligned with your project as it grows.

A. Your edits survive upgrades. Hand-tune agents/code-reviewer.md. Add a custom skill. Edit a CLAUDE.md section. All survive harness-maker make --update because content hashes per file plus @hm:user:* block-merge markers separate your edits from template-owned regions. @hm:harness:* inverted markers do the opposite (for foreign config: preserve outside, replace inside).

B. The harness learns your project.

  • .claude/memory/wiki.md — patterns, conventions, gotchas. Each /hm:wrapup appends new entries.
  • .claude/memory/failures.md — recurring mistakes, deduplicated by slug + count.
  • .claude/memory/session/<date>.md — non-obvious decisions per workday.
  • When any failure slug reaches count ≥ 3, wrapup writes a proposal to pending-proposals.md — a new skill, rule, or hook that would have prevented the recurrence. The user reviews and decides whether to ingest.

C. Defaults adapt to your overrides. Override telemetry (100% local) tracks every time you change a default. /hm:health scores three layers — detection→recommendation conversion, override stability, audit cadence — and surfaces a Bronze → Silver → Gold → Platinum tier with ranked action items. Drift the harness too far from your real usage and the audit will tell you exactly which default is wrong.

Result: The harness improves with the project, not against it. No silent re-litigation of decisions you've already made.

For the mechanics behind each step — full procedures, decision paths, internal invariants — see docs/HOW-IT-WORKS.md.


Table of Contents


Quickstart

Universal Bootstrap Prompt

Paste into your AI agent — it detects whether you're in Claude Code, Cursor, or Codex, runs the right plugin install for that IDE, and bootstraps the harness end-to-end. One copy, every supported IDE.

Install harness-maker for this project and bootstrap the harness end-to-end.

You are an AI agent running inside one of: Claude Code, Cursor, or Codex CLI.
Detect which one (silently — do NOT ask me), run the matching plugin install,
then drive /harness-maker:make + /hm:health.

Step 1 — Detect the host IDE (silent):
  - Claude Code  → $CLAUDE_CODE is set, or ~/.claude/ exists, or you have
                   access to slash commands like /plugin
  - Cursor       → $CURSOR_SESSION is set, or ~/.cursor/ exists, or you have
                   access to Cursor's plugin manager
  - Codex CLI    → $CODEX_SESSION is set, or ~/.codex/ exists, or the
                   `codex` command is on PATH

Step 2 — Install harness-maker as a plugin (skip if `/harness-maker:make` is
already available):

  IF Claude Code:
    /plugin marketplace add Ecro/harness-maker
    /plugin install harness-maker@harness-maker
    (reload Claude Code if prompted)

  IF Codex CLI:
    codex plugin marketplace add Ecro/harness-maker
    # marketplace add IS install for Codex — no separate install step
    # then run /plugins inside Codex to enable

  IF Cursor (public marketplace does not yet support direct GitHub install):
    git clone https://github.com/Ecro/harness-maker.git \
      ~/.cursor/plugins/local/harness-maker
    # tell me to reload the window (Ctrl+Shift+P → "Reload Window")

  IF you can't tell which IDE (or none of the above), STOP and ask me which
  IDE you're in.

Step 3 — Run /harness-maker:make and follow the interview:
  • Confirm preset (Side / Production), dev mode, target IDEs, and locale.
  • Accept the recommended defaults unless I object.

Step 4 — Run /hm:health after the harness renders:
  Produces a 3-section dashboard at .claude/observability/dashboard.md
  (structural / external-risks / personalization). Tell me the personalization
  tier and any high-priority action items.

Conduct the conversation in the language of my first reply; default to English
if you cannot tell.
Manual install — pick one for your IDE

Claude Code (plugin marketplace)

/plugin marketplace add Ecro/harness-maker
/plugin install harness-maker@harness-maker

The marketplace metadata and the plugin both ship in this repo, so two commands cover discovery + install. Reload Claude Code; /harness-maker:make becomes available.

Codex CLI (plugin marketplace)

codex plugin marketplace add Ecro/harness-maker
# pin a specific release for reproducibility:
codex plugin marketplace add Ecro/harness-maker --ref v0.14.3

marketplace add is the install — there is no separate install step in Codex. Then run /plugins inside Codex to enable.

Cursor (Team marketplace import OR local symlink)

Cursor's public plugin marketplace is curated (cursor.com/marketplace/publish, submit-and-review). Direct GitHub install is still on the roadmap as of Cursor 2.5+. Two working paths today:

A. Team marketplace (Team / Enterprise plans): Dashboard → Settings → Plugins → Team Marketplaces → Import → paste https://github.com/Ecro/harness-maker → push to team members.

B. Local dev install (community pattern):

git clone https://github.com/Ecro/harness-maker.git ~/.cursor/plugins/local/harness-maker

Reload Cursor (Ctrl+Shift+P → "Reload Window") to pick up the new plugin.

PyPI (no IDE plugin — CI, headless, scripts)

For environments where no IDE plugin path applies (CI scripts, headless servers, automation, or strong preference for a Python CLI tool):

uv tool install harness-maker          # POSIX / macOS / WSL
# Native Windows / PowerShell:
irm https://astral.sh/uv/install.ps1 | iex
uv tool install harness-maker

Then:

cd your-project
harness-maker profile . --json
harness-maker make . --preset Production --locale en --targets claude-code,cursor

This path is fine for one-off renders but skips the in-IDE slash-command surface — /hm:health, /hm:plan, /hm:execute, etc. only appear after the IDE plugin loads them.

After the harness is rendered, re-run with flags to evolve it:

harness-maker make . --audit           # score the existing .claude/ against the rubric
harness-maker make . --add NAME        # graft on a single skill/agent/command
harness-maker make . --remove NAME     # surgically remove one component
harness-maker make . --promote NAME    # move an ad-hoc artifact into the harness

Requirements

Dependency Notes
Python 3.12+ and uv Required wherever Claude Code runs against your project — even if your project's primary language is Rust, Node, or Go. Hooks invoke uv run python -m harness_maker.gates.*; without uv they are silent no-ops.
Claude Code CLI (plugin + hook support) Optional for plugin mode (claude --plugin-dir /path/to/harness-maker). Not required — the harness-maker CLI can generate harnesses independently.
Cursor IDE 2.4+ (3.2+ recommended) Optional. Reads .claude/agents/, .claude/skills/, and .claude/commands/hm/*.md natively (verified empirically in 0.6.2, re-confirmed 0.7.1 — see tests/cursor-compat/results-2026-05-08.md). Hooks render to a separate .cursor/hooks.json with Cursor-native schema; both files are emitted when targets includes cursor. Cursor 3.0+ adds native /worktree and /best-of-n which coexist safely with harness-maker's prefix-matched cleanup.
OpenAI Codex CLI Optional. When targets includes codex, harness-maker renders Codex-native .codex/ config, AGENTS.md, and .agents/skills/ assets from the same workflow definitions.
Git Required for worktree isolation (every /hm:execute and /hm:loop run).

Features

Grouped by what they do for your project, not by component.

🎯 Personalization — fits your project

  • Project-shaped harness. Profiler + 10-dimension interview produce structurally different harnesses for Side experiments vs Production services. Reviewer count, workflow stages, security gate depth, mechanical-check enforcement — all derived from your answers.
  • 12+ stack detection. Python · Node · Rust · Java · Kotlin · Swift · Dart · Ruby · PHP · C# · Elixir · Scala · C/C++ · Zig · Haskell. Framework + package-manager + CI provider parsed from manifests, not keyword-guessed. 24h cache ceiling on manifest mtime.
  • Confidence-bucketed defaults. Every detection declares HIGH/MEDIUM/LOW confidence. HIGH → silent default with # detected: provenance. MEDIUM → explicit interview prompt. LOW → skipped (you decide). Regression test guards against surprise silent-default changes between minor versions.
  • Adaptive personalization tier. /hm:health scores three layers (detection→recommendation conversion, override stability, audit cadence) and reports Bronze → Silver → Gold → Platinum tier. Auto-proposes default changes when one axis has been overridden 5+ times. 100% local telemetry; nothing leaves the project.
  • Domain packs. --add-domain python (or node, rust) grafts stack-specific standards, agents, and skills. Custom domains scaffold as stubs — teams add their own without forking harness-maker.
  • Single command, no subcommand sprawl. /harness-maker:make is the only entry point. Everything else is a flag (--audit, --add, --remove, --promote, --add-domain, --reinterview, --update).

🛡️ Trust — grade-gated work

  • Grade-based auto-fix loop. /hm:review doesn't just report. It applies consensus-passed fixes → re-reviews (selectively, only reviewers whose scope was touched) → regrades, until grade ≥ threshold (default A) or max_review_rounds is exhausted. Weak-consensus and manual-only findings are never auto-applied.
  • Mechanical pre-checks before any LLM token. Lint clean + tests green are enforced before reviewers spawn. First non-zero exit emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts. --no-auto-fix does not skip this gate.
  • Conditional reviewer routing. .env change → security-reviewer. /perf/ → performance-reviewer. .tsx → ux-reviewer. Async/locking → concurrency-reviewer. 10× cheaper than fanning out to every reviewer on every diff.
  • 2-pass redaction (+47pp precision). Pass 1 strips PR metadata so findings can't anchor on author/title. Pass 2 restores full context — reviewers must validate or drop each Pass 1 finding. Ablation-measured precision gain on anchoring-prone diffs.
  • 7 security gates. secrets · permissions (settings.json over-grant) · hook-injection (AST scan for rm -rf, curl|sh, eval) · CVEs (OSV.dev) · hallucination (AST scan for non-existent imports) · prod-name guard · prompt-injection. Findings stay local in .claude/observability/security/.
  • Privilege separation. Reviewer agents deny Write / Edit / interpreter Bash calls. Executor agents allow only .worktrees/** writes + paired Edit/Write denies on system paths. Defense-in-depth survives prompt injection, agent compromise, and tool_input poisoning.
  • Worktree isolation per run. Every execute runs in a fresh git worktree. Failed runs preserve evidence; successful runs auto-cleanup with prefix-match (Cursor's own worktrees never touched).

🌱 Self-evolving — grows with your project

  • Block-merge preservation. Hand-tune any agent, skill, CLAUDE.md section. Survives --update because content hashes per file plus @hm:user:* markers separate your edits from template-owned regions. @hm:harness:* inverted markers do the opposite for foreign config absorption.
  • Three-tier memory accumulation. wiki.md (patterns) · failures.md (recurring mistakes deduplicated by slug) · session/<date>.md (non-obvious decisions). Wrapup writes these automatically; every stage reads them automatically.
  • Self-improving failure proposals. When a [fail:*] slug recurs 3× across sessions, wrapup writes a proposal to pending-proposals.md — a new skill, rule, or hook that would have prevented the recurrence. You review and decide whether to ingest.
  • ADR system as binding execute constraints. Architecture Decision Records promoted during /hm:plan are hard constraints on /hm:execute. Conflicts surface as blockers, never silently proceed. Future sessions don't re-litigate settled decisions.
  • Refdocs search. Register architecture docs, API specs, design docs in harness.yaml. refdocs-search skill gives lossless full-text search — no chunking, no RAG index.
  • Optional Second Brain. Obsidian vault integration. Allowlisted write folders by note type (decision · preference · failure · project · reference · journal). Cross-session memory survives plugin reinstalls and machine moves.
  • Brownfield-safe upgrades. Reconciler hashes existing .claude/ and offers per-conflict keep/replace/both. Apply is ADD-only with timestamped backups. User edits never silently overwritten.

🌀 Anti-rot — survives the ecosystem

  • 4-source weekly crawl. Anthropic blog/changelog · anthropics/claude-code GitHub releases · arxiv cs.SE/CL/CR · OSV.dev CVEs. Manifested as pending items in /hm:health.
  • Adaptive relevance filter. Threshold starts at 0.7, adjusts ±0.05 based on your accept/reject history per source. Always manual-confirmed — no --auto-apply path exists.
  • Unified health audit. /hm:health composites three orthogonal layers — Structural (70% deterministic + 25% LLM rubric + 5% cache diagnostic), External Risks (anti-rot pending queue), Personalization (Bronze→Platinum tier). One 3-section dashboard at .claude/observability/dashboard.md. No auto-apply ever (ADR-001).
  • SessionStart drift reminder. Hook fires on every session open and warns if running plugin version differs from the version that rendered the harness — so you notice when /plugin update needs a re-render. Detector compares against latest cached plugin version (not just imported __version__).
  • Cache-miss classification. Prompt-cache diagnostic reports min_threshold · invalidation · ttl · first (cold start). 5% weight in AI-readiness distinguishes cold-start misses (benign) from structural misses (actionable).

🎛️ Multi-IDE — one source, every IDE

  • Three targets from one harness. Claude Code + Cursor (2.4+, 3.0+ recommended) + Codex CLI. Single .claude/ source tree; Cursor reads it natively plus its own .cursor/ for hooks and rules; Codex reads .codex/, AGENTS.md, and .agents/skills/. All rendered from one harness.yaml.
  • Foreign AI config migration. Detects 6 known foreign configs (.cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, .github/copilot-instructions.md). LLM-translates them into harness.yaml axes. @hm:harness:* inverted markers keep them synced across re-renders.
  • Sibling repos. Frontend + backend + library bind into one harness session via relative paths in harness.yaml. Cross-machine portable (relative paths survive git clone).

🔁 Workflow primitives — the rest of the toolchain

  • Deep interview before every implementation. /hm:spec runs a 6-category interview (Intent → Outcomes → In-Scope Scenarios → Non-Goals → Constraints → Verification) scored for completeness. /hm:plan runs a 9-category interview (scope → architecture → contract → risk → testing → phasing → dependencies → failure handling → observability) in impact order. Every settled decision promotes to a binding ADR.
  • Autoloop with adaptive interview + 4-gate convergence. /hm:loop runs time-and-iteration-bounded loops. autoloop-driver reads the goal, asks only what's missing, locks intensity + exit checklist, then requires mechanical checks + LLM judgment + regression comparison + 2-iter convergence streak before accepting completion.
  • 3-tier context loading + compaction recovery. Hot tier (today's session) · Warm tier (failures + wiki first 60/40 lines) · Cold tier (git log / PLANs on demand). PreCompact hook flushes session before context compaction; next turn detects the marker and resumes from the last in-progress phase.
  • Cross-process memory safety. .claude/memory/ writers serialise via re-entrant POSIX flock. Telemetry hooks append atomically via raw os.write() on O_APPEND (single-syscall, ≤PIPE_BUF) so concurrent Claude Code + Cursor sessions cannot interleave JSONL lines.

For the complete mechanics — all procedures, decision paths, internal invariants — see docs/HOW-IT-WORKS.md.


How it works

flowchart TD
    A["/harness-maker:make"] --> B["Profile\n(stack, scale, lifecycle)"]
    B --> C["Interview\n(preset + 10 dims + targets)"]
    C --> D["Synthesize\n(deterministic Blueprint)"]
    D --> E["Render\n(Jinja2 + provenance frontmatter)"]
    E --> F{Brownfield?}
    F -- No --> G["Write .claude/ directly"]
    F -- Yes --> H["Reconcile\n(hash-based keep/replace/both)"]
    H --> G
    G --> I["Extra targets?\nRender .cursor/ and/or .codex/ assets"]
    I --> J["User runs /hm:* commands"]
    J --> K["Weekly /hm:health (Step 2)\n4-source anti-rot crawl\n→ manual confirm"]

14 mechanisms (M1-M14) back every feature. See docs/ARCHITECTURE.md for the full breakdown including the privilege-separation model, security gate triggers, and reconcile invariants.

Since 0.12.0 the synthesis pipeline also threads through a typed Recommendation registry: every recommend_<axis>(profile, project_dir) function declares per-detection confidence (ADR-007), and the interview.py dispatcher routes by bucket. Detection results land in ~/.cache/harness-maker/profile-<repo-hash>.json with manifest-mtime + 24h-ceiling invalidation. Foreign AI config files detected at /hm:configure time can be imported into harness.yaml and re-rendered single-source with @hm:harness:* inverted block markers (ADR-009).


Slash commands the harness exposes

After install, the rendered harness exposes commands under /hm:*:

Atomic stages (always available)

Command Purpose
/hm:research Gather facts, user-workflow signals, best practices, and options
/hm:spec Write acceptance criteria from research
/hm:plan Decompose spec into phases with exit criteria
/hm:execute Implement with TDD + worktree isolation
/hm:review Multi-reviewer consensus (mechanical pre-check → conditional routing)
/hm:wrapup Clean, document, commit
/hm:verify 6-check gate before completion

Fused workflows (preset-generated, user-renameable)

Fused workflows combine atomic stages into a single command. The interview generates a starter set; you can add, remove, or rename them in harness.yaml.

Preset Default fused workflows
Side /hm:plan-exec-rev · /hm:exec-rev · /hm:exec-rev-wrap (default)
Production /hm:exec-rev-wrap-ver (default) · /hm:exec-rev-wrap · /hm:plan-exec-rev · /hm:exec-rev · /hm:res-spec-plan

Utility commands

Command Purpose
/hm:loop "<goal>" Autoloop driver — feature or improve mode, time/iter-bounded
/hm:ai-readiness 3-layer readiness score + P0/P1/P2 ranked actions
/hm:personalization-audit Composite-score rubric (Bronze/Silver/Gold/Platinum) from telemetry + harness.yaml + ProjectProfile. Reads .claude/observability/adaptive/overrides.jsonl; outputs ranked ActionItem list with evidence schema. ADR-011 (v0); calibration deferred to 30+ project sample.
/hm:refresh Anti-rot crawl — manual confirm required

How it compares

Other Claude Code harnesses pick a niche; harness-maker is the meta-tool that builds them — and then keeps them current.

Project Scope What harness-maker adds
ohmyclaudecode Curated commands/agents bundle Project-tailored synthesis (preset + 10 override dims), brownfield reconcile, provenance frontmatter, anti-rot pipeline
superpowers Powerful sub-agents and workflows Single-command entry, AI-readiness scoring, worktree isolation by default, privilege-separated reviewer/executor
Archon Knowledge-base + RAG-backed planning Stack/scale/lifecycle profiler, atomic+fused workflow engine, conditional reviewer routing, 7 security gates
aider Terminal pair programmer, LLM-agnostic Claude Code / Cursor / Codex native assets; harness output is the runtime (not the session); anti-rot keeps it current
ouroboros Autonomous self-bootstrapping AI software factory Project-shaped interview (not one pipeline for all), grade-gated reviews with mechanical pre-checks, anti-rot pipeline, brownfield reconcile, multi-target support
Hand-rolled .claude/ Full control, zero automation Drift detection via provenance hash, weekly anti-rot crawl, AI-readiness scoring, worktree isolation — without writing it yourself

The core difference: harness-maker generates and owns the lifecycle of your .claude/ directory. A static tool gives you a starting point. harness-maker gives you a starting point that knows who it was created for and can be updated without losing your changes.


Configuration

The interview writes answers to .claude/harness.yaml. Key dimensions:

preset: Production           # Side | Production
locale: en                   # en | ko | <any — unknown falls back to en>
dev_mode: spec-driven        # spec-driven | task-driven
targets:                     # which runtimes to drive
  - claude-code
  - cursor
  - codex
recommended_model: claude-opus-4-7

ref_folders:
  - path: ../architecture-docs
    glob: "**/*.{md,txt,pdf}"

second_brain:
  enabled: true
  backend: filesystem
  project_id: my-app
  vault_path: ../obsidian-vault
  trusted_allowlist: true       # configured write folders need no confirmation/backup
  required_frontmatter: [type, created, updated, tags, links]
  folders:
    - path: Projects/my-app
      read: true
      write: true
      note_types: [decision, preference, failure, project, reference, journal]

sibling_repos:
  - ../backend

reviewers:
  enabled: [code, security, performance, ux, concurrency]
  routing: conditional       # conditional | always-all
  mechanical_checks:         # pre-LLM stop-on-first gate (optional)
    - ruff check .
    - uv run pytest tests/unit -x -q

worktree:
  scope: [execute, plan]     # which stages run in a fresh worktree
  cleanup: on_success        # on_success | always | never

anti_rot:
  enabled: true
  threshold: 0.7             # adaptive — adjusts ±0.05 based on accept/reject ratio

context_lint:
  strict: true               # warn on overrun | block

memory:
  files: [failures.md, wiki.md]

adaptive:
  disable_telemetry: false   # opt-out per ADR-005; 100% local capture
  audit_session_threshold: 30  # SessionStart hint after N axis overrides
  audit_days_threshold: 14     # SessionStart hint after N days without audit

All adaptive features are 100% local. tests/unit/test_no_network.py asserts no socket call during telemetry emit, audit, or SessionStart hook execution (ADR-005 positive obligation).

Run /harness-maker:make again and choose Update (same settings, pick up template improvements) or Full reconfigure to change any dimension.

Obsidian Second Brain

second_brain connects a Markdown/Obsidian vault as a typed knowledge graph for stage-aware memory. hm-research, hm-plan, hm-review, and hm-wrapup use different note types (reference, project, decision, preference, failure, journal) instead of loading the whole vault. Writes are full Markdown writes inside configured write: true folders only. Configure narrow folders: the allowlist is trusted completely, with no per-write confirmation or backup. For multiple projects sharing one vault, give every project a distinct project_id and put writable folders under a path segment with the same value (for example Projects/my-app). harness-maker rejects writable Second Brain folders that do not include the configured project_id, which keeps one project from writing into another project's note namespace.

Setup walkthrough

  1. At interview time — when prompted for vault_path, supply the absolute path to your Obsidian vault root (or a not-yet-created subfolder of it; the harness creates the subfolder on first write iff the parent has .obsidian/). The interview then asks for project_id and a writable folder, defaulting to 99_HM/{project_id}/ (matches the 99_*/01_* numeric-prefix organization style).
  2. Post-install adjustment — run /hm:configure to revisit Second Brain settings. The slash command dispatches to harness-maker configure-second-brain --check, which inspects state and returns guidance JSON so the slash command can prompt only when something is missing. To add a folder non-interactively, call harness-maker configure-second-brain --add-folder 99_HM/my-project/.
  3. Existing harnesses upgraded from a pre-fix release — the loader now tolerates folders: [] (logs a one-shot warning + remediation hint pointing at /hm:configure); writes raise SecondBrainError with the same hint instead of an obscure "not under a configured write folder" message.

Internals: the rendered harness.yaml carries a provenance frontmatter block; all readers route through harness_maker.io_utils.load_harness_yaml (the canonical multi-document-tolerant loader) to avoid the parser-strategy drift that produced the original bug.


Targets

targets is a multi-select. Choose claude-code, cursor, codex, or any combination. Claude Code is the default; Cursor and Codex add runtime-specific assets while preserving the same preset, workflows, skills, reviewers, and safety model.

Cursor target

Run /harness-maker:make and pick targets: [cursor] or [claude-code, cursor] at the interview. The renderer adds:

  • .cursor/rules/harness.mdc — always-on workflow rules with Cursor-legal frontmatter (description / globs: [] / alwaysApply: true)
  • .cursor/hooks.json — Cursor-native hooks schema (lowercase camelCase keys, version: 1, flat {matcher, command}). Deliberately different from .claude/hooks/hooks.json (PascalCase, nested {hooks:[…], matcher}); each IDE reads only its own file. Don't try to collapse them — Cursor will silently stop firing hooks. See tests/cursor-compat/results-2026-05-08.md for the kairos 0.5.7 forensic that proved this empirically.
  • .cursor/mcp.json — Cursor MCP server config. Populated from harness.yaml.mcp_servers (0.6.2+); defaults to {"mcpServers": {}} when no servers are configured. Inner shape (command, args, env) is type-validated on parse with a warning log when entries are dropped.

.claude/agents/, .claude/skills/, and .claude/commands/hm/ are single-source — Cursor 2.4+ reads them natively (forensic-verified). Hooks are the only asset that requires per-IDE rendering because the schemas diverge by design.

Recommended model

harness.yaml.recommended_model defaults to claude-opus-4-7 and propagates to agent frontmatter. Cursor users may override model selection in their IDE. The harness does not rewrite prompts to be model-agnostic — <thinking> blocks and Claude-specific patterns are preserved deliberately.

Verification

Per-release Cursor compatibility is tracked in tests/cursor-compat/:

  • MANUAL_CHECKLIST.md — A1–A4 (agent dispatch, hook fire, skill auto-discovery, slash command + Q&A loop) covering both IDEs
  • RESULTS.md — PASS/FAIL/PARTIAL grid you fill while running the checklist
  • results-2026-05-08.md — kairos 0.5.7 forensic that resolved Q-A (hooks discovery) and Q-B (commands discovery) without an IDE-driven manual run; future Cursor verifications append a new dated results-*.md
  • fixture/ — minimal .claude/ for opening directly in either IDE

Automated CI guards the dual-schema invariants regardless of manual fixture runs:

  • test_cursor_hooks_uses_lowercase_native_schema — fails if .cursor/hooks.json accidentally adopts Claude PascalCase
  • test_no_cursor_commands_rendered — fails if a future change starts emitting .cursor/commands/hm-*.md mirrors (Cursor reads .claude/commands/ natively)
  • test_render_agents_have_structured_permissions_frontmatter — fails if any agent template loses its permissions.allow/deny block (Cursor 2.5+ subagent permission inheritance gap)

Codex target

Run /harness-maker:make and pick targets: [codex] or include codex with the other targets. The renderer adds:

  • AGENTS.md — Codex's top-level project instruction file, rendered without YAML frontmatter so Codex displays clean instructions.
  • .codex/config.toml and .codex/agents/*.toml — Codex-native configuration and agent registrations.
  • .codex/hooks.json — Codex hook wiring, including Codex-specific permission events and file-edit tool matchers.
  • .agents/skills/*/SKILL.md — the existing harness skills plus stage, workflow, and loop trigger skills in the layout Codex discovers.

Codex TOML files are rendered as pure TOML, not markdown-with-frontmatter. AGENTS.md uses block-merge markers so user additions survive re-renders, while .codex/*.toml files are regenerated from the selected target configuration.

Structured questions in Codex: Stage and loop skills instruct the agent to use the request_user_input tool for interview questions. This tool is available in Plan mode by default. To enable it in Code mode, add default_mode_request_user_input = true under [features] in .codex/config.toml. If the tool is unavailable at runtime, the agent falls back to asking in its response text.


Reconcile rules (re-rendering an existing harness)

Re-running /harness-maker:make on an existing harness uses a hash-driven KEEP rule: if a file's content_hash frontmatter matches the new template's hash, it's "ours" — safe to overwrite. If it differs (user-edited), it's "theirs" — kept.

Trade-off: when harness-maker bumps a template on a minor release, the existing file's hash no longer matches, so reconcile picks KEEP even if you didn't edit the file.

To pick up template updates after a version bump:

rm .claude/harness.yaml
/harness-maker:make
# (previous .claude/ is auto-backed up to .backup-<ISO>/)

.cursor/rules/*.mdc follow the same KEEP behavior. A future phase introduces a sidecar .hm-meta.yaml so harness-maker can hash-track Cursor assets without polluting Cursor frontmatter.

@hm:harness:* inverted markers (0.12.0) — for foreign-AI-config files we generate post-import (.cursor/rules/*.mdc, AGENTS.md, CLAUDE.md, etc.), content INSIDE @hm:harness:<id> markers is harness-owned (replaced on every render); content OUTSIDE is user-owned (byte-for-byte preserved). The block-merge parser dispatches per file extension: HTML comments for .md/.mdc, # @hm:harness: for .yml/.yaml, top-level _hm_harness JSON key for .json. 0.11.x files (frontmatter generated_by: harness-maker + zero @hm:harness:* markers) are treated as wholly harness-owned on first encounter post-upgrade and re-rendered into the new marker family (ADR-009 amendment).


Observability

All observability is 100% local — nothing is transmitted externally.

File Contents
.claude/observability/dashboard.md AI-readiness score, dimension breakdown, ranked action items
.claude/observability/metrics-YYYY-MM-DD.jsonl Per-turn telemetry (cache hit %, tool calls, durations) — date-rotated daily (ADR-103, 0.7.1). Pre-0.7.1 metrics.jsonl is read as the trailing legacy shard.
.claude/observability/refresh/raw-*.jsonl Anti-rot crawl evidence (accepted / rejected items)
.claude/observability/security/findings-*.jsonl 7-gate security scan findings
.claude/observability/adaptive/overrides.jsonl harness_yaml_override events with schema_version: 1, dual capture sites (/hm:configure exit primary + SessionStart secondary), dedup-keyed
.claude/observability/adaptive/last-audit.txt Last /hm:personalization-audit run timestamp

Run /hm:ai-readiness to regenerate the dashboard on demand.


Marketplace

Marketplace manifests are present for each runtime:

  • .claude-plugin/plugin.json — Claude Code plugin spec
  • .cursor-plugin/plugin.json — Cursor Marketplace spec
  • .codex-plugin/plugin.json — Codex plugin spec

Listings are pending. Until then, install locally:

# Any IDE — CLI install (recommended)
uv tool install harness-maker          # from PyPI (when published)
uv tool install /path/to/harness-maker # pre-PyPI, from clone
harness-maker make . --targets claude-code,cursor,codex

# Claude Code — plugin mode (optional, enables /harness-maker:make command)
claude --plugin-dir /path/to/harness-maker

# Cursor — open the repo folder directly in Cursor as a workspace plugin

# Codex — render Codex-native assets in your project harness
harness-maker make . --targets codex

FAQ

Q: Why Python? My project is Rust / Node / Go. The hooks (permission_gate, worktree_gate, telemetry) call uv run python -m harness_maker.* at PreToolUse / PostToolUse boundaries. This doesn't touch your project's toolchain — uv and harness_maker need to be on the path, but only to run hooks. Your project's build system is untouched.

Q: Why does it require uv? uv gives a hermetic, fast Python environment without polluting the system or your project's virtualenv. Hooks run in milliseconds without activating anything.

Q: Will harness-maker overwrite my hand-edits when I re-render? No. Every generated file carries a content_hash in its provenance frontmatter. Re-render compares the new template's hash against the file on disk. If they differ — meaning you edited the file — it keeps yours. See Reconcile rules.

Q: What's the difference between Side and Production? Side is lean: 1 reviewer (code), verify-before-completion optional, worktree scope [execute]. Production is thorough: 5 reviewers, verify required, worktree scope [execute, plan], security on high-finding = block. Both share the same anti-rot and caching defaults.

Q: Does anti-rot ever auto-apply? Never. Every anti-rot item surfaces via a structured question (AskQuestion in Cursor, AskUserQuestion in Claude Code) in /hm:refresh. There is no --auto-apply flag and no plan to add one. The rationale: a wrong patch is worse than a stale harness.

Q: Can I use only Claude Code? Only Cursor? Only Codex? Yes. targets is a multi-select at the interview. Claude Code uses .claude/; Cursor reuses most .claude/ assets and adds .cursor/; Codex adds AGENTS.md, .codex/, and .agents/skills/.

Q: Do my prompts or telemetry leave my machine? No. metrics.jsonl, dashboard, and security findings are written to .claude/observability/ locally. Anti-rot crawls read public sources (Anthropic blog, arxiv, GitHub, OSV.dev) but never uploads anything.

Q: How do I pick up template improvements after a /plugin update? Run /harness-maker:make → choose Update. For files where your hash matches the old template, the new version is applied. For files you edited (hash mismatch), yours is kept. To force a full refresh, rm .claude/harness.yaml and re-run.

Q: What are mechanical_checks and when should I use them? Shell commands listed under reviewers.mechanical_checks in harness.yaml run at the start of every /hm:review — before any LLM reviewer spawns. The first command that exits non-zero emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts review immediately (CHANGES_REQUESTED). Use them for fast, deterministic gates (lint, type-check, unit test) that shouldn't waste LLM tokens when the basics are broken. The list is user-managed; harness-maker never populates it automatically. --no-auto-fix does not skip mechanical checks — they are a hard gate, not part of the fix loop.

Q: Why doesn't harness-maker rewrite prompts to be model-agnostic? The prompts are tuned for claude-opus-4-7<thinking> blocks, role framing, chain-of-thought structure. Rewriting for model-neutrality would degrade quality on the recommended model for hypothetical gains on others. Override recommended_model in harness.yaml if you want a different model; the prompts remain as-is.


Personalization Architecture

harness-maker 0.12.0 introduces three tracks of personalization depth:

  • Detection Depth (Track A)harness_maker.profile.profile() detects stack (12+ languages), framework, package manager, and CI provider. Results land in ProjectProfile and feed the recommendation framework. Cache at ~/.cache/harness-maker/profile-<repo-hash>.json (manifest-mtime invalidation + 24h TTL).

  • Foreign AI Config Migration (Track D)/hm:configure detects .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, and .github/copilot-instructions.md. With user confirmation, harness-maker imports the foreign config's intent into harness.yaml and re-generates the foreign file as part of the harness on every render. The @hm:harness:* block markers protect user customizations outside the marked regions.

    Cursor power-user constraint (ADR-003) — single-source means harness re-generates .cursor/rules/ on every render. If you prefer Cursor-only ownership of those rules, the only opt-out today is to drop the cursor target from harness.yaml.targets. A dedicated opt-out flag is TODO for a future PLAN.

  • Adaptive (Track B start)harness.yaml.adaptive.disable_telemetry: false (opt-out per ADR-005) enables override telemetry. /hm:personalization-audit scores your harness composite (0–100; Bronze < 40 < Silver < 65 < Gold < 85 ≤ Platinum) and surfaces ranked action items with evidence. SessionStart drift hint fires after 30 overrides or 14 days without an audit.

All adaptive features run 100% locally — no network calls (asserted by tests/unit/test_no_network.py).

0.12.1 patch extended CACHED_MANIFESTS with literal filenames from STACK_GLOB_MANIFESTS (stack.yaml, package.yaml) so Haskell projects now properly invalidate the profile cache on those manifests' mtime bump.


Roadmap

Done (0.12.0–0.12.1):

  • Track A (Detection Depth): 12+ stacks/frameworks/pkg-mgr/CI
  • Track D (Foreign AI Config Migration): 6 config types, single-source re-render, @hm:harness:* markers, 0.11.x migration
  • Track B-start (Adaptive): override telemetry, /hm:personalization-audit, SessionStart drift surface

Next (0.13.0 — Track B completion + Cursor opt-out):

  • Track B-extra: B2 permission-frequency capture + B3 reviewer-signal aggregation
  • harness.yaml.cursor.opt_out_render flag for Cursor power-users (ADR-003 documented constraint)
  • github/spec-kit external e2e fixture vendoring (currently pytest.skip with TODO)

Future (0.14.0+ — Track C strategic expansion, ranked by user value):

  • C2 privacy/regulated tier (HIPAA/PCI/GDPR — enterprise entry barrier)
  • C1 team-profile axis (solo vs small-team vs large-team)
  • C3 code-style detection (docstring conventions, naming patterns)
  • C4 cross-project user-defaults
  • C5 per-developer overlay in shared harness

Deferred-by-data:

  • Rubric v0 calibration after 30+ projects accumulate /hm:personalization-audit runs (passive trigger)

Standing items:

  • PyPI publish — remove the editable-from-clone requirement.
  • Claude Code + Cursor + Codex Marketplace listings — submit all plugin manifests.
  • .hm-meta.yaml sidecar for Cursor assets — enable hash-tracking of .cursor/rules/*.mdc without polluting Cursor frontmatter.
  • User-configurable anti-rot repo list — harness.yaml.anti_rot.github_repos to track additional Claude Code ecosystem repos beyond the default.
  • Demo screencast — record a first-install + /hm:loop session.
  • Enterprise preset — stricter security gates, mandatory spec-driven dev mode.

Development

uv sync
uv run pytest                       # full suite
uv run ruff check src/ tests/       # lint
uv run ruff format src/ tests/      # format
uv run mypy --strict src/           # type check
bash .claude-verify.sh all          # phase-by-phase exit criteria + final acceptance

Contributing

See docs/CONTRIBUTING.md for adding skills/agents/presets, test patterns, and the PR checklist (including the 4-file version bump invariant).

See docs/ARCHITECTURE.md for the 14 mechanisms (M1-M14) behind the system.


License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

harness_maker-0.17.0.tar.gz (389.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

harness_maker-0.17.0-py3-none-any.whl (480.5 kB view details)

Uploaded Python 3

File details

Details for the file harness_maker-0.17.0.tar.gz.

File metadata

  • Download URL: harness_maker-0.17.0.tar.gz
  • Upload date:
  • Size: 389.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.15 {"installer":{"name":"uv","version":"0.11.15","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for harness_maker-0.17.0.tar.gz
Algorithm Hash digest
SHA256 50ec33963b3253d9dae9d922907e562aa5f62bee054084c1ccfee6245afb03df
MD5 dcac21cfd915df20285df13897ba8c91
BLAKE2b-256 a14deff76091421de40ac9378b0f6c5c50ab49e181a750c7ff02c48eabdda141

See more details on using hashes here.

File details

Details for the file harness_maker-0.17.0-py3-none-any.whl.

File metadata

  • Download URL: harness_maker-0.17.0-py3-none-any.whl
  • Upload date:
  • Size: 480.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.15 {"installer":{"name":"uv","version":"0.11.15","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for harness_maker-0.17.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0ca405dcd1c8a7d5ece7dc6e5cf589f914828b00e40c33d7d6bdf6f5530bfdf8
MD5 001d0c8142175edd194fc5d76ba83926
BLAKE2b-256 0b1a1897f89d398d5714303b641d0ea719c110436a3f744f4625c169c9e7f19c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page