Project-tailored Claude Code / Cursor / Codex harness generator. Grade-gated review + auto-fix, multi-reviewer consensus, edit-preserving block-merge upgrades, anti-rot crawlers, worktree isolation, version-drift detection, and a 3-layer ai-readiness rubric with ranked improvement actions.
harness-maker
A harness that knows your project — and stays that way.
Interview-born. Grade-gated. Anti-rot by design. Multi-target.
Why · Quickstart · Features · How it works · Slash commands · Comparison · Configuration · Targets · FAQ · Roadmap · Deep dive
Why harness-maker?
Most Claude Code setups start from a generic template and drift from day one. harness-maker takes a different stance — it builds a harness that is shaped around your project and keeps that shape over time.
Four principles drive every design decision:
1. Personalized — Interview-born, not template-pasted.
The profiler reads your stack, scale, and lifecycle. The interview locks in preset, dev mode, target runtimes, and reviewer depth. A Side experiment and a Production service get structurally different harnesses — different reviewer sets, different workflow stage counts, different security gates. No generic defaults silently shipped.
2. Trusted — Grade-gated, not hope-based.
Every /hm:execute runs in a fresh git worktree and follows a TDD loop. /hm:review doesn't just report findings — it applies consensus-passed fixes and re-reviews until the grade meets the configured threshold (default A). Pre-LLM mechanical checks (lint, tests) gate the review before any reviewer agent spawns. Reviews you trust don't need second-guessing.
3. Anti-rot — Built-in, not retrofitted.
The Claude Code, Cursor, and Codex ecosystem moves fast. harness-maker ships a weekly crawl across 4 sources (Anthropic blog, GitHub releases, arxiv cs.SE/CL/CR, OSV.dev), scores each item for relevance, and surfaces only what matters — always manual-confirmed, never auto-applied. The failure-memory system proposes new skills and rules when the same failure recurs 3× across sessions.
4. Evolving — Refresh cycles, not rewrites.
/harness-maker:make --update picks up new template improvements without touching your edits (hash-based KEEP/REPLACE per file). The session memory and wiki build up project-specific conventions. Failure proposals turn recurring stumbles into permanent fixes. The harness improves with the project, not against it.
Table of Contents
- Why harness-maker?
- Quickstart
- Requirements
- Features
- How it works
- Slash commands the harness exposes
- How it compares
- Configuration
- Targets
- Reconcile rules
- Observability
- Marketplace
- FAQ
- Roadmap
- Development
- Contributing
- License
Quickstart
Universal Bootstrap Prompt
Paste into any LLM agent — it installs harness-maker and generates your project harness automatically. Works in Claude Code, Cursor, Codex CLI, or any chat with shell access. The agent detects your OS (Linux / macOS / Windows / WSL) and IDE on its own; you do not branch by hand.
Install and bootstrap harness-maker for this project, end-to-end.
You are an LLM agent with shell access. Detect the environment silently, install
harness-maker, and render the harness. Ask me ONE question (target IDE confirmation)
and otherwise proceed autonomously. Conduct the conversation in the language of my
first reply; default to English if you cannot tell.
Step 1 — Environment detection (silent — do NOT ask me):
- OS: try `uname -s` (POSIX → Linux / Darwin / CYGWIN* / MINGW*). If `uname` fails,
returns empty, or returns a value outside that set, treat the host as native
Windows.
- WSL: when `uname -s` returns Linux AND `/proc/version` contains "microsoft" or
"Microsoft" (case-insensitive), treat as WSL — use the POSIX path below.
- IDE: pick the first match — `$CLAUDE_CODE` or `~/.claude/` → Claude Code;
`$CURSOR_SESSION` or `~/.cursor/` → Cursor; `$CODEX_SESSION` or `~/.codex/` → Codex;
none → plain terminal (CLI mode).
- Python: run `python3 --version` (POSIX) or `python --version` (Windows). Need 3.12+.
If missing, install via `uv python install 3.12` after Step 2.
- uv: run `uv --version`. If missing, install:
• POSIX / macOS / WSL: `curl -LsSf https://astral.sh/uv/install.sh | sh`
• Native Windows (PowerShell): `irm https://astral.sh/uv/install.ps1 | iex`
Step 2 — Install harness-maker (skip if harness-maker is already on PATH):
uv tool install harness-maker
Step 3 — Profile the project:
# POSIX / macOS / WSL:
harness-maker profile "$(pwd)" --json
# PowerShell (Windows native) — quote-expanded for paths with spaces:
harness-maker profile "$((Get-Location).Path)" --json
Read stack / scale / lifecycle. Tell me: "Your project looks like {stack}, {scale} scale,
{lifecycle} lifecycle." Recommend a preset (Side or Production) based on the profile.
Step 4 — Confirm targets (ask me ONCE):
Based on Step 1's IDE detection, suggest one of:
• Claude Code only → targets=claude-code
• Cursor only → targets=cursor
• Codex only → targets=codex
• Multiple → comma-separated (e.g. claude-code,cursor)
Wait for my confirmation, then continue.
Step 5 — Generate the harness:
# POSIX / macOS / WSL:
harness-maker make "$(pwd)" --preset <PRESET> --locale en \
--targets <TARGETS> --autoloop
# PowerShell (Windows native):
harness-maker make "$((Get-Location).Path)" --preset <PRESET> --locale en `
--targets <TARGETS> --autoloop
Step 6 — IDE reload:
• Claude Code: run `/harness-maker:make` to verify the plugin loads.
• Cursor: reload the window (Ctrl+Shift+P → "Reload Window") so agents, skills, and
commands under .claude/ are discovered.
• Codex: AGENTS.md and .codex/ are read on next session start.
Tell me the harness is ready and suggest: /hm:health
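The detection rules in Step 1 of the prompt can also be written down as code. The sketch below is only an illustration of those rules in Python; `classify_os`, `detect_ide`, and the returned labels are hypothetical names and are not part of harness-maker.

```python
from pathlib import Path


def classify_os(uname_s: str, proc_version: str = "") -> str:
    """Map `uname -s` output (plus /proc/version text) to a host label."""
    name = uname_s.strip()
    if name == "Linux":
        # WSL reports Linux but mentions Microsoft in /proc/version.
        return "wsl" if "microsoft" in proc_version.lower() else "linux"
    if name == "Darwin":
        return "macos"
    if name.startswith(("CYGWIN", "MINGW")):
        return "posix-on-windows"
    # Empty, failed, or unrecognised output: treat as native Windows.
    return "windows"


def detect_ide(env: dict[str, str], home: Path) -> str:
    """Return the first matching IDE, or 'cli' when nothing matches."""
    probes = [
        ("claude-code", "CLAUDE_CODE", ".claude"),
        ("cursor", "CURSOR_SESSION", ".cursor"),
        ("codex", "CODEX_SESSION", ".codex"),
    ]
    for ide, var, dirname in probes:
        if env.get(var) or (home / dirname).is_dir():
            return ide
    return "cli"
```

A caller would pass `dict(os.environ)` and `Path.home()` to `detect_ide`; keeping both as parameters makes the first-match ordering easy to test.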
Manual install (step-by-step, no LLM)
# 1. Install harness-maker CLI from PyPI
uv tool install harness-maker
# 2. Profile and generate the harness
cd your-project
harness-maker profile . --json # inspect your stack/scale/lifecycle
harness-maker make . --preset Production --locale en --targets claude-code,cursor
# 3. (Optional) Load as Claude Code plugin for /harness-maker:make command
claude --plugin-dir /path/to/harness-maker
Native Windows / PowerShell equivalents:
# Install uv if you don't have it
irm https://astral.sh/uv/install.ps1 | iex
# Install harness-maker and run
uv tool install harness-maker
harness-maker profile . --json
harness-maker make . --preset Production --locale en --targets claude-code,cursor
The interview asks for locale first, then preset (Side or Production), dev mode, and target runtime. The setup preview explains generated roots, backups, preserved user blocks, target-specific leftovers, review trade-offs, and how to continue advanced Second Brain setup with /hm:configure. A fully-rendered harness is ready in one turn.
Re-run with flags to evolve the harness:
harness-maker make . --audit # score the existing .claude/ against the rubric
harness-maker make . --add NAME # graft on a single skill/agent/command
harness-maker make . --remove NAME # surgically remove one component
harness-maker make . --promote NAME # move an ad-hoc artifact into the harness
Requirements
| Dependency | Notes |
|---|---|
| Python 3.12+ and uv | Required wherever Claude Code runs against your project — even if your project's primary language is Rust, Node, or Go. Hooks invoke uv run python -m harness_maker.gates.*; without uv they are silent no-ops. |
| Claude Code CLI (plugin + hook support) | Optional for plugin mode (claude --plugin-dir /path/to/harness-maker). Not required — the harness-maker CLI can generate harnesses independently. |
| Cursor IDE 2.4+ (3.2+ recommended) | Optional. Reads .claude/agents/, .claude/skills/, and .claude/commands/hm/*.md natively (verified empirically in 0.6.2, re-confirmed 0.7.1 — see tests/cursor-compat/results-2026-05-08.md). Hooks render to a separate .cursor/hooks.json with Cursor-native schema; both files are emitted when targets includes cursor. Cursor 3.0+ adds native /worktree and /best-of-n which coexist safely with harness-maker's prefix-matched cleanup. |
| OpenAI Codex CLI | Optional. When targets includes codex, harness-maker renders Codex-native .codex/ config, AGENTS.md, and .agents/skills/ assets from the same workflow definitions. |
| Git | Required for worktree isolation (every /hm:execute and /hm:loop run). |
Features
- Single command, no subcommand sprawl. /harness-maker:make is the only entry point. Everything else is a flag (--audit, --add, --remove, --promote). No muscle-memory tax.
- Two presets, ten override dimensions. Side (1 reviewer, lean, fast) and Production (5 reviewers, verify-required, secure) cover ~90% of teams. The remaining 10% comes from override dimensions surfaced in the interview: workflow naming, models, autoloop, anti-rot, worktree, security, context-lint, memory, caching.
- Three targets from one harness. Claude Code gets the native .claude/ runtime. Cursor reuses .claude/agents/, .claude/skills/, and .claude/commands/hm/ while receiving Cursor-specific hooks and rules. Codex receives AGENTS.md, .codex/config.toml, .codex/hooks.json, agent TOML files, and .agents/skills/ so the same stage and workflow model runs in Codex without hand-porting prompts.
- Unified health audit. /hm:health runs a 3-layer composite: structural (ai-readiness deterministic + LLM rubric, 70%/25%/5%), external risks (4-source anti-rot crawl + LLM relevance filter), and personalization (ADR-011 rubric — Bronze/Silver/Gold/Platinum). Outputs a 3-section .claude/observability/dashboard.md. Every item routes through accept / reject / defer — no auto-apply, ever (ADR-001). All telemetry stays local. Replaces 0.12.x's separate /hm:ai-readiness, /hm:refresh, and /hm:personalization-audit commands.
- Anti-rot pipeline. Weekly crawl across 4 sources: Anthropic blog/changelog, GitHub releases (anthropics/claude-code), arxiv (cs.SE / cs.CL / cs.CR), OSV.dev CVEs. Each item is LLM-scored for relevance with an adaptive threshold (starts at 0.7, adapts ±0.05 based on your accept/reject history). Always manual-confirmed — there is no --auto-apply path.
- Worktree isolation per run. Every /hm:execute runs in a fresh git worktree under .worktrees/. /hm:loop allocates one worktree for the whole loop, shared across iterations to reduce branch churn. sibling_repos can opt related repos into the same isolation session for split frontend/backend or app/library work. Successful runs clean up; failed runs preserve evidence. Prefix-match cleanup never touches Cursor-managed worktrees in the same directory.
- 7 security gates. secrets (regex + entropy, gitleaks-style), permissions (settings.json over-grant detection), hook injection (hooks.json AST scan for rm -rf, curl | sh, eval), dependency CVEs (OSV.dev), hallucination (AST scan for non-existent imports — pure-filesystem check, no execution of LLM-generated code), prod-name guard (cross-tool sequence detection for production-targeting patterns), prompt injection (hidden-instruction pattern detection + privilege separation, regex + LLM second pass). Findings go to .claude/observability/security/findings-*.jsonl — never transmitted.
- Privilege separation. Reviewer agents get permissions.deny: [Write(*), Edit(*), Bash(rm:*), Bash(curl:*), Bash(npm:*), Bash(eval *), Bash(python:*), Bash(node:*), Bash(sh:*), Bash(bash:*)] — interpreter denies block subprocess-bypass attempts (0.6.2 hardening). Executor agents get permissions.allow: [Write(.worktrees/**), Edit(.worktrees/**), Bash(uv run:*), Bash(pytest:*), …] plus paired Edit/Write denies on system paths (/etc, ~/.ssh, ~/.aws). Combined with worktree isolation and the 0.7.1 telemetry tool-input whitelist + secret redaction (ADR-107), this gives defense-in-depth: even a prompt-injected reviewer cannot write to disk or shell out via interpreters; even a compromised executor cannot write outside the active worktree or touch system credentials; even a poisoned tool_input payload cannot leak credentials into the metrics log.
- Brownfield-safe. Reconciler indexes existing .claude/, computes hash-based ours/theirs decisions via provenance frontmatter, and offers per-conflict keep/replace/both. Apply is ADD-only with timestamped backups. User edits are never silently overwritten.
- Mechanical pre-checks in review. Add shell commands to harness.yaml.reviewers.mechanical_checks (e.g., ruff check ., uv run pytest tests/unit -x -q). These run before any LLM reviewer — stop-on-first. If a command fails, /hm:review emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts immediately: no grade, no auto-fix loop, no wasted reviewer tokens on broken code. The list is preserved across re-renders.
- Deep interview before every implementation. /hm:spec runs a 6-category interview (Intent → Outcomes → In-Scope Scenarios → Non-Goals → Constraints → Verification) scored for completeness; incomplete categories trigger follow-up questions, not silent gaps. /hm:plan runs a 9-category interview (scope → architecture → contract → risk → testing → phasing → dependencies → failure handling → observability) in impact order. Each decision that changes a component boundary, introduces a new contract, or rejects a viable alternative is promoted to a formal Architecture Decision Record (ADR) and becomes a binding constraint for /hm:execute — not advisory. A plan-validator agent gates the plan before it is written to disk: NEEDS_REVISION triggers targeted follow-up rounds; MAJOR_REVISION escalates to the user. The interview ends with a "no deferred decisions" scan — any "Accept?/OK?/Verify?" phrasing is a missed interview round, not a plan checkpoint.
- Autoloop with adaptive interview and 4-gate convergence. /hm:loop runs time-and-iteration-bounded loops. Before the first iteration, autoloop-driver reads the goal description, extracts already-answered dimensions (purpose / invariants / priority / stopping_criteria), and asks only what's missing — no fixed question script. It locks a loop intensity and explicit exit checklist, then requires mechanical checks, per-criterion LLM judgment, regression comparison, and a two-iteration convergence streak before accepting completion. A single worktree is shared across all iterations, and failed iterations preserve the worktree for inspection.
- Refdocs search skill. Register your project's reference folders (architecture docs, API specs, design docs) in harness.yaml. /harness-maker:make builds a local docs_index.yaml, and the refdocs-search skill gives the LLM lossless full-text search across registered folders — no chunking, no RAG index.
- SessionStart drift reminder. A hook fires on every session open and warns if the running harness-maker version differs from the version that rendered the harness — so you notice when a /plugin update needs a re-render. The detector compares harness.yaml.harness_maker_version against the latest plugin version cached on disk (not just the imported __version__), so /hm:health and SessionStart agree even when the slash command runs against a pinned older version (0.6.2).
- Memory tier with cross-process safety. .claude/memory/ holds episodic/ (per-day JSONL), semantic.jsonl (queryable index), profile.json, wiki.md, and failures.md. Concurrent writers from parallel sessions are serialised via a re-entrant POSIX flock — same thread can re-acquire without deadlock, different threads block normally (ADR-106, 0.7.1). Telemetry hooks append atomically via raw os.write() on O_APPEND (single-syscall, ≤ PIPE_BUF) so concurrent Claude Code + Cursor hooks cannot interleave JSONL lines.
- 3-tier context loading + compaction recovery. Every stage opens memory in tier order — Hot (session/<today>.md), Warm (failures.md + wiki.md, first 60/40 lines), Cold (git log / PLANs on demand). The PreCompact hook flushes the session to session/<today>.md with a checkpoint:compaction marker before Claude compacts context; the next turn detects the marker and resumes from the last in-progress phase without losing progress.
- 2-pass redaction for review precision (+47 pp). Reviewers run twice: Pass 1 strips PR title, author, and commit message so findings aren't anchored to metadata; Pass 2 restores full context and each reviewer validates or drops their Pass 1 findings. Findings absent from Pass 2 are dropped (CP10 contract). Ablation showed a +47 percentage-point precision gain on anchoring-prone diffs.
- Self-improving failure memory. Every stage appends failure patterns (wrong API usage, broken convention, unexpected build failure) to failures.md with a count. When any failure slug reaches count ≥ 3, a proposal is automatically appended to pending-proposals.md — suggesting a new skill, rule, or hook that would have prevented the recurrence. The user reviews and decides whether to ingest.
- ADR system as binding execute constraints. Architecture Decision Records promoted during /hm:plan are hard constraints — /hm:execute must not violate them. If a PLAN phase conflicts with an ADR, execute surfaces it as a blocker rather than silently proceeding. ADRs capture rejected alternatives and the reasoning, so future sessions don't re-litigate settled decisions.
- Cache miss classification (4 reasons). The prompt-cache diagnostic layer reports why a cache miss occurred: min_threshold (content too short), invalidation (context changed), ttl (5-min TTL expired), or first (cold start). The 5% weight in the AI-readiness score distinguishes cold-start misses (benign) from structural misses (actionable), so you fix the right thing.
- Grade-based auto-fix loop. /hm:review computes a grade (A–F) from consensus-passed P0/P1 findings and loops: apply fixes → re-review (selective — only re-spawn reviewers whose scope was touched) → regrade, until grade meets grade_threshold (default A) or max_review_rounds is exhausted. Failed fixes that break the build are automatically reverted and logged. Weak-consensus and manual-only findings are never auto-applied.
- Pre-LLM mechanical checks gate. Add shell commands to harness.yaml once and they run at the start of every /hm:review — before any reviewer agent spawns. Stop-on-first: the first non-zero exit emits ## MECHANICAL_BLOCK: <cmd> exit=<N>, halts the review, and exits CHANGES_REQUESTED. Lint clean and tests green are enforced mechanically, not by reviewer prompt. --no-auto-fix does not skip mechanical checks.
- Detection Depth — 12+ stack/framework granularity (python/node/rust + java/kotlin/swift/dart/ruby/php/csharp/elixir/scala/c-cpp/zig/haskell), framework dep parsing, package_manager + ci_provider detection, manifest-mtime cache with 24h ceiling.
- Foreign AI Config Migration — detect 6 known foreign configs (.cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, .github/copilot-instructions.md); LLM-driven import; @hm:harness:* inverted markers preserve user content across re-renders.
- Adaptive Personalization — /hm:health Step 3 composite-score rubric (Bronze/Silver/Gold/Platinum) with evidence-bearing ActionItems; 100% local telemetry; SessionStart drift hint after 30 axis overrides or 14 days. (Absorbs the 0.12.x /hm:personalization-audit command; ADR-011 rubric unchanged.)
- Confidence-Bucketed Recommendation UI — every detection declares its own confidence (HIGH/MEDIUM/LOW); high → silent yaml comment, medium → explicit prompt, low → no surface. Backward-compat regression test guards 0.11.x users from surprise silent-default changes.
For the complete mechanics behind each feature — all procedures, decision paths, and internal invariants — see docs/HOW-IT-WORKS.md.
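As one concrete illustration of a gate from the list above, the hallucination check (an AST scan for non-existent imports that never executes the scanned code) might look roughly like the following. This is a sketch under assumptions: `unresolved_imports` is a hypothetical name, and it resolves modules against the current interpreter, whereas the real gate may resolve against the project's own environment.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that cannot be found."""
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are skipped
        for name in names:
            top = name.split(".")[0]
            # find_spec on a top-level name locates the module without importing it.
            if importlib.util.find_spec(top) is None and top not in missing:
                missing.append(top)
    return missing
```

The source is parsed but never run, so a generated file with a made-up dependency is flagged without any of its code executing.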
How it works
flowchart TD
A["/harness-maker:make"] --> B["Profile\n(stack, scale, lifecycle)"]
B --> C["Interview\n(preset + 10 dims + targets)"]
C --> D["Synthesize\n(deterministic Blueprint)"]
D --> E["Render\n(Jinja2 + provenance frontmatter)"]
E --> F{Brownfield?}
F -- No --> G["Write .claude/ directly"]
F -- Yes --> H["Reconcile\n(hash-based keep/replace/both)"]
H --> G
G --> I["Extra targets?\nRender .cursor/ and/or .codex/ assets"]
I --> J["User runs /hm:* commands"]
J --> K["Weekly /hm:health (Step 2)\n4-source anti-rot crawl\n→ manual confirm"]
14 mechanisms (M1-M14) back every feature. See docs/ARCHITECTURE.md for the full breakdown including the privilege-separation model, security gate triggers, and reconcile invariants.
Since 0.12.0 the synthesis pipeline also threads through a typed Recommendation registry: every recommend_<axis>(profile, project_dir) function declares per-detection confidence (ADR-007), and the interview.py dispatcher routes by bucket. Detection results land in ~/.cache/harness-maker/profile-<repo-hash>.json with manifest-mtime + 24h-ceiling invalidation. Foreign AI config files detected at /hm:configure time can be imported into harness.yaml and re-rendered single-source with @hm:harness:* inverted block markers (ADR-009).
Slash commands the harness exposes
After install, the rendered harness exposes commands under /hm:*:
Atomic stages (always available)
| Command | Purpose |
|---|---|
| /hm:research | Gather facts, user-workflow signals, best practices, and options |
| /hm:spec | Write acceptance criteria from research |
| /hm:plan | Decompose spec into phases with exit criteria |
| /hm:execute | Implement with TDD + worktree isolation |
| /hm:review | Multi-reviewer consensus (mechanical pre-check → conditional routing) |
| /hm:wrapup | Clean, document, commit |
| /hm:verify | 6-check gate before completion |
Fused workflows (preset-generated, user-renameable)
Fused workflows combine atomic stages into a single command. The interview generates a starter set; you can add, remove, or rename them in harness.yaml.
| Preset | Default fused workflows |
|---|---|
| Side | /hm:plan-exec-rev · /hm:exec-rev · /hm:exec-rev-wrap (default) |
| Production | /hm:exec-rev-wrap-ver (default) · /hm:exec-rev-wrap · /hm:plan-exec-rev · /hm:exec-rev · /hm:res-spec-plan |
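The default fused names in the table read as hyphen-joined abbreviations of the atomic stages. The Python sketch below expands a default name into its stage sequence; the abbreviation map is inferred from the command names in this README and is an assumption, and since workflows are user-renameable, the authoritative mapping lives in harness.yaml rather than in the name.

```python
# Assumed abbreviation map, inferred from the default workflow names.
STAGE_ABBREVIATIONS = {
    "res": "research",
    "spec": "spec",
    "plan": "plan",
    "exec": "execute",
    "rev": "review",
    "wrap": "wrapup",
    "ver": "verify",
}


def expand_workflow(command: str) -> list[str]:
    """Expand a default fused command into the atomic stages it chains."""
    name = command.removeprefix("/hm:")
    # Raises KeyError for a part that is not a known stage abbreviation.
    return [f"/hm:{STAGE_ABBREVIATIONS[part]}" for part in name.split("-")]
```

For example, `expand_workflow("/hm:res-spec-plan")` yields the research, spec, and plan stages in that order.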
Utility commands
| Command | Purpose |
|---|---|
| /hm:loop "<goal>" | Autoloop driver — feature or improve mode, time/iter-bounded |
| /hm:ai-readiness | 3-layer readiness score + P0/P1/P2 ranked actions |
| /hm:personalization-audit | Composite-score rubric (Bronze/Silver/Gold/Platinum) from telemetry + harness.yaml + ProjectProfile. Reads .claude/observability/adaptive/overrides.jsonl; outputs ranked ActionItem list with evidence schema. ADR-011 (v0); calibration deferred to 30+ project sample. |
| /hm:refresh | Anti-rot crawl — manual confirm required |
Per the Features section, /hm:health is the unified entry point for these audits: it replaces 0.12.x's separate /hm:ai-readiness, /hm:refresh, and /hm:personalization-audit commands.
How it compares
Other Claude Code harnesses pick a niche; harness-maker is the meta-tool that builds them — and then keeps them current.
| Project | Scope | What harness-maker adds |
|---|---|---|
| ohmyclaudecode | Curated commands/agents bundle | Project-tailored synthesis (preset + 10 override dims), brownfield reconcile, provenance frontmatter, anti-rot pipeline |
| superpowers | Powerful sub-agents and workflows | Single-command entry, AI-readiness scoring, worktree isolation by default, privilege-separated reviewer/executor |
| Archon | Knowledge-base + RAG-backed planning | Stack/scale/lifecycle profiler, atomic+fused workflow engine, conditional reviewer routing, 7 security gates |
| aider | Terminal pair programmer, LLM-agnostic | Claude Code / Cursor / Codex native assets; harness output is the runtime (not the session); anti-rot keeps it current |
| ouroboros | Autonomous self-bootstrapping AI software factory | Project-shaped interview (not one pipeline for all), grade-gated reviews with mechanical pre-checks, anti-rot pipeline, brownfield reconcile, multi-target support |
| Hand-rolled .claude/ | Full control, zero automation | Drift detection via provenance hash, weekly anti-rot crawl, AI-readiness scoring, worktree isolation — without writing it yourself |
The core difference: harness-maker generates and owns the lifecycle of your .claude/ directory. A static tool gives you a starting point. harness-maker gives you a starting point that knows who it was created for and can be updated without losing your changes.
Configuration
The interview writes answers to .claude/harness.yaml. Key dimensions:
preset: Production # Side | Production
locale: en # en | ko | <any — unknown falls back to en>
dev_mode: spec-driven # spec-driven | task-driven
targets: # which runtimes to drive
  - claude-code
  - cursor
  - codex
recommended_model: claude-opus-4-7
ref_folders:
  - path: ../architecture-docs
    glob: "**/*.{md,txt,pdf}"
second_brain:
  enabled: true
  backend: filesystem
  project_id: my-app
  vault_path: ../obsidian-vault
  trusted_allowlist: true # configured write folders need no confirmation/backup
  required_frontmatter: [type, created, updated, tags, links]
  folders:
    - path: Projects/my-app
      read: true
      write: true
      note_types: [decision, preference, failure, project, reference, journal]
sibling_repos:
  - ../backend
reviewers:
  enabled: [code, security, performance, ux, concurrency]
  routing: conditional # conditional | always-all
  mechanical_checks: # pre-LLM stop-on-first gate (optional)
    - ruff check .
    - uv run pytest tests/unit -x -q
worktree:
  scope: [execute, plan] # which stages run in a fresh worktree
  cleanup: on_success # on_success | always | never
anti_rot:
  enabled: true
  threshold: 0.7 # adaptive — adjusts ±0.05 based on accept/reject ratio
context_lint:
  strict: true # warn on overrun | block
memory:
  files: [failures.md, wiki.md]
adaptive:
  disable_telemetry: false # opt-out per ADR-005; 100% local capture
  audit_session_threshold: 30 # SessionStart hint after N axis overrides
  audit_days_threshold: 14 # SessionStart hint after N days without audit
All adaptive features are 100% local. tests/unit/test_no_network.py asserts no socket call during telemetry emit, audit, or SessionStart hook execution (ADR-005 positive obligation).
Run /harness-maker:make again and choose Update (same settings, pick up template improvements) or Full reconfigure to change any dimension.
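The anti_rot.threshold comment says the value adjusts by ±0.05 based on the accept/reject ratio, but the README does not spell out the rule. The Python sketch below shows one plausible reading; the direction of each adjustment, the 0.5 pivot, and the clamp to [0, 1] are all assumptions, and `adjust_threshold` is a hypothetical name.

```python
def adjust_threshold(current: float, accepted: int, rejected: int,
                     step: float = 0.05) -> float:
    """Nudge the relevance threshold by one step from accept/reject counts."""
    total = accepted + rejected
    if total == 0:
        return current  # no history yet, keep the configured value
    ratio = accepted / total
    if ratio > 0.5:
        current -= step  # mostly accepted: lower the bar, surface more items
    elif ratio < 0.5:
        current += step  # mostly rejected: raise the bar, surface fewer items
    # Clamp to a valid score range and round away float noise.
    return round(min(1.0, max(0.0, current)), 2)
```

Under this reading, a user who accepts most surfaced items starting from 0.7 would see the threshold drift toward 0.65, and one who rejects most would see it move toward 0.75.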
Obsidian Second Brain
second_brain connects a Markdown/Obsidian vault as a typed knowledge graph for
stage-aware memory. hm-research, hm-plan, hm-review, and hm-wrapup use
different note types (reference, project, decision, preference,
failure, journal) instead of loading the whole vault. Writes are full
Markdown writes inside configured write: true folders only. Configure narrow
folders: the allowlist is trusted completely, with no per-write confirmation or
backup. For multiple projects sharing one vault, give every project a distinct
project_id and put writable folders under a path segment with the same value
(for example Projects/my-app). harness-maker rejects writable Second Brain
folders that do not include the configured project_id, which keeps one
project from writing into another project's note namespace.
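The namespace rule above (writable folders must contain the configured project_id) could be checked along these lines. This is a sketch, not the harness-maker validator: `validate_writable_folders` is a hypothetical helper, and matching on a whole path segment rather than a substring is an assumption drawn from the "path segment with the same value" wording.

```python
from pathlib import PurePosixPath


def validate_writable_folders(project_id: str, folders: list[dict]) -> list[str]:
    """Return the paths of writable folders that lack a project_id segment."""
    rejected = []
    for folder in folders:
        if not folder.get("write", False):
            continue  # read-only folders are not namespaced
        if project_id not in PurePosixPath(folder["path"]).parts:
            rejected.append(folder["path"])
    return rejected
```

With project_id set to my-app, a writable Projects/my-app passes while a writable Projects/other is rejected; read-only folders are ignored by the check.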
Setup walkthrough
- At interview time — when prompted for vault_path, supply the absolute path to your Obsidian vault root (or a not-yet-created subfolder of it; the harness creates the subfolder on first write iff the parent has .obsidian/). The interview then asks for project_id and a writable folder, defaulting to 99_HM/{project_id}/ (matches the 99_*/01_* numeric-prefix organization style).
- Post-install adjustment — run /hm:configure to revisit Second Brain settings. The slash command dispatches to harness-maker configure-second-brain --check, which inspects state and returns guidance JSON so the slash command can prompt only when something is missing. To add a folder non-interactively, call harness-maker configure-second-brain --add-folder 99_HM/my-project/.
- Existing harnesses upgraded from a pre-fix release — the loader now tolerates folders: [] (logs a one-shot warning + remediation hint pointing at /hm:configure); writes raise SecondBrainError with the same hint instead of an obscure "not under a configured write folder" message.
Internals: the rendered harness.yaml carries a provenance frontmatter block;
all readers route through harness_maker.io_utils.load_harness_yaml (the
canonical multi-document-tolerant loader) to avoid the parser-strategy drift
that produced the original bug.
Targets
targets is a multi-select. Choose claude-code, cursor, codex, or any combination. Claude Code is the default; Cursor and Codex add runtime-specific assets while preserving the same preset, workflows, skills, reviewers, and safety model.
Cursor target
Run /harness-maker:make and pick targets: [cursor] or [claude-code, cursor] at the interview. The renderer adds:
- .cursor/rules/harness.mdc — always-on workflow rules with Cursor-legal frontmatter (description / globs: [] / alwaysApply: true)
- .cursor/hooks.json — Cursor-native hooks schema (lowercase camelCase keys, version: 1, flat {matcher, command}). Deliberately different from .claude/hooks/hooks.json (PascalCase, nested {hooks:[…], matcher}); each IDE reads only its own file. Don't try to collapse them — Cursor will silently stop firing hooks. See tests/cursor-compat/results-2026-05-08.md for the kairos 0.5.7 forensic that proved this empirically.
- .cursor/mcp.json — Cursor MCP server config. Populated from harness.yaml.mcp_servers (0.6.2+); defaults to {"mcpServers": {}} when no servers are configured. Inner shape (command, args, env) is type-validated on parse with a warning log when entries are dropped.
.claude/agents/, .claude/skills/, and .claude/commands/hm/ are single-source — Cursor 2.4+ reads them natively (forensic-verified). Hooks are the only asset that requires per-IDE rendering because the schemas diverge by design.
Recommended model
harness.yaml.recommended_model defaults to claude-opus-4-7 and propagates to agent frontmatter. Cursor users may override model selection in their IDE. The harness does not rewrite prompts to be model-agnostic — <thinking> blocks and Claude-specific patterns are preserved deliberately.
Verification
Per-release Cursor compatibility is tracked in tests/cursor-compat/:
- MANUAL_CHECKLIST.md — A1–A4 (agent dispatch, hook fire, skill auto-discovery, slash command + Q&A loop) covering both IDEs
- RESULTS.md — PASS/FAIL/PARTIAL grid you fill while running the checklist
- results-2026-05-08.md — kairos 0.5.7 forensic that resolved Q-A (hooks discovery) and Q-B (commands discovery) without an IDE-driven manual run; future Cursor verifications append a new dated results-*.md
- fixture/ — minimal .claude/ for opening directly in either IDE
Automated CI guards the dual-schema invariants regardless of manual fixture runs:
- test_cursor_hooks_uses_lowercase_native_schema — fails if .cursor/hooks.json accidentally adopts Claude PascalCase
- test_no_cursor_commands_rendered — fails if a future change starts emitting .cursor/commands/hm-*.md mirrors (Cursor reads .claude/commands/ natively)
- test_render_agents_have_structured_permissions_frontmatter — fails if any agent template loses its permissions.allow/deny block (Cursor 2.5+ subagent permission inheritance gap)
Codex target
Run /harness-maker:make and pick targets: [codex] or include codex with the other targets. The renderer adds:
- AGENTS.md — Codex's top-level project instruction file, rendered without YAML frontmatter so Codex displays clean instructions.
- .codex/config.toml and .codex/agents/*.toml — Codex-native configuration and agent registrations.
- .codex/hooks.json — Codex hook wiring, including Codex-specific permission events and file-edit tool matchers.
- .agents/skills/*/SKILL.md — the existing harness skills plus stage, workflow, and loop trigger skills in the layout Codex discovers.
Codex TOML files are rendered as pure TOML, not markdown-with-frontmatter. AGENTS.md uses block-merge markers so user additions survive re-renders, while .codex/*.toml files are regenerated from the selected target configuration.
Structured questions in Codex: Stage and loop skills instruct the agent to use the request_user_input tool for interview questions. This tool is available in Plan mode by default. To enable it in Code mode, add default_mode_request_user_input = true under [features] in .codex/config.toml. If the tool is unavailable at runtime, the agent falls back to asking in its response text.
Reconcile rules (re-rendering an existing harness)
Re-running /harness-maker:make on an existing harness uses a hash-driven KEEP rule: if a file's content_hash frontmatter matches the new template's hash, it's "ours" — safe to overwrite. If it differs (user-edited), it's "theirs" — kept.
Trade-off: when harness-maker bumps a template on a minor release, the existing file's hash no longer matches, so reconcile picks KEEP even if you didn't edit the file.
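The KEEP rule and its trade-off can be sketched in a few lines. This is a simplified illustration, not harness-maker's implementation: the helper names and the exact frontmatter layout (a leading `---` block containing a `content_hash:` line) are assumptions, and hashes are treated as opaque strings.

```python
import re
from typing import Optional

# Assumed layout: a "content_hash: <value>" line inside a leading '---' block.
HASH_LINE = re.compile(r"^content_hash:\s*(\S+)\s*$", re.MULTILINE)


def read_content_hash(text: str) -> Optional[str]:
    """Return the content_hash recorded in the file's frontmatter, if any."""
    if not text.startswith("---\n"):
        return None
    end = text.find("\n---", 4)
    if end == -1:
        return None
    match = HASH_LINE.search(text[4:end + 1])
    return match.group(1) if match else None


def reconcile(on_disk: str, new_template_hash: str) -> str:
    """'overwrite' when the recorded hash equals the new template's hash, else 'keep'."""
    recorded = read_content_hash(on_disk)
    return "overwrite" if recorded == new_template_hash else "keep"
```

Under these assumptions, `reconcile` returns "keep" for an untouched file as soon as the template hash changes, which is the trade-off described above.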
To pick up template updates after a version bump:
rm .claude/harness.yaml
/harness-maker:make
# (previous .claude/ is auto-backed up to .backup-<ISO>/)
.cursor/rules/*.mdc follow the same KEEP behavior. A future phase introduces a sidecar .hm-meta.yaml so harness-maker can hash-track Cursor assets without polluting Cursor frontmatter.
@hm:harness:* inverted markers (0.12.0) — for foreign-AI-config files we generate post-import (.cursor/rules/*.mdc, AGENTS.md, CLAUDE.md, etc.), content INSIDE @hm:harness:<id> markers is harness-owned (replaced on every render); content OUTSIDE is user-owned (byte-for-byte preserved). The block-merge parser dispatches per file extension: HTML comments for .md/.mdc, # @hm:harness: for .yml/.yaml, top-level _hm_harness JSON key for .json. 0.11.x files (frontmatter generated_by: harness-maker + zero @hm:harness:* markers) are treated as wholly harness-owned on first encounter post-upgrade and re-rendered into the new marker family (ADR-009 amendment).
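A rough sketch of the inverted-marker merge for .md/.mdc files follows. The concrete marker syntax is an assumption: the text only says HTML comments are used, so the opening `<!-- @hm:harness:<id> -->` and closing `<!-- /@hm:harness:<id> -->` forms and the function name `merge_blocks` are hypothetical.

```python
import re

# Assumed marker syntax; the real parser's comment format may differ.
BLOCK = re.compile(
    r"(<!-- @hm:harness:(?P<id>[\w.-]+) -->\n)"
    r"(?P<body>.*?)"
    r"(<!-- /@hm:harness:(?P=id) -->)",
    re.DOTALL,
)


def merge_blocks(existing: str, rendered: dict[str, str]) -> str:
    """Replace the inside of each marked block; leave everything outside untouched."""

    def swap(match: re.Match) -> str:
        new_body = rendered.get(match.group("id"))
        if new_body is None:
            # No fresh render for this id: keep the block as it is.
            return match.group(0)
        return match.group(1) + new_body + match.group(4)

    return BLOCK.sub(swap, existing)
```

Text outside the markers never passes through the replacement callback, which is how the user-owned regions stay byte-for-byte identical in this sketch.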
Observability
All observability is 100% local — nothing is transmitted externally.
| File | Contents |
|---|---|
| .claude/observability/dashboard.md | AI-readiness score, dimension breakdown, ranked action items |
| .claude/observability/metrics-YYYY-MM-DD.jsonl | Per-turn telemetry (cache hit %, tool calls, durations) — date-rotated daily (ADR-103, 0.7.1). Pre-0.7.1 metrics.jsonl is read as the trailing legacy shard. |
| .claude/observability/refresh/raw-*.jsonl | Anti-rot crawl evidence (accepted / rejected items) |
| .claude/observability/security/findings-*.jsonl | 7-gate security scan findings |
| .claude/observability/adaptive/overrides.jsonl | harness_yaml_override events with schema_version: 1, dual capture sites (/hm:configure exit primary + SessionStart secondary), dedup-keyed |
| .claude/observability/adaptive/last-audit.txt | Last /hm:personalization-audit run timestamp |
Run /hm:ai-readiness to regenerate the dashboard on demand.
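Since the telemetry is plain JSONL on disk, it can be inspected with a short script. The sketch below lists the date-rotated shards in chronological order and yields their records as dictionaries. The helper names are hypothetical, the record fields are not specified here so none are assumed, and the legacy metrics.jsonl file is deliberately left out because it does not match the dated pattern.

```python
import json
import re
from datetime import date
from pathlib import Path
from typing import Iterator

SHARD_NAME = re.compile(r"^metrics-(\d{4})-(\d{2})-(\d{2})\.jsonl$")


def dated_shards(observability_dir: Path) -> list[tuple[date, Path]]:
    """Return (day, path) pairs for metrics-YYYY-MM-DD.jsonl files, oldest first."""
    shards = []
    for path in observability_dir.glob("metrics-*.jsonl"):
        match = SHARD_NAME.match(path.name)
        if not match:
            continue
        try:
            day = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # skip names that look dated but are not real dates
        shards.append((day, path))
    return sorted(shards)


def iter_records(shard: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a shard."""
    with shard.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

Typical usage would be `dated_shards(Path(".claude/observability"))` followed by `iter_records` on each returned path.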
Marketplace
Marketplace manifests are present for each runtime:
- .claude-plugin/plugin.json — Claude Code plugin spec
- .cursor-plugin/plugin.json — Cursor Marketplace spec
- .codex-plugin/plugin.json — Codex plugin spec
Listings are pending. Until then, install locally:
# Any IDE — CLI install (recommended)
uv tool install harness-maker # from PyPI (when published)
uv tool install /path/to/harness-maker # pre-PyPI, from clone
harness-maker make . --targets claude-code,cursor,codex
# Claude Code — plugin mode (optional, enables /harness-maker:make command)
claude --plugin-dir /path/to/harness-maker
# Cursor — open the repo folder directly in Cursor as a workspace plugin
# Codex — render Codex-native assets in your project harness
harness-maker make . --targets codex
FAQ
Q: Why Python? My project is Rust / Node / Go.
The hooks (permission_gate, worktree_gate, telemetry) call uv run python -m harness_maker.* at PreToolUse / PostToolUse boundaries. uv and harness_maker need to be on your PATH, but only to run hooks; your project's toolchain and build system are untouched.
Q: Why does it require uv?
uv gives a hermetic, fast Python environment without polluting the system or your project's virtualenv. Hooks run in milliseconds without activating anything.
Q: Will harness-maker overwrite my hand-edits when I re-render?
No. Every generated file carries a content_hash in its provenance frontmatter. Re-render compares the new template's hash against the file on disk. If they differ — meaning you edited the file — it keeps yours. See Reconcile rules.
Q: What's the difference between Side and Production?
Side is lean: 1 reviewer (code), verify-before-completion optional, worktree scope [execute]. Production is thorough: 5 reviewers, verify required, worktree scope [execute, plan], and a high-severity security finding blocks. Both share the same anti-rot and caching defaults.
Q: Does anti-rot ever auto-apply?
Never. Every anti-rot item surfaces via a structured question (AskQuestion in Cursor, AskUserQuestion in Claude Code) in /hm:refresh. There is no --auto-apply flag and no plan to add one. The rationale: a wrong patch is worse than a stale harness.
Q: Can I use only Claude Code? Only Cursor? Only Codex?
Yes. targets is a multi-select at the interview. Claude Code uses .claude/; Cursor reuses most .claude/ assets and adds .cursor/; Codex adds AGENTS.md, .codex/, and .agents/skills/.
Q: Do my prompts or telemetry leave my machine?
No. Metrics, the dashboard, and security findings are written locally to .claude/observability/. Anti-rot crawls read public sources (Anthropic blog, arxiv, GitHub, OSV.dev) but never upload anything.
Q: How do I pick up template improvements after a /plugin update?
Run /harness-maker:make → choose Update. For files where your hash matches the old template, the new version is applied. For files you edited (hash mismatch), yours is kept. To force a full refresh, rm .claude/harness.yaml and re-run.
Q: What are mechanical_checks and when should I use them?
Shell commands listed under reviewers.mechanical_checks in harness.yaml run at the start of every /hm:review — before any LLM reviewer spawns. The first command that exits non-zero emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts review immediately (CHANGES_REQUESTED). Use them for fast, deterministic gates (lint, type-check, unit test) that shouldn't waste LLM tokens when the basics are broken. The list is user-managed; harness-maker never populates it automatically. --no-auto-fix does not skip mechanical checks — they are a hard gate, not part of the fix loop.
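The gate's control flow is simple enough to sketch. The function below is illustrative only (the name `run_mechanical_checks` is hypothetical, and the real runner may handle output capture, timeouts, and working directories differently): it runs each command in order and stops at the first non-zero exit, returning the block line in the format quoted above.

```python
import subprocess
from typing import Optional


def run_mechanical_checks(commands: list[str]) -> Optional[str]:
    """Run shell commands in order; return a MECHANICAL_BLOCK line on first failure."""
    for cmd in commands:
        result = subprocess.run(cmd, shell=True)
        if result.returncode != 0:
            # Later commands are not run: the gate halts immediately.
            return f"## MECHANICAL_BLOCK: {cmd} exit={result.returncode}"
    return None  # every check passed; LLM reviewers may start
```

In this sketch, ordering the list from cheapest to most expensive check (lint, then type-check, then unit tests) keeps a failing review as short as possible.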
Q: Why doesn't harness-maker rewrite prompts to be model-agnostic?
The prompts are tuned for claude-opus-4-7 — <thinking> blocks, role framing, chain-of-thought structure. Rewriting for model-neutrality would degrade quality on the recommended model for hypothetical gains on others. Override recommended_model in harness.yaml if you want a different model; the prompts remain as-is.
Personalization Architecture
harness-maker 0.12.0 introduces three tracks of personalization depth:
- Detection Depth (Track A) — harness_maker.profile.profile() detects stack (12+ languages), framework, package manager, and CI provider. Results land in ProjectProfile and feed the recommendation framework. Cache at ~/.cache/harness-maker/profile-<repo-hash>.json (manifest-mtime invalidation + 24h TTL).
- Foreign AI Config Migration (Track D) — /hm:configure detects .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, and .github/copilot-instructions.md. With user confirmation, harness-maker imports the foreign config's intent into harness.yaml and re-generates the foreign file as part of the harness on every render. The @hm:harness:* block markers protect user customizations outside the marked regions.

  Cursor power-user constraint (ADR-003) — single-source means harness re-generates .cursor/rules/ on every render. If you prefer Cursor-only ownership of those rules, the only opt-out today is to drop the cursor target from harness.yaml.targets. A dedicated opt-out flag is TODO for a future PLAN.
- Adaptive (Track B start) — harness.yaml.adaptive.disable_telemetry: false (opt-out per ADR-005) enables override telemetry. /hm:personalization-audit scores your harness composite (0–100; Bronze < 40 < Silver < 65 < Gold < 85 ≤ Platinum) and surfaces ranked action items with evidence. SessionStart drift hint fires after 30 overrides or 14 days without an audit.
All adaptive features run 100% locally — no network calls (asserted by tests/unit/test_no_network.py).
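The audit tiers and the drift hint reduce to a pair of threshold checks. The sketch below is an illustration with hypothetical function names. The text only pins down that scores below 40 are Bronze and scores of 85 and above are Platinum, so assigning exactly 40 and 65 to the higher tier is an assumption, as is treating "after 30 overrides or 14 days" as inclusive.

```python
def audit_tier(score: float) -> str:
    """Map a 0-100 composite score to a tier name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Platinum"
    if score >= 65:  # assumption: a score of exactly 65 counts as Gold
        return "Gold"
    if score >= 40:  # assumption: a score of exactly 40 counts as Silver
        return "Silver"
    return "Bronze"


def should_show_drift_hint(overrides: int, days_since_audit: int) -> bool:
    """Fire once either limit is reached (inclusive comparison is an assumption)."""
    return overrides >= 30 or days_since_audit >= 14
```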
0.12.1 patch extended CACHED_MANIFESTS with literal filenames from STACK_GLOB_MANIFESTS (stack.yaml, package.yaml) so Haskell projects now properly invalidate the profile cache on those manifests' mtime bump.
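The two invalidation rules for the profile cache (manifest mtime and a 24h TTL) can be combined as in the sketch below. It is a simplified model with a hypothetical function name: it compares each manifest's mtime against the cache file's own mtime, whereas the real implementation may record manifest mtimes inside the cache instead.

```python
import time
from pathlib import Path
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # the 24h TTL mentioned above


def cache_is_fresh(
    cache_file: Path, manifests: list[Path], now: Optional[float] = None
) -> bool:
    """True when the cache is within its TTL and no tracked manifest is newer than it."""
    if not cache_file.exists():
        return False
    cached_at = cache_file.stat().st_mtime
    current = time.time() if now is None else now
    if current - cached_at > TTL_SECONDS:
        return False  # TTL expired
    # A manifest modified after the cache was written invalidates it.
    return all(m.stat().st_mtime <= cached_at for m in manifests if m.exists())
```

A manifest that is missing from the `manifests` list can never invalidate the cache, which is the kind of gap the 0.12.1 change closed for stack.yaml and package.yaml.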
Roadmap
Done (0.12.0–0.12.1):
- Track A (Detection Depth): 12+ stacks/frameworks/pkg-mgr/CI
- Track D (Foreign AI Config Migration): 6 config types, single-source re-render, @hm:harness:* markers, 0.11.x migration
- Track B-start (Adaptive): override telemetry, /hm:personalization-audit, SessionStart drift surface
Next (0.13.0 — Track B completion + Cursor opt-out):
- Track B-extra: B2 permission-frequency capture + B3 reviewer-signal aggregation
- harness.yaml.cursor.opt_out_render flag for Cursor power-users (ADR-003 documented constraint)
- github/spec-kit external e2e fixture vendoring (currently pytest.skip with TODO)
Future (0.14.0+ — Track C strategic expansion, ranked by user value):
- C2 privacy/regulated tier (HIPAA/PCI/GDPR — enterprise entry barrier)
- C1 team-profile axis (solo vs small-team vs large-team)
- C3 code-style detection (docstring conventions, naming patterns)
- C4 cross-project user-defaults
- C5 per-developer overlay in shared harness
Deferred-by-data:
- Rubric v0 calibration after 30+ projects accumulate /hm:personalization-audit runs (passive trigger)
Standing items:
- PyPI publish — remove the editable-from-clone requirement.
- Claude Code + Cursor + Codex Marketplace listings — submit all plugin manifests.
- .hm-meta.yaml sidecar for Cursor assets — enable hash-tracking of .cursor/rules/*.mdc without polluting Cursor frontmatter.
- User-configurable anti-rot repo list — harness.yaml.anti_rot.github_repos to track additional Claude Code ecosystem repos beyond the default.
- Demo screencast — record a first-install + /hm:loop session.
- Enterprise preset — stricter security gates, mandatory spec-driven dev mode.
Development
uv sync
uv run pytest # full suite
uv run ruff check src/ tests/ # lint
uv run ruff format src/ tests/ # format
uv run mypy --strict src/ # type check
bash .claude-verify.sh all # phase-by-phase exit criteria + final acceptance
Contributing
See docs/CONTRIBUTING.md for adding skills/agents/presets, test patterns, and the PR checklist (including the 4-file version bump invariant).
See docs/ARCHITECTURE.md for the 14 mechanisms (M1-M14) behind the system.
License
MIT — see LICENSE.