Project-tailored Claude Code / Cursor / Codex harness generator. Grade-gated review + auto-fix, multi-reviewer consensus, edit-preserving block-merge upgrades, anti-rot crawlers, worktree isolation, version-drift detection, and a 3-layer ai-readiness rubric with ranked improvement actions.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

0xecro1

These details have not been verified by PyPI

Project description

harness-maker

A harness that knows your project — and stays that way.

Interview-born. Grade-gated. Anti-rot by design. Multi-target.

Why · Quickstart · Features · How it works · Slash commands · Comparison · Configuration · Targets · FAQ · Roadmap · Deep dive

Why harness-maker?

Most Claude Code setups start from a generic template and drift from day one. harness-maker takes a different stance — it builds a harness that is shaped around your project and keeps that shape over time.

Four principles drive every design decision:

1. Personalized — Interview-born, not template-pasted. The profiler reads your stack, scale, and lifecycle. The interview locks in preset, dev mode, target runtimes, and reviewer depth. A Side experiment and a Production service get structurally different harnesses — different reviewer sets, different workflow stage counts, different security gates. No generic defaults silently shipped.

2. Trusted — Grade-gated, not hope-based. Every /hm:execute runs in a fresh git worktree and follows a TDD loop. /hm:review doesn't just report findings — it applies consensus-passed fixes and re-reviews until the grade meets the configured threshold (default A). Pre-LLM mechanical checks (lint, tests) gate the review before any reviewer agent spawns. Reviews you trust don't need second-guessing.

3. Anti-rot — Built-in, not retrofitted. The Claude Code, Cursor, and Codex ecosystem moves fast. harness-maker ships a weekly crawl across 4 sources (Anthropic blog, GitHub releases, arxiv cs.SE/CL/CR, OSV.dev), scores each item for relevance, and surfaces only what matters — always manual-confirmed, never auto-applied. The failure-memory system proposes new skills and rules when the same failure recurs 3× across sessions.

4. Evolving — Refresh cycles, not rewrites. /harness-maker:make --update picks up new template improvements without touching your edits (hash-based KEEP/REPLACE per file). The session memory and wiki build up project-specific conventions. Failure proposals turn recurring stumbles into permanent fixes. The harness improves with the project, not against it.

Why harness-maker?
Quickstart
Requirements
Features
How it works
Slash commands the harness exposes
How it compares
Configuration
Targets
Reconcile rules
Observability
Marketplace
FAQ
Roadmap
Development
Contributing
License

Quickstart

Universal Bootstrap Prompt

Paste into any LLM agent — it installs harness-maker and generates your project harness automatically. Works in Claude Code, Cursor, Codex CLI, or any chat with shell access. The agent detects your OS (Linux / macOS / Windows / WSL) and IDE on its own; you do not branch by hand.

Install and bootstrap harness-maker for this project, end-to-end.

You are an LLM agent with shell access. Detect the environment silently, install
harness-maker, and render the harness. Ask me ONE question (target IDE confirmation)
and otherwise proceed autonomously. Conduct the conversation in the language of my
first reply; default to English if you cannot tell.

Step 1 — Environment detection (silent — do NOT ask me):
  - OS: try `uname -s` (POSIX → Linux / Darwin / CYGWIN* / MINGW*). If `uname` fails,
        returns empty, or returns a value outside that set, treat the host as native
        Windows.
  - WSL: when `uname -s` returns Linux AND `/proc/version` contains "microsoft" or
        "Microsoft" (case-insensitive), treat as WSL — use the POSIX path below.
  - IDE: pick the first match — `$CLAUDE_CODE` or `~/.claude/` → Claude Code;
        `$CURSOR_SESSION` or `~/.cursor/` → Cursor; `$CODEX_SESSION` or `~/.codex/` → Codex;
        none → plain terminal (CLI mode).
  - Python: run `python3 --version` (POSIX) or `python --version` (Windows). Need 3.12+.
        If missing, install via `uv python install 3.12` after Step 2.
  - uv: run `uv --version`. If missing, install:
        • POSIX / macOS / WSL: `curl -LsSf https://astral.sh/uv/install.sh | sh`
        • Native Windows (PowerShell): `irm https://astral.sh/uv/install.ps1 | iex`

Step 2 — Install harness-maker (skip if harness-maker is already on PATH):
  uv tool install harness-maker

Step 3 — Profile the project:
  # POSIX / macOS / WSL:
  harness-maker profile "$(pwd)" --json
  # PowerShell (Windows native) — quote-expanded for paths with spaces:
  harness-maker profile "$((Get-Location).Path)" --json
  Read stack / scale / lifecycle. Tell me: "Your project looks like {stack}, {scale} scale,
  {lifecycle} lifecycle." Recommend a preset (Side or Production) based on the profile.

Step 4 — Confirm targets (ask me ONCE):
  Based on Step 1's IDE detection, suggest one of:
    • Claude Code only → targets=claude-code
    • Cursor only       → targets=cursor
    • Codex only        → targets=codex
    • Multiple          → comma-separated (e.g. claude-code,cursor)
  Wait for my confirmation, then continue.

Step 5 — Generate the harness:
  # POSIX / macOS / WSL:
  harness-maker make "$(pwd)" --preset <PRESET> --locale en \
    --targets <TARGETS> --autoloop
  # PowerShell (Windows native):
  harness-maker make "$((Get-Location).Path)" --preset <PRESET> --locale en `
    --targets <TARGETS> --autoloop

Step 6 — IDE reload:
  • Claude Code: run `/harness-maker:make` to verify the plugin loads.
  • Cursor: reload the window (Ctrl+Shift+P → "Reload Window") so agents, skills, and
    commands under .claude/ are discovered.
  • Codex: AGENTS.md and .codex/ are read on next session start.
  Tell me the harness is ready and suggest: /hm:health

Manual install (step-by-step, no LLM)

# 1. Install harness-maker CLI from PyPI
uv tool install harness-maker

# 2. Profile and generate the harness
cd your-project
harness-maker profile . --json         # inspect your stack/scale/lifecycle
harness-maker make . --preset Production --locale en --targets claude-code,cursor

# 3. (Optional) Load as Claude Code plugin for /harness-maker:make command
claude --plugin-dir /path/to/harness-maker

Native Windows / PowerShell equivalents:

# Install uv if you don't have it
irm https://astral.sh/uv/install.ps1 | iex

# Install harness-maker and run
uv tool install harness-maker
harness-maker profile . --json
harness-maker make . --preset Production --locale en --targets claude-code,cursor

The interview asks for locale first, then preset (Side or Production), dev mode, and target runtime. The setup preview explains generated roots, backups, preserved user blocks, target-specific leftovers, review trade-offs, and how to continue advanced Second Brain setup with /hm:configure. A fully-rendered harness is ready in one turn.

Re-run with flags to evolve the harness:

harness-maker make . --audit           # score the existing .claude/ against the rubric
harness-maker make . --add NAME        # graft on a single skill/agent/command
harness-maker make . --remove NAME     # surgically remove one component
harness-maker make . --promote NAME    # move an ad-hoc artifact into the harness

Requirements

Dependency	Notes
Python 3.12+ and `uv`	Required wherever Claude Code runs against your project — even if your project's primary language is Rust, Node, or Go. Hooks invoke `uv run python -m harness_maker.gates.*`; without `uv` they are silent no-ops.
Claude Code CLI (plugin + hook support)	Optional for plugin mode (`claude --plugin-dir /path/to/harness-maker`). Not required — the `harness-maker` CLI can generate harnesses independently.
Cursor IDE 2.4+ (3.2+ recommended)	Optional. Reads `.claude/agents/`, `.claude/skills/`, and `.claude/commands/hm/*.md` natively (verified empirically in 0.6.2, re-confirmed 0.7.1 — see `tests/cursor-compat/results-2026-05-08.md`). Hooks render to a separate `.cursor/hooks.json` with Cursor-native schema; both files are emitted when `targets` includes `cursor`. Cursor 3.0+ adds native `/worktree` and `/best-of-n` which coexist safely with harness-maker's prefix-matched cleanup.
OpenAI Codex CLI	Optional. When `targets` includes `codex`, harness-maker renders Codex-native `.codex/` config, `AGENTS.md`, and `.agents/skills/` assets from the same workflow definitions.
Git	Required for worktree isolation (every `/hm:execute` and `/hm:loop` run).

Features

Single command, no subcommand sprawl. /harness-maker:make is the only entry point. Everything else is a flag (--audit, --add, --remove, --promote). No muscle-memory tax.
Two presets, ten override dimensions. Side (1 reviewer, lean, fast) and Production (5 reviewers, verify-required, secure) cover ~90% of teams. The remaining 10% comes from override dimensions surfaced in the interview: workflow naming, models, autoloop, anti-rot, worktree, security, context-lint, memory, caching.
Three targets from one harness. Claude Code gets the native .claude/ runtime. Cursor reuses .claude/agents/, .claude/skills/, and .claude/commands/hm/ while receiving Cursor-specific hooks and rules. Codex receives AGENTS.md, .codex/config.toml, .codex/hooks.json, agent TOML files, and .agents/skills/ so the same stage and workflow model runs in Codex without hand-porting prompts.
Unified health audit. /hm:health runs a 3-layer composite: structural (ai-readiness deterministic + LLM rubric, 70%/25%/5%), external risks (4-source anti-rot crawl + LLM relevance filter), and personalization (ADR-011 rubric — Bronze/Silver/Gold/Platinum). Outputs a 3-section .claude/observability/dashboard.md. Every item routes through accept / reject / defer — no auto-apply, ever (ADR-001). All telemetry stays local. Replaces 0.12.x's separate /hm:ai-readiness, /hm:refresh, and /hm:personalization-audit commands.
Anti-rot pipeline. Weekly crawl across 4 sources: Anthropic blog/changelog, GitHub releases (anthropics/claude-code), arxiv (cs.SE / cs.CL / cs.CR), OSV.dev CVEs. Each item is LLM-scored for relevance with an adaptive threshold (starts at 0.7, adapts ±0.05 based on your accept/reject history). Always manual-confirmed — there is no --auto-apply path.
Worktree isolation per run. Every /hm:execute runs in a fresh git worktree under .worktrees/. /hm:loop allocates one worktree for the whole loop, shared across iterations to reduce branch churn. sibling_repos can opt related repos into the same isolation session for split frontend/backend or app/library work. Successful runs clean up; failed runs preserve evidence. Prefix-match cleanup never touches Cursor-managed worktrees in the same directory.
7 security gates. secrets (regex + entropy, gitleaks-style), permissions (settings.json over-grant detection), hook injection (hooks.json AST scan for rm -rf, curl | sh, eval), dependency CVEs (OSV.dev), hallucination (AST scan for non-existent imports — pure-filesystem check, no execution of LLM-generated code), prod-name guard (cross-tool sequence detection for production-targeting patterns), prompt injection (hidden-instruction pattern detection + privilege separation, regex + LLM second pass). Findings go to .claude/observability/security/findings-*.jsonl — never transmitted.
Privilege separation. Reviewer agents get permissions.deny: [Write(*), Edit(*), Bash(rm:*), Bash(curl:*), Bash(npm:*), Bash(eval *), Bash(python:*), Bash(node:*), Bash(sh:*), Bash(bash:*)] — interpreter denies block subprocess-bypass attempts (0.6.2 hardening). Executor agents get permissions.allow: [Write(.worktrees/**), Edit(.worktrees/**), Bash(uv run:*), Bash(pytest:*), …] plus paired Edit/Write denies on system paths (/etc, ~/.ssh, ~/.aws). Combined with worktree isolation and the 0.7.1 telemetry tool-input whitelist + secret redaction (ADR-107), this gives defense-in-depth: even a prompt-injected reviewer cannot write to disk or shell out via interpreters; even a compromised executor cannot write outside the active worktree or touch system credentials; even a poisoned tool_input payload cannot leak credentials into the metrics log.
Brownfield-safe. Reconciler indexes existing .claude/, computes hash-based ours/theirs decisions via provenance frontmatter, and offers per-conflict keep/replace/both. Apply is ADD-only with timestamped backups. User edits are never silently overwritten.
Mechanical pre-checks in review. Add shell commands to harness.yaml.reviewers.mechanical_checks (e.g., ruff check ., uv run pytest tests/unit -x -q). These run before any LLM reviewer — stop-on-first. If a command fails, /hm:review emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts immediately: no grade, no auto-fix loop, no wasted reviewer tokens on broken code. The list is preserved across re-renders.
Deep interview before every implementation. /hm:spec runs a 6-category interview (Intent → Outcomes → In-Scope Scenarios → Non-Goals → Constraints → Verification) scored for completeness; incomplete categories trigger follow-up questions, not silent gaps. /hm:plan runs a 9-category interview (scope → architecture → contract → risk → testing → phasing → dependencies → failure handling → observability) in impact order. Each decision that changes a component boundary, introduces a new contract, or rejects a viable alternative is promoted to a formal Architecture Decision Record (ADR) and becomes a binding constraint for /hm:execute — not advisory. A plan-validator agent gates the plan before it is written to disk: NEEDS_REVISION triggers targeted follow-up rounds; MAJOR_REVISION escalates to the user. The interview ends with a "no deferred decisions" scan — any "Accept?/OK?/Verify?" phrasing is a missed interview round, not a plan checkpoint.
Autoloop with adaptive interview and 4-gate convergence. /hm:loop runs time-and-iteration-bounded loops. Before the first iteration, autoloop-driver reads the goal description, extracts already-answered dimensions (purpose / invariants / priority / stopping_criteria), and asks only what's missing — no fixed question script. It locks a loop intensity and explicit exit checklist, then requires mechanical checks, per-criterion LLM judgment, regression comparison, and a two-iteration convergence streak before accepting completion. A single worktree is shared across all iterations, and failed iterations preserve the worktree for inspection.
Refdocs search skill. Register your project's reference folders (architecture docs, API specs, design docs) in harness.yaml. /harness-maker:make builds a local docs_index.yaml, and the refdocs-search skill gives the LLM lossless full-text search across registered folders — no chunking, no RAG index.
SessionStart drift reminder. A hook fires on every session open and warns if the running harness-maker version differs from the version that rendered the harness — so you notice when a /plugin update needs a re-render. The detector compares harness.yaml.harness_maker_version against the latest plugin version cached on disk (not just the imported __version__), so /hm:health and SessionStart agree even when the slash command runs against a pinned older version (0.6.2).
Memory tier with cross-process safety. .claude/memory/ holds episodic/ (per-day JSONL), semantic.jsonl (queryable index), profile.json, wiki.md, and failures.md. Concurrent writers from parallel sessions are serialised via a re-entrant POSIX flock — same thread can re-acquire without deadlock, different threads block normally (ADR-106, 0.7.1). Telemetry hooks append atomically via raw os.write() on O_APPEND (single-syscall, ≤ PIPE_BUF) so concurrent Claude Code + Cursor hooks cannot interleave JSONL lines.
3-tier context loading + compaction recovery. Every stage opens memory in tier order — Hot (session/<today>.md), Warm (failures.md + wiki.md first 60/40 lines), Cold (git log / PLANs on demand). PreCompact hook flushes the session to session/<today>.md with a checkpoint:compaction marker before Claude compacts context; the next turn detects the marker and resumes from the last in-progress phase without losing progress.
2-pass redaction for review precision (+47 pp). Reviewers run twice: Pass 1 strips PR title, author, and commit message so findings aren't anchored to metadata; Pass 2 restores full context and each reviewer validates or drops their Pass 1 findings. Findings absent from Pass 2 are dropped (CP10 contract). Ablation showed a +47 percentage-point precision gain on anchoring-prone diffs.
Self-improving failure memory. Every stage appends failure patterns (wrong API usage, broken convention, unexpected build failure) to failures.md with a count. When any failure slug reaches count ≥ 3, a proposal is automatically appended to pending-proposals.md — suggesting a new skill, rule, or hook that would have prevented the recurrence. The user reviews and decides whether to ingest.
ADR system as binding execute constraints. Architecture Decision Records promoted during /hm:plan are hard constraints — /hm:execute must not violate them. If a PLAN phase conflicts with an ADR, execute surfaces it as a blocker rather than silently proceeding. ADRs capture rejected alternatives and the reasoning, so future sessions don't re-litigate settled decisions.
Cache miss classification (4 reasons). The prompt-cache diagnostic layer reports why a cache miss occurred: min_threshold (content too short), invalidation (context changed), ttl (5-min TTL expired), or first (cold start). The 5% weight in the AI-readiness score distinguishes cold-start misses (benign) from structural misses (actionable), so you fix the right thing.
Grade-based auto-fix loop. /hm:review computes a grade (A–F) from consensus-passed P0/P1 findings and loops: apply fixes → re-review (selective — only re-spawn reviewers whose scope was touched) → regrade, until grade meets grade_threshold (default A) or max_review_rounds is exhausted. Failed fixes that break the build are automatically reverted and logged. Weak-consensus and manual-only findings are never auto-applied.
Pre-LLM mechanical checks gate. Add shell commands to harness.yaml once and they run at the start of every /hm:review — before any reviewer agent spawns. Stop-on-first: the first non-zero exit emits ## MECHANICAL_BLOCK: <cmd> exit=<N>, halts the review, and exits CHANGES_REQUESTED. Lint clean and tests green are enforced mechanically, not by reviewer prompt. --no-auto-fix does not skip mechanical checks.
Detection Depth — 12+ stack/framework granularity (python/node/rust + java/kotlin/swift/dart/ruby/php/csharp/elixir/scala/c-cpp/zig/haskell), framework dep parsing, package_manager + ci_provider detection, manifest-mtime cache with 24h ceiling.
Foreign AI Config Migration — detect 6 known foreign configs (.cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, .github/copilot-instructions.md); LLM-driven import; @hm:harness:* inverted markers preserve user content across re-renders.
Adaptive Personalization — /hm:health Step 3 composite-score rubric (Bronze/Silver/Gold/Platinum) with evidence-bearing ActionItems; 100% local telemetry; SessionStart drift hint after 30 axis overrides or 14 days. (Absorbs the 0.12.x /hm:personalization-audit command; ADR-011 rubric unchanged.)
Confidence-Bucketed Recommendation UI — every detection declares its own confidence (HIGH/MEDIUM/LOW); high → silent yaml comment, medium → explicit prompt, low → no surface. Backward-compat regression test guards 0.11.x users from surprise silent-default changes.

For the complete mechanics behind each feature — all procedures, decision paths, and internal invariants — see docs/HOW-IT-WORKS.md.

How it works

flowchart TD
    A["/harness-maker:make"] --> B["Profile\n(stack, scale, lifecycle)"]
    B --> C["Interview\n(preset + 10 dims + targets)"]
    C --> D["Synthesize\n(deterministic Blueprint)"]
    D --> E["Render\n(Jinja2 + provenance frontmatter)"]
    E --> F{Brownfield?}
    F -- No --> G["Write .claude/ directly"]
    F -- Yes --> H["Reconcile\n(hash-based keep/replace/both)"]
    H --> G
    G --> I["Extra targets?\nRender .cursor/ and/or .codex/ assets"]
    I --> J["User runs /hm:* commands"]
    J --> K["Weekly /hm:health (Step 2)\n4-source anti-rot crawl\n→ manual confirm"]

14 mechanisms (M1-M14) back every feature. See docs/ARCHITECTURE.md for the full breakdown including the privilege-separation model, security gate triggers, and reconcile invariants.

Since 0.12.0 the synthesis pipeline also threads through a typed Recommendation registry: every recommend_<axis>(profile, project_dir) function declares per-detection confidence (ADR-007), and the interview.py dispatcher routes by bucket. Detection results land in ~/.cache/harness-maker/profile-<repo-hash>.json with manifest-mtime + 24h-ceiling invalidation. Foreign AI config files detected at /hm:configure time can be imported into harness.yaml and re-rendered single-source with @hm:harness:* inverted block markers (ADR-009).

Slash commands the harness exposes

After install, the rendered harness exposes commands under /hm:*:

Atomic stages (always available)

Command	Purpose
`/hm:research`	Gather facts, user-workflow signals, best practices, and options
`/hm:spec`	Write acceptance criteria from research
`/hm:plan`	Decompose spec into phases with exit criteria
`/hm:execute`	Implement with TDD + worktree isolation
`/hm:review`	Multi-reviewer consensus (mechanical pre-check → conditional routing)
`/hm:wrapup`	Clean, document, commit
`/hm:verify`	6-check gate before completion

Fused workflows (preset-generated, user-renameable)

Fused workflows combine atomic stages into a single command. The interview generates a starter set; you can add, remove, or rename them in harness.yaml.

Preset	Default fused workflows
Side	`/hm:plan-exec-rev` · `/hm:exec-rev` · `/hm:exec-rev-wrap` (default)
Production	`/hm:exec-rev-wrap-ver` (default) · `/hm:exec-rev-wrap` · `/hm:plan-exec-rev` · `/hm:exec-rev` · `/hm:res-spec-plan`

Utility commands

Command	Purpose
`/hm:loop "<goal>"`	Autoloop driver — `feature` or `improve` mode, time/iter-bounded
`/hm:ai-readiness`	3-layer readiness score + P0/P1/P2 ranked actions
`/hm:personalization-audit`	Composite-score rubric (Bronze/Silver/Gold/Platinum) from telemetry + harness.yaml + ProjectProfile. Reads `.claude/observability/adaptive/overrides.jsonl`; outputs ranked ActionItem list with evidence schema. ADR-011 (v0); calibration deferred to 30+ project sample.
`/hm:refresh`	Anti-rot crawl — manual confirm required

How it compares

Other Claude Code harnesses pick a niche; harness-maker is the meta-tool that builds them — and then keeps them current.

Project	Scope	What harness-maker adds
ohmyclaudecode	Curated commands/agents bundle	Project-tailored synthesis (preset + 10 override dims), brownfield reconcile, provenance frontmatter, anti-rot pipeline
superpowers	Powerful sub-agents and workflows	Single-command entry, AI-readiness scoring, worktree isolation by default, privilege-separated reviewer/executor
Archon	Knowledge-base + RAG-backed planning	Stack/scale/lifecycle profiler, atomic+fused workflow engine, conditional reviewer routing, 7 security gates
aider	Terminal pair programmer, LLM-agnostic	Claude Code / Cursor / Codex native assets; harness output is the runtime (not the session); anti-rot keeps it current
ouroboros	Autonomous self-bootstrapping AI software factory	Project-shaped interview (not one pipeline for all), grade-gated reviews with mechanical pre-checks, anti-rot pipeline, brownfield reconcile, multi-target support
Hand-rolled `.claude/`	Full control, zero automation	Drift detection via provenance hash, weekly anti-rot crawl, AI-readiness scoring, worktree isolation — without writing it yourself

The core difference: harness-maker generates and owns the lifecycle of your .claude/ directory. A static tool gives you a starting point. harness-maker gives you a starting point that knows who it was created for and can be updated without losing your changes.

Configuration

The interview writes answers to .claude/harness.yaml. Key dimensions:

preset: Production           # Side | Production
locale: en                   # en | ko | <any — unknown falls back to en>
dev_mode: spec-driven        # spec-driven | task-driven
targets:                     # which runtimes to drive
  - claude-code
  - cursor
  - codex
recommended_model: claude-opus-4-7

ref_folders:
  - path: ../architecture-docs
    glob: "**/*.{md,txt,pdf}"

second_brain:
  enabled: true
  backend: filesystem
  project_id: my-app
  vault_path: ../obsidian-vault
  trusted_allowlist: true       # configured write folders need no confirmation/backup
  required_frontmatter: [type, created, updated, tags, links]
  folders:
    - path: Projects/my-app
      read: true
      write: true
      note_types: [decision, preference, failure, project, reference, journal]

sibling_repos:
  - ../backend

reviewers:
  enabled: [code, security, performance, ux, concurrency]
  routing: conditional       # conditional | always-all
  mechanical_checks:         # pre-LLM stop-on-first gate (optional)
    - ruff check .
    - uv run pytest tests/unit -x -q

worktree:
  scope: [execute, plan]     # which stages run in a fresh worktree
  cleanup: on_success        # on_success | always | never

anti_rot:
  enabled: true
  threshold: 0.7             # adaptive — adjusts ±0.05 based on accept/reject ratio

context_lint:
  strict: true               # warn on overrun | block

memory:
  files: [failures.md, wiki.md]

adaptive:
  disable_telemetry: false   # opt-out per ADR-005; 100% local capture
  audit_session_threshold: 30  # SessionStart hint after N axis overrides
  audit_days_threshold: 14     # SessionStart hint after N days without audit

All adaptive features are 100% local. tests/unit/test_no_network.py asserts no socket call during telemetry emit, audit, or SessionStart hook execution (ADR-005 positive obligation).

Run /harness-maker:make again and choose Update (same settings, pick up template improvements) or Full reconfigure to change any dimension.

Obsidian Second Brain

second_brain connects a Markdown/Obsidian vault as a typed knowledge graph for stage-aware memory. hm-research, hm-plan, hm-review, and hm-wrapup use different note types (reference, project, decision, preference, failure, journal) instead of loading the whole vault. Writes are full Markdown writes inside configured write: true folders only. Configure narrow folders: the allowlist is trusted completely, with no per-write confirmation or backup. For multiple projects sharing one vault, give every project a distinct project_id and put writable folders under a path segment with the same value (for example Projects/my-app). harness-maker rejects writable Second Brain folders that do not include the configured project_id, which keeps one project from writing into another project's note namespace.

Setup walkthrough

At interview time — when prompted for vault_path, supply the absolute path to your Obsidian vault root (or a not-yet-created subfolder of it; the harness creates the subfolder on first write iff the parent has .obsidian/). The interview then asks for project_id and a writable folder, defaulting to 99_HM/{project_id}/ (matches the 99_*/01_* numeric-prefix organization style).
Post-install adjustment — run /hm:configure to revisit Second Brain settings. The slash command dispatches to harness-maker configure-second-brain --check, which inspects state and returns guidance JSON so the slash command can prompt only when something is missing. To add a folder non-interactively, call harness-maker configure-second-brain --add-folder 99_HM/my-project/.
Existing harnesses upgraded from a pre-fix release — the loader now tolerates folders: [] (logs a one-shot warning + remediation hint pointing at /hm:configure); writes raise SecondBrainError with the same hint instead of an obscure "not under a configured write folder" message.

Internals: the rendered harness.yaml carries a provenance frontmatter block; all readers route through harness_maker.io_utils.load_harness_yaml (the canonical multi-document-tolerant loader) to avoid the parser-strategy drift that produced the original bug.

Targets

targets is a multi-select. Choose claude-code, cursor, codex, or any combination. Claude Code is the default; Cursor and Codex add runtime-specific assets while preserving the same preset, workflows, skills, reviewers, and safety model.

Cursor target

Run /harness-maker:make and pick targets: [cursor] or [claude-code, cursor] at the interview. The renderer adds:

.cursor/rules/harness.mdc — always-on workflow rules with Cursor-legal frontmatter (description / globs: [] / alwaysApply: true)
.cursor/hooks.json — Cursor-native hooks schema (lowercase camelCase keys, version: 1, flat {matcher, command}). Deliberately different from .claude/hooks/hooks.json (PascalCase, nested {hooks:[…], matcher}); each IDE reads only its own file. Don't try to collapse them — Cursor will silently stop firing hooks. See tests/cursor-compat/results-2026-05-08.md for the kairos 0.5.7 forensic that proved this empirically.
.cursor/mcp.json — Cursor MCP server config. Populated from harness.yaml.mcp_servers (0.6.2+); defaults to {"mcpServers": {}} when no servers are configured. Inner shape (command, args, env) is type-validated on parse with a warning log when entries are dropped.

.claude/agents/, .claude/skills/, and .claude/commands/hm/ are single-source — Cursor 2.4+ reads them natively (forensic-verified). Hooks are the only asset that requires per-IDE rendering because the schemas diverge by design.

Recommended model

harness.yaml.recommended_model defaults to claude-opus-4-7 and propagates to agent frontmatter. Cursor users may override model selection in their IDE. The harness does not rewrite prompts to be model-agnostic — <thinking> blocks and Claude-specific patterns are preserved deliberately.

Verification

Per-release Cursor compatibility is tracked in tests/cursor-compat/:

MANUAL_CHECKLIST.md — A1–A4 (agent dispatch, hook fire, skill auto-discovery, slash command + Q&A loop) covering both IDEs
RESULTS.md — PASS/FAIL/PARTIAL grid you fill while running the checklist
results-2026-05-08.md — kairos 0.5.7 forensic that resolved Q-A (hooks discovery) and Q-B (commands discovery) without an IDE-driven manual run; future Cursor verifications append a new dated results-*.md
fixture/ — minimal .claude/ for opening directly in either IDE

Automated CI guards the dual-schema invariants regardless of manual fixture runs:

test_cursor_hooks_uses_lowercase_native_schema — fails if .cursor/hooks.json accidentally adopts Claude PascalCase
test_no_cursor_commands_rendered — fails if a future change starts emitting .cursor/commands/hm-*.md mirrors (Cursor reads .claude/commands/ natively)
test_render_agents_have_structured_permissions_frontmatter — fails if any agent template loses its permissions.allow/deny block (Cursor 2.5+ subagent permission inheritance gap)

Codex target

Run /harness-maker:make and pick targets: [codex] or include codex with the other targets. The renderer adds:

AGENTS.md — Codex's top-level project instruction file, rendered without YAML frontmatter so Codex displays clean instructions.
.codex/config.toml and .codex/agents/*.toml — Codex-native configuration and agent registrations.
.codex/hooks.json — Codex hook wiring, including Codex-specific permission events and file-edit tool matchers.
.agents/skills/*/SKILL.md — the existing harness skills plus stage, workflow, and loop trigger skills in the layout Codex discovers.

Codex TOML files are rendered as pure TOML, not markdown-with-frontmatter. AGENTS.md uses block-merge markers so user additions survive re-renders, while .codex/*.toml files are regenerated from the selected target configuration.

Structured questions in Codex: Stage and loop skills instruct the agent to use the request_user_input tool for interview questions. This tool is available in Plan mode by default. To enable it in Code mode, add default_mode_request_user_input = true under [features] in .codex/config.toml. If the tool is unavailable at runtime, the agent falls back to asking in its response text.

Reconcile rules (re-rendering an existing harness)

Re-running /harness-maker:make on an existing harness uses a hash-driven KEEP rule: if a file's content_hash frontmatter matches the new template's hash, it's "ours" — safe to overwrite. If it differs (user-edited), it's "theirs" — kept.

Trade-off: when harness-maker bumps a template on a minor release, the existing file's hash no longer matches, so reconcile picks KEEP even if you didn't edit the file.

To pick up template updates after a version bump:

rm .claude/harness.yaml
/harness-maker:make
# (previous .claude/ is auto-backed up to .backup-<ISO>/)

.cursor/rules/*.mdc follow the same KEEP behavior. A future phase introduces a sidecar .hm-meta.yaml so harness-maker can hash-track Cursor assets without polluting Cursor frontmatter.

@hm:harness:* inverted markers (0.12.0) — for foreign-AI-config files we generate post-import (.cursor/rules/*.mdc, AGENTS.md, CLAUDE.md, etc.), content INSIDE @hm:harness:<id> markers is harness-owned (replaced on every render); content OUTSIDE is user-owned (byte-for-byte preserved). The block-merge parser dispatches per file extension: HTML comments for .md/.mdc, # @hm:harness: for .yml/.yaml, top-level _hm_harness JSON key for .json. 0.11.x files (frontmatter generated_by: harness-maker + zero @hm:harness:* markers) are treated as wholly harness-owned on first encounter post-upgrade and re-rendered into the new marker family (ADR-009 amendment).

Observability

All observability is 100% local — nothing is transmitted externally.

File	Contents
`.claude/observability/dashboard.md`	AI-readiness score, dimension breakdown, ranked action items
`.claude/observability/metrics-YYYY-MM-DD.jsonl`	Per-turn telemetry (cache hit %, tool calls, durations) — date-rotated daily (ADR-103, 0.7.1). Pre-0.7.1 `metrics.jsonl` is read as the trailing legacy shard.
`.claude/observability/refresh/raw-*.jsonl`	Anti-rot crawl evidence (accepted / rejected items)
`.claude/observability/security/findings-*.jsonl`	7-gate security scan findings
`.claude/observability/adaptive/overrides.jsonl`	`harness_yaml_override` events with `schema_version: 1`, dual capture sites (`/hm:configure` exit primary + SessionStart secondary), dedup-keyed
`.claude/observability/adaptive/last-audit.txt`	Last `/hm:personalization-audit` run timestamp

Run /hm:ai-readiness to regenerate the dashboard on demand.

Marketplace

Marketplace manifests are present for each runtime:

.claude-plugin/plugin.json — Claude Code plugin spec
.cursor-plugin/plugin.json — Cursor Marketplace spec
.codex-plugin/plugin.json — Codex plugin spec

Listings are pending. Until then, install locally:

# Any IDE — CLI install (recommended)
uv tool install harness-maker          # from PyPI (when published)
uv tool install /path/to/harness-maker # pre-PyPI, from clone
harness-maker make . --targets claude-code,cursor,codex

# Claude Code — plugin mode (optional, enables /harness-maker:make command)
claude --plugin-dir /path/to/harness-maker

# Cursor — open the repo folder directly in Cursor as a workspace plugin

# Codex — render Codex-native assets in your project harness
harness-maker make . --targets codex

FAQ

Q: Why Python? My project is Rust / Node / Go. The hooks (permission_gate, worktree_gate, telemetry) call uv run python -m harness_maker.* at PreToolUse / PostToolUse boundaries. This doesn't touch your project's toolchain — uv and harness_maker need to be on the path, but only to run hooks. Your project's build system is untouched.

Q: Why does it require uv? uv gives a hermetic, fast Python environment without polluting the system or your project's virtualenv. Hooks run in milliseconds without activating anything.

Q: Will harness-maker overwrite my hand-edits when I re-render? No. Every generated file carries a content_hash in its provenance frontmatter. Re-render compares the new template's hash against the file on disk. If they differ — meaning you edited the file — it keeps yours. See Reconcile rules.

Q: What's the difference between Side and Production? Side is lean: 1 reviewer (code), verify-before-completion optional, worktree scope [execute]. Production is thorough: 5 reviewers, verify required, worktree scope [execute, plan], security on high-finding = block. Both share the same anti-rot and caching defaults.

Q: Does anti-rot ever auto-apply? Never. Every anti-rot item surfaces via a structured question (AskQuestion in Cursor, AskUserQuestion in Claude Code) in /hm:refresh. There is no --auto-apply flag and no plan to add one. The rationale: a wrong patch is worse than a stale harness.

Q: Can I use only Claude Code? Only Cursor? Only Codex? Yes. targets is a multi-select at the interview. Claude Code uses .claude/; Cursor reuses most .claude/ assets and adds .cursor/; Codex adds AGENTS.md, .codex/, and .agents/skills/.

Q: Do my prompts or telemetry leave my machine? No. metrics.jsonl, dashboard, and security findings are written to .claude/observability/ locally. Anti-rot crawls read public sources (Anthropic blog, arxiv, GitHub, OSV.dev) but never uploads anything.

Q: How do I pick up template improvements after a /plugin update? Run /harness-maker:make → choose Update. For files where your hash matches the old template, the new version is applied. For files you edited (hash mismatch), yours is kept. To force a full refresh, rm .claude/harness.yaml and re-run.

Q: What are mechanical_checks and when should I use them? Shell commands listed under reviewers.mechanical_checks in harness.yaml run at the start of every /hm:review — before any LLM reviewer spawns. The first command that exits non-zero emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts review immediately (CHANGES_REQUESTED). Use them for fast, deterministic gates (lint, type-check, unit test) that shouldn't waste LLM tokens when the basics are broken. The list is user-managed; harness-maker never populates it automatically. --no-auto-fix does not skip mechanical checks — they are a hard gate, not part of the fix loop.

Q: Why doesn't harness-maker rewrite prompts to be model-agnostic? The prompts are tuned for claude-opus-4-7 — <thinking> blocks, role framing, chain-of-thought structure. Rewriting for model-neutrality would degrade quality on the recommended model for hypothetical gains on others. Override recommended_model in harness.yaml if you want a different model; the prompts remain as-is.

Personalization Architecture

harness-maker 0.12.0 introduces three tracks of personalization depth:

Detection Depth (Track A) — harness_maker.profile.profile() detects stack (12+ languages), framework, package manager, and CI provider. Results land in ProjectProfile and feed the recommendation framework. Cache at ~/.cache/harness-maker/profile-<repo-hash>.json (manifest-mtime invalidation + 24h TTL).
Foreign AI Config Migration (Track D) — /hm:configure detects .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, and .github/copilot-instructions.md. With user confirmation, harness-maker imports the foreign config's intent into harness.yaml and re-generates the foreign file as part of the harness on every render. The @hm:harness:* block markers protect user customizations outside the marked regions.

Cursor power-user constraint (ADR-003) — single-source means harness re-generates .cursor/rules/ on every render. If you prefer Cursor-only ownership of those rules, the only opt-out today is to drop the cursor target from harness.yaml.targets. A dedicated opt-out flag is TODO for a future PLAN.
Adaptive (Track B start) — harness.yaml.adaptive.disable_telemetry: false (opt-out per ADR-005) enables override telemetry. /hm:personalization-audit scores your harness composite (0–100; Bronze < 40 < Silver < 65 < Gold < 85 ≤ Platinum) and surfaces ranked action items with evidence. SessionStart drift hint fires after 30 overrides or 14 days without an audit.

All adaptive features run 100% locally — no network calls (asserted by tests/unit/test_no_network.py).

0.12.1 patch extended CACHED_MANIFESTS with literal filenames from STACK_GLOB_MANIFESTS (stack.yaml, package.yaml) so Haskell projects now properly invalidate the profile cache on those manifests' mtime bump.

Roadmap

Done (0.12.0–0.12.1):

Track A (Detection Depth): 12+ stacks/frameworks/pkg-mgr/CI
Track D (Foreign AI Config Migration): 6 config types, single-source re-render, @hm:harness:* markers, 0.11.x migration
Track B-start (Adaptive): override telemetry, /hm:personalization-audit, SessionStart drift surface

Next (0.13.0 — Track B completion + Cursor opt-out):

Track B-extra: B2 permission-frequency capture + B3 reviewer-signal aggregation
harness.yaml.cursor.opt_out_render flag for Cursor power-users (ADR-003 documented constraint)
github/spec-kit external e2e fixture vendoring (currently pytest.skip with TODO)

Future (0.14.0+ — Track C strategic expansion, ranked by user value):

C2 privacy/regulated tier (HIPAA/PCI/GDPR — enterprise entry barrier)
C1 team-profile axis (solo vs small-team vs large-team)
C3 code-style detection (docstring conventions, naming patterns)
C4 cross-project user-defaults
C5 per-developer overlay in shared harness

Deferred-by-data:

Rubric v0 calibration after 30+ projects accumulate /hm:personalization-audit runs (passive trigger)

Standing items:

PyPI publish — remove the editable-from-clone requirement.
Claude Code + Cursor + Codex Marketplace listings — submit all plugin manifests.
.hm-meta.yaml sidecar for Cursor assets — enable hash-tracking of .cursor/rules/*.mdc without polluting Cursor frontmatter.
User-configurable anti-rot repo list — harness.yaml.anti_rot.github_repos to track additional Claude Code ecosystem repos beyond the default.
Demo screencast — record a first-install + /hm:loop session.
Enterprise preset — stricter security gates, mandatory spec-driven dev mode.

Development

uv sync
uv run pytest                       # full suite
uv run ruff check src/ tests/       # lint
uv run ruff format src/ tests/      # format
uv run mypy --strict src/           # type check
bash .claude-verify.sh all          # phase-by-phase exit criteria + final acceptance

Contributing

See docs/CONTRIBUTING.md for adding skills/agents/presets, test patterns, and the PR checklist (including the 4-file version bump invariant).

See docs/ARCHITECTURE.md for the 14 mechanisms (M1-M14) behind the system.

License

MIT — see LICENSE.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

0xecro1

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.17.0

May 18, 2026

0.15.3

May 18, 2026

0.15.0

May 18, 2026

0.14.3

May 17, 2026

0.14.2

May 17, 2026

0.14.1

May 17, 2026

This version

0.14.0

May 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

harness_maker-0.14.0.tar.gz (363.3 kB view details)

Uploaded May 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

harness_maker-0.14.0-py3-none-any.whl (451.4 kB view details)

Uploaded May 17, 2026 Python 3

File details

Details for the file harness_maker-0.14.0.tar.gz.

File metadata

Download URL: harness_maker-0.14.0.tar.gz
Upload date: May 17, 2026
Size: 363.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for harness_maker-0.14.0.tar.gz
Algorithm	Hash digest
SHA256	`2c9f96e94c1c35c1fb8180443aefbd1b9fbba91df3650dcab3abcf3bd6b3cfb4`
MD5	`c7a85086c167e7dba3e9c762071adcd7`
BLAKE2b-256	`6be1f2ad1e68a68baa6e6c75e381b2f12c96c6f01817d19bab4508ff3acda423`

See more details on using hashes here.

File details

Details for the file harness_maker-0.14.0-py3-none-any.whl.

File metadata

Download URL: harness_maker-0.14.0-py3-none-any.whl
Upload date: May 17, 2026
Size: 451.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for harness_maker-0.14.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a890723293631d51f24472a595d370f0eb69c6a6d0ae5d009380d7d1782e0bb2`
MD5	`d07adaea72071a36f526b496926b8ba4`
BLAKE2b-256	`585ee851a44f604ab6f79fb9070c9f7556313143e3639548964bdb2555711a16`

See more details on using hashes here.

harness-maker 0.14.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

harness-maker

Why harness-maker?

Table of Contents

Quickstart

Universal Bootstrap Prompt

Requirements

Features

How it works

Slash commands the harness exposes

Atomic stages (always available)

Fused workflows (preset-generated, user-renameable)

Utility commands

How it compares

Configuration

Obsidian Second Brain

Targets

Cursor target

Recommended model

Verification

Codex target

Reconcile rules (re-rendering an existing harness)

Observability

Marketplace

FAQ

Personalization Architecture

Roadmap

Development

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes