
Universal harness layer for AI coding agents — one command sets up your repo for any agent (Claude Code, Cursor, Codex, Gemini CLI, Aider, OpenHarness).


harnessforge

One command turns any repo into a project where Claude Code, Cursor, Codex, Gemini CLI, Aider, and any other coding agent show up already knowing the codebase.

uvx harnessforge init



What you get

Run harness init in your repo. ~3 seconds later, you have:

your-repo/
├── AGENTS.md           ← every coding agent reads this (OpenAI Codex CLI convention)
├── SOUL.md             ← personality / tone for this project
├── TOOLS.md            ← which tools / MCPs to use
├── MEMORY.md           ← memory schemas
├── SKILLS/             ← anthropics/skills-compatible procedures
│   ├── chunk-and-embed/SKILL.md
│   ├── retrieve-and-rerank/SKILL.md
│   └── …
├── .claude/CLAUDE.md       ← Claude Code reads this automatically
├── .cursor/rules           ← Cursor reads this automatically
├── .continue/config.json   ← Continue reads this automatically
├── .windsurf/rules         ← Windsurf reads this automatically
├── harness.config.json     ← what blueprint this repo is bound to
└── .harness/
    ├── profile.yaml        ← the canonical machine-readable description
    ├── manifest.json       ← sha256 of every file for safe re-runs
    └── memory_schemas/     ← JSON Schemas the blueprint expects

These aren't placeholder stubs. Here are the first 25 lines of the AGENTS.md that harness init --no-llm produced for a tiny stock-analysis repo:

# AGENTS.md

> _Generated by harnessforge v0.2.1 · blueprint `finance-agent` v1.0.0._

You are a **finance / market-data analyst agent** working in **portfoliowatch**.

This project is **read-only by default.** You fetch market data, compute
signals, surface insights. You **never** place orders, move money, or
modify positions without an **explicit per-action human-approval gate**
that the user typed "yes" through in this session.

---

## The analyst loop

fetch → compute → screen → flag


- `SKILLS/fetch-market-data` — get prices/quotes/fundamentals; respect rate limits.
- `SKILLS/compute-technicals` — RSI, SMA, MACD, etc. Vectorized; tested against canonical references.
- `SKILLS/screen-positions` — filter the universe by your declared criteria.
- `SKILLS/flag-attention` — surface what changed and why — calibrated, not alarmist.

A Claude Code session opened in that repo reads it and knows the loop, the safety contract, and which skills to invoke — without you typing any of it into the chat.


Install

# No install, run once:
uvx harnessforge init

# Or install globally:
pipx install harnessforge
harness init

# Or pip into a venv:
pip install harnessforge

--no-llm makes init fully deterministic — no API key, ~2 seconds:

uvx harnessforge init --no-llm

Optional extras add an LLM-based refinement step or the MCP server. Each one pulls in its dependency only if you install it:

pip install "harnessforge[anthropic]"   # use Claude to refine the profile
pip install "harnessforge[openai]"      # use GPT
pip install "harnessforge[mcp]"         # expose harness itself as an MCP server

Why this exists

Every time a developer opens a new repo in Claude Code (or Cursor, Codex, Gemini CLI, Aider), they re-explain the project: what kind of code is this, what conventions, what's forbidden, what does done look like. Different IDE, same boilerplate. And every file is project-specific — you can't just copy yesterday's CLAUDE.md.

The fix is small: a deterministic walker that inspects the repo, picks a sensible agent blueprint based on the deps + structure, and emits the ground-truth files every major coding agent reads on startup. Run it once per repo, commit the output, and every agent you use shows up smarter.

That's what harness init is. The thing it generates is the harness; the CLI is a 60-second way to author one.


Five blueprints in 0.2.x

Pick with --blueprint, or let the recommender choose based on inspection (yfinance deps → finance-agent; langchain/qdrant → rag-agent; airflow → workflow-agent; generic Python → python-cli-app).

| Blueprint | For | Skills it ships |
|---|---|---|
| python-cli-app | Build a Python CLI / library / web API (the default for greenfield Python work) | add-cli-command, add-unit-test, manage-dependency, check-style |
| finance-agent | Market data + portfolio analysis | fetch-market-data, compute-technicals, screen-positions, flag-attention + no_trades_without_gate validator that fails if generated code calls a broker function without an approval check |
| rag-agent | Retrieval-augmented Q&A with citation enforcement | chunk-and-embed, retrieve-and-rerank, answer-with-citations, eval-recall-precision + citation cross-checker |
| support-agent | Customer support: intent → KB → ticket → escalate | classify-intent, retrieve-kb-answer, file-ticket, escalate-if-unresolved + ticket-lineage validator |
| workflow-agent | Multi-step orchestration (Zapier/n8n-style) | decompose-task, call-tool-with-retry, check-result + tool-log + idempotency validators |
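
To make the dependency-based recommendation described above concrete, here is a minimal sketch of how such a mapping could work. It is not harnessforge's real recommender: the function name, the rule order, and the substring matching (so that a package such as apache-airflow matches the airflow trigger) are all assumptions made for illustration. Only the trigger-to-blueprint pairs come from this README.

```python
# Hypothetical sketch: names, rule order, and matching strategy are assumptions,
# not harnessforge internals. Only the trigger -> blueprint pairs are from the README.
RULES = [
    (("yfinance",), "finance-agent"),
    (("langchain", "qdrant"), "rag-agent"),
    (("airflow",), "workflow-agent"),
]
DEFAULT_BLUEPRINT = "python-cli-app"  # generic Python falls through to this


def recommend_blueprint(dependencies):
    """Return the first blueprint whose trigger appears in any dependency name."""
    names = [dep.lower() for dep in dependencies]
    for triggers, blueprint in RULES:
        # Substring match, so "apache-airflow" satisfies the "airflow" trigger.
        if any(trigger in name for trigger in triggers for name in names):
            return blueprint
    return DEFAULT_BLUEPRINT


print(recommend_blueprint(["pandas", "yfinance"]))
print(recommend_blueprint(["apache-airflow"]))
print(recommend_blueprint(["click"]))
```

Passing --blueprint skips any such guess entirely, which is the safer route when a repo mixes several of these dependencies.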

Beyond the catalog you can author project-specific skills:

harness skills add fetch-portfolio-prices \
  --domain --description "Fetch live prices for tickers in positions.json from Polygon."

Domain skills land under SKILLS/domain/<name>/SKILL.md and surface in harness skills list alongside blueprint-shipped skills.


Validation that's actually a contract, not a vibe

harness verify runs blueprint-defined checks and emits a stable JSON contract — the same contract whether you call the CLI or the MCP tool. Your coding agent reads the JSON and self-corrects:

$ harness verify --json
{
  "schema_version": 1,
  "blueprint": "finance-agent",
  "checks": [
    {"name": "structure", "status": "pass", "duration_ms": 1, "messages": []},
    {"name": "tests",     "status": "pass", "duration_ms": 42, "messages": []},
    {"name": "no_trades_without_gate", "status": "pass", "duration_ms": 8, "messages": []}
  ],
  "summary": {"total": 3, "passed": 3, "failed": 0}
}

Exit codes: 0 all pass · 1 failures · 2 config error · 3 not a harness-bootstrapped repo. Drop in CI:

- run: pip install harnessforge
- run: harness sync --check    # fail if generated files drifted from manifest
- run: harness verify --json   # fail if blueprint contract broken
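
As a rough illustration of how an agent or CI script might consume the JSON contract, the sketch below parses a report and lists the checks that did not pass. The sample payload mirrors the output shown above with one check flipped to a failure. The "fail" status value, the message text, and the assumption that messages are plain strings are hypothetical, since this README only shows passing checks.

```python
import json

# Sample modelled on the `harness verify --json` output in this README.
# The "fail" status and the message string are assumptions for illustration.
SAMPLE = """
{
  "schema_version": 1,
  "blueprint": "finance-agent",
  "checks": [
    {"name": "structure", "status": "pass", "duration_ms": 1, "messages": []},
    {"name": "tests", "status": "pass", "duration_ms": 42, "messages": []},
    {"name": "no_trades_without_gate", "status": "fail", "duration_ms": 8,
     "messages": ["hypothetical message text"]}
  ],
  "summary": {"total": 3, "passed": 2, "failed": 1}
}
"""


def failed_checks(report_text):
    """Return (name, messages) for every check whose status is not "pass"."""
    report = json.loads(report_text)
    # Refuse to interpret a contract version this script was not written for.
    if report.get("schema_version") != 1:
        raise ValueError("unsupported schema_version")
    return [
        (check["name"], check["messages"])
        for check in report["checks"]
        if check["status"] != "pass"
    ]


failures = failed_checks(SAMPLE)
for name, messages in failures:
    print(f"{name}: {messages}")
```

Treating every status other than "pass" as a failure is a conservative choice: if the contract ever gains additional status values, this script would surface them instead of silently ignoring them.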

The no_trades_without_gate validator on finance-agent is the spicy one: it static-scans the repo for place_order(, buy(, sell(, etc., and fails the build if any of them sit behind only a config flag instead of a runtime user-approval gate. See docs/concepts/trust-model.md for the full trust boundary (profile.test_command and lint_command are executed as shell — same model as make test / npm test).
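
The idea behind that kind of static scan can be sketched with Python's ast module. This is not the validator harnessforge ships: the gate function name require_human_approval is hypothetical, and the rule used here (a broker call is acceptable only if the enclosing function also calls the gate) is a deliberately crude stand-in for whatever the real check does.

```python
import ast

BROKER_CALLS = {"place_order", "buy", "sell"}  # call names mentioned in the README
APPROVAL_CALL = "require_human_approval"       # hypothetical gate function name


def call_name(node):
    """Return the bare function or method name of a Call node, if it has one."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def ungated_broker_calls(source):
    """List (function, call, line) for broker calls in functions that never call the gate."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        calls = [n for n in ast.walk(func) if isinstance(n, ast.Call)]
        if APPROVAL_CALL in {call_name(c) for c in calls}:
            continue  # the function references the approval gate somewhere
        findings += [
            (func.name, call_name(c), c.lineno)
            for c in calls
            if call_name(c) in BROKER_CALLS
        ]
    return findings


EXAMPLE = '''
def risky(broker, cfg):
    if cfg.live:
        broker.place_order("AAPL", 1)

def gated(broker):
    require_human_approval("buy 1 AAPL")
    broker.buy("AAPL", 1)
'''

print(ungated_broker_calls(EXAMPLE))
```

In the example, risky is flagged because its only guard is a config flag, while gated is accepted. The sketch ignores module-level calls, counts calls inside nested functions twice, and does not check that the gate runs before the broker call, so it shows the shape of the check and nothing more.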


Commands

harness init [PATH]                       # inspect → profile → blueprint → render
harness sync [PATH] [--check]             # re-render adapters; --check = drift detect for CI
harness inspect [PATH] [--json|yaml]      # deterministic InspectionReport
harness verify [TARGET] [--json] [--tests|--lint]
harness blueprint {list, show, apply}
harness skills {list, show, add [--domain]}
harness doctor                            # diagnose env: provider keys, extras, repo health
harness mcp                               # run stdio MCP server (5 typed tools)
harness version
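
The drift detection behind harness sync --check can be pictured as a hash comparison against the manifest. The sketch below assumes a manifest that maps relative paths to sha256 hex digests. The real layout of .harness/manifest.json is not shown in this README, so treat that format and the function names as hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex sha256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def drifted_files(repo_root, manifest):
    """Return manifest paths that are missing or whose hash no longer matches."""
    drifted = []
    for rel_path, expected in sorted(manifest.items()):
        target = repo_root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            drifted.append(rel_path)
    return drifted


# Demo in a throwaway directory: record a hash, then edit the file by hand.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("# AGENTS.md\n")
    manifest = {"AGENTS.md": sha256_of(root / "AGENTS.md")}  # assumed format
    clean = drifted_files(root, manifest)
    (root / "AGENTS.md").write_text("# edited by hand\n")
    dirty = drifted_files(root, manifest)

print(clean, dirty)
```

A CI job built on this idea would exit non-zero whenever the drifted list is non-empty, which matches the role the README gives to harness sync --check.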

Use with your existing coding agent

| Agent | What it reads | Setup |
|---|---|---|
| Claude Code | .claude/CLAUDE.md | nothing (automatic) |
| Cursor | .cursor/rules | nothing |
| Codex CLI | AGENTS.md | nothing |
| Continue | .continue/config.json | nothing |
| Windsurf | .windsurf/rules | nothing |
| Gemini CLI / Aider | AGENTS.md | nothing |
| Anything that speaks MCP | harness mcp exposes 5 typed tools | add to client config |

For the MCP path:

{
  "mcpServers": {
    "harness": {
      "command": "uvx",
      "args": ["--from", "harnessforge[mcp]", "harness", "mcp"]
    }
  }
}

Exposes harness_inspect, harness_blueprint_list, harness_skills_list, harness_verify, harness_profile_read as typed tools your agent can call.


Hero demos (reproducible from this repo)

Three end-to-end demos against real public repos, pinned to specific SHAs:

| Demo | Repo | Blueprint | Run |
|---|---|---|---|
| FastAPI + RAG | fastapi/full-stack-fastapi-template | rag-agent | examples/hero/fastapi_rag/run.sh |
| Zulip + Support | zulip/zulip | support-agent | examples/hero/zulip_support/run.sh |
| Airflow + Workflow | apache/airflow | workflow-agent | examples/hero/airflow_workflow/run.sh |

Each run.sh shallow-clones the upstream repo at the pinned SHA, runs harness init --no-llm, and runs harness verify to confirm everything passes. CI runs all three on every push.


How this compares to alternatives

The "agent infrastructure" space has runtimes (Hermes, OpenClaw, OpenHarness), SDKs (OpenAI Agents SDK, Mastra), and now provisioners (harnessforge). The honest version of the comparison — including when not to use harnessforge — lives in docs/concepts/vs-hermes-openclaw-openharness.md.

The shortest version: if you write your own CLAUDE.md for every repo you start, harnessforge replaces that file with one that's actually project-specific, plus everything the other coding agents you use need to read. That's the comparison most readers are actually making.


Quality bar

  • 186 tests across unit / golden-file / interop / integration tiers, all green on Python 3.11 / 3.12 / 3.13
  • 83% line coverage on src/harness/
  • mypy strict + ruff clean
  • mkdocs --strict builds clean
  • Fresh-venv install verified: pip install harnessforge && harness version works on a clean machine (the v0.2.1 release was held until this passed)
  • Two rounds of real-agent A/B evaluation — Claude Code building the same stock-analysis agent WITH vs. WITHOUT the harness; the diff drove the v0.2 + v0.2.1 designs. See CHANGELOG.md for the per-fix-per-eval breakdown.

Status

harnessforge 0.2.1 — first public release. Following Semantic Versioning.

Feedback welcome via issues — see CONTRIBUTING.md and SECURITY.md. The 5 shipped blueprints are intentionally opinionated; PRs proposing a 6th are encouraged. Sales / browser blueprints land in 0.3 alongside auth-bearing MCP catalog entries.

License

MIT — see LICENSE.


