Skip to main content

Forge your own agent harness — a config-to-code generator with no agent-framework lock-in.

Project description

HarnessSmith

HarnessSmith

Forge your own agent harness.

A config-to-code generator that produces a standalone, framework-free agent harness you fully own — no LangChain, no LangGraph, no ADK, and no dependency on HarnessSmith after generation.

PyPI version PyPI downloads Python License: MIT GitHub stars

English | 中文


Quick start

Pick how to launch the setup wizard — either way you build a standalone, framework-free harness from scratch (you choose the capabilities: paradigms, web UI, MCP, skills, memory) that is smoke-verified before handover.

Clone the repo and run the one-click launcher — cross-platform (Windows, macOS, Linux), nothing to memorize; double-clicking the file works too. It installs uv on first use, then asks whether you want the web wizard (recommended) or the terminal wizard:

git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith
./HarnessSmith.sh     # macOS / Linux  — or double-click it
HarnessSmith.bat      # Windows        — or double-click it

Or run it on demand with uv — no clone, nothing installed permanently:

uvx --from "harnessmith[wizard]" harnessmith wizard --open   # web wizard (recommended)
uvx harnessmith new                                          # or the interactive CLI wizard

Then start it. The generated repo ships its own one-click launcher too, named after your project (e.g. My-Agent.sh on macOS / Linux, My-Agent.bat on Windows): double-click or run it and it auto-syncs and opens the web chat — set your model and key right there in the /config panel. Or drive it with uv:

cd my-agent                    # the name you chose
uv run my-agent serve --open   # recommended: configure your model and chat, all in the browser
uv run my-agent chat           # optional: chat in the terminal

See Getting started for every option, including non-interactive generation from a hand-written spec.

Why HarnessSmith?

Most "agent starters" hand you an app wired to a framework you cannot remove. HarnessSmith generates a repository that is yours — readable, editable, and free of any agent-orchestration framework.

HarnessSmith LangGraph scaffolds ADK / static templates
Agent-framework lock-in None — a plain Python loop you own LangGraph runtime ADK / mixed frameworks
Edit or delete any line Yes — self-contained repo Limited — app sits on the framework Limited
Config-to-code (only selected capabilities are generated) Yes No — fixed template No
Core loop size ~150–300 lines Abstracted by the framework Abstracted by the framework
Runtime dependency on the generator None Framework runtime Framework runtime
Tests + lockfile + Dockerfile, smoke-verified on generation Yes Varies Varies
Dual LLM protocol, switchable at runtime (OpenAI + Anthropic) Yes Varies Varies

Overview

HarnessSmith is a generator for the agent harness, in the spirit of create-next-app. A specification (HarnessSpec) is captured through a web wizard, an interactive terminal wizard, a preset, or a hand-written YAML file; HarnessSmith then renders a complete, independent Python repository — readable, editable, testable, and runnable on its own. The generated project is not a consumer of HarnessSmith: once generated, it has zero relationship with the generator.

Design positioning

  • No agent-framework lock-in. The generated code has zero dependency on any agent-orchestration framework. The loop is plain Python that you own. Ordinary general-purpose libraries (OpenAI SDK, Pydantic, Typer, FastAPI) are used as libraries, not as frameworks that own your control flow.
  • Own your code. The output is a self-contained repository with its own tests, lockfile, Dockerfile, and documentation. Every line can be read, changed, or deleted.
  • Config-to-code. Capabilities are selected at generation time; the generator renders only what was selected. A feature that is switched off leaves no trace — no module, no dependency, no dead code.
  • Thin by default. The default product is a minimal, fully runnable harness whose core loop stays in the low hundreds of lines. Heavier capabilities (MCP, web interface, skills, memory) are opt-in spec toggles.

Highlights

Full capability list — click to expand
  • Native function calling — the loop drives the model through the API's tool_calls (TAO/ReAct semantics), not through text parsing.
  • Dual LLM protocol, runtime-switchable — every product ships both an OpenAI Chat Completions client (provider-agnostic via base_url: vLLM, Together, Groq, LiteLLM, any compatible endpoint) and a native Anthropic Messages client. Each LLM profile selects its provider in runtime configuration; no regeneration required.
  • Reasoning streams as a first-class signal — thinking/reasoning deltas are surfaced live (a status line in the CLI; a collapsible reasoning panel in the web UI), and reasoning_content is preserved across tool-calling turns for models that require it.
  • Multi-paradigm runtimeagent (default tool-calling loop), plan and ask (both read-only), selectable per turn (--mode / web dropdown). Paradigms live in a thin registry; users add their own with @register_paradigm without touching the built-ins.
  • Sessions and resumption — every conversation persists locally; resume with --continue / --resume <id>, in the multi-turn chat REPL, or from the web session sidebar (automatic titling, rename, delete). In the web UI, conversations run in parallel — each session streams independently and switching the sidebar never interrupts a background run. Interrupted runs are crash-safe: state is checkpointed at message boundaries and repaired on resume.
  • Stop / continue / re-ask — a run can be cancelled mid-turn (cooperative cancellation that also terminates streaming), continued later with full context, or — in the web UI — re-asked by editing any earlier prompt and regenerating from that point.
  • Human-in-the-loop — a built-in ask_question tool lets the model ask the user structured clarifying questions, and tool-call confirmation (allow once / reject / allow for session / allow always) gates risky tools. Non-interactive contexts fail closed.
  • Persistent per-LLM cost accounting — a usage ledger accumulates token counts per LLM profile across runs; cost is derived from per-profile prices, and a per-profile cost_limit blocks the model before the next call once reached. Managed from the web Budget page or the usage CLI.
  • Context management — combinable triggers (window_pct, max_tokens, max_turns; driven by real token usage) select when to compact; strategies (truncate, summarize, none) define how; both are user-extensible registries. Oversized tool results are clipped before entering history, overflow recovery compacts on demand, a max_steps valve bounds runaway tool loops, and compaction folds within a single long turn (sub-turn keep_last_steps) so even one agentic turn stays inside the window.
  • Composable hooks and a thin tool-policy layer — mount one or more Hooks subclasses through config.hooks (subclass-and-mount, no @register_hook); five lifecycle points (before_step / after_step / before_tool / after_tool / on_error). before_tool may refuse a call and after_tool may redact a result — a code-level policy gate with no middleware machinery — and multiple hooks compose in order. The web UI has a dedicated Hooks tab with a privacy-safe execution log; info (CLI) and GET /registries (web) surface every extension point — tools, paradigms, context strategies/conditions, memory backends, imported extensions, and mounted hooks.
  • Per-session working directory — an optional working-directory hint (CLI --cwd, the chat REPL's /cwd, or the web chat toolbar with a directory browser) is injected into the system prompt as guidance, not a sandbox. The current date/time is injected each turn too.
  • Tool ecosystem without built-in bloat — a decorator-based tool registry with per-tool risk levels, plus an opt-in MCP client (stdio, HTTP, and SSE transports) with a curated catalog (keyless multi-engine web search, fetch, git, time, Desktop Commander, GitHub). The default web-search server is keyless and multi-engine (Bing/Baidu/DuckDuckGo/Brave/Sogou/…) with automatic failover, so it keeps working when any single engine is slow or unreachable on a given network. Node-based MCP servers are npm installed once into a stable per-server dir and launched directly with node (never the ephemeral npx cache, which is unreliable on Windows). MCP servers are managed at runtime: health status, add/edit/remove, and hot reconnection from the web panel; mcp status from the CLI.
  • Agent Skills — opt-in support for the open SKILL.md standard with progressive disclosure; skills are plain files, no framework involved.
  • Cross-session memory — an opt-in, self-maintained long-term note injected each turn, written through tools, consolidated by a dedicated LLM role at session boundaries, and replaceable via a thin @register_memory backend registry.
  • Always-applied project rules — markdown rule files (AGENTS.md / CLAUDE.md / .cursor/rules conventions) injected into every system prompt.
  • Full observability — a JSONL trace per run with token/cost accounting, and an opt-in, local-only debug log that records lifecycle events (names, counts, durations) and never message content, tool arguments, or secrets.
  • Verified runnable before handover — the generator locks dependencies and smoke-tests every new repository (uv sync, import check, a mock function-calling turn, pytest) before declaring it ready.

What gets generated

Core (always present)

Capability Description
Agent loop Native function-calling loop with paradigm dispatch, lifecycle hooks, and graceful stop conditions (including a max_steps valve)
LLM layer Profile registry with role routing (generation, compaction, plus optional title / memory roles), per-profile sampling parameters, timeout/retry/fallback, and dual-protocol clients (OpenAI-compatible + native Anthropic)
Tool registry Decorator-registered tools with risk levels; high-risk tools disabled by default, allowlist-only
Hooks & policy Composable Hooks subclasses mounted via config.hooks; observer + tool-policy lifecycle points (before_tool refuse / after_tool redact); extension discoverability via info / GET /registries
Sessions Local JSON persistence, --continue / --resume, chat REPL, crash-safe checkpointing, per-session working-directory hint
Interaction ask_question structured clarification + HITL tool confirmation, shared CLI/web infrastructure
Context Trigger/strategy compaction registries, tool-result clipping, overflow recovery, max_steps bound, sub-turn folding
Budget Persistent per-LLM cost ledger with per-profile prices and hard cost limits
Prompts System prompt assembly with always-applied rule-file injection, current date/time, and the working-directory hint
Observability JSONL trace + token/cost counts; opt-in local-only debug log
CLI run, chat, info, test-llm, set-key, usage (plus serve, mcp, memory when the matching modules are enabled)
Runnability uv.lock + .python-version, Dockerfile + .dockerignore + devcontainer, requirements.txt pip fallback, mock-LLM test suite, one-click launcher script

Optional modules (spec toggles; disabled = absent from code and dependencies)

Module Description
Web interface FastAPI + SSE chat with token-level streaming, collapsible reasoning and tool-call panels, a session sidebar with parallel per-session conversations, a chat toolbar (paradigm, generation model, working-directory picker), and a paged bilingual (en/zh) /config panel — LLM, Context, Tools, MCP, Hooks, Paradigms, Prompts, Budget, Memory, Observability, and System tabs (MCP and Memory appear only when those modules are enabled). Edits apply live and are written back to config.yaml with comments preserved
MCP tools Model Context Protocol client over stdio / HTTP / SSE, allowlist and risk flags, curated catalog prefill, runtime server management with health probes and hot reconnect
Agent Skills SKILL.md discovery, metadata injection, and on-demand loading
Long-term memory Self-maintained markdown note with tool-driven writes, policy shaping, consolidation, and a pluggable backend registry

Architecture

flowchart LR
  user[User] --> entry["CLI / terminal wizard / web wizard"]
  entry --> spec["HarnessSpec (Pydantic, YAML)"]
  spec --> gen["Generator (Jinja2)"]
  templates["Template library (no agent framework)"] --> gen
  catalog["MCP catalog"] -.-> gen
  gen --> repo["Generated repository (independently owned)"]
  subgraph repoInner [Generated repository]
    loop["loop.py + paradigms/ (agent / plan / ask)"]
    llm["llm.py + llm_anthropic.py (dual protocol)"]
    tools["tools.py (+ mcp.py stdio/http/sse)"]
    sessions["session.py + interaction.py"]
    ctx["context.py + usage.py + trace.py"]
    hooks["hooks.py + extensions.py (policy + discovery)"]
    cli["interfaces/cli.py"]
    web["interfaces/web.py (SSE chat + /config)"]
    extras["skills.py / memory.py (opt-in)"]
    docker["Dockerfile + devcontainer"]
  end
  repo --> repoInner

The generator and its output are strictly separated layers. The spec decides structure (which capabilities are compiled in); the generated product's config.yaml is the runtime authority for behavior (models, prompts, tool allowlists, context parameters, prices and limits) — all adjustable without regeneration.

Technology stack

Generator

  • Python ≥ 3.11, managed end-to-end with uv
  • Typer (CLI), questionary (interactive terminal wizard)
  • Jinja2 (template rendering)
  • Pydantic v2 + PyYAML (HarnessSpec validation and serialization)
  • FastAPI + uvicorn (web wizard, optional [wizard] extra — never shipped into products)

Generated product

  • Runtime: openai (Chat Completions, provider-agnostic via base_url), anthropic (native Messages), pydantic + pydantic-settings, pyyaml, typer
  • Web interface (when enabled): fastapi, uvicorn, ruamel.yaml (comment-preserving config write-back); the UI is a single static page (Tailwind CSS via CDN, no build step)
  • MCP (when enabled): the official mcp SDK
  • Tests: pytest with an offline mock LLM (dev dependency group; not a runtime dependency)
  • Environment contract: uv (uv.lock + .python-version) with Docker and requirements.txt fallbacks

The generated pyproject.toml contains no agent-orchestration framework, and the test suite asserts it.

Getting started

Prerequisites

  • uv (uv provisions the correct Python automatically; no system Python required)
  • Docker (optional, for containerized runs)

Installation

HarnessSmith is on PyPI. The Quick start needs no install — uvx runs it on demand and the cloned launcher bootstraps uv for you. For a persistent install of the generator CLI, any of these work:

uv tool install harnessmith        # install the CLI with uv
pip install harnessmith            # or with pip
# from source (for development):
git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith && uv sync

Generating a harness

The Quick start covers the wizard and the one-click launcher. From an installed or cloned generator, every generation surface is a single command:

uv run harnessmith wizard                                   # web wizard (recommended; uv sync --extra wizard)
uv run harnessmith new                                      # interactive terminal wizard
uv run harnessmith new my-agent --preset coding-assistant   # non-interactive, from a bundled preset
uv run harnessmith new my-agent --spec ./harness.spec.yaml  # non-interactive, from a hand-written spec
uv run harnessmith doctor                                   # preflight check of the local toolchain
  • The web wizard (wizard) and the terminal wizard (new with no --spec / --preset) collect the same structural choices — display name, paradigms, web interface, MCP, skills, memory — and apply identical defaults; the web wizard suits desktops, the terminal wizard suits headless servers.
  • After rendering, the generator locks dependencies and runs a smoke verification (uv sync, import check, one mock function-calling turn, pytest). Pass --no-verify to skip it, for example when offline.
  • Secrets are never collected by any wizard and never enter the spec, the generated config.yaml, or git.
  • The --preset shortcut is for scripted or CI generation, not the recommended start: the bundled coding-assistant preset enables MCP with every tool allowlisted — shell and file writes included — and no confirmation gate (confirm: none). Prefer a wizard, or review config.yaml (tighten the tool allowlist, set confirm: high) before pointing it at a real model.

Running the generated harness

The Quick start gets you chatting via the launcher or serve. Beyond that, a few essentials — provision the venv (the launcher does it for you), store a key, probe the model, or run fully containerized:

cd my-agent && uv sync                   # provision Python + an isolated venv
uv run my-agent set-key OPENAI_API_KEY   # store a key in .env (never echoed) — or use the web /config panel
uv run my-agent test-llm                 # probe each configured model

# run fully containerized instead (Dockerfile generated by default):
docker build -t my-agent . && docker run --rm -it my-agent

Model and endpoint live in config.yaml (or the web /config LLM tab): set model, point base_url_env / api_key_env at the right env vars, and choose provider: openai or anthropic per profile. --mock runs offline without a key on run, chat, and serve. Every command is in the reference below.

Product CLI reference

Command Purpose
run [PROMPT] Execute one turn. Options: --mode agent|plan|ask, --stream, --continue, --resume <id>, --role, --cwd, --mock
chat Multi-turn REPL with persistent sessions; /cwd sets the working-directory hint; Ctrl-D or /exit to quit
serve Start the web interface (--host, --port, --open); web-enabled products
info Introspect every extension point — registered tools, paradigms, context strategies/conditions, memory backends, imported extensions, and mounted hooks
test-llm Connectivity and capability probe for each configured LLM profile
set-key <ENV_NAME> Write a secret into .env without echoing it or touching git
usage Inspect or clear the persistent per-LLM cost ledger
memory show|clear|path|consolidate Manage the long-term memory note; memory-enabled products
mcp status / mcp warm Probe MCP server health / pre-warm launchers; MCP-enabled products

Configuration model

Layer File Role
Generation-time spec harness.spec.yaml The recipe: which capabilities are compiled into the product, plus initial values. A snapshot is kept in the generated repository
Runtime configuration config.yaml The authority for behavior: LLM profiles and roles, prompts and rule files, tool allowlist, context strategy, hooks, MCP servers, prices and cost limits, observability. Editable by hand or via the web /config panel (live application + comment-preserving write-back)
Secrets .env (gitignored) The only location for real credentials. config.yaml and the spec reference environment-variable names only

Structural changes (adding or removing an interface or module) require regeneration; behavioral changes never do.

Security model

  • Secrets never enter git. Real values live exclusively in the gitignored .env; all other files reference environment-variable names. set-key and the web panel's key writer are write-only and never echo values. Traces and the debug log record no secrets.
  • High-risk tools are off by default. Shell and file-writing tools ship disabled and require explicit allowlisting; the runtime allowlist can only narrow the set compiled in at generation time, never extend it.
  • Human-in-the-loop confirmation (tools.confirm: none|high|all|<tool names>) intercepts risky tool calls with allow once / reject / allow for session / allow always; non-interactive contexts reject by default. Confirmation is a guardrail for trusted operators, not a security boundary — hard isolation belongs to Docker or to excluding the capability at generation time.
  • The web interface targets local, trusted use. The /config panel and the MCP management page can modify runtime behavior and launch local processes; do not expose them to untrusted networks.

License

MIT © 2026 EpisodeYu


HarnessSmith

HarnessSmith

锻造你自己的 agent harness。

一个"配置即生成"的代码生成器,产出一套你完全拥有的独立 agent harness 代码仓库——不绑定任何 agent 编排框架(无 LangChain、LangGraph、ADK),生成后不再依赖 HarnessSmith。

PyPI version PyPI downloads Python License: MIT GitHub stars

English | 中文


快速上手

选择启动配置向导的方式——两种都从头生成一套独立、无框架锁定的 harness(能力由你勾选:范式、Web 界面、MCP、技能、记忆),并在交付前完成冒烟自检。

克隆仓库并运行一键启动器——跨平台(Windows、macOS、Linux),无需记命令,双击文件也能跑。它会在首次使用时安装 uv,随后询问你要 Web 向导(推荐)还是终端向导:

git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith
./HarnessSmith.sh     # macOS / Linux  —— 或直接双击
HarnessSmith.bat      # Windows        —— 或直接双击

或用 uv 按需运行——无需克隆,不留常驻安装:

uvx --from "harnessmith[wizard]" harnessmith wizard --open   # Web 向导(推荐)
uvx harnessmith new                                          # 或终端交互向导

随后启动它。生成的仓库自己也带一个一键启动器,以项目名命名(如 macOS / Linux 的 My-Agent.sh、Windows 的 My-Agent.bat):双击或直接运行,它会自动同步依赖并打开 Web 聊天——模型与 key 直接在 /config 面板里配置。或用 uv 启动:

cd my-agent                    # 你填写的项目名
uv run my-agent serve --open   # 推荐:在浏览器里配置模型与 key 并对话
uv run my-agent chat           # 可选:在终端对话

使用指南了解全部方式,包括用手写 spec 非交互生成。

为什么选 HarnessSmith?

多数"agent 脚手架"给你的是一个绑死在某框架上、无法移除的应用。HarnessSmith 生成的是一个属于你的仓库——可读、可改、且不含任何 agent 编排框架。

HarnessSmith LangGraph 脚手架 ADK / 静态模板
agent 框架锁定 ——属于你的纯 Python 循环 LangGraph 运行时 ADK / 多框架
每一行都可改可删 可以——自包含仓库 受限——应用建在框架之上 受限
配置即生成(只生成选中的能力) 否——固定模板
核心循环体量 约 150–300 行 被框架抽象 被框架抽象
生成后对生成器的运行期依赖 框架运行时 框架运行时
自带测试 + 锁文件 + Dockerfile + 冒烟自检 视情况 视情况
双 LLM 协议、运行期可切(OpenAI + Anthropic) 视情况 视情况

概述

HarnessSmith 是 agent harness 的生成器,定位类似 create-next-app。通过 Web 向导、终端交互向导、preset 或手写 YAML 采集一份规格(HarnessSpec),HarnessSmith 据此渲染出一个完整、独立的 Python 代码仓库——可读、可改、可测试、可独立运行。生成的项目与 HarnessSmith 没有任何运行期关系:生成即脱离。

设计定位

  • 无 agent 框架锁定。 生成代码对任何 agent 编排框架零依赖,循环是属于你的普通 Python 代码。通用库(OpenAI SDK、Pydantic、Typer、FastAPI)只作为库使用,不接管控制流。
  • 代码归你所有。 产出是带有自有测试、锁文件、Dockerfile 与文档的自包含仓库,每一行都可以阅读、修改或删除。
  • 配置即生成。 能力在生成期选择,生成器只渲染被选中的部分;关闭的功能不留任何痕迹——没有模块、没有依赖、没有死代码。
  • 默认极薄。 默认产物是最小但完整可跑的 harness,核心循环维持在数百行以内;较重的能力(MCP、Web 界面、技能、记忆)均为 spec 开关式可选项。

亮点

完整能力清单 —— 点击展开
  • 原生 function calling —— 循环通过 API 的 tool_calls(TAO/ReAct 语义)驱动模型,而非文本解析。
  • 双 LLM 协议,运行期可切 —— 每个产物同时内置 OpenAI Chat Completions 客户端(经 base_url 对接 vLLM、Together、Groq、LiteLLM 等任意兼容端点)与原生 Anthropic Messages 客户端;每个 LLM profile 在运行期配置中选择 provider,无需重新生成。
  • 推理过程一等公民 —— thinking/reasoning 增量实时呈现(CLI 状态行、Web 可折叠推理面板),并在工具调用多轮间保留 reasoning_content,兼容有此要求的模型。
  • 多范式运行时 —— agent(默认工具调用循环)、planask(均只读),每轮可切(--mode / Web 下拉)。范式存放于薄注册表,用户以 @register_paradigm 自行扩展,不触碰内置实现。
  • 会话持久化与续聊 —— 每次对话本地落盘;以 --continue / --resume <id>、多轮 chat REPL 或 Web 会话侧栏(自动起标题、重命名、删除)续聊。Web 界面支持多会话并行——每个会话独立流式输出,切换侧栏不会打断后台运行。中断的运行具备崩溃安全:状态在消息边界写入检查点,恢复时自动修复。
  • 停止 / 继续 / 重问 —— 回合中途可取消(协作式取消,流式输出一并终止),之后携带完整上下文继续;Web 界面支持就地编辑任一历史提问并从该点重新生成。
  • 人在环交互 —— 内置 ask_question 工具让模型向用户提出结构化澄清问题;工具调用确认(允许一次 / 拒绝 / 本会话允许 / 永久允许)拦截高风险工具,非交互场景默认拒绝。
  • 按 LLM 持久成本核算 —— 用量账本按 LLM profile 跨运行累计 token;成本由各 profile 单价派生,达到 cost_limit 即在下次调用前阻止该模型。经 Web Budget 页或 usage CLI 管理。
  • 上下文管理 —— 可组合触发条件(window_pctmax_tokensmax_turns,以真实 token 用量驱动)决定何时压缩;策略(truncatesummarizenone)决定如何压缩;两者均为用户可扩展的注册表。超大工具结果在入历史前截断,溢出时按需强制压缩;max_steps 阀值约束失控的工具循环,压缩还能在单个长回合内折叠(子回合 keep_last_steps),使一个 agentic 回合也能留在窗口内。
  • 可组合 hooks 与薄 tool-policy 层 —— 通过 config.hooks 挂载一个或多个 Hooks 子类(子类化并挂载,无 @register_hook);五个生命周期点(before_step / after_step / before_tool / after_tool / on_error)。before_tool 可拒绝一次调用、after_tool 可改写/脱敏结果——一层代码级 policy 门禁,不引入 middleware 机制——多个 hook 按顺序组合。Web 界面有专门的 Hooks tab 并带隐私安全的执行日志;info(CLI)与 GET /registries(Web)呈现所有扩展点——工具、范式、上下文策略/触发条件、记忆后端、已导入的 extensions 与已挂载的 hooks。
  • 按会话工作目录 —— 可选的工作目录提示(CLI --cwdchat REPL 的 /cwd,或 Web 聊天工具栏的目录浏览器)注入系统提示,作为指引而非沙箱。当前日期/时间也会每轮注入。
  • 不臃肿的工具生态 —— 装饰器注册的工具注册表带按工具风险分级,另有可选 MCP 客户端(stdio、HTTP、SSE 三种传输)与精选 catalog(免密钥多引擎网页搜索、fetch、git、时间、Desktop Commander、GitHub)。默认网页搜索 server 免密钥且多引擎(Bing/百度/DuckDuckGo/Brave/搜狗/…)带自动 failover,因此在某些引擎慢或不可达的网络下也能正常工作。Node 系 MCP server 会被 npm install 进固定的按 server 独立目录、再用 node 直接启动(不走临时 npx 缓存——它在 Windows 上不可靠)。MCP server 运行期管理:健康状态、增删改、热重连(Web 面板),CLI 侧 mcp status
  • Agent Skills —— 可选支持开放的 SKILL.md 标准与渐进披露;技能是纯文件,不引入框架。
  • 跨会话记忆 —— 可选的自维护长期笔记,每轮注入系统提示,经工具写入,在会话边界由专用 LLM 角色整理,并可通过薄 @register_memory 注册表替换后端。
  • 全局规则常驻注入 —— markdown 规则文件(AGENTS.md / CLAUDE.md / .cursor/rules 惯例)注入每轮系统提示。
  • 完整可观测性 —— 每次运行产出 JSONL trace 与 token/成本计数;可选的仅本地 debug 日志记录生命周期事件(名称、计数、耗时),绝不记录消息内容、工具参数或密钥。
  • 交付前验证可运行 —— 生成器锁定依赖并对每个新仓库执行冒烟验证(uv sync、import 检查、一次 mock function-calling、pytest),全绿才视为就绪。

生成内容

核心(始终生成)

能力 说明
Agent 循环 原生 function-calling 循环,含范式分发、生命周期 hook 与优雅停止(含 max_steps 阀值)
LLM 层 profile 注册表 + 角色路由(generationcompaction,以及可选 title / memory 角色),按 profile 的采样参数、超时/重试/fallback,双协议客户端(OpenAI 兼容 + 原生 Anthropic)
工具注册表 装饰器注册 + 风险分级;高风险工具默认关闭,仅 allowlist 显式开启
Hooks 与 policy config.hooks 挂载的可组合 Hooks 子类;observer + tool-policy 生命周期点(before_tool 拒绝 / after_tool 脱敏);经 info / GET /registries 提供扩展可发现性
会话 本地 JSON 持久化、--continue / --resumechat REPL、崩溃安全检查点、按会话工作目录提示
交互层 ask_question 结构化澄清 + HITL 工具确认,CLI/Web 共用同一套底座
上下文 触发条件/策略双注册表、工具结果截断、溢出自救、max_steps 约束、子回合折叠
预算 按 LLM 持久成本账本,按 profile 设单价与硬性成本上限
提示词 系统提示拼装 + 规则文件常驻注入、当前日期/时间、工作目录提示
可观测性 JSONL trace + token/成本计数;可选仅本地 debug 日志
CLI runchatinfotest-llmset-keyusage(启用对应模块时另有 servemcpmemory)
可运行性 uv.lock + .python-version、Dockerfile + .dockerignore + devcontainer、requirements.txt pip 兜底、mock LLM 测试套件、一键启动脚本

可选模块(spec 开关;关闭 = 代码与依赖中均不存在)

模块 说明
Web 界面 FastAPI + SSE 聊天,token 级流式、可折叠推理与工具调用面板、支持多会话并行的会话侧栏、聊天工具栏(范式、生成模型、工作目录选择),以及分页双语(中/英)/config 面板——LLM、Context、Tools、MCP、Hooks、Paradigms、Prompts、Budget、Memory、Observability、System 各 tab(MCP 与 Memory 仅在对应模块启用时出现)。修改即时生效并回写 config.yaml(保留注释)
MCP 工具 Model Context Protocol 客户端(stdio / HTTP / SSE),allowlist 与风险标记,精选 catalog 预填,运行期 server 管理(健康探测 + 热重连)
Agent Skills SKILL.md 发现、元数据注入与按需加载
长期记忆 自维护 markdown 笔记,工具驱动写入、策略塑形、整理压缩,后端可插拔

架构

生成器与产物是严格分离的两层。spec 决定结构(哪些能力被编译进产物);产物的 config.yaml 是行为的运行期权威(模型、提示词、工具 allowlist、上下文参数、单价与限额)——全部可在不重新生成的前提下调整。架构图见英文部分 Architecture

技术栈

生成器

  • Python ≥ 3.11,全链路使用 uv 管理
  • Typer(CLI)、questionary(终端交互向导)
  • Jinja2(模板渲染)
  • Pydantic v2 + PyYAML(HarnessSpec 校验与序列化)
  • FastAPI + uvicorn(Web 向导,可选 [wizard] extra——绝不进入产物)

生成产物

  • 运行期:openai(Chat Completions,经 base_url 对接任意兼容端点)、anthropic(原生 Messages)、pydantic + pydantic-settingspyyamltyper
  • Web 界面(启用时):fastapiuvicornruamel.yaml(保留注释的配置回写);前端为单一静态页面(Tailwind CSS CDN,无构建步骤)
  • MCP(启用时):官方 mcp SDK
  • 测试:pytest + 离线 mock LLM(dev 依赖组,非运行期依赖)
  • 环境契约:uv(uv.lock + .python-version),Docker 与 requirements.txt 兜底

生成的 pyproject.toml 不含任何 agent 编排框架,且测试套件对此作出断言。

使用指南

前置条件

  • uv(uv 会自动下载匹配的 Python,无需预装系统 Python)
  • Docker(可选,用于容器化运行)

安装

HarnessSmith 已发布到 PyPI快速上手无需安装——uvx 按需运行,克隆后的启动器也会自动准备 uv。若要常驻安装生成器 CLI,以下任选其一:

uv tool install harnessmith        # 用 uv 安装 CLI
pip install harnessmith            # 或用 pip
# 从源码(开发用):
git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith && uv sync

生成 harness

快速上手已覆盖向导与一键启动器。从已安装或克隆的生成器出发,每个生成入口都是单命令:

uv run harnessmith wizard                                   # Web 向导(推荐;uv sync --extra wizard)
uv run harnessmith new                                      # 终端交互向导
uv run harnessmith new my-agent --preset coding-assistant   # 非交互,使用内置 preset
uv run harnessmith new my-agent --spec ./harness.spec.yaml  # 非交互,使用手写 spec
uv run harnessmith doctor                                   # 本机工具链预检
  • Web 向导(wizard)与 终端向导(new 不带 --spec / --preset)采集同一组结构选项——显示名、范式、Web 界面、MCP、技能、记忆——并应用一致的默认值;Web 向导适合桌面环境,终端向导适合无图形界面的服务器。
  • 渲染完成后,生成器锁定依赖并执行冒烟验证(uv sync、import 检查、一次 mock function-calling、pytest);离线等场景可用 --no-verify 跳过。
  • 任何向导都不采集密钥;密钥不会进入 spec、生成的 config.yaml 或 git。
  • --preset 是面向脚本 / CI 生成的捷径,并非推荐起点:内置的 coding-assistant preset 会开启 MCP 并把每个工具都加入 allowlist(含 shell 与写文件),且不设确认门禁(confirm: none)。请优先用向导,或在接入真实模型前先检查 config.yaml(收窄 allowlist、把 confirm 设为 high)。

运行生成的 harness

快速上手已用启动器或 serve 让你开始对话。此外还有几条实用命令——准备 venv(启动器会替你做)、存 key、探测模型,或完全容器化运行:

cd my-agent && uv sync                   # 准备 Python 与隔离 venv
uv run my-agent set-key OPENAI_API_KEY   # 把 key 写入 .env(不回显)——也可在 Web /config 面板里配置
uv run my-agent test-llm                 # 探测各配置模型

# 改为完全容器化运行(默认生成 Dockerfile):
docker build -t my-agent . && docker run --rm -it my-agent

模型与端点在 config.yaml(或 Web /config 的 LLM tab)配置:设置 model,将 base_url_env / api_key_env 指向对应环境变量,并为每个 profile 选择 provider: openaianthropicrunchatserve 均支持 --mock,无需 key 即可离线试用。完整命令见下方参考

产物 CLI 参考

命令 用途
run [PROMPT] 执行一轮。选项:--mode agent|plan|ask--stream--continue--resume <id>--role--cwd--mock
chat 多轮 REPL,会话自动持久化;/cwd 设置工作目录提示;Ctrl-D/exit 退出
serve 启动 Web 界面(--host--port--open);启用 Web 的产物
info 内省所有扩展点——已注册的工具、范式、上下文策略/触发条件、记忆后端、已导入的 extensions 与已挂载的 hooks
test-llm 对每个 LLM profile 做连通性与能力探测
set-key <ENV_NAME> 将密钥写入 .env,不回显、不触碰 git
usage 查看或清空按 LLM 的持久成本账本
memory show|clear|path|consolidate 管理长期记忆笔记;启用记忆的产物
mcp status / mcp warm 探测 MCP server 健康 / 预热启动器;启用 MCP 的产物

配置模型

文件 角色
生成期 spec harness.spec.yaml 配方:哪些能力被编译进产物,以及初始值;快照保留在生成的仓库中
运行期配置 config.yaml 行为的权威来源:LLM profile 与角色、提示词与规则文件、工具 allowlist、上下文策略、hooks、MCP server、单价与成本上限、可观测性。可手改,也可经 Web /config 面板修改(即时生效 + 保留注释回写)
密钥 .env(gitignored) 真实凭证的唯一存放处;config.yaml 与 spec 仅引用环境变量名称

结构性变更(增删接口或模块)需要重新生成;行为性变更永远不需要。

安全模型

  • 密钥不入 git。 真实值仅存于 gitignored 的 .env;其余文件只引用环境变量名。set-key 与 Web 面板的密钥写入均为只写、不回显;trace 与 debug 日志不记录密钥。
  • 高风险工具默认关闭。 shell 与写文件类工具默认禁用,需显式 allowlist 开启;运行期 allowlist 只能在生成期编译进的集合内收窄,永远不能扩张。
  • 人在环确认(tools.confirm: none|high|all|<工具名>)以"允许一次 / 拒绝 / 本会话允许 / 永久允许"拦截高风险工具调用;非交互场景默认拒绝。确认机制是面向可信操作者的护栏,不是安全边界——硬隔离依靠 Docker,或在生成期就不编译该能力。
  • Web 界面面向本地可信使用。 /config 面板与 MCP 管理页可修改运行期行为并启动本地进程,请勿暴露给不可信网络。

许可

MIT © 2026 EpisodeYu

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

harnessmith-0.2.0.tar.gz (1.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

harnessmith-0.2.0-py3-none-any.whl (408.4 kB view details)

Uploaded Python 3

File details

Details for the file harnessmith-0.2.0.tar.gz.

File metadata

  • Download URL: harnessmith-0.2.0.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for harnessmith-0.2.0.tar.gz
Algorithm Hash digest
SHA256 7af5b024beffce2e01ad4399a960416dc0b10c18320e40d63fdbc20f2dc733ea
MD5 c31f17851ad486fa37d1eced54a7ec00
BLAKE2b-256 bfbb942e9d3a2df8f43248e0e5b880266ca5520f6e3417d0a866caba0bd6e0c5

See more details on using hashes here.

File details

Details for the file harnessmith-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: harnessmith-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 408.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for harnessmith-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5d06429535c08d122d86129fffd1ee3da5e072fffedaf77a4484663691bbb839
MD5 c325848da9941cef1f9f1d47e0ac8d13
BLAKE2b-256 7b32b82c5daf7438fd915377a19a092fe411d7d2aee51d8cce392f5a26468cc3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page