Forge your own agent harness — a config-to-code generator with no agent-framework lock-in.
Project description
HarnessSmith
Forge your own agent harness.
A config-to-code generator that produces a standalone, framework-free agent harness you fully own — no LangChain, no LangGraph, no ADK, and no dependency on HarnessSmith after generation.
English | 中文
Quick start
Pick how to launch the setup wizard — either way you build a standalone, framework-free harness from scratch (you choose the capabilities: paradigms, web UI, MCP, skills, memory) that is smoke-verified before handover.
Clone the repo and run the one-click launcher — cross-platform (Windows, macOS, Linux), nothing to memorize; double-clicking the file works too. It installs uv on first use, then asks whether you want the web wizard (recommended) or the terminal wizard:
git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith
./HarnessSmith.sh # macOS / Linux — or double-click it
HarnessSmith.bat # Windows — or double-click it
Or run it on demand with uv — no clone, nothing installed permanently:
uvx --from "harnessmith[wizard]" harnessmith wizard --open # web wizard (recommended)
uvx harnessmith new # or the interactive CLI wizard
Then start it. The generated repo ships its own one-click launcher too, named after your project (e.g. My-Agent.sh on macOS / Linux, My-Agent.bat on Windows): double-click or run it and it auto-syncs and opens the web chat — set your model and key right there in the /config panel. Or drive it with uv:
cd my-agent # the name you chose
uv run my-agent serve --open # recommended: configure your model and chat, all in the browser
uv run my-agent chat # optional: chat in the terminal
See Getting started for every option, including non-interactive generation from a hand-written spec.
Why HarnessSmith?
Most "agent starters" hand you an app wired to a framework you cannot remove. HarnessSmith generates a repository that is yours — readable, editable, and free of any agent-orchestration framework.
| HarnessSmith | LangGraph scaffolds | ADK / static templates | |
|---|---|---|---|
| Agent-framework lock-in | None — a plain Python loop you own | LangGraph runtime | ADK / mixed frameworks |
| Edit or delete any line | Yes — self-contained repo | Limited — app sits on the framework | Limited |
| Config-to-code (only selected capabilities are generated) | Yes | No — fixed template | No |
| Core loop size | ~150–300 lines | Abstracted by the framework | Abstracted by the framework |
| Runtime dependency on the generator | None | Framework runtime | Framework runtime |
| Tests + lockfile + Dockerfile, smoke-verified on generation | Yes | Varies | Varies |
| Dual LLM protocol, switchable at runtime (OpenAI + Anthropic) | Yes | Varies | Varies |
Overview
HarnessSmith is a generator for the agent harness, in the spirit of create-next-app. A specification (HarnessSpec) is captured through a web wizard, an interactive terminal wizard, a preset, or a hand-written YAML file; HarnessSmith then renders a complete, independent Python repository — readable, editable, testable, and runnable on its own. The generated project is not a consumer of HarnessSmith: once generated, it has zero relationship with the generator.
Design positioning
- No agent-framework lock-in. The generated code has zero dependency on any agent-orchestration framework. The loop is plain Python that you own. Ordinary general-purpose libraries (OpenAI SDK, Pydantic, Typer, FastAPI) are used as libraries, not as frameworks that own your control flow.
- Own your code. The output is a self-contained repository with its own tests, lockfile, Dockerfile, and documentation. Every line can be read, changed, or deleted.
- Config-to-code. Capabilities are selected at generation time; the generator renders only what was selected. A feature that is switched off leaves no trace — no module, no dependency, no dead code.
- Thin by default. The default product is a minimal, fully runnable harness whose core loop stays in the low hundreds of lines. Heavier capabilities (MCP, web interface, skills, memory) are opt-in spec toggles.
Highlights
Full capability list — click to expand
- Native function calling — the loop drives the model through the API's
tool_calls(TAO/ReAct semantics), not through text parsing. - Dual LLM protocol, runtime-switchable — every product ships both an OpenAI Chat Completions client (provider-agnostic via
base_url: vLLM, Together, Groq, LiteLLM, any compatible endpoint) and a native Anthropic Messages client. Each LLM profile selects itsproviderin runtime configuration; no regeneration required. - Reasoning streams as a first-class signal — thinking/reasoning deltas are surfaced live (a status line in the CLI; a collapsible reasoning panel in the web UI), and
reasoning_contentis preserved across tool-calling turns for models that require it. - Multi-paradigm runtime —
agent(default tool-calling loop),planandask(both read-only), selectable per turn (--mode/ web dropdown). Paradigms live in a thin registry; users add their own with@register_paradigmwithout touching the built-ins. - Sessions and resumption — every conversation persists locally; resume with
--continue/--resume <id>, in the multi-turnchatREPL, or from the web session sidebar (automatic titling, rename, delete). In the web UI, conversations run in parallel — each session streams independently and switching the sidebar never interrupts a background run. Interrupted runs are crash-safe: state is checkpointed at message boundaries and repaired on resume. - Stop / continue / re-ask — a run can be cancelled mid-turn (cooperative cancellation that also terminates streaming), continued later with full context, or — in the web UI — re-asked by editing any earlier prompt and regenerating from that point.
- Human-in-the-loop — a built-in
ask_questiontool lets the model ask the user structured clarifying questions, and tool-call confirmation (allow once / reject / allow for session / allow always) gates risky tools. Non-interactive contexts fail closed. - Persistent per-LLM cost accounting — a usage ledger accumulates token counts per LLM profile across runs; cost is derived from per-profile prices, and a per-profile
cost_limitblocks the model before the next call once reached. Managed from the web Budget page or theusageCLI. - Context management — combinable triggers (
window_pct,max_tokens,max_turns; driven by real token usage) select when to compact; strategies (truncate,summarize,none) define how; both are user-extensible registries. Oversized tool results are clipped before entering history, overflow recovery compacts on demand, amax_stepsvalve bounds runaway tool loops, and compaction folds within a single long turn (sub-turnkeep_last_steps) so even one agentic turn stays inside the window. - Composable hooks and a thin tool-policy layer — mount one or more
Hookssubclasses throughconfig.hooks(subclass-and-mount, no@register_hook); five lifecycle points (before_step/after_step/before_tool/after_tool/on_error).before_toolmay refuse a call andafter_toolmay redact a result — a code-level policy gate with no middleware machinery — and multiple hooks compose in order. The web UI has a dedicated Hooks tab with a privacy-safe execution log;info(CLI) andGET /registries(web) surface every extension point — tools, paradigms, context strategies/conditions, memory backends, imported extensions, and mounted hooks. - Per-session working directory — an optional working-directory hint (CLI
--cwd, thechatREPL's/cwd, or the web chat toolbar with a directory browser) is injected into the system prompt as guidance, not a sandbox. The current date/time is injected each turn too. - Tool ecosystem without built-in bloat — a decorator-based tool registry with per-tool risk levels, plus an opt-in MCP client (stdio, HTTP, and SSE transports) with a curated catalog (keyless multi-engine web search, fetch, git, time, Desktop Commander, GitHub). The default web-search server is keyless and multi-engine (Bing/Baidu/DuckDuckGo/Brave/Sogou/…) with automatic failover, so it keeps working when any single engine is slow or unreachable on a given network. Node-based MCP servers are
npm installed once into a stable per-server dir and launched directly withnode(never the ephemeralnpxcache, which is unreliable on Windows). MCP servers are managed at runtime: health status, add/edit/remove, and hot reconnection from the web panel;mcp statusfrom the CLI. - Agent Skills — opt-in support for the open
SKILL.mdstandard with progressive disclosure; skills are plain files, no framework involved. - Cross-session memory — an opt-in, self-maintained long-term note injected each turn, written through tools, consolidated by a dedicated LLM role at session boundaries, and replaceable via a thin
@register_memorybackend registry. - Always-applied project rules — markdown rule files (
AGENTS.md/CLAUDE.md/.cursor/rulesconventions) injected into every system prompt. - Full observability — a JSONL trace per run with token/cost accounting, and an opt-in, local-only debug log that records lifecycle events (names, counts, durations) and never message content, tool arguments, or secrets.
- Verified runnable before handover — the generator locks dependencies and smoke-tests every new repository (
uv sync, import check, a mock function-calling turn,pytest) before declaring it ready.
What gets generated
Core (always present)
| Capability | Description |
|---|---|
| Agent loop | Native function-calling loop with paradigm dispatch, lifecycle hooks, and graceful stop conditions (including a max_steps valve) |
| LLM layer | Profile registry with role routing (generation, compaction, plus optional title / memory roles), per-profile sampling parameters, timeout/retry/fallback, and dual-protocol clients (OpenAI-compatible + native Anthropic) |
| Tool registry | Decorator-registered tools with risk levels; high-risk tools disabled by default, allowlist-only |
| Hooks & policy | Composable Hooks subclasses mounted via config.hooks; observer + tool-policy lifecycle points (before_tool refuse / after_tool redact); extension discoverability via info / GET /registries |
| Sessions | Local JSON persistence, --continue / --resume, chat REPL, crash-safe checkpointing, per-session working-directory hint |
| Interaction | ask_question structured clarification + HITL tool confirmation, shared CLI/web infrastructure |
| Context | Trigger/strategy compaction registries, tool-result clipping, overflow recovery, max_steps bound, sub-turn folding |
| Budget | Persistent per-LLM cost ledger with per-profile prices and hard cost limits |
| Prompts | System prompt assembly with always-applied rule-file injection, current date/time, and the working-directory hint |
| Observability | JSONL trace + token/cost counts; opt-in local-only debug log |
| CLI | run, chat, info, test-llm, set-key, usage (plus serve, mcp, memory when the matching modules are enabled) |
| Runnability | uv.lock + .python-version, Dockerfile + .dockerignore + devcontainer, requirements.txt pip fallback, mock-LLM test suite, one-click launcher script |
Optional modules (spec toggles; disabled = absent from code and dependencies)
| Module | Description |
|---|---|
| Web interface | FastAPI + SSE chat with token-level streaming, collapsible reasoning and tool-call panels, a session sidebar with parallel per-session conversations, a chat toolbar (paradigm, generation model, working-directory picker), and a paged bilingual (en/zh) /config panel — LLM, Context, Tools, MCP, Hooks, Paradigms, Prompts, Budget, Memory, Observability, and System tabs (MCP and Memory appear only when those modules are enabled). Edits apply live and are written back to config.yaml with comments preserved |
| MCP tools | Model Context Protocol client over stdio / HTTP / SSE, allowlist and risk flags, curated catalog prefill, runtime server management with health probes and hot reconnect |
| Agent Skills | SKILL.md discovery, metadata injection, and on-demand loading |
| Long-term memory | Self-maintained markdown note with tool-driven writes, policy shaping, consolidation, and a pluggable backend registry |
Architecture
flowchart LR
user[User] --> entry["CLI / terminal wizard / web wizard"]
entry --> spec["HarnessSpec (Pydantic, YAML)"]
spec --> gen["Generator (Jinja2)"]
templates["Template library (no agent framework)"] --> gen
catalog["MCP catalog"] -.-> gen
gen --> repo["Generated repository (independently owned)"]
subgraph repoInner [Generated repository]
loop["loop.py + paradigms/ (agent / plan / ask)"]
llm["llm.py + llm_anthropic.py (dual protocol)"]
tools["tools.py (+ mcp.py stdio/http/sse)"]
sessions["session.py + interaction.py"]
ctx["context.py + usage.py + trace.py"]
hooks["hooks.py + extensions.py (policy + discovery)"]
cli["interfaces/cli.py"]
web["interfaces/web.py (SSE chat + /config)"]
extras["skills.py / memory.py (opt-in)"]
docker["Dockerfile + devcontainer"]
end
repo --> repoInner
The generator and its output are strictly separated layers. The spec decides structure (which capabilities are compiled in); the generated product's config.yaml is the runtime authority for behavior (models, prompts, tool allowlists, context parameters, prices and limits) — all adjustable without regeneration.
Technology stack
Generator
- Python ≥ 3.11, managed end-to-end with uv
- Typer (CLI), questionary (interactive terminal wizard)
- Jinja2 (template rendering)
- Pydantic v2 + PyYAML (
HarnessSpecvalidation and serialization) - FastAPI + uvicorn (web wizard, optional
[wizard]extra — never shipped into products)
Generated product
- Runtime:
openai(Chat Completions, provider-agnostic viabase_url),anthropic(native Messages),pydantic+pydantic-settings,pyyaml,typer - Web interface (when enabled):
fastapi,uvicorn,ruamel.yaml(comment-preserving config write-back); the UI is a single static page (Tailwind CSS via CDN, no build step) - MCP (when enabled): the official
mcpSDK - Tests:
pytestwith an offline mock LLM (dev dependency group; not a runtime dependency) - Environment contract: uv (
uv.lock+.python-version) with Docker andrequirements.txtfallbacks
The generated pyproject.toml contains no agent-orchestration framework, and the test suite asserts it.
Getting started
Prerequisites
- uv (uv provisions the correct Python automatically; no system Python required)
- Docker (optional, for containerized runs)
Installation
HarnessSmith is on PyPI. The Quick start needs no install — uvx runs it on demand and the cloned launcher bootstraps uv for you. For a persistent install of the generator CLI, any of these work:
uv tool install harnessmith # install the CLI with uv
pip install harnessmith # or with pip
# from source (for development):
git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith && uv sync
Generating a harness
The Quick start covers the wizard and the one-click launcher. From an installed or cloned generator, every generation surface is a single command:
uv run harnessmith wizard # web wizard (recommended; uv sync --extra wizard)
uv run harnessmith new # interactive terminal wizard
uv run harnessmith new my-agent --preset coding-assistant # non-interactive, from a bundled preset
uv run harnessmith new my-agent --spec ./harness.spec.yaml # non-interactive, from a hand-written spec
uv run harnessmith doctor # preflight check of the local toolchain
- The web wizard (
wizard) and the terminal wizard (newwith no--spec/--preset) collect the same structural choices — display name, paradigms, web interface, MCP, skills, memory — and apply identical defaults; the web wizard suits desktops, the terminal wizard suits headless servers. - After rendering, the generator locks dependencies and runs a smoke verification (
uv sync, import check, one mock function-calling turn,pytest). Pass--no-verifyto skip it, for example when offline. - Secrets are never collected by any wizard and never enter the spec, the generated
config.yaml, or git. - The
--presetshortcut is for scripted or CI generation, not the recommended start: the bundledcoding-assistantpreset enables MCP with every tool allowlisted — shell and file writes included — and no confirmation gate (confirm: none). Prefer a wizard, or reviewconfig.yaml(tighten the tool allowlist, setconfirm: high) before pointing it at a real model.
Running the generated harness
The Quick start gets you chatting via the launcher or serve. Beyond that, a few essentials — provision the venv (the launcher does it for you), store a key, probe the model, or run fully containerized:
cd my-agent && uv sync # provision Python + an isolated venv
uv run my-agent set-key OPENAI_API_KEY # store a key in .env (never echoed) — or use the web /config panel
uv run my-agent test-llm # probe each configured model
# run fully containerized instead (Dockerfile generated by default):
docker build -t my-agent . && docker run --rm -it my-agent
Model and endpoint live in config.yaml (or the web /config LLM tab): set model, point base_url_env / api_key_env at the right env vars, and choose provider: openai or anthropic per profile. --mock runs offline without a key on run, chat, and serve. Every command is in the reference below.
Product CLI reference
| Command | Purpose |
|---|---|
run [PROMPT] |
Execute one turn. Options: --mode agent|plan|ask, --stream, --continue, --resume <id>, --role, --cwd, --mock |
chat |
Multi-turn REPL with persistent sessions; /cwd sets the working-directory hint; Ctrl-D or /exit to quit |
serve |
Start the web interface (--host, --port, --open); web-enabled products |
info |
Introspect every extension point — registered tools, paradigms, context strategies/conditions, memory backends, imported extensions, and mounted hooks |
test-llm |
Connectivity and capability probe for each configured LLM profile |
set-key <ENV_NAME> |
Write a secret into .env without echoing it or touching git |
usage |
Inspect or clear the persistent per-LLM cost ledger |
memory show|clear|path|consolidate |
Manage the long-term memory note; memory-enabled products |
mcp status / mcp warm |
Probe MCP server health / pre-warm launchers; MCP-enabled products |
Configuration model
| Layer | File | Role |
|---|---|---|
| Generation-time spec | harness.spec.yaml |
The recipe: which capabilities are compiled into the product, plus initial values. A snapshot is kept in the generated repository |
| Runtime configuration | config.yaml |
The authority for behavior: LLM profiles and roles, prompts and rule files, tool allowlist, context strategy, hooks, MCP servers, prices and cost limits, observability. Editable by hand or via the web /config panel (live application + comment-preserving write-back) |
| Secrets | .env (gitignored) |
The only location for real credentials. config.yaml and the spec reference environment-variable names only |
Structural changes (adding or removing an interface or module) require regeneration; behavioral changes never do.
Security model
- Secrets never enter git. Real values live exclusively in the gitignored
.env; all other files reference environment-variable names.set-keyand the web panel's key writer are write-only and never echo values. Traces and the debug log record no secrets. - High-risk tools are off by default. Shell and file-writing tools ship disabled and require explicit allowlisting; the runtime allowlist can only narrow the set compiled in at generation time, never extend it.
- Human-in-the-loop confirmation (
tools.confirm: none|high|all|<tool names>) intercepts risky tool calls withallow once / reject / allow for session / allow always; non-interactive contexts reject by default. Confirmation is a guardrail for trusted operators, not a security boundary — hard isolation belongs to Docker or to excluding the capability at generation time. - The web interface targets local, trusted use. The
/configpanel and the MCP management page can modify runtime behavior and launch local processes; do not expose them to untrusted networks.
License
MIT © 2026 EpisodeYu
HarnessSmith
锻造你自己的 agent harness。
一个"配置即生成"的代码生成器,产出一套你完全拥有的独立 agent harness 代码仓库——不绑定任何 agent 编排框架(无 LangChain、LangGraph、ADK),生成后不再依赖 HarnessSmith。
English | 中文
快速上手
选择启动配置向导的方式——两种都从头生成一套独立、无框架锁定的 harness(能力由你勾选:范式、Web 界面、MCP、技能、记忆),并在交付前完成冒烟自检。
克隆仓库并运行一键启动器——跨平台(Windows、macOS、Linux),无需记命令,双击文件也能跑。它会在首次使用时安装 uv,随后询问你要 Web 向导(推荐)还是终端向导:
git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith
./HarnessSmith.sh # macOS / Linux —— 或直接双击
HarnessSmith.bat # Windows —— 或直接双击
或用 uv 按需运行——无需克隆,不留常驻安装:
uvx --from "harnessmith[wizard]" harnessmith wizard --open # Web 向导(推荐)
uvx harnessmith new # 或终端交互向导
随后启动它。生成的仓库自己也带一个一键启动器,以项目名命名(如 macOS / Linux 的 My-Agent.sh、Windows 的 My-Agent.bat):双击或直接运行,它会自动同步依赖并打开 Web 聊天——模型与 key 直接在 /config 面板里配置。或用 uv 启动:
cd my-agent # 你填写的项目名
uv run my-agent serve --open # 推荐:在浏览器里配置模型与 key 并对话
uv run my-agent chat # 可选:在终端对话
见使用指南了解全部方式,包括用手写 spec 非交互生成。
为什么选 HarnessSmith?
多数"agent 脚手架"给你的是一个绑死在某框架上、无法移除的应用。HarnessSmith 生成的是一个属于你的仓库——可读、可改、且不含任何 agent 编排框架。
| HarnessSmith | LangGraph 脚手架 | ADK / 静态模板 | |
|---|---|---|---|
| agent 框架锁定 | 无——属于你的纯 Python 循环 | LangGraph 运行时 | ADK / 多框架 |
| 每一行都可改可删 | 可以——自包含仓库 | 受限——应用建在框架之上 | 受限 |
| 配置即生成(只生成选中的能力) | 是 | 否——固定模板 | 否 |
| 核心循环体量 | 约 150–300 行 | 被框架抽象 | 被框架抽象 |
| 生成后对生成器的运行期依赖 | 无 | 框架运行时 | 框架运行时 |
| 自带测试 + 锁文件 + Dockerfile + 冒烟自检 | 是 | 视情况 | 视情况 |
| 双 LLM 协议、运行期可切(OpenAI + Anthropic) | 是 | 视情况 | 视情况 |
概述
HarnessSmith 是 agent harness 的生成器,定位类似 create-next-app。通过 Web 向导、终端交互向导、preset 或手写 YAML 采集一份规格(HarnessSpec),HarnessSmith 据此渲染出一个完整、独立的 Python 代码仓库——可读、可改、可测试、可独立运行。生成的项目与 HarnessSmith 没有任何运行期关系:生成即脱离。
设计定位
- 无 agent 框架锁定。 生成代码对任何 agent 编排框架零依赖,循环是属于你的普通 Python 代码。通用库(OpenAI SDK、Pydantic、Typer、FastAPI)只作为库使用,不接管控制流。
- 代码归你所有。 产出是带有自有测试、锁文件、Dockerfile 与文档的自包含仓库,每一行都可以阅读、修改或删除。
- 配置即生成。 能力在生成期选择,生成器只渲染被选中的部分;关闭的功能不留任何痕迹——没有模块、没有依赖、没有死代码。
- 默认极薄。 默认产物是最小但完整可跑的 harness,核心循环维持在数百行以内;较重的能力(MCP、Web 界面、技能、记忆)均为 spec 开关式可选项。
亮点
完整能力清单 —— 点击展开
- 原生 function calling —— 循环通过 API 的
tool_calls(TAO/ReAct 语义)驱动模型,而非文本解析。 - 双 LLM 协议,运行期可切 —— 每个产物同时内置 OpenAI Chat Completions 客户端(经
base_url对接 vLLM、Together、Groq、LiteLLM 等任意兼容端点)与原生 Anthropic Messages 客户端;每个 LLM profile 在运行期配置中选择provider,无需重新生成。 - 推理过程一等公民 —— thinking/reasoning 增量实时呈现(CLI 状态行、Web 可折叠推理面板),并在工具调用多轮间保留
reasoning_content,兼容有此要求的模型。 - 多范式运行时 ——
agent(默认工具调用循环)、plan与ask(均只读),每轮可切(--mode/ Web 下拉)。范式存放于薄注册表,用户以@register_paradigm自行扩展,不触碰内置实现。 - 会话持久化与续聊 —— 每次对话本地落盘;以
--continue/--resume <id>、多轮chatREPL 或 Web 会话侧栏(自动起标题、重命名、删除)续聊。Web 界面支持多会话并行——每个会话独立流式输出,切换侧栏不会打断后台运行。中断的运行具备崩溃安全:状态在消息边界写入检查点,恢复时自动修复。 - 停止 / 继续 / 重问 —— 回合中途可取消(协作式取消,流式输出一并终止),之后携带完整上下文继续;Web 界面支持就地编辑任一历史提问并从该点重新生成。
- 人在环交互 —— 内置
ask_question工具让模型向用户提出结构化澄清问题;工具调用确认(允许一次 / 拒绝 / 本会话允许 / 永久允许)拦截高风险工具,非交互场景默认拒绝。 - 按 LLM 持久成本核算 —— 用量账本按 LLM profile 跨运行累计 token;成本由各 profile 单价派生,达到
cost_limit即在下次调用前阻止该模型。经 Web Budget 页或usageCLI 管理。 - 上下文管理 —— 可组合触发条件(
window_pct、max_tokens、max_turns,以真实 token 用量驱动)决定何时压缩;策略(truncate、summarize、none)决定如何压缩;两者均为用户可扩展的注册表。超大工具结果在入历史前截断,溢出时按需强制压缩;max_steps阀值约束失控的工具循环,压缩还能在单个长回合内折叠(子回合keep_last_steps),使一个 agentic 回合也能留在窗口内。 - 可组合 hooks 与薄 tool-policy 层 —— 通过
config.hooks挂载一个或多个Hooks子类(子类化并挂载,无@register_hook);五个生命周期点(before_step/after_step/before_tool/after_tool/on_error)。before_tool可拒绝一次调用、after_tool可改写/脱敏结果——一层代码级 policy 门禁,不引入 middleware 机制——多个 hook 按顺序组合。Web 界面有专门的 Hooks tab 并带隐私安全的执行日志;info(CLI)与GET /registries(Web)呈现所有扩展点——工具、范式、上下文策略/触发条件、记忆后端、已导入的 extensions 与已挂载的 hooks。 - 按会话工作目录 —— 可选的工作目录提示(CLI
--cwd、chatREPL 的/cwd,或 Web 聊天工具栏的目录浏览器)注入系统提示,作为指引而非沙箱。当前日期/时间也会每轮注入。 - 不臃肿的工具生态 —— 装饰器注册的工具注册表带按工具风险分级,另有可选 MCP 客户端(stdio、HTTP、SSE 三种传输)与精选 catalog(免密钥多引擎网页搜索、fetch、git、时间、Desktop Commander、GitHub)。默认网页搜索 server 免密钥且多引擎(Bing/百度/DuckDuckGo/Brave/搜狗/…)带自动 failover,因此在某些引擎慢或不可达的网络下也能正常工作。Node 系 MCP server 会被
npm install进固定的按 server 独立目录、再用node直接启动(不走临时npx缓存——它在 Windows 上不可靠)。MCP server 运行期管理:健康状态、增删改、热重连(Web 面板),CLI 侧mcp status。 - Agent Skills —— 可选支持开放的
SKILL.md标准与渐进披露;技能是纯文件,不引入框架。 - 跨会话记忆 —— 可选的自维护长期笔记,每轮注入系统提示,经工具写入,在会话边界由专用 LLM 角色整理,并可通过薄
@register_memory注册表替换后端。 - 全局规则常驻注入 —— markdown 规则文件(
AGENTS.md/CLAUDE.md/.cursor/rules惯例)注入每轮系统提示。 - 完整可观测性 —— 每次运行产出 JSONL trace 与 token/成本计数;可选的仅本地 debug 日志记录生命周期事件(名称、计数、耗时),绝不记录消息内容、工具参数或密钥。
- 交付前验证可运行 —— 生成器锁定依赖并对每个新仓库执行冒烟验证(
uv sync、import 检查、一次 mock function-calling、pytest),全绿才视为就绪。
生成内容
核心(始终生成)
| 能力 | 说明 |
|---|---|
| Agent 循环 | 原生 function-calling 循环,含范式分发、生命周期 hook 与优雅停止(含 max_steps 阀值) |
| LLM 层 | profile 注册表 + 角色路由(generation、compaction,以及可选 title / memory 角色),按 profile 的采样参数、超时/重试/fallback,双协议客户端(OpenAI 兼容 + 原生 Anthropic) |
| 工具注册表 | 装饰器注册 + 风险分级;高风险工具默认关闭,仅 allowlist 显式开启 |
| Hooks 与 policy | 经 config.hooks 挂载的可组合 Hooks 子类;observer + tool-policy 生命周期点(before_tool 拒绝 / after_tool 脱敏);经 info / GET /registries 提供扩展可发现性 |
| 会话 | 本地 JSON 持久化、--continue / --resume、chat REPL、崩溃安全检查点、按会话工作目录提示 |
| 交互层 | ask_question 结构化澄清 + HITL 工具确认,CLI/Web 共用同一套底座 |
| 上下文 | 触发条件/策略双注册表、工具结果截断、溢出自救、max_steps 约束、子回合折叠 |
| 预算 | 按 LLM 持久成本账本,按 profile 设单价与硬性成本上限 |
| 提示词 | 系统提示拼装 + 规则文件常驻注入、当前日期/时间、工作目录提示 |
| 可观测性 | JSONL trace + token/成本计数;可选仅本地 debug 日志 |
| CLI | run、chat、info、test-llm、set-key、usage(启用对应模块时另有 serve、mcp、memory) |
| 可运行性 | uv.lock + .python-version、Dockerfile + .dockerignore + devcontainer、requirements.txt pip 兜底、mock LLM 测试套件、一键启动脚本 |
可选模块(spec 开关;关闭 = 代码与依赖中均不存在)
| 模块 | 说明 |
|---|---|
| Web 界面 | FastAPI + SSE 聊天,token 级流式、可折叠推理与工具调用面板、支持多会话并行的会话侧栏、聊天工具栏(范式、生成模型、工作目录选择),以及分页双语(中/英)/config 面板——LLM、Context、Tools、MCP、Hooks、Paradigms、Prompts、Budget、Memory、Observability、System 各 tab(MCP 与 Memory 仅在对应模块启用时出现)。修改即时生效并回写 config.yaml(保留注释) |
| MCP 工具 | Model Context Protocol 客户端(stdio / HTTP / SSE),allowlist 与风险标记,精选 catalog 预填,运行期 server 管理(健康探测 + 热重连) |
| Agent Skills | SKILL.md 发现、元数据注入与按需加载 |
| 长期记忆 | 自维护 markdown 笔记,工具驱动写入、策略塑形、整理压缩,后端可插拔 |
架构
生成器与产物是严格分离的两层。spec 决定结构(哪些能力被编译进产物);产物的 config.yaml 是行为的运行期权威(模型、提示词、工具 allowlist、上下文参数、单价与限额)——全部可在不重新生成的前提下调整。架构图见英文部分 Architecture。
技术栈
生成器
- Python ≥ 3.11,全链路使用 uv 管理
- Typer(CLI)、questionary(终端交互向导)
- Jinja2(模板渲染)
- Pydantic v2 + PyYAML(
HarnessSpec校验与序列化) - FastAPI + uvicorn(Web 向导,可选
[wizard]extra——绝不进入产物)
生成产物
- 运行期:
openai(Chat Completions,经base_url对接任意兼容端点)、anthropic(原生 Messages)、pydantic+pydantic-settings、pyyaml、typer - Web 界面(启用时):
fastapi、uvicorn、ruamel.yaml(保留注释的配置回写);前端为单一静态页面(Tailwind CSS CDN,无构建步骤) - MCP(启用时):官方
mcpSDK - 测试:
pytest+ 离线 mock LLM(dev 依赖组,非运行期依赖) - 环境契约:uv(
uv.lock+.python-version),Docker 与requirements.txt兜底
生成的 pyproject.toml 不含任何 agent 编排框架,且测试套件对此作出断言。
使用指南
前置条件
- uv(uv 会自动下载匹配的 Python,无需预装系统 Python)
- Docker(可选,用于容器化运行)
安装
HarnessSmith 已发布到 PyPI。快速上手无需安装——uvx 按需运行,克隆后的启动器也会自动准备 uv。若要常驻安装生成器 CLI,以下任选其一:
uv tool install harnessmith # 用 uv 安装 CLI
pip install harnessmith # 或用 pip
# 从源码(开发用):
git clone https://github.com/EpisodeYu/HarnessSmith.git && cd HarnessSmith && uv sync
生成 harness
快速上手已覆盖向导与一键启动器。从已安装或克隆的生成器出发,每个生成入口都是单命令:
uv run harnessmith wizard # Web 向导(推荐;uv sync --extra wizard)
uv run harnessmith new # 终端交互向导
uv run harnessmith new my-agent --preset coding-assistant # 非交互,使用内置 preset
uv run harnessmith new my-agent --spec ./harness.spec.yaml # 非交互,使用手写 spec
uv run harnessmith doctor # 本机工具链预检
- Web 向导(
wizard)与 终端向导(new不带--spec/--preset)采集同一组结构选项——显示名、范式、Web 界面、MCP、技能、记忆——并应用一致的默认值;Web 向导适合桌面环境,终端向导适合无图形界面的服务器。 - 渲染完成后,生成器锁定依赖并执行冒烟验证(
uv sync、import 检查、一次 mock function-calling、pytest);离线等场景可用--no-verify跳过。 - 任何向导都不采集密钥;密钥不会进入 spec、生成的
config.yaml或 git。 --preset是面向脚本 / CI 生成的捷径,并非推荐起点:内置的coding-assistantpreset 会开启 MCP 并把每个工具都加入 allowlist(含 shell 与写文件),且不设确认门禁(confirm: none)。请优先用向导,或在接入真实模型前先检查config.yaml(收窄 allowlist、把confirm设为high)。
运行生成的 harness
快速上手已用启动器或 serve 让你开始对话。此外还有几条实用命令——准备 venv(启动器会替你做)、存 key、探测模型,或完全容器化运行:
cd my-agent && uv sync # 准备 Python 与隔离 venv
uv run my-agent set-key OPENAI_API_KEY # 把 key 写入 .env(不回显)——也可在 Web /config 面板里配置
uv run my-agent test-llm # 探测各配置模型
# 改为完全容器化运行(默认生成 Dockerfile):
docker build -t my-agent . && docker run --rm -it my-agent
模型与端点在 config.yaml(或 Web /config 的 LLM tab)配置:设置 model,将 base_url_env / api_key_env 指向对应环境变量,并为每个 profile 选择 provider: openai 或 anthropic。run、chat、serve 均支持 --mock,无需 key 即可离线试用。完整命令见下方参考。
产物 CLI 参考
| 命令 | 用途 |
|---|---|
run [PROMPT] |
执行一轮。选项:--mode agent|plan|ask、--stream、--continue、--resume <id>、--role、--cwd、--mock |
chat |
多轮 REPL,会话自动持久化;/cwd 设置工作目录提示;Ctrl-D 或 /exit 退出 |
serve |
启动 Web 界面(--host、--port、--open);启用 Web 的产物 |
info |
内省所有扩展点——已注册的工具、范式、上下文策略/触发条件、记忆后端、已导入的 extensions 与已挂载的 hooks |
test-llm |
对每个 LLM profile 做连通性与能力探测 |
set-key <ENV_NAME> |
将密钥写入 .env,不回显、不触碰 git |
usage |
查看或清空按 LLM 的持久成本账本 |
memory show|clear|path|consolidate |
管理长期记忆笔记;启用记忆的产物 |
mcp status / mcp warm |
探测 MCP server 健康 / 预热启动器;启用 MCP 的产物 |
配置模型
| 层 | 文件 | 角色 |
|---|---|---|
| 生成期 spec | harness.spec.yaml |
配方:哪些能力被编译进产物,以及初始值;快照保留在生成的仓库中 |
| 运行期配置 | config.yaml |
行为的权威来源:LLM profile 与角色、提示词与规则文件、工具 allowlist、上下文策略、hooks、MCP server、单价与成本上限、可观测性。可手改,也可经 Web /config 面板修改(即时生效 + 保留注释回写) |
| 密钥 | .env(gitignored) |
真实凭证的唯一存放处;config.yaml 与 spec 仅引用环境变量名称 |
结构性变更(增删接口或模块)需要重新生成;行为性变更永远不需要。
安全模型
- 密钥不入 git。 真实值仅存于 gitignored 的
.env;其余文件只引用环境变量名。set-key与 Web 面板的密钥写入均为只写、不回显;trace 与 debug 日志不记录密钥。 - 高风险工具默认关闭。 shell 与写文件类工具默认禁用,需显式 allowlist 开启;运行期 allowlist 只能在生成期编译进的集合内收窄,永远不能扩张。
- 人在环确认(
tools.confirm: none|high|all|<工具名>)以"允许一次 / 拒绝 / 本会话允许 / 永久允许"拦截高风险工具调用;非交互场景默认拒绝。确认机制是面向可信操作者的护栏,不是安全边界——硬隔离依靠 Docker,或在生成期就不编译该能力。 - Web 界面面向本地可信使用。
/config面板与 MCP 管理页可修改运行期行为并启动本地进程,请勿暴露给不可信网络。
许可
MIT © 2026 EpisodeYu
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file harnessmith-0.2.0.tar.gz.
File metadata
- Download URL: harnessmith-0.2.0.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7af5b024beffce2e01ad4399a960416dc0b10c18320e40d63fdbc20f2dc733ea
|
|
| MD5 |
c31f17851ad486fa37d1eced54a7ec00
|
|
| BLAKE2b-256 |
bfbb942e9d3a2df8f43248e0e5b880266ca5520f6e3417d0a866caba0bd6e0c5
|
File details
Details for the file harnessmith-0.2.0-py3-none-any.whl.
File metadata
- Download URL: harnessmith-0.2.0-py3-none-any.whl
- Upload date:
- Size: 408.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5d06429535c08d122d86129fffd1ee3da5e072fffedaf77a4484663691bbb839
|
|
| MD5 |
c325848da9941cef1f9f1d47e0ac8d13
|
|
| BLAKE2b-256 |
7b32b82c5daf7438fd915377a19a092fe411d7d2aee51d8cce392f5a26468cc3
|