Hermit keeps Claude Code as the orchestrator while a local MCP executor handles Codex-aware coding work.
HermitAgent
Run Claude Code 50–80% cheaper: Hermit is an MCP executor with Codex-first fallback routing (codex → z.ai → local) while Claude stays the orchestrator.
HermitAgent plugs into Claude Code as an MCP sub-agent. Claude keeps doing what it is best at — planning, interviewing, code review — and delegates the high-token grunt work (file edits, test runs, commit/push, refactors) to a low-cost executor (Codex, local ollama, or flat-rate z.ai).
v0.3.x highlights (cost-first execution)
- Codex support is first-class: Hermit can run tasks via Codex, with gpt-5.4 at medium reasoning as the default Codex lane.
- Auto model routing when model is omitted: configurable via routing.priority_models in settings.json; providers that are not configured/installed are skipped automatically (default chain: gpt-5.4 medium -> glm -> local ollama).
- Explicit model requests are strict: if you ask for a specific model and it is unavailable, Hermit returns a clear unavailable error instead of silently switching providers.
- MCP + gateway auto-start: bin/mcp-server.sh now ensures the local gateway is up, so Claude Code/Codex startup is simpler.
┌──────────────┐   MCP    ┌──────────────┐  any OpenAI-compatible  ┌───────┐
│ Claude Code  │ ───────▶ │ HermitAgent  │ ──────────────────────▶ │  LLM  │
│  (planner)   │          │  (executor)  │                         └───────┘
└──────────────┘          └──────────────┘
      $$$                  ~$0 / flat-rate
Why
Claude Code is great, but a /feature-develop session easily burns 100k+ Claude tokens on mechanical work — reading files, running pytest, formatting diffs, writing conventional-commit messages — that any competent code model can do. HermitAgent exposes three MCP tools (run_task, reply_task, check_task) so Claude Code can delegate whole skills to a cheaper model while the user stays in the familiar Claude Code UI.
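The delegation boundary is just those three MCP tool calls. A sketch of what a run_task invocation could look like on the wire (the argument names here are illustrative assumptions, not Hermit's actual tool schema):

```python
# Illustrative only: the shape of an MCP tools/call frame Claude Code would
# send to delegate work. Field names inside "arguments" are assumptions.
import json

run_task_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_task",
        "arguments": {
            "task": "run pytest and fix the failing test in tests/test_api.py",
            "model": "qwen3-coder:30b",  # omit to fall back to routing.priority_models
        },
    },
}
print(json.dumps(run_task_call, indent=2))
```

Claude then polls with check_task and answers executor questions with reply_task, all without the executor's tokens touching the Claude bill.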
How much it actually saves
Measured numbers live under benchmarks/results/ — each file is one independent run pair.
We deliberately don't publish a single marketing percentage here. Savings depend on the task, the repo, and the executor you choose; a headline number without that context is noise. If you want a reproducible datapoint:
1. Copy the starter twice:
cp -r benchmarks/todo-api/starter /tmp/cc-run && cp -r benchmarks/todo-api/starter /tmp/cc-hermit-run
2. Run /feature-develop <task> in one and /feature-develop-hermit <task> in the other (task spec: benchmarks/todo-api/TASK.md).
3. Feed the two Claude Code session logs into scripts/measure-savings.sh — it prints a markdown table you can paste into benchmarks/results/.
Full protocol: docs/measure-savings.md. Executor cost is treated as $0 (ollama = free, z.ai / GLM = flat-rate); what we compare is Claude-side tokens and USD.
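What the comparison reports boils down to simple arithmetic on the paired runs. A minimal sketch (the helper name is invented here, not part of scripts/measure-savings.sh):

```python
def claude_savings(baseline_usd: float, hermit_usd: float) -> float:
    """Percent of Claude-side spend removed by delegating to Hermit.

    Executor cost is treated as $0 (ollama is free, z.ai/GLM is flat-rate),
    so only the Claude-side USD of the two paired runs is compared.
    """
    if baseline_usd <= 0:
        raise ValueError("baseline run must have nonzero Claude spend")
    return 100.0 * (baseline_usd - hermit_usd) / baseline_usd

# e.g. a $4.20 plain run vs $1.05 with the -hermit variant
print(f"{claude_savings(4.20, 1.05):.0f}% Claude-side savings")  # → 75% Claude-side savings
```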
What it gives you
The product is the pattern, not any specific skill:
Claude does reasoning, judgment, and quality gates. A cheap local / flat-rate executor does the grunt work. They talk to each other over MCP, so the switch is one word in a slash command.
The repo ships this pattern as four example skills under .claude/commands/ so you can see it in action and fork them into whatever workflow you already have:
- /feature-develop-hermit — Claude interviews, Hermit implements and tests
- /code-apply-hermit — Claude reads the PR review, Hermit applies every line
- /code-polish-hermit — Claude picks what to polish, Hermit runs the lint/test loop
- /code-push-hermit — Claude writes the PR description, Hermit does the commit/push
These are reference implementations, tuned for the author's own workflow (GitHub PR-centric). The goal is for Claude to stay on the "small Claude tokens, big quality impact" work — interviews, review, final verification — and for everything high-volume or mechanical (Read × N, Edit × N, Bash/pytest loops, commit-message writing) to land on Hermit. Write your own -hermit variant for the skills you actually live in; the "add your own" recipe in docs/hermit-variants.md takes only a few steps.
Everything else in the repo is there to make that pattern work cleanly:
- MCP server (run_task / reply_task / check_task) with bidirectional conversation — Hermit can ask Claude mid-task
- Skill compatibility — same SKILL.md format and YAML frontmatter as Claude Code; skills under ~/.claude/skills/ are shared read-only
- Progressive-disclosure rule system — foundational rules stay auto-injected, contextual rules are on-demand skills (cuts the session prefix from ~12k to ~3k tokens)
- Gateway (FastAPI + SSE) in front of the executor LLM — 429 fail-fast + failover, cache hints, dashboard at :8765
- Model routing by name + auto chain — explicit names route by provider (gpt-*-codex / gpt-5.4 → Codex, glm-* → z.ai, name:tag → local ollama); an omitted model follows routing.priority_models
- Permission floor — .env, *.pem, *.key, credentials* blocked across every mode (even YOLO)
- Self-learning skills with model-aware lifecycles (validated-on models, 30-day auto-deprecation, needs_review on model swap)
- Optional standalone TUI (React + Ink) for when you want to use Hermit without Claude Code in the loop
How is this different from …?
| Project | Pattern | Trade-off |
|---|---|---|
| claude-code-router | Redirects all CC traffic to another provider | You lose Claude quality; the "Claude Code" session is really the local model |
| LiteLLM | Generic multi-provider proxy | Not coding-specific, no understanding of CC workflow |
| OpenHands / aider | Standalone agent, replaces Claude Code | Full migration away from CC; big UX change |
| Anthropic Agent SDK | Official sub-agent framework | DIY: you still write the executor, the local-model wiring, the MCP glue |
| HermitAgent | Claude stays the orchestrator; Hermit is the executor | Narrower scope, but drop-in: /foo → /foo-hermit |
If you don't use Claude Code, you don't need HermitAgent. If you do, and the monthly bill or the rate limits are a problem, this is what it is for.
Where the project is heading
The bundled skills still give Claude the full interview phase before delegating. The direction this project is moving in is the opposite: Claude does only the final verification pass — the executor does the interview, the plan, the implementation, the tests, the commit — and Claude is only woken up at the end to reject, accept, or ask for a narrow revision. The less Claude does, the more of the bill disappears. The existing -hermit skills are the conservative checkpoint on that spectrum; your own variants can push further.
Install
npm-first (clone-free)
npm install -g @cafitac/hermit-agent
hermit setup-codex
Or, if you only want the Claude-facing setup path first:
npm install -g @cafitac/hermit-agent
hermit setup-claude
The npm package is a thin launcher. On first run it bootstraps a managed Python runtime under ~/.hermit/npm-runtime, installs cafitac-hermit-agent from PyPI, and then forwards to the normal Hermit CLI. That means you can start from npm alone without cloning the repo first, while Hermit's Python runtime remains the real implementation.
When the npm-installed launcher detects a newer published version, hermit now prints a compact update hint on stderr. To upgrade the wrapper directly, run:
hermit self-update
Requirements for the npm-first path:
- Node.js 20+
- Python 3.11+
Source checkout
git clone https://github.com/cafitac/hermit-agent.git
cd hermit-agent
./install.sh
Or, once installed from PyPI / editable mode, use the guided installer entrypoint:
pip install cafitac-hermit-agent
hermit install
PyPI distribution name: cafitac-hermit-agent
Installed CLI commands remain: hermit, hermit-agent, hermit-gateway, hermit-setup
hermit install is the product-facing setup path: it asks a few yes/no questions, repairs or creates the local gateway API key, ensures the local gateway is running, offers MCP registration in ~/.claude.json, and can install or refresh the Codex channels path without making the user edit config files manually.
What the installer does automatically:
- Creates a project-local .venv, bootstraps uv inside it, and runs uv pip install -e '.[test]'.
- Writes a default ~/.hermit/settings.json (model glm-5.1, gateway URL http://localhost:8765, etc.).
- Prompts "Generate a random gateway API key now?" If you accept, it applies the schema in hermit_agent/gateway/migrations/001_initial.sql to ~/.hermit/gateway.db, inserts a freshly generated hermit-mcp-<random> key, and patches gateway_api_key in your settings file.
- Prompts "Pull a local coding model via ollama?" (skipped automatically if ollama is not installed). Accepting pulls qwen3-coder:30b (~18 GB).
- Symlinks the four bundled -hermit slash commands into ~/.claude/commands/.
- Prompts "Register Hermit MCP server in ~/.claude.json?" with three choices: (a) project-specific, (b) user-wide, (c) skip. On accept, merges a hermit-channel stdio entry pointing at ./bin/mcp-server.sh into ~/.claude.json (backup: ~/.claude.json.backup-<ts>). Safe on re-runs — an identical entry is detected and left alone. That launcher now auto-starts the local gateway on demand (and skips the start when the gateway is already healthy).
- Prompts "Add hermit alias to <rc-file>?" so you can run hermit from any shell. If an existing alias points to an old path (e.g. from before the bin/ move), the installer offers to update it.
- Prints any "Pending manual steps" at the end — e.g. a reminder to launch Claude Code with --dangerously-load-development-channels server:hermit-channel.
Useful flags:
./install.sh --no-api-key # skip the API key prompt (use placeholder)
./install.sh --no-ollama # skip the ollama prompt
./install.sh --skip-venv # reuse an existing .venv
./install.sh --no-mcp-register # skip the ~/.claude.json registration prompt
./install.sh --no-alias # skip the shell-rc alias prompt
Every prompt is idempotent: re-running the installer detects the existing API key, MCP entry, alias, and ollama model and reports them unchanged instead of duplicating.
Codex async-interaction path (experimental)
If you want Codex approval requests and free-text waits to flow through Hermit without manual polling, run:
hermit-agent setup-codex
That command:
- keeps Hermit as the public Codex-facing surface
- prepares Hermit's local async approval / reply path for Codex
- writes the needed project-local Hermit settings
- bootstraps the Codex discovery assets Hermit needs
- registers the local Codex marketplace root automatically
- prefers Codex app-server visible prompts when the host exposes that surface
- removes Hermit's legacy Codex UserPromptSubmit reply hook if it was installed earlier
- runs a compact local smoke check before reporting success
From an operator point of view, the important check is still:
codex mcp list
codex mcp get hermit-channel
If hermit-channel is present, Codex is pointed at Hermit correctly. The rest of the async reply transport is an internal Hermit implementation detail.
The preferred path is package-first. If Hermit's default packaged Codex support is unavailable, it can still fall back to a built sibling checkout via HERMIT_CODEX_CHANNELS_SOURCE_PATH (or ../codex-channels).
The default path is user-scope and local-first so Codex can discover the bridge across workspaces; remote backends stay optional and out of the critical path.
Claude-focused setup path
If you want the Claude-facing install path without also provisioning the Codex async runtime, run:
hermit setup-claude
This uses the same install flow, but skips the Codex runtime/bootstrap work and focuses on the gateway, API key, and Claude MCP registration path.
To reverse everything: ./uninstall.sh walks back through the same steps with per-item prompts (--yes accepts all; --keep-data leaves ~/.hermit/ alone). Ollama models are never deleted — remove manually with ollama rm <model>.
The hermit launcher transparently starts the gateway daemon if it isn't already running (HERMIT_AUTO_GATEWAY=0 opts out), so you never need to remember to run ./bin/gateway.sh --daemon first. The MCP launcher (./bin/mcp-server.sh) now does the same check-and-start flow, which makes both Claude Code and Codex able to bring up the full Hermit stack from the MCP entrypoint alone.
Pick an executor LLM
ollama (local, $0) — either accept the installer prompt, or:
brew install ollama
ollama pull qwen3-coder:30b
z.ai Coding Plan (flat-rate subscription) — add your z.ai key to ~/.hermit/settings.json:
{
"gateway_url": "http://localhost:8765",
"gateway_api_key": "hermit-mcp-…",
"model": "glm-5.1",
"providers": {
"z.ai": {
"base_url": "https://api.z.ai/api/coding/paas/v4",
"api_key": "<your z.ai key>",
"anthropic_base_url": "https://api.z.ai/api/anthropic"
}
}
}
Two keys, two layers: gateway_api_key authenticates clients against the local Hermit gateway, and providers[<slug>].api_key is the gateway's own credential for talking to the upstream platform. Add a new provider by dropping another block into providers — e.g. providers["anthropic"] with a base_url + api_key.
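A minimal sketch of the two layers, assuming the settings shape above (the lookup logic is illustrative, not Hermit's actual code):

```python
# Sketch of the two auth layers as the gateway sees them. The settings shape
# follows this README; the functions are invented for illustration.
settings = {
    "gateway_api_key": "hermit-mcp-example",           # layer 1: client -> gateway
    "providers": {
        "z.ai": {
            "base_url": "https://api.z.ai/api/coding/paas/v4",
            "api_key": "<your z.ai key>",              # layer 2: gateway -> upstream
        },
    },
}

def upstream_credentials(client_key: str, provider: str) -> dict:
    # Layer 1: the caller must present the gateway's own key.
    if client_key != settings["gateway_api_key"]:
        raise PermissionError("401: bad gateway_api_key")
    # Layer 2: the gateway swaps in its stored upstream credential.
    return settings["providers"][provider]

creds = upstream_credentials("hermit-mcp-example", "z.ai")
print(creds["base_url"])  # clients never see the upstream api_key directly
```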
Skipped the API key prompt?
Either re-run ./install.sh (it detects a placeholder and re-prompts), or mint one manually — see docs/cc-setup.md § 2. Hermit will refuse to run until gateway_api_key is a real value, not CHANGE_ME_AFTER_FIRST_RUN.
Wire it into Claude Code
If you accepted the installer's MCP registration prompt, the hermit-channel stdio entry is already in ~/.claude.json — the remaining piece is launching Claude Code with --dangerously-load-development-channels server:hermit-channel so the channel capability is enabled. A shell alias works well:
alias cc='claude --dangerously-load-development-channels server:hermit-channel'
If you skipped the registration prompt (or want to adjust the scope later), see docs/cc-setup.md § 3 for the exact ~/.claude.json block.
Quick start — CC + Hermit (the recommended shape)
./bin/mcp-server.sh # auto-starts the gateway if needed, then serves MCP stdio
Then in Claude Code:
/feature-develop-hermit <ticket-or-short-task>
Claude interviews you about the ticket, writes the plan, and delegates the implementation to Hermit over MCP. You watch Hermit's progress in the Claude Code session; the executor tokens never hit your Claude bill.
Standalone (no Claude Code)
./bin/hermit.sh "fix the flaky test in tests/test_api.py" # one-shot CLI
./bin/hermit.sh # TUI (needs HERMIT_UI_DIR)
Two API endpoints
The gateway exposes the same upstream providers behind two wire-format-specific paths. Model routing (name:tag → local ollama, glm-* → z.ai, extensible) is identical between them.
- /v1/chat/completions — OpenAI-native (primary sharing surface). Used by the hermit CLI, anything speaking the OpenAI SDK, and ngrok-exposed friends. Tier-gated via per-key platform ACL.
from openai import OpenAI
client = OpenAI(base_url="https://<ngrok>.ngrok.app/v1", api_key="<friend-key>")
- /anthropic/v1/messages — Anthropic-native (alternative; not recommended as the Claude Code path). Enables pointing Claude Code at the gateway via ANTHROPIC_BASE_URL=http://localhost:8765/anthropic + ANTHROPIC_AUTH_TOKEN=<gateway-key>. z.ai is passthrough; ollama goes through a text-only Anthropic↔OpenAI translator (tool_use returns 400 in v1). Use only if you understand the trade-off: this bypasses HermitAgent entirely — Claude Code drives, and your CC-side tools/permissions are all that apply. The recommended integration remains CC → MCP (hermit-channel) → HermitAgent, which install.sh already sets up.
Platform ACL (operator vs friend):
Operator key (install.sh --generate-api-key, the default)
→ platforms: local, z.ai, anthropic, codex (full access)
Friend key (install.sh --generate-friend-key)
→ platforms: local (local ollama only; 403 for glm-*)
A key with zero rows in api_key_platform is denied everything (default-deny).
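The default-deny check can be sketched against an in-memory stand-in for gateway.db (the table and column names follow the README; the exact schema is an assumption):

```python
# Default-deny platform ACL: a key with zero rows in api_key_platform is
# denied everything. In-memory sqlite stands in for ~/.hermit/gateway.db.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE api_key_platform (api_key TEXT, platform TEXT)")
db.executemany(
    "INSERT INTO api_key_platform VALUES (?, ?)",
    [("operator-key", p) for p in ("local", "z.ai", "anthropic", "codex")]
    + [("friend-key", "local")],
)

def allowed(api_key: str, platform: str) -> bool:
    row = db.execute(
        "SELECT 1 FROM api_key_platform WHERE api_key=? AND platform=?",
        (api_key, platform),
    ).fetchone()
    return row is not None  # no matching row -> denied

print(allowed("friend-key", "local"))   # True
print(allowed("friend-key", "z.ai"))    # False -> the gateway answers 403 for glm-*
print(allowed("unknown-key", "local"))  # False (zero rows: default-deny)
```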
Configuration
Priority: CLI flag > env var > <cwd>/.hermit/settings.json > ~/.hermit/settings.json > defaults.
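The precedence chain amounts to a first-match lookup over layers. A sketch (the resolver below is illustrative, not Hermit's actual loader):

```python
# First-match resolution over the layers named above:
# CLI flag > env var > <cwd>/.hermit/settings.json > ~/.hermit/settings.json > defaults.
def resolve(key, cli_flags, env, project_settings, user_settings, defaults):
    for layer in (cli_flags, env, project_settings, user_settings, defaults):
        if key in layer and layer[key] is not None:
            return layer[key]
    raise KeyError(key)

# A project-local .hermit/settings.json wins over ~/.hermit/settings.json:
model = resolve(
    "model",
    cli_flags={}, env={},
    project_settings={"model": "qwen3-coder:30b"},
    user_settings={"model": "glm-5.1"},
    defaults={"model": "glm-5.1"},
)
print(model)  # → qwen3-coder:30b
```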
If model is omitted in a task request, Hermit follows routing.priority_models from settings.json. The default chain is gpt-5.4 (medium) -> z.ai -> local ollama, but any provider that is not configured or installed in the current environment is skipped automatically.
{
"gateway_url": "http://localhost:8765",
"gateway_api_key": "hermit-mcp-…",
"model": "glm-5.1",
"routing": {
"priority_models": [
{"model": "gpt-5.4", "reasoning_effort": "medium"},
{"model": "glm-5.1"},
{"model": "qwen3-coder:30b"}
]
},
"response_language": "auto",
"compact_instructions": "",
"ollama_max_loaded": 1,
"external_max_concurrent": 10
}
Common templates:
Codex-first
{
"routing": {
"priority_models": [
{"model": "gpt-5.4", "reasoning_effort": "medium"},
{"model": "glm-5.1"},
{"model": "qwen3-coder:30b"}
]
}
}
z.ai-first
{
"routing": {
"priority_models": [
{"model": "glm-5.1"},
{"model": "gpt-5.4", "reasoning_effort": "medium"},
{"model": "qwen3-coder:30b"}
]
}
}
local-only
{
"routing": {
"priority_models": [
{"model": "qwen3-coder:30b"}
]
}
}
ollama_max_loaded is how many distinct models the gateway lets ollama hold in memory simultaneously — if a request targets a not-yet-loaded model while the budget is already full, the gateway returns 503 with Retry-After instead of letting ollama swap itself into an OOM. external_max_concurrent caps in-flight requests to external providers (z.ai, OpenAI, …); excess requests queue rather than fail. This is the replacement for the old ollama-proxy — the gateway itself is safe to expose (e.g. via ngrok).
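A sketch of the two throttles under those semantics (the gateway's real implementation may differ):

```python
# Two throttles: ollama_max_loaded bounds distinct loaded local models
# (503 + Retry-After when full), external_max_concurrent queues excess
# in-flight requests to external providers instead of failing them.
import asyncio

OLLAMA_MAX_LOADED = 1
EXTERNAL_MAX_CONCURRENT = 10

loaded_models: set[str] = set()
external_slots = asyncio.Semaphore(EXTERNAL_MAX_CONCURRENT)

def admit_ollama(model: str) -> tuple[int, dict]:
    """503 + Retry-After instead of letting ollama swap itself into an OOM."""
    if model in loaded_models or len(loaded_models) < OLLAMA_MAX_LOADED:
        loaded_models.add(model)
        return 200, {}
    return 503, {"Retry-After": "30"}

async def call_external(payload: dict) -> dict:
    # Excess external requests wait for a slot here rather than fail.
    async with external_slots:
        await asyncio.sleep(0)  # stand-in for the upstream HTTP call
        return {"ok": True}

print(admit_ollama("qwen3-coder:30b"))  # → (200, {})
print(admit_ollama("glm-dev:7b"))       # budget full → (503, {'Retry-After': '30'})
```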
Field semantics after the proxy refactor:
- gateway_url / gateway_api_key — client-facing. What the hermit CLI (and any other client) sends to authenticate against the local gateway.
- providers[<slug>] — gateway-internal, upstream. Per-platform block the gateway uses to reach z.ai / Anthropic / OpenAI / etc. on your behalf. Clients never see these. Adding a new provider is one JSON block — the adapter layer picks it up by slug.
Architecture (short version)
- AgentLoop — LLM turn, tool call, result, compact when context fills
- Gateway — FastAPI layer in front of the executor: classifier, routing, failover, web dashboard
- MCP server — exposes run_task / reply_task / check_task / cancel_task for Claude Code
- Channel notifications — notifications/claude/channel frames emitted inline by the Python MCP server; Claude Code renders them as <channel source="hermit-channel"> blocks
- Skills — markdown with YAML frontmatter, hot-loaded at session start, compatible with ~/.claude/skills/
Layout
hermit_agent/ # agent, loop, tools, gateway, MCP, skills
.claude/ # this repo's own Claude Code config
scripts/harness/ # harness tooling (cc-learner.py, etc.)
tests/ # pytest suite
docs/ # user/operator docs and architecture notes
bin/ # launchers for hermit / gateway / MCP
claw-code-main/ # reference mirror / sibling workspace (not part of HermitAgent package)
hermes-agent/ # sibling project with its own workflows and release surface
react/ # standalone frontend package experiments / support surface
CI map
- Root .github/workflows/python-tests.yml — validates the hermit_agent Python package on Python 3.11–3.13.
- hermes-agent/.github/workflows/* — sibling project CI; not the root package gate.
- claw-code-main/rust/.github/workflows/* — nested Rust workspace CI; separate responsibility.
If you touch root hermit_agent/, bin/, docs, or tests/, the root workflow is the primary CI contract.
Status
Early, working, single-author. MIT. No release cadence. No roadmap promises. Clone, read the code, open an issue if something is broken.
Running tests
.venv/bin/pytest tests/ # conftest.py auto-excludes ollama-dependent tests
Boundaries
- Hermit does not modify ~/.claude/ — it only reads ~/.claude/skills/ for cross-tool skill reuse
- Hermit does not require Claude Code; it just shines brightest as its sub-agent
- Nothing phones home. Everything runs locally or through the LLM endpoint you configure
Model guardrail profiles
Hermit ships model profiles under hermit_agent/profiles/defaults/ for the main built-in lanes:
- qwen3-coder:30b
- gpt-5.4
- gpt-5.3
- glm-5.1
- unknown fallback
These profiles drive guardrail activation defaults through hermit_agent.guardrails.engine.GuardrailEngine. If you want to tune a specific model locally without forking the repo, place an override file at:
~/.hermit/profiles/<model-slug>.yaml
Examples:
- ~/.hermit/profiles/gpt-5.4.yaml
- ~/.hermit/profiles/glm-5.1.yaml
User profiles override the built-in defaults when the model id matches.
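The profile-override behavior can be sketched as a shallow merge keyed by model id (the profile field names below are invented for illustration; real profiles are YAML files on disk):

```python
# User profile at ~/.hermit/profiles/<model-slug>.yaml shadows the packaged
# default when the model id matches; unmatched ids fall back to "unknown".
def effective_profile(model_id: str, defaults: dict, user_profiles: dict) -> dict:
    base = defaults.get(model_id, defaults["unknown"])
    override = user_profiles.get(model_id, {})
    return {**base, **override}  # user keys win on conflict

defaults = {
    "glm-5.1": {"max_tool_retries": 3, "strict_diff_check": True},
    "unknown": {"max_tool_retries": 1, "strict_diff_check": True},
}
user_profiles = {"glm-5.1": {"max_tool_retries": 5}}  # e.g. from glm-5.1.yaml

print(effective_profile("glm-5.1", defaults, user_profiles))
# → {'max_tool_retries': 5, 'strict_diff_check': True}
```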
License
MIT — see LICENSE.
See also
- docs/cc-setup.md — registering Hermit as a Claude Code MCP sub-agent
- docs/hermit-variants.md — the -hermit skill family in detail
- docs/measure-savings.md — cost-savings measurement protocol
- benchmarks/ — reproducible task specs and community datapoints
- CONTRIBUTING.md — contribution guide
- .dev/ — internal design notes