Run Claude Code cheaper: a Python MCP sub-agent that delegates grunt work to local ollama or z.ai.

These details have not been verified by PyPI

Project links

Project description

HermitAgent

Run Claude Code 50–80% cheaper: Hermit is an MCP executor with Codex-first fallback routing (codex → z.ai → local) while Claude stays the orchestrator.

HermitAgent plugs into Claude Code as an MCP sub-agent. Claude keeps doing what it is best at — planning, interviewing, code review — and delegates the high-token grunt work (file edits, test runs, commit/push, refactors) to a low-cost executor (Codex, local ollama, or flat-rate z.ai).

v0.3.x highlights (cost-first execution)

Codex support is first-class: Hermit can run tasks via Codex, with gpt-5.4 at medium reasoning as the default Codex lane.
Auto model routing when model is omitted: configurable via routing.priority_models in settings.json, and providers that are not configured/installed are skipped automatically (default chain: gpt-5.4 medium -> glm -> local ollama).
Explicit model requests are strict: if you ask for a specific model and it is unavailable, Hermit returns a clear unavailable error instead of silently switching providers.
MCP + gateway auto-start: bin/mcp-server.sh now ensures the local gateway is up, so Claude Code/Codex startup is simpler.

┌──────────────┐   MCP   ┌──────────────┐   any OpenAI-compatible   ┌───────┐
│  Claude Code │ ──────▶ │  HermitAgent │ ────────────────────────▶ │  LLM  │
│ (planner)    │         │  (executor)  │                           └───────┘
└──────────────┘         └──────────────┘
     $$$                      ~$0 / flat-rate

Why

Claude Code is great, but a /feature-develop session easily burns 100k+ Claude tokens on mechanical work — reading files, running pytest, formatting diffs, writing conventional-commit messages — that any competent code model can do. HermitAgent exposes three MCP tools (run_task, reply_task, check_task) so Claude Code can delegate whole skills to a cheaper model while the user stays in the familiar Claude Code UI.

The pattern

demo

How much it actually saves

Measured numbers live under benchmarks/results/ — each file is one independent run pair.

We deliberately don't publish a single marketing percentage here. Savings depend on the task, the repo, and the executor you choose; a headline number without that context is noise. If you want a reproducible datapoint:

cp -r benchmarks/todo-api/starter /tmp/cc-run && cp -r benchmarks/todo-api/starter /tmp/cc-hermit-run
Run /feature-develop <task> in one, /feature-develop-hermit <task> in the other (task spec: benchmarks/todo-api/TASK.md).
Feed the two Claude Code session logs into scripts/measure-savings.sh — it prints a markdown table you can paste into benchmarks/results/.

Full protocol: docs/measure-savings.md. Executor cost is treated as $0 (ollama = free, z.ai / GLM = flat-rate); what we compare is Claude-side tokens and USD.

What it gives you

The product is the pattern, not any specific skill:

Claude does reasoning, judgment, and quality gates. A cheap local / flat-rate executor does the grunt work. They talk to each other over MCP, so the switch is one word in a slash command.

The repo ships this pattern as four example skills under .claude/commands/ so you can see it in action and fork them into whatever workflow you already have:

/feature-develop-hermit — Claude interviews, Hermit implements and tests
/code-apply-hermit — Claude reads the PR review, Hermit applies every line
/code-polish-hermit — Claude picks what to polish, Hermit runs the lint/test loop
/code-push-hermit — Claude writes the PR description, Hermit does the commit/push

These are reference implementations, tuned for the author's own workflow (GitHub PR-centric). The goal is for Claude to stay on the "small Claude tokens, big quality impact" work — interviews, review, final verification — and for everything high-volume / mechanical (Read × N, Edit × N, Bash/pytest loops, commit message writing) to land on Hermit. Write your own -hermit variant for the skills you actually live in; the docs/hermit-variants.md "add your own" recipe is a few steps.

Everything else in the repo is there to make that pattern work cleanly:

MCP server (run_task / reply_task / check_task) with bidirectional conversation — Hermit can ask Claude mid-task
Skill compatibility — same SKILL.md format and YAML frontmatter as Claude Code; skills under ~/.claude/skills/ are shared read-only
Progressive-disclosure rule system — foundational rules stay auto-injected, contextual rules are on-demand skills (cuts session prefix from ~12k to ~3k tokens)
Gateway (FastAPI + SSE) in front of the executor LLM — 429 fail-fast + failover, cache hints, dashboard at :8765
Model routing by name + auto chain — explicit names route by provider (gpt-*-codex / gpt-5.4 → Codex, glm-* → z.ai, name:tag → local ollama); omitted model follows routing.priority_models
Permission floor — .env, *.pem, *.key, credentials* blocked across every mode (even YOLO)
Self-learning skills with model-aware lifecycles (validated-on models, 30-day auto-deprecation, needs_review on model swap)
Optional standalone TUI (React + Ink) for when you want to use Hermit without Claude Code in the loop

How is this different from …?

Project	Pattern	Trade-off
claude-code-router	Redirects all CC traffic to another provider	You lose Claude quality; the "Claude Code" session is really the local model
LiteLLM	Generic multi-provider proxy	Not coding-specific, no understanding of CC workflow
OpenHands / aider	Standalone agent, replaces Claude Code	Full migration away from CC; big UX change
Anthropic Agent SDK	Official sub-agent framework	DIY: you still write the executor, the local-model wiring, the MCP glue
HermitAgent	Claude stays the orchestrator; Hermit is the executor	Narrower scope, but drop-in: `/foo` → `/foo-hermit`

If you don't use Claude Code, you don't need HermitAgent. If you do, and the monthly bill or the rate limits are a problem, this is what it is for.

Where the project is heading

The bundled skills still give Claude the full interview phase before delegating. The direction this project is moving in is the opposite: Claude does only the final verification pass — the executor does the interview, the plan, the implementation, the tests, the commit — and Claude is only woken up at the end to reject, accept, or ask for a narrow revision. The less Claude does, the more of the bill disappears. The existing -hermit skills are the conservative checkpoint on that spectrum; your own variants can push further.

Install

git clone https://github.com/cafitac/hermit-agent.git
cd hermit-agent
./install.sh

Or, once installed from PyPI / editable mode, use the guided installer entrypoint:

hermit install

hermit install is the product-facing setup path: it asks a few yes/no questions, repairs or creates the local gateway API key, ensures the local gateway is running, offers MCP registration in ~/.claude.json, and can install or refresh the Codex channels path without making the user edit config files manually.

What the installer does automatically:

Creates a project-local .venv, bootstraps uv inside it, and runs uv pip install -e '.[test]'.
Writes a default ~/.hermit/settings.json (model glm-5.1, gateway URL http://localhost:8765, etc.).
Prompts Generate a random gateway API key now? If you accept, it applies the schema in hermit_agent/gateway/migrations/001_initial.sql to ~/.hermit/gateway.db, inserts a freshly-generated hermit-mcp-<random> key, and patches gateway_api_key in your settings file.
Prompts Pull a local coding model via ollama? (skipped automatically if ollama is not installed). Accepting pulls qwen3-coder:30b (~18 GB).
Symlinks the four bundled -hermit slash commands into ~/.claude/commands/.
Prompts Register Hermit MCP server in ~/.claude.json? with three choices: (a) project-specific, (b) user-wide, (c) skip. On accept, merges a hermit-channel stdio entry pointing at ./bin/mcp-server.sh into ~/.claude.json (backup: ~/.claude.json.backup-<ts>). Safe on re-runs — an identical entry is detected and left alone. That launcher now auto-starts the local gateway on demand (and skips the start when the gateway is already healthy).
Prompts Add hermit alias to <rc-file>? so you can run hermit from any shell. If an existing alias points to an old path (e.g. before the bin/ move), the installer offers to update it.
Prints any "Pending manual steps" at the end — e.g. a reminder to launch Claude Code with --dangerously-load-development-channels server:hermit-channel.

Useful flags:

./install.sh --no-api-key        # skip the API key prompt (use placeholder)
./install.sh --no-ollama         # skip the ollama prompt
./install.sh --skip-venv         # reuse an existing .venv
./install.sh --no-mcp-register   # skip the ~/.claude.json registration prompt
./install.sh --no-alias          # skip the shell-rc alias prompt

Every prompt is idempotent: re-running the installer detects the existing API key, MCP entry, alias, and ollama model and reports them unchanged instead of duplicating.

Codex async-interaction path (experimental)

If you want Codex approval requests and free-text waits to flow through Hermit without manual polling, run:

hermit-agent setup-codex

That command:

keeps Hermit as the public Codex-facing surface
prepares Hermit's local async approval / reply path for Codex
writes the needed project-local Hermit settings
bootstraps the Codex discovery assets Hermit needs
registers the local Codex marketplace root automatically
prefers Codex app-server visible prompts when the host exposes that surface
removes Hermit's legacy Codex UserPromptSubmit reply hook if it was installed earlier
runs a compact local smoke check before reporting success

From an operator point of view, the important check is still:

codex mcp list
codex mcp get hermit-channel

If hermit-channel is present, Codex is pointed at Hermit correctly. The rest of the async reply transport is an internal Hermit implementation detail.

The preferred path is package-first. If Hermit's default packaged Codex support is unavailable, it can still fall back to a built sibling checkout via HERMIT_CODEX_CHANNELS_SOURCE_PATH (or ../codex-channels).

The default path is user-scope and local-first so Codex can discover the bridge across workspaces; remote backends stay optional and out of the critical path.

To reverse everything: ./uninstall.sh walks back through the same steps with per-item prompts (--yes accepts all; --keep-data leaves ~/.hermit/ alone). Ollama models are never deleted — remove manually with ollama rm <model>.

The hermit launcher transparently starts the gateway daemon if it isn't already running (HERMIT_AUTO_GATEWAY=0 opts out), so you never need to remember to run ./bin/gateway.sh --daemon first. The MCP launcher (./bin/mcp-server.sh) now does the same check-and-start flow, which makes both Claude Code and Codex able to bring up the full Hermit stack from the MCP entrypoint alone.

Pick an executor LLM

ollama (local, $0) — either accept the installer prompt, or:

brew install ollama
ollama pull qwen3-coder:30b

z.ai Coding Plan (flat-rate subscription) — add your z.ai key to ~/.hermit/settings.json:

{
  "gateway_url": "http://localhost:8765",
  "gateway_api_key": "hermit-mcp-…",
  "model": "glm-5.1",
  "providers": {
    "z.ai": {
      "base_url": "https://api.z.ai/api/coding/paas/v4",
      "api_key": "<your z.ai key>",
      "anthropic_base_url": "https://api.z.ai/api/anthropic"
    }
  }
}

Two keys, two layers: gateway_api_key authenticates clients against the local Hermit gateway, and providers[<slug>].api_key is the gateway's own credential for talking to the upstream platform. Add a new provider by dropping another block into providers — e.g. providers["anthropic"] with a base_url + api_key.

Skipped the API key prompt?

Either re-run ./install.sh (it detects a placeholder and re-prompts), or mint one manually — see docs/cc-setup.md § 2. Hermit will refuse to run until gateway_api_key is a real value, not CHANGE_ME_AFTER_FIRST_RUN.

Wire it into Claude Code

If you accepted the installer's MCP registration prompt, the hermit-channel stdio entry is already in ~/.claude.json — the remaining piece is launching Claude Code with --dangerously-load-development-channels server:hermit-channel so the channel capability is enabled. A shell alias works well:

alias cc='claude --dangerously-load-development-channels server:hermit-channel'

If you skipped the registration prompt (or want to adjust the scope later), see docs/cc-setup.md § 3 for the exact ~/.claude.json block.

Quick start — CC + Hermit (the recommended shape)

./bin/mcp-server.sh               # auto-starts the gateway if needed, then serves MCP stdio

Then in Claude Code:

/feature-develop-hermit <ticket-or-short-task>

Claude interviews you about the ticket, writes the plan, and delegates the implementation to Hermit over MCP. You watch Hermit's progress in the Claude Code session; the executor tokens never hit your Claude bill.

Standalone (no Claude Code)

./bin/hermit.sh "fix the flaky test in tests/test_api.py"   # one-shot CLI
./bin/hermit.sh                                              # TUI (needs HERMIT_UI_DIR)

Two API endpoints

The gateway exposes the same upstream providers behind two wire-format-specific paths. Model routing (name:tag → local ollama, glm-* → z.ai, extensible) is identical between them.

/v1/chat/completions — OpenAI-native (primary sharing surface) Used by the hermit CLI, anything speaking the OpenAI SDK, and ngrok-exposed friends. Tier-gated via per-key platform ACL.
```
from openai import OpenAI
client = OpenAI(base_url="https://<ngrok>.ngrok.app/v1", api_key="<friend-key>")
```
/anthropic/v1/messages — Anthropic-native (alternative, not recommended as the Claude Code path) Enables pointing Claude Code at the gateway via ANTHROPIC_BASE_URL=http://localhost:8765/anthropic + ANTHROPIC_AUTH_TOKEN=<gateway-key>. z.ai is passthrough; ollama goes through a text-only Anthropic↔OpenAI translator (tool_use returns 400 in v1). Use only if you understand the tradeoff. This bypasses HermitAgent entirely — Claude Code drives and your CC-side tools/permissions are all that apply. The recommended integration remains CC → MCP (hermit-channel) → HermitAgent, which install.sh already sets up.

Platform ACL (operator vs friend):

Operator key (install.sh --generate-api-key, the default)
  → platforms: local, z.ai, anthropic, codex   (full access)

Friend key (install.sh --generate-friend-key)
  → platforms: local                            (local ollama only; 403 for glm-*)

A key with zero rows in api_key_platform is denied everything (default-deny).

Configuration

Priority: CLI flag > env var > <cwd>/.hermit/settings.json > ~/.hermit/settings.json > defaults.

If model is omitted in a task request, Hermit follows routing.priority_models from settings.json. The default chain is gpt-5.4 (medium) -> z.ai -> local ollama`, but any provider that is not configured or installed in the current environment is skipped automatically.

{
  "gateway_url": "http://localhost:8765",
  "gateway_api_key": "hermit-mcp-…",
  "model": "glm-5.1",
  "routing": {
    "priority_models": [
      {"model": "gpt-5.4", "reasoning_effort": "medium"},
      {"model": "glm-5.1"},
      {"model": "qwen3-coder:30b"}
    ]
  },
  "response_language": "auto",
  "compact_instructions": "",
  "ollama_max_loaded": 1,
  "external_max_concurrent": 10
}

Common templates:

Codex-first

{
  "routing": {
    "priority_models": [
      {"model": "gpt-5.4", "reasoning_effort": "medium"},
      {"model": "glm-5.1"},
      {"model": "qwen3-coder:30b"}
    ]
  }
}

z.ai-first

{
  "routing": {
    "priority_models": [
      {"model": "glm-5.1"},
      {"model": "gpt-5.4", "reasoning_effort": "medium"},
      {"model": "qwen3-coder:30b"}
    ]
  }
}

local-only

{
  "routing": {
    "priority_models": [
      {"model": "qwen3-coder:30b"}
    ]
  }
}

ollama_max_loaded is how many distinct models the gateway lets ollama hold in memory simultaneously — if a request targets a not-yet-loaded model while the budget is already full, the gateway returns 503 with Retry-After instead of letting ollama swap itself into an OOM. external_max_concurrent caps in-flight requests to external providers (z.ai, OpenAI, …); excess requests queue rather than fail. This is the replacement for the old ollama-proxy — the gateway itself is safe to expose (e.g. via ngrok).

Field semantics after the proxy refactor:

gateway_url / gateway_api_key — client-facing. What the hermit CLI (and any other client) sends to authenticate against the local gateway.
providers[<slug>] — gateway-internal, upstream. Per-platform block the gateway uses to reach z.ai / Anthropic / OpenAI / etc. on your behalf. Clients never see these. Adding a new provider is one JSON block — the adapter layer picks it up by slug.

Architecture (short version)

AgentLoop — LLM turn, tool call, result, compact when context fills
Gateway — FastAPI layer in front of the executor. Classifier, routing, failover, web dashboard
MCP server — exposes run_task / reply_task / check_task / cancel_task for Claude Code
Channel notifications — notifications/claude/channel frames emitted inline by the Python MCP server; Claude Code renders them as <channel source="hermit-channel"> blocks
Skills — markdown with YAML frontmatter, hot-loaded at session start, compatible with ~/.claude/skills/

Layout

hermit_agent/                # agent, loop, tools, gateway, MCP, skills
.claude/                     # this repo's own Claude Code config
scripts/harness/             # harness tooling (cc-learner.py, etc.)
tests/                       # pytest suite
docs/                        # user/operator docs and architecture notes
bin/                         # launchers for hermit / gateway / MCP
claw-code-main/             # reference mirror / sibling workspace (not part of HermitAgent package)
hermes-agent/               # sibling project with its own workflows and release surface
react/                      # standalone frontend package experiments / support surface

CI map

Root .github/workflows/python-tests.yml — validates the hermit_agent Python package on Python 3.11–3.13.
hermes-agent/.github/workflows/* — sibling project CI; not the root package gate.
claw-code-main/rust/.github/workflows/* — nested Rust workspace CI; separate responsibility.

If you touch root hermit_agent/, bin/, docs, or tests/, the root workflow is the primary CI contract.

Status

Early, working, single-author. MIT. No release cadence. No roadmap promises. Clone, read the code, open an issue if something is broken.

Running tests

.venv/bin/pytest tests/    # conftest.py auto-excludes ollama-dependent tests

Boundaries

Hermit does not modify ~/.claude/ — it only reads ~/.claude/skills/ for cross-tool skill reuse

Model guardrail profiles

Hermit ships model profiles under hermit_agent/profiles/defaults/ for the main built-in lanes:

qwen3-coder:30b
gpt-5.4
gpt-5.3
glm-5.1
unknown fallback

These profiles drive guardrail activation defaults through hermit_agent.guardrails.engine.GuardrailEngine. If you want to tune a specific model locally without forking the repo, place an override file at:

~/.hermit/profiles/<model-slug>.yaml

Examples:

~/.hermit/profiles/gpt-5.4.yaml
~/.hermit/profiles/glm-5.1.yaml

User profiles override the built-in defaults when the model id matches.

Hermit does not require Claude Code; it just shines brightest as its sub-agent
Nothing phones home. Everything runs locally or through the LLM endpoint you configure

License

MIT — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.3.58

Apr 27, 2026

0.3.57

Apr 27, 2026

0.3.56

Apr 27, 2026

0.3.55

Apr 27, 2026

0.3.54

Apr 27, 2026

0.3.53

Apr 27, 2026

0.3.52

Apr 27, 2026

0.3.51

Apr 27, 2026

0.3.50

Apr 27, 2026

0.3.49

Apr 27, 2026

0.3.48

Apr 27, 2026

0.3.47

Apr 27, 2026

0.3.46

Apr 27, 2026

0.3.45

Apr 27, 2026

0.3.44

Apr 27, 2026

0.3.43

Apr 27, 2026

0.3.42

Apr 27, 2026

0.3.41

Apr 27, 2026

0.3.40

Apr 27, 2026

0.3.39

Apr 27, 2026

0.3.38

Apr 27, 2026

0.3.37

Apr 27, 2026

0.3.36

Apr 27, 2026

0.3.35

Apr 27, 2026

0.3.34

Apr 27, 2026

0.3.33

Apr 27, 2026

0.3.32

Apr 27, 2026

0.3.31

Apr 26, 2026

0.3.30

Apr 26, 2026

0.3.29

Apr 26, 2026

0.3.28

Apr 26, 2026

0.3.27

Apr 26, 2026

0.3.26

Apr 26, 2026

0.3.25

Apr 26, 2026

0.3.24

Apr 26, 2026

0.3.23

Apr 26, 2026

0.3.22

Apr 26, 2026

0.3.21

Apr 26, 2026

0.3.20

Apr 26, 2026

0.3.18

Apr 26, 2026

0.3.17

Apr 26, 2026

0.3.16

Apr 26, 2026

0.3.15

Apr 26, 2026

0.3.14

Apr 26, 2026

0.3.11

Apr 26, 2026

0.3.10

Apr 24, 2026

0.3.9

Apr 23, 2026

0.3.7

Apr 23, 2026

0.3.6

Apr 23, 2026

0.3.5

Apr 23, 2026

0.3.4

Apr 23, 2026

0.3.3

Apr 23, 2026

This version

0.3.2

Apr 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cafitac_hermit_agent-0.3.2.tar.gz (370.6 kB view details)

Uploaded Apr 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cafitac_hermit_agent-0.3.2-py3-none-any.whl (308.7 kB view details)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file cafitac_hermit_agent-0.3.2.tar.gz.

File metadata

Download URL: cafitac_hermit_agent-0.3.2.tar.gz
Upload date: Apr 23, 2026
Size: 370.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cafitac_hermit_agent-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`e954a893272ded2459c2ea43434f3213c5d7d027fac3a907d884d25b41f60cfd`
MD5	`40368f383e1a19b5bc8d4d0a5182b365`
BLAKE2b-256	`219394f861fb36e09db0ff742b25c5816b3749d2cb1d3471423aa8e612eb36dc`

See more details on using hashes here.

File details

Details for the file cafitac_hermit_agent-0.3.2-py3-none-any.whl.

File metadata

Download URL: cafitac_hermit_agent-0.3.2-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 308.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cafitac_hermit_agent-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f7c6952956bc9bb87b0491640838f87b33a73acef1ea32b820fa385a8c0befb5`
MD5	`6ad1de2cc4999123acda7b782a5b0e2a`
BLAKE2b-256	`594c1f19a3f86fd6c6a6a626a5ac2827d6c17103cba989001ad70a2d62650d86`

See more details on using hashes here.

cafitac-hermit-agent 0.3.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

HermitAgent

v0.3.x highlights (cost-first execution)

Why

The pattern

How much it actually saves

What it gives you

How is this different from …?

Where the project is heading

Install

Codex async-interaction path (experimental)

Pick an executor LLM

Skipped the API key prompt?

Wire it into Claude Code

Quick start — CC + Hermit (the recommended shape)

Standalone (no Claude Code)

Two API endpoints

Configuration

Architecture (short version)

Layout

CI map

Status

Running tests

Boundaries

Model guardrail profiles

License

See also

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes