A local-first hierarchical coding orchestrator for delegating tasks to local or self-hosted models.
Project description
Hierocode
Hierarchical coding with cost-aware model delegation.
Frontier models plan and review; local models write the code. Cost per task is $0 incremental on a bring-your-own-subscription setup.
You bring: (1) Python ≥3.10, (2) a local Ollama server with at least one model pulled, and (3) one of — a Claude Pro / ChatGPT Plus subscription, an Anthropic API key, or (degraded) a second local model for planning. Details in Before you install below.
Why hierocode
Frontier-model APIs charge per token, and code generation is token-heavy. A single non-trivial refactor can consume thousands of output tokens across multiple retries, driving API costs into the dollars-per-task range quickly. Subscription plans (Claude Pro, ChatGPT Plus) are cheaper on a per-message basis but have soft quotas — burning them on code generation leaves nothing for the rest of your workday.
Hierocode splits the work by role. The planner (your Claude Pro or ChatGPT Plus subscription, or an API key) turns your task into a structured JSON work breakdown and QAs the result. The drafter (a local Ollama model) does the actual code writing for each bounded unit. The reviewer defaults to the planner tier. Planner calls are short — 5–20 k input tokens, 1–3 k output — so two or three of them barely dent a subscription quota. Drafter calls run entirely on your own hardware at $0.
Before you install
Hierocode needs three things on your machine. pip install hierocode is the last
step, not the first.
1. Python ≥ 3.10
python3 --version
If older: brew install python@3.12 (macOS), sudo apt install python3.12
(Ubuntu/Debian), or https://www.python.org/downloads/ (Windows).
2. A local drafter — Ollama + at least one model
This is where the actual code gets written. Runs entirely on your hardware, $0 per call.
# Install Ollama (one-time):
# macOS / Windows: https://ollama.com/download (GUI installer)
# Linux: curl -fsSL https://ollama.com/install.sh | sh
# Pull a drafter model (~2 GB for llama3.2:3b)
ollama pull llama3.2:3b
# Verify the Ollama server is running (most installers auto-start it):
curl -fsS http://localhost:11434/ >/dev/null && echo "ollama: ok"
# If not running: `ollama serve` (macOS/Linux) or start the tray app (Windows).
Model size the wizard picks for you based on RAM:
| RAM | Recommended drafter |
|---|---|
| ≤ 12 GB | llama3.2:1b |
| 12–24 GB | llama3.2:3b |
| 24–48 GB | qwen2.5-coder:7b |
| ≥ 48 GB | qwen2.5-coder:14b |
3. A planner — pick ONE
The planner breaks your task into sized units and QAs the drafter's output. You need exactly one of these paths configured:
| Path | Cost per task | Setup |
|---|---|---|
| Claude Pro ($17/mo) | $0 incremental | Install Claude Code, run claude /login |
| ChatGPT Plus ($20/mo) | $0 incremental | Install Codex CLI, run codex login |
| Anthropic API key | ~$0.01–0.15/task | export ANTHROPIC_API_KEY=sk-... |
No subscription or API key? Hierocode falls back to using Ollama as the planner too. It works, but plan quality drops noticeably on non-trivial tasks. The wizard warns when this happens. Get a subscription or API key for any real use.
Quickstart — the $17 journey
Assumes the three prereqs above are satisfied and claude /login has been run.
pip install hierocode
First run
# Detect your environment and write a tailored config.
hierocode init --wizard
# Confirm roles and providers were detected correctly.
hierocode models show
# Sanity-check that providers are reachable.
hierocode doctor
# Estimate cost and message quota before committing.
hierocode run -t "add input validation to the login endpoint" --estimate
# Run the full plan → draft → QA pipeline.
hierocode run -t "add input validation to the login endpoint"
The wizard detects claude on PATH, picks it as the planner, reads your RAM to
choose the right Ollama drafter model, and writes ~/.hierocode.yaml. Nothing is
sent to an API until you run hierocode run (or hierocode plan).
Quickstart — the API journey
If you have ANTHROPIC_API_KEY set and no claude or codex CLI installed, the
wizard still works: hierocode init --wizard detects the env var and routes the
planner through the Anthropic Messages API using claude-haiku-4-5 by default.
Run hierocode models set planner claude-sonnet-4-6 afterward if you want a stronger
planner. All other steps are identical to the $17 journey above.
Architecture
hierocode run
|
+-- skeleton.py AST symbol index of the repo (Python) / file list (other)
|
+-- capacity.py Probe drafter model: num_ctx, RAM, GPU, tier
|
+-- Planner (claude_code_cli / codex_cli / anthropic / ollama)
| Receives: task + skeleton + capacity profile
| Returns: JSON Plan — list of TaskUnits
| Cache: plan_cache keyed on (task, skeleton, planner_model, drafter_model)
|
+-- Dispatcher Iterates TaskUnits
|
+-- Budget Pack target + context files into drafter context window
|
+-- Drafter (Ollama) — writes code for one bounded unit
| Returns: unified diff
|
+-- QA (Planner tier) — reviews diff against acceptance criteria
verdict:
accept -> emit diff to stdout
revise -> loop back to Drafter (cap: max_revisions_per_unit)
split -> re-plan the unit into sub-units
escalate -> planner drafts the unit directly (cap: max_escalations_per_task)
Diffs are printed to stdout; they are not applied automatically.
Supported providers
| Type | Auth required | Typical role |
|---|---|---|
claude_code_cli |
claude binary, logged in to Pro |
Planner, Reviewer |
codex_cli |
codex binary, logged in to Plus |
Planner, Reviewer |
anthropic |
ANTHROPIC_API_KEY env var |
Planner, Reviewer |
ollama |
none | Drafter (and fallback Planner) |
openai_compatible |
bearer token env var | Planner or Drafter |
lmstudio |
none | Drafter |
transformers_local |
none | Drafter |
Model IDs in each provider:
claude_code_cli/anthropic:claude-opus-4-7,claude-sonnet-4-6,claude-haiku-4-5codex_cli:gpt-5,gpt-5-codex,o4-miniollama: whatever you have pulled; wizard recommends based on RAM
Cost model
The drafter is always local ($0). Only planner calls have a cost.
| Planner path | Cost per task | Notes |
|---|---|---|
claude_code_cli (Pro, $17/mo) |
$0 incremental | Counts against subscription quota |
codex_cli (Plus, $20/mo) |
$0 incremental | Counts against subscription quota |
anthropic Haiku API |
~$0.01–0.03 | 4 calls x ~10 k input + ~2 k output |
anthropic Sonnet API |
~$0.05–0.15 | 4 calls x ~10 k input + ~2 k output |
The 4-call estimate breaks down as:
- Plan cache hit: 0 planner calls.
- Plan cache miss + typical 3 QA rounds: 1 planning call + 3 QA calls = 4 calls.
Token counts use a char/4 heuristic (not tiktoken) with a 20% safety margin.
Pricing can be overridden — drop ~/.hierocode/pricing.yaml with your own
model prices and subscription limits. Useful when Anthropic ships a new model
before hierocode does, or when you want to track quota math for a different
subscription tier. See plan.md or broker/pricing.py for the schema.
Advanced: active exploration — set routing.planner.exploration: active
in your config to let Claude Code / Codex use their Read/Grep/Glob
tools during planning. Plans get noticeably better; token cost goes up
(still 1 message against your Pro/Plus quota, but input tokens compound
within that message). Off by default to preserve the predictable cost story.
Commands
Setup
hierocode init # Write default ~/.hierocode.yaml
hierocode init --wizard # Auto-detect environment, write tailored config
hierocode init --force # Overwrite existing config
hierocode doctor # Validate config and ping all providers
Inspection
hierocode providers list # List configured providers and their types
hierocode models list # Query each provider for available model names
hierocode models show # Show planner / drafter / reviewer role bindings
hierocode resources # Print CPU count and RAM (total / available)
hierocode workers suggest # Suggest safe / balanced / aggressive worker counts
Execution
hierocode plan -t "..." # Generate and print a JSON plan (no drafting)
hierocode plan -t "..." --planner-model <m> # Override planner model
hierocode run -t "..." # Full pipeline: plan -> draft -> QA
hierocode run -t "..." --estimate # Print cost/quota estimate, then exit
hierocode run -t "..." --drafter-model <m> # Override drafter model
hierocode run -t "..." --repo ./path # Set repo root for skeleton builder
hierocode draft -t "..." --file path/to/file # Draft a patch for a single file
hierocode review -t "..." --file path/to/file # Review a single file
Maintenance
hierocode models set planner claude-sonnet-4-6 # Set planner model
hierocode models set drafter llama3.2:3b # Set drafter model
hierocode models set planner claude-haiku-4-5 -p claude_pro # With explicit provider
hierocode cache clear # Delete all cached plans
hierocode version # Print installed version
Interactive TUI
Run hierocode with no arguments to enter a persistent REPL (requires a config
to exist; run hierocode init --wizard first). The TUI provides slash commands
for every one-shot subcommand above plus a few extras:
/run <task> Full plan → draft → QA pipeline, with live
progress panel (plan units, elapsed time,
token counts, quota bar).
/plan <task> Plan only; /plan show re-displays the last plan.
/estimate <task> Cost/quota preview without executing.
/draft <task> <file> Single-file escape hatch.
/review <task> <file> Single-file review.
/diff Re-display the last diff.
/apply Apply the last diff to disk with per-file
confirmation (y / yes-all / skip / abort).
/usage Per-role token/message usage for the session,
with a quota bar if on a subscription path.
/models Show current role bindings.
/models set <role> <model> Update routing, persists to YAML.
/cache clear Wipe cached plans.
/config edit Open ~/.hierocode.yaml in $EDITOR.
/doctor /resources /repo Diagnostics.
/task save <name> <desc> Save a named task alias (persisted).
/task <name> Run a saved alias.
/task list / /task delete <n> Alias CRUD.
/history /clear /help /exit
Plain-text input (no leading /) can either prompt before running or dispatch
directly to /run — set tui.interaction_mode in your config to prompt
(default, safer) or immediate.
Configuration reference
hierocode init --wizard writes ~/.hierocode.yaml. A representative example for
a Claude Pro user with 16 GB RAM:
# Generated by hierocode init --wizard
default_provider: local_ollama
providers:
local_ollama:
type: ollama
base_url: http://localhost:11434
auth:
type: none
claude_pro:
type: claude_code_cli
auth:
type: none
codex_plus:
type: codex_cli
auth:
type: none
anthropic_api:
type: anthropic
auth:
type: bearer_env
env_var: ANTHROPIC_API_KEY
routing:
planner:
provider: claude_pro
model: claude-sonnet-4-6
drafter:
provider: local_ollama
model: llama3.2:3b
reviewer:
provider: claude_pro
model: claude-sonnet-4-6
policy:
max_revisions_per_unit: 2
max_escalations_per_task: 3
warn_before_escalation: true
parallelization:
default_strategy: balanced
max_local_workers: 4
max_remote_workers: 8
hierocode init (no wizard) writes a minimal config that points every role at
local_ollama with llama3.2:3b. That works out of the box for pure-local setups
but runs in degraded mode (Ollama-as-planner). Use hierocode init --wizard for
any real setup so the three prereqs above are actually wired up.
v0.1 YAML files still parse — legacy keys like default_model, small_model,
routing.draft_model, and routing.review_model are silently ignored. Run
hierocode init --wizard --force once to upgrade cleanly.
Current limitations
- Not an autonomous agent. Hierocode does not apply patches, run tests, or iterate on failures without user intervention. Every diff must be reviewed and applied manually.
- Diffs are printed, not applied. The
runanddraftcommands emit unified diffs to stdout. There is no--applyflag in v0.2. - AST skeleton is Python-only. Non-Python files are listed by name and size; their symbols are not extracted. The planner sees less context for non-Python repos.
- Token counting uses a char/4 heuristic. The budget module does not use tiktoken for Ollama-side estimation; a 20% safety margin compensates but is not exact.
- Single-file
draftandreviewcommands do not use the QA loop. They call the provider once and return raw output. Usehierocode runfor the full pipeline.
Contributing
See CONTRIBUTING.md for development setup, linting, and test instructions.
Acknowledgements
Built with the help of Claude Opus 4.7 (Anthropic).
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hierocode-0.3.5.tar.gz.
File metadata
- Download URL: hierocode-0.3.5.tar.gz
- Upload date:
- Size: 123.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6033c49181d8f7c7caa3467bccc9dc70b79e37648f8d22c9e69680e6848187e2
|
|
| MD5 |
f8a6e30ad2bf4029777698b3c90c853c
|
|
| BLAKE2b-256 |
336095eecd6f443c34b18fad27d161b756c6de670f369ad079abfdb8169537d0
|
File details
Details for the file hierocode-0.3.5-py3-none-any.whl.
File metadata
- Download URL: hierocode-0.3.5-py3-none-any.whl
- Upload date:
- Size: 85.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
47cc7a73ac3046deba0aa928725e5560f962e0a317fdbc9994351faed90464bd
|
|
| MD5 |
6277a4d78cb0acb9f5fb904dbd247237
|
|
| BLAKE2b-256 |
796f44d934833dcb43f13b944d8b53b3229e50ffdfde150c78bfe442e47658b6
|