# agent-delegate
Route low-reasoning work from Claude Code, Codex, Cursor, and other AI coding agents to a cheap or local LLM — Ollama, LM Studio, OpenRouter, Anthropic Haiku, vLLM, llama.cpp, Groq, or Cerebras.
Claude Code and Codex burn upstream tokens — and weekly subscription quota — on tasks a local Ollama model handles fine: bulk file reads, boilerplate generation, log summarization, fact extraction. agent-delegate is a tiny, zero-dependency CLI that any AI coding tool can shell out to. The cheap model does the busywork; your reasoning-grade context stays clean.
```
┌─────────────────────┐   delegate ask/write/summarize   ┌──────────────────────┐
│ Claude Code, Codex, │ ───────────────────────────────► │ Ollama / LM Studio / │
│ Cursor, Aider, etc. │ ◄─────────────────────────────── │ OpenRouter / Haiku   │
└─────────────────────┘           JSON result            └──────────────────────┘
           ▲                                                        │
           │              optional JSONL usage log                  │
           └────────────────────────────────────────────────────────┘
```
## Why

| Without agent-delegate | With agent-delegate |
|---|---|
| Claude/Codex reads 8 files just to scaffold one test → ~40k upstream tokens | Local qwen3-coder reads those 8 files, returns the test → ~0 upstream tokens |
| Long log summarization eats your 5-hour cap | Cheap model summarizes, only the summary enters upstream context |
| Boilerplate generation burns weekly quota | Free local model writes it; you review |
| Locked to one provider's "weak model" mode (aider, cursor) | Any OpenAI-compatible endpoint, switchable per call |
## Install

```shell
pipx install agent-delegate
# or
pip install --user agent-delegate
```
Zero runtime dependencies. Python 3.11+. Works on macOS, Linux, Windows.
## Quickstart

```shell
# Answer a question from a corpus of files
agent-delegate ask --paths src/auth.py src/db.py --question "where is the session created?"

# Generate a draft file from context + spec
agent-delegate write --context tests/test_users.py \
  --spec "draft a parallel test suite for the orders table" \
  --target tests/test_orders.py

# Summarize anything piped in
tail -n 500 server.log | agent-delegate summarize

# Override profile / model per call
agent-delegate --profile haiku ask --paths README.md --question "..."
agent-delegate --profile openrouter --model meta-llama/llama-3.3-70b-instruct summarize < diff.txt

# Short alias
ad ask --paths README.md --question "what does this CLI do?"
```
## One-time setup for AI coding tools

```shell
agent-delegate install
```
Idempotently injects a "delegation policy" rule block into:
| Tool | Target |
|---|---|
| Claude Code (CLI) | ~/.claude/CLAUDE.md |
| Codex CLI | ~/.codex/AGENTS.md |
| Claude desktop app | printed snippet for the Project instructions UI |
| Codex web app | printed snippet for the Project instructions UI |
Each block is bounded by `<!-- agent-delegate:begin vX.Y.Z -->` ... `<!-- agent-delegate:end -->` markers, so re-running `install` upgrades cleanly and `uninstall` strips the block while preserving everything else.
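As an illustration, an injected block might look like the following. Only the begin/end markers are documented; the version number and rule text here are hypothetical:

```markdown
<!-- agent-delegate:begin v0.1.0 -->
## Delegation policy (agent-delegate)
- For bulk file reads, boilerplate generation, and log summarization,
  shell out to `agent-delegate` instead of spending upstream tokens.
- Keep secrets, security decisions, and production mutations in-context.
<!-- agent-delegate:end -->
```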
Preview before writing:

```shell
agent-delegate install --dry-run
agent-delegate install --print claude-desktop   # just dump the snippet
```
## Backends

| Backend | Profile | Base URL | Auth |
|---|---|---|---|
| Ollama (local + cloud) | ollama | http://localhost:11434/v1 | none |
| LM Studio | lmstudio | http://localhost:1234/v1 | none |
| OpenRouter | openrouter | https://openrouter.ai/api/v1 | OPENROUTER_API_KEY |
| Anthropic Haiku | haiku | https://api.anthropic.com/v1 | ANTHROPIC_API_KEY |
| vLLM | (custom) | configurable | optional |
| llama.cpp server | (custom) | http://localhost:8080/v1 | none |
| OpenAI / Groq / Cerebras / Together / Fireworks / Hyperbolic | (custom) | their host | their API key |
Any OpenAI-compatible endpoint works — drop your own profile into `~/.agent-delegate/profiles.toml`. The Anthropic native API has its own adapter (`backend = "anthropic"`).
## Profiles

agent-delegate reads `~/.agent-delegate/profiles.toml`. Write a starter file with the bundled defaults:

```shell
agent-delegate profiles init
```
Example:

```toml
default_profile = "ollama"

[profiles.ollama]
backend = "openai-compat"
base_url = "http://localhost:11434/v1"
default_model = "qwen3-coder:480b-cloud"

[profiles.lmstudio]
backend = "openai-compat"
base_url = "http://localhost:1234/v1"
default_model = "qwen2.5-coder-32b-instruct"

[profiles.openrouter]
backend = "openai-compat"
base_url = "https://openrouter.ai/api/v1"
api_key_env = "OPENROUTER_API_KEY"
default_model = "meta-llama/llama-3.3-70b-instruct"

[profiles.haiku]
backend = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-haiku-4-5"

[profiles.groq]
backend = "openai-compat"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"
default_model = "llama-3.3-70b-versatile"
```
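Any other OpenAI-compatible server follows the same shape. As a sketch, a profile for a local vLLM server — the profile name, port, env var, and model here are illustrative, not bundled defaults:

```toml
[profiles.vllm-local]
backend = "openai-compat"
base_url = "http://localhost:8000/v1"
api_key_env = "VLLM_API_KEY"   # optional; omit if the server is unauthenticated
default_model = "Qwen/Qwen2.5-Coder-32B-Instruct"
```

Select it per call with `agent-delegate --profile vllm-local ...`.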
Inspect:

```shell
agent-delegate profiles list
agent-delegate profiles show ollama
agent-delegate doctor   # ping each profile + check API keys
```
## CLI reference

| Command | What it does |
|---|---|
| `ask` | Answer a question using one or more files as context |
| `write` | Draft a boilerplate file from context + spec |
| `summarize` | Condense stdin (logs, transcripts, diffs) |
| `install` | Idempotently inject delegation rules into AI tool configs |
| `uninstall` | Strip injected rule blocks |
| `status` | Show install state + active profile + version |
| `doctor` | Probe each profile for reachability + auth |
| `profiles list` / `profiles show` / `profiles init` | Manage profiles |
Run `agent-delegate <command> --help` for full flags.

## What to delegate — and what NOT to
Good fits:
- Reading 3+ files to answer a focused question
- Generating boilerplate (tests, types, scaffolding, configs, CRUD endpoints)
- Summarizing long logs, transcripts, build output, diffs
- Extracting facts or identifiers from a corpus
Bad fits — keep these in your reasoning model:
- Secrets, credentials, customer data, PII
- Security decisions, auth flows, crypto
- Root-cause debugging that needs tight iteration
- Database migrations, anything that mutates production
- One-line surgical edits (round-trip cost > benefit)
The bundled rule snippets enforce this split inside Claude Code / Codex.
## Optional usage tracking

Set `AGENT_DELEGATE_LOG_DIR=/path/to/dir` to log every delegate call as JSONL. Each record includes timestamp, profile, backend, model, prompt/completion tokens, cwd, and parent process ID. The format is compatible with token-meter — the companion dashboard for tracking Claude / Codex / delegate token usage.
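Because the log is plain JSONL, per-profile totals can be computed with a few lines of stdlib Python. This is a sketch: the README lists the logged fields but not their exact key names, so keys like `profile` and `prompt_tokens` below are assumptions.

```python
import json
from collections import defaultdict
from pathlib import Path

def summarize_usage(log_path):
    """Aggregate call counts and token totals per profile from a JSONL usage log."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0, "calls": 0})
    for line in Path(log_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        t = totals[rec["profile"]]
        t["prompt_tokens"] += rec.get("prompt_tokens", 0)
        t["completion_tokens"] += rec.get("completion_tokens", 0)
        t["calls"] += 1
    return dict(totals)

# Two hypothetical records in the assumed schema
sample = (
    '{"ts": "2026-01-05T12:00:00Z", "profile": "ollama", "backend": "openai-compat", '
    '"model": "qwen3-coder:480b-cloud", "prompt_tokens": 1200, "completion_tokens": 300}\n'
    '{"ts": "2026-01-05T12:05:00Z", "profile": "haiku", "backend": "anthropic", '
    '"model": "claude-haiku-4-5", "prompt_tokens": 800, "completion_tokens": 150}\n'
)
Path("usage.jsonl").write_text(sample)
print(summarize_usage("usage.jsonl"))
```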
## How it compares

| Tool | What it does | Why agent-delegate is different |
|---|---|---|
| `aider --weak-model` | Aider's cheap-model mode | Aider-only; not callable from Claude Code, Codex, or Cursor |
| `cursor delegate` (hypothetical) | Editor-bundled subagent | IDE-locked, single provider |
| `litellm` proxy | Multi-provider proxy | Different layer — agent-delegate is the CLI verb your agent calls |
| agent-delegate | Universal CLI verb | Any AI tool can shell out; any OpenAI-compatible endpoint or Anthropic; zero deps |
## Contributing

PRs welcome. Local dev:

```shell
git clone https://github.com/aliaihub/agent-delegate.git
cd agent-delegate
pip install -e ".[dev]"
pytest
ruff check src tests
```
Keep the package zero-dep (stdlib only) and Python 3.11+ compatible. See CONTRIBUTING.md.
Security disclosures: see SECURITY.md. Bug reports and feature requests via issues.
## License
MIT © 2026 — see LICENSE for full text.
Keywords: ollama CLI, claude code subagent, codex delegate, claude haiku cheap, openrouter cli, llm router, ai coding agent, cost reduction, token budget, quota tracker, LM Studio CLI, vLLM, llama.cpp, prompt cache, multi-backend LLM, local LLM, openai-compatible CLI, claude code rules, codex AGENTS.md, AI tool delegation, agent SDK companion.