ModelRouter: fast deterministic model routing and OpenAI-compatible proxying for custom AI agents
Project description
ModelRouter
Deterministic, fast, safety-first model routing for custom AI agents.
ModelRouter gives agents one local OpenAI-compatible endpoint that routes each chat request to the right configured model server. Simple work can go to fast local models, complex work to stronger reasoning models, fresh research to research tools, repo work to code models, and risky actions to human confirmation.
Use With Your Agent In 3 Minutes
Install the proxy extra:
pip install "hermes-router[proxy]"
Create first-run configs:
model-router init --preset lmstudio --yes
Start the local routing proxy:
model-router-proxy --config ~/.model-router/routing_proxy.yaml
Point any OpenAI-compatible agent/client at:
http://127.0.0.1:8082/v1
Useful follow-ups:
model-router validate-proxy-config --config ~/.model-router/routing_proxy.yaml
model-router doctor --config ~/.model-router/routing_proxy.yaml
curl http://127.0.0.1:8082/health
This project is intentionally a decision router only. It does not execute prompts, call model providers, load local model weights, browse the web, run shell commands, send messages, delete files, or purchase anything.
At a Glance
| Need | ModelRouter provides |
|---|---|
| Fast hot-path routing | ModelRouter.route_fast(prompt) returns an engine string |
| Diagnostic decisions | ModelRouter.route(prompt) returns scores, flags, reasons, and alternatives |
| CLI tooling | decide, validate-config, dispatch-plan, and setup commands |
| Local/API flexibility | YAML routing targets for local models, hosted APIs, vision, image generation, and custom adapters |
| Safety boundaries | High-risk or invalid requests fail closed to human_confirm |
| Setup help | Safe local scans, config recommendations, and opt-in Hugging Face download plans |
Highlights
- Deterministic heuristic routing with no LLM classification call.
- Fast initialized hot path:
router.route_fast(prompt)returns only the selected engine. - Rich receipt path:
router.route(prompt)returns scores, reasons, rejected engines, alternatives, requirements, and safety flags. - YAML-driven engine catalog; model names are not hardcoded throughout the router.
- OpenAI-compatible proxy for agents that only know how to call a local AI endpoint.
- First-run
model-router initfor local proxy configs. - User-configurable routing targets for local models, hosted APIs, web/RAG tools, vision, image generation, or custom adapters.
- Fail-closed safety: missing/invalid config and high-risk actions route to
human_confirm. - Declarative availability checks for env vars, commands, and local paths.
- Setup assistant for local/API/mixed model configuration and optional Hugging Face download plans.
Project Status
ModelRouter is a lean production-ready decision layer when embedded through the initialized Python API. The stable surface today is:
ModelRouter.route_fast(...)for production routing.ModelRouter.route(...)for diagnostic and audit receipts.- Config-driven model/agent catalog.
- Safe dry-run dispatch plans.
- Local setup wizard and recommendations.
The local proxy is the main product path for agents. Direct dispatch beyond OpenAI-compatible chat forwarding remains intentionally behind explicit adapter boundaries and confirmation gates.
Install
Requires Python 3.11 or newer.
git clone https://github.com/doncazper/model-router.git
cd model-router
python -m pip install -e ".[dev]"
For normal use from PyPI:
pip install "hermes-router[proxy]"
ModelRouter began as Hermes Router and was renamed after evolving into a
generic OpenAI-compatible routing proxy for local/custom agents. The PyPI
distribution name remains hermes-router for compatibility because
model-router is already occupied on PyPI. The primary command and Python API
are model-router, model-router-proxy, and import model_router.
If your shell does not provide python, use python3. If your system Python is
older, use uv:
uv run --python 3.11 --with pytest --with PyYAML python -m pytest
Quick Start
Readable CLI output:
model-router decide "rewrite this text"
JSON receipt:
model-router decide --json "fix the repo and run tests"
Expected default routing examples:
| Prompt | Selected engine |
|---|---|
rewrite this text |
fast_local |
summarize these notes |
balanced_local |
design a distributed task scheduler architecture |
reasoning_local |
fix the repo and run tests |
code_agent |
search the web for the latest TypeScript release notes |
web_research |
extract text from this screenshot |
multimodal_vision |
generate an image of a router dashboard |
image_generation |
drop the production database |
human_confirm |
Python API
Initialize once and reuse the router. Runtime calls stay in memory and do not re-read YAML, scan disk, or run setup helpers.
from model_router import ModelRouter
router = ModelRouter.from_config("configs/model_router.yaml")
# Production hot path: selected engine only.
engine = router.route_fast("fix the repo and run tests")
# Diagnostic path: scores, reasons, rejected engines, alternatives, and flags.
decision = router.route("fix the repo and run tests")
print(engine)
print(decision.requires_code_execution)
Use route_fast(...) for production routing, live routing loops, UI
responsiveness, and high-volume classification. Use route(...) when you need
a receipt, explanation, audit trail, or ranked alternatives. If you need a rich
decision but not ranked alternatives:
decision = router.route("rewrite this text", include_alternatives=False)
For one-off scripts, the compatibility function remains available:
from model_router import route_prompt
decision = route_prompt("research current GLP-1 supplement trends")
The historical hermes.plugins.model_router import path remains available for
backward compatibility, but new custom-agent integrations should use
model_router.
CLI
After installation, you can use the console command:
model-router decide "rewrite this text"
model-router decide --json "fix the repo and run tests"
The old hermes-router command remains as a compatibility alias for existing
scripts.
Use a custom catalog:
model-router decide \
--config configs/model_router.local.yaml \
"research current GLP-1 supplement trends"
Pass routing hints:
model-router decide \
--attachment image \
--force-engine multimodal_vision \
--max-cost-tier medium \
--max-latency-tier medium \
"summarize this attachment"
Validate a config:
model-router validate-config
model-router validate-config --json
Create a dry-run dispatch plan:
model-router dispatch-plan "fix the repo and run tests"
model-router dispatch-plan --json "rewrite this text"
model-router dispatch-plan --include-alternatives --json "rewrite this text"
Dispatch plans only describe what a future adapter would do. They do not execute
models, tools, shell commands, provider calls, or external actions. They skip
ranked alternatives by default for speed; pass --include-alternatives when a
full receipt is useful.
Local Routing Proxy
Most agents can talk to an OpenAI-compatible local endpoint. Install the optional proxy extra to expose one local endpoint that routes each chat request to the configured upstream model server:
model-router init --preset lmstudio --yes
model-router-proxy --config ~/.model-router/routing_proxy.yaml
Then point the agent at:
http://127.0.0.1:8082/v1
The proxy supports /v1/chat/completions, /v1/models, and /health. It
calls initialized route_fast(...) once per chat request, maps the selected
engine to a configured backend, overrides the outgoing backend model, and
forwards to an OpenAI-compatible upstream such as LM Studio, llama.cpp server,
LocalAI, or a frontier gateway. human_confirm returns HTTP 409 and is never
forwarded. Tools are preserved by default and can be stripped per backend for
small local models.
Packaged presets:
model-router init --preset lmstudio --yes
model-router init --preset ollama --yes
model-router init --preset llamacpp --yes
model-router init --preset localai --yes
model-router init --preset hosted-openai-compatible --yes
Use model-router doctor --config ~/.model-router/routing_proxy.yaml when a
backend is unavailable or a model name/endpoint is wrong.
Hindsight Routing Logs
The proxy can write privacy-safe JSONL events for calibration and replay:
observability:
enabled: true
log_path: ~/.model-router/routing-events.jsonl
prompt_capture: redacted_preview
By default events keep a prompt hash, length, estimated tokens, selected engine,
scores, feature flags, backend, fallback status, and latencies. Raw prompts are
not stored unless prompt_capture: full or MODEL_ROUTER_LOG_PROMPTS=1 is set.
Use full capture only during deliberate calibration runs.
When a route is wrong, label it:
model-router feedback req-123 code_agent --notes "repo prompt routed too small"
Replay captured traffic against the current router:
python scripts/replay_routing_log.py \
--events ~/.model-router/routing-events.jsonl \
--feedback ~/.model-router/routing-feedback.jsonl \
--json
Rows without full prompts are skipped for replay but still useful for aggregate latency, score, fallback, and route distribution analysis.
Troubleshooting
- Wrong route: enable observability, label the request with
model-router feedback, and replay logs before changing scoring. - Backend unavailable or wrong model: run
model-router doctor --config ~/.model-router/routing_proxy.yamland check/health. Both diagnostics verify backend reachability and, when/v1/modelsreturns a model list, that each configured backend model is advertised by the upstream server. human_confirm: the prompt matched a destructive, sending, purchase/payment, deployment, or other high-impact action. Use explicit safety overrides only in versioned configs.- Proxy auth: if
proxy.api_keyorproxy.api_key_envis configured, clients must sendAuthorization: Bearer <token>. - Logs/replay: default logs do not include raw prompts. Use
prompt_capture: fullorMODEL_ROUTER_LOG_PROMPTS=1only during deliberate calibration runs.
Example Receipt
{
"selected_engine": "code_agent",
"complexity_score": 56,
"risk_score": 38,
"confidence_score": 90,
"fallback_engine": "reasoning_local",
"requires_confirmation": false,
"requires_tools": true,
"requires_freshness": false,
"requires_code_execution": true,
"requires_vision": false,
"requires_image_generation": false,
"config_valid": true,
"availability_valid": true,
"reasons": [
"coding or repository intent",
"tool use likely",
"file, shell, or GitHub operation",
"coding or repository work"
],
"rejected_engines": [
{
"engine": "fast_local",
"reason": "tools required but engine does not support tools"
}
],
"alternatives": [
{
"engine": "web_research",
"rank_score": 61,
"capability": 70,
"trust": 60,
"cost": 50,
"latency": 75,
"reasons": [
"capability 70/100",
"trust 60/100",
"cost 50/100",
"latency 75/100"
]
}
]
}
Receipts intentionally do not include the raw prompt.
Configure Engines
The default catalog lives at configs/model_router.yaml. Machine-specific
settings should go in configs/model_router.local.yaml and be passed with
--config.
Routing targets map semantic routes to configured engines:
routing_targets:
simple: fast_local
balanced: balanced_local
reasoning: reasoning_local
coding: code_agent
research: web_research
vision: multimodal_vision
image_generation: image_generation
confirmation: human_confirm
Human confirmation is a default-on safety feature. Escape hatches are explicit, scoped config choices:
safety:
require_human_confirmation: true
confirmation_overrides:
allow_destructive_actions: false
allow_send_actions: false
allow_purchase_actions: false
allow_high_impact_external_actions: false
allow_ambiguous_high_impact: false
Each target points at an engine entry:
engines:
claude_code:
provider: anthropic
model: claude-code
adapter: claude_code
strengths:
- repository edits
- tests
max_context: 200000
cost_tier: high
latency_tier: medium
capability: 90
trust: 90
cost: 80
latency: 45
supports_tools: true
enabled: true
fallback: code_agent
availability:
status: auto
required_commands:
- claude
Coding does not have to use Codex. You can point routing_targets.coding at
claude_code, codex, code_agent, a local coding model, or any custom
engine you define.
Optional numeric metadata uses a 0-100 scale:
capability: model/agent strength.trust: reliability for sensitive work.cost: relative cost, where higher means more expensive.latency: relative latency, where higher means slower.
These values rank compatible alternatives. They do not override the configured target when that target is enabled, available, and compatible.
Setup Assistant
The setup assistant can create a local config without guessing what you want.
Scan your machine:
model-router setup scan
model-router setup scan --json
Get recommendations:
model-router setup recommend
model-router setup recommend --json
Recommendations are produced by a bundled, versioned model advisor catalog at
hermes/plugins/model_router/data/model_catalog.yaml. The advisor detects basic
local hardware signals such as RAM, CPU architecture, Apple Silicon, and free
disk space, then ranks setup-time Hugging Face suggestions for each route. This
does not run during route_fast(...), route(...), or ordinary decide calls.
Run the wizard:
model-router setup wizard \
--output configs/model_router.local.yaml
Write a recommended config non-interactively:
model-router setup write \
--output configs/model_router.local.yaml
setup write will not overwrite an existing file unless --force is passed.
The wizard asks whether you want:
- Local LLMs only.
- API keys / hosted models.
- A mix of local models, hosted APIs, and agent tools.
It then walks each main route and shows numbered local model choices plus hardware-aware recommended downloads when a local role is missing. Downloads are never run by ordinary routing commands. They require explicit confirmation.
The scanner includes current LM Studio model storage at
~/.lmstudio/models, plus Ollama, Hugging Face cache, and common local model
folders, so wizard choices should reflect the models your local tools can see.
If recommended downloads are available and the Hugging Face hf CLI is missing,
the wizard warns at the beginning and asks whether to install it into the current
Python environment before model choices start. Declining is safe; the router can
still write the config, and downloads can be run later.
Plan downloads:
model-router setup download
model-router setup download --route fast_local
Run an approved Hugging Face download:
model-router setup download \
--route balanced_local \
--repo-id custom-org/custom-model \
--execute
For non-interactive scripts, add --yes.
Engine Roles
| Role | Default coverage |
|---|---|
| Intent classifier/router | intent_router plus deterministic router code |
| Fast response/summarization | fast_local, balanced_local |
| Deep reasoning/planning | reasoning_local |
| Coding/repo work | code_agent, with optional codex or claude_code |
| Web research/RAG | web_research |
| Multimodal/vision/OCR | multimodal_vision |
| Image generation | image_generation |
| Confirmation/fail-closed | human_confirm |
Safety Model
- The router never executes user requests.
- The router never calls hosted model APIs.
- The router never loads local model weights.
- The router never sends email, deletes files, buys anything, or runs shell commands.
- High-risk destructive, sending, purchasing, payment, scheduling, publishing, and external-action prompts require confirmation by default.
- Confirmation escape hatches must be explicit in
safety.confirmation_overrides; the router does not learn approvals or silently relax safety rules. force_enginecannot bypass human confirmation.- Missing or invalid config routes to
human_confirm. - Unavailable or incompatible engines are skipped through configured fallbacks.
- Receipts omit raw prompt text.
Performance
Use the initialized API for runtime performance:
python scripts/benchmark_route_fast.py
python scripts/benchmark_route_fast.py --json
python scripts/check_route_fast_latency.py --json
route_fast(...) is the production hot path. It returns only the selected
engine string. The scorer precompiles its stable regex patterns at import time, and
initialized routers keep YAML config and availability results in memory. The
richer route(...) path does more work by design because it builds scores,
explanations, rejected-engine details, alternatives, and receipt fields.
The CLI is intended for humans, diagnostics, and scripts. Latency-sensitive
services should not spawn a Python process per prompt; instantiate ModelRouter
once and call the Python API in process.
The default production SLO for initialized ordinary prompts is <= 25 us best
sample and <= 50 us mean sample for route_fast(...). The benchmark guard
enforces those budgets in CI. See
Production readiness for the API contract,
benchmark command, SLOs, and logging guidance.
Install For Local Testing
Use a virtual environment so the router does not modify a managed Python installation:
cd /path/to/model-router
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"
model-router decide --json "fix the repo and run tests"
For a non-editable install from GitHub:
python -m pip install "git+https://github.com/doncazper/model-router.git@v0.5.0"
model-router decide "rewrite this text"
The package exposes console commands, model-router and the legacy
hermes-router alias, plus
the importable Python API:
from model_router import ModelRouter
router = ModelRouter.from_config()
engine = router.route_fast(prompt)
The default catalog is included as package data, so ModelRouter.from_config()
works after wheel installation without relying on the repository checkout. Pass
an explicit config path when an embedding app needs its own engine catalog.
See examples/basic_custom_agent.py for a minimal host-neutral integration.
ModelRouter does not currently claim any host-app plugin manifest or automatic
per-turn model switching contract. Embedding applications should use their own
runtime integration boundary and call the stable route_fast(...) production
API.
Development
Run tests:
python -m pytest
Run lint:
python -m ruff check .
With uv and Python 3.11:
uv run --python 3.11 --with pytest --with PyYAML python -m pytest
uv run --python 3.11 --with ruff --with PyYAML python -m ruff check .
Project Layout
model_router/
__init__.py # Generic public import path
hermes/plugins/model_router/
availability.py # Non-executing availability validation
cli.py # CLI entrypoint
config.py # YAML catalog loading and validation
data/ # Packaged default config
dispatch.py # Safe dry-run dispatch plans
models.py # Dataclass models and JSON-safe serialization
policy.py # Engine selection and fail-closed fallback rules
receipts.py # Routing receipt helpers
scorer.py # Deterministic heuristic prompt scoring
setup_assistant.py # Local setup scanning and config recommendations
configs/
model_router.yaml
model_router.local.example.yaml
docs/
adapter-contract.md
model-router.md
examples/
basic_custom_agent.py
scripts/
benchmark_route_fast.py
tests/
The model_router package is the generic public import path. The
hermes/plugins path is retained only as a backward-compatible legacy namespace.
Neither path is a host-application plugin registration point.
Documentation
Roadmap
v0.5: Usable Local Proxy Beta
- Keep the proxy-first install path polished:
pip install "hermes-router[proxy]". - Keep
model-router init,validate-proxy-config,doctor,/health, log rotation, and provider presets reliable. - Publish releases with a changelog, GitHub release notes, and benchmark output.
v0.6: Passthrough And Gateway Mode
- Add router mode and passthrough mode.
- Keep legacy command/import aliases for one release.
- Add backend request overrides for temperature, context, max tokens, and common generation controls.
- Add first-class llama.cpp and MLX gateway templates.
v0.6.5: Managed Local Runtime Beta
- Add explicit process configuration for llama.cpp and MLX backends.
- Add start, stop, restart, status, and logs commands for configured runtimes.
- Detect port conflicts and capture per-backend process logs.
- Keep process management opt-in and transparent; never auto-start arbitrary commands without user configuration.
v1.0: Finished Local AI Gateway
- Version the public config schema and provide migrations.
- Use labeled real-world routing logs as release-blocking regression checks.
- Document security/privacy expectations for logs, proxy auth, process commands, and local network exposure.
- Decide whether to ship a web UI or keep the product CLI/proxy-first.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hermes_router-0.5.3.tar.gz.
File metadata
- Download URL: hermes_router-0.5.3.tar.gz
- Upload date:
- Size: 102.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3791b7a16b2afe8f276931ab5bde4395d0693a5eea175a8ca3ef04256211d676
|
|
| MD5 |
6da15a46999372c12b63b11673cdb383
|
|
| BLAKE2b-256 |
dcc7ea90a2197cf038022112b82be6596a7dbd8c266ba175ced00a1527fa4039
|
Provenance
The following attestation bundles were made for hermes_router-0.5.3.tar.gz:
Publisher:
publish.yml on doncazper/model-router
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
hermes_router-0.5.3.tar.gz -
Subject digest:
3791b7a16b2afe8f276931ab5bde4395d0693a5eea175a8ca3ef04256211d676 - Sigstore transparency entry: 1904037841
- Sigstore integration time:
-
Permalink:
doncazper/model-router@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36 -
Branch / Tag:
refs/tags/v0.5.3 - Owner: https://github.com/doncazper
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36 -
Trigger Event:
release
-
Statement type:
File details
Details for the file hermes_router-0.5.3-py3-none-any.whl.
File metadata
- Download URL: hermes_router-0.5.3-py3-none-any.whl
- Upload date:
- Size: 76.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1365c4d8bfec3a9c226972e8bd77996a6c23b0f2a83e937777f63e73eaab05ff
|
|
| MD5 |
bf0c6ddf51f01d0963ecd45c02495b48
|
|
| BLAKE2b-256 |
c71790ea5f62001a5472e60288dc0fe4842cc0e7390d2c13d54fb23c9ccfc075
|
Provenance
The following attestation bundles were made for hermes_router-0.5.3-py3-none-any.whl:
Publisher:
publish.yml on doncazper/model-router
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
hermes_router-0.5.3-py3-none-any.whl -
Subject digest:
1365c4d8bfec3a9c226972e8bd77996a6c23b0f2a83e937777f63e73eaab05ff - Sigstore transparency entry: 1904038137
- Sigstore integration time:
-
Permalink:
doncazper/model-router@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36 -
Branch / Tag:
refs/tags/v0.5.3 - Owner: https://github.com/doncazper
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36 -
Trigger Event:
release
-
Statement type: