Fast deterministic model routing for custom AI agents

Hermes Router

Deterministic, fast, safety-first model routing for custom AI agents.

Hermes Router gives agents one local OpenAI-compatible endpoint that routes each chat request to the right configured model server. Simple work can go to fast local models, complex work to stronger reasoning models, fresh research to research tools, repo work to code models, and risky actions to human confirmation.

Use With Your Agent In 3 Minutes

Install the proxy extra:

pip install "hermes-router[proxy]"

Create first-run configs:

model-router init --preset lmstudio --yes

Start the local routing proxy:

model-router-proxy --config ~/.model-router/routing_proxy.yaml

Point any OpenAI-compatible agent/client at:

http://127.0.0.1:8082/v1

Useful follow-ups:

model-router validate-proxy-config --config ~/.model-router/routing_proxy.yaml
model-router doctor --config ~/.model-router/routing_proxy.yaml
curl http://127.0.0.1:8082/health

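The router itself is intentionally a decision layer only. It does not execute prompts, call model providers, load local model weights, browse the web, run shell commands, send messages, delete files, or purchase anything. The optional proxy is the one component that forwards traffic, and it only forwards chat requests to the OpenAI-compatible upstreams you configure.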

At a Glance

Need | Hermes Router provides
Fast hot-path routing | ModelRouter.route_fast(prompt) returns an engine string
Diagnostic decisions | ModelRouter.route(prompt) returns scores, flags, reasons, and alternatives
CLI tooling | decide, validate-config, dispatch-plan, and setup commands
Local/API flexibility | YAML routing targets for local models, hosted APIs, vision, image generation, and custom adapters
Safety boundaries | High-risk or invalid requests fail closed to human_confirm
Setup help | Safe local scans, config recommendations, and opt-in Hugging Face download plans

Highlights

  • Deterministic heuristic routing with no LLM classification call.
  • Fast initialized hot path: router.route_fast(prompt) returns only the selected engine.
  • Rich receipt path: router.route(prompt) returns scores, reasons, rejected engines, alternatives, requirements, and safety flags.
  • YAML-driven engine catalog; model names are not hardcoded throughout the router.
  • OpenAI-compatible proxy for agents that only know how to call a local AI endpoint.
  • First-run model-router init for local proxy configs.
  • User-configurable routing targets for local models, hosted APIs, web/RAG tools, vision, image generation, or custom adapters.
  • Fail-closed safety: missing/invalid config and high-risk actions route to human_confirm.
  • Declarative availability checks for env vars, commands, and local paths.
  • Setup assistant for local/API/mixed model configuration and optional Hugging Face download plans.

Project Status

Hermes Router is a lean, production-ready decision layer when embedded through the initialized Python API. The stable surface today is:

  • ModelRouter.route_fast(...) for production routing.
  • ModelRouter.route(...) for diagnostic and audit receipts.
  • Config-driven model/agent catalog.
  • Safe dry-run dispatch plans.
  • Local setup wizard and recommendations.

The local proxy is the main product path for agents. Direct dispatch beyond OpenAI-compatible chat forwarding remains intentionally behind explicit adapter boundaries and confirmation gates.

Install

Requires Python 3.11 or newer.

git clone https://github.com/doncazper/hermes-router.git
cd hermes-router
python -m pip install -e ".[dev]"

For normal use from PyPI:

pip install "hermes-router[proxy]"

If your shell does not provide python, use python3. If your system Python is older, use uv:

uv run --python 3.11 --with pytest --with PyYAML python -m pytest

Quick Start

Readable CLI output:

model-router decide "rewrite this text"

JSON receipt:

model-router decide --json "fix the repo and run tests"

Expected default routing examples:

Prompt | Selected engine
rewrite this text | fast_local
summarize these notes | balanced_local
design a distributed task scheduler architecture | reasoning_local
fix the repo and run tests | code_agent
search the web for the latest TypeScript release notes | web_research
extract text from this screenshot | multimodal_vision
generate an image of a router dashboard | image_generation
drop the production database | human_confirm

Python API

Initialize once and reuse the router. Runtime calls stay in memory and do not re-read YAML, scan disk, or run setup helpers.

from model_router import ModelRouter

router = ModelRouter.from_config("configs/model_router.yaml")

# Production hot path: selected engine only.
engine = router.route_fast("fix the repo and run tests")

# Diagnostic path: scores, reasons, rejected engines, alternatives, and flags.
decision = router.route("fix the repo and run tests")

print(engine)
print(decision.requires_code_execution)

Use route_fast(...) for production routing, live routing loops, UI responsiveness, and high-volume classification. Use route(...) when you need a receipt, explanation, audit trail, or ranked alternatives. If you need a rich decision without ranked alternatives, pass include_alternatives=False:

decision = router.route("rewrite this text", include_alternatives=False)

For one-off scripts, the compatibility function remains available:

from model_router import route_prompt

decision = route_prompt("research current GLP-1 supplement trends")

The historical hermes.plugins.model_router import path remains available for backward compatibility, but new custom-agent integrations should use model_router.
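As a rough illustration of how a host application might consume the engine string from route_fast(...), the sketch below dispatches to per-engine handlers and fails closed on human_confirm or on any engine it does not recognize. StubRouter, handle, and the handlers table are hypothetical names used only so the example runs without hermes-router installed; an initialized ModelRouter would take the stub's place.

```python
from typing import Callable, Protocol


class FastRouter(Protocol):
    """Anything exposing route_fast(prompt) -> str, such as an initialized ModelRouter."""

    def route_fast(self, prompt: str) -> str: ...


class StubRouter:
    """Hypothetical stand-in for ModelRouter; not the real scoring logic."""

    def route_fast(self, prompt: str) -> str:
        return "human_confirm" if "drop" in prompt else "fast_local"


def handle(router: FastRouter, prompt: str, handlers: dict[str, Callable[[str], str]]) -> str:
    """Route once, then hand the prompt to the host's handler for that engine."""
    engine = router.route_fast(prompt)
    if engine == "human_confirm" or engine not in handlers:
        # Fail closed: engines without a handler are treated like confirmation requests.
        return f"needs confirmation: {engine}"
    return handlers[engine](prompt)


# The host owns execution; the router only names the engine.
handlers = {"fast_local": lambda p: f"[fast_local] {p}"}
print(handle(StubRouter(), "rewrite this text", handlers))
print(handle(StubRouter(), "drop the production database", handlers))
```

Keeping the handler table on the host side matches the router's decision-only role: the router picks an engine name and the embedding application decides what running that engine means.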

CLI

After installation, either console command works; hermes-router and model-router are equivalent:

hermes-router decide "rewrite this text"
hermes-router decide --json "fix the repo and run tests"
model-router decide "rewrite this text"

Use a custom catalog:

model-router decide \
  --config configs/model_router.local.yaml \
  "research current GLP-1 supplement trends"

Pass routing hints:

model-router decide \
  --attachment image \
  --force-engine multimodal_vision \
  --max-cost-tier medium \
  --max-latency-tier medium \
  "summarize this attachment"

Validate a config:

model-router validate-config
model-router validate-config --json

Create a dry-run dispatch plan:

model-router dispatch-plan "fix the repo and run tests"
model-router dispatch-plan --json "rewrite this text"
model-router dispatch-plan --include-alternatives --json "rewrite this text"

Dispatch plans only describe what a future adapter would do. They do not execute models, tools, shell commands, provider calls, or external actions. They skip ranked alternatives by default for speed; pass --include-alternatives when a full receipt is useful.

Local Routing Proxy

Most agents can talk to an OpenAI-compatible local endpoint. Install the optional proxy extra to expose one local endpoint that routes each chat request to the configured upstream model server:

model-router init --preset lmstudio --yes
model-router-proxy --config ~/.model-router/routing_proxy.yaml

Then point the agent at:

http://127.0.0.1:8082/v1

The proxy supports /v1/chat/completions, /v1/models, and /health. It calls initialized route_fast(...) once per chat request, maps the selected engine to a configured backend, overrides the outgoing backend model, and forwards to an OpenAI-compatible upstream such as LM Studio, llama.cpp server, LocalAI, or a frontier gateway. human_confirm returns HTTP 409 and is never forwarded. Tools are preserved by default and can be stripped per backend for small local models.

Packaged presets:

model-router init --preset lmstudio --yes
model-router init --preset ollama --yes
model-router init --preset llamacpp --yes
model-router init --preset localai --yes
model-router init --preset hosted-openai-compatible --yes

Use model-router doctor --config ~/.model-router/routing_proxy.yaml when a backend is unavailable or a model name/endpoint is wrong.

Hindsight Routing Logs

The proxy can write privacy-safe JSONL events for calibration and replay:

observability:
  enabled: true
  log_path: ~/.model-router/routing-events.jsonl
  prompt_capture: redacted_preview

By default, events keep a prompt hash, length, estimated tokens, selected engine, scores, feature flags, backend, fallback status, and latencies. Raw prompts are not stored unless prompt_capture: full or MODEL_ROUTER_LOG_PROMPTS=1 is set. Use full capture only during deliberate calibration runs.

When a route is wrong, label it:

model-router feedback req-123 code_agent --notes "repo prompt routed too small"

Replay captured traffic against the current router:

python scripts/replay_routing_log.py \
  --events ~/.model-router/routing-events.jsonl \
  --feedback ~/.model-router/routing-feedback.jsonl \
  --json

Rows without full prompts are skipped for replay but still useful for aggregate latency, score, fallback, and route distribution analysis.
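Aggregate analysis of the JSONL log needs nothing beyond the standard library. The sketch below counts events per engine. The field name selected_engine is an assumption borrowed from the receipt format; the actual event schema may use a different key, so it is left as a parameter.

```python
import json
from collections import Counter
from typing import Iterable


def route_distribution(lines: Iterable[str], field: str = "selected_engine") -> Counter:
    """Count events per engine, skipping blank or malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated final line or stray text
        if isinstance(event, dict) and field in event:
            counts[event[field]] += 1
    return counts


# Illustrative events only; real rows carry more fields.
sample = [
    '{"selected_engine": "fast_local"}',
    '{"selected_engine": "code_agent"}',
    '{"selected_engine": "fast_local"}',
    "not json",
]
print(route_distribution(sample).most_common())
```

To run it against a real log, open the expanded path (for example with pathlib.Path("~/.model-router/routing-events.jsonl").expanduser().open()) and pass the file object as lines.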

Troubleshooting

  • Wrong route: enable observability, label the request with model-router feedback, and replay logs before changing scoring.
  • Backend unavailable or wrong model: run model-router doctor --config ~/.model-router/routing_proxy.yaml and check /health. Both diagnostics verify backend reachability and, when /v1/models returns a model list, that each configured backend model is advertised by the upstream server.
  • human_confirm: the prompt matched a destructive, sending, purchase/payment, deployment, or other high-impact action. Use explicit safety overrides only in versioned configs.
  • Proxy auth: if proxy.api_key or proxy.api_key_env is configured, clients must send Authorization: Bearer <token>.
  • Logs/replay: default logs do not include raw prompts. Use prompt_capture: full or MODEL_ROUTER_LOG_PROMPTS=1 only during deliberate calibration runs.

Example Receipt

{
  "selected_engine": "code_agent",
  "complexity_score": 56,
  "risk_score": 38,
  "confidence_score": 90,
  "fallback_engine": "reasoning_local",
  "requires_confirmation": false,
  "requires_tools": true,
  "requires_freshness": false,
  "requires_code_execution": true,
  "requires_vision": false,
  "requires_image_generation": false,
  "config_valid": true,
  "availability_valid": true,
  "reasons": [
    "coding or repository intent",
    "tool use likely",
    "file, shell, or GitHub operation",
    "coding or repository work"
  ],
  "rejected_engines": [
    {
      "engine": "fast_local",
      "reason": "tools required but engine does not support tools"
    }
  ],
  "alternatives": [
    {
      "engine": "web_research",
      "rank_score": 61,
      "capability": 70,
      "trust": 60,
      "cost": 50,
      "latency": 75,
      "reasons": [
        "capability 70/100",
        "trust 60/100",
        "cost 50/100",
        "latency 75/100"
      ]
    }
  ]
}

Receipts intentionally do not include the raw prompt.
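A host that stores receipts for audit may want a one-line summary per decision. The following sketch reads a trimmed copy of the receipt above and fails closed when confirmation is required or when a validity flag is missing. summarize_receipt is a hypothetical helper, not part of the package.

```python
import json

# Trimmed copy of the example receipt; field names come from the receipt above.
receipt_json = """
{
  "selected_engine": "code_agent",
  "fallback_engine": "reasoning_local",
  "requires_confirmation": false,
  "config_valid": true,
  "availability_valid": true,
  "rejected_engines": [
    {"engine": "fast_local", "reason": "tools required but engine does not support tools"}
  ]
}
"""


def summarize_receipt(receipt: dict) -> str:
    """Return a one-line audit summary; missing fields are treated as unsafe."""
    if receipt.get("requires_confirmation", True):
        return "blocked: human confirmation required"
    if not (receipt.get("config_valid") and receipt.get("availability_valid")):
        return "blocked: config or availability invalid"
    rejected = ", ".join(r["engine"] for r in receipt.get("rejected_engines", []))
    return f"route to {receipt['selected_engine']} (rejected: {rejected or 'none'})"


print(summarize_receipt(json.loads(receipt_json)))
```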

Configure Engines

The default catalog lives at configs/model_router.yaml. Machine-specific settings should go in configs/model_router.local.yaml and be passed with --config.

Routing targets map semantic routes to configured engines:

routing_targets:
  simple: fast_local
  balanced: balanced_local
  reasoning: reasoning_local
  coding: code_agent
  research: web_research
  vision: multimodal_vision
  image_generation: image_generation
  confirmation: human_confirm

Human confirmation is a default-on safety feature. Escape hatches are explicit, scoped config choices:

safety:
  require_human_confirmation: true
  confirmation_overrides:
    allow_destructive_actions: false
    allow_send_actions: false
    allow_purchase_actions: false
    allow_high_impact_external_actions: false
    allow_ambiguous_high_impact: false

Each target points at an engine entry:

engines:
  claude_code:
    provider: anthropic
    model: claude-code
    adapter: claude_code
    strengths:
      - repository edits
      - tests
    max_context: 200000
    cost_tier: high
    latency_tier: medium
    capability: 90
    trust: 90
    cost: 80
    latency: 45
    supports_tools: true
    enabled: true
    fallback: code_agent
    availability:
      status: auto
      required_commands:
        - claude

Coding does not have to use Codex. You can point routing_targets.coding at claude_code, codex, code_agent, a local coding model, or any custom engine you define.
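The availability block in the engine entry above is declarative: it names what must exist without running anything. One way such a check could work is sketched below. Only required_commands appears in the example config; required_env and required_paths are hypothetical key names standing in for the env var and local path checks, and check_availability is not the package's own function.

```python
import os
import shutil
from pathlib import Path


def check_availability(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the engine looks available.

    Nothing is executed: commands are only looked up on PATH.
    """
    problems = []
    for command in spec.get("required_commands", []):
        if shutil.which(command) is None:
            problems.append(f"command not found: {command}")
    for name in spec.get("required_env", []):  # hypothetical key
        if not os.environ.get(name):
            problems.append(f"env var not set: {name}")
    for path in spec.get("required_paths", []):  # hypothetical key
        if not Path(path).expanduser().exists():
            problems.append(f"path missing: {path}")
    return problems


# Mirrors the claude_code entry: available only if `claude` is on PATH.
print(check_availability({"required_commands": ["claude"]}))
```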

Optional numeric metadata uses a 0-100 scale:

  • capability: model/agent strength.
  • trust: reliability for sensitive work.
  • cost: relative cost, where higher means more expensive.
  • latency: relative latency, where higher means slower.

These values rank compatible alternatives. They do not override the configured target when that target is enabled, available, and compatible.
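Before handing a hand-edited catalog to model-router validate-config, a quick pre-check of the two rules described here can catch obvious mistakes: every routing target should name a defined engine, and numeric metadata should stay within 0-100. The sketch below works on an already-parsed dict and is not the project's validator. Treating human_confirm as always valid is an assumption.

```python
NUMERIC_FIELDS = ("capability", "trust", "cost", "latency")
BUILT_IN = {"human_confirm"}  # assumption: needs no engine entry


def catalog_problems(catalog: dict) -> list[str]:
    """Return human-readable problems found in a parsed catalog dict."""
    engines = catalog.get("engines", {})
    problems = []
    for route, engine in catalog.get("routing_targets", {}).items():
        if engine not in engines and engine not in BUILT_IN:
            problems.append(f"routing_targets.{route}: unknown engine {engine!r}")
    for name, entry in engines.items():
        for field in NUMERIC_FIELDS:
            value = entry.get(field)
            if value is None:
                continue  # numeric metadata is optional
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0 <= value <= 100:
                problems.append(f"engines.{name}.{field}: expected 0-100, got {value!r}")
    return problems


catalog = {
    "routing_targets": {"coding": "claude_code", "simple": "fast_local"},
    "engines": {"claude_code": {"capability": 90, "trust": 90, "cost": 80, "latency": 145}},
}
for problem in catalog_problems(catalog):
    print(problem)
```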

Setup Assistant

The setup assistant can create a local config without guessing what you want.

Scan your machine:

model-router setup scan
model-router setup scan --json

Get recommendations:

model-router setup recommend
model-router setup recommend --json

Recommendations are produced by a bundled, versioned model advisor catalog at hermes/plugins/model_router/data/model_catalog.yaml. The advisor detects basic local hardware signals such as RAM, CPU architecture, Apple Silicon, and free disk space, then ranks setup-time Hugging Face suggestions for each route. This does not run during route_fast(...), route(...), or ordinary decide calls.

Run the wizard:

model-router setup wizard \
  --output configs/model_router.local.yaml

Write a recommended config non-interactively:

model-router setup write \
  --output configs/model_router.local.yaml

setup write will not overwrite an existing file unless --force is passed.

The wizard asks whether you want:

  • Local LLMs only.
  • API keys / hosted models.
  • A mix of local models, hosted APIs, and agent tools.

It then walks each main route and shows numbered local model choices plus hardware-aware recommended downloads when a local role is missing. Downloads are never run by ordinary routing commands. They require explicit confirmation.

The scanner includes current LM Studio model storage at ~/.lmstudio/models, plus Ollama, Hugging Face cache, and common local model folders, so wizard choices should reflect the models your local tools can see.

If recommended downloads are available and the Hugging Face hf CLI is missing, the wizard warns at the beginning and asks whether to install it into the current Python environment before model choices start. Declining is safe; the router can still write the config, and downloads can be run later.

Plan downloads:

model-router setup download
model-router setup download --route fast_local

Run an approved Hugging Face download:

model-router setup download \
  --route balanced_local \
  --repo-id custom-org/custom-model \
  --execute

For non-interactive scripts, add --yes.

Engine Roles

Role | Default coverage
Intent classifier/router | intent_router plus deterministic router code
Fast response/summarization | fast_local, balanced_local
Deep reasoning/planning | reasoning_local
Coding/repo work | code_agent, with optional codex or claude_code
Web research/RAG | web_research
Multimodal/vision/OCR | multimodal_vision
Image generation | image_generation
Confirmation/fail-closed | human_confirm

Safety Model

  • The router never executes user requests.
  • The router never calls hosted model APIs.
  • The router never loads local model weights.
  • The router never sends email, deletes files, buys anything, or runs shell commands.
  • High-risk destructive, sending, purchasing, payment, scheduling, publishing, and external-action prompts require confirmation by default.
  • Confirmation escape hatches must be explicit in safety.confirmation_overrides; the router does not learn approvals or silently relax safety rules.
  • force_engine cannot bypass human confirmation.
  • Missing or invalid config routes to human_confirm.
  • Unavailable or incompatible engines are skipped through configured fallbacks.
  • Receipts omit raw prompt text.
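The fallback rule in the list above can be pictured as a walk along each engine's fallback link until a usable engine turns up. The sketch below is one possible reading, not the package's policy code. In particular, returning human_confirm when the chain runs out or loops is an assumption that extends the fail-closed behavior documented for missing or invalid config.

```python
def resolve_engine(target: str, engines: dict[str, dict], usable: set[str]) -> str:
    """Follow `fallback` links from target until an enabled, usable engine is found."""
    seen: set[str] = set()
    current = target
    while current in engines and current not in seen:
        if engines[current].get("enabled", True) and current in usable:
            return current
        seen.add(current)
        current = engines[current].get("fallback")
    # Unknown engine, missing fallback, or a cycle: fail closed (assumption).
    return "human_confirm"


# Hypothetical catalog with a deliberate cycle to show termination.
engines = {
    "claude_code": {"fallback": "code_agent"},
    "code_agent": {"fallback": "reasoning_local"},
    "reasoning_local": {"fallback": "claude_code"},
}
print(resolve_engine("claude_code", engines, usable={"code_agent"}))
print(resolve_engine("claude_code", engines, usable=set()))
```

The seen set guarantees termination even when a hand-written catalog contains a fallback loop.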

Performance

Use the initialized API for runtime performance:

python scripts/benchmark_route_fast.py
python scripts/benchmark_route_fast.py --json
python scripts/check_route_fast_latency.py --json

route_fast(...) is the production hot path. It returns only the selected engine string. The scorer precompiles its stable regex patterns at import time, and initialized routers keep YAML config and availability results in memory. The richer route(...) path does more work by design because it builds scores, explanations, rejected-engine details, alternatives, and receipt fields.

The CLI is intended for humans, diagnostics, and scripts. Latency-sensitive services should not spawn a Python process per prompt; instantiate ModelRouter once and call the Python API in process.

The default production SLO for route_fast(...) on an initialized router with ordinary prompts is <= 25 µs for the best sample and <= 50 µs for the mean sample. The benchmark guard enforces those budgets in CI. See Production readiness for the API contract, benchmark command, SLOs, and logging guidance.
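To get a feel for what a per-call microsecond budget means inside your own process, a small timing loop is enough. The sketch below is not the project's benchmark script. How it defines a sample (a batch of calls averaged per call) is an assumption, and stub_route is a placeholder for router.route_fast.

```python
import time
from statistics import mean
from typing import Callable


def measure_us(route: Callable[[str], str], prompt: str, samples: int = 5, calls: int = 1000) -> dict:
    """Time `calls` invocations per sample; report best and mean per-call microseconds."""
    per_call_us = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(calls):
            route(prompt)
        elapsed_ns = time.perf_counter_ns() - start
        per_call_us.append(elapsed_ns / calls / 1000)  # ns per batch -> µs per call
    return {"best_us": min(per_call_us), "mean_us": mean(per_call_us)}


def stub_route(prompt: str) -> str:
    """Placeholder for router.route_fast; returns a fixed engine."""
    return "fast_local"


result = measure_us(stub_route, "rewrite this text")
print(result)
```

Passing router.route_fast from an initialized ModelRouter in place of stub_route measures the real hot path on your hardware; numbers from the stub only show the loop overhead.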

Install For Local Testing

Use a virtual environment so the router does not modify a managed Python installation:

cd /path/to/hermes-router
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"
model-router decide --json "fix the repo and run tests"

For a non-editable install from GitHub:

python -m pip install "git+https://github.com/doncazper/hermes-router.git@v0.5.0"
model-router decide "rewrite this text"

The package exposes two console commands, hermes-router and model-router, plus the importable Python API:

from model_router import ModelRouter

router = ModelRouter.from_config()
engine = router.route_fast(prompt)

The default catalog is included as package data, so ModelRouter.from_config() works after wheel installation without relying on the repository checkout. Pass an explicit config path when an embedding app needs its own engine catalog.

See examples/basic_custom_agent.py for a minimal host-neutral integration.

Hermes Router does not currently claim any host-app plugin manifest or automatic per-turn model switching contract. Embedding applications should use their own runtime integration boundary and call the stable route_fast(...) production API.

Development

Run tests:

python -m pytest

Run lint:

python -m ruff check .

With uv and Python 3.11:

uv run --python 3.11 --with pytest --with PyYAML python -m pytest
uv run --python 3.11 --with ruff --with PyYAML python -m ruff check .

Project Layout

model_router/
  __init__.py          # Generic public import path
hermes/plugins/model_router/
  availability.py     # Non-executing availability validation
  cli.py              # CLI entrypoint
  config.py           # YAML catalog loading and validation
  data/               # Packaged default config
  dispatch.py         # Safe dry-run dispatch plans
  models.py           # Dataclass models and JSON-safe serialization
  policy.py           # Engine selection and fail-closed fallback rules
  receipts.py         # Routing receipt helpers
  scorer.py           # Deterministic heuristic prompt scoring
  setup_assistant.py  # Local setup scanning and config recommendations
configs/
  model_router.yaml
  model_router.local.example.yaml
docs/
  adapter-contract.md
  model-router.md
examples/
  basic_custom_agent.py
scripts/
  benchmark_route_fast.py
tests/

The model_router package is the generic public import path. The hermes/plugins path is retained only as a backward-compatible legacy namespace. Neither path is a host-application plugin registration point.

Roadmap

v0.5: Usable Local Proxy Beta

  • Keep the proxy-first install path polished: pip install "hermes-router[proxy]".
  • Keep model-router init, validate-proxy-config, doctor, /health, log rotation, and provider presets reliable.
  • Publish releases with a changelog, GitHub release notes, and benchmark output.

v0.6: Gateway Rename And Passthrough

  • Rename the product toward a broader OpenAI-compatible local gateway identity.
  • Add router mode and passthrough mode.
  • Keep legacy command/import aliases for one release.
  • Add backend request overrides for temperature, context, max tokens, and common generation controls.
  • Add first-class llama.cpp and MLX gateway templates.

v0.6.5: Managed Local Runtime Beta

  • Add explicit process configuration for llama.cpp and MLX backends.
  • Add start, stop, restart, status, and logs commands for configured runtimes.
  • Detect port conflicts and capture per-backend process logs.
  • Keep process management opt-in and transparent; never auto-start arbitrary commands without user configuration.

v1.0: Finished Local AI Gateway

  • Version the public config schema and provide migrations.
  • Use labeled real-world routing logs as release-blocking regression checks.
  • Document security/privacy expectations for logs, proxy auth, process commands, and local network exposure.
  • Decide whether to ship a web UI or keep the product CLI/proxy-first.
