ModelRouter: fast deterministic model routing and OpenAI-compatible proxying for custom AI agents

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

doncazper

These details have not been verified by PyPI

Project description

ModelRouter

Deterministic, fast, safety-first model routing for custom AI agents.

ModelRouter gives agents one local OpenAI-compatible endpoint that routes each chat request to the right configured model server. Simple work can go to fast local models, complex work to stronger reasoning models, fresh research to research tools, repo work to code models, and risky actions to human confirmation.

Use With Your Agent In 3 Minutes

Install the proxy extra:

pip install "hermes-router[proxy]"

Create first-run configs:

model-router init --preset lmstudio --yes

Start the local routing proxy:

model-router-proxy --config ~/.model-router/routing_proxy.yaml

Point any OpenAI-compatible agent/client at:

http://127.0.0.1:8082/v1

Useful follow-ups:

model-router validate-proxy-config --config ~/.model-router/routing_proxy.yaml
model-router doctor --config ~/.model-router/routing_proxy.yaml
curl http://127.0.0.1:8082/health

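This project is intentionally a decision router. The routing core does not execute prompts, call model providers, load local model weights, browse the web, run shell commands, send messages, delete files, or purchase anything. The optional proxy only forwards OpenAI-compatible chat requests to the upstream servers you configure, and it never forwards requests routed to human_confirm.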

At a Glance

Need	ModelRouter provides
Fast hot-path routing	`ModelRouter.route_fast(prompt)` returns an engine string
Diagnostic decisions	`ModelRouter.route(prompt)` returns scores, flags, reasons, and alternatives
CLI tooling	`decide`, `validate-config`, `dispatch-plan`, `setup`, `init`, `validate-proxy-config`, `doctor`, and `feedback` commands
Local/API flexibility	YAML routing targets for local models, hosted APIs, vision, image generation, and custom adapters
Safety boundaries	High-risk or invalid requests fail closed to `human_confirm`
Setup help	Safe local scans, config recommendations, and opt-in Hugging Face download plans

Highlights

Deterministic heuristic routing with no LLM classification call.
Fast initialized hot path: router.route_fast(prompt) returns only the selected engine.
Rich receipt path: router.route(prompt) returns scores, reasons, rejected engines, alternatives, requirements, and safety flags.
YAML-driven engine catalog; model names are not hardcoded throughout the router.
OpenAI-compatible proxy for agents that only know how to call a local AI endpoint.
First-run model-router init command that writes local proxy configs.
User-configurable routing targets for local models, hosted APIs, web/RAG tools, vision, image generation, or custom adapters.
Fail-closed safety: missing/invalid config and high-risk actions route to human_confirm.
Declarative availability checks for env vars, commands, and local paths.
Setup assistant for local/API/mixed model configuration and optional Hugging Face download plans.

Project Status

ModelRouter is a lean, production-ready decision layer when embedded through the initialized Python API. The stable surface today is:

ModelRouter.route_fast(...) for production routing.
ModelRouter.route(...) for diagnostic and audit receipts.
Config-driven model/agent catalog.
Safe dry-run dispatch plans.
Local setup wizard and recommendations.

The local proxy is the main product path for agents. Direct dispatch beyond OpenAI-compatible chat forwarding remains intentionally behind explicit adapter boundaries and confirmation gates.

Install

Requires Python 3.11 or newer. To install from a source checkout for development:

git clone https://github.com/doncazper/model-router.git
cd model-router
python -m pip install -e ".[dev]"

For normal use from PyPI:

pip install "hermes-router[proxy]"

ModelRouter began as Hermes Router and was renamed after evolving into a generic OpenAI-compatible routing proxy for local/custom agents. The PyPI distribution name remains hermes-router for compatibility because the name model-router is already taken on PyPI. The primary commands are model-router and model-router-proxy, and the Python import is model_router.

If your shell does not provide python, use python3. If your system Python is older than 3.11, you can still run the test suite with uv:

uv run --python 3.11 --with pytest --with PyYAML python -m pytest

Quick Start

Readable CLI output:

model-router decide "rewrite this text"

JSON receipt:

model-router decide --json "fix the repo and run tests"

Expected default routing examples:

Prompt	Selected engine
`rewrite this text`	`fast_local`
`summarize these notes`	`balanced_local`
`design a distributed task scheduler architecture`	`reasoning_local`
`fix the repo and run tests`	`code_agent`
`search the web for the latest TypeScript release notes`	`web_research`
`extract text from this screenshot`	`multimodal_vision`
`generate an image of a router dashboard`	`image_generation`
`drop the production database`	`human_confirm`
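The table above lists ModelRouter's expected defaults. As a rough illustration of what "deterministic heuristic routing with no LLM classification call" can mean, the toy below maps prompts to engine names with precompiled regular expressions checked in a fixed priority order. The patterns, their ordering, and the `toy_route` name are invented for this sketch; the real scorer computes complexity, risk, and confidence scores and applies fallbacks, so do not expect this toy to agree with ModelRouter beyond these examples.

```python
import re

# Hypothetical rules checked top to bottom; the first match wins.
# These are NOT ModelRouter's scorer patterns, only an illustration.
_RULES = [
    ("human_confirm", re.compile(r"\b(drop|delete|wipe)\b.*\b(production|database|prod)\b")),
    ("image_generation", re.compile(r"\bgenerate an? image\b")),
    ("multimodal_vision", re.compile(r"\b(screenshot|photo|ocr)\b")),
    ("web_research", re.compile(r"\b(search the web|latest|current)\b")),
    ("code_agent", re.compile(r"\b(repo|tests?|refactor|bug)\b")),
    ("reasoning_local", re.compile(r"\b(design|architecture|plan)\b")),
    ("balanced_local", re.compile(r"\b(summari[sz]e|notes)\b")),
]


def toy_route(prompt: str) -> str:
    """Return an engine name using only precompiled regexes (no model call)."""
    text = prompt.lower()
    for engine, pattern in _RULES:
        if pattern.search(text):
            return engine
    return "fast_local"  # default for simple work
```

Putting the safety rule first means a risky prompt can never be claimed by a more specific capability rule further down the list.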

Python API

Initialize once and reuse the router. Runtime calls stay in memory and do not re-read YAML, scan disk, or run setup helpers.

from model_router import ModelRouter

router = ModelRouter.from_config("configs/model_router.yaml")

# Production hot path: selected engine only.
engine = router.route_fast("fix the repo and run tests")

# Diagnostic path: scores, reasons, rejected engines, alternatives, and flags.
decision = router.route("fix the repo and run tests")

print(engine)
print(decision.requires_code_execution)

Use route_fast(...) for production routing, live routing loops, UI responsiveness, and high-volume classification. Use route(...) when you need a receipt, explanation, audit trail, or ranked alternatives. If you need a rich decision but not ranked alternatives:

decision = router.route("rewrite this text", include_alternatives=False)
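One way a host agent might follow the "initialize once and reuse" advice is to accept the router as a dependency and call route_fast once per turn. This is a hypothetical integration sketch: `FastRouter`, `Agent`, and the handler mapping are illustrative names, not part of model_router. In a real app you would pass the object returned by ModelRouter.from_config(...); the test double in the usage below only stands in for it.

```python
from typing import Callable, Dict, Protocol


class FastRouter(Protocol):
    """Anything exposing route_fast(prompt) -> str, such as an initialized ModelRouter."""

    def route_fast(self, prompt: str) -> str: ...


class Agent:
    """Hypothetical host agent that receives one router and reuses it every turn."""

    def __init__(self, router: FastRouter, handlers: Dict[str, Callable[[str], str]]):
        self._router = router      # built once at startup and kept in memory
        self._handlers = handlers  # engine name -> function that does the work

    def handle(self, prompt: str) -> str:
        engine = self._router.route_fast(prompt)
        if engine == "human_confirm":
            return "needs confirmation"  # fail closed: do not run the request
        handler = self._handlers.get(engine)
        if handler is None:
            return f"no handler configured for {engine}"
        return handler(prompt)
```

For example, Agent(ModelRouter.from_config("configs/model_router.yaml"), handlers) keeps config loading out of the per-turn path.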

For one-off scripts, the compatibility function remains available:

from model_router import route_prompt

decision = route_prompt("research current GLP-1 supplement trends")

The historical hermes.plugins.model_router import path remains available for backward compatibility, but new custom-agent integrations should use model_router.

CLI

After installation, you can use the console command:

model-router decide "rewrite this text"
model-router decide --json "fix the repo and run tests"

The old hermes-router command remains as a compatibility alias for existing scripts.

Use a custom catalog:

model-router decide \
  --config configs/model_router.local.yaml \
  "research current GLP-1 supplement trends"

Pass routing hints:

model-router decide \
  --attachment image \
  --force-engine multimodal_vision \
  --max-cost-tier medium \
  --max-latency-tier medium \
  "summarize this attachment"

Validate a config:

model-router validate-config
model-router validate-config --json

Create a dry-run dispatch plan:

model-router dispatch-plan "fix the repo and run tests"
model-router dispatch-plan --json "rewrite this text"
model-router dispatch-plan --include-alternatives --json "rewrite this text"

Dispatch plans only describe what a future adapter would do. They do not execute models, tools, shell commands, provider calls, or external actions. They skip ranked alternatives by default for speed; pass --include-alternatives when a full receipt is useful.

Local Routing Proxy

Most agents can talk to an OpenAI-compatible local endpoint. Install the optional proxy extra to expose one local endpoint that routes each chat request to the configured upstream model server:

model-router init --preset lmstudio --yes
model-router-proxy --config ~/.model-router/routing_proxy.yaml

Then point the agent at:

http://127.0.0.1:8082/v1

The proxy supports /v1/chat/completions, /v1/models, and /health. It calls initialized route_fast(...) once per chat request, maps the selected engine to a configured backend, overrides the outgoing backend model, and forwards to an OpenAI-compatible upstream such as LM Studio, llama.cpp server, LocalAI, or a frontier gateway. human_confirm returns HTTP 409 and is never forwarded. Tools are preserved by default and can be stripped per backend for small local models.
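A minimal standard-library sketch of calling the proxy from Python, assuming the default address http://127.0.0.1:8082/v1. The `build_chat_request` and `send` helpers are hypothetical. The model value the client sends is a placeholder assumption, since the proxy overrides the outgoing backend model; the Bearer header is only needed when proxy.api_key or proxy.api_key_env is configured. HTTP 409 is surfaced separately because the proxy returns it for human_confirm routes.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

PROXY_BASE = "http://127.0.0.1:8082/v1"


def build_chat_request(messages: list, model: str = "auto", token: Optional[str] = None,
                       base_url: str = PROXY_BASE) -> urllib.request.Request:
    """Build, but do not send, a POST to the proxy's chat completions endpoint."""
    # "auto" is a placeholder: the proxy overrides the backend model it forwards.
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Needed only when proxy.api_key or proxy.api_key_env is configured.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{base_url}/chat/completions", data=body,
                                  headers=headers, method="POST")


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request, reporting human_confirm (HTTP 409) distinctly."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 409:
            raise PermissionError("routed to human_confirm; not forwarded upstream") from err
        raise
```

Separating request construction from sending keeps the request shape testable without a running proxy.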

Packaged presets:

model-router init --preset lmstudio --yes
model-router init --preset ollama --yes
model-router init --preset llamacpp --yes
model-router init --preset localai --yes
model-router init --preset hosted-openai-compatible --yes

Use model-router doctor --config ~/.model-router/routing_proxy.yaml when a backend is unavailable or a model name/endpoint is wrong.

Hindsight Routing Logs

The proxy can write privacy-safe JSONL events for calibration and replay:

observability:
  enabled: true
  log_path: ~/.model-router/routing-events.jsonl
  prompt_capture: redacted_preview

By default, events record a prompt hash, prompt length, estimated token count, selected engine, scores, feature flags, backend, fallback status, and latencies. Raw prompts are stored only when prompt_capture: full or MODEL_ROUTER_LOG_PROMPTS=1 is set. Use full capture only during deliberate calibration runs.
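A sketch of how such a privacy-safe event might be assembled. The `build_routing_event` and `append_event` helpers and all field names are hypothetical, and the four-characters-per-token estimate is a rough assumption rather than the router's estimator. The prompt-capture rule follows the documented behavior: the raw prompt is included only with prompt_capture: full or MODEL_ROUTER_LOG_PROMPTS=1. This sketch omits any preview text because the exact redaction rules of redacted_preview are not described here.

```python
import hashlib
import json
import os
from pathlib import Path


def build_routing_event(request_id: str, prompt: str, selected_engine: str,
                        backend: str, prompt_capture: str = "redacted_preview") -> dict:
    """Build a JSON-safe routing event (hypothetical field names)."""
    event = {
        "request_id": request_id,
        # A stable hash lets you group repeated prompts without storing the text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_length": len(prompt),
        # Rough assumption: about four characters per token.
        "estimated_tokens": max(1, len(prompt) // 4),
        "selected_engine": selected_engine,
        "backend": backend,
    }
    # Raw text is stored only on explicit opt-in, matching the documented rule.
    if prompt_capture == "full" or os.environ.get("MODEL_ROUTER_LOG_PROMPTS") == "1":
        event["prompt"] = prompt
    return event


def append_event(path: str, event: dict) -> None:
    """Append one event as a single JSONL line, creating parent folders."""
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
```

Usage might look like append_event("~/.model-router/routing-events.jsonl", build_routing_event("req-123", prompt, "code_agent", "lmstudio")).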

When a route is wrong, label it:

model-router feedback req-123 code_agent --notes "repo prompt routed too small"

Replay captured traffic against the current router:

python scripts/replay_routing_log.py \
  --events ~/.model-router/routing-events.jsonl \
  --feedback ~/.model-router/routing-feedback.jsonl \
  --json

Rows without full prompts are skipped for replay but still useful for aggregate latency, score, fallback, and route distribution analysis.
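The aggregate analysis that still works without raw prompts can be sketched with the standard library: count the route distribution, then join events to feedback labels by request id and tally (selected, expected) disagreements. The field names `request_id`, `selected_engine`, and `expected_engine` are assumptions about the JSONL format, and the helpers are hypothetical; scripts/replay_routing_log.py remains the supported tool.

```python
import json
from collections import Counter


def load_jsonl(lines):
    """Parse JSONL lines, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


def route_distribution(events):
    """Count how often each engine was selected; needs no raw prompts."""
    return Counter(ev["selected_engine"] for ev in events)


def disagreements(events, feedback):
    """Tally (selected, expected) pairs where a human label disagrees."""
    labels = {fb["request_id"]: fb["expected_engine"] for fb in feedback}
    misses = Counter()
    for ev in events:
        expected = labels.get(ev.get("request_id"))
        if expected is not None and expected != ev["selected_engine"]:
            misses[(ev["selected_engine"], expected)] += 1
    return misses
```

A cluster of identical pairs, such as many ("fast_local", "code_agent") misses, points at a specific scoring rule rather than random noise.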

Troubleshooting

Wrong route: enable observability, label the request with model-router feedback, and replay logs before changing scoring.
Backend unavailable or wrong model: run model-router doctor --config ~/.model-router/routing_proxy.yaml and check /health. Both diagnostics verify backend reachability and, when /v1/models returns a model list, that each configured backend model is advertised by the upstream server.
human_confirm: the prompt matched a destructive, sending, purchase/payment, deployment, or other high-impact action. Use explicit safety overrides only in versioned configs.
Proxy auth: if proxy.api_key or proxy.api_key_env is configured, clients must send Authorization: Bearer <token>.
Logs/replay: default logs do not include raw prompts. Use prompt_capture: full or MODEL_ROUTER_LOG_PROMPTS=1 only during deliberate calibration runs.
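The "is each configured model advertised" check can be sketched as a set comparison against an OpenAI-compatible /v1/models response, whose body lists models as {"data": [{"id": ...}]}. The `missing_backend_models` helper, the backend-to-model mapping shape, and the model ids below are hypothetical; model-router doctor remains the supported check. This sketch treats an empty listing as "cannot verify" rather than as a failure, which is an assumption.

```python
from typing import Dict


def missing_backend_models(configured: Dict[str, str], models_response: dict) -> Dict[str, str]:
    """Return backends whose configured model id is absent from a /v1/models listing."""
    advertised = {item.get("id") for item in models_response.get("data", [])}
    if not advertised:
        return {}  # no list returned, so there is nothing to compare against
    return {name: model for name, model in configured.items() if model not in advertised}
```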

Example Receipt

{
  "selected_engine": "code_agent",
  "complexity_score": 56,
  "risk_score": 38,
  "confidence_score": 90,
  "fallback_engine": "reasoning_local",
  "requires_confirmation": false,
  "requires_tools": true,
  "requires_freshness": false,
  "requires_code_execution": true,
  "requires_vision": false,
  "requires_image_generation": false,
  "config_valid": true,
  "availability_valid": true,
  "reasons": [
    "coding or repository intent",
    "tool use likely",
    "file, shell, or GitHub operation",
    "coding or repository work"
  ],
  "rejected_engines": [
    {
      "engine": "fast_local",
      "reason": "tools required but engine does not support tools"
    }
  ],
  "alternatives": [
    {
      "engine": "web_research",
      "rank_score": 61,
      "capability": 70,
      "trust": 60,
      "cost": 50,
      "latency": 75,
      "reasons": [
        "capability 70/100",
        "trust 60/100",
        "cost 50/100",
        "latency 75/100"
      ]
    }
  ]
}

Receipts intentionally do not include the raw prompt.
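Because --json receipts are plain JSON, downstream tooling can consume them directly. This sketch reads fields from the example receipt above (trimmed to the keys it uses) and produces a one-line audit summary; the `summarize_receipt` helper and its confidence threshold of 70 are hypothetical choices. You could feed it the output of model-router decide --json "fix the repo and run tests", for example via subprocess.

```python
import json

# Trimmed copy of the example receipt, keeping only the keys used below.
RECEIPT = """{
  "selected_engine": "code_agent",
  "confidence_score": 90,
  "fallback_engine": "reasoning_local",
  "requires_confirmation": false,
  "rejected_engines": [
    {"engine": "fast_local", "reason": "tools required but engine does not support tools"}
  ]
}"""


def summarize_receipt(raw: str, min_confidence: int = 70) -> str:
    """Turn a JSON receipt into a one-line audit summary."""
    receipt = json.loads(raw)
    engine = receipt["selected_engine"]
    if receipt.get("requires_confirmation"):
        return f"{engine}: needs human confirmation"
    rejected = ", ".join(
        f"{item['engine']} ({item['reason']})" for item in receipt.get("rejected_engines", [])
    )
    flag = " LOW CONFIDENCE" if receipt.get("confidence_score", 0) < min_confidence else ""
    return f"{engine} (fallback {receipt.get('fallback_engine')}){flag}; rejected: {rejected or 'none'}"
```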

Configure Engines

The default catalog lives at configs/model_router.yaml. Machine-specific settings should go in configs/model_router.local.yaml and be passed with --config.

Routing targets map semantic routes to configured engines:

routing_targets:
  simple: fast_local
  balanced: balanced_local
  reasoning: reasoning_local
  coding: code_agent
  research: web_research
  vision: multimodal_vision
  image_generation: image_generation
  confirmation: human_confirm

Human confirmation is a default-on safety feature. Escape hatches are explicit, scoped config choices:

safety:
  require_human_confirmation: true
  confirmation_overrides:
    allow_destructive_actions: false
    allow_send_actions: false
    allow_purchase_actions: false
    allow_high_impact_external_actions: false
    allow_ambiguous_high_impact: false

Each target points at an engine entry:

engines:
  claude_code:
    provider: anthropic
    model: claude-code
    adapter: claude_code
    strengths:
      - repository edits
      - tests
    max_context: 200000
    cost_tier: high
    latency_tier: medium
    capability: 90
    trust: 90
    cost: 80
    latency: 45
    supports_tools: true
    enabled: true
    fallback: code_agent
    availability:
      status: auto
      required_commands:
        - claude

Coding does not have to use Codex. You can point routing_targets.coding at claude_code, codex, code_agent, a local coding model, or any custom engine you define.

Optional numeric metadata uses a 0-100 scale:

capability: model/agent strength.
trust: reliability for sensitive work.
cost: relative cost, where higher means more expensive.
latency: relative latency, where higher means slower.

These values rank compatible alternatives. They do not override the configured target when that target is enabled, available, and compatible.
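The rule that numeric metadata ranks alternatives but never overrides a usable configured target can be sketched as follows. Everything here is hypothetical: the `Engine` dataclass, `select`, `rank_alternatives`, and especially the equal-weight ranking formula, which does not reproduce the rank_score values in ModelRouter's receipts. Returning human_confirm when no usable engine remains mirrors the documented fail-closed behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Engine:
    name: str
    enabled: bool = True
    available: bool = True
    supports_tools: bool = False
    capability: int = 50  # 0-100, higher is stronger
    trust: int = 50       # 0-100, higher is more reliable
    cost: int = 50        # 0-100, higher is more expensive
    latency: int = 50     # 0-100, higher is slower
    fallback: Optional[str] = None


def usable(engine: Engine, needs_tools: bool) -> bool:
    return engine.enabled and engine.available and (engine.supports_tools or not needs_tools)


def select(route: str, targets: Dict[str, str], engines: Dict[str, Engine],
           needs_tools: bool = False) -> str:
    """Use the configured target if usable, else walk its fallback chain."""
    name = targets.get(route)
    seen = set()
    while name and name not in seen:  # `seen` guards against fallback cycles
        seen.add(name)
        engine = engines.get(name)
        if engine is None:
            break
        if usable(engine, needs_tools):
            return name  # metadata scores are never consulted here
        name = engine.fallback
    return "human_confirm"  # fail closed when nothing usable remains


def hypothetical_rank(engine: Engine) -> float:
    # Assumed equal weighting; NOT the formula behind receipt rank_score values.
    return (engine.capability + engine.trust + (100 - engine.cost) + (100 - engine.latency)) / 4


def rank_alternatives(selected: str, engines: Dict[str, Engine],
                      needs_tools: bool = False) -> List[Engine]:
    """Order the other usable engines for display; does not change the selection."""
    others = [e for e in engines.values()
              if e.name not in (selected, "human_confirm") and usable(e, needs_tools)]
    return sorted(others, key=hypothetical_rank, reverse=True)
```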

Setup Assistant

The setup assistant builds a local config from a scan of your machine and your explicit answers, rather than guessing what you want.

Scan your machine:

model-router setup scan
model-router setup scan --json

Get recommendations:

model-router setup recommend
model-router setup recommend --json

Recommendations are produced by a bundled, versioned model advisor catalog at hermes/plugins/model_router/data/model_catalog.yaml. The advisor detects basic local hardware signals such as RAM, CPU architecture, Apple Silicon, and free disk space, then ranks setup-time Hugging Face suggestions for each route. This does not run during route_fast(...), route(...), or ordinary decide calls.
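For a sense of what "basic local hardware signals" involves, this standard-library sketch collects the OS, CPU architecture, an Apple Silicon flag, total RAM, and free disk space. It is an illustration, not the advisor's code; `hardware_signals` and `total_ram_gib` are hypothetical names, and RAM detection through os.sysconf works only on POSIX systems that expose SC_PHYS_PAGES, so it is reported as None elsewhere. Like the real advisor, it belongs at setup time, never on the routing hot path.

```python
import os
import platform
import shutil
from typing import Optional


def total_ram_gib() -> Optional[float]:
    """Total physical RAM via POSIX sysconf, or None where unsupported."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows has no os.sysconf
    if pages <= 0 or page_size <= 0:
        return None
    return round(pages * page_size / 1024**3, 1)


def hardware_signals(path: str = "~") -> dict:
    """Coarse, read-only hardware signals for setup-time recommendations."""
    system = platform.system()
    machine = platform.machine().lower()
    return {
        "os": system,
        "cpu_arch": machine,
        "apple_silicon": system == "Darwin" and machine in ("arm64", "aarch64"),
        "ram_gib": total_ram_gib(),
        "free_disk_gib": round(shutil.disk_usage(os.path.expanduser(path)).free / 1024**3, 1),
    }
```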

Run the wizard:

model-router setup wizard \
  --output configs/model_router.local.yaml

Write a recommended config non-interactively:

model-router setup write \
  --output configs/model_router.local.yaml

setup write will not overwrite an existing file unless --force is passed.

The wizard asks whether you want:

Local LLMs only.
API keys / hosted models.
A mix of local models, hosted APIs, and agent tools.

It then walks each main route and shows numbered local model choices plus hardware-aware recommended downloads when a local role is missing. Downloads are never run by ordinary routing commands. They require explicit confirmation.

The scanner includes current LM Studio model storage at ~/.lmstudio/models, plus Ollama, Hugging Face cache, and common local model folders, so wizard choices should reflect the models your local tools can see.

If recommended downloads are available and the Hugging Face hf CLI is missing, the wizard warns at the beginning and asks whether to install it into the current Python environment before model choices start. Declining is safe; the router can still write the config, and downloads can be run later.

Plan downloads:

model-router setup download
model-router setup download --route fast_local

Run an approved Hugging Face download:

model-router setup download \
  --route balanced_local \
  --repo-id custom-org/custom-model \
  --execute

For non-interactive scripts, add --yes.

Engine Roles

Role	Default coverage
Intent classifier/router	`intent_router` plus deterministic router code
Fast response/summarization	`fast_local`, `balanced_local`
Deep reasoning/planning	`reasoning_local`
Coding/repo work	`code_agent`, with optional `codex` or `claude_code`
Web research/RAG	`web_research`
Multimodal/vision/OCR	`multimodal_vision`
Image generation	`image_generation`
Confirmation/fail-closed	`human_confirm`

Safety Model

The router never executes user requests.
The router never calls hosted model APIs.
The router never loads local model weights.
The router never sends email, deletes files, buys anything, or runs shell commands.
High-risk destructive, sending, purchasing, payment, scheduling, publishing, and external-action prompts require confirmation by default.
Confirmation escape hatches must be explicit in safety.confirmation_overrides; the router does not learn approvals or silently relax safety rules.
force_engine cannot bypass human confirmation.
Missing or invalid config routes to human_confirm.
Unavailable or incompatible engines are skipped through configured fallbacks.
Receipts omit raw prompt text.
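The precedence implied by these rules can be sketched as a final gate: invalid config fails closed, every high-risk flag needs its explicit override, and force_engine is honored only after those checks pass. The risk-flag names and `apply_safety` are hypothetical; the override keys are the ones shown under safety.confirmation_overrides. This is not ModelRouter's policy module, only one way to encode the same ordering.

```python
from typing import Dict, Optional, Set

# Override keys documented under safety.confirmation_overrides,
# keyed by hypothetical risk-flag names.
OVERRIDE_KEYS = {
    "destructive": "allow_destructive_actions",
    "send": "allow_send_actions",
    "purchase": "allow_purchase_actions",
    "high_impact_external": "allow_high_impact_external_actions",
    "ambiguous_high_impact": "allow_ambiguous_high_impact",
}


def apply_safety(proposed: str, risk_flags: Set[str], *, config_valid: bool = True,
                 overrides: Optional[Dict[str, bool]] = None,
                 force_engine: Optional[str] = None) -> str:
    """Final gate: fail closed first, honor force_engine last."""
    if not config_valid:
        return "human_confirm"  # missing or invalid config
    overrides = overrides or {}
    for flag in risk_flags:
        key = OVERRIDE_KEYS.get(flag)
        # Unknown flags have no escape hatch, so they always need confirmation.
        if key is None or not overrides.get(key, False):
            return "human_confirm"
    # Reached only after every safety check passed, so it cannot bypass them.
    return force_engine or proposed
```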

Performance

Use the initialized API for runtime performance:

python scripts/benchmark_route_fast.py
python scripts/benchmark_route_fast.py --json
python scripts/check_route_fast_latency.py --json

route_fast(...) is the production hot path. It returns only the selected engine string. The scorer precompiles its stable regex patterns at import time, and initialized routers keep YAML config and availability results in memory. The richer route(...) path does more work by design because it builds scores, explanations, rejected-engine details, alternatives, and receipt fields.

The CLI is intended for humans, diagnostics, and scripts. Latency-sensitive services should not spawn a Python process per prompt; instantiate ModelRouter once and call the Python API in process.

The default production SLO for initialized ordinary prompts is <= 25 µs best sample and <= 50 µs mean sample for route_fast(...). The benchmark guard enforces those budgets in CI. See Production readiness for the API contract, benchmark command, SLOs, and logging guidance.
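If you want to reproduce best-sample and mean-sample measurements for your own prompts, a minimal harness looks like this. It is a sketch, not scripts/benchmark_route_fast.py: `measure_latency_us` and `within_budget` are hypothetical names, the sample and warm-up counts are arbitrary, and the default budgets are the 25 µs and 50 µs figures above. Pass router.route_fast from an initialized ModelRouter so that config loading stays outside the timed loop.

```python
import time
from statistics import mean
from typing import Callable, Tuple


def measure_latency_us(fn: Callable[[str], str], prompt: str,
                       samples: int = 2000, warmup: int = 200) -> Tuple[float, float]:
    """Return (best, mean) per-call latency in microseconds."""
    for _ in range(warmup):  # untimed calls so the first timings are not cold
        fn(prompt)
    timings = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        fn(prompt)
        timings.append((time.perf_counter_ns() - start) / 1000.0)
    return min(timings), mean(timings)


def within_budget(best_us: float, mean_us: float,
                  best_budget: float = 25.0, mean_budget: float = 50.0) -> bool:
    return best_us <= best_budget and mean_us <= mean_budget
```

For example, best, avg = measure_latency_us(router.route_fast, "rewrite this text"), followed by within_budget(best, avg).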

Install For Local Testing

Use a virtual environment so the router does not modify a managed Python installation:

cd /path/to/model-router
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"
model-router decide --json "fix the repo and run tests"

For a non-editable install from GitHub:

python -m pip install "git+https://github.com/doncazper/model-router.git@v0.5.0"
model-router decide "rewrite this text"

The package exposes console commands, model-router and the legacy hermes-router alias, plus the importable Python API:

from model_router import ModelRouter

router = ModelRouter.from_config()
prompt = "rewrite this text"
engine = router.route_fast(prompt)

The default catalog is included as package data, so ModelRouter.from_config() works after wheel installation without relying on the repository checkout. Pass an explicit config path when an embedding app needs its own engine catalog.

See examples/basic_custom_agent.py for a minimal host-neutral integration.

ModelRouter does not currently claim any host-app plugin manifest or automatic per-turn model switching contract. Embedding applications should use their own runtime integration boundary and call the stable route_fast(...) production API.

Development

Run tests:

python -m pytest

Run lint:

python -m ruff check .

With uv and Python 3.11:

uv run --python 3.11 --with pytest --with PyYAML python -m pytest
uv run --python 3.11 --with ruff --with PyYAML python -m ruff check .

Project Layout

model_router/
  __init__.py          # Generic public import path
hermes/plugins/model_router/
  availability.py     # Non-executing availability validation
  cli.py              # CLI entrypoint
  config.py           # YAML catalog loading and validation
  data/               # Packaged default config
  dispatch.py         # Safe dry-run dispatch plans
  models.py           # Dataclass models and JSON-safe serialization
  policy.py           # Engine selection and fail-closed fallback rules
  receipts.py         # Routing receipt helpers
  scorer.py           # Deterministic heuristic prompt scoring
  setup_assistant.py  # Local setup scanning and config recommendations
configs/
  model_router.yaml
  model_router.local.example.yaml
docs/
  adapter-contract.md
  model-router.md
examples/
  basic_custom_agent.py
scripts/
  benchmark_route_fast.py
tests/

The model_router package is the generic public import path. The hermes/plugins path is retained only as a backward-compatible legacy namespace. Neither path is a host-application plugin registration point.

Documentation

Roadmap

v0.5: Usable Local Proxy Beta

Keep the proxy-first install path polished: pip install "hermes-router[proxy]".
Keep model-router init, validate-proxy-config, doctor, /health, log rotation, and provider presets reliable.
Publish releases with a changelog, GitHub release notes, and benchmark output.

v0.6: Passthrough And Gateway Mode

Add router mode and passthrough mode.
Keep legacy command/import aliases for one release.
Add backend request overrides for temperature, context, max tokens, and common generation controls.
Add first-class llama.cpp and MLX gateway templates.

v0.6.5: Managed Local Runtime Beta

Add explicit process configuration for llama.cpp and MLX backends.
Add start, stop, restart, status, and logs commands for configured runtimes.
Detect port conflicts and capture per-backend process logs.
Keep process management opt-in and transparent; never auto-start arbitrary commands without user configuration.

v1.0: Finished Local AI Gateway

Version the public config schema and provide migrations.
Use labeled real-world routing logs as release-blocking regression checks.
Document security/privacy expectations for logs, proxy auth, process commands, and local network exposure.
Decide whether to ship a web UI or keep the product CLI/proxy-first.


Release history

0.6.2

Jun 22, 2026

0.6.1

Jun 22, 2026

0.6.0

Jun 22, 2026

0.5.4

Jun 22, 2026

This version

0.5.3

Jun 21, 2026

0.5.2

Jun 21, 2026

0.5.1

Jun 21, 2026

0.5.0

Jun 21, 2026

Download files


Source Distribution

hermes_router-0.5.3.tar.gz (102.3 kB)

Uploaded Jun 21, 2026 Source

Built Distribution


hermes_router-0.5.3-py3-none-any.whl (76.1 kB)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file hermes_router-0.5.3.tar.gz.

File metadata

Download URL: hermes_router-0.5.3.tar.gz
Upload date: Jun 21, 2026
Size: 102.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hermes_router-0.5.3.tar.gz
Algorithm	Hash digest
SHA256	`3791b7a16b2afe8f276931ab5bde4395d0693a5eea175a8ca3ef04256211d676`
MD5	`6da15a46999372c12b63b11673cdb383`
BLAKE2b-256	`dcc7ea90a2197cf038022112b82be6596a7dbd8c266ba175ced00a1527fa4039`


Provenance

The following attestation bundles were made for hermes_router-0.5.3.tar.gz:

Publisher: publish.yml on doncazper/model-router

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hermes_router-0.5.3.tar.gz
- Subject digest: 3791b7a16b2afe8f276931ab5bde4395d0693a5eea175a8ca3ef04256211d676
- Sigstore transparency entry: 1904037841
- Sigstore integration time: Jun 21, 2026
Source repository:
- Permalink: doncazper/model-router@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36
- Branch / Tag: refs/tags/v0.5.3
- Owner: https://github.com/doncazper
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36
- Trigger Event: release

File details

Details for the file hermes_router-0.5.3-py3-none-any.whl.

File metadata

Download URL: hermes_router-0.5.3-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 76.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hermes_router-0.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1365c4d8bfec3a9c226972e8bd77996a6c23b0f2a83e937777f63e73eaab05ff`
MD5	`bf0c6ddf51f01d0963ecd45c02495b48`
BLAKE2b-256	`c71790ea5f62001a5472e60288dc0fe4842cc0e7390d2c13d54fb23c9ccfc075`


Provenance

The following attestation bundles were made for hermes_router-0.5.3-py3-none-any.whl:

Publisher: publish.yml on doncazper/model-router

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hermes_router-0.5.3-py3-none-any.whl
- Subject digest: 1365c4d8bfec3a9c226972e8bd77996a6c23b0f2a83e937777f63e73eaab05ff
- Sigstore transparency entry: 1904038137
- Sigstore integration time: Jun 21, 2026
Source repository:
- Permalink: doncazper/model-router@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36
- Branch / Tag: refs/tags/v0.5.3
- Owner: https://github.com/doncazper
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a9ff6e09c5f028fc4ce6cdfc0ba6c8dac183df36
- Trigger Event: release

hermes-router 0.5.3
