Embedding-based scaffold router for Claude API. Routes tasks to the right scaffold using centroid matching. By Hermes Labs.
Project description
claude-router
Route Claude API calls to the cheapest model that works. 5 validated scaffolds, embedding-based task classification in ~10ms. Validated on 300+ blind-judged API calls.
Results
| Task | Best Setup | Cost | Quality vs. Baseline |
|---|---|---|---|
| Eval/scoring | Haiku + scaffold | $0.06 | MAE 1.0 (vs Sonnet raw: 1.2) |
| Research | Sonnet + scaffold | $0.28 | 8.49/10 (vs Opus raw: 7.45) |
| Content | Haiku + scaffold | $0.06 | 4/5 blind wins vs Sonnet |
| Code review | Sonnet (raw) | $0.28 | 8.7/10 (vs Opus: 8.1) |
Anti-findings
These are the blocker issues. The router handles them automatically:
- Scaffolds break operational tasks (0/9 success). Haiku treats constraints as meta-instructions instead of executing tasks.
- Scaffolds hurt coding (4.9 vs 6.4 raw). Don't scaffold code review, design, or debugging.
- Opus doesn't scaffold. Safety-critical evals need Opus raw (MAE 0.0), not scaffolded.
The routing table avoids these entirely: no scaffolds on operational, coding, safety-critical, or conversation tasks.
Install
Requires: Python 3.10+, requests, numpy, and Ollama running locally with nomic-embed-text.
pip install claude-router
ollama pull nomic-embed-text
Quick start
from claude_router import ClaudeRouter
router = ClaudeRouter()
result = router.route("Evaluate this research paper for methodological rigor")
print(result["model"]) # claude-haiku-4-5
print(result["scaffold_key"]) # calibrated-scoring
print(result["cost_per_1k"]) # 0.0008
# Build prompt with scaffold prepended
prompt = router.build_prompt("Evaluate this research paper...")
# → Pass prompt as system message to Anthropic API
Or CLI:
python router.py "Write a blog post about Q2 results"
How it works
- Embed your prompt using nomic-embed-text (~5ms)
- Compare against pre-computed task-category centroids
- Look up routing table: category → model + scaffold
- Return model ID and scaffold text
No LLM calls for routing. All locally in ~10ms. Low confidence (router accuracy 74% on 26-prompt benchmark) defaults to Opus.
The 5 scaffolds
Each scaffold is validated through blind evaluation. They work by constraining the model's output space to the task structure.
See scaffolds.json for full text and evidence:
- calibrated-scoring: Integer 1-10, cite evidence, not generous/critical
- insight-first: Lead non-obvious, concrete recs, 3-4 sentences
- plan-first: g:goal;c:constraints;s:steps;r:risks prefix
- substance-check: Real gaps not surface, name issue and location
- bug-hunt: Specific bugs, line numbers, severity, one-line fix
Routing table
eval → Haiku + calibrated-scoring
research → Sonnet + insight-first
content → Haiku + insight-first
analytical_review → Haiku + substance-check
search → Haiku + plan-first
coding → Sonnet (raw)
operational → Sonnet (raw)
status_check → Haiku (raw)
conversation → Opus (raw)
safety_critical → Opus (raw)
Low confidence → Opus (safe default).
Cost math
For 10,000 Claude API calls/month:
| Strategy | Cost | Quality |
|---|---|---|
| All Opus | $6,800 | Baseline |
| All Sonnet | $2,800 | Lower on eval, equal on code |
| claude-router | ~$620 | Equal or better on eval/research/content |
Customization
Swap scaffolds, centroids, or routing table:
router = ClaudeRouter(
centroids_path="my_centroids.json",
routing_table_path="my_routing.json",
scaffolds_path="my_scaffolds.json"
)
Limitations
- Requires Ollama locally (for embeddings)
- Centroids trained on one task distribution — test on your workload
- Router misclassifies 26% of tasks — low confidence defaults to Opus
- Anti-findings are real: scaffolds on coding/operational make things worse
- Lite mode (Haiku-first routing for max savings) planned for v1.1
Evidence
Benchmarks: benchmarks/ | Raw citations: scaffolds.json | License: MIT
Key experiments: 4-condition code/research crossover, scaffolds-vs-operational stress test, scaffolded Sonnet beats Opus 75% on research (6/8 blind wins, 140 API calls).
Hermes Labs Ecosystem
claude-router is part of the Hermes Labs open-source suite:
- lintlang — Static linter for AI agent tool descriptions and system prompts
- little-canary — Prompt injection detection
- zer0dex — Dual-layer memory for AI agents
- zer0lint — mem0 extraction diagnostics
- suy-sideguy — Autonomous agent watchdog
- quickthink — Planning scaffolding for local LLMs
Need this calibrated to your pipeline? Open an issue or reach out to Hermes Labs for custom scaffolds and production integration.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file claude_router-1.0.0.tar.gz.
File metadata
- Download URL: claude_router-1.0.0.tar.gz
- Upload date:
- Size: 208.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
db71dca924209adaed4730caeee5b9052c283ffd08c40521643fb2b018ebcb0c
|
|
| MD5 |
d9d45103c25833497fe548b9f3f3e8d8
|
|
| BLAKE2b-256 |
4403ff87a3b08bd973690368aa84e0c75447719f753ab1f0d3c2a4a674785d70
|
File details
Details for the file claude_router-1.0.0-py3-none-any.whl.
File metadata
- Download URL: claude_router-1.0.0-py3-none-any.whl
- Upload date:
- Size: 102.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5a9c4ebed00468d4c039d2d17e1e77d9d6a38f964cdc40891c6d90cc1b8f476c
|
|
| MD5 |
cc68addf7b05df118c6ba3d7c7bf54e0
|
|
| BLAKE2b-256 |
05fb61f8576e79c93560fd26f08bb4e818c8ff5ebdedd48a63a5b279665ab3fa
|