Skip to main content

Embedding-based scaffold router for Claude API. Routes tasks to the right scaffold using centroid matching. By Hermes Labs.

Project description

claude-router

Route Claude API calls to the cheapest model that works. 5 validated scaffolds, embedding-based task classification in ~10ms. Validated on 300+ blind-judged API calls.

Results

Task Best Setup Cost Quality vs. Baseline
Eval/scoring Haiku + scaffold $0.06 MAE 1.0 (vs Sonnet raw: 1.2)
Research Sonnet + scaffold $0.28 8.49/10 (vs Opus raw: 7.45)
Content Haiku + scaffold $0.06 4/5 blind wins vs Sonnet
Code review Sonnet (raw) $0.28 8.7/10 (vs Opus: 8.1)

Anti-findings

These are the blocker issues. The router handles them automatically:

  • Scaffolds break operational tasks (0/9 success). Haiku treats constraints as meta-instructions instead of executing tasks.
  • Scaffolds hurt coding (4.9 vs 6.4 raw). Don't scaffold code review, design, or debugging.
  • Opus doesn't scaffold. Safety-critical evals need Opus raw (MAE 0.0), not scaffolded.

The routing table avoids these entirely: no scaffolds on operational, coding, safety-critical, or conversation tasks.

Install

Requires: Python 3.10+, requests, numpy, and Ollama running locally with nomic-embed-text.

pip install claude-router
ollama pull nomic-embed-text

Quick start

from claude_router import ClaudeRouter

router = ClaudeRouter()
result = router.route("Evaluate this research paper for methodological rigor")

print(result["model"])           # claude-haiku-4-5
print(result["scaffold_key"])    # calibrated-scoring
print(result["cost_per_1k"])     # 0.0008

# Build prompt with scaffold prepended
prompt = router.build_prompt("Evaluate this research paper...")
# → Pass prompt as system message to Anthropic API

Or CLI:

python router.py "Write a blog post about Q2 results"

How it works

  1. Embed your prompt using nomic-embed-text (~5ms)
  2. Compare against pre-computed task-category centroids
  3. Look up routing table: category → model + scaffold
  4. Return model ID and scaffold text

No LLM calls for routing. All locally in ~10ms. Low confidence (router accuracy 74% on 26-prompt benchmark) defaults to Opus.

The 5 scaffolds

Each scaffold is validated through blind evaluation. They work by constraining the model's output space to the task structure.

See scaffolds.json for full text and evidence:

  • calibrated-scoring: Integer 1-10, cite evidence, not generous/critical
  • insight-first: Lead non-obvious, concrete recs, 3-4 sentences
  • plan-first: g:goal;c:constraints;s:steps;r:risks prefix
  • substance-check: Real gaps not surface, name issue and location
  • bug-hunt: Specific bugs, line numbers, severity, one-line fix

Routing table

eval              → Haiku   + calibrated-scoring
research          → Sonnet  + insight-first
content           → Haiku   + insight-first
analytical_review → Haiku   + substance-check
search            → Haiku   + plan-first

coding            → Sonnet  (raw)
operational       → Sonnet  (raw)
status_check      → Haiku   (raw)
conversation      → Opus    (raw)
safety_critical   → Opus    (raw)

Low confidence → Opus (safe default).

Cost math

For 10,000 Claude API calls/month:

Strategy Cost Quality
All Opus $6,800 Baseline
All Sonnet $2,800 Lower on eval, equal on code
claude-router ~$620 Equal or better on eval/research/content

Customization

Swap scaffolds, centroids, or routing table:

router = ClaudeRouter(
    centroids_path="my_centroids.json",
    routing_table_path="my_routing.json",
    scaffolds_path="my_scaffolds.json"
)

Limitations

  • Requires Ollama locally (for embeddings)
  • Centroids trained on one task distribution — test on your workload
  • Router misclassifies 26% of tasks — low confidence defaults to Opus
  • Anti-findings are real: scaffolds on coding/operational make things worse
  • Lite mode (Haiku-first routing for max savings) planned for v1.1

Evidence

Benchmarks: benchmarks/ | Raw citations: scaffolds.json | License: MIT

Key experiments: 4-condition code/research crossover, scaffolds-vs-operational stress test, scaffolded Sonnet beats Opus 75% on research (6/8 blind wins, 140 API calls).

Hermes Labs Ecosystem

claude-router is part of the Hermes Labs open-source suite:

  • lintlang — Static linter for AI agent tool descriptions and system prompts
  • little-canary — Prompt injection detection
  • zer0dex — Dual-layer memory for AI agents
  • zer0lint — mem0 extraction diagnostics
  • suy-sideguy — Autonomous agent watchdog
  • quickthink — Planning scaffolding for local LLMs

Need this calibrated to your pipeline? Open an issue or reach out to Hermes Labs for custom scaffolds and production integration.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_router-0.1.0.tar.gz (208.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

claude_router-0.1.0-py3-none-any.whl (101.7 kB view details)

Uploaded Python 3

File details

Details for the file claude_router-0.1.0.tar.gz.

File metadata

  • Download URL: claude_router-0.1.0.tar.gz
  • Upload date:
  • Size: 208.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for claude_router-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e92956b534bc94a57fd93781038288f3171e745c3e030b59e15a6b3a7a111e65
MD5 43ec47858cfe3ba6a94eea0db240f8db
BLAKE2b-256 c98d13b58278222ce5efa59d84904e8dec5a4eb2aa9ca377fc9775a20863c1e0

See more details on using hashes here.

File details

Details for the file claude_router-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: claude_router-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 101.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for claude_router-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 89b812a11f56c4ef8fcdfbdb264b889119cd1360b7056f74ee50e62fe2c7cad9
MD5 7a78e458826ff023c350f3a1e1baa89b
BLAKE2b-256 c1143109358dcc6eb3f8a44e9c9530133f46928b7c241c8702cae81034f74aac

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page