Cost-aware model routing for the Claude API — reference implementation with a Claude Code subagent hook

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

claude-model-router

Reference implementation of cost-aware model routing for the Claude API — plus a Claude Code hook that stops subagents from silently inheriting your most expensive model.

A cheap Haiku 4.5 call classifies each prompt's complexity; the prompt then executes on the cheapest capable tier. Small, tested, and built around the parts naive routers get wrong.

Why naive Claude routers break

Most routing examples treat model IDs as interchangeable strings. On the Claude 5 family they are not:

Trap	What actually happens	Where handled
Refusals are not exceptions	Fable 5 refusals return HTTP 200 with `stop_reason: "refusal"` — a try/except fallback never fires	server-side fallback beta + explicit `stop_reason` check (router.py)
`thinking` differs per model	Fable 5: always on, explicit config → 400. Opus 4.8: omitted = off. Sonnet 5: omitted = adaptive. Haiku 4.5: adaptive → 400	per-model request shape in `build_request()`
`effort` is not universal	`output_config: {"effort": ...}` → 400 on Haiku 4.5	effort omitted on the trivial tier
`content[0].text` crashes	refusals ship empty `content`; thinking blocks precede text blocks	block-type-filtered extraction
Small `max_tokens` + high effort = truncation	thinking tokens spend from `max_tokens`; xhigh turns run minutes	64K `max_tokens` + streaming everywhere
Cache dies on model switch	prompt caches are model-scoped; per-turn routing invalidates the cache every turn	route once per conversation, reuse via `execute()`
Context windows differ	Haiku 4.5 = 200K; every other tier model = 1M	trivial bumped to low above 180K context tokens

Tiers

Tier	Model	Effort	Notes
trivial	`claude-haiku-4-5`	—	reformatting, extraction, classification
low	`claude-sonnet-5`	medium	summaries, single-function edits, simple Q&A
mid	`claude-opus-4-8`	high	debugging, security review, single-file refactors
high	`claude-fable-5`	xhigh	multi-file refactors, migrations, architecture

The keyword-heuristic fallback (used when the classifier call fails) never assigns trivial — misrouting real work to Haiku costs more in retries than the tier saves, so only the classifier may pick it.

Usage

import anthropic
from model_router import run, execute

client = anthropic.Anthropic()

result = run(client, "Refactor the auth module across services")
print(result.route.model, result.route.source)  # claude-fable-5 classifier
print(result.served_by)                          # may be opus-4-8 after a fallback
print(result.text)

Multi-turn: prompt caches are model-scoped, so route once per conversation and reuse the route for follow-ups —

route = result.route
result2 = execute(client, route, full_message_history)

Cost model

1,000 prompts, mixed workload (30% trivial / 40% low / 20% mid / 10% high), list pricing, output token counts held equal across models (conservative — Fable at xhigh thinks longer than smaller models on the same prompt):

Tier	Prompts	Routed model	Routed cost	All-Fable cost
trivial	300	`claude-haiku-4-5`	$0.54	$5.40
low	400	`claude-sonnet-5`	$6.60	$22.00
mid	200	`claude-opus-4-8`	$18.50	$37.00
high	100	`claude-fable-5`	$80.00	$80.00
classifier overhead	1000	`claude-haiku-4-5`	$0.72	—
total			$106.36	$144.40

26% saved on this mix — and the mix is the whole story: the 10% of prompts that legitimately need Fable dominate the bill. A support-bot-shaped workload (mostly trivial/low) saves 70%+. Rerun with your own shape:

python benchmarks/cost_model.py

This is a pricing calculation, not a live benchmark — it spends no tokens and its assumptions are at the top of the script.

Live benchmark (real tokens, real dollars)

benchmarks/live_bench.py runs a 17-prompt mixed workload through both arms — routed vs. all-Fable — with real API calls, and prices each call from the usage block the API actually returned (by the model that actually served it, which matters after a fallback). Classifier overhead is metered from real Haiku usage, not estimated.

ANTHROPIC_API_KEY=... python benchmarks/live_bench.py --tiers trivial,low  # cheap smoke run
ANTHROPIC_API_KEY=... python benchmarks/live_bench.py --out live_results.md  # full run, ~$2-6

It spends real money (the all-Fable arm at xhigh dominates), refuses to start without a key, and flags every prompt where the classifier's tier disagreed with the expected tier — so you can audit misroutes instead of trusting a single savings number. RoutedResult.usage exposes the same real token counts in your own code.

This is the honest version of the standard methodology: the same cost-quality comparison RouterBench and RouteLLM run offline, at a scale one person can afford to reproduce.

Claude Code hook: stop subagents from burning Fable tokens

Separate deliverable, same philosophy. In Claude Code, subagents inherit the session model by default — run your session on an expensive model and every spawned agent (including greps and file listings) bills at that rate.

hooks/subagent-router.py is a PreToolUse hook that denies any Agent spawn missing an explicit model param. The deny reason carries a routing table, so the session model immediately re-issues the spawn with the cheapest capable tier. Self-correcting, one file, no dependencies.

Install:

cp hooks/subagent-router.py ~/.claude/hooks/

Then add to ~/.claude/settings.json:

"hooks": {
  "PreToolUse": [
    {
      "matcher": "Agent",
      "hooks": [
        {
          "type": "command",
          "command": "python3 ~/.claude/hooks/subagent-router.py"
        }
      ]
    }
  ]
}

New sessions then enforce: unrouted spawn → denied with policy → re-spawned with explicit model (haiku for greps/locates, sonnet for single-file work and reviews, opus for multi-file/hard debugging, session model only for judgment-critical synthesis).

Install

pip install claude-model-router

Or from source:

git clone https://github.com/junoseong/claude-model-router
cd claude-model-router
python3 -m venv .venv && .venv/bin/pip install -e '.[dev]'
.venv/bin/pytest

Notes

Fable 5 requires 30-day data retention; zero-data-retention orgs get 400s on the high tier — remap high to claude-opus-4-8 in _TIER_ROUTES.
Sonnet 5 has intro pricing ($2/$10 per MTok) through 2026-08-31; the cost model uses sticker prices ($3/$15).

MIT licensed.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

junoseong

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.0

Jul 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_model_router-0.2.0.tar.gz (10.4 kB view details)

Uploaded Jul 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

claude_model_router-0.2.0-py3-none-any.whl (9.6 kB view details)

Uploaded Jul 2, 2026 Python 3

File details

Details for the file claude_model_router-0.2.0.tar.gz.

File metadata

Download URL: claude_model_router-0.2.0.tar.gz
Upload date: Jul 2, 2026
Size: 10.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for claude_model_router-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`39b88a082b703af900d03972cd6ac9126b26ccbf6a596cdcb8607738726246d0`
MD5	`f37b9a10eb02ace0c9d7002fb152c5e0`
BLAKE2b-256	`7a270428110e88ccaf00160cd0a43ed40ee188c0eec786781ed1781bb05a29db`

See more details on using hashes here.

Provenance

The following attestation bundles were made for claude_model_router-0.2.0.tar.gz:

Publisher: publish.yml on junoseong/claude-model-router

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_model_router-0.2.0.tar.gz
- Subject digest: 39b88a082b703af900d03972cd6ac9126b26ccbf6a596cdcb8607738726246d0
- Sigstore transparency entry: 2048611626
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: junoseong/claude-model-router@9cc60f4d78e0e55552bb35b1552e00100f660f79
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/junoseong
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9cc60f4d78e0e55552bb35b1552e00100f660f79
- Trigger Event: release

File details

Details for the file claude_model_router-0.2.0-py3-none-any.whl.

File metadata

Download URL: claude_model_router-0.2.0-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 9.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for claude_model_router-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1d82c6a1130571e37ebc1caa06b8b4e52b26a82cb5e995b38005174638676cda`
MD5	`720cfd39912b9cfebc9b326d6d3b4325`
BLAKE2b-256	`6dd318b540046d58a265cb8612473333fa62923d8b9154d2108b09e0ce42f7ee`

See more details on using hashes here.

Provenance

The following attestation bundles were made for claude_model_router-0.2.0-py3-none-any.whl:

Publisher: publish.yml on junoseong/claude-model-router

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_model_router-0.2.0-py3-none-any.whl
- Subject digest: 1d82c6a1130571e37ebc1caa06b8b4e52b26a82cb5e995b38005174638676cda
- Sigstore transparency entry: 2048611633
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: junoseong/claude-model-router@9cc60f4d78e0e55552bb35b1552e00100f660f79
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/junoseong
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9cc60f4d78e0e55552bb35b1552e00100f660f79
- Trigger Event: release

claude-model-router 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

claude-model-router

Why naive Claude routers break

Tiers

Usage

Cost model

Live benchmark (real tokens, real dollars)

Claude Code hook: stop subagents from burning Fable tokens

Install

Notes

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance