dynamic-model-router
A 3-layer cascade classifier that routes each task to the cheapest model that can handle it well — before the agent makes an API call.
```python
from classifier import classify

decision = classify("What is 2+2?")                      # → low tier (cheap)
decision = classify("Design a CQRS architecture for…")   # → high tier (capable)
print(decision.tier, decision.model_name)
```
That's the whole pitch. Cost goes down 60–80% on real workloads with no quality loss.
📚 Table of contents
- Install
- The 3 layers — in plain English
- 60-second quickstart
- Layer 1 — Add your own keywords
- Layer 3 — Train on your data
- Track & inspect what's happening
- Decision log — three modes
- Layer 2 — LLM fallback (advanced)
- Model registry
- Integrations
- CLI reference
- Production checklist
Install
```bash
pip install dynamic-model-router                                # core
pip install 'dynamic-model-router[ml]'                          # + Layer 3 (recommended)
pip install 'dynamic-model-router[ml,google]'                   # + Gemini provider
pip install 'dynamic-model-router[ml,google,anthropic,openai]'  # all three providers
```
Set one API key in .env (Google has a free tier — easiest start):
```bash
echo 'GOOGLE_API_KEY=your-key-here' > .env
```
Verify your install:
```bash
dmr doctor
```
The 3 layers — in plain English
Every task you classify walks down a ladder. The first layer that's confident wins. Most tasks stop at Layer 1.
| | Layer | What it does | Cost | Speed |
|---|---|---|---|---|
| 🟦 | Layer 1 — Keywords | Looks at the words in your task. "implement", "function" → coding. "summarize" → doc creation. "diagnose", "patient" → medical reasoning. | Free | <1 ms |
| 🟩 | Layer 3 — ML model | A small neural net trained on your data (or our defaults). Catches things keywords miss — like sentence structure, intent, complexity. | Free | ~15 ms |
| 🟨 | Layer 2 — LLM fallback | When the first two are unsure, asks an LLM to classify the task. Same provider you'll route to. | $$ | ~500 ms |
The cascade: keywords confident? → ship. Otherwise: ML confident? → ship. Otherwise: ask an LLM. So every customization you make to Layer 1 (cheap, deterministic) saves you Layer 2 calls (slow, billed).
What each layer outputs is the same: (task_type, complexity, confidence). Together those map to (provider, tier, model) via a configurable matrix.
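If it helps to see the control flow, here is a toy sketch of the cascade idea (illustrative only: these few lines are not the package's internals, and the keyword list and thresholds are made up):

```python
# Toy cascade: each layer returns (task_type, complexity, confidence);
# the first layer that clears its confidence threshold wins.
CODING_WORDS = {"implement", "function", "refactor"}

def layer1_keywords(task: str) -> tuple[str, str, float]:
    words = set(task.lower().split())
    if words & CODING_WORDS:
        return ("code_creation", "standard", 0.9)
    return ("conversation", "simple", 0.3)   # low confidence → fall through

def cascade(task: str, l1_threshold: float = 0.7) -> tuple[str, str, float]:
    result = layer1_keywords(task)           # free, <1 ms
    if result[2] >= l1_threshold:
        return result
    # ...the real router tries Layer 3 (ML head) next, then Layer 2 (LLM),
    # with the same confidence-gate pattern at each step.
    return result

print(cascade("Implement Dijkstra's algorithm"))  # ('code_creation', 'standard', 0.9)
```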
60-second quickstart
```python
from classifier import Router

# Zero config. Layer 3 turns on automatically once you've trained it.
router = Router(layer3_enabled="auto")

decision = router.classify("Implement Dijkstra's algorithm in Python")
print(decision.model_name)   # → gemini-2.5-flash
print(decision.tier.value)   # → low
print(decision.layer_used)   # → layer1
print(decision.reasoning)    # → keyword match: "implement"
```
Drop that `decision.model_name` into whatever SDK you use:

```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model=decision.model_name,
    contents="Implement Dijkstra's algorithm in Python",
)
```
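The same pattern works with any provider SDK. For example, with the official OpenAI client (assuming the router picked an OpenAI model for this task; this snippet isn't from the package docs):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model=decision.model_name,  # whatever the router chose
    messages=[{"role": "user", "content": "Implement Dijkstra's algorithm in Python"}],
)
print(response.choices[0].message.content)
```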
Or use one of the 11 framework integrations — LangChain, CrewAI, AutoGen, ADK, LlamaIndex, Pydantic AI, DSPy, Haystack, Semantic Kernel, smolagents, OpenAI Agents.
Layer 1 — Add your own keywords (no code needed)
Layer 1 is just: "if the task contains these words, it's probably this kind of task." Adding domain vocabulary is the single highest-leverage customization you can make.
The easy way — dmr keywords
```bash
# Add a few legal-domain keywords
dmr keywords add --domain legal --type reasoning \
    --keywords "tort,liable,precedent,indemnification"

# See what you've added
dmr keywords list

# Found a wrong one?
dmr keywords remove --domain legal --keyword "tort"
```
That's it. Packs are saved to `~/.dmr/keywords/<domain>.yaml` and auto-loaded by every new `Router()` — no code change.
Don't know what keywords to add? Mine them from your logs
Once your router has handled some real traffic, ask it which words it's seeing:
```bash
dmr keywords suggest --since 30d --top 15
```

```
Top distinctive n-grams per task_type (not already in any pack):

[reasoning]
  2.41  n=37  differential diagnosis
  2.18  n=29  clinical scenario
  1.94  n=42  contraindication

[doc_creation]
  2.05  n=51  progress note
  1.78  n=33  discharge summary
```
Pick the strong ones and `dmr keywords add` them.
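For example, promoting the strongest reasoning n-grams from the sample output above (same documented flags as before):

```bash
dmr keywords add --domain healthcare --type reasoning \
    --keywords "differential diagnosis,clinical scenario,contraindication"
```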
Or build a pack programmatically
```python
from classifier import KeywordPack, TaskType, Router

biotech = (KeywordPack.builder("biotech")
    .add(TaskType.REASONING, ["protein", "CRISPR", "in-vitro"])
    .escalator("genome-wide", weight=2)   # bumps complexity
    .build())

router = Router(extra_keyword_packs=[biotech])
```
Layer 3 — Train on your data (one command)
You don't need labeled data to start. The package logs every routing decision to `routing_decisions.jsonl`, and `dmr train --auto` turns that log into training data using 8 weak-supervision rules (Snorkel-style — short prompts, user retries, model escalations, etc.).
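To make "weak-supervision rule" concrete, here's the flavor of one such rule. This is a hypothetical sketch, not the package's actual implementation; the field names come from the telemetry events documented below:

```python
def label_complexity(decision: dict, outcome: dict) -> str | None:
    """One illustrative weak-supervision rule: abstain unless the signal is clear."""
    if decision["task_length"] < 40 and outcome.get("success"):
        return "simple"      # short prompt, clean completion
    if outcome.get("user_retried") or outcome.get("user_escalated_model"):
        return "complex"     # the user wasn't happy with the routed tier
    return None              # abstain: this example contributes no label
```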
Workflow
**Day 1.** Install. Use the router. L1 + L2 work immediately. L3 is silently disabled.
```python
router = Router(layer3_enabled="auto")   # auto = enable when a model exists
```
**Day 30.** You've logged a few hundred decisions. `dmr doctor` notices:

```
[!] L3 model file   WARN   missing, but 547 decisions logged
    → run `dmr train --auto` to enable Layer 3
```
One command to bootstrap Layer 3 from those logs:
```bash
dmr train --auto
```

```
[1/3] Auto-labeling decision/outcome telemetry since 2026-04-09...
      Got 312 confident labels:
        task_type    reasoning      104
        task_type    doc_creation    98
        task_type    code_creation   67
        complexity   simple          86
        complexity   standard       162
        complexity   complex         64
[2/3] Training Layer 3 head (frozen MiniLM + calibrated MLPs)...
[3/3] Done.
      task_type accuracy:  0.831
      complexity accuracy: 0.776

Layer 3 is now active. New `Router()` instances will pick it up
automatically when constructed with `layer3_enabled='auto'` (default).
```
That's it. Re-run any time you want — each run replaces the model.
Already have labeled data?
```bash
dmr train --data my_examples.jsonl
```
JSONL format:
{"task": "Implement Dijkstra in Python", "task_type": "code_creation", "complexity": "standard"}
{"task": "Hello", "task_type": "conversation", "complexity": "simple"}
No production data and want a head start?
```bash
dmr generate-data --domain healthcare --per-slot 50 --out healthcare.jsonl
dmr train --data healthcare.jsonl
```
(Uses Gemini to synthesize realistic examples for your domain.)
Tune Layer 3 in code
```python
from classifier import Router

router = Router(
    layer3_enabled="auto",                             # default
    layer3_threshold=0.85,                             # higher = stricter
    layer3_embedding_model="BAAI/bge-large-en-v1.5",   # swap encoder
)
```
Track & inspect what's happening
Every classification is logged. The package gives you simple commands to inspect what the router is doing.
dmr doctor — health check + readiness
```bash
dmr doctor
```

```
[+] Python version              OK     3.12.7
[+] dep:pydantic_settings       OK     installed
[+] opt:google.genai            OK     installed (Layer 2 fallback)
[+] opt:sentence_transformers   OK     installed (Layer 3 ML head)
[+] key:google                  OK     configured
[!] key:anthropic               WARN   ANTHROPIC_API_KEY not set
[+] L3 model file               OK     head_v1.joblib (3,166 KB)
[+] classify smoke test         OK     tier=low model=gemini-2.5-flash

Result: 12 ok, 1 warning(s), 0 failure(s)
```
Run it in CI — fail your build on `[x]`.
dmr config show — what's actually loaded
```bash
dmr config show
```

```
dynamic-model-router v0.4.0

[settings]
  default_provider     google
  layer1_enabled       True
  layer2_enabled       True
  layer3_enabled       True
  cache_enabled        True
  monthly_budget_usd   $1000.0

[registry]
  providers   google, anthropic, openai
  models      8

[layer 3]
  model file            head_v1.joblib (3,166 KB)
  trained on            2026 examples
  task_type accuracy    0.789
  complexity accuracy   0.796

[keyword packs]
  registered   healthcare, legal, your_custom
```
dmr stats — what's it actually routing?
```bash
dmr stats                    # tier distribution + layer hit rates (default 24h)
dmr stats cost --since 7d
dmr stats disagreements
```

```
Routing summary — last 24 hours

  Total decisions   1,247
  Layer 1 (free)      892  (71.5%)
  Layer 3 (ML)        231  (18.5%)
  Layer 2 (LLM)       124  (10.0%)

  Tier distribution
    low      687  (55.1%)   $0.86
    medium   478  (38.3%)   $4.12
    high      82  ( 6.6%)   $9.74
                            ─────
                           $14.72
```
dmr config validate — schema-check your dmr.yaml
```bash
dmr config validate
```
Decision log — three modes
The router emits two streams: decisions (what was routed where) and outcomes (what happened — tokens, cost, success). How they're delivered depends on what you turn on.
Mode 1 — Default (no setup)
One quiet INFO line per event via standard Python logging. No files. No DB. Just like any well-behaved library:
```
INFO dmr.decisions: DMR decision: tier=low model=gemini-2.5-flash layer=layer1 conf=0.91 lat=2ms
INFO dmr.outcomes:  DMR outcome: tokens=42/180 wall=412ms success=True cost=$0.000023
```

Silence it: `logging.getLogger("dmr").setLevel(logging.WARNING)`.
Mode 2 — Full structured telemetry
Set `DMR_TELEMETRY=1`. Same logger, richer payload — now every event is a full JSON event at `logging.DEBUG`. Still no files written. If you want persistence, see Mode 3.

```bash
DMR_TELEMETRY=1 python app.py
```

```json
{"timestamp": "2026-05-09T14:23:11Z", "decision_id": "abc123...", "router_version": "0.4.0",
 "task_preview": "Implement…", "tier": "low", "model": "gemini-2.5-flash", "task_type": "code_creation",
 "complexity": "standard", "confidence": 0.91, "layer": "layer1", "latency_ms": 0.4,
 "provider": "google", "compliance_flag": false, "cached": false}
```
PII (SSNs, emails, API keys, JWTs, phone numbers, etc.) is auto-redacted from `task_preview` and `error_message`. Route the `dmr.decisions` and `dmr.outcomes` Python loggers wherever you want — file handler, syslog, OTLP, Datadog, etc.
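Because these are ordinary Python loggers, standard `logging` handlers are all you need. For example, persisting decision events to a file yourself (plain stdlib, nothing package-specific):

```python
import logging

# DMR_TELEMETRY=1 events are emitted at DEBUG on the "dmr.decisions" logger.
decisions = logging.getLogger("dmr.decisions")
decisions.setLevel(logging.DEBUG)
decisions.addHandler(logging.FileHandler("decisions.log"))
```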
Mode 3 — Pluggable backend (you own the storage)
The package never writes files automatically. If you want persistence, wire a backend — that's the only way data lands anywhere outside Python logging.
Any object with a `log(entry: dict)` method works:

```python
from classifier import Router
from examples.custom_backends.sqlite_backend import SQLiteBackend

backend = SQLiteBackend("my_telemetry.db")
router = Router(decision_logger=backend, outcome_logger=backend)
```
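Writing your own backend is just as small; anything exposing that one method qualifies. A minimal sketch (hypothetical class; the built-in `JSONLLoggerBackend` below does essentially this):

```python
import json
from classifier import Router

class MyJSONLBackend:
    """Append each event as one JSON line — the whole backend contract."""
    def __init__(self, path: str):
        self.path = path

    def log(self, entry: dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(entry, default=str) + "\n")

router = Router(decision_logger=MyJSONLBackend("decisions.jsonl"))
```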
Ready-made backends in `examples/custom_backends/`:
| Storage | File | Extra deps |
|---|---|---|
| SQLite (local, zero-dep) | sqlite_backend.py | none |
| PostgreSQL | postgres_backend.py | psycopg2-binary |
| Google BigQuery | bigquery_backend.py | google-cloud-bigquery |
| AWS DynamoDB | dynamodb_backend.py | boto3 |
| Google Cloud Storage | gcs_backend.py | google-cloud-storage |
Built-in (no extra files needed): `JSONLLoggerBackend`, `StdoutLoggerBackend`, `WebhookLoggerBackend`, `KafkaLoggerBackend`, `S3LoggerBackend`.
Fan out to multiple sinks with `MultiLoggerBackend`:

```python
from classifier import Router, MultiLoggerBackend, StdoutLoggerBackend
from examples.custom_backends.sqlite_backend import SQLiteBackend

backend = MultiLoggerBackend([
    SQLiteBackend("local.db"),   # local queryable copy
    StdoutLoggerBackend(),       # also stream to stdout for log collectors
])
router = Router(decision_logger=backend, outcome_logger=backend)
```
A broken backend never blocks the others — failures are caught and logged at WARNING.
What's in each event
Decision event (one per `router.classify()`):

| Field | Type | Notes |
|---|---|---|
| `decision_id` | str | 16-char hex — join key to outcomes |
| `timestamp` | ISO 8601 | UTC |
| `router_version` | str | package `__version__` |
| `task_preview` | str | first 200 chars, PII-redacted |
| `task_length` | int | full task length |
| `tier` | str | low/medium/high |
| `model`, `provider` | str | the routed model |
| `task_type`, `complexity` | str | classifier output |
| `confidence` | float | 0–1 |
| `layer` | str | which layer decided: layer1/layer2/layer3 |
| `latency_ms` | float | classification time |
| `compliance_flag` | bool | PII/PHI detected in task |
| `disagreement` | bool | L1 vs L3 disagree |
| `exploration` | bool | random sample for drift detection |
| `cached`, `cached_from` | bool, str | cache-hit metadata |
Outcome event (call `router.report_outcome(...)` after your LLM call returns):

| Field | Type | Notes |
|---|---|---|
| `decision_id` | str | join key |
| `tokens_in`, `tokens_out` | int | usage |
| `tokens_estimated` | bool | True if heuristic (vs provider-reported) |
| `wall_ms` | float | full LLM call time |
| `success` | bool | call completed |
| `cost_usd` | float | computed from model rates |
| `user_feedback` | str | up/down/None |
| `user_retried`, `user_escalated_model`, `edit_distance` | mixed | optional signals |
| `error_message` | str | PII-redacted |
Join decisions to outcomes via `decision_id` for cost-per-tier / accuracy / cache-hit-rate dashboards.
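Putting both streams together, the full loop looks roughly like this (a sketch: the `decision_id` attribute and the `report_outcome` kwarg names mirror the event fields above, but verify them against your installed version):

```python
import time
from classifier import Router

router = Router()
decision = router.classify("Summarize this 40-page contract")

start = time.monotonic()
# ... your actual LLM call with decision.model_name goes here ...
tokens_in, tokens_out = 812, 164   # read these from your provider's usage object
wall_ms = (time.monotonic() - start) * 1000

router.report_outcome(
    decision_id=decision.decision_id,
    tokens_in=tokens_in,
    tokens_out=tokens_out,
    wall_ms=wall_ms,
    success=True,
)
```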
Try it in 30 seconds
```bash
python examples/test_telemetry.py                    # Mode 1 — quiet
DMR_TELEMETRY=1 python examples/test_telemetry.py    # Mode 2 — full JSON
python examples/test_telemetry.py --db               # Mode 3 — SQLite backend + analytics
```
Layer 2 — LLM fallback (advanced)
Layer 2 only fires when L1 + L3 are both uncertain (~10% of traffic in practice). Defaults to Gemini Flash, but everything is overridable:
```python
from classifier import Router

router = Router(
    layer2_provider="anthropic",
    layer2_model="claude-haiku-4-5-20251001",
    l2_retry_policy={"max_attempts": 5, "initial_delay": 0.5, "backoff": 2.0},
    l2_circuit_breaker={"failure_threshold": 3, "cooldown_secs": 120},
    layer2_prompt_template=open("my_prompt.txt").read(),
    budget_usd=100,   # auto-downgrades at 80%, halts at 100%
)
```
Disable it entirely if you want a pure offline router:
```python
router = Router(layer2_enabled=False)
```
Model registry
No model name or price is hardcoded. Everything lives in YAML.
```bash
dmr models                                 # see what's loaded
dmr models load my-models.yaml --replace
dmr models export --output snapshot.yaml
```
```yaml
# my-models.yaml
providers:
  groq:
    api_key_env: GROQ_API_KEY
    tiers:
      low: llama-3.3-8b-instant
      medium: llama-3.3-70b-versatile
      high: llama-3.3-70b-versatile

models:
  llama-3.3-8b-instant:
    cost: { input_per_1m: 0.05, output_per_1m: 0.08 }
    capabilities: { context_window: 128000, supports_function_calling: true }
```
Or programmatically:
```python
from classifier import register_provider, register_model_cost, ModelTier

register_provider("groq", {
    ModelTier.LOW: "llama-3.3-8b-instant",
    ModelTier.HIGH: "llama-3.3-70b-versatile",
})
register_model_cost("llama-3.3-70b-versatile", input_per_1m=0.59, output_per_1m=0.79)
```
Override priority: `Router(registry=...)` → `DMR_REGISTRY` env var → bundled `default.yaml`.
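So a process-local override is one line, using the form shown in the production checklist below (with `my-models.yaml` being the file from the example above):

```python
from classifier import Router

# Highest priority: an explicit registry file beats DMR_REGISTRY and the bundled default
router = Router.from_registry("my-models.yaml")
```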
Integrations
| Framework | Module | One-line use |
|---|---|---|
| LangChain | `classifier.integrations.langchain` | `get_chat_model(task)` or `DynamicChatModel()` |
| CrewAI | `classifier.integrations.crewai` | `pick_llm_for_task(task)` or `DynamicLLM()` |
| AutoGen | `classifier.integrations.autogen` | `get_autogen_llm_config(task)` |
| OpenAI Agents | `classifier.integrations.autogen` | `get_openai_agent_model(task)` |
| Google ADK | `classifier.integrations.adk` | `before_model_callback=dynamic_model_selector` |
| LlamaIndex | `classifier.integrations.llamaindex` | `get_llm(task)` or `DynamicLLM()` |
| Pydantic AI | `classifier.integrations.pydantic_ai` | `get_model_string(task)` or `get_agent(task)` |
| DSPy | `classifier.integrations.dspy` | `get_lm(task)` or `with route(task): ...` |
| Haystack | `classifier.integrations.haystack` | `get_generator(task)` |
| Semantic Kernel | `classifier.integrations.semantic_kernel` | `get_chat_service(task)` |
| smolagents (HF) | `classifier.integrations.smolagents` | `get_model(task)` or `DynamicModel()` |
```python
# CrewAI example — every call this agent makes is routed dynamically
from crewai import Agent
from classifier.integrations.crewai import DynamicLLM

agent = Agent(role="Analyst", goal="...", llm=DynamicLLM())
```
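Or LangChain, via the documented entry point from the table above (a sketch; the returned object is assumed to be a standard LangChain chat model supporting `.invoke`):

```python
# LangChain example — pick the model once, per task
from classifier.integrations.langchain import get_chat_model

task = "Summarize this 40-page contract"
llm = get_chat_model(task)     # routed: returns a chat model for the chosen tier
print(llm.invoke(task).content)
```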
CLI reference
```bash
# Classify
dmr classify "task text"                          # one-shot
dmr classify --preset healthcare "Patient MRN…"

# Train Layer 3
dmr train --auto                                  # bootstrap from logs
dmr train --data examples.jsonl                   # train on labeled JSONL
dmr generate-data --domain legal --per-slot 50    # synthesize via Gemini

# Customize Layer 1 keywords
dmr keywords add --domain legal --type reasoning --keywords "tort,liable"
dmr keywords list
dmr keywords remove --domain legal --keyword "tort"
dmr keywords suggest --since 30d                  # mine from your logs

# Inspect
dmr config show                                   # effective config + L3 status
dmr config validate                               # validate dmr.yaml
dmr doctor                                        # env / dep / readiness check
dmr stats                                         # routing distribution
dmr stats cost --since 7d                         # cost breakdown
dmr models                                        # registry inventory

# Eval
dmr eval --data test.jsonl                        # accuracy + tier distribution

# Other
dmr init                                          # scaffold dmr.yaml
dmr presets                                       # list domain presets
dmr benchmark                                     # local p50/p95/p99 latency
dmr version
```
Production checklist
Before going live with serious traffic:
- Override the bundled registry. Bundled prices go stale fast: `dmr models export > my-models.yaml`, edit, then `Router.from_registry("my-models.yaml")`.
- Train Layer 3 on your data. Run `dmr train --auto` after a few hundred logged decisions. Reduces L2 calls another 60–80%.
- Pin a small budget initially. `Router(budget_usd=100)` and watch `dmr stats cost`.
- Set a tight L2 circuit breaker. `failure_threshold=3, cooldown_secs=120` so a provider outage doesn't drain your wallet.
- Configure decision logging to an immutable backend (S3 + object lock, or write-only Kafka) for audit trails.
- Run `dmr doctor` in CI. Fail the build on any `[x]`.
- Use `ShadowMode` when changing routing config — runs old and new in parallel, logs diffs without affecting users.
- Pin the package version in your lock file. Semver — minor bumps may include behavior changes for unset config defaults.
We don't phone home
`dynamic-model-router` collects zero telemetry on its own. No usage data, model names, error reports — nothing about your usage ever leaves your machine to us or anyone else.
The only network calls happen when you ask for them: Layer 2 → your LLM provider, `Router(registry="https://...")` → that URL, or your configured logger backend forwarding decisions to your DB.
(Not to be confused with `DMR_TELEMETRY=1` — that's a flag you set to get richer logs about your own routing. The data stays in your environment.)
License
MIT — see LICENSE.
Security
Found a vulnerability? See SECURITY.md. Do not open a public issue.
Contributing
PRs welcome — see CONTRIBUTING.md. All contributors agree to the Code of Conduct.
Changelog & roadmap