dynamic-model-router
A 3-layer cascade classifier that routes each task to the cheapest model that can handle it well — before the agent makes an API call.
from classifier import classify
decision = classify("What is 2+2?") # → low tier (cheap)
decision = classify("Design a CQRS architecture for…") # → high tier (capable)
print(decision.tier, decision.model_name)
That's the whole pitch. Cost goes down 60–80% on real workloads with no quality loss.
📚 Table of contents
- Why
- How it works
- Install
- Step-by-step quickstart
- Configuration — layer by layer
- The model registry
- Integrations
- CLI reference
- Telemetry
- Production checklist
- License
Why
You're paying for gpt-4o or claude-opus-4-7 to answer "Hello, how are you?". An LLM router should pick the right model per task. Existing routers tend to be:
- Hardcoded ("if len(prompt) > X, use big model") — too dumb
- LLM-based (every routing decision is itself an LLM call) — adds latency + cost
- Single-vendor (LiteLLM, etc.) — locked in
dynamic-model-router is a cascade of three classifiers that get progressively more accurate but more expensive, stopping at the first one that's confident. Most calls never leave Layer 1 (free, <1ms).
How it works
┌─────────┐ high confidence ┌──────────┐
│ Layer 1 │ ──────────────────▶ │ Pick │
│ keyword │ │ model │
│ <1ms │ │ & GO │
└────┬────┘ └──────────┘
│ low confidence
▼
┌─────────┐ high confidence
│ Layer 3 │ ──────────────────▶ (same)
│ ML │
│ ~15ms │
└────┬────┘
│ low confidence
▼
┌─────────┐
│ Layer 2 │ ──────────────────▶ (same)
│ LLM │
│ ~500ms │
└─────────┘
Each layer outputs (task_type, complexity, confidence) — together those map to (provider, tier, model) via a configurable matrix.
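A minimal sketch of that mapping (illustrative only: TIER_MATRIX and pick_tier are hypothetical names, not the package's internals; the real matrix is supplied via Router(tier_matrix=...)):

from enum import Enum

class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Hypothetical (task_type, complexity) -> tier matrix; unknown
# combinations fall back to MEDIUM. The real matrix is configurable.
TIER_MATRIX = {
    ("conversation", "simple"): Tier.LOW,
    ("code_creation", "standard"): Tier.MEDIUM,
    ("reasoning", "complex"): Tier.HIGH,
}

def pick_tier(task_type: str, complexity: str, confidence: float,
              threshold: float = 0.75) -> Tier | None:
    """Return a tier if this layer is confident, else None to escalate."""
    if confidence < threshold:
        return None  # fall through to the next layer in the cascade
    return TIER_MATRIX.get((task_type, complexity), Tier.MEDIUM)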
Install
# Core (Layer 1 only — keyword router, no ML, no LLM fallback)
pip install dynamic-model-router
# With Layer 3 (ML head) — recommended
pip install 'dynamic-model-router[ml]'
# With one or more providers
pip install 'dynamic-model-router[google,anthropic,openai]'
# With agent framework integrations
pip install 'dynamic-model-router[ml,crewai]' # CrewAI
pip install 'dynamic-model-router[ml,adk,google]' # Google ADK
# Production extras
pip install 'dynamic-model-router[redis,kafka,s3,otel,tokenizers]'
# Everything
pip install 'dynamic-model-router[all_extensions]'
Step-by-step quickstart
1️⃣ Install + set an API key
pip install 'dynamic-model-router[ml,google]'
# Choose any provider — Google's free tier is the easiest start.
echo 'GOOGLE_API_KEY=your-key-here' > .env
2️⃣ Verify your install
dmr doctor
You should see all green or yellow checks. Fix any red [FAIL] before going further.
3️⃣ Classify your first task
from classifier import classify
decision = classify("Write a Python function to merge two sorted lists.")
print(f"Use model: {decision.model_name}")
print(f"Tier: {decision.tier.value}")
print(f"Why: {decision.reasoning}")
4️⃣ Route an actual LLM call
import os

from classifier import Router
from google import genai

router = Router()

def smart_completion(task: str) -> str:
    decision = router.classify(task)
    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    response = client.models.generate_content(model=decision.model_name, contents=task)
    return response.text
print(smart_completion("Hi")) # gemini-2.5-flash
print(smart_completion("Design a distributed lock…")) # gemini-2.5-pro
5️⃣ Train Layer 3 on your domain (optional but recommended)
# Generate sample data (or bring your own JSONL with task/task_type/complexity)
dmr generate-data --domain healthcare --per-slot 50 --out healthcare.jsonl
# Train a domain-specific classifier head (~30 seconds on CPU)
dmr train --data healthcare.jsonl
6️⃣ Customize per-domain
from classifier import Router, KeywordPack, TaskType
# Healthcare keywords + HIPAA PII patterns
router = Router.from_preset("healthcare")
# Or build your own
legal_pack = (
    KeywordPack.builder("legal")
    .add(TaskType.REASONING, ["precedent", "tort", "indemnification"])
    .add(TaskType.DOC_CREATION, ["clause", "agreement", "NDA"])
    .build()
)
router = Router(extra_keyword_packs=[legal_pack])
7️⃣ Production: drop in a dmr.yaml
dmr init # scaffolds dmr.yaml in cwd
$EDITOR dmr.yaml # tweak providers, layers, thresholds, costs
router = Router.from_yaml("dmr.yaml")
Configuration — layer by layer
The package ships zero hardcoded model names, prices, or capabilities — everything is overridable. Below is the cheat sheet, organised by layer.
🔵 Layer 1 — Keyword Heuristics (always on, <1ms)
| What | How |
|---|---|
| Add domain keywords | Router(extra_keyword_packs=[KeywordPack.builder("…").add(...).build()]) |
| Tune scoring weights | Router(l1_weights={"primary": 5.0, "secondary": 1.0, "escalator": 2.0}) |
| Disable entirely | Router(layer1_enabled=False) |
| Set escalation threshold | Router(escalation_threshold=0.75) (below this, fall through to L3/L2) |
pack = (
    KeywordPack.builder("biotech")
    .add(TaskType.REASONING, ["protein", "CRISPR", "in-vitro"])
    .escalator("genome-wide", weight=2)
    .build()
)
router = Router(extra_keyword_packs=[pack])
🟢 Layer 3 — ML Classifier (frozen MiniLM + MLP head, ~15ms)
| What | How |
|---|---|
| Train on your data | router.train(data="my_examples.jsonl") or dmr train --data ... |
| Swap the embedding model | Router(layer3_embedding_model="BAAI/bge-large-en-v1.5") |
| Plug in a custom strategy | register_l3_strategy("my_pipeline", lambda task, hist: ...) |
| Set abstain threshold | Router(layer3_threshold=0.85) |
| Disable | Router(layer3_enabled=False) |
JSONL format for training:
{"task": "Implement Dijkstra in Python", "task_type": "code_creation", "complexity": "standard"}
{"task": "Hello", "task_type": "conversation", "complexity": "simple"}
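If you're assembling training data by hand, plain stdlib Python produces this format (a minimal sketch; the task_type and complexity values must match ones your router recognises):

import json

# Write Layer 3 training examples in the JSONL format shown above.
examples = [
    {"task": "Implement Dijkstra in Python", "task_type": "code_creation", "complexity": "standard"},
    {"task": "Hello", "task_type": "conversation", "complexity": "simple"},
]
with open("my_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")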
🟡 Layer 2 — LLM Fallback (Gemini Flash by default, ~500ms)
| What | How |
|---|---|
| Switch provider | Router(layer2_provider="anthropic", layer2_model="claude-haiku-4-5-20251001") |
| Custom prompt | Router(layer2_prompt_template=open("my_prompt.txt").read()) |
| Retry policy | Router(l2_retry_policy={"max_attempts": 5, "initial_delay": 0.5, "backoff": 2.0}) |
| Circuit breaker | Router(l2_circuit_breaker={"failure_threshold": 3, "cooldown_secs": 120}) |
| Disable | Router(layer2_enabled=False) |
| Budget cap | Router(budget_usd=100) (auto-downgrades to MEDIUM at 80%, halts at 100%) |
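Putting the resilience knobs together (a sketch that just combines keyword arguments from the table above; treat the specific values as starting points, not recommendations):

from classifier import Router

# Layer 2 hardened for production: a cheap fallback model, bounded
# retries, a circuit breaker, and a spend cap.
router = Router(
    layer2_provider="anthropic",
    layer2_model="claude-haiku-4-5-20251001",
    l2_retry_policy={"max_attempts": 5, "initial_delay": 0.5, "backoff": 2.0},
    l2_circuit_breaker={"failure_threshold": 3, "cooldown_secs": 120},
    budget_usd=100,
)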
⚙️ Cross-cutting
| What | How |
|---|---|
| Per-instance overrides | Router(provider=..., tier_matrix=..., model_registry=...) |
| Hooks | Router(pre_classify_hooks=[…], post_classify_hooks=[…], on_error_hooks=[…]) |
| Custom router escape hatch | Router(custom_classifier=lambda task, ctx: my_decision) |
| Cache backend | Router(cache_backend=RedisCacheBackend(host="…")) |
| Decision logger | Router(decision_logger=KafkaLoggerBackend(brokers=[…], topic="…")) |
| Multi-tenant per-call | router.classify(task, tenant_config={"providers":["anthropic"], …}) |
| A/B testing | ABTest(control=Router(), treatment=Router(...), split=0.05) |
| Shadow mode | ShadowMode(primary=current, shadow=new, on_diff=log_diff) |
| PII policy | Router(pii_policy={"min_tier": ModelTier.HIGH, "block": False}) |
| Latency SLA | Router(latency_budget_ms=1500) |
| Data residency | Router(residency="EU") |
| Custom tokenizer | register_tokenizer("model-name", lambda t: my_count(t)) |
| Layer plugin | register_layer(MyCustomLayer()) |
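For example, a post-classify hook can record every routing decision (a sketch; the exact hook signature is an assumption here, so check the API reference before relying on it):

from classifier import Router

def log_decision(decision):
    # Assumed signature: each post-classify hook receives the decision object.
    print(f"{decision.tier.value} -> {decision.model_name}: {decision.reasoning}")

router = Router(post_classify_hooks=[log_decision])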
The model registry
No model name or price is hardcoded. All of it lives in YAML — the bundled default.yaml is a snapshot you should override in production.
Inspect what's registered
dmr models # list providers + models + costs + capabilities
Override entirely with your own YAML
dmr models load my-models.yaml --replace
# my-models.yaml
version: "2026.05.01"
providers:
  groq:
    api_key_env: GROQ_API_KEY
    tiers:
      low: llama-3.3-8b-instant
      medium: llama-3.3-70b-versatile
      high: llama-3.3-70b-versatile
  bedrock:
    api_key_env: AWS_ACCESS_KEY_ID
    tiers:
      low: anthropic.claude-haiku-4-5-20251001
      high: anthropic.claude-opus-4-7
models:
  llama-3.3-8b-instant:
    cost: { input_per_1m: 0.05, output_per_1m: 0.08 }
    capabilities:
      context_window: 128000
      supports_function_calling: true
  llama-3.3-70b-versatile:
    cost: { input_per_1m: 0.59, output_per_1m: 0.79 }
    capabilities:
      context_window: 128000
      supports_function_calling: true
Or programmatically
from classifier import register_provider, register_model_cost, ModelTier
register_provider("groq", {
    ModelTier.LOW: "llama-3.3-8b-instant",
    ModelTier.HIGH: "llama-3.3-70b-versatile",
})
register_model_cost("llama-3.3-70b-versatile", input_per_1m=0.59, output_per_1m=0.79)
Override sources (priority order)
1. Router(registry="path-or-url") or Router.from_registry("path-or-url")
2. DMR_REGISTRY=/path/to/my-models.yaml env var (loaded at import)
3. DMR_NO_DEFAULT_REGISTRY=1 env var (start completely empty)
4. Bundled default.yaml (snapshot — verify before production!)
Integrations
| Framework | Module | Pattern |
|---|---|---|
| LangChain | classifier.integrations.langchain | get_chat_model(task) or DynamicChatModel() |
| CrewAI | classifier.integrations.crewai | pick_llm_for_task(task) or DynamicLLM() |
| AutoGen | classifier.integrations.autogen | get_autogen_llm_config(task) |
| OpenAI Agents SDK | classifier.integrations.autogen | get_openai_agent_model(task) |
| Google ADK | classifier.integrations.adk | before_model_callback=dynamic_model_selector |
| LlamaIndex | classifier.integrations.llamaindex | get_llm(task) or DynamicLLM() |
| Pydantic AI | classifier.integrations.pydantic_ai | get_model_string(task) or get_agent(task, **kw) |
| DSPy | classifier.integrations.dspy | get_lm(task) or with route(task): ... |
| Haystack | classifier.integrations.haystack | get_generator(task) |
| Semantic Kernel | classifier.integrations.semantic_kernel | get_chat_service(task) |
| smolagents (HF) | classifier.integrations.smolagents | get_model(task) or DynamicModel() |
# CrewAI example
from crewai import Agent
from classifier.integrations.crewai import DynamicLLM
agent = Agent(role="Analyst", goal="...", llm=DynamicLLM())
# Each call this agent makes is routed to the right tier dynamically.
# Decorator — any function gets dynamic model selection
from classifier import route_model

@route_model(provider="anthropic")
def call_claude(task: str, model_name: str = "claude-haiku-4-5-20251001"):
    # model_name is auto-injected by the router
    ...
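Calling the decorated function then looks like any other call; the declared model_name default acts only as a fallback (the routing outcomes in the comments are illustrative):

call_claude("Summarize this NDA clause")         # likely routed to a cheap tier
call_claude("Draft a merger agreement outline")  # likely routed to a capable tier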
CLI reference
dmr classify "task text" # one-shot classification
dmr classify --preset healthcare "Patient MRN 12345 has chest pain"
dmr train --data examples.jsonl # train Layer 3 on your data
dmr eval --data test.jsonl # accuracy + tier distribution
dmr generate-data --domain legal --per-slot 50 # synthetic training data via Gemini
dmr models # list registered providers/models/costs
dmr models load my-models.yaml --replace
dmr models export --output snapshot.yaml
dmr models pull https://example.com/community-registry.yaml
dmr stats # routing distribution from decision log
dmr stats cost --since 7d # cost breakdown over last week
dmr doctor # diagnose env / config / dependencies
dmr version # package + Python + dep versions
dmr benchmark # local p50/p95/p99 latency
dmr init # scaffold dmr.yaml in cwd
dmr presets # list domain presets
Telemetry
dynamic-model-router does not collect any telemetry. No usage data, no model names, no error reports leave your machine. Ever.
The package never makes a network call you didn't ask for. The only network calls happen when:
- You explicitly construct a Router and call .classify() with layer2_enabled=True — then Layer 2 calls the provider you chose.
- You explicitly call Router.load_registry("https://...") — then we fetch that URL.
- Your decision-logger backend is configured to forward (e.g. WebhookLoggerBackend).
If you discover any unexpected outbound traffic, that is a security bug — please file a security advisory.
Production checklist
Before going live with serious traffic:
- Override the bundled registry. dmr models export > my-models.yaml, edit, then Router.from_registry("my-models.yaml"). Bundled prices go stale fast.
- Set up secrets properly. Use a secret manager — not .env in your repo. Rotate quarterly.
- Train Layer 3 on your data. A head_v1.joblib trained on your domain reduces L2 (LLM) calls by another 60–80%.
- Pin a small budget initially (Router(budget_usd=100)) and watch dmr stats cost.
- Enable strict PII scrubbing (pii_scrub_strict=true in settings, plus domain-specific extra_pii_patterns).
- Set a tight L2 circuit breaker (failure_threshold=3, cooldown_secs=120) so a provider outage doesn't drain your wallet.
- Configure decision logging to an immutable backend (S3 with object lock, or a write-only Kafka topic) for audit trails.
- Run dmr doctor in CI — fail the build if any check is FAIL.
- Use ShadowMode to validate every routing change before flipping the switch (see the sketch after this list).
- Subscribe to the security advisory for vulnerability notifications.
- Pin the package version in your lock file. The package follows semver; minor bumps may include behaviour changes for unset config defaults.
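A minimal shadow-mode rollout, as referenced in the checklist above (a sketch built on the ShadowMode constructor from the cross-cutting table; the on_diff callback signature and the second YAML file are assumptions):

from classifier import Router, ShadowMode

current = Router.from_yaml("dmr.yaml")
candidate = Router.from_yaml("dmr-new.yaml")  # hypothetical config under test

def log_diff(diff):
    # Assumed: invoked whenever primary and shadow disagree on a decision.
    print("routing divergence:", diff)

router = ShadowMode(primary=current, shadow=candidate, on_diff=log_diff)
# Traffic is served by `primary`; `shadow` classifies the same tasks so
# divergences surface before the new config takes real traffic.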
License
MIT — see LICENSE.
Security
Found a vulnerability? See SECURITY.md. Please do not open a public issue.
Contributing
PRs welcome — see CONTRIBUTING.md. All contributors agree to the Code of Conduct.
Changelog
See CHANGELOG.md for release history.
Roadmap
See ROADMAP.md for upcoming features and the path from 0.1 → 1.0.