
dynamic-model-router


A 3-layer cascade classifier that routes each task to the cheapest model that can handle it well — before the agent makes an API call.

from classifier import classify

decision = classify("What is 2+2?")                    # → low tier (cheap)
decision = classify("Design a CQRS architecture for…") # → high tier (capable)
print(decision.tier, decision.model_name)

That's the whole pitch. Cost goes down 60–80% on real workloads with no quality loss.


Install

pip install dynamic-model-router                                 # core
pip install 'dynamic-model-router[ml]'                           # + Layer 3 (recommended)
pip install 'dynamic-model-router[ml,google]'                    # + Gemini provider
pip install 'dynamic-model-router[ml,google,anthropic,openai]'   # all three providers

Set one API key in .env (Google has a free tier — easiest start):

echo 'GOOGLE_API_KEY=your-key-here' > .env

Verify your install:

dmr doctor

The 3 layers — in plain English

Every task you classify walks down a ladder. The first layer that's confident wins. Most tasks stop at Layer 1.

🟦 Layer 1 — Keywords (free, <1 ms)
   Looks at the words in your task. "implement", "function" → coding. "summarize" → doc creation. "diagnose", "patient" → medical reasoning.

🟩 Layer 3 — ML model (free, ~15 ms)
   A small neural net trained on your data (or our defaults). Catches things keywords miss — like sentence structure, intent, complexity.

🟨 Layer 2 — LLM fallback ($$, ~500 ms)
   When the first two are unsure, asks an LLM to classify the task. Same provider you'll route to.

The cascade: keywords confident? → ship. Otherwise: ML confident? → ship. Otherwise: ask an LLM. So every customization you make to Layer 1 (cheap, deterministic) saves you Layer 2 calls (slow, billed).

What each layer outputs is the same: (task_type, complexity, confidence). Together those map to (provider, tier, model) via a configurable matrix.
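
If you want that as code, here's a conceptual sketch of the cascade — not the package's internals, just the shape of the algorithm (the layer callables, thresholds, and matrix below are stand-ins):

from dataclasses import dataclass

@dataclass
class LayerResult:
    task_type: str      # e.g. "code_creation"
    complexity: str     # e.g. "standard"
    confidence: float   # 0–1

def cascade(task, layers, thresholds, matrix):
    """Walk the layers in cascade order (L1 → L3 → L2); first confident layer wins.

    layers:     list of (name, fn) where fn(task) -> LayerResult
    thresholds: {layer name: minimum confidence to accept}
    matrix:     {(task_type, complexity): (provider, tier, model)}
    """
    for name, layer_fn in layers:
        result = layer_fn(task)
        if result.confidence >= thresholds[name]:
            provider, tier, model = matrix[(result.task_type, result.complexity)]
            return name, provider, tier, model
    raise RuntimeError("no layer was confident enough")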


60-second quickstart

from classifier import Router

# Zero config. Layer 3 turns on automatically once you've trained it.
router = Router(layer3_enabled="auto")

decision = router.classify("Implement Dijkstra's algorithm in Python")
print(decision.model_name)   # → gemini-2.5-flash
print(decision.tier.value)   # → low
print(decision.layer_used)   # → layer1
print(decision.reasoning)    # → keyword match: "implement"

Drop that decision.model_name into whatever SDK you use:

from google import genai
client = genai.Client()
response = client.models.generate_content(
    model=decision.model_name,
    contents="Implement Dijkstra's algorithm in Python",
)

Or use one of the 11 framework integrations — LangChain, CrewAI, AutoGen, ADK, LlamaIndex, Pydantic AI, DSPy, Haystack, Semantic Kernel, smolagents, OpenAI Agents.


Layer 1 — Add your own keywords (no code needed)

Layer 1 is just: "if the task contains these words, it's probably this kind of task." Adding domain vocabulary is the single highest-leverage customization you can make.

The easy way — dmr keywords

# Add a few legal-domain keywords
dmr keywords add --domain legal --type reasoning \
                 --keywords "tort,liable,precedent,indemnification"

# See what you've added
dmr keywords list

# Found a wrong one?
dmr keywords remove --domain legal --keyword "tort"

That's it. Packs are saved to ~/.dmr/keywords/<domain>.yaml and auto-loaded by every new Router() — no code change.

Don't know what keywords to add? Mine them from your logs

Once your router has handled some real traffic, ask it which words it's seeing:

dmr keywords suggest --since 30d --top 15

Top distinctive n-grams per task_type (not already in any pack):

  [reasoning]
     2.41   n=37    differential diagnosis
     2.18   n=29    clinical scenario
     1.94   n=42    contraindication

  [doc_creation]
     2.05   n=51    progress note
     1.78   n=33    discharge summary

Pick the strong ones and dmr keywords add them.

Or build a pack programmatically

from classifier import KeywordPack, TaskType, Router

biotech = (KeywordPack.builder("biotech")
           .add(TaskType.REASONING, ["protein", "CRISPR", "in-vitro"])
           .escalator("genome-wide", weight=2)   # bumps complexity
           .build())

router = Router(extra_keyword_packs=[biotech])

Layer 3 — Train on your data (one command)

You don't need labeled data to start. The package logs every routing decision to routing_decisions.jsonl, and dmr train --auto turns that log into training data using 8 weak-supervision rules (Snorkel-style — short prompts, user retries, model escalations, etc.).
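
To make "weak-supervision rules" concrete, here's what two such rules might look like — a hypothetical sketch in the Snorkel style, not the package's actual rule set (the field names come from the telemetry event tables further down):

def short_prompt_rule(decision: dict):
    """Hypothetical rule: very short prompts are usually simple.
    Return a (label_kind, label) vote, or None to abstain."""
    if decision.get("task_length", 0) < 40:
        return ("complexity", "simple")
    return None

def user_escalated_rule(outcome: dict):
    """Hypothetical rule: if the user manually escalated to a bigger
    model, the routed complexity was probably too low."""
    if outcome.get("user_escalated_model"):
        return ("complexity", "complex")
    return None

# Only tasks where enough rules agree become training labels —
# which is why `dmr train --auto` reports "confident labels".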

Workflow

Day 1. Install. Use the router. L1 + L2 work immediately. L3 is silently disabled.

router = Router(layer3_enabled="auto")    # auto = enable when a model exists

Day 30. You've logged a few hundred decisions. dmr doctor notices:

[!] L3 model file  WARN  missing, but 547 decisions logged
                          → run `dmr train --auto` to enable Layer 3

One command to bootstrap Layer 3 from those logs:

dmr train --auto
[1/3] Auto-labeling decision/outcome telemetry since 2026-04-09...
  Got 312 confident labels:
    task_type   reasoning            104
    task_type   doc_creation          98
    task_type   code_creation         67
    complexity  simple                86
    complexity  standard             162
    complexity  complex               64

[2/3] Training Layer 3 head (frozen MiniLM + calibrated MLPs)...
[3/3] Done.

  task_type accuracy:    0.831
  complexity accuracy:   0.776

  Layer 3 is now active. New `Router()` instances will pick it up
  automatically when constructed with `layer3_enabled='auto'` (default).

That's it. Re-run any time you want — each run replaces the model.

Already have labeled data?

dmr train --data my_examples.jsonl

JSONL format:

{"task": "Implement Dijkstra in Python", "task_type": "code_creation", "complexity": "standard"}
{"task": "Hello", "task_type": "conversation", "complexity": "simple"}

No production data and want a head start?

dmr generate-data --domain healthcare --per-slot 50 --out healthcare.jsonl
dmr train --data healthcare.jsonl

(Uses Gemini to synthesize realistic examples for your domain.)

Tune Layer 3 in code

Router(
    layer3_enabled="auto",                                     # default
    layer3_threshold=0.85,                                      # higher = stricter
    layer3_embedding_model="BAAI/bge-large-en-v1.5",           # swap encoder
)

Track & inspect what's happening

Every classification is logged. The package gives you simple commands to inspect what the router is doing.

dmr doctor — health check + readiness

dmr doctor
  [+] Python version              OK   3.12.7
  [+] dep:pydantic_settings       OK   installed
  [+] opt:google.genai            OK   installed (Layer 2 fallback)
  [+] opt:sentence_transformers   OK   installed (Layer 3 ML head)
  [+] key:google                  OK   configured
  [!] key:anthropic               WARN ANTHROPIC_API_KEY not set
  [+] L3 model file               OK   head_v1.joblib (3,166 KB)
  [+] classify smoke test         OK   tier=low model=gemini-2.5-flash

  Result: 12 ok, 1 warning(s), 0 failure(s)

Run it in CI — fail your build on [x].

dmr config show — what's actually loaded

dmr config show
  dynamic-model-router  v0.2.0

  [settings]
    default_provider          google
    layer1_enabled            True
    layer2_enabled            True
    layer3_enabled            True
    cache_enabled             True
    monthly_budget_usd        $1000.0

  [registry]
    providers                 google, anthropic, openai
    models                    8

  [layer 3]
    model file                head_v1.joblib (3,166 KB)
    trained on                2026 examples
    task_type accuracy        0.789
    complexity accuracy       0.796

  [keyword packs]
    registered                healthcare, legal, your_custom

dmr stats — what's it actually routing?

dmr stats              # tier distribution + layer hit rates (default 24h)
dmr stats cost --since 7d
dmr stats disagreements

Routing summary — last 24 hours
  Total decisions          1,247
  Layer 1 (free)           892   (71.5%)
  Layer 3 (ML)             231   (18.5%)
  Layer 2 (LLM)            124   (10.0%)

  Tier distribution
    low                    687   (55.1%)   $0.86
    medium                 478   (38.3%)   $4.12
    high                    82   ( 6.6%)   $9.74
                                            ─────
                                            $14.72

dmr config validate — schema-check your dmr.yaml

dmr config validate

Decision log — three modes

The router emits two streams: decisions (what was routed where) and outcomes (what happened — tokens, cost, success). How they're delivered depends on what you turn on.

Mode 1 — Default (no setup)

One quiet INFO line per event via standard Python logging. No files. No DB. Just like any well-behaved library:

INFO dmr.decisions: DMR decision: tier=low  model=gemini-2.5-flash layer=layer1 conf=0.91 lat=2ms
INFO dmr.outcomes:  DMR outcome:  tokens=42/180 wall=412ms success=True cost=$0.000023

Silence it: logging.getLogger("dmr").setLevel(logging.WARNING).

Mode 2 — Full structured telemetry

Set DMR_TELEMETRY=1. Same logger, richer payload — now every event is a full JSON event at logging.DEBUG. Still no files written. If you want persistence, see Mode 3.

DMR_TELEMETRY=1 python app.py
{"timestamp": "2026-05-09T14:23:11Z", "decision_id": "abc123...", "router_version": "0.4.0",
 "task_preview": "Implement…", "tier": "low", "model": "gemini-2.5-flash", "task_type": "code_creation",
 "complexity": "standard", "confidence": 0.91, "layer": "layer1", "latency_ms": 0.4,
 "provider": "google", "compliance_flag": false, "cached": false}

PII (SSNs, emails, API keys, JWTs, phone numbers, etc.) is auto-redacted from task_preview and error_message. Route the dmr.decisions and dmr.outcomes Python loggers wherever you want — file handler, syslog, OTLP, Datadog, etc.
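
For example, persisting both streams to a rotating file takes nothing but the standard library (a sketch — substitute whatever handler your stack uses):

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler("dmr_events.log", maxBytes=10_000_000, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(message)s"))

dmr_logger = logging.getLogger("dmr")   # parent of dmr.decisions and dmr.outcomes
dmr_logger.addHandler(handler)
dmr_logger.setLevel(logging.DEBUG)      # DEBUG captures the Mode 2 JSON events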

Mode 3 — Pluggable backend (you own the storage)

The package never writes files automatically. If you want persistence, wire a backend — that's the only way data lands anywhere outside Python logging.

Any object with a log(entry: dict) method works:

from classifier import Router
from examples.custom_backends.sqlite_backend import SQLiteBackend

backend = SQLiteBackend("my_telemetry.db")
router = Router(decision_logger=backend, outcome_logger=backend)
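
Writing your own is a few lines — a minimal sketch of the protocol:

from classifier import Router

class PrintBackend:
    """Any object with log(entry: dict) qualifies as a backend."""

    def log(self, entry: dict) -> None:
        # `entry` is a decision or outcome event (see the field tables below).
        print(entry["decision_id"], entry.get("tier"), entry.get("model"))

router = Router(decision_logger=PrintBackend(), outcome_logger=PrintBackend())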

Ready-made backends in examples/custom_backends/:

Storage                   File                   Extra deps
SQLite (local, zero-dep)  sqlite_backend.py      none
PostgreSQL                postgres_backend.py    psycopg2-binary
Google BigQuery           bigquery_backend.py    google-cloud-bigquery
AWS DynamoDB              dynamodb_backend.py    boto3
Google Cloud Storage      gcs_backend.py         google-cloud-storage

Built-in (no extra files needed): JSONLLoggerBackend, StdoutLoggerBackend, WebhookLoggerBackend, KafkaLoggerBackend, S3LoggerBackend.

Fan out to multiple sinks with MultiLoggerBackend:

from classifier import Router, MultiLoggerBackend, StdoutLoggerBackend
from examples.custom_backends.sqlite_backend import SQLiteBackend

backend = MultiLoggerBackend([
    SQLiteBackend("local.db"),     # local queryable copy
    StdoutLoggerBackend(),         # also stream to stdout for log collectors
])
router = Router(decision_logger=backend, outcome_logger=backend)

A broken backend never blocks the others — failures are caught and logged at WARNING.

What's in each event

Decision event (one per router.classify()):

Field                  Type        Notes
decision_id            str         16-char hex — join key to outcomes
timestamp              str         ISO 8601 UTC
router_version         str         package __version__
task_preview           str         first 200 chars, PII-redacted
task_length            int         full task length
tier                   str         low/medium/high
model, provider        str         the routed model
task_type, complexity  str         classifier output
confidence             float       0–1
layer                  str         which layer decided: layer1/layer2/layer3
latency_ms             float       classification time
compliance_flag        bool        PII/PHI detected in task
disagreement           bool        L1 vs L3 disagree
exploration            bool        random sample for drift detection
cached, cached_from    bool, str   cache-hit metadata

Outcome event (call router.report_outcome(...) after your LLM call returns):

Field                                              Type    Notes
decision_id                                        str     join key
tokens_in, tokens_out                              int     usage
tokens_estimated                                   bool    True if heuristic (vs provider-reported)
wall_ms                                            float   full LLM call time
success                                            bool    call completed
cost_usd                                           float   computed from model rates
user_feedback                                      str     up/down/None
user_retried, user_escalated_model, edit_distance  mixed   optional signals
error_message                                      str     PII-redacted

Join decisions to outcomes via decision_id for cost-per-tier / accuracy / cache-hit-rate dashboards.
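
End to end, the loop looks roughly like this — a sketch that assumes report_outcome takes keyword arguments matching the outcome fields above, and that the decision object exposes its decision_id:

import time
from classifier import Router
from google import genai

router = Router()
client = genai.Client()

task = "Implement Dijkstra's algorithm in Python"
decision = router.classify(task)

start = time.monotonic()
response = client.models.generate_content(model=decision.model_name, contents=task)
wall_ms = (time.monotonic() - start) * 1000

router.report_outcome(
    decision_id=decision.decision_id,   # join key (assumed attribute)
    tokens_in=response.usage_metadata.prompt_token_count,
    tokens_out=response.usage_metadata.candidates_token_count,
    wall_ms=wall_ms,
    success=True,
)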

Try it in 30 seconds

python examples/test_telemetry.py              # Mode 1 — quiet
DMR_TELEMETRY=1 python examples/test_telemetry.py   # Mode 2 — full JSON
python examples/test_telemetry.py --db         # Mode 3 — SQLite backend + analytics

Layer 2 — LLM fallback (advanced)

Layer 2 only fires when L1 + L3 are both uncertain (~10% of traffic in practice). Defaults to Gemini Flash, but everything is overridable:

Router(
    layer2_provider="anthropic",
    layer2_model="claude-haiku-4-5-20251001",
    l2_retry_policy={"max_attempts": 5, "initial_delay": 0.5, "backoff": 2.0},
    l2_circuit_breaker={"failure_threshold": 3, "cooldown_secs": 120},
    layer2_prompt_template=open("my_prompt.txt").read(),
    budget_usd=100,           # auto-downgrades at 80%, halts at 100%
)

Disable it entirely if you want a pure offline router:

Router(layer2_enabled=False)

Model registry

No model name or price is hardcoded. Everything lives in YAML.

dmr models                              # see what's loaded
dmr models load my-models.yaml --replace
dmr models export --output snapshot.yaml

# my-models.yaml
providers:
  groq:
    api_key_env: GROQ_API_KEY
    tiers:
      low:    llama-3.3-8b-instant
      medium: llama-3.3-70b-versatile
      high:   llama-3.3-70b-versatile
models:
  llama-3.3-8b-instant:
    cost: { input_per_1m: 0.05, output_per_1m: 0.08 }
    capabilities: { context_window: 128000, supports_function_calling: true }

Or programmatically:

from classifier import register_provider, register_model_cost, ModelTier

register_provider("groq", {
    ModelTier.LOW:  "llama-3.3-8b-instant",
    ModelTier.HIGH: "llama-3.3-70b-versatile",
})
register_model_cost("llama-3.3-70b-versatile", input_per_1m=0.59, output_per_1m=0.79)

Override priority: Router(registry=...) → DMR_REGISTRY env var → bundled default.yaml.


Integrations

Framework        Module                                     One-line use
LangChain        classifier.integrations.langchain          get_chat_model(task) or DynamicChatModel()
CrewAI           classifier.integrations.crewai             pick_llm_for_task(task) or DynamicLLM()
AutoGen          classifier.integrations.autogen            get_autogen_llm_config(task)
OpenAI Agents    classifier.integrations.autogen            get_openai_agent_model(task)
Google ADK       classifier.integrations.adk                before_model_callback=dynamic_model_selector
LlamaIndex       classifier.integrations.llamaindex         get_llm(task) or DynamicLLM()
Pydantic AI      classifier.integrations.pydantic_ai        get_model_string(task) or get_agent(task)
DSPy             classifier.integrations.dspy               get_lm(task) or with route(task): ...
Haystack         classifier.integrations.haystack           get_generator(task)
Semantic Kernel  classifier.integrations.semantic_kernel    get_chat_service(task)
smolagents (HF)  classifier.integrations.smolagents         get_model(task) or DynamicModel()

# CrewAI example — every call this agent makes is routed dynamically
from crewai import Agent
from classifier.integrations.crewai import DynamicLLM

agent = Agent(role="Analyst", goal="...", llm=DynamicLLM())
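
The pattern is the same across frameworks. A LangChain sketch, assuming get_chat_model returns a ready-to-use chat model for whatever tier the task routes to:

# LangChain example — route first, then use the result like any chat model
from classifier.integrations.langchain import get_chat_model

task = "Summarize this 40-page contract and flag unusual clauses"
llm = get_chat_model(task)
print(llm.invoke(task).content)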

CLI reference

# Classify
dmr classify "task text"                       # one-shot
dmr classify --preset healthcare "Patient MRN…"

# Train Layer 3
dmr train --auto                               # bootstrap from logs
dmr train --data examples.jsonl                # train on labeled JSONL
dmr generate-data --domain legal --per-slot 50 # synthesize via Gemini

# Customize Layer 1 keywords
dmr keywords add --domain legal --type reasoning --keywords "tort,liable"
dmr keywords list
dmr keywords remove --domain legal --keyword "tort"
dmr keywords suggest --since 30d               # mine from your logs

# Inspect
dmr config show                                # effective config + L3 status
dmr config validate                            # validate dmr.yaml
dmr doctor                                     # env / dep / readiness check
dmr stats                                      # routing distribution
dmr stats cost --since 7d                      # cost breakdown
dmr models                                     # registry inventory

# Eval
dmr eval --data test.jsonl                     # accuracy + tier distribution

# Other
dmr init                                       # scaffold dmr.yaml
dmr presets                                    # list domain presets
dmr benchmark                                  # local p50/p95/p99 latency
dmr version

Production checklist

Before going live with serious traffic:

  • Override the bundled registry. Bundled prices go stale fast. dmr models export > my-models.yaml, edit, then Router.from_registry("my-models.yaml").
  • Train Layer 3 on your data. Run dmr train --auto after a few hundred logged decisions. Reduces L2 calls another 60–80%.
  • Pin a small budget initially. Router(budget_usd=100) and watch dmr stats cost.
  • Set a tight L2 circuit breaker. failure_threshold=3, cooldown_secs=120 so a provider outage doesn't drain your wallet.
  • Configure decision logging to an immutable backend (S3 + object lock, or write-only Kafka) for audit trails.
  • Run dmr doctor in CI. Fail the build on any [x].
  • Use ShadowMode when changing routing config — runs old and new in parallel, logs diffs without affecting users.
  • Pin the package version in your lock file. Semver — minor bumps may include behavior changes for unset config defaults.

We don't phone home

dynamic-model-router collects zero telemetry on its own. No usage data, no model names, no error reports — nothing about your usage ever leaves your machine, to us or anyone else.

The only network calls happen when you ask for them: Layer 2 → your LLM provider, Router(registry="https://...") → that URL, or your configured logger backend forwarding decisions to your DB.

(Not to be confused with DMR_TELEMETRY=1 — that's a flag you set to get richer logs about your own routing. The data stays in your environment.)


License

MIT — see LICENSE.

Security

Found a vulnerability? See SECURITY.md. Do not open a public issue.

Contributing

PRs welcome — see CONTRIBUTING.md. All contributors agree to the Code of Conduct.

Changelog & roadmap

CHANGELOG.md · ROADMAP.md
