# smart-llm-router
Provider-agnostic LLM router. Pick the cheapest capable model per prompt with rule-based scoring. Wraps LiteLLM for format conversion, streaming, tool calls, and 100+ provider integrations.
## Why

Every LLM proxy today routes based on a model name you pick. This one picks the model for you — locally, in <1 ms, with zero ML — by scoring the prompt across 15 dimensions (code presence, reasoning markers, multi-step patterns, multilingual keywords, etc.) and mapping it to one of four tiers (SIMPLE / MEDIUM / COMPLEX / REASONING).
You bring an upstream (OpenRouter, Together, Fireworks, Groq, Anthropic direct, vLLM, Ollama — anything OpenAI-compatible). It does the rest.
## Install

```bash
pip install smart-llm-router
```
Two console scripts ship with the package: `smart-llm-router` (full name) and `slr` (short alias).
## Quick start with OpenRouter (default upstream)

```bash
# 1. Get an OpenRouter key at https://openrouter.ai/keys
export OPENROUTER_API_KEY=sk-or-v1-...
export LITELLM_MASTER_KEY=sk-anything  # gates the proxy itself

# 2. Start the proxy on :4000 (uses the bundled OpenRouter config by default)
smart-llm-router start
```
In another terminal — any OpenAI-compatible client works:
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-anything")

# Smart routing — rule-based scorer picks the cheapest capable model
resp = client.chat.completions.create(
    model="smart/auto",
    messages=[{"role": "user", "content": "prove that sqrt(2) is irrational step by step"}],
)
# → routed to REASONING tier (e.g. deepseek/deepseek-r1)
```
Or curl:
```bash
curl http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-anything" \
  -H "Content-Type: application/json" \
  -d '{"model":"smart/auto","messages":[{"role":"user","content":"hi"}]}'
```
## Inspect routing without dispatching

```bash
slr test "what is the capital of france"
# → SIMPLE / google/gemini-2.5-flash-lite / 100% savings vs claude-sonnet-4.6

slr test "Prove that sqrt(2) is irrational step by step"
# → REASONING / deepseek/deepseek-r1 / 90% savings

slr test "design a high-availability microservices architecture" --profile premium
# → COMPLEX / anthropic/claude-opus-4.7

slr models --profile auto  # show the tier→model table
```
## Pointing at a different upstream

The bundled config targets OpenRouter, but anything OpenAI-compatible works (Together, Fireworks, Groq, DeepInfra, vLLM, Ollama, OpenAI direct). Copy the bundled YAML and edit `api_base` / `api_key`:

```bash
# Copy the bundled config to your working directory
python -c "from importlib.resources import files; import shutil; shutil.copy(files('smart_llm_router') / 'default_config.yaml', './smart-llm-router.yaml')"

# Edit smart-llm-router.yaml — swap api_base / api_key per model_list entry,
# then start with --config
smart-llm-router start --config smart-llm-router.yaml
```
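As a sketch of what an edited entry can look like (field names follow LiteLLM's `model_list` schema; the bundled file's exact contents may differ), here is a hypothetical entry pointed at a local Ollama server:

```yaml
model_list:
  - model_name: meta-llama/llama-3.3-70b-instruct  # the name clients (and the router) request
    litellm_params:
      model: openai/llama3.3:70b              # openai/ prefix = any OpenAI-compatible upstream
      api_base: http://localhost:11434/v1     # e.g. Ollama's OpenAI-compatible endpoint
      api_key: "unused"                       # Ollama ignores it; a dummy value satisfies the client
```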
## Available routing profiles

| Model value | Behavior |
|---|---|
| `smart/auto` | Rule-based scoring → cheapest capable model |
| `smart/eco` | Rule-based scoring → cheapest tier table (free + lite models) |
| `smart/premium` | Rule-based scoring → quality-first tier table (Claude Sonnet/Opus, GPT-4o, o1) |
| `smart/agentic` | Rule-based scoring → tool-use-friendly tier table (auto-engaged when `tools[]` is present) |
| `smart/free` | Forces only free/local models |
| `<provider>/<model>` | Bypasses routing, dispatches directly |
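Switching profiles is just a different model string on the same endpoint. For example, reusing the client from the quick start:

```python
# Route through the eco profile: same rule-based scorer, cheapest tier table.
resp = client.chat.completions.create(
    model="smart/eco",
    messages=[{"role": "user", "content": "summarize this changelog in two sentences: ..."}],
)
```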
## Pin a specific model (no routing)
Pass a concrete model ID and the router leaves it alone:
```python
client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",  # always Sonnet
    messages=[...],
)

client.chat.completions.create(
    model="anthropic/claude-opus-4.7",  # always Opus
    messages=[...],
)

client.chat.completions.create(
    model="openai/gpt-4o",  # always GPT-4o
    messages=[...],
)
```
Models pre-wired in the bundled config: anthropic/claude-haiku-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.6, anthropic/claude-opus-4.7, openai/gpt-4o, openai/gpt-4o-mini, openai/o1, openai/o3, openai/o3-mini, openai/o4-mini, google/gemini-2.5-flash-lite, google/gemini-2.5-flash, google/gemini-2.5-pro, google/gemini-2.0-flash-lite-001, deepseek/deepseek-chat, deepseek/deepseek-r1, meta-llama/llama-3.3-70b-instruct. Add more by editing the `model_list` in your config YAML.
## Use with Claude Code

Claude Code respects `ANTHROPIC_BASE_URL`. Point it at the proxy:

```bash
export ANTHROPIC_BASE_URL=http://127.0.0.1:4000
export ANTHROPIC_AUTH_TOKEN=sk-anything  # the proxy's master key
claude
```
Then inside Claude Code: `/model anthropic/claude-opus-4.7` to pin Opus, or `/model smart/premium` to let the router pick the best Claude per request.
## How it works

- Client sends an OpenAI-, Anthropic-, or Gemini-format request to `localhost:4000`.
- LiteLLM Proxy parses it; `SmartRouterHook.async_pre_call_hook` intercepts (sketched below).
- If `model` is a `smart/*` profile, the rule-based router scores the prompt and picks a concrete upstream model ID.
- LiteLLM dispatches to the configured upstream — handling format conversion, streaming, tool calls, retries, etc.
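For a feel of the mechanism, here is a minimal sketch of a model-rewriting pre-call hook. The `async_pre_call_hook` signature follows LiteLLM's `CustomLogger` plugin interface; the scorer and tier lookup are made-up stand-ins, not this package's internals:

```python
from litellm.integrations.custom_logger import CustomLogger

# Made-up stand-ins for the package's real scorer and tier tables.
def score_prompt(messages: list) -> float:
    text = " ".join(m["content"] for m in messages if isinstance(m.get("content"), str))
    return 0.6 if "prove" in text.lower() else 0.0

def pick_model(score: float) -> str:
    return "deepseek/deepseek-r1" if score > 0.5 else "google/gemini-2.5-flash-lite"

class ModelRewriteHook(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # `data` is the parsed request body; rewriting `model` here changes
        # which upstream LiteLLM dispatches to.
        if data.get("model", "").startswith("smart/"):
            data["model"] = pick_model(score_prompt(data.get("messages", [])))
        return data
```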
## Routing internals

The classifier (`smart_llm_router/router/rules.py`) scores each prompt across 15 weighted dimensions:
| Dimension | Weight | Detects |
|---|---|---|
| reasoningMarkers | 0.18 | prove, theorem, step by step, 证明, теорема, ... |
| codePresence | 0.15 | ```, function, class, SELECT, 异步, ... |
| multiStepPatterns | 0.12 | "first ... then", "step 1", "1. " |
| technicalTerms | 0.10 | algorithm, architecture, kubernetes, ... |
| tokenCount | 0.08 | <50 tok ⇒ -1, >500 ⇒ +1 |
| creativeMarkers | 0.05 | "write a story/poem" |
| questionComplexity | 0.05 | count of ? |
| constraintCount | 0.04 | "must", "exactly", "at most" |
| agenticTask | 0.04 | "edit file", "deploy", "install", "verify" |
| imperativeVerbs | 0.03 | "implement", "build", "fix" |
| outputFormat | 0.03 | json, yaml, table, schema |
| referenceComplexity | 0.02 | "above", "below", "the docs" |
| domainSpecificity | 0.02 | quantum, fpga, homomorphic, ... |
| simpleIndicators | 0.02 | "what is", "hello" → negative |
| negationComplexity | 0.01 | "not", "without", "except" |
Keyword sets are multilingual — EN + ZH + JA + RU + DE + ES + PT + KO + AR — so the same scorer works across 9 languages without translation.
The score maps to a tier through three boundaries:
```
score < 0.0  → SIMPLE
0.0 – 0.3    → MEDIUM
0.3 – 0.5    → COMPLEX
score > 0.5  → REASONING
```
Plus three hard overrides: 2+ reasoning keywords ⇒ force REASONING; >100k tokens ⇒ force COMPLEX; system prompt mentioning json/schema ⇒ floor at MEDIUM.
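Put together, the classifier is conceptually a weighted keyword scorer with a tier lookup and the overrides applied last. A minimal sketch under those assumptions (weights from the table above; keyword lists heavily abridged; names do not match the package's internals):

```python
# Abridged keyword sets for three of the dimensions above (the real sets are
# larger and multilingual, e.g. 证明 alongside "prove").
DIMS = {
    "reasoningMarkers": (0.18, ("prove", "theorem", "step by step", "证明")),
    "codePresence":     (0.15, ("```", "function", "class ", "select ")),
    "simpleIndicators": (0.02, ("what is", "hello")),
}

def classify(prompt: str) -> str:
    text = prompt.lower()
    score, reasoning_hits = 0.0, 0
    for dim, (weight, keywords) in DIMS.items():
        hits = sum(kw in text for kw in keywords)
        raw = min(hits, 2) / 2          # clamp the raw dimension score to [0, 1]
        if dim == "simpleIndicators":
            raw = -raw                  # simple indicators push the score down
        if dim == "reasoningMarkers":
            reasoning_hits = hits
        score += weight * raw
    if reasoning_hits >= 2:             # hard override: 2+ reasoning keywords
        return "REASONING"
    if score < 0.0:
        return "SIMPLE"
    if score <= 0.3:
        return "MEDIUM"
    if score <= 0.5:
        return "COMPLEX"
    return "REASONING"

print(classify("prove that sqrt(2) is irrational step by step"))  # REASONING (override)
print(classify("what is the capital of france"))                  # SIMPLE
```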
## Attribution

The 15-dimension rule-based router in `smart_llm_router/router/` is ported from ClawRouter (MIT). Format conversion and streaming come from LiteLLM (MIT).
## License
MIT
## Download files
### File details: smart_llm_router-0.1.2.tar.gz

- Size: 31.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

| Algorithm | Hash digest |
|---|---|
| SHA256 | `cdfc88c2cdb1a22bf0caa167e0c7660ba7878d685f756ebfe4b694c78b8b2f0e` |
| MD5 | `e4de8037bf92e9dd5db6bb0892c23a43` |
| BLAKE2b-256 | `851fc7c6a99ea8a3dbfb034af3ea823df3b7cabbcb559221e15c94c090a2b5f2` |
### File details: smart_llm_router-0.1.2-py3-none-any.whl

- Size: 29.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3e5461c723a282e9a502dfa8962e21e636488b3b76059ea1bc0be21b9894dac2` |
| MD5 | `6fac359432521fb881199f3ed1bcc7da` |
| BLAKE2b-256 | `3fa3bd8e74956367bdf73bccb79fe273d7e425d2696826caeda68e97780805b6` |