
smart-llm-router

Provider-agnostic LLM router. Pick the cheapest capable model per prompt with rule-based scoring. Wraps LiteLLM for format conversion, streaming, tool calls, and 100+ provider integrations.

Why

Most LLM proxies route based on a model name you pick up front. This one picks the model for you — locally, in <1ms, with zero ML — by scoring the prompt across 14 dimensions (code presence, reasoning markers, multi-step patterns, multilingual keywords, etc.) and mapping the score to one of four tiers (SIMPLE / MEDIUM / COMPLEX / REASONING).

You bring an upstream (OpenRouter, Together, Fireworks, Groq, Anthropic direct, vLLM, Ollama — anything OpenAI-compatible). It does the rest.

Install

pip install smart-llm-router

Two console scripts ship with the package: smart-llm-router (full name) and slr (short alias).

Quick start with OpenRouter (default upstream)

# 1. Get an OpenRouter key at https://openrouter.ai/keys
export OPENROUTER_API_KEY=sk-or-v1-...
export LITELLM_MASTER_KEY=sk-anything    # gates the proxy itself

# 2. Start the proxy on :4000 (uses bundled OpenRouter config by default)
smart-llm-router start

In another terminal — any OpenAI-compatible client works:

from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-anything")

# Smart routing — rule-based scorer picks the cheapest capable model
resp = client.chat.completions.create(
    model="smart/auto",
    messages=[{"role": "user", "content": "prove that sqrt(2) is irrational step by step"}],
)
# → routed to REASONING tier (e.g. deepseek/deepseek-r1)

Or curl:

curl http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-anything" \
  -H "Content-Type: application/json" \
  -d '{"model":"smart/auto","messages":[{"role":"user","content":"hi"}]}'

Inspect routing without dispatching

slr test "what is the capital of france"
# → SIMPLE / google/gemini-2.5-flash-lite / 100% savings vs claude-sonnet-4.6

slr test "Prove that sqrt(2) is irrational step by step"
# → REASONING / deepseek/deepseek-r1 / 90% savings

slr test "design a high-availability microservices architecture" --profile premium
# → COMPLEX / anthropic/claude-opus-4.7

slr models --profile auto    # show the tier→model table

Pointing at a different upstream

The bundled config targets OpenRouter, but anything OpenAI-compatible works (Together, Fireworks, Groq, DeepInfra, vLLM, Ollama, OpenAI direct). Copy the bundled YAML and edit api_base / api_key:

# Copy the bundled config to your working directory
python -c "from importlib.resources import files; import shutil; shutil.copy(files('smart_llm_router') / 'default_config.yaml', './smart-llm-router.yaml')"

# Edit smart-llm-router.yaml — swap api_base / api_key per model_list entry
# Then start with --config
smart-llm-router start --config smart-llm-router.yaml
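
For orientation, a model_list entry in LiteLLM's config format looks roughly like the following; the Together endpoint and environment variable are illustrative stand-ins, not what ships in the bundled file:

model_list:
  - model_name: deepseek/deepseek-chat            # the ID clients request
    litellm_params:
      model: openai/deepseek-chat                 # openai/ prefix = any OpenAI-compatible upstream
      api_base: https://api.together.xyz/v1       # swap per provider (illustrative)
      api_key: os.environ/TOGETHER_API_KEY        # LiteLLM reads this from the environment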

Available routing profiles

Model value          Behavior
smart/auto           Rule-based scoring → cheapest capable model
smart/eco            Rule-based scoring → cheapest tier table (free + lite models)
smart/premium        Rule-based scoring → quality-first tier table (Claude Sonnet/Opus, GPT-4o, o1)
smart/agentic        Rule-based scoring → tool-use-friendly tier table (auto-engaged when tools[] present)
smart/free           Forces only free/local models
<provider>/<model>   Bypasses routing, dispatches directly
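
Profiles are chosen per request through the model field. With the client from the quick start (the prompts below are just illustrative):

# Eco: same scorer, tiers resolve to the cheapest (free + lite) models
client.chat.completions.create(
    model="smart/eco",
    messages=[{"role": "user", "content": "summarize this changelog in one line"}],
)

# Premium: same scorer, quality-first tier table
client.chat.completions.create(
    model="smart/premium",
    messages=[{"role": "user", "content": "design a sharding strategy for a multi-region database"}],
)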

Pin a specific model (no routing)

Pass a concrete model ID and the router leaves it alone:

client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",   # always Sonnet
    messages=[...]
)

client.chat.completions.create(
    model="anthropic/claude-opus-4.7",     # always Opus
    messages=[...]
)

client.chat.completions.create(
    model="openai/gpt-4o",                 # always GPT-4o
    messages=[...]
)

Models pre-wired in the bundled config: anthropic/claude-haiku-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.6, anthropic/claude-opus-4.7, openai/gpt-4o, openai/gpt-4o-mini, openai/o1, openai/o3, openai/o3-mini, openai/o4-mini, google/gemini-2.5-flash-lite, google/gemini-2.5-flash, google/gemini-2.5-pro, google/gemini-2.0-flash-lite-001, deepseek/deepseek-chat, deepseek/deepseek-r1, meta-llama/llama-3.3-70b-instruct. Add more by editing the model_list in your config YAML.

Use with Claude Code

Claude Code respects ANTHROPIC_BASE_URL. Point it at the proxy:

export ANTHROPIC_BASE_URL=http://127.0.0.1:4000
export ANTHROPIC_AUTH_TOKEN=sk-anything   # the proxy's master key
claude

Then inside Claude Code: /model anthropic/claude-opus-4.7 to pin Opus, or /model smart/premium to let the router pick the best Claude per request.

How it works

  1. Client sends OpenAI/Anthropic/Gemini-format request to localhost:4000.
  2. LiteLLM Proxy parses; SmartRouterHook.async_pre_call_hook intercepts.
  3. If model is a smart/* profile, the rule-based router scores the prompt and picks a concrete upstream model ID.
  4. LiteLLM dispatches to the configured upstream — handling format conversion, streaming, tool calls, retries, etc.
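
A minimal sketch of what steps 2 and 3 look like in code, assuming LiteLLM's CustomLogger pre-call interface; the class body and tier table below are illustrative, not the package's actual implementation:

from litellm.integrations.custom_logger import CustomLogger

TIER_TABLE = {                                   # illustrative tier→model map
    "SIMPLE": "google/gemini-2.5-flash-lite",
    "REASONING": "deepseek/deepseek-r1",
}

class SmartRouterHookSketch(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        model = data.get("model", "")
        if not model.startswith("smart/"):
            return data                          # concrete model ID: pass through untouched
        text = " ".join(
            m["content"] for m in data.get("messages", [])
            if isinstance(m.get("content"), str)
        )
        # Stand-in for the 14-dimension scorer described under "Routing internals"
        tier = "REASONING" if "prove" in text.lower() else "SIMPLE"
        data["model"] = TIER_TABLE[tier]         # rewrite the model before dispatch
        return data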

Routing internals

The classifier (smart_llm_router/router/rules.py) scores each prompt across 14 weighted dimensions:

Dimension            Weight  Detects
reasoningMarkers     0.18    prove, theorem, step by step, 证明, теорема, ...
codePresence         0.15    ```, function, class, SELECT, 异步, ...
multiStepPatterns    0.12    "first ... then", "step 1", "1. "
technicalTerms       0.10    algorithm, architecture, kubernetes, ...
tokenCount           0.08    <50 tok ⇒ -1, >500 ⇒ +1
creativeMarkers      0.05    "write a story/poem"
questionComplexity   0.05    count of ?
constraintCount      0.04    "must", "exactly", "at most"
agenticTask          0.04    "edit file", "deploy", "install", "verify"
imperativeVerbs      0.03    "implement", "build", "fix"
outputFormat         0.03    json, yaml, table, schema
referenceComplexity  0.02    "above", "below", "the docs"
domainSpecificity    0.02    quantum, fpga, homomorphic, ...
simpleIndicators     0.02    "what is", "hello" → negative
negationComplexity   0.01    "not", "without", "except"

Keyword sets are multilingual — EN + ZH + JA + RU + DE + ES + PT + KO + AR — so the same scorer works across 9 languages without translation.
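
For example, a Russian prompt should trip the same reasoningMarkers dimension as its English counterpart; the expected result below is illustrative, shown in the format of the slr test examples above:

slr test "докажите теорему: корень из 2 иррационален"
# expected → REASONING (теорема hits the same keyword set as "theorem")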

The score maps to a tier through three boundaries:

< 0.0     → SIMPLE
0.0-0.3   → MEDIUM
0.3-0.5   → COMPLEX
> 0.5     → REASONING

Plus three hard overrides: 2+ reasoning keywords ⇒ force REASONING; >100k tokens ⇒ force COMPLEX; system prompt mentioning json/schema ⇒ floor at MEDIUM.
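
Putting the table, the boundaries, and the overrides together, here is a condensed illustrative sketch of the classifier's shape; the weights come from the table above, the keyword lists are heavily truncated, and the boundary inclusivity and function name are guesses rather than the real rules.py API:

import re

WEIGHTS = {                     # weights from the dimension table (top few only)
    "reasoningMarkers": 0.18,
    "codePresence": 0.15,
    "multiStepPatterns": 0.12,
    "simpleIndicators": 0.02,   # fires negative
}
REASONING_KW = ("prove", "theorem", "step by step", "证明", "теорема")

def classify(prompt: str) -> str:
    p = prompt.lower()
    reasoning_hits = sum(kw in p for kw in REASONING_KW)
    signals = {
        "reasoningMarkers": int(reasoning_hits > 0),
        "codePresence": int("```" in prompt or bool(re.search(r"\b(function|class|select)\b", p))),
        "multiStepPatterns": int(bool(re.search(r"step\s*\d|first\b.*\bthen\b", p))),
        "simpleIndicators": -int(p.startswith("what is") or "hello" in p),
    }
    score = sum(WEIGHTS[d] * signals[d] for d in WEIGHTS)
    if reasoning_hits >= 2:     # hard override from the text above
        return "REASONING"
    if score < 0.0:
        return "SIMPLE"
    if score <= 0.3:
        return "MEDIUM"
    if score <= 0.5:
        return "COMPLEX"
    return "REASONING"

# classify("what is the capital of france")                 → "SIMPLE"
# classify("prove that sqrt(2) is irrational step by step") → "REASONING"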

Attribution

The 14-dimension rule-based router in smart_llm_router/router/ is ported from ClawRouter (MIT). Format conversion and streaming come from LiteLLM (MIT).

License

MIT
