litellm-wzrd-momentum

Your LiteLLM config, but smarter. Routes to the model the ecosystem is converging on — not the one you hardcoded six months ago.

pip install litellm-wzrd-momentum
from litellm import Router
from wzrd_momentum_strategy import register

router = Router(model_list=[...])  # your existing config
register(router)                    # done — routing is now dynamic

That's it. Your router.completion() calls now prefer models with accelerating real-world adoption. If WZRD is unreachable, your existing config takes over. Nothing breaks.

Why

You hardcode "gpt-4o" or "claude-sonnet-4-20250514". A new model surges past both on HuggingFace and OpenRouter. You don't notice for weeks. Meanwhile, the new model is 3x cheaper and faster for your workload.

This plugin tracks which models are trending RIGHT NOW across HuggingFace downloads, GitHub stars, OpenRouter routing volume, and ArtificialAnalysis benchmarks. It re-ranks your existing model list every 5 minutes. No new models injected — just smarter ordering of what you already configured.
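The 5-minute re-rank cadence can be sketched as a small TTL cache. This is an illustrative stand-in using only the standard library, not the plugin's internals; `fetch` here takes the place of the real HTTP call, and `cache_ttl` mirrors the documented default of 300 seconds.

```python
import time

class SignalCache:
    """Cache a fetched value, refreshing only after cache_ttl elapses."""

    def __init__(self, fetch, cache_ttl=300):
        self.fetch = fetch
        self.cache_ttl = cache_ttl
        self._value = None
        self._stamp = float("-inf")  # force a fetch on first access

    def get(self):
        now = time.monotonic()
        if now - self._stamp > self.cache_ttl:
            self._value = self.fetch()
            self._stamp = now
        return self._value

# Usage: two back-to-back reads trigger only one fetch.
calls = []
cache = SignalCache(lambda: calls.append(1) or {"qwen-9b": "surging"})
first = cache.get()
second = cache.get()
print(len(calls))  # 1
```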

Full example

from litellm import Router
from wzrd_momentum_strategy import register

router = Router(model_list=[
    {"model_name": "qwen-9b",  "litellm_params": {"model": "openrouter/qwen/qwen-3.5-9b"}},
    {"model_name": "qwen-35b", "litellm_params": {"model": "openrouter/qwen/qwen-3.5-35b-a3b"}},
    {"model_name": "llama-70b","litellm_params": {"model": "openrouter/meta-llama/llama-3.3-70b-instruct"}},
])

register(router, alias_map={
    "qwen-9b":  ["Qwen/Qwen3.5-9B"],
    "qwen-35b": ["Qwen/Qwen3.5-35B-A3B"],
    "llama-70b": ["meta-llama/Llama-3.3-70B-Instruct"],
})

# Sync
response = router.completion(model="qwen-9b", messages=[{"role": "user", "content": "Hello"}])

# Async
# response = await router.acompletion(model="qwen-9b", messages=[...])

How it works

  1. On each routing decision, fetches WZRD momentum signals (cached 5 min)
  2. Scores each deployment as trend score + 0.3 × momentum + 0.25 × delta, scaled by the confidence weight
  3. Returns the highest-scoring deployment to LiteLLM
  4. LiteLLM handles retries, fallbacks, and provider errors as normal

If WZRD is unreachable, returns the first deployment. Your inference pipeline never breaks.
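The steps above, including the fallback, can be sketched as follows. `fetch_signals` and `score` stand in for the plugin's internals and are not its real API; the shape of `pick_deployment` is an assumption for illustration.

```python
def pick_deployment(deployments, fetch_signals, score):
    """Return the highest-scoring deployment, or the first one if signals fail."""
    try:
        signals = fetch_signals()
    except Exception:
        return deployments[0]  # WZRD unreachable: keep configured order
    return max(deployments, key=lambda d: score(d, signals))

deployments = ["qwen-9b", "qwen-35b", "llama-70b"]
signals = {"qwen-9b": -1.0, "qwen-35b": 2.0, "llama-70b": 0.0}

def broken_fetch():
    raise ConnectionError("WZRD unreachable")

best = pick_deployment(deployments, lambda: signals, lambda d, s: s.get(d, 0.0))
fallback = pick_deployment(deployments, broken_fetch, None)
print(best, fallback)  # qwen-35b qwen-9b
```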

Behavior defaults

  • cache_ttl=300 seconds (5 minutes)
  • confidence policy:
    • normal: full signal weight (eligible for proactive routing)
    • low: half signal weight (observe-first posture)
    • insufficient: zero signal weight (observe-only; no proactive push)
  • fallback policy: if WZRD is down or payload contract drifts, route by deployment order (first candidate)
  • contract guard: requires contract_version (or legacy signal_version) and model-level fields (model, trend, score, confidence)
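The contract guard bullet can be sketched as a shape check. The field names come from the bullet above; the top-level "models" key is an assumption about the payload layout, and `payload_ok` is illustrative, not the package's API.

```python
REQUIRED_MODEL_FIELDS = {"model", "trend", "score", "confidence"}

def payload_ok(payload: dict) -> bool:
    """Reject payloads missing a version field or any required model field."""
    versioned = "contract_version" in payload or "signal_version" in payload
    models = payload.get("models")
    if not versioned or not isinstance(models, list):
        return False
    return all(REQUIRED_MODEL_FIELDS <= set(m) for m in models)

good = {"contract_version": 2, "models": [
    {"model": "Qwen/Qwen3.5-9B", "trend": "stable", "score": 0.0, "confidence": "normal"},
]}
drifted = {"models": [{"model": "Qwen/Qwen3.5-9B"}]}  # no version, missing fields
print(payload_ok(good), payload_ok(drifted))  # True False
```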

Score table

Trend          Score   Signal
surging        +3.0    Downloads/stars growing >50% day-over-day
accelerating   +2.0    Growing 10-50% day-over-day
stable          0.0    Flat or <10% growth
decelerating   -1.0    Slowing 5-30% day-over-day
cooling        -2.0    Dropping >30% day-over-day

Confidence scaling: normal = full weight, low = 50%, insufficient = 0% (new models with <3 days of data).
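The table and the confidence scaling rule, expressed as code. The numbers mirror the table above; `momentum_score` itself is an illustrative helper, not the package's API.

```python
TREND_SCORES = {
    "surging": 3.0,
    "accelerating": 2.0,
    "stable": 0.0,
    "decelerating": -1.0,
    "cooling": -2.0,
}
CONFIDENCE_WEIGHTS = {"normal": 1.0, "low": 0.5, "insufficient": 0.0}

def momentum_score(trend: str, confidence: str) -> float:
    """Trend score scaled by confidence weight; unknown values score 0."""
    return TREND_SCORES.get(trend, 0.0) * CONFIDENCE_WEIGHTS.get(confidence, 0.0)

print(momentum_score("surging", "normal"))        # 3.0
print(momentum_score("accelerating", "low"))      # 1.0
print(momentum_score("cooling", "insufficient"))  # 0.0 (<3 days of data)
```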

Alias mapping

WZRD tracks models by HuggingFace/GitHub name (Qwen/Qwen3.5-9B). LiteLLM uses provider-specific names (openrouter/qwen/qwen-3.5-9b).

The alias_map bridges them explicitly. Without it, the strategy auto-matches by extracting slugs from litellm_params.model — works for most cases, but explicit mapping is more reliable.

register(router, alias_map={
    "qwen-9b": ["Qwen/Qwen3.5-9B", "Qwen/Qwen3-9B"],  # multiple variants
    "llama-70b": ["meta-llama/Llama-3.3-70B-Instruct"],
})
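The automatic fallback matching can be sketched as slug normalization: reduce both the provider-specific string and the HuggingFace name to a common form and compare. The exact rule here (take the last path segment, lowercase, drop separators) is an assumption about how the strategy matches, not its exact implementation.

```python
def slug(name: str) -> str:
    """Normalize a model identifier for cross-naming comparison."""
    tail = name.rsplit("/", 1)[-1]  # drop provider/org prefixes
    return "".join(ch for ch in tail.lower() if ch.isalnum())

print(slug("openrouter/qwen/qwen-3.5-9b"))  # qwen359b
print(slug("Qwen/Qwen3.5-9B"))              # qwen359b
print(slug("openrouter/meta-llama/llama-3.3-70b-instruct")
      == slug("meta-llama/Llama-3.3-70B-Instruct"))  # True
```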

Proxy integration

LiteLLM's proxy doesn't support custom strategies via YAML config. For proxy deployments, create a wrapper script:

# wzrd_proxy.py
from litellm import Router
from wzrd_momentum_strategy import register

# Your normal proxy config
router = Router(model_list=[...])
register(router, alias_map={...})

# Expose the proxy app with the patched router, then serve it with your
# usual ASGI runner (for example: uvicorn wzrd_proxy:app)
from litellm.proxy.proxy_server import app

Or use the pre-router pattern from integrations/litellm-wzrd-router/ which works as middleware before any LiteLLM call (SDK or proxy).

Manual setup

If you prefer explicit control instead of the register() convenience helper:

from wzrd_momentum_strategy import WZRDMomentumStrategy

strategy = WZRDMomentumStrategy(
    router,
    wzrd_url="https://api.twzrd.xyz/v1/signals/momentum",
    alias_map={"qwen-9b": ["Qwen/Qwen3.5-9B"]},
    cache_ttl=300,
)
router.set_custom_routing_strategy(strategy)

API

The momentum data comes from a public, free, no-auth endpoint:

GET https://api.twzrd.xyz/v1/signals/momentum
GET https://api.twzrd.xyz/v1/signals/momentum?platform=huggingface&trending=true

Returns trend classification, score, confidence, action, capabilities, and platform for 48+ tracked AI models.
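A client might filter trending models from a response like the one below. The payload shape is an assumption pieced together from the fields listed above (trend, score, confidence, ...), not a verbatim response from the endpoint.

```python
import json

raw = """{
  "contract_version": 2,
  "models": [
    {"model": "Qwen/Qwen3.5-9B", "trend": "decelerating", "score": -1.0, "confidence": "normal"},
    {"model": "meta-llama/Llama-3.3-70B-Instruct", "trend": "surging", "score": 3.0, "confidence": "normal"}
  ]
}"""

payload = json.loads(raw)
trending = [m["model"] for m in payload["models"]
            if m["trend"] in ("surging", "accelerating")]
print(trending)  # ['meta-llama/Llama-3.3-70B-Instruct']
```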

Expected behavior

For a candidate set like qwen-9b, nemotron-120b, llama-70b, expected behavior is:

  • route to nemotron-120b when it is surging
  • deprioritize qwen-9b when decelerating
  • deprioritize llama-70b when cooling

The exact winner changes as momentum updates, but routing should follow trend and confidence consistently.

v0.1.0 release notes

  • Added LiteLLM CustomRoutingStrategyBase plugin with one-line registration helper
  • Added trend + momentum + delta scoring with confidence weighting
  • Added explicit alias map matching and automatic fallback matching from provider model slugs
  • Added contract guard for WZRD payload shape (signal_version + required model fields)
  • Added graceful degradation fallback to first deployment when WZRD is unavailable
  • Added test suite coverage for scoring order, confidence behavior, matching paths, async routing, caching behavior, register helper, and payload contract guard

License

MIT

Download files

Source Distribution
  litellm_wzrd_momentum-0.2.2.tar.gz (16.5 kB)

Built Distribution
  litellm_wzrd_momentum-0.2.2-py3-none-any.whl (12.2 kB)

File hashes

litellm_wzrd_momentum-0.2.2.tar.gz (Source, 16.5 kB, uploaded via twine/6.2.0 CPython/3.10.12)
  SHA256       8bbe7ed07041a0265adfb75d8a1773f3a163a54facd3feeec45348304b6564b1
  MD5          d3b182926802f75a4b0f19dca9a881d8
  BLAKE2b-256  4f179cc8ed06166536b8411567a3fbabc5b5645c8526a0496b9b97f309dde302

litellm_wzrd_momentum-0.2.2-py3-none-any.whl (Python 3, 12.2 kB)
  SHA256       ff74ca7c2552ba757f60cb2fe17d50e0d09c2a548d972330d76a3125237855a9
  MD5          97d9cd260f2040e08a0d02fdf7461ae1
  BLAKE2b-256  0f015a94f88800f7abac05e8d02f77b5f76a3a9b841e2520c2ad73c467b757e4
