# npc-mom-router

Plug-and-play Mixture-of-Models router. Route easy requests to cheap models, hard ones to specialists, and track real cost savings.
## Why Mixture-of-Models routing?
Most LLM workloads are not uniformly hard. Simple lookups, format conversions, and short factual questions can be answered accurately by a small, fast model at a fraction of the cost of a frontier model. Routing requests intelligently based on complexity lets you serve the same quality of answers at significantly lower cost—without changing your application's interface.
npc-mom-router sits between your application and your model backends. It classifies each incoming request as fast or heavy, dispatches to the appropriate backend, and records the token usage and dollar cost of every call. The ledger computes how much you saved compared to always routing to the heavy backend, so you can quantify the benefit in real dollars.
## Install

```bash
pip install npc-mom-router
```
## 30-second quickstart
```python
from npc_mom_router import MoMClient, BackendConfig, ZeroShotRouter

router = ZeroShotRouter(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_KEY",
    model="llama-3.1-8b-instant",
)

client = MoMClient(
    router=router,
    backends={
        "fast": BackendConfig(
            kind="oai_compat",
            base_url="https://api.groq.com/openai/v1",
            api_key="YOUR_GROQ_KEY",
            model="llama-3.3-70b-versatile",
            cost_per_1m_input=0.59,
            cost_per_1m_output=0.79,
        ),
        "heavy": BackendConfig(
            kind="anthropic",
            api_key="YOUR_ANTHROPIC_KEY",
            model="claude-sonnet-4-5",
            cost_per_1m_input=3.0,
            cost_per_1m_output=15.0,
        ),
    },
)

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
)

print(f"Route: {resp._mom.route} ({resp._mom.reason})")
print(f"Answer: {resp.choices[0].message.content}")
print(f"Cost: ${resp._mom.cost_usd:.6f}")
```
## NPC Fast router (local vLLM)

Run a tiny routing model locally for sub-10ms, zero-marginal-cost classification:
```python
from npc_mom_router import MoMClient, BackendConfig, NPCFastRouter

router = NPCFastRouter(
    base_url="http://localhost:8001/v1",
    model="npc-fast-1.7b",
)

client = MoMClient(
    router=router,
    backends={
        "fast": BackendConfig(
            kind="vllm",
            base_url="http://localhost:8000/v1",
            api_key="placeholder",
            model="Qwen/Qwen2.5-7B-Instruct",
            cost_per_1m_input=0.05,
            cost_per_1m_output=0.10,
        ),
        "heavy": BackendConfig(
            kind="openai",
            api_key="YOUR_OPENAI_KEY",
            model="gpt-4o",
            cost_per_1m_input=2.50,
            cost_per_1m_output=10.00,
        ),
    },
)

result = client.route_and_complete(
    [{"role": "user", "content": "Explain the transformer architecture in depth."}]
)
print(result.decision.route, result.cost_entry.usd)
```
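This example assumes a vLLM server is already serving the routing model on port 8001 (and the fast backend on port 8000). Assuming `npc-fast-1.7b` resolves to a checkpoint vLLM can load (the actual repository id isn't specified here), launching the router model might look like:

```bash
vllm serve npc-fast-1.7b --port 8001
```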
## Async client
```python
import asyncio

from npc_mom_router import AsyncMoMClient, BackendConfig, ZeroShotRouter

# ... same setup as above, just use AsyncMoMClient ...

async def main():
    resp = await client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "List the G7 countries."}],
    )
    print(resp._mom.route, resp._mom.cost_usd)

asyncio.run(main())
```
## Cost tracking
Every request is logged to an in-memory ledger. The ledger re-prices fast-routed requests as if they had hit the heavy backend to compute counterfactual savings.
```python
s = client.ledger.summary()
# {
#     "total_requests": 100,
#     "fast_requests": 73,
#     "heavy_requests": 27,
#     "total_cost_usd": 0.0412,
#     "savings_vs_always_heavy_usd": 0.3891
# }

client.ledger.dump("ledger.json")  # writes full per-request JSON
```
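The savings figure is a simple counterfactual re-pricing. Here is a minimal sketch of the math, assuming the ledger stores per-request token counts and the route taken (the entry and field names below are illustrative, not the library's API):

```python
# Illustrative only: entry/field names are assumptions, not npc-mom-router's API.
def counterfactual_savings(entries, fast_cfg, heavy_cfg):
    def price(cfg, tokens_in, tokens_out):
        # Dollar cost derived from the per-1M-token rates on BackendConfig.
        return (tokens_in * cfg.cost_per_1m_input
                + tokens_out * cfg.cost_per_1m_output) / 1_000_000

    savings = 0.0
    for e in entries:
        if e.route == "fast":
            # What this request would have cost on the heavy backend,
            # minus what it actually cost on the fast one.
            savings += (price(heavy_cfg, e.tokens_in, e.tokens_out)
                        - price(fast_cfg, e.tokens_in, e.tokens_out))
    return savings
```

Note that this treats the fast route's output token count as a stand-in for what the heavy model would have generated, so the savings figure is an estimate.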
## Backend reference
| `kind` | Description | Default base URL |
|---|---|---|
| `oai_compat` | Any OpenAI-compatible API | Required |
| `openai` | OpenAI (api.openai.com) | `https://api.openai.com/v1` |
| `anthropic` | Anthropic (native SDK) | `https://api.anthropic.com` |
| `groq` | Groq (OAI-compat) | `https://api.groq.com/openai/v1` |
| `vllm` | Local vLLM server (OAI-compat) | `http://localhost:8000/v1` |
Each `BackendConfig` takes `cost_per_1m_input` and `cost_per_1m_output` (USD per million tokens) for cost tracking.
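For example, with the heavy backend above (`cost_per_1m_input=3.0`, `cost_per_1m_output=15.0`), a call that uses 1,200 input tokens and 300 output tokens is billed 1,200/1,000,000 × $3.00 + 300/1,000,000 × $15.00 = $0.0036 + $0.0045 = $0.0081.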
## Router reference
| Router | How it works |
|---|---|
| `ZeroShotRouter` | Prompts any OAI-compat model; parses JSON response |
| `NPCFastRouter` | Calls a local vLLM endpoint; sub-10ms routing |
Both routers return `RoutingDecision(route="fast"|"heavy", reason="...")`. On any failure or malformed response, they fall back to `heavy` to preserve correctness.

Custom routers: implement `route(messages) -> RoutingDecision` and `async_route(messages) -> RoutingDecision`, as in the sketch below.
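For instance, a toy length-based router might look like this (a sketch only; the `RoutingDecision` import path and keyword arguments are assumed from the signature shown above):

```python
from npc_mom_router import RoutingDecision  # import path assumed

class LengthRouter:
    """Routes on prompt length: a crude heuristic, purely illustrative."""

    def __init__(self, threshold_chars: int = 200):
        self.threshold = threshold_chars

    def route(self, messages) -> RoutingDecision:
        total = sum(len(m.get("content", "")) for m in messages)
        if total <= self.threshold:
            return RoutingDecision(route="fast", reason=f"{total} chars <= {self.threshold}")
        return RoutingDecision(route="heavy", reason=f"{total} chars > {self.threshold}")

    async def async_route(self, messages) -> RoutingDecision:
        return self.route(messages)
```

You would then pass an instance as `router=LengthRouter()` when constructing `MoMClient`.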
## Roadmap
Support for streaming responses, per-model latency tracking, a pluggable cost-model registry, and a simple CLI dashboard are planned for v0.2. Pull requests welcome.
## License
Apache 2.0 — see LICENSE. Copyright 2026 Rama Krishna Bachu.