nim-model-router

OpenAI-compatible proxy that routes requests to the best NVIDIA NIM model by task

Maintainers

cobusgreyling

NIM Model Router

NVIDIA's NIM catalog lists more than 100 models, and picking the right one for each request is tedious. This router sits in front of the NIM API and selects a model automatically based on the kind of request: fast chat, agentic tool use, deep reasoning, coding, embeddings, reranking, and more.

Drop it into any OpenAI SDK client by changing base_url. No other code changes required.

Quick start

git clone https://github.com/cobusgreyling/nim-model-router.git
cd nim-model-router
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

cp .env.example .env
# Edit .env and set NVIDIA_API_KEY

nim-router serve

The proxy listens on http://127.0.0.1:8080. API docs: http://127.0.0.1:8080/docs

Docker

cp .env.example .env  # set NVIDIA_API_KEY
docker compose up --build

Usage

Auto-routing (recommended)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="local",  # not used — router injects NVIDIA_API_KEY upstream
)

response = client.chat.completions.create(
    model="nim-router/auto",
    messages=[{"role": "user", "content": "Build a Python agent with tool calling"}],
)
print(response.choices[0].message.content)

Explicit task aliases

Model alias	Task	Default NIM model
`nim-router/auto`	Classify automatically	—
`nim-router/fast`	Short Q&A, classification	`meta/llama-3.1-8b-instruct`
`nim-router/general`	Ambiguous general chat	`nvidia/nemotron-3-nano-30b-a3b`
`nim-router/agentic`	Tool use, agents	`nvidia/nemotron-3-super-120b-a12b`
`nim-router/reasoning`	Deep analysis	`nvidia/nemotron-3-ultra-550b-a55b`
`nim-router/long-context`	Large documents	`nvidia/nemotron-3-super-120b-a12b`
`nim-router/coding`	Code generation	`nvidia/llama-3.3-nemotron-super-49b-v1.5`
`nim-router/embedding`	Embeddings	`nvidia/llama-nemotron-embed-1b-v2`
`nim-router/rerank`	Reranking	`nvidia/llama-nemotron-rerank-1b-v2`

Force a task via header

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-NIM-Task: reasoning" \
  -d '{
    "model": "nim-router/auto",
    "messages": [{"role": "user", "content": "Analyze the root cause step by step"}]
  }'
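The same request can be built from Python. The following is a minimal sketch using only the standard library; `build_chat_request` is a hypothetical helper name, and the URL assumes the default bind address shown in this README. It constructs the request without sending it, so it can be inspected without a running router.

```python
import json
import urllib.request

# Default proxy address from this README; adjust if ROUTER_HOST/ROUTER_PORT differ.
ROUTER_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_chat_request(prompt, task=None, model="nim-router/auto"):
    """Build (but do not send) a chat completion request for the router."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if task is not None:
        # X-NIM-Task forces the task instead of relying on the classifier.
        headers["X-NIM-Task"] = task
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("Analyze the root cause step by step", task="reasoning")

# Sending requires a running router:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

With the OpenAI SDK, the header can be passed per request through the `extra_headers` argument of `client.chat.completions.create`, for example `extra_headers={"X-NIM-Task": "reasoning"}`.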

Rerank

curl http://127.0.0.1:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nim-router/rerank",
    "query": "What is NVIDIA NIM?",
    "documents": ["NIM provides optimized inference...", "Unrelated text"],
    "top_n": 2
  }'

Passthrough to a specific NIM model

If you pass a concrete NIM model ID (e.g. meta/llama-3.1-70b-instruct), the router forwards it unchanged.
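To make the three cases for the `model` field concrete (auto, explicit alias, passthrough), here is a small sketch of how the field could be interpreted. It is not the router's actual code: `resolve_model_field` is a hypothetical name, and rejecting unknown `nim-router/*` aliases is an assumption.

```python
ALIAS_PREFIX = "nim-router/"

# Task aliases taken from the alias table in this README.
KNOWN_TASKS = {
    "fast", "general", "agentic", "reasoning",
    "long-context", "coding", "embedding", "rerank",
}


def resolve_model_field(model):
    """Return (mode, value) where mode is "auto", "task" or "passthrough"."""
    if model == ALIAS_PREFIX + "auto":
        return ("auto", None)
    if model.startswith(ALIAS_PREFIX):
        task = model[len(ALIAS_PREFIX):]
        if task not in KNOWN_TASKS:
            # Assumption: unknown aliases are an error rather than a passthrough.
            raise ValueError(f"unknown router alias: {model}")
        return ("task", task)
    # Anything else is treated as a concrete NIM model ID and forwarded unchanged.
    return ("passthrough", model)
```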

How routing works

flowchart TD
    A[Client request] --> B{Model alias?}
    B -->|nim-router/*| C[Resolve task]
    B -->|concrete NIM ID| D[Passthrough]
    B -->|nim-router/auto| E[Classifier]
    A --> F{X-NIM-Task header?}
    F -->|set| C
    E --> G{Signals}
    G -->|tools| H[agentic]
    G -->|large prompt| I[long_context]
    G -->|keywords| J[coding / reasoning / rerank]
    G -->|short prompt| K[fast]
    G -->|ambiguous| L[general]
    H & I & J & K & L --> M
    D --> O
    C --> M[Apply policies]
    M --> N[Pick model + fallbacks]
    N --> O[Proxy to NIM API]
    O -->|5xx| P[Try fallback model]

Classifier signals:

Tools present → agentic
Large prompt (more than 12k tokens, estimated with tiktoken) → long_context
Reasoning keywords → reasoning
Coding keywords → coding
Rerank keywords, or a request with query and documents fields → rerank
Short prompt (≤120 chars) → fast
Ambiguous → general (rather than a far more expensive agentic model)
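The signal list can be read as an ordered set of checks. The sketch below illustrates that reading and is not the router's implementation: the keyword tuples are hypothetical, the order of precedence is assumed to follow the list, and token counts use a rough four-characters-per-token estimate instead of tiktoken so that the example has no dependencies.

```python
# Hypothetical keyword lists; the real ones live in the router's configuration.
REASONING_KEYWORDS = ("step by step", "root cause", "analyze", "prove")
CODING_KEYWORDS = ("refactor", "debug", "function", "python", "rust", "code")
RERANK_KEYWORDS = ("rerank", "re-rank")

LONG_CONTEXT_TOKENS = 12_000  # ">12k estimated tokens"
SHORT_PROMPT_CHARS = 120      # "≤120 chars"


def estimate_tokens(text):
    # Rough stand-in for tiktoken: about four characters per token.
    return len(text) // 4


def classify(prompt, tools=None):
    """Return a task name for a prompt, checking signals in list order."""
    text = prompt.lower()
    if tools:
        return "agentic"
    if estimate_tokens(prompt) > LONG_CONTEXT_TOKENS:
        return "long_context"
    if any(keyword in text for keyword in REASONING_KEYWORDS):
        return "reasoning"
    if any(keyword in text for keyword in CODING_KEYWORDS):
        return "coding"
    if any(keyword in text for keyword in RERANK_KEYWORDS):
        return "rerank"
    if len(prompt) <= SHORT_PROMPT_CHARS:
        return "fast"
    return "general"
```

To see what the real classifier decides for a given prompt, `nim-router route "<prompt>" --json` performs a dry run without calling the API.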

Policies can downgrade ultra models for short prompts and route low-confidence requests to general.
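A policy step of this kind could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the `Decision` type, the 0.5 confidence cutoff, the choice of `general` as the downgrade target, and treating `reasoning` as the only task backed by an ultra model (as in the alias table).

```python
from dataclasses import dataclass

SHORT_PROMPT_CHARS = 120     # same "short prompt" threshold as the classifier signals
MIN_CONFIDENCE = 0.5         # hypothetical cutoff
ULTRA_TASKS = {"reasoning"}  # tasks whose default model is an ultra model


@dataclass
class Decision:
    task: str
    confidence: float
    reason: str


def apply_policies(decision, prompt):
    """Return a possibly adjusted routing decision."""
    if decision.confidence < MIN_CONFIDENCE:
        return Decision("general", decision.confidence,
                        "low confidence, routed to general")
    if decision.task in ULTRA_TASKS and len(prompt) <= SHORT_PROMPT_CHARS:
        return Decision("general", decision.confidence,
                        "short prompt, ultra model downgraded")
    return decision
```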

CLI

# Start proxy
nim-router serve --port 8080 --config src/nim_model_router/models.yaml

# Dry-run routing (no API call)
nim-router route "refactor this Python function" --json
nim-router route "hello"
nim-router route "plan a multi-step agent" --tools

# Show registry
nim-router models

# Sync model suggestions from NIM catalog
nim-router catalog-sync --task coding

# Print OpenAI SDK example
nim-router client-example

Observability

Every proxied response includes routing metadata:

Header	Example
`X-NIM-Routed-Task`	`agentic`
`X-NIM-Routed-Model`	`nvidia/nemotron-3-super-120b-a12b`
`X-NIM-Router-Reason`	`request includes tool definitions`
`X-NIM-Router-Confidence`	`0.950`
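On the client side, these headers can be collected into a small record for logging or debugging. The sketch below assumes only the four header names from the table; `parse_routing_headers` is a hypothetical helper, and the sample values are the examples from the table.

```python
def parse_routing_headers(headers):
    """Extract the router's metadata from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    confidence = lowered.get("x-nim-router-confidence")
    return {
        "task": lowered.get("x-nim-routed-task"),
        "model": lowered.get("x-nim-routed-model"),
        "reason": lowered.get("x-nim-router-reason"),
        "confidence": float(confidence) if confidence is not None else None,
    }


sample = {
    "X-NIM-Routed-Task": "agentic",
    "X-NIM-Routed-Model": "nvidia/nemotron-3-super-120b-a12b",
    "X-NIM-Router-Reason": "request includes tool definitions",
    "X-NIM-Router-Confidence": "0.950",
}
info = parse_routing_headers(sample)
print(info["task"], info["model"], info["confidence"])
```

With the OpenAI SDK, response headers are available through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and `.parse()`.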

# Live stats
curl http://127.0.0.1:8080/v1/router/stats

# Task registry + cost summary
curl http://127.0.0.1:8080/v1/router/tasks

# Reload config without restart
curl -X POST http://127.0.0.1:8080/v1/router/reload

# Prometheus metrics
curl http://127.0.0.1:8080/metrics

# Dry-run endpoint
curl -X POST http://127.0.0.1:8080/v1/router/dry-run \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"debug my rust code"}]}'

Set ROUTER_LOG_PATH=data/router.log.jsonl to persist request logs.

Environment variables

Variable	Default	Description
`NVIDIA_API_KEY`	—	Required upstream API key
`NIM_BASE_URL`	`https://integrate.api.nvidia.com/v1`	NIM API base URL
`ROUTER_HOST`	`127.0.0.1`	Proxy bind host
`ROUTER_PORT`	`8080`	Proxy bind port
`ROUTER_CONFIG`	bundled `models.yaml`	Custom registry path
`ROUTER_LOG_PATH`	—	JSONL log file path
`ROUTER_API_KEY`	—	Optional client auth key
`UPSTREAM_MAX_RETRIES`	`3`	Retries for 429/5xx
`UPSTREAM_RETRY_BACKOFF_SECONDS`	`0.5`	Retry backoff base
`ENABLE_PROMETHEUS`	`true`	Expose `/metrics`
`HEALTH_CHECK_UPSTREAM`	`false`	Include upstream status in `/health`
`MAX_REQUEST_BODY_BYTES`	`10485760`	Max request body size
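As an illustration of how the table's defaults combine with the environment, the sketch below loads the variables into a typed settings object. It is not the router's own settings code: `RouterSettings`, `load_settings`, and the accepted boolean spellings are assumptions, and `ROUTER_CONFIG` is omitted because its default is the bundled file.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouterSettings:
    nvidia_api_key: str
    nim_base_url: str = "https://integrate.api.nvidia.com/v1"
    router_host: str = "127.0.0.1"
    router_port: int = 8080
    router_log_path: Optional[str] = None
    router_api_key: Optional[str] = None
    upstream_max_retries: int = 3
    upstream_retry_backoff_seconds: float = 0.5
    enable_prometheus: bool = True
    health_check_upstream: bool = False
    max_request_body_bytes: int = 10_485_760


def _as_bool(value):
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    if not env.get("NVIDIA_API_KEY"):
        raise RuntimeError("NVIDIA_API_KEY is required")
    d = RouterSettings(nvidia_api_key=env["NVIDIA_API_KEY"])  # carries the defaults
    return RouterSettings(
        nvidia_api_key=d.nvidia_api_key,
        nim_base_url=env.get("NIM_BASE_URL", d.nim_base_url),
        router_host=env.get("ROUTER_HOST", d.router_host),
        router_port=int(env.get("ROUTER_PORT", d.router_port)),
        router_log_path=env.get("ROUTER_LOG_PATH"),
        router_api_key=env.get("ROUTER_API_KEY"),
        upstream_max_retries=int(env.get("UPSTREAM_MAX_RETRIES", d.upstream_max_retries)),
        upstream_retry_backoff_seconds=float(
            env.get("UPSTREAM_RETRY_BACKOFF_SECONDS", d.upstream_retry_backoff_seconds)
        ),
        enable_prometheus=_as_bool(env.get("ENABLE_PROMETHEUS", "true")),
        health_check_upstream=_as_bool(env.get("HEALTH_CHECK_UPSTREAM", "false")),
        max_request_body_bytes=int(env.get("MAX_REQUEST_BODY_BYTES", d.max_request_body_bytes)),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to exercise in tests, for example `load_settings({"NVIDIA_API_KEY": "test", "ROUTER_PORT": "9000"})`.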

Customizing models

Edit src/nim_model_router/models.yaml (or set ROUTER_CONFIG):

tasks:
  agentic:
    model: nvidia/nemotron-3-nano-30b-a3b
    fallbacks:
      - general
      - fast
    extra_body:
      enable_thinking: true
      reasoning_budget: 2048
    ab_test:
      enabled: false
      variants:
        - model: nvidia/nemotron-3-nano-30b-a3b
          weight: 50
        - model: nvidia/nemotron-3-super-120b-a12b
          weight: 50

classifier:
  use_llm_classifier: false  # set true + pip install ".[llm-classifier]"

Reload at runtime: curl -X POST http://127.0.0.1:8080/v1/router/reload
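One way to read the `fallbacks` and `ab_test` keys is sketched below, with the YAML written out as a Python dict so the example has no dependencies. This is an interpretation and not the router's code: it assumes each fallback entry names another task whose model is tried next, that duplicate models are skipped, and that A/B variants are drawn in proportion to `weight`. The `general` and `fast` entries use the defaults from the alias table.

```python
import random

# The registry from the YAML above, as a plain dict.
REGISTRY = {
    "agentic": {
        "model": "nvidia/nemotron-3-nano-30b-a3b",
        "fallbacks": ["general", "fast"],
        "ab_test": {
            "enabled": False,
            "variants": [
                {"model": "nvidia/nemotron-3-nano-30b-a3b", "weight": 50},
                {"model": "nvidia/nemotron-3-super-120b-a12b", "weight": 50},
            ],
        },
    },
    "general": {"model": "nvidia/nemotron-3-nano-30b-a3b"},
    "fast": {"model": "meta/llama-3.1-8b-instruct"},
}


def pick_primary(task_cfg, rng=random):
    """Pick the task's model, or a weighted A/B variant when the test is enabled."""
    ab = task_cfg.get("ab_test", {})
    if ab.get("enabled") and ab.get("variants"):
        variants = ab["variants"]
        weights = [variant["weight"] for variant in variants]
        return rng.choices(variants, weights=weights, k=1)[0]["model"]
    return task_cfg["model"]


def model_chain(registry, task, rng=random):
    """Return the primary model followed by the fallback tasks' models."""
    chain = [pick_primary(registry[task], rng)]
    for fallback_task in registry[task].get("fallbacks", []):
        model = registry[fallback_task]["model"]
        if model not in chain:  # assumption: never retry the same model twice
            chain.append(model)
    return chain


print(model_chain(REGISTRY, "agentic"))
```

With this registry the `general` fallback resolves to the same model as the primary, so the chain collapses to two entries.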

Integrations

LiteLLM

model_list:
  - model_name: nim-auto
    litellm_params:
      model: openai/nim-router/auto
      api_base: http://127.0.0.1:8080/v1
      api_key: local

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="local",
    model="nim-router/auto",
)

Continue / Cursor (OpenAI-compatible)

{
  "models": [{
    "title": "NIM Auto",
    "provider": "openai",
    "model": "nim-router/auto",
    "apiBase": "http://127.0.0.1:8080/v1",
    "apiKey": "local"
  }]
}

Development

pip install -e ".[dev]"
pytest --cov=nim_model_router --cov-report=term-missing
ruff check src tests
ruff format src tests

See CONTRIBUTING.md.

Security

Store NVIDIA_API_KEY in .env — never commit it.
Set ROUTER_API_KEY before exposing the proxy beyond localhost.
The proxy binds to 127.0.0.1 by default. For production, put it behind Docker or reverse-proxy authentication.

License

MIT

Release history

0.3.0: Jun 10, 2026

0.2.0 (this version): Jun 10, 2026

Download files

Source Distribution

nim_model_router-0.2.0.tar.gz (25.1 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

nim_model_router-0.2.0-py3-none-any.whl (27.8 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file nim_model_router-0.2.0.tar.gz.

File metadata

Download URL: nim_model_router-0.2.0.tar.gz
Upload date: Jun 10, 2026
Size: 25.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nim_model_router-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8c46fa0a5e9ac20147f4733236c715ec33b18664960b1fdf3c56ff442022b21a`
MD5	`bb413133dd41315f3d587fa19b8233bf`
BLAKE2b-256	`c75c2bb6b5c849509dd4ee71bf7e802c8f1295d6b91c0d65dbf9c23c75dfc59d`

Provenance

The following attestation bundles were made for nim_model_router-0.2.0.tar.gz:

Publisher: publish.yml on cobusgreyling/nim-model-router

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nim_model_router-0.2.0.tar.gz
- Subject digest: 8c46fa0a5e9ac20147f4733236c715ec33b18664960b1fdf3c56ff442022b21a
- Sigstore transparency entry: 1776564141
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: cobusgreyling/nim-model-router@45028ec01957b1064e4ac59b8ac9ebf97abbdb15
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cobusgreyling
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@45028ec01957b1064e4ac59b8ac9ebf97abbdb15
- Trigger Event: workflow_dispatch

File details

Details for the file nim_model_router-0.2.0-py3-none-any.whl.

File metadata

Download URL: nim_model_router-0.2.0-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 27.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nim_model_router-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e09532c95798aed7d4f5352a5244426c514bfa4186143c5671b87a9cddde29cc`
MD5	`db414530026bd9d5efe4939c047cc323`
BLAKE2b-256	`883755b5e58d665139913a45b43bdbc8d390a988c36cdabf66b8aa60251d711c`

Provenance

The following attestation bundles were made for nim_model_router-0.2.0-py3-none-any.whl:

Publisher: publish.yml on cobusgreyling/nim-model-router

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nim_model_router-0.2.0-py3-none-any.whl
- Subject digest: e09532c95798aed7d4f5352a5244426c514bfa4186143c5671b87a9cddde29cc
- Sigstore transparency entry: 1776564252
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: cobusgreyling/nim-model-router@45028ec01957b1064e4ac59b8ac9ebf97abbdb15
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cobusgreyling
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@45028ec01957b1064e4ac59b8ac9ebf97abbdb15
- Trigger Event: workflow_dispatch
