vLLM Semantic Router - Intelligent routing for Mixture-of-Models
Project description
vLLM Semantic Router
Intelligent Router for Mixture-of-Models (MoM).
GitHub: https://github.com/vllm-project/semantic-router
Quick Start
Installation
# Install from PyPI
pip install vllm-sr
# Or install from source (development)
cd src/vllm-sr
pip install -e .
Usage
# Initialize vLLM Semantic Router Configuration
vllm-sr init
# Start the router (includes dashboard)
vllm-sr serve
# Open dashboard in browser
vllm-sr dashboard
# View logs
vllm-sr logs router
vllm-sr logs envoy
vllm-sr logs dashboard
# Check status
vllm-sr status
# Stop
vllm-sr stop
Features
- Router: Intelligent request routing based on intent classification
- Envoy Proxy: High-performance proxy with ext_proc integration
- Dashboard: Web UI for monitoring and testing (http://localhost:8700)
- Metrics: Prometheus metrics endpoint (http://localhost:9190/metrics)
Endpoints
After running vllm-sr serve, the following endpoints are available:
| Endpoint | Port | Description |
|---|---|---|
| Dashboard | 8700 | Web UI for monitoring and Playground |
| API | 8888* | Chat completions API (configurable in config.yaml) |
| Metrics | 9190 | Prometheus metrics |
| gRPC | 50051 | Router gRPC (internal) |
*Default port, configurable via listeners in config.yaml
Configuration
File Descriptor Limits
The CLI automatically sets file descriptor limits to 65,536 for Envoy proxy. To customize:
export VLLM_SR_NOFILE_LIMIT=100000 # Optional (min: 8192)
vllm-sr serve
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vllm_sr-0.1.0b2.dev20260120053448.tar.gz.
File metadata
- Download URL: vllm_sr-0.1.0b2.dev20260120053448.tar.gz
- Upload date:
- Size: 36.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
caabc7343f784b0ea0583818373a5091ebb2e0f036f2019f7b719403d9071ceb
|
|
| MD5 |
e73bc1571dcb266539e71cf9fa365a54
|
|
| BLAKE2b-256 |
b4f0ca14c49844acd1f0b77b06e6e2ec45563fa83e22f5389a598344079a3b50
|
File details
Details for the file vllm_sr-0.1.0b2.dev20260120053448-py3-none-any.whl.
File metadata
- Download URL: vllm_sr-0.1.0b2.dev20260120053448-py3-none-any.whl
- Upload date:
- Size: 44.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d8dba99214e9a5b8d62ed6c01da0ac9e87e6ff906ab080579f3bbac3a2b7d4a5
|
|
| MD5 |
ea43086acf4c9398a5e32c0c14dfdf1c
|
|
| BLAKE2b-256 |
783834c0ec9318ffd6fd238cfcc1c25c2902ee4ee6c19ccd5a00efb052502fb3
|