SGLang Router
SGLang router is a standalone Rust module that enables data parallelism across SGLang instances, providing high-performance request routing and advanced load balancing. The router supports multiple load balancing algorithms including cache-aware, power of two, random, and round robin, and acts as a specialized load balancer for prefill-decode disaggregated serving architectures.
Documentation
- User Guide: docs.sglang.ai/router/router.html
Quick Start
Prerequisites
Rust and Cargo:
# Install rustup (Rust installer and version manager)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Follow the installation prompts, then reload your shell
source $HOME/.cargo/env
# Verify installation
rustc --version
cargo --version
Python with pip installed
Installation
Option A: Build and Install Wheel (Recommended)
# Install build dependencies
pip install setuptools-rust wheel build
# Build the wheel package
python -m build
# Install the generated wheel
pip install dist/*.whl
# One-liner for development (rebuild + install)
python -m build && pip install --force-reinstall dist/*.whl
Option B: Development Mode
pip install -e .
⚠️ Warning: Editable installs may suffer performance degradation. Use wheel builds for performance testing.
Basic Usage
# Build Rust components
cargo build
# Launch router with worker URLs
python -m sglang_router.launch_router \
--worker-urls http://worker1:8000 http://worker2:8000
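Once the router is running, clients can address it the way they would address a single SGLang worker. The sketch below builds a request for SGLang's native /generate endpoint. The router address (http://localhost:30000), the helper names, and the sampling parameters are assumptions made for illustration; substitute the host and port your router actually listens on.

```python
import json
import urllib.request


def build_generate_request(
    router_url: str, prompt: str, max_new_tokens: int = 32
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url=router_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(router_url: str, prompt: str, timeout: float = 30.0) -> dict:
    """Send the request through the router and decode the JSON response."""
    request = build_generate_request(router_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Calling generate("http://localhost:30000", "The capital of France is") would send one request through the router, which then picks a worker according to the configured policy.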
Configuration
Logging
Enable structured logging with optional file output:
from sglang_router import Router
# Console logging (default)
router = Router(worker_urls=["http://worker1:8000", "http://worker2:8000"])
# File logging enabled
router = Router(
worker_urls=["http://worker1:8000", "http://worker2:8000"],
log_dir="./logs" # Daily log files created here
)
Set the log level with the --log-level flag (see the documentation).
Metrics
The Prometheus metrics endpoint is available at 127.0.0.1:29000 by default.
# Custom metrics configuration
python -m sglang_router.launch_router \
--worker-urls http://localhost:8080 http://localhost:8081 \
--prometheus-host 0.0.0.0 \
--prometheus-port 9000
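To inspect the metrics without a full Prometheus setup, the endpoint can be scraped by hand. The following is a minimal sketch that parses the Prometheus text exposition format. The /metrics path is the usual Prometheus convention and is assumed here, and the metric names in the usage note are made up for illustration; they are not the router's actual metric names.

```python
import urllib.request
from typing import Dict


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Optional
    trailing timestamps are not handled in this simplified sketch.
    """
    samples: Dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(url: str = "http://127.0.0.1:29000/metrics") -> Dict[str, float]:
    """Fetch and parse metrics; the /metrics path is an assumption."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))
```

For example, parse_prometheus_text('demo_requests_total{worker="w1"} 12') returns a dictionary with a single entry whose value is 12.0.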
Advanced Features
Kubernetes Service Discovery
Automatic worker discovery and management in Kubernetes environments.
Basic Service Discovery
python -m sglang_router.launch_router \
--service-discovery \
--selector app=sglang-worker role=inference \
--service-discovery-namespace default
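The --selector value is a space-separated list of key=value pairs. The sketch below shows one way such a selector could be parsed and matched against pod labels. It assumes standard Kubernetes equality semantics, where a pod must carry every listed label; the function names are hypothetical and this is not the router's own implementation.

```python
from typing import Dict, List


def parse_selector(tokens: List[str]) -> Dict[str, str]:
    """Turn ["app=sglang-worker", "role=inference"] into a label dict."""
    selector: Dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        selector[key] = value
    return selector


def matches(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """A pod matches when it has every selector label with the same value."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


selector = parse_selector("app=sglang-worker role=inference".split())
print(matches(selector, {"app": "sglang-worker", "role": "inference", "zone": "a"}))
print(matches(selector, {"app": "sglang-worker", "role": "training"}))
```

Under these assumptions, extra labels on a pod do not prevent a match, but a missing or different value for any selector key does.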
PD (Prefill-Decode) Mode
For disaggregated prefill/decode routing:
python -m sglang_router.launch_router \
--pd-disaggregation \
--policy cache_aware \
--service-discovery \
--prefill-selector app=sglang component=prefill \
--decode-selector app=sglang component=decode \
--service-discovery-namespace sglang-system
Kubernetes Pod Configuration
Prefill Server Pod:
apiVersion: v1
kind: Pod
metadata:
name: sglang-prefill-1
labels:
app: sglang
component: prefill
annotations:
sglang.ai/bootstrap-port: "9001" # Optional: Bootstrap port
spec:
containers:
- name: sglang
image: lmsys/sglang:latest
ports:
- containerPort: 8000 # Main API port
- containerPort: 9001 # Optional: Bootstrap port
Decode Server Pod:
apiVersion: v1
kind: Pod
metadata:
name: sglang-decode-1
labels:
app: sglang
component: decode
spec:
containers:
- name: sglang
image: lmsys/sglang:latest
ports:
- containerPort: 8000
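To make the role of the pod fields concrete, here is a rough sketch of how a worker endpoint could be derived from a discovered pod. It assumes the worker URL is built from the pod IP (status.podIP) and the service discovery port, and that the bootstrap port is read from the sglang.ai/bootstrap-port annotation shown above. The function name and the plain-dict pod representation are illustrative only.

```python
from typing import Optional, Tuple

BOOTSTRAP_ANNOTATION = "sglang.ai/bootstrap-port"


def worker_endpoint(pod: dict, port: int = 8000) -> Tuple[str, Optional[int]]:
    """Return (worker_url, bootstrap_port) for a pod given as a plain dict.

    bootstrap_port is None when the optional annotation is absent.
    """
    ip = pod["status"]["podIP"]
    annotations = pod.get("metadata", {}).get("annotations") or {}
    raw_port = annotations.get(BOOTSTRAP_ANNOTATION)
    bootstrap_port = int(raw_port) if raw_port is not None else None
    return f"http://{ip}:{port}", bootstrap_port


prefill_pod = {
    "metadata": {
        "name": "sglang-prefill-1",
        "annotations": {BOOTSTRAP_ANNOTATION: "9001"},
    },
    "status": {"podIP": "10.0.0.12"},  # Example IP, not from the source
}
print(worker_endpoint(prefill_pod))
```

A decode pod without the annotation would yield None for the bootstrap port under the same assumptions.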
RBAC Configuration
Namespace-scoped (recommended):
apiVersion: v1
kind: ServiceAccount
metadata:
name: sglang-router
namespace: sglang-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: sglang-system
name: sglang-router
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: sglang-router
namespace: sglang-system
subjects:
- kind: ServiceAccount
name: sglang-router
namespace: sglang-system
roleRef:
kind: Role
name: sglang-router
apiGroup: rbac.authorization.k8s.io
Complete PD Example
python -m sglang_router.launch_router \
--pd-disaggregation \
--policy cache_aware \
--service-discovery \
--prefill-selector app=sglang component=prefill environment=production \
--decode-selector app=sglang component=decode environment=production \
--service-discovery-namespace production \
--host 0.0.0.0 \
--port 8080 \
--prometheus-host 0.0.0.0 \
--prometheus-port 9090
Command Line Arguments Reference
Service Discovery
- --service-discovery: Enable Kubernetes service discovery
- --service-discovery-port: Port for worker URLs (default: 8000)
- --service-discovery-namespace: Kubernetes namespace to watch
- --selector: Label selectors for regular mode (format: key1=value1 key2=value2)
PD Mode
- --pd-disaggregation: Enable Prefill-Decode disaggregated mode
- --prefill: Initial prefill server (format: URL BOOTSTRAP_PORT)
- --decode: Initial decode server URL
- --prefill-selector: Label selector for prefill pods
- --decode-selector: Label selector for decode pods
- --policy: Routing policy (cache_aware, random, power_of_two)
Development
Build Process
# Build Rust project
cargo build
# Build Python binding (see Installation section above)
Note: When modifying Rust code, you must rebuild the wheel for changes to take effect.
Troubleshooting
VSCode Rust Analyzer Issues:
Set rust-analyzer.linkedProjects to the absolute path of Cargo.toml:
{
"rust-analyzer.linkedProjects": ["/workspaces/sglang/sgl-router/Cargo.toml"]
}
CI/CD Pipeline
The continuous integration pipeline includes comprehensive testing, benchmarking, and publishing:
Build & Test
- Build Wheels: Uses cibuildwheel for manylinux x86_64 packages
- Build Source Distribution: Creates source distribution for pip fallback
- Rust HTTP Server Benchmarking: Performance testing of router overhead
- Basic Inference Testing: End-to-end validation through the router
- PD Disaggregation Testing: Benchmark and sanity checks for prefill-decode load balancing
Publishing
- PyPI Publishing: Wheels and source distributions are published only when the version changes in pyproject.toml
- Container Images: Docker images published using /docker/Dockerfile.router
Features
- High Performance: Rust-based routing with connection pooling and optimized request handling
- Advanced Load Balancing: Multiple algorithms including:
- Cache-Aware: Intelligent routing based on cache locality for optimal performance
- Power of Two: Chooses the less loaded of two randomly selected workers
- Random: Distributes requests randomly across available workers
- Round Robin: Sequential distribution across workers in rotation
- Prefill-Decode Disaggregation: Specialized load balancing for separated prefill and decode servers
- Service Discovery: Automatic Kubernetes worker discovery and health management
- Monitoring: Comprehensive Prometheus metrics and structured logging
- Scalability: Handles thousands of concurrent connections with efficient resource utilization
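The simpler load balancing policies listed above can be illustrated in a few lines. The following is a conceptual Python sketch, not the router's Rust implementation. The class and function names are hypothetical, and the load table stands in for whatever per-worker load measure the router actually tracks.

```python
import itertools
import random
from typing import Dict, List, Optional


class RoundRobin:
    """Hand out workers sequentially, wrapping around at the end."""

    def __init__(self, workers: List[str]) -> None:
        self._cycle = itertools.cycle(workers)

    def pick(self) -> str:
        return next(self._cycle)


def pick_random(workers: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick any worker uniformly at random."""
    rng = rng or random.Random()
    return rng.choice(workers)


def pick_power_of_two(
    workers: List[str],
    load: Dict[str, int],
    rng: Optional[random.Random] = None,
) -> str:
    """Sample two distinct workers and return the less loaded one."""
    if len(workers) == 1:
        return workers[0]
    rng = rng or random.Random()
    first, second = rng.sample(workers, 2)
    return first if load[first] <= load[second] else second


workers = ["http://worker1:8000", "http://worker2:8000", "http://worker3:8000"]
load = {workers[0]: 0, workers[1]: 5, workers[2]: 9}  # Illustrative in-flight counts

round_robin = RoundRobin(workers)
print([round_robin.pick() for _ in range(4)])
print(pick_power_of_two(workers, load))
```

With three workers, power of two never selects the most loaded one, because that worker loses every pairwise comparison; this is what lets the policy avoid hot spots while inspecting only two workers per request.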