SGLang router is a standalone module implemented in Rust to achieve data parallelism across SGLang instances.

SGLang Router

SGLang router is a standalone Rust module that enables data parallelism across SGLang instances, providing high-performance request routing and advanced load balancing. It supports several load balancing algorithms (cache-aware, power of two, random, and round robin) and also acts as a specialized load balancer for prefill-decode disaggregated serving architectures.

Documentation

Quick Start

Prerequisites

Rust and Cargo:

# Install rustup (Rust installer and version manager)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Follow the installation prompts, then reload your shell
source "$HOME/.cargo/env"

# Verify installation
rustc --version
cargo --version

Python with pip installed

Installation

Option A: Build and Install Wheel (Recommended)

# Install build dependencies
pip install setuptools-rust wheel build

# Build the wheel package
python -m build

# Install the generated wheel
pip install dist/*.whl

# One-liner for development (rebuild + install)
python -m build && pip install --force-reinstall dist/*.whl

Option B: Development Mode

pip install -e .

⚠️ Warning: Editable installs may suffer performance degradation. Use wheel builds for performance testing.

Basic Usage

# Build Rust components
cargo build

# Launch router with worker URLs
python -m sglang_router.launch_router \
    --worker-urls http://worker1:8000 http://worker2:8000
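When the router is started from a script rather than a shell, it can help to assemble the same command as an argument list. The following is a minimal sketch, not part of the package: the helper name build_launch_command is hypothetical, and it only uses flags that appear in this README (--worker-urls, --host, --port).

```python
import shlex
import sys


def build_launch_command(worker_urls, host=None, port=None):
    """Return the argv list for launching the router with the given workers."""
    if not worker_urls:
        raise ValueError("at least one worker URL is required")
    cmd = [sys.executable, "-m", "sglang_router.launch_router",
           "--worker-urls", *worker_urls]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


cmd = build_launch_command(
    ["http://worker1:8000", "http://worker2:8000"], port=8080
)
# Print everything after the interpreter path, which varies by machine.
print(shlex.join(cmd[1:]))
# To actually start the router: subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when worker URLs come from configuration files.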

Configuration

Logging

Enable structured logging with optional file output:

from sglang_router import Router

# Console logging (default)
router = Router(worker_urls=["http://worker1:8000", "http://worker2:8000"])

# File logging enabled
router = Router(
    worker_urls=["http://worker1:8000", "http://worker2:8000"],
    log_dir="./logs"  # Daily log files created here
)

Set the log level with the --log-level flag (see the documentation).

Metrics

The Prometheus metrics endpoint is available at 127.0.0.1:29000 by default.

# Custom metrics configuration
python -m sglang_router.launch_router \
    --worker-urls http://localhost:8080 http://localhost:8081 \
    --prometheus-host 0.0.0.0 \
    --prometheus-port 9000
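To check the endpoint without a full Prometheus setup, the text exposition format can be parsed with a few lines of Python. This is a simplified sketch: the metric names in SAMPLE are made up for illustration (this README does not list the router's metric names), optional timestamps are not handled, and the /metrics path in the comment is an assumption.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {(name, labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The value is the last space-separated field (no timestamp support).
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples


# Hypothetical metric names, used only to exercise the parser.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{worker="http://worker1:8000"} 42
example_requests_total{worker="http://worker2:8000"} 40
example_active_workers 2
"""

for (name, labels), value in parse_prometheus_text(SAMPLE).items():
    print(name, labels, value)

# Against a running router (path assumed to be /metrics):
# import urllib.request
# body = urllib.request.urlopen("http://127.0.0.1:29000/metrics").read().decode()
# print(parse_prometheus_text(body))
```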

Advanced Features

Kubernetes Service Discovery

Automatic worker discovery and management in Kubernetes environments.

Basic Service Discovery

python -m sglang_router.launch_router \
    --service-discovery \
    --selector app=sglang-worker role=inference \
    --service-discovery-namespace default

PD (Prefill-Decode) Mode

For disaggregated prefill/decode routing:

python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill \
    --decode-selector app=sglang component=decode \
    --service-discovery-namespace sglang-system
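The --policy cache_aware option selects cache-aware routing, which this README describes only as routing based on cache locality. The toy sketch below illustrates the general idea and is not the router's implementation: requests that share a long prefix with an earlier prompt go to the worker that served that prompt, and everything else goes to the worker that has received the fewest prompts. The class name and the min_match_chars threshold are hypothetical.

```python
import os


class PrefixAffinityRouter:
    """Toy cache-aware policy: prefer the worker that saw the longest shared prefix."""

    def __init__(self, workers, min_match_chars=8):
        self.history = {w: [] for w in workers}  # prompts sent to each worker
        self.min_match_chars = min_match_chars

    def _best_match(self, worker, prompt):
        # Longest character prefix shared with any prompt this worker has seen.
        return max(
            (len(os.path.commonprefix([p, prompt])) for p in self.history[worker]),
            default=0,
        )

    def route(self, prompt):
        scores = {w: self._best_match(w, prompt) for w in self.history}
        best = max(scores, key=scores.get)
        if scores[best] < self.min_match_chars:
            # No useful overlap: pick the worker with the fewest prompts so far.
            best = min(self.history, key=lambda w: len(self.history[w]))
        self.history[best].append(prompt)
        return best


router = PrefixAffinityRouter(["http://worker1:8000", "http://worker2:8000"])
first = router.route("You are a helpful assistant. Summarize: alpha")
second = router.route("Translate the following text into French: beta")
third = router.route("You are a helpful assistant. Summarize: gamma")
print(first, second, third)
```

The first and third prompts share a system-prompt prefix, so they land on the same worker, while the unrelated second prompt is sent to the other one.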

Kubernetes Pod Configuration

Prefill Server Pod:

apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-1
  labels:
    app: sglang
    component: prefill
  annotations:
    sglang.ai/bootstrap-port: "9001"  # Optional: Bootstrap port
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000  # Main API port
    - containerPort: 9001  # Optional: Bootstrap port

Decode Server Pod:

apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-1
  labels:
    app: sglang
    component: decode
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000

RBAC Configuration

Namespace-scoped (recommended):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sglang-router
  namespace: sglang-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: sglang-system
  name: sglang-router
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sglang-router
  namespace: sglang-system
subjects:
- kind: ServiceAccount
  name: sglang-router
  namespace: sglang-system
roleRef:
  kind: Role
  name: sglang-router
  apiGroup: rbac.authorization.k8s.io

Complete PD Example

python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill environment=production \
    --decode-selector app=sglang component=decode environment=production \
    --service-discovery-namespace production \
    --host 0.0.0.0 \
    --port 8080 \
    --prometheus-host 0.0.0.0 \
    --prometheus-port 9090

Command Line Arguments Reference

Service Discovery

  • --service-discovery: Enable Kubernetes service discovery
  • --service-discovery-port: Port for worker URLs (default: 8000)
  • --service-discovery-namespace: Kubernetes namespace to watch
  • --selector: Label selectors for regular mode (format: key1=value1 key2=value2)
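The key1=value1 key2=value2 selector format is easy to validate before launching the router. The sketch below is a hypothetical helper, not part of sglang_router: it turns the space-separated tokens into a dictionary and renders the comma-separated form that kubectl get pods -l accepts, which is handy for checking which pods a selector would match.

```python
def parse_selector(tokens):
    """Turn ["key1=value1", "key2=value2"] into {"key1": "value1", "key2": "value2"}."""
    selector = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        selector[key] = value
    return selector


def to_label_selector(selector):
    """Render the dict in the comma-separated form used by kubectl -l."""
    return ",".join(f"{k}={v}" for k, v in sorted(selector.items()))


selector = parse_selector("app=sglang-worker role=inference".split())
print(selector)
print(to_label_selector(selector))
```

The rendered string can be pasted into kubectl get pods -n default -l followed by the selector to preview the pods that would be discovered.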

PD Mode

  • --pd-disaggregation: Enable Prefill-Decode disaggregated mode
  • --prefill: Initial prefill server (format: URL BOOTSTRAP_PORT)
  • --decode: Initial decode server URL
  • --prefill-selector: Label selector for prefill pods
  • --decode-selector: Label selector for decode pods
  • --policy: Routing policy (cache_aware, random, power_of_two)
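The argument shapes listed above can be mirrored with argparse, which is useful when wrapping the launcher in your own tooling. This is a sketch of the documented formats only, not the router's actual parser; in particular, it assumes --prefill and --decode may be repeated to register several servers, and the build_parser name is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the PD-mode argument formats listed above."""
    parser = argparse.ArgumentParser(prog="pd-args-sketch")
    parser.add_argument("--pd-disaggregation", action="store_true")
    # Each --prefill takes two values: the server URL and its bootstrap port.
    parser.add_argument("--prefill", nargs=2, action="append", default=[],
                        metavar=("URL", "BOOTSTRAP_PORT"))
    parser.add_argument("--decode", action="append", default=[], metavar="URL")
    parser.add_argument("--policy",
                        choices=["cache_aware", "random", "power_of_two"])
    return parser


args = build_parser().parse_args([
    "--pd-disaggregation",
    "--prefill", "http://prefill1:8000", "9001",
    "--decode", "http://decode1:8000",
    "--policy", "cache_aware",
])
# Convert the bootstrap port from string to int for later use.
prefill_servers = [(url, int(port)) for url, port in args.prefill]
print(prefill_servers, args.decode, args.policy)
```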

Development

Build Process

# Build Rust project
cargo build

# Build Python binding (see Installation section above)

Note: When modifying Rust code, you must rebuild the wheel for changes to take effect.

Troubleshooting

VS Code rust-analyzer issues: If rust-analyzer cannot locate the project, set rust-analyzer.linkedProjects in your VS Code settings to the absolute path of Cargo.toml:

{
  "rust-analyzer.linkedProjects": ["/workspaces/sglang/sgl-router/Cargo.toml"]
}

CI/CD Pipeline

The continuous integration pipeline includes comprehensive testing, benchmarking, and publishing:

Build & Test

  1. Build Wheels: Uses cibuildwheel for manylinux x86_64 packages
  2. Build Source Distribution: Creates source distribution for pip fallback
  3. Rust HTTP Server Benchmarking: Performance testing of router overhead
  4. Basic Inference Testing: End-to-end validation through the router
  5. PD Disaggregation Testing: Benchmark and sanity checks for prefill-decode load balancing

Publishing

  • PyPI Publishing: Wheels and source distributions are published only when the version changes in pyproject.toml
  • Container Images: Docker images published using /docker/Dockerfile.router

Features

  • High Performance: Rust-based routing with connection pooling and optimized request handling
  • Advanced Load Balancing: Multiple algorithms including:
    • Cache-Aware: Intelligent routing based on cache locality for optimal performance
    • Power of Two: Chooses the less loaded of two randomly selected workers
    • Random: Distributes requests randomly across available workers
    • Round Robin: Sequential distribution across workers in rotation
  • Prefill-Decode Disaggregation: Specialized load balancing for separated prefill and decode servers
  • Service Discovery: Automatic Kubernetes worker discovery and health management
  • Monitoring: Comprehensive Prometheus metrics and structured logging
  • Scalability: Handles thousands of concurrent connections with efficient resource utilization
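Two of the load balancing algorithms listed above are simple enough to sketch in a few lines. The code below follows the one-line descriptions in this list (rotation for round robin, the less loaded of two random workers for power of two) and is an illustration rather than the Rust implementation; the class names and the in-flight request counter used as the load measure are assumptions.

```python
import itertools
import random


class RoundRobin:
    """Hand out workers in a fixed rotation."""

    def __init__(self, workers):
        self._cycle = itertools.cycle(list(workers))

    def pick(self):
        return next(self._cycle)


class PowerOfTwo:
    """Sample two distinct workers at random and keep the less loaded one."""

    def __init__(self, workers, rng=None):
        self.load = {w: 0 for w in workers}  # in-flight requests per worker
        self._rng = rng or random.Random()

    def pick(self):
        # With a single worker there is only one candidate to sample.
        k = min(2, len(self.load))
        candidates = self._rng.sample(list(self.load), k=k)
        choice = min(candidates, key=self.load.__getitem__)
        self.load[choice] += 1
        return choice

    def done(self, worker):
        """Call when a request finishes so the load counter stays accurate."""
        self.load[worker] -= 1


workers = ["http://worker1:8000", "http://worker2:8000", "http://worker3:8000"]
rr = RoundRobin(workers)
print([rr.pick() for _ in range(4)])

p2 = PowerOfTwo(workers)
print([p2.pick() for _ in range(4)])
```

Power of two needs only two load lookups per request, which is why it scales better than scanning every worker for the global minimum.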
