Skip to main content

SGLang multiplexer with an OpenAI-compatible frontend

Project description

sglangmux

sglangmux is a lightweight Rust multiplexer for running multiple SGLang model servers behind one OpenAI-compatible frontend.

It provides:

  • one frontend endpoint for chat/completions
  • automatic model activation/switching based on the request model
  • OpenAI-style /models and /v1/models listing
  • per-model process management with per-model stdout/stderr logs

Repository Layout

  • src/lib.rs: core multiplexer library (SgLangMux)
  • src/bin/sglangmuxd.rs: HTTP daemon frontend
  • examples/sglangmux-manual/: manual verification scripts for two models
  • tests/: integration tests

How It Works

  1. You provide one launch script per model.
  2. Each script must include:
    • model identifier via either MODEL_NAME=<openai-model-id> or launch arg --model <openai-model-id> (or --model-path <openai-model-id>)
    • local port via either PORT=<local-port> or launch arg --port <local-port>
  3. sglangmuxd starts models (bootstrap), tracks active model state, and forwards requests to the correct upstream model server.
  4. When the requested model differs from active model, the mux switches by pausing/sleeping current model and waking target model.

Requirements

  • Rust toolchain (for cargo run)
  • Python environment with sglang installed for your model launch scripts
  • GPU/runtime support required by your chosen SGLang models

Python Install (uv / pip)

The project ships a Python CLI wrapper that executes the Rust daemon binary.

After publishing to PyPI, usage is:

uv pip install sglangmux
sglangmux --help

For local install from this repository:

uv pip install .
sglangmux --help

Notes:

  • The wheel build runs cargo build --release --bin sglangmuxd.
  • Installing from source requires a working Rust toolchain.
  • The installed command is sglangmux, which forwards all args to sglangmuxd.

Quick Start

1. Prepare Python env for model scripts

uv venv --python /usr/bin/python3.10 .venv
uv pip install --python .venv/bin/python sglang

2. Start mux with example scripts

./examples/sglangmux-manual/start_sglangmux.sh

3. Send requests

./examples/sglangmux-manual/request_models.sh
./examples/sglangmux-manual/request_qwen.sh
./examples/sglangmux-manual/request_hf.sh

See examples/sglangmux-manual/README.md for detailed manual workflow.

Running sglangmuxd Directly

cargo run --bin sglangmuxd -- \
  --host 127.0.0.1 \
  --listen-port 30100 \
  --upstream-timeout-secs 120 \
  --model-ready-timeout-secs 120 \
  --model-switch-timeout-secs 60 \
  --log-dir sglangmux-logs \
  /path/to/model1.sh /path/to/model2.sh

CLI Options

  • --host: bind host for frontend daemon (default 127.0.0.1)
  • --listen-port: bind port for frontend daemon (default 30100)
  • --upstream-timeout-secs: timeout waiting for upstream model response (default 120)
  • --model-ready-timeout-secs: timeout while waiting for model process to become healthy (default 120)
  • --model-switch-timeout-secs: timeout waiting for model activation/switch for a pending request (default 60)
  • --log-dir: directory for per-model logs (default sglangmux-logs)

To expose externally:

--host 0.0.0.0

Frontend API

Implemented routes:

  • GET /health
  • GET /models
  • GET /v1/models
  • POST /v1/chat/completions
  • POST /v1/completions

Notes:

  • Requests must include a string model field.
  • For streaming (stream: true / SSE), sglangmuxd forwards the streaming payload through.

Model Launch Script Contract

Each script passed to sglangmuxd must define a model id and local port. The model id can come from MODEL_NAME or launch flags --model / --model-path, and the local port can come from PORT or launch flag --port:

MODEL_NAME="Qwen/Qwen3-0.6B"
PORT=30001

The daemon parses these values from script text and uses them to build model registry and routing map.

Timeouts and Failure Modes

  • upstream-timeout-secs: model server did not respond in time for completion request (returns 504)
  • model-ready-timeout-secs: model process did not become healthy during startup/bring-up
  • model-switch-timeout-secs: request waited too long for requested model to become active

Common frontend errors:

  • model not ready: ...: switch/startup issue
  • upstream request timed out: generation took longer than upstream timeout
  • invalid upstream response: upstream returned non-JSON where JSON expected (non-stream path)

Logging

Rust log filter is controlled by RUST_LOG.

Examples:

RUST_LOG=info ./examples/sglangmux-manual/start_sglangmux.sh
RUST_LOG=sglangmux=info,sglangmuxd=info,warn ./examples/sglangmux-manual/start_sglangmux.sh

Per-model stdout/stderr log files are written under --log-dir.

Graceful Shutdown

sglangmuxd listens for Ctrl+C and triggers model shutdown via mux cleanup logic before exit.

Development

Build:

cargo check --bin sglangmuxd

Test:

cargo test

Publishing to PyPI

Use the helper script:

scripts/publish_pypi.sh

Upload to TestPyPI:

scripts/publish_pypi.sh --testpypi

The script:

  • builds sdist + wheel (python -m build)
  • runs twine check on artifacts
  • uploads via twine upload

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sglangmux-0.1.1.tar.gz (33.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sglangmux-0.1.1-py3-none-any.whl (3.2 MB view details)

Uploaded Python 3

File details

Details for the file sglangmux-0.1.1.tar.gz.

File metadata

  • Download URL: sglangmux-0.1.1.tar.gz
  • Upload date:
  • Size: 33.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.22

File hashes

Hashes for sglangmux-0.1.1.tar.gz
Algorithm Hash digest
SHA256 7f7a16d1f7b57e35693a43ac0061f882e8d1f4e1b94916f51d44af1d4ff511b4
MD5 5ad8b9c3fe2a214e742b99bb8a8db28a
BLAKE2b-256 d443afe25bc571cd94500fab01a4a7d7e177e3a9932d6d4acc29d86701b18b89

See more details on using hashes here.

File details

Details for the file sglangmux-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: sglangmux-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.22

File hashes

Hashes for sglangmux-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 36abfb5000754ee30f4a1df5cf97ba1bbba768ba827668b51b05042e93a29cb5
MD5 d72d2a437c90bc4a1c96b758b090cc03
BLAKE2b-256 bc6e70ba9c5a4de3f12797c0a02ce484f01d94979d5d3e9d5ae854703ff1f3ae

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page