SGLang multiplexer with an OpenAI-compatible frontend
sglangmux
sglangmux is a lightweight Rust multiplexer for running multiple SGLang model servers behind one OpenAI-compatible frontend.
It provides:
- one frontend endpoint for chat/completions
- automatic model activation/switching based on the request model
- OpenAI-style /models and /v1/models listing
- per-model process management with per-model stdout/stderr logs
Repository Layout
- src/lib.rs: core multiplexer library (SgLangMux)
- src/bin/sglangmuxd.rs: HTTP daemon frontend
- examples/sglangmux-manual/: manual verification scripts for two models
- tests/: integration tests
How It Works
- You provide one launch script per model.
- Each script must include:
  MODEL_NAME=<openai-model-id>
  PORT=<local-port>
- sglangmuxd starts models (bootstrap), tracks active model state, and forwards requests to the correct upstream model server.
- When the requested model differs from the active model, the mux switches by pausing/sleeping the current model and waking the target model.
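The switching flow above can be sketched in Python. This is a rough illustration only: the real logic lives in the Rust SgLangMux type, and the pause/wake hooks, the second model name, and the ports here are hypothetical.

```python
# Rough sketch of the activation/switch flow. The real implementation is the
# Rust SgLangMux type; pause()/wake() stand in for actual process control,
# and the model names/ports below are made up.
class MuxSketch:
    def __init__(self, models):
        self.models = models  # model name -> upstream port
        self.active = None    # currently active model name

    def route(self, requested_model):
        """Return the upstream port to forward a request to."""
        if requested_model not in self.models:
            raise KeyError(f"unknown model: {requested_model}")
        if self.active != requested_model:
            if self.active is not None:
                self.pause(self.active)  # put the current model to sleep
            self.wake(requested_model)   # bring the target model up
            self.active = requested_model
        return self.models[requested_model]

    def pause(self, name):
        pass  # placeholder: would pause/sleep the model's server process

    def wake(self, name):
        pass  # placeholder: would wake the model's server process


mux = MuxSketch({"Qwen/Qwen3-0.6B": 30001, "example-org/other-model": 30002})
port = mux.route("Qwen/Qwen3-0.6B")  # activates Qwen, returns its port
```

The key property this illustrates: at most one model is active at a time, and a request for an inactive model triggers a pause-then-wake transition before forwarding.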
Requirements
- Rust toolchain (for cargo run)
- Python environment with sglang installed, for your model launch scripts
- GPU/runtime support required by your chosen SGLang models
Python Install (uv / pip)
The project ships a Python CLI wrapper that executes the Rust daemon binary.
After publishing to PyPI, usage is:
uv pip install sglangmux
sglangmux --help
For local install from this repository:
uv pip install .
sglangmux --help
Notes:
- The wheel build runs cargo build --release --bin sglangmuxd.
- Installing from source requires a working Rust toolchain.
- The installed command is sglangmux, which forwards all args to sglangmuxd.
Quick Start
1. Prepare Python env for model scripts
uv venv --python /usr/bin/python3.10 .venv
uv pip install --python .venv/bin/python sglang
2. Start mux with example scripts
./examples/sglangmux-manual/start_sglangmux.sh
3. Send requests
./examples/sglangmux-manual/request_models.sh
./examples/sglangmux-manual/request_qwen.sh
./examples/sglangmux-manual/request_hf.sh
See examples/sglangmux-manual/README.md for detailed manual workflow.
Running sglangmuxd Directly
cargo run --bin sglangmuxd -- \
--host 127.0.0.1 \
--listen-port 30100 \
--upstream-timeout-secs 120 \
--model-ready-timeout-secs 120 \
--model-switch-timeout-secs 60 \
--log-dir sglangmux-logs \
/path/to/model1.sh /path/to/model2.sh
CLI Options
- --host: bind host for the frontend daemon (default 127.0.0.1)
- --listen-port: bind port for the frontend daemon (default 30100)
- --upstream-timeout-secs: timeout waiting for an upstream model response (default 120)
- --model-ready-timeout-secs: timeout while waiting for a model process to become healthy (default 120)
- --model-switch-timeout-secs: timeout waiting for model activation/switch for a pending request (default 60)
- --log-dir: directory for per-model logs (default sglangmux-logs)
To expose externally:
--host 0.0.0.0
Frontend API
Implemented routes:
GET /health
GET /models
GET /v1/models
POST /v1/chat/completions
POST /v1/completions
Notes:
- Requests must include a string model field.
- For streaming (stream: true / SSE), sglangmuxd forwards the streaming payload through.
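A minimal non-streaming client sketch, assuming a sglangmuxd instance on the default bind address and port; the model id is only an example, and build_chat_request/send_chat_request are hypothetical helper names:

```python
import json
import urllib.request

# Assumes the daemon's default bind address/port; adjust for your deployment.
MUX_URL = "http://127.0.0.1:30100/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    # The mux requires a string "model" field to select the upstream server.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def send_chat_request(payload: dict) -> dict:
    req = urllib.request.Request(
        MUX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


payload = build_chat_request("Qwen/Qwen3-0.6B", "Say hello.")
# send_chat_request(payload)  # requires a running sglangmuxd
```

If the named model is not currently active, the first such request pays the switch cost, so the client timeout should comfortably exceed --model-switch-timeout-secs.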
Model Launch Script Contract
Each script passed to sglangmuxd must define:
MODEL_NAME="Qwen/Qwen3-0.6B"
PORT=30001
The daemon parses these values from the script text and uses them to build the model registry and routing map.
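For illustration, the contract can be read with a small Python sketch; the daemon's actual Rust parser may accept a different syntax, and this only mirrors the two required key=value lines:

```python
import re


def parse_launch_script(text: str) -> dict:
    # Sketch of extracting the two required assignments from a launch script.
    # Accepts optional single or double quotes around the value.
    values = {}
    for key in ("MODEL_NAME", "PORT"):
        m = re.search(rf'^{key}=["\']?([^"\'\n]+?)["\']?\s*$', text, re.MULTILINE)
        if m:
            values[key] = m.group(1)
    return values


script = 'MODEL_NAME="Qwen/Qwen3-0.6B"\nPORT=30001\n'
parsed = parse_launch_script(script)
# parsed maps MODEL_NAME -> "Qwen/Qwen3-0.6B" and PORT -> "30001"
```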
Timeouts and Failure Modes
- upstream-timeout-secs: the model server did not respond in time for a completion request (returns 504)
- model-ready-timeout-secs: the model process did not become healthy during startup/bring-up
- model-switch-timeout-secs: a request waited too long for the requested model to become active
Common frontend errors:
- model not ready: ...: switch/startup issue
- upstream request timed out: generation took longer than the upstream timeout
- invalid upstream response: upstream returned non-JSON where JSON was expected (non-stream path)
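A client could map these failure modes to handling strategies. The helper below is hypothetical: it only keys off the error strings listed above and the 504 status, and any other status codes a deployment sees are not assumed here.

```python
def classify_mux_error(status: int, body: str) -> str:
    # Hypothetical client-side helper keyed off the frontend error strings.
    if status == 504 or "upstream request timed out" in body:
        return "retry-later"         # generation exceeded the upstream timeout
    if body.startswith("model not ready"):
        return "retry-after-switch"  # model switch/startup still in progress
    if "invalid upstream response" in body:
        return "upstream-bug"        # upstream sent non-JSON on the JSON path
    return "unknown"
```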
Logging
The Rust log filter is controlled by the RUST_LOG environment variable.
Examples:
RUST_LOG=info ./examples/sglangmux-manual/start_sglangmux.sh
RUST_LOG=sglangmux=info,sglangmuxd=info,warn ./examples/sglangmux-manual/start_sglangmux.sh
Per-model stdout/stderr log files are written under --log-dir.
Graceful Shutdown
sglangmuxd listens for Ctrl+C and triggers model shutdown via mux cleanup logic before exit.
Development
Build:
cargo check --bin sglangmuxd
Test:
cargo test
Publishing to PyPI
Use the helper script:
scripts/publish_pypi.sh
Upload to TestPyPI:
scripts/publish_pypi.sh --testpypi
The script:
- builds sdist + wheel (python -m build)
- runs twine check on the artifacts
- uploads via twine upload
Download files
File details
Details for the file sglangmux-0.1.0.tar.gz.
File metadata
- Download URL: sglangmux-0.1.0.tar.gz
- Upload date:
- Size: 32.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f222a9fcc93768018c32aba1b6bb5968e7962d6a68b9ae455c660608f80d7629 |
| MD5 | dbff98ba88df4ea79f8df58456ca0222 |
| BLAKE2b-256 | f89fe92e32a944793459cb13629a3e8ed2ea59dc3047a0ed66069510a911e366 |
File details
Details for the file sglangmux-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sglangmux-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 62d2570c8c784763e975983267d7efb042e801d2e2b3fcbfc0a4812167b74898 |
| MD5 | 801fbf1b13515ea8961fc41a6d3a8841 |
| BLAKE2b-256 | 6b268e3ebc0fdea649f1e7e7668f5da971947a43c2f1b4ad0aec66f4be248a5f |