sglangmux

SGLang multiplexer with an OpenAI-compatible frontend

sglangmux is a lightweight Rust multiplexer for running multiple SGLang model servers behind one OpenAI-compatible frontend.

It provides:

  • one frontend endpoint for chat/completions
  • automatic model activation/switching based on the request model
  • OpenAI-style /models and /v1/models listing
  • per-model process management with per-model stdout/stderr logs

Repository Layout

  • src/lib.rs: core multiplexer library (SgLangMux)
  • src/bin/sglangmuxd.rs: HTTP daemon frontend
  • examples/sglangmux-manual/: manual verification scripts for two models
  • tests/: integration tests

How It Works

  1. You provide one launch script per model.
  2. Each script must include:
    • MODEL_NAME=<openai-model-id>
    • PORT=<local-port>
  3. sglangmuxd starts models (bootstrap), tracks active model state, and forwards requests to the correct upstream model server.
  4. When the requested model differs from the active model, the mux switches by pausing/sleeping the current model and waking the target model.
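The activation rule in step 4 can be sketched as follows. This is an illustrative Python re-statement, not the actual Rust implementation in SgLangMux; the function and action names ("sleep", "wake", "forward") and the model ids are hypothetical.

```python
# Illustrative sketch of the step-4 switching rule (the real logic is in Rust).
# Action names and model ids here are hypothetical, not the daemon's API.

def plan_switch(active_model, requested_model):
    """Return the actions the mux would take for an incoming request."""
    if requested_model == active_model:
        return ["forward"]                       # already active: just proxy
    actions = []
    if active_model is not None:
        actions.append(f"sleep:{active_model}")  # pause the current model
    actions.append(f"wake:{requested_model}")    # bring up the target model
    actions.append("forward")                    # then proxy the request
    return actions

print(plan_switch("model-a", "model-a"))
print(plan_switch("model-a", "model-b"))
```

On a cold start there is no active model, so only the wake and forward steps apply.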

Requirements

  • Rust toolchain (for cargo run)
  • Python environment with sglang installed for your model launch scripts
  • GPU/runtime support required by your chosen SGLang models

Python Install (uv / pip)

The project ships a Python CLI wrapper that executes the Rust daemon binary.

To install from PyPI:

uv pip install sglangmux
sglangmux --help

For local install from this repository:

uv pip install .
sglangmux --help

Notes:

  • The wheel build runs cargo build --release --bin sglangmuxd.
  • Installing from source requires a working Rust toolchain.
  • The installed command is sglangmux, which forwards all args to sglangmuxd.

Quick Start

1. Prepare Python env for model scripts

uv venv --python /usr/bin/python3.10 .venv
uv pip install --python .venv/bin/python sglang

2. Start mux with example scripts

./examples/sglangmux-manual/start_sglangmux.sh

3. Send requests

./examples/sglangmux-manual/request_models.sh
./examples/sglangmux-manual/request_qwen.sh
./examples/sglangmux-manual/request_hf.sh

See examples/sglangmux-manual/README.md for detailed manual workflow.

Running sglangmuxd Directly

cargo run --bin sglangmuxd -- \
  --host 127.0.0.1 \
  --listen-port 30100 \
  --upstream-timeout-secs 120 \
  --model-ready-timeout-secs 120 \
  --model-switch-timeout-secs 60 \
  --log-dir sglangmux-logs \
  /path/to/model1.sh /path/to/model2.sh

CLI Options

  • --host: bind host for frontend daemon (default 127.0.0.1)
  • --listen-port: bind port for frontend daemon (default 30100)
  • --upstream-timeout-secs: timeout waiting for upstream model response (default 120)
  • --model-ready-timeout-secs: timeout while waiting for model process to become healthy (default 120)
  • --model-switch-timeout-secs: timeout waiting for model activation/switch for a pending request (default 60)
  • --log-dir: directory for per-model logs (default sglangmux-logs)

To expose externally:

--host 0.0.0.0

Frontend API

Implemented routes:

  • GET /health
  • GET /models
  • GET /v1/models
  • POST /v1/chat/completions
  • POST /v1/completions

Notes:

  • Requests must include a string model field; the mux uses it to route the request.
  • For streaming requests (stream: true), sglangmuxd forwards the upstream SSE payload through unchanged.
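As a client-side sketch, a non-streaming chat request for the frontend can be built like this. It assumes the daemon is listening on its default 127.0.0.1:30100; the helper name build_chat_request is hypothetical, and the actual send (commented out) requires a running daemon.

```python
import json

BASE_URL = "http://127.0.0.1:30100"  # default --host / --listen-port

def build_chat_request(model, messages, stream=False):
    """Build (url, body) for POST /v1/chat/completions.

    The mux requires a string `model` field so it can route the request.
    """
    if not isinstance(model, str) or not model:
        raise ValueError("model must be a non-empty string")
    url = f"{BASE_URL}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return url, body

url, body = build_chat_request(
    "Qwen/Qwen3-0.6B", [{"role": "user", "content": "Hello"}]
)
print(url)
# With the daemon running, send it with e.g.:
# urllib.request.urlopen(urllib.request.Request(
#     url, body.encode(), {"Content-Type": "application/json"}))
```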

Model Launch Script Contract

Each script passed to sglangmuxd must define:

MODEL_NAME="Qwen/Qwen3-0.6B"
PORT=30001

The daemon parses these values from the script text and uses them to build the model registry and routing map.
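The parsing step can be sketched as below. This is a simplified Python re-implementation for illustration; the daemon's actual parser is in Rust and may be stricter or more lenient than this regex.

```python
import re

# Simplified sketch: extract MODEL_NAME and PORT from a launch script's text.
# The real Rust parser may accept forms this regex does not.
def parse_launch_script(text):
    name = re.search(r'^MODEL_NAME="?([^"\n]+)"?\s*$', text, re.M)
    port = re.search(r'^PORT=(\d+)\s*$', text, re.M)
    if not name or not port:
        raise ValueError("script must define MODEL_NAME and PORT")
    return name.group(1), int(port.group(1))

script = '''#!/usr/bin/env bash
MODEL_NAME="Qwen/Qwen3-0.6B"
PORT=30001
'''
print(parse_launch_script(script))  # → ('Qwen/Qwen3-0.6B', 30001)
```

Because the values are read from the script text rather than by executing it, they must appear as literal assignments, not be computed at runtime.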

Timeouts and Failure Modes

  • upstream-timeout-secs: the model server did not respond in time for a completion request (returns 504)
  • model-ready-timeout-secs: the model process did not become healthy during startup/bring-up
  • model-switch-timeout-secs: a request waited too long for the requested model to become active

Common frontend errors:

  • model not ready: ...: switch/startup issue
  • upstream request timed out: generation took longer than upstream timeout
  • invalid upstream response: upstream returned non-JSON where JSON expected (non-stream path)
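Client code can map these failures to a retry decision. A hedged sketch: only the 504 upstream timeout status is documented above, and treating "model not ready" responses as retryable is an assumption of this sketch, not documented daemon behavior.

```python
def should_retry(status, body):
    """Heuristic retry decision for mux frontend errors (sketch only).

    Only the 504 timeout status is documented; retrying on
    "model not ready" bodies is an assumption here.
    """
    if status == 504:              # upstream request timed out
        return True
    if "model not ready" in body:  # switch/startup still in progress
        return True
    return False

print(should_retry(504, "upstream request timed out"))
print(should_retry(400, "request missing model field"))
```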

Logging

Rust log filter is controlled by RUST_LOG.

Examples:

RUST_LOG=info ./examples/sglangmux-manual/start_sglangmux.sh
RUST_LOG=sglangmux=info,sglangmuxd=info,warn ./examples/sglangmux-manual/start_sglangmux.sh

Per-model stdout/stderr log files are written under --log-dir.

Graceful Shutdown

sglangmuxd listens for Ctrl+C and shuts down all managed model processes via the mux cleanup logic before exiting.

Development

Check and build:

cargo check --bin sglangmuxd
cargo build --release --bin sglangmuxd

Test:

cargo test

Publishing to PyPI

Use the helper script:

scripts/publish_pypi.sh

Upload to TestPyPI:

scripts/publish_pypi.sh --testpypi

The script:

  • builds sdist + wheel (python -m build)
  • runs twine check on artifacts
  • uploads via twine upload

Download files

Source distribution:

  • sglangmux-0.1.0.tar.gz (32.6 kB)

Built distribution:

  • sglangmux-0.1.0-py3-none-any.whl (3.2 MB)

File details

sglangmux-0.1.0.tar.gz

  • Size: 32.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

Hashes:

Algorithm    Hash digest
SHA256       f222a9fcc93768018c32aba1b6bb5968e7962d6a68b9ae455c660608f80d7629
MD5          dbff98ba88df4ea79f8df58456ca0222
BLAKE2b-256  f89fe92e32a944793459cb13629a3e8ed2ea59dc3047a0ed66069510a911e366

sglangmux-0.1.0-py3-none-any.whl

  • Size: 3.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

Hashes:

Algorithm    Hash digest
SHA256       62d2570c8c784763e975983267d7efb042e801d2e2b3fcbfc0a4812167b74898
MD5          801fbf1b13515ea8961fc41a6d3a8841
BLAKE2b-256  6b268e3ebc0fdea649f1e7e7668f5da971947a43c2f1b4ad0aec66f4be248a5f
