
Shenron

Generate Shenron docker-compose deployments from model config files

Shenron now ships as a config-driven generator for production LLM docker-compose deployments.

shenron reads a model config YAML and generates:

  • docker-compose.yml
  • .generated/onwards_config.json
  • .generated/prometheus.yml
  • .generated/scouter_reporter.env
  • .generated/engine_start.sh
  • .generated/engine_start_N.sh + .generated/sglangmux_start.sh when models: has 2+ entries

Quick Start

uv pip install shenron
shenron get
docker compose up -d

shenron get reads a per-release config index asset, shows available configs with arrow-key selection, downloads the chosen config, and generates deployment artifacts in the current directory. Using --release latest also rewrites shenron_version in the downloaded config to latest. You can also override config values on download with:

  • --api-key (writes api_key)
  • --scouter-api-key (writes scouter_ingest_api_key)
  • --scouter-collector-instance (writes scouter_collector_instance; alias: --scouter-colector-instance)

By default, shenron get pulls release configs from doublewordai/shenron-configs.

Use shenron get --helm to download the Helm chart bundle for the selected release and extract it to ./shenron-helm (or set --dir). This gives you a chart directory ready for helm install.

You can also install directly with Helm from release assets in shenron-configs:

  • helm repo add shenron https://github.com/doublewordai/shenron-configs/releases/download/v0.15.1
  • helm install my-shenron shenron/shenron --version 0.15.1

Helm Runtime Components

The Helm chart now deploys:

  • one model Deployment + Service per entry in values.models
  • one SGLang router Deployment + Service per model
  • onwards configured to call model-specific router services (/v1) instead of model services directly
  • an optional replica manager API service
  • optional Scouter reporter Deployments (one per model, replicas matched to models.<name>.replicas)

Replica manager API:

  • GET /healthz
  • GET /v1/models
  • POST /v1/models/{model}/replicas with body {"replicas": <int >= 0>}
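
The scale endpoint can be driven by any HTTP client. A minimal Python sketch of the request shape (the host, model name, and token below are placeholders; only the path and body format come from the API above):

```python
import json


def build_scale_request(base_url: str, model: str, replicas: int, token: str):
    """Assemble the pieces of a replica-scaling request: (url, headers, body).

    Mirrors the documented contract: bearer auth and a {"replicas": <int >= 0>} body.
    """
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    url = f"{base_url}/v1/models/{model}/replicas"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps({"replicas": replicas})


# Placeholder host/model/token; send the result with urllib, requests, or curl.
url, headers, body = build_scale_request(
    "http://replica-manager:8080", "qwen3-0-6b", 2, "my-token"
)
```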

Replica manager behavior:

  • authenticated with Authorization: Bearer <token> from values.replicaManager.auth.tokenSecret
  • performs helm upgrade --reuse-values and only updates models.<model>.replicas
  • enforces GPU capacity using:
    • values.cluster.total_gpus
    • per-model values.models.<model>.num_gpus
    • num_gpus * replicas per model, summed across all models
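
The capacity check reduces to simple arithmetic. A sketch of the rule (model names and numbers are illustrative, not from any shipped config):

```python
def gpu_capacity_ok(total_gpus: int, models: dict) -> tuple[int, bool]:
    """Sum num_gpus * replicas across all models and compare to the cluster total."""
    used = sum(m["num_gpus"] * m["replicas"] for m in models.values())
    return used, used <= total_gpus


# With values.cluster.total_gpus = 8, this layout uses exactly the budget:
used, ok = gpu_capacity_ok(8, {
    "model-a": {"num_gpus": 2, "replicas": 3},  # 6 GPUs
    "model-b": {"num_gpus": 1, "replicas": 2},  # 2 GPUs
})
```

A scale-up request that would push the sum past `total_gpus` is rejected before any `helm upgrade` runs.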

Scouter reporter behavior:

  • each model gets a dedicated reporter Deployment with the same replica count as the model Deployment
  • reporters run in SCOUTER_MODE=reporter and emit req/s from Prometheus to the collector ingest endpoint
  • collector instance and ingest API key are sourced from Kubernetes Secrets under values.scouterReporter.collector.*Secret

shenron . still works and expects exactly one config YAML (*.yml or *.yaml) in the current directory, unless you pass a config file path directly.

Configs

Repo configs are stored in configs/.

Available starter configs:

  • configs/Qwen06B-cu126-TP1.yml
  • configs/Qwen06B-cu129-TP1.yml
  • configs/Qwen06B-cu130-TP1.yml
  • configs/Qwen30B-A3B-cu126-TP1.yml
  • configs/Qwen30B-A3B-cu129-TP1.yml
  • configs/Qwen30B-A3B-cu129-TP2.yml
  • configs/Qwen30B-A3B-cu130-TP2.yml
  • configs/Qwen235-A22B-cu129-TP2.yml
  • configs/Qwen235-A22B-cu129-TP4.yml
  • configs/Qwen235-A22B-cu130-TP2.yml

These configs use the same defaults that were previously hardcoded in docker/run_docker_compose.sh.

Engine selection and args:

  • engine: vllm or sglang (default: vllm)
  • engine_args: engine CLI args appended after core settings.
  • engine_env: top-level default engine environment variables as alternating KEY, VALUE entries.
  • models[*].engine_envs: per-model engine environment variables as alternating KEY, VALUE entries.
  • engine_port, engine_host: engine bind settings used for generated scripts and targets.
  • engine_use_cuda_ipc_transport: when true, exports SGLANG_USE_CUDA_IPC_TRANSPORT=1 before launching SGLang.
  • models: optional per-model engine config. With 1 entry, Shenron generates a single engine_start.sh from that model entry. With 2+ entries, Shenron starts sglangmux (requires engine: sglang).
  • sglangmux_listen_port, sglangmux_host, sglangmux_upstream_timeout_secs, sglangmux_model_ready_timeout_secs, sglangmux_model_switch_timeout_secs, sglangmux_log_dir: optional sglangmux settings (hyphenated aliases like sglangmux-listen-port are also accepted).

engine_args, engine_env, and models[*].engine_envs values accept YAML scalars (string/number/bool). If you need to pass a structured value (like --override-generation-config), provide a YAML mapping and it will be JSON-encoded. engine_env and models[*].engine_envs must have an even number of entries (KEY VALUE pairs), and variable names must be valid shell env identifiers. Set VLLM_ENABLE_RESPONSES_API_STORE and VLLM_FLASHINFER_MOE_BACKEND through engine_env or models[*].engine_envs. Legacy keys (vllm_args, sglang_args, vllm_port, vllm_host, sglang_env, sglang_use_cuda_ipc_transport) are still accepted as aliases.

The models: Schema (Single-Model + Optional sglangmux)

When models: has 2+ entries, Shenron generates one engine launch script per model plus a mux launcher:

engine: sglang
sglangmux_listen_port: 8100
sglangmux_host: 0.0.0.0
sglangmux_upstream_timeout_secs: 120
sglangmux_model_ready_timeout_secs: 600
sglangmux_model_switch_timeout_secs: 120
sglangmux_log_dir: /tmp/sglangmux

models:
- model_name: Qwen/Qwen3-0.6B
  engine_port: 8001
  api_key: sk-model-a
  engine_envs: [VLLM_ENABLE_RESPONSES_API_STORE, -1]
  engine_args: [--tp, 1]
- model_name: Qwen/Qwen3-30B-A3B
  engine_port: 8002
  api_key: sk-model-b
  engine_use_cuda_ipc_transport: true
  engine_args: [--tp, 2]

Rules in models: mode:

  • with exactly 1 model entry: works for any engine value and Shenron generates .generated/engine_start.sh
  • with 2+ model entries: engine must be sglang
  • each models[*].model_name must be unique
  • each models[*].engine_port must be set and unique
  • with 2+ model entries: sglangmux_listen_port must be different from all model ports
  • when models: is set, top-level model_name/engine_port/engine_host can be omitted

With 2+ model entries, .generated/onwards_config.json contains one target per model and all target URLs point to http://vllm:<sglangmux_listen_port>/v1.
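
For the two-model config above, the generated targets would look roughly like this (field names are illustrative; only the one-target-per-model structure and the URL shape come from the behavior described here):

```json
{
  "targets": {
    "Qwen/Qwen3-0.6B":    { "url": "http://vllm:8100/v1" },
    "Qwen/Qwen3-30B-A3B": { "url": "http://vllm:8100/v1" }
  }
}
```

sglangmux multiplexes both models behind the single listen port, so every target shares the same upstream URL.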

Generated Compose Behavior

docker-compose.yml is fully rendered from config values:

  • model image tag from shenron_version + cuda_version
  • onwards image tag from onwards_version
  • service ports from config
  • no ${SHENRON_VERSION} placeholders
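
As a rough illustration, assuming config values shenron_version: 0.17.0, cuda_version: cu129, and engine_port: 8001 (the service names, image repositories, and onwards tag here are hypothetical):

```yaml
services:
  vllm:
    image: doublewordai/shenron:0.17.0-cu129   # hypothetical repo; tag rendered from shenron_version + cuda_version
    ports:
      - "8001:8001"                            # engine_port from config
  onwards:
    image: doublewordai/onwards:0.5.0          # tag rendered from onwards_version (hypothetical value)
```

The point is that every tag and port is a literal value at generation time; nothing is left to environment substitution.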

Development

# Run tests (Rust + CLI + compose checks)
./scripts/ci.sh

# Install local package for manual testing
python3 -m pip install -e .

# Generate from repo config
shenron configs/Qwen06B-cu126-TP1.yml --output-dir /tmp/shenron-test

Release Automation

  • release-assets.yaml publishes stamped config files (*.yml) as release assets.
  • release-assets.yaml also publishes configs-index.txt, which powers shenron get.
  • release-assets.yaml packages Helm chart assets as shenron-<version>.tgz + index.yaml (Helm repository format).
  • release-assets.yaml mirrors *.yml, configs-index.txt, shenron-*.tgz, and index.yaml into ${OWNER}/shenron-configs under the same tag as the main shenron release.
  • Set CONFIGS_REPO_TOKEN (or reuse RELEASE_PLEASE_TOKEN) with write access to the configs repo release assets; optional repo variable CONFIGS_REPO overrides the default target (${OWNER}/shenron-configs).
  • python-release.yaml builds/publishes the shenron package to PyPI on release tags.
  • Docker image build/push via Depot remains in ci.yaml and still triggers when docker/Dockerfile.vllm.cu*, docker/Dockerfile.sglang.cu*, or VERSION changes.

License

MIT, see LICENSE.
