Generate Shenron docker-compose deployments from model config files
Shenron
Shenron now ships as a config-driven generator for production LLM docker-compose deployments.
shenron reads a model config YAML and generates:
- docker-compose.yml
- .generated/onwards_config.json
- .generated/prometheus.yml
- .generated/scouter_reporter.env
- .generated/engine_start.sh
- .generated/engine_start_N.sh + .generated/sglangmux_start.sh when models: has 2+ entries
Quick Start
uv pip install shenron
shenron get
docker compose up -d
shenron get reads a per-release config index asset, shows available configs with arrow-key selection, downloads the chosen config, and generates deployment artifacts in the current directory. Using --release latest also rewrites shenron_version in the downloaded config to latest. You can also override config values on download with:
- --api-key (writes api_key)
- --scouter-api-key (writes scouter_ingest_api_key)
- --scouter-collector-instance (writes scouter_collector_instance; alias: --scouter-colector-instance)
By default, shenron get pulls release configs from doublewordai/shenron-configs.
Use shenron get --helm to download the Helm chart bundle for the selected release and extract it to ./shenron-helm (or set --dir). This gives you a chart directory ready for helm install.
You can also install directly with Helm from release assets in shenron-configs:
helm repo add shenron https://github.com/doublewordai/shenron-configs/releases/download/v0.15.1
helm install my-shenron shenron/shenron --version 0.15.1
shenron . still works and expects exactly one config YAML (*.yml or *.yaml) in the current directory, unless you pass a config file path directly.
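The "exactly one config YAML" rule can be pictured with a short Python sketch. This is not Shenron's actual implementation: the function name find_single_config is hypothetical, and skipping docker-compose.yml (so that a previously generated compose file is not counted as a config) is an assumption the text above does not confirm.

```python
from pathlib import Path


def find_single_config(directory: str = ".",
                       ignore: tuple = ("docker-compose.yml",)) -> Path:
    """Return the only *.yml / *.yaml file in `directory`, or raise.

    `ignore` is an assumption of this sketch: it keeps a generated
    docker-compose.yml from being counted as a model config.
    """
    root = Path(directory)
    candidates = sorted(
        p for p in root.iterdir()
        if p.is_file()
        and p.suffix in {".yml", ".yaml"}
        and p.name not in ignore
    )
    if len(candidates) != 1:
        names = ", ".join(p.name for p in candidates) or "none"
        raise ValueError(
            f"expected exactly one config YAML in {root}, found: {names}"
        )
    return candidates[0]
```

Passing a config file path directly, as described above, sidesteps this lookup entirely.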
Configs
Repo configs are stored in configs/.
Available starter configs:
- configs/Qwen06B-cu126-TP1.yml
- configs/Qwen06B-cu129-TP1.yml
- configs/Qwen06B-cu130-TP1.yml
- configs/Qwen30B-A3B-cu126-TP1.yml
- configs/Qwen30B-A3B-cu129-TP1.yml
- configs/Qwen30B-A3B-cu129-TP2.yml
- configs/Qwen30B-A3B-cu130-TP2.yml
- configs/Qwen235-A22B-cu129-TP2.yml
- configs/Qwen235-A22B-cu129-TP4.yml
- configs/Qwen235-A22B-cu130-TP2.yml
Each starter config uses the same defaults that were previously hardcoded in docker/run_docker_compose.sh.
Engine selection and args:
- engine: vllm or sglang (default: vllm)
- engine_args: engine CLI args appended after core settings.
- engine_env: top-level default engine environment variables as alternating KEY, VALUE entries.
- models[*].engine_envs: per-model engine environment variables as alternating KEY, VALUE entries.
- engine_port, engine_host: engine bind settings used for generated scripts and targets.
- engine_use_cuda_ipc_transport: when true, exports SGLANG_USE_CUDA_IPC_TRANSPORT=1 before launching SGLang.
- models: optional per-model engine config. With 1 entry, Shenron generates a single engine_start.sh from that model entry. With 2+ entries, Shenron starts sglangmux (requires engine: sglang).
- sglangmux_listen_port, sglangmux_host, sglangmux_upstream_timeout_secs, sglangmux_model_ready_timeout_secs, sglangmux_model_switch_timeout_secs, sglangmux_log_dir: optional sglangmux settings (hyphenated aliases like sglangmux-listen-port are also accepted).
engine_args, engine_env, and models[*].engine_envs values accept YAML scalars (string/number/bool). If you need to pass a structured value (like --override-generation-config), provide a YAML mapping and it will be JSON-encoded.
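As a rough illustration of the scalar-versus-mapping rule, here is a minimal Python sketch. The function name render_engine_value is hypothetical, and the lowercase rendering of booleans is an assumption; only the JSON encoding of mappings comes from the description above.

```python
import json


def render_engine_value(value) -> str:
    """Render one engine_args / engine_env value as a single string token."""
    if isinstance(value, dict):
        # Structured values (YAML mappings) are JSON-encoded.
        return json.dumps(value)
    if isinstance(value, bool):
        # Assumption: booleans render in lowercase, YAML/JSON style.
        # Checked before int because bool is a subclass of int.
        return "true" if value else "false"
    if isinstance(value, (str, int, float)):
        return str(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


args = ["--tp", 2, "--override-generation-config", {"temperature": 0.7}]
print([render_engine_value(a) for a in args])
# ['--tp', '2', '--override-generation-config', '{"temperature": 0.7}']
```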
engine_env and models[*].engine_envs must have an even number of entries (KEY VALUE pairs), and variable names must be valid shell env identifiers.
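The pairing rule can be sketched in Python as follows. This only illustrates the two checks described above (even length, valid identifier names); the function name pair_engine_env and the exact error messages are hypothetical, and converting values with str() is an assumption.

```python
import re

# Shell-style identifier: a letter or underscore, then letters, digits, underscores.
ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def pair_engine_env(entries: list) -> dict:
    """Turn alternating KEY, VALUE entries into a {KEY: VALUE} mapping."""
    if len(entries) % 2 != 0:
        raise ValueError("engine_env must have an even number of entries")
    env = {}
    for key, value in zip(entries[0::2], entries[1::2]):
        if not isinstance(key, str) or not ENV_NAME.fullmatch(key):
            raise ValueError(f"invalid env variable name: {key!r}")
        env[key] = str(value)  # assumption: values are stringified as-is
    return env


print(pair_engine_env(["VLLM_ENABLE_RESPONSES_API_STORE", 1]))
# {'VLLM_ENABLE_RESPONSES_API_STORE': '1'}
```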
Set VLLM_ENABLE_RESPONSES_API_STORE and VLLM_FLASHINFER_MOE_BACKEND through engine_env or models[*].engine_envs.
Legacy keys (vllm_args, sglang_args, vllm_port, vllm_host, sglang_env, sglang_use_cuda_ipc_transport) are still accepted as aliases.
Single Config models: Schema (Single-Model + optional sglangmux)
When models: has 2+ entries, Shenron generates one engine launch script per model plus a mux launcher:
engine: sglang
sglangmux_listen_port: 8100
sglangmux_host: 0.0.0.0
sglangmux_upstream_timeout_secs: 120
sglangmux_model_ready_timeout_secs: 600
sglangmux_model_switch_timeout_secs: 120
sglangmux_log_dir: /tmp/sglangmux
models:
  - model_name: Qwen/Qwen3-0.6B
    engine_port: 8001
    api_key: sk-model-a
    engine_envs: [VLLM_ENABLE_RESPONSES_API_STORE, -1]
    engine_args: [--tp, 1]
  - model_name: Qwen/Qwen3-30B-A3B
    engine_port: 8002
    api_key: sk-model-b
    engine_use_cuda_ipc_transport: true
    engine_args: [--tp, 2]
Rules in models: mode:
- with exactly 1 model entry: works for any engine value and Shenron generates .generated/engine_start.sh
- with 2+ model entries: engine must be sglang
- each models[*].model_name must be unique
- each models[*].engine_port must be set and unique
- with 2+ model entries: sglangmux_listen_port must be different from all model ports
- when models: is set, top-level model_name/engine_port/engine_host can be omitted
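The rules above can be expressed as a small checker. The following Python sketch works on an already-parsed config dict; validate_models is a hypothetical name and the messages are illustrative, not Shenron's real validation output.

```python
def validate_models(config: dict) -> list:
    """Return a list of rule violations (empty when the config passes)."""
    errors = []
    models = config.get("models") or []
    names = [m.get("model_name") for m in models]
    ports = [m.get("engine_port") for m in models]

    if len(set(names)) != len(names):
        errors.append("models[*].model_name must be unique")
    if any(p is None for p in ports):
        errors.append("models[*].engine_port must be set")
    elif len(set(ports)) != len(ports):
        errors.append("models[*].engine_port must be unique")

    if len(models) >= 2:
        # engine defaults to vllm, so sglang has to be set explicitly here.
        if config.get("engine", "vllm") != "sglang":
            errors.append("2+ model entries require engine: sglang")
        mux_port = config.get("sglangmux_listen_port")
        if mux_port is not None and mux_port in ports:
            errors.append("sglangmux_listen_port must differ from all model ports")
    return errors


config = {
    "engine": "sglang",
    "sglangmux_listen_port": 8100,
    "models": [
        {"model_name": "Qwen/Qwen3-0.6B", "engine_port": 8001},
        {"model_name": "Qwen/Qwen3-30B-A3B", "engine_port": 8002},
    ],
}
print(validate_models(config))  # []
```

Returning a list rather than raising on the first problem lets a caller report every violation at once.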
With 2+ model entries, .generated/onwards_config.json contains one target per model and all target URLs point to http://vllm:<sglangmux_listen_port>/v1.
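To make the routing concrete, this Python sketch derives the per-model target URL from a parsed config. It shows only the URL rule stated above; the real schema of onwards_config.json is not reproduced here, and mux_target_urls is a hypothetical helper.

```python
def mux_target_urls(config: dict) -> dict:
    """Map each model name to the shared sglangmux URL."""
    # Every target points at the mux listener, not at the per-model engine port.
    url = f"http://vllm:{config['sglangmux_listen_port']}/v1"
    return {m["model_name"]: url for m in config["models"]}


config = {
    "sglangmux_listen_port": 8100,
    "models": [
        {"model_name": "Qwen/Qwen3-0.6B", "engine_port": 8001},
        {"model_name": "Qwen/Qwen3-30B-A3B", "engine_port": 8002},
    ],
}
print(mux_target_urls(config))
# {'Qwen/Qwen3-0.6B': 'http://vllm:8100/v1', 'Qwen/Qwen3-30B-A3B': 'http://vllm:8100/v1'}
```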
Generated Compose Behavior
docker-compose.yml is fully rendered from config values:
- model image tag from shenron_version + cuda_version
- onwards image tag from onwards_version
- service ports from config
- no ${SHENRON_VERSION} placeholders
Development
# Run tests (Rust + CLI + compose checks)
./scripts/ci.sh
# Install local package for manual testing
python3 -m pip install -e .
# Generate from repo config
shenron configs/Qwen06B-cu126-TP1.yml --output-dir /tmp/shenron-test
Release Automation
- release-assets.yaml publishes stamped config files (*.yml) as release assets.
- release-assets.yaml also publishes configs-index.txt, which powers shenron get.
- release-assets.yaml packages Helm chart assets as shenron-<version>.tgz + index.yaml (Helm repository format).
- release-assets.yaml mirrors *.yml, configs-index.txt, shenron-*.tgz, and index.yaml into ${OWNER}/shenron-configs under the same tag as the main shenron release.
- Set CONFIGS_REPO_TOKEN (or reuse RELEASE_PLEASE_TOKEN) with write access to the configs repo release assets; optional repo variable CONFIGS_REPO overrides the default target (${OWNER}/shenron-configs).
- python-release.yaml builds/publishes the shenron package to PyPI on release tags.
- Docker image build/push via Depot remains in ci.yaml and still triggers when docker/Dockerfile.vllm.cu*, docker/Dockerfile.sglang.cu*, or VERSION changes.
License
MIT, see LICENSE.