vLLM Cluster Manager

[Screenshot: vLLM Cluster Manager overview UI]

Admin dashboard + satellite clients for multi-model vLLM deployments.

Use this UI to deploy vLLM serve endpoints across a cluster so you can stand up multiple LLM servers (same or different models) with a few clicks. It is ideal for research labs or small business environments that need repeatable, multi-endpoint deployments without building a full MLOps stack.

Deployment is as simple as running the CLI on the host and on each client, with automatic client discovery. You can run in the foreground or with --service to install persistent systemd services.
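
In practice, a minimal setup is one command on the admin host and one on each GPU node. This is a sketch using only the flags documented below, with 192.168.1.10 standing in for an address the clients can reach:

# On the admin host (placeholder IP):
vllm-cluster-manager host up --host-ip 192.168.1.10 --host-discover-port 47528

# On each GPU node:
vllm-cluster-manager client up --host-ip 192.168.1.10 --host-discover-port 47528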

Tested hardware/software

  • GPUs: NVIDIA H100, NVIDIA A100, NVIDIA L40, NVIDIA DGX Spark (GB10), NVIDIA RTX 4090.
  • OS: Ubuntu 22.04 and Ubuntu 24.04.

What it can do

  • Register and manage GPU nodes that run vLLM workloads.
  • Create model configurations and launch models on selected nodes.
  • Monitor node health and model status.
  • Stream logs from running processes for quick troubleshooting.

Real-time logs

Stream logs from running nodes and model processes directly in the dashboard.

[Screenshot: real-time logs window]

Model configuration

Define and manage model configurations (weights, runtime settings, resource usage) from the UI.

[Screenshot: model configuration panel]

Architecture

  • Host: Admin services for infrastructure, API, and UI.
    • Infra: Postgres + Consul (service discovery) via Docker Compose.
    • Backend: FastAPI service for orchestration and persistence.
    • Frontend: React + Vite admin dashboard.
  • Client: Python agent running on GPU nodes; registers with the host and runs vLLM workloads.
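
Once the host is up, each layer can be sanity-checked from the host machine. This is a sketch assuming the default ports; /docs is FastAPI's standard interactive documentation page and may not be enabled in this backend:

# Infrastructure containers (Postgres + Consul) started via Docker Compose:
docker ps

# Backend API (default port 8000):
curl -I http://127.0.0.1:8000/docs

# Admin dashboard (Vite preview server, default port 5173):
curl -I http://127.0.0.1:5173/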

Repo layout

  • host/ Admin services (infra, backend, frontend)
  • client/ Satellite node agent
  • img/ Screenshots used in documentation

Prerequisites

Host:

  • Docker + the Docker Compose plugin (add your user to the docker group so Docker commands run without sudo).
  • Node.js + npm.
  • Python 3.12.
  • uv (Python package manager).

Client:

  • NVIDIA GPU with CUDA.
  • nvcc or nvidia-smi on PATH (used to detect CUDA version).
  • Python 3.12 + python3.12-dev and build-essential (Debian/Ubuntu).

On Debian/Ubuntu:

sudo apt update
sudo apt install -y python3.12-dev build-essential
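
The client detects the CUDA version via nvcc or nvidia-smi, so confirm at least one of them is on PATH:

nvidia-smi       # reports the driver and the CUDA version it supports
nvcc --version   # reports the CUDA toolkit version, if the toolkit is installed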

Install uv if you don't already have it:

curl -LsSf https://astral.sh/uv/install.sh | sh

Install (pip)

Create and activate a Python 3.12 virtual environment:

uv venv --python=3.12
source .venv/bin/activate
uv pip install vllm-cluster-manager

Start the host

Foreground (no sudo):

vllm-cluster-manager host up --host-ip 127.0.0.1 --host-frontend-port 5173 --host-discover-port 47528

host up builds a static frontend bundle and serves it with the Vite preview server. The UI assumes it is served at / by default; if you serve it under a subpath (for example /vllm/), pass --base-path /vllm/ so asset URLs and API/WebSocket paths are generated correctly.
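
For example, to serve the dashboard behind a reverse proxy under /vllm/ (the proxy configuration itself is up to you; this flag only changes how the UI generates its URLs):

vllm-cluster-manager host up --host-ip 127.0.0.1 --host-frontend-port 5173 --host-discover-port 47528 --base-path /vllm/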

Persistent service (systemd):

vllm-cluster-manager host up --service --host-ip 127.0.0.1 --host-frontend-port 5173 --host-discover-port 47528

--host-discover-port sets the discovery port used for clients. Use --host-backend-port to override the backend API port (default 8000).
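
For instance, if port 8000 is already taken on the host machine, move the backend API and keep the other defaults (and adjust the firewall rules listed below to match):

vllm-cluster-manager host up --host-ip 127.0.0.1 --host-discover-port 47528 --host-backend-port 8080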

Stop host services (foreground or systemd):

vllm-cluster-manager host down

Start a client

Foreground (no sudo):

vllm-cluster-manager client up --host-ip 127.0.0.1 --host-discover-port 47528

Persistent service (systemd):

vllm-cluster-manager client up --service --host-ip 127.0.0.1 --host-discover-port 47528

Stop client services (foreground or systemd):

vllm-cluster-manager client down

CLI flags

Host (host up)

Flag                   Default      Description
--service              false        Run as a persistent systemd service.
--host-ip              127.0.0.1    Bind host for the backend API and UI backend target.
--host-frontend-port   5173         UI port.
--host-discover-port   47528        Discovery port used by clients.
--host-backend-port    8000         Backend API port.
--base-path            /            Base path for the UI (reverse proxy subpath).
--postgres-host        127.0.0.1    Postgres host.
--postgres-port        5757         Postgres port.
--postgres-db          vllm_admin   Postgres database name.
--postgres-user        vllm         Postgres user.
--postgres-password    change-me    Postgres password.

Client (client up)

Flag                   Default      Description
--service              false        Run as a persistent systemd service.
--host-ip              127.0.0.1    Host IP for discovery.
--host-discover-port   47528        Host discovery port.
--client-host          0.0.0.0      Client bind host.
--client-port          9000         Client bind port.
--node-name            <hostname>   Node name used for registration.
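
For example, to register a node under an explicit name on a non-default agent port (all values here are placeholders):

vllm-cluster-manager client up --service --host-ip 10.0.0.5 --host-discover-port 47528 --node-name gpu-node-01 --client-port 9100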

Down commands

  • host down and client down stop foreground processes and remove/stop systemd services if present.

Configuration files

The CLI writes service-specific env files under ~/.local/share/vllm_cluster_manager:

  • host/.env (Docker compose: Postgres + discovery service)
  • host/backend/.env (API service)
  • host/frontend/.env (UI)
  • client/.env (client agent)

If you edit any env file, restart the affected service.
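
For example, after editing host/backend/.env, restart the host stack with the documented commands. Note that host down also tears down the infra containers and, by default, the Postgres volume (see Data persistence below):

vllm-cluster-manager host down
vllm-cluster-manager host up --service --host-ip 127.0.0.1 --host-frontend-port 5173 --host-discover-port 47528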

Gated models (Hugging Face)

Some models (for example Llama variants) require a Hugging Face access token. Provide the token via an env var when creating the deployment:

  • HF_TOKEN
  • HUGGING_FACE_HUB_TOKEN

Set the value to your Hugging Face access token (read access is sufficient), wrapped in quotation marks, for example:

HUGGING_FACE_HUB_TOKEN="hf_..."

You can add this in the UI under env vars or by setting it in the client environment before starting a deployment.
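
For example, to export the token in the client's environment before starting the agent (replace hf_... with your own read-access token):

export HUGGING_FACE_HUB_TOKEN="hf_..."
vllm-cluster-manager client up --host-ip 127.0.0.1 --host-discover-port 47528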

Firewall rules

Allow these network paths (adjust ports to your flags):

  • User → Host UI: TCP host-frontend-port (default 5173).
  • UI/Browser → Host API: TCP host-backend-port (default 8000).
  • Clients → Host discovery port: TCP host-discover-port (default 47528).
  • Host → Client agents: TCP client-port (default 9000).
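
With ufw on Ubuntu (the tested OS), the defaults above translate to rules like the following; tighten the source addresses to match your network:

# On the host:
sudo ufw allow 5173/tcp    # UI
sudo ufw allow 8000/tcp    # backend API
sudo ufw allow 47528/tcp   # client discovery

# On each client:
sudo ufw allow 9000/tcp    # client agent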

Data persistence

By default, shutting down the host (host down, or stopping the systemd infra unit) runs docker compose down -v, which deletes the Postgres volume and all stored data. Remove the -v flag in the code if you want the data to survive a shutdown.
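
If the data matters, one option is to dump the database before shutting the host down, using the Postgres defaults from the flags table (requires the postgresql-client tools; set PGPASSWORD or enter the password when prompted):

pg_dump -h 127.0.0.1 -p 5757 -U vllm vllm_admin > vllm_admin_backup.sql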

Quick start (dev)

  1. Start infrastructure:

cd host
cp .env.example .env
# edit .env for passwords
docker compose up -d

  2. Backend (venv recommended):

cd host/backend
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload

  3. Frontend:

cd host/frontend
npm install
npm run dev

Open the UI at http://localhost:5173 by default (see host/frontend/.env).

Notes

  • The service registry is Consul (used for client discovery).
  • WebSocket log streaming is handled in host/frontend/src/services/ws.ts.
