Single control plane for multi-node vLLM inference — deploy, serve, and manage LLMs across a GPU cluster without Kubernetes.
Project description
Aquila
Single control plane for multi-node vLLM inference. Point-and-click deployments, an OpenAI-compatible gateway, warm caching, live GPU monitoring, and a full deployment lifecycle — without Kubernetes or a managed platform.
Quick start
uv venv && source .venv/bin/activate
uv pip install aquila
Host (management server):
aquila host up --host-ip 0.0.0.0 --host-frontend-port 5173 --host-discover-port 11400
Client (each GPU node):
aquila client up --host-ip <host-ip> --host-discover-port 11400
Open http://<host-ip>:5173 — client nodes appear within seconds. Add --service for persistent systemd services.
Features
- Deploy and manage models across GPU nodes via Docker or rootless Podman — each runs in the official
vllm/vllm-openaicontainer with a specific version, nightly build, or commit hash. - OpenAI-compatible gateway (
/v1) with stable URLs across node moves, API key auth with per-deployment scoping, and auto-expiring snippet keys. - Warm cache — pause idle models to RAM and resume on demand; LRU auto-eviction frees GPU VRAM while keeping weights ready for near-instant restart.
- Local checkpoints and LoRA adapters — upload from the browser (streamed) or pull from a URL directly onto a node.
- Live monitoring — GPU utilization, disk usage, deployment status, per-deployment usage metrics, and 48-hour metric history charts.
- Usage tracking — lifetime tokens, request counts, and average prefill/generation speeds from vLLM's own metrics.
- Reproducibility manifests — export model, HF revision, seed, vLLM version, image digest, and full config per deployment.
- Notifications — Slack/webhook alerts when deployments become ready, fail, or are about to expire.
- Per-GPU maintenance — cordon individual GPUs while the rest of the node keeps serving; optionally drain affected deployments.
- Extra packages and plugins — install pip packages and upload vLLM plugins per deployment via cached derived images.
- Reverse proxy support — deploy behind nginx at any sub-path with
--base-path.
Best for
- Research labs and university clusters
- Teams sharing GPUs across projects
- Self-hosted multi-model inference
Supported platforms
- Python 3.10–3.14, Node.js ≥ 23 (host only)
- Ubuntu 22.04 and 24.04
- NVIDIA GPUs (H100, A100, L40, RTX 4090, DGX Spark)
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file aquila-0.3.4.tar.gz.
File metadata
- Download URL: aquila-0.3.4.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
49a1c3b2eee3b6012966484802fbab7ab0e220fb510eabb80d321e0fffe19b87
|
|
| MD5 |
279132309edf39c611da0c4ba42e7526
|
|
| BLAKE2b-256 |
fbbe9a20b292336e51c0d4bcb499266e800fcf693a46598f9e11f117d2b77dd4
|
Provenance
The following attestation bundles were made for aquila-0.3.4.tar.gz:
Publisher:
publish.yml on sisl/aquila
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
aquila-0.3.4.tar.gz -
Subject digest:
49a1c3b2eee3b6012966484802fbab7ab0e220fb510eabb80d321e0fffe19b87 - Sigstore transparency entry: 1932864795
- Sigstore integration time:
-
Permalink:
sisl/aquila@f0c14b1a322898e338b6408e9512c528c88b6358 -
Branch / Tag:
refs/tags/v0.3.4 - Owner: https://github.com/sisl
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f0c14b1a322898e338b6408e9512c528c88b6358 -
Trigger Event:
push
-
Statement type:
File details
Details for the file aquila-0.3.4-py3-none-any.whl.
File metadata
- Download URL: aquila-0.3.4-py3-none-any.whl
- Upload date:
- Size: 262.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f0980001c6c3692be41953cd01c214778f2619d130c5f043d22d941c42e643af
|
|
| MD5 |
9a157d01427aa62b3099c5ad058a037c
|
|
| BLAKE2b-256 |
6467edd4b0a9835ba57beb2a2834e461bd8b8b59a053b1debbd2e6e3b18dda2a
|
Provenance
The following attestation bundles were made for aquila-0.3.4-py3-none-any.whl:
Publisher:
publish.yml on sisl/aquila
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
aquila-0.3.4-py3-none-any.whl -
Subject digest:
f0980001c6c3692be41953cd01c214778f2619d130c5f043d22d941c42e643af - Sigstore transparency entry: 1932865234
- Sigstore integration time:
-
Permalink:
sisl/aquila@f0c14b1a322898e338b6408e9512c528c88b6358 -
Branch / Tag:
refs/tags/v0.3.4 - Owner: https://github.com/sisl
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f0c14b1a322898e338b6408e9512c528c88b6358 -
Trigger Event:
push
-
Statement type: