Single control plane for multi-node vLLM inference — deploy, serve, and manage LLMs across a GPU cluster without Kubernetes.

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

MarcSchlichting

These details have not been verified by PyPI

Project description

Aquila

Single control plane for multi-node vLLM inference. Point-and-click deployments, an OpenAI-compatible gateway, warm caching, live GPU monitoring, and a full deployment lifecycle — without Kubernetes or a managed platform.

Quick start

uv venv && source .venv/bin/activate
uv pip install aquila

Host (management server):

aquila host up --host-ip 0.0.0.0 --host-frontend-port 5173 --host-discover-port 11400

Client (each GPU node):

aquila client up --host-ip <host-ip> --host-discover-port 11400

Open http://<host-ip>:5173 — client nodes appear within seconds. Add --service for persistent systemd services.

Features

Deploy and manage models across GPU nodes via Docker or rootless Podman — each runs in the official vllm/vllm-openai container with a specific version, nightly build, or commit hash.
OpenAI-compatible gateway (/v1) with stable URLs across node moves, API key auth with per-deployment scoping, and auto-expiring snippet keys.
Warm cache — pause idle models to RAM and resume on demand; LRU auto-eviction frees GPU VRAM while keeping weights ready for near-instant restart.
Local checkpoints and LoRA adapters — upload from the browser (streamed) or pull from a URL directly onto a node.
Live monitoring — GPU utilization, disk usage, deployment status, per-deployment usage metrics, and 48-hour metric history charts.
Usage tracking — lifetime tokens, request counts, and average prefill/generation speeds from vLLM's own metrics.
Reproducibility manifests — export model, HF revision, seed, vLLM version, image digest, and full config per deployment.
Notifications — Slack/webhook alerts when deployments become ready, fail, or are about to expire.
Per-GPU maintenance — cordon individual GPUs while the rest of the node keeps serving; optionally drain affected deployments.
Extra packages and plugins — install pip packages and upload vLLM plugins per deployment via cached derived images.
Reverse proxy support — deploy behind nginx at any sub-path with --base-path.

Best for

Research labs and university clusters
Teams sharing GPUs across projects
Self-hosted multi-model inference

Supported platforms

Python 3.10–3.14, Node.js ≥ 23 (host only)
Ubuntu 22.04 and 24.04
NVIDIA GPUs (H100, A100, L40, RTX 4090, DGX Spark)

Full documentation

Project details

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

MarcSchlichting

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.3.5

Jun 24, 2026

0.3.4

Jun 23, 2026

0.3.3

Jun 23, 2026

This version

0.3.2

Jun 23, 2026

0.3.1

Jun 23, 2026

0.3.0

Jun 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aquila-0.3.2.tar.gz (1.5 MB view details)

Uploaded Jun 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

aquila-0.3.2-py3-none-any.whl (260.0 kB view details)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file aquila-0.3.2.tar.gz.

File metadata

Download URL: aquila-0.3.2.tar.gz
Upload date: Jun 23, 2026
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aquila-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`4103deeb57dc1528f2756794f6a49b3ee8adbc02c080b8ce2ed2689862444c45`
MD5	`9c8ecb81ab55650b2f17e0cd838318cc`
BLAKE2b-256	`50da9dbe1398f10b804ff2637695ca58a59af2542aa6162655d60310b45714c9`

See more details on using hashes here.

Provenance

The following attestation bundles were made for aquila-0.3.2.tar.gz:

Publisher: publish.yml on sisl/aquila

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aquila-0.3.2.tar.gz
- Subject digest: 4103deeb57dc1528f2756794f6a49b3ee8adbc02c080b8ce2ed2689862444c45
- Sigstore transparency entry: 1931911896
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: sisl/aquila@7283e5a93a717d447eea1f56ca9d16b5b523b1b9
- Branch / Tag: refs/tags/v0.3.2
- Owner: https://github.com/sisl
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7283e5a93a717d447eea1f56ca9d16b5b523b1b9
- Trigger Event: push

File details

Details for the file aquila-0.3.2-py3-none-any.whl.

File metadata

Download URL: aquila-0.3.2-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 260.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aquila-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0a3a88565a36394891539401fb0535da0add76f37f19830e43b77e46cc48494c`
MD5	`61a2995f2dd132f9c54fc8bf64776f9a`
BLAKE2b-256	`eb609169917fc30e63772790a9b6a81444775ad185990a1ea8d46efd864cafd4`

See more details on using hashes here.

Provenance

The following attestation bundles were made for aquila-0.3.2-py3-none-any.whl:

Publisher: publish.yml on sisl/aquila

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aquila-0.3.2-py3-none-any.whl
- Subject digest: 0a3a88565a36394891539401fb0535da0add76f37f19830e43b77e46cc48494c
- Sigstore transparency entry: 1931912084
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: sisl/aquila@7283e5a93a717d447eea1f56ca9d16b5b523b1b9
- Branch / Tag: refs/tags/v0.3.2
- Owner: https://github.com/sisl
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7283e5a93a717d447eea1f56ca9d16b5b523b1b9
- Trigger Event: push

aquila 0.3.2

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Aquila

Quick start

Features

Best for

Supported platforms

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance