VLLMoni

A CLI tool for monitoring and managing VLLM inference servers in Docker containers

VLLMoni is a command-line tool for monitoring and managing VLLM inference servers running in Docker containers. It provides real-time GPU usage tracking, model management, and container lifecycle control.

Features

  • Model Management: Easily run and stop VLLM models in Docker containers
  • GPU Monitoring: Track GPU memory usage and utilization in real-time
  • VLLM Health Monitoring: Monitor VLLM server health status and statistics
  • Container Control: Start, stop, and monitor Docker containers running VLLM servers
  • Database Integration: Persistent storage of model and container information using SQLite
  • CLI Interface: Simple command-line interface with rich table outputs
  • Configuration: Flexible configuration using Hydra for different models and settings

Installation

Prerequisites

  • Python 3.12 or newer
  • Docker
  • NVIDIA GPU (optional, for GPU acceleration)
  • NVIDIA Container Toolkit (for GPU support)

Quick Install (Recommended)

Install VLLMoni using the provided install script:

curl -sSL https://raw.githubusercontent.com/uhh-hcds/vllmonitor/main/install.sh | bash

This will:

  • Install vllmoni via pip
  • Create a user configuration directory at ~/.vllmoni/
  • Set up custom model configuration support
  • Add vllmoni to your PATH

After installation, reload your shell:

source ~/.bashrc  # or ~/.zshrc for zsh users

Custom Model Configurations

After installation, you can add custom model configurations in ~/.vllmoni/conf/model/:

  1. Create a new YAML file, e.g., ~/.vllmoni/conf/model/my_model.yaml:

    model_name: "your-org/your-model-name"
    model_name_short: "my-model"
    gpu_memory_utilization: 0.5
    temperature: 0
    max_tokens: 1000
    max_model_len: 40000
    tensor_parallel_size: 1
    
  2. Use it with:

    vllmoni run model=my_model
    

You can override the config path with the VLLMONI_CONFIG_PATH environment variable:

export VLLMONI_CONFIG_PATH=/path/to/your/config
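
For reference, here is a minimal sketch of how a Hydra-based tool can honor such a variable when composing a config. The resolution order shown is an assumption for illustration, not necessarily VLLMoni's actual logic:

import os
from hydra import compose, initialize_config_dir

# Assumed order: VLLMONI_CONFIG_PATH if set, else ~/.vllmoni/conf.
# Hydra requires an absolute path for config_dir.
config_dir = os.environ.get(
    "VLLMONI_CONFIG_PATH", os.path.expanduser("~/.vllmoni/conf")
)
with initialize_config_dir(config_dir=config_dir, version_base=None):
    cfg = compose(config_name="defaults", overrides=["model=my_model"])
    print(cfg.model.model_name)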

Uninstall

To uninstall VLLMoni:

curl -sSL https://raw.githubusercontent.com/uhh-hcds/vllmonitor/main/uninstall.sh | bash

Install from Source

git clone https://github.com/uhh-hcds/vllmonitor.git
cd vllmonitor
pip install -e .

Quick Start

  1. Initialize the database:

    vllmoni init
    
  2. Run a model:

    vllmoni run model=a100_80gb_pcie/llama_3_1
    
  3. List running models:

    vllmoni ls
    
  4. Monitor with live updates:

    vllmoni ls --interval 2
    
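Once a model is running, it can be queried over VLLM's OpenAI-compatible HTTP API. A minimal sketch, assuming the server listens on port 8000 and serves the Llama 3.1 config from step 2; take the actual URL from the output of vllmoni ls:

import json
import urllib.request

# Port and model name are assumptions; use the URL reported by `vllmoni ls`.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "Say hello in one sentence.",
    "max_tokens": 32,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["text"])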

Usage

Commands

  • vllmoni init [--override]: Initialize the database
  • vllmoni ls [--full] [--interval SECONDS]: List all registered models with health status and request statistics
  • vllmoni run [OVERRIDES] [--follow]: Run a VLLM model in a Docker container
  • vllmoni stop <ID>: Stop a specific model container
  • vllmoni stop-all: Stop all running model containers
  • vllmoni logs <ID>: View logs for a specific model

The ls command displays:

  • Model information and URLs
  • GPU memory usage and utilization
  • VLLM server health status
  • Request statistics (running/waiting requests)
  • Docker container details
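
The request statistics come from the VLLM server itself, which exposes Prometheus-style counters on its /metrics endpoint. A rough sketch of how such values can be scraped; the metric names follow VLLM's vllm:* convention, but the exact client code in VLLMoni may differ:

import urllib.request

# Metric names as they appear in VLLM's /metrics output; verify against your version.
WANTED = ("vllm:num_requests_running", "vllm:num_requests_waiting")

def request_stats(base_url="http://localhost:8000"):
    stats = {}
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith(WANTED):
                name, _, value = line.rpartition(" ")
                stats[name.split("{")[0]] = float(value)
    return stats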

Configuration

Models are configured using YAML files in the conf/model/ directory. The configs are organized by GPU type:

  • conf/model/a100_80gb_pcie/ - Configurations for A100 80GB PCIe GPUs
  • conf/model/h100_nvl/ - Configurations for H100 NVL GPUs
  • conf/model/a40/ - Configurations for A40 GPUs
  • conf/model/rtx_a6000/ - Configurations for RTX A6000 GPUs

Each model config includes:

  • Model name and HuggingFace path
  • GPU memory utilization
  • Generation parameters (temperature, max_tokens, etc.)
  • Quantization settings

Example model config (conf/model/a100_80gb_pcie/llama_3_1.yaml):

model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"
model_name_short: "llama3.1-8B-Instruct"
gpu_memory_utilization: 0.5
temperature: 0
max_tokens: 1000
max_model_len: 40000
tensor_parallel_size: 1

Override settings at runtime:

vllmoni run model=a100_80gb_pcie/llama_3_1 port=8006 devices=0,1
vllmoni run model=h100_nvl/llama_3_1 port=8007 devices=0
vllmoni run model=a40/gemma9b port=8008 devices=1
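
Under the hood, a config plus these overrides has to become a container launch. The sketch below shows one way to do that with the Docker Python SDK (see Architecture); the image name, port mapping, and flag selection are assumptions, though the server flags themselves are real VLLM options:

import docker

def run_model(cfg: dict, port: int = 8006, devices: str = "0"):
    # Image and flag mapping are illustrative, not VLLMoni's actual values.
    client = docker.from_env()
    return client.containers.run(
        "vllm/vllm-openai:latest",
        command=[
            "--model", cfg["model_name"],
            "--gpu-memory-utilization", str(cfg["gpu_memory_utilization"]),
            "--max-model-len", str(cfg["max_model_len"]),
            "--tensor-parallel-size", str(cfg["tensor_parallel_size"]),
        ],
        ports={"8000/tcp": port},
        device_requests=[docker.types.DeviceRequest(
            device_ids=devices.split(","), capabilities=[["gpu"]],
        )],
        detach=True,
    )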

Architecture

  • CLI: Typer-based command-line interface
  • Database: SQLAlchemy with SQLite backend
  • Configuration: Hydra for flexible config management
  • Container Management: Docker Python SDK
  • Monitoring:
    • Real-time GPU stats via nvidia-smi
    • VLLM health and statistics via HTTP API
  • Logging: Structured logging with Rich console output
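
As an illustration of the monitoring layer, GPU stats can be pulled from nvidia-smi's machine-readable CSV mode. The query fields below are standard nvidia-smi options; VLLMoni's actual parsing may differ:

import subprocess

def gpu_stats():
    # --query-gpu with --format=csv,noheader,nounits yields one CSV row per GPU.
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = []
    for row in out.strip().splitlines():
        index, used, total, util = (v.strip() for v in row.split(","))
        stats.append({"gpu": int(index), "mem_used_mib": int(used),
                      "mem_total_mib": int(total), "util_pct": int(util)})
    return stats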

Development

Project Structure

src/
├── cli/           # Command-line interface
├── app/           # Database models and repository
├── container/     # Docker container management
├── vllm_client/   # VLLM API client for health and metrics
├── utils/         # Utilities (logging, settings)
└── tests/         # Unit tests

conf/
├── defaults.yaml  # Default configuration
└── model/         # Model-specific configs
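
The app/ package holds the persistence layer. A hypothetical sketch of what a SQLAlchemy model for a registered container could look like; the column names are illustrative, not the project's actual schema:

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class ModelEntry(Base):
    # Illustrative columns; the real schema may differ.
    __tablename__ = "models"
    id: Mapped[int] = mapped_column(primary_key=True)
    model_name: Mapped[str]
    container_id: Mapped[str]
    port: Mapped[int]

engine = create_engine("sqlite:///vllmoni.db")
Base.metadata.create_all(engine)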

Running Tests

The project includes a comprehensive test suite covering all CLI commands, database operations, and container management.

Install Test Dependencies

pip install -e ".[test]"

Run All Tests

pytest tests/

Run Tests with Coverage

pytest tests/ --cov=src --cov-report=term-missing --cov-report=html

Run Specific Test Files

# Test CLI commands
pytest tests/test_cli.py -v

# Test database operations
pytest tests/test_db.py -v

# Test repository operations
pytest tests/test_repository.py -v

# Test container commands
pytest tests/test_container_cmd.py -v

Test Coverage

The test suite includes:

  • CLI Command Tests: All commands (init, ls, run, stop, stop-all, logs)
  • Database Tests: Database initialization and session management
  • Repository Tests: CRUD operations on model entries
  • Container Tests: Docker command generation and container lifecycle
  • Model Tests: Data model creation and validation

Current coverage: ~77%
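
Since the CLI is built on Typer, new tests can drive commands in-process with Typer's CliRunner. A minimal sketch; the import path for the app object is a guess and should be adjusted to the real module:

from typer.testing import CliRunner

from src.cli import app  # hypothetical import path; adjust to the real one

runner = CliRunner()

def test_ls_exits_cleanly():
    # CliRunner invokes the command in-process and captures its output.
    result = runner.invoke(app, ["ls"])
    assert result.exit_code == 0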

Continuous Integration

Tests are automatically run on every push and pull request via GitHub Actions. The workflow:

  • Runs tests on Python 3.12 and 3.13
  • Generates coverage reports
  • Uploads coverage to Codecov (if configured)

See .github/workflows/tests.yml for the full CI configuration.

Building

python -m build

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • VLLM - Efficient LLM inference
  • Hydra - Configuration management
  • Typer - CLI framework
  • Rich - Terminal formatting

Publishing to PyPI

To publish a new version of VLLMoni to PyPI, follow these steps:

  1. Build the package with uv:
    uv build
    
  2. Publish the package using the UV_PUBLISH_TOKEN from your .env file:
    export $(grep -v '^#' .env | xargs) && uv publish --token $UV_PUBLISH_TOKEN
    
