VLLMoni
A CLI tool for monitoring and managing vLLM inference servers in Docker containers.
VLLMoni is a command-line tool for monitoring and managing vLLM inference servers running in Docker containers. It provides real-time GPU usage tracking, model management, and container lifecycle control.
Features
- Model Management: Easily run and stop vLLM models in Docker containers
- GPU Monitoring: Track GPU memory usage and utilization in real time
- Container Control: Start, stop, and monitor Docker containers running vLLM servers
- Database Integration: Persistent storage of model and container information using SQLite
- CLI Interface: Simple command-line interface with rich table outputs
- Configuration: Flexible configuration using Hydra for different models and settings
Installation
Prerequisites
- Python 3.12+
- Docker
- NVIDIA GPU (optional, for GPU acceleration)
- NVIDIA Container Toolkit (for GPU support)
Quick Install (Recommended)
Install VLLMoni system-wide using curl:
curl -sSL https://raw.githubusercontent.com/uhh-hcds/vllmonitor/main/install.sh | bash
This will:
- Install vllmoni via pip
- Create a user configuration directory at ~/.vllmoni/
- Set up custom model configuration support
- Add vllmoni to your PATH
After installation, reload your shell:
source ~/.bashrc # or ~/.zshrc for zsh users
Custom Model Configurations
After installation, you can add custom model configurations in ~/.vllmoni/conf/model/:
- Create a new YAML file, e.g., ~/.vllmoni/conf/model/my_model.yaml:
model_name: "your-org/your-model-name"
model_name_short: "my-model"
gpu_memory_utilization: 0.5
temperature: 0
max_tokens: 1000
max_model_len: 40000
tensor_parallel_size: 1
- Use it with:
vllmoni run model=my_model
You can override the config path with the VLLMONI_CONFIG_PATH environment variable:
export VLLMONI_CONFIG_PATH=/path/to/your/config
Uninstall
To uninstall VLLMoni:
curl -sSL https://raw.githubusercontent.com/uhh-hcds/vllmonitor/main/uninstall.sh | bash
Install from Source
git clone <repository-url>
cd vllmonitor
pip install -e .
Quick Start
- Initialize the database:
vllmoni init
- Run a model:
vllmoni run model=a100_80gb_pcie/llama_3_1
- List running models:
vllmoni ls
- Monitor with live updates:
vllmoni ls --interval 2
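Once a model is running, vLLM exposes an OpenAI-compatible HTTP API. A minimal sketch of building a chat completion request against it (the port and model name below follow the examples in this README and are assumptions about your deployment):

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat completion request for a running vLLM server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    }).encode()
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(
    "http://localhost:8006",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "Hello!",
)
# response = request.urlopen(req)  # requires a running server
```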
Usage
Commands
- vllmoni init [--override]: Initialize the database
- vllmoni ls [--full] [--interval SECONDS]: List all registered models
- vllmoni run [OVERRIDES] [--follow]: Run a vLLM model in a Docker container
- vllmoni stop <ID>: Stop a specific model container
- vllmoni stop-all: Stop all running model containers
- vllmoni logs <ID>: View logs for a specific model
Configuration
Models are configured using YAML files in the conf/model/ directory. The configs are organized by GPU type:
- conf/model/a100_80gb_pcie/ - Configurations for A100 80GB PCIe GPUs
- conf/model/h100_nvl/ - Configurations for H100 NVL GPUs
- conf/model/a40/ - Configurations for A40 GPUs
- conf/model/rtx_a6000/ - Configurations for RTX A6000 GPUs
Each model config includes:
- Model name and HuggingFace path
- GPU memory utilization
- Generation parameters (temperature, max_tokens, etc.)
- Quantization settings
Example model config (conf/model/a100_80gb_pcie/llama_3_1.yaml):
model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"
model_name_short: "llama3.1-8B-Instruct"
gpu_memory_utilization: 0.5
temperature: 0
max_tokens: 1000
max_model_len: 40000
tensor_parallel_size: 1
Override settings at runtime:
vllmoni run model=a100_80gb_pcie/llama_3_1 port=8006 devices=0,1
vllmoni run model=h100_nvl/llama_3_1 port=8007 devices=0
vllmoni run model=a40/gemma9b port=8008 devices=1
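Under the hood, VLLMoni turns these configs into Docker container launches via the Docker Python SDK. The mapping below is an illustrative sketch, not the tool's actual code; the image name (vllm/vllm-openai), internal port 8000, and flag set are assumptions based on vLLM's standard OpenAI server image:

```python
def docker_args_sketch(cfg: dict, port: int, devices: str) -> list[str]:
    """Illustrative translation of a model config into docker run arguments."""
    return [
        "docker", "run", "--rm",
        "--gpus", f'"device={devices}"',
        "-p", f"{port}:8000",             # host port -> vLLM's default port
        "vllm/vllm-openai",               # assumed image name
        "--model", cfg["model_name"],
        "--gpu-memory-utilization", str(cfg["gpu_memory_utilization"]),
        "--max-model-len", str(cfg["max_model_len"]),
        "--tensor-parallel-size", str(cfg["tensor_parallel_size"]),
    ]

cfg = {
    "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "gpu_memory_utilization": 0.5,
    "max_model_len": 40000,
    "tensor_parallel_size": 1,
}
print(" ".join(docker_args_sketch(cfg, port=8006, devices="0,1")))
```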
Architecture
- CLI: Typer-based command-line interface
- Database: SQLAlchemy with SQLite backend
- Configuration: Hydra for flexible config management
- Container Management: Docker Python SDK
- Monitoring: Real-time GPU stats via nvidia-smi
- Logging: Structured logging with Rich console output
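The GPU stats come from polling nvidia-smi. A sketch of how such a poll can be parsed (the --query-gpu and --format flags are real nvidia-smi options; the surrounding functions are illustrative, not VLLMoni's actual code):

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"]

def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Parse one nvidia-smi CSV snapshot into per-GPU dicts (MiB / percent)."""
    stats = []
    for line in csv_text.strip().splitlines():
        idx, used, total, util = (f.strip() for f in line.split(","))
        stats.append({"index": int(idx), "mem_used_mib": int(used),
                      "mem_total_mib": int(total), "util_pct": int(util)})
    return stats

def poll_gpus() -> list[dict]:
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)

# Example snapshot in the format nvidia-smi prints with the flags above:
sample = "0, 40960, 81920, 35\n1, 512, 81920, 0\n"
print(parse_gpu_stats(sample)[0]["util_pct"])  # 35
```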
Development
Project Structure
src/
├── cli/ # Command-line interface
├── app/ # Database models and repository
├── container/ # Docker container management
├── utils/ # Utilities (logging, settings)
└── tests/ # Unit tests
conf/
├── defaults.yaml # Default configuration
└── model/ # Model-specific configs
Running Tests
The project includes a comprehensive test suite covering all CLI commands, database operations, and container management.
Install Test Dependencies
pip install -e ".[test]"
Run All Tests
pytest tests/
Run Tests with Coverage
pytest tests/ --cov=src --cov-report=term-missing --cov-report=html
Run Specific Test Files
# Test CLI commands
pytest tests/test_cli.py -v
# Test database operations
pytest tests/test_db.py -v
# Test repository operations
pytest tests/test_repository.py -v
# Test container commands
pytest tests/test_container_cmd.py -v
Test Coverage
The test suite includes:
- CLI Command Tests: All commands (init, ls, run, stop, stop-all, logs)
- Database Tests: Database initialization and session management
- Repository Tests: CRUD operations on model entries
- Container Tests: Docker command generation and container lifecycle
- Model Tests: Data model creation and validation
Current coverage: ~77%
Continuous Integration
Tests are automatically run on every push and pull request via GitHub Actions. The workflow:
- Runs tests on Python 3.12 and 3.13
- Generates coverage reports
- Uploads coverage to Codecov (if configured)
See .github/workflows/tests.yml for the full CI configuration.
Building
python -m build
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- VLLM - Efficient LLM inference
- Hydra - Configuration management
- Typer - CLI framework
- Rich - Terminal formatting
Publishing to PyPI
To publish a new version of VLLMoni to PyPI:
- Build the package with uv:
uv build
- Publish the package using your UV_PUBLISH_TOKEN:
export $(grep -v '^#' .env | xargs) && uv publish --token $UV_PUBLISH_TOKEN