
A CLI tool for monitoring and managing VLLM inference servers in Docker containers

Project description

VLLMoni

VLLMoni is a command-line tool for monitoring and managing VLLM inference servers running in Docker containers. It provides real-time GPU usage tracking, model management, and container lifecycle control.


Features

  • Model Management: Easily run and stop VLLM models in Docker containers
  • GPU Monitoring: Track GPU memory usage and utilization in real-time
  • Container Control: Start, stop, and monitor Docker containers running VLLM servers
  • Database Integration: Persistent storage of model and container information using SQLite
  • CLI Interface: Simple command-line interface with rich table outputs
  • Configuration: Flexible configuration using Hydra for different models and settings

Installation

Prerequisites

  • Python 3.12 or newer
  • Docker
  • NVIDIA GPU (optional, for GPU acceleration)
  • NVIDIA Container Toolkit (for GPU support)

Quick Install (Recommended)

Install VLLMoni using the provided install script:

curl -sSL https://raw.githubusercontent.com/uhh-hcds/vllmonitor/main/install.sh | bash

This will:

  • Install vllmoni via pip
  • Create a user configuration directory at ~/.vllmoni/
  • Set up custom model configuration support
  • Add vllmoni to your PATH

After installation, reload your shell:

source ~/.bashrc  # or ~/.zshrc for zsh users

Custom Model Configurations

After installation, you can add custom model configurations in ~/.vllmoni/conf/model/:

  1. Create a new YAML file, e.g., ~/.vllmoni/conf/model/my_model.yaml:

    model_name: "your-org/your-model-name"
    model_name_short: "my-model"
    gpu_memory_utilization: 0.5
    temperature: 0
    max_tokens: 1000
    max_model_len: 40000
    tensor_parallel_size: 1
    
  2. Use it with:

    vllmoni run model=my_model
    

You can override the config path with the VLLMONI_CONFIG_PATH environment variable:

export VLLMONI_CONFIG_PATH=/path/to/your/config

Uninstall

To uninstall VLLMoni:

curl -sSL https://raw.githubusercontent.com/uhh-hcds/vllmonitor/main/uninstall.sh | bash

Install from Source

git clone <repository-url>
cd vllmonitor
pip install -e .

Quick Start

  1. Initialize the database:

    vllmoni init
    
  2. Run a model:

    vllmoni run model=a100_80gb_pcie/llama_3_1
    
  3. List running models:

    vllmoni ls
    
  4. Monitor with live updates:

    vllmoni ls --interval 2
    

Usage

Commands

  • vllmoni init [--override]: Initialize the database
  • vllmoni ls [--full] [--interval SECONDS]: List all registered models
  • vllmoni run [OVERRIDES] [--follow]: Run a VLLM model in a Docker container
  • vllmoni stop <ID>: Stop a specific model container
  • vllmoni stop-all: Stop all running model containers
  • vllmoni logs <ID>: View logs for a specific model

Configuration

Models are configured using YAML files in the conf/model/ directory. The configs are organized by GPU type:

  • conf/model/a100_80gb_pcie/ - Configurations for A100 80GB PCIe GPUs
  • conf/model/h100_nvl/ - Configurations for H100 NVL GPUs
  • conf/model/a40/ - Configurations for A40 GPUs
  • conf/model/rtx_a6000/ - Configurations for RTX A6000 GPUs

Each model config includes:

  • Model name and HuggingFace path
  • GPU memory utilization
  • Generation parameters (temperature, max_tokens, etc.)
  • Quantization settings

Example model config (conf/model/a100_80gb_pcie/llama_3_1.yaml):

model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"
model_name_short: "llama3.1-8B-Instruct"
gpu_memory_utilization: 0.5
temperature: 0
max_tokens: 1000
max_model_len: 40000
tensor_parallel_size: 1

Override settings at runtime:

vllmoni run model=a100_80gb_pcie/llama_3_1 port=8006 devices=0,1
vllmoni run model=h100_nvl/llama_3_1 port=8007 devices=0
vllmoni run model=a40/gemma9b port=8008 devices=1
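As a rough illustration, a model config plus runtime overrides could map onto a `docker run` invocation like this (a sketch using the public vllm/vllm-openai image and its flag names; VLLMoni itself drives Docker through the Python SDK, and this helper function is hypothetical):

```python
def build_docker_command(cfg: dict, port: int = 8000, devices: str = "0") -> list[str]:
    """Sketch: turn a model config into a docker run command for the
    vLLM OpenAI-compatible server image."""
    return [
        "docker", "run", "-d",
        "--gpus", f"device={devices}",  # which GPUs to expose
        "-p", f"{port}:8000",           # host port -> server port
        "vllm/vllm-openai",
        "--model", cfg["model_name"],
        "--gpu-memory-utilization", str(cfg["gpu_memory_utilization"]),
        "--max-model-len", str(cfg["max_model_len"]),
        "--tensor-parallel-size", str(cfg["tensor_parallel_size"]),
    ]
```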

Architecture

  • CLI: Typer-based command-line interface
  • Database: SQLAlchemy with SQLite backend
  • Configuration: Hydra for flexible config management
  • Container Management: Docker Python SDK
  • Monitoring: Real-time GPU stats via nvidia-smi
  • Logging: Structured logging with Rich console output
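The monitoring piece can be sketched as a thin wrapper around nvidia-smi's CSV query mode (illustrative only; the function names and exact query fields are assumptions, not VLLMoni's code):

```python
import subprocess

QUERY = "index,name,memory.used,memory.total,utilization.gpu"

def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.

    Each line looks like: "0, NVIDIA A40, 10240, 46068, 35".
    """
    stats = []
    for line in csv_text.strip().splitlines():
        idx, name, used, total, util = (f.strip() for f in line.split(","))
        stats.append({
            "index": int(idx),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return stats

def gpu_stats() -> list[dict]:
    """Query nvidia-smi once and return per-GPU stats."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```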

Development

Project Structure

src/
├── cli/           # Command-line interface
├── app/           # Database models and repository
├── container/     # Docker container management
├── utils/         # Utilities (logging, settings)
└── tests/         # Unit tests

conf/
├── defaults.yaml  # Default configuration
└── model/         # Model-specific configs
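A defaults.yaml for this layout might look roughly like the following (illustrative; only model, port, and devices appear in the examples above, so the exact keys and values are assumptions):

```yaml
# Illustrative defaults.yaml -- key names are assumptions
defaults:
  - model: a100_80gb_pcie/llama_3_1

port: 8000
devices: "0"
```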

Running Tests

The project includes a comprehensive test suite covering all CLI commands, database operations, and container management.

Install Test Dependencies

pip install -e ".[test]"

Run All Tests

pytest tests/

Run Tests with Coverage

pytest tests/ --cov=src --cov-report=term-missing --cov-report=html

Run Specific Test Files

# Test CLI commands
pytest tests/test_cli.py -v

# Test database operations
pytest tests/test_db.py -v

# Test repository operations
pytest tests/test_repository.py -v

# Test container commands
pytest tests/test_container_cmd.py -v

Test Coverage

The test suite includes:

  • CLI Command Tests: All commands (init, ls, run, stop, stop-all, logs)
  • Database Tests: Database initialization and session management
  • Repository Tests: CRUD operations on model entries
  • Container Tests: Docker command generation and container lifecycle
  • Model Tests: Data model creation and validation

Current coverage: ~77%
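As an illustration of what a repository test exercises, here is a minimal CRUD round-trip, sketched with the stdlib sqlite3 module for self-containment (the project itself uses SQLAlchemy, and these table and function names are hypothetical):

```python
import sqlite3

def init_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal models table (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL, container_id TEXT, port INTEGER)"
    )

def add_model(conn: sqlite3.Connection, name: str, container_id: str, port: int) -> int:
    cur = conn.execute(
        "INSERT INTO models (name, container_id, port) VALUES (?, ?, ?)",
        (name, container_id, port),
    )
    conn.commit()
    return cur.lastrowid

def list_models(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute("SELECT id, name, port FROM models ORDER BY id").fetchall()

def remove_model(conn: sqlite3.Connection, model_id: int) -> None:
    conn.execute("DELETE FROM models WHERE id = ?", (model_id,))
    conn.commit()
```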

Continuous Integration

Tests are automatically run on every push and pull request via GitHub Actions. The workflow:

  • Runs tests on Python 3.12 and 3.13
  • Generates coverage reports
  • Uploads coverage to Codecov (if configured)

See .github/workflows/tests.yml for the full CI configuration.
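An equivalent workflow might look roughly like this (illustrative; the real configuration lives in .github/workflows/tests.yml):

```yaml
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[test]"
      - run: pytest tests/ --cov=src --cov-report=xml
```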

Building

python -m build

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • VLLM - Efficient LLM inference
  • Hydra - Configuration management
  • Typer - CLI framework
  • Rich - Terminal formatting

Publishing to PyPI

To publish a new version of VLLMoni to PyPI, follow these steps:

  1. Build the package with uv:
    uv build
    
  2. Publish the package using your UV_PUBLISH_TOKEN:
    export $(grep -v '^#' .env | xargs) && uv publish --token $UV_PUBLISH_TOKEN
    

Project details


Download files

Download the file for your platform.

Source Distribution

vllmoni-0.1.5.2.tar.gz (67.4 kB)

Uploaded Source

Built Distribution


vllmoni-0.1.5.2-py3-none-any.whl (29.7 kB)

Uploaded Python 3

File details

Details for the file vllmoni-0.1.5.2.tar.gz.

File metadata

  • Download URL: vllmoni-0.1.5.2.tar.gz
  • Size: 67.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vllmoni-0.1.5.2.tar.gz:

  • SHA256: 21e6f20bc1e05fe6299bda97add3ac73a446375fa94f660137a6b0f981f74a0d
  • MD5: ff0af474c36f195ea6686b22084277cb
  • BLAKE2b-256: b0ea5bedec2c45035154d0336c1cf6115fe0689374616a19777f1adaf169dfcc


Provenance

The following attestation bundles were made for vllmoni-0.1.5.2.tar.gz:

Publisher: publish.yml on uhh-hcds/vllmonitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vllmoni-0.1.5.2-py3-none-any.whl.

File metadata

  • Download URL: vllmoni-0.1.5.2-py3-none-any.whl
  • Size: 29.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vllmoni-0.1.5.2-py3-none-any.whl:

  • SHA256: 2b9f88484bd508572b36a1c4965d0b338513620278af194673d2818960109122
  • MD5: e52903e19e9768ed427338797218239f
  • BLAKE2b-256: c950077400127dcf1bddced7dbbaca6b4efe1d54bac680914db491adccb83112


Provenance

The following attestation bundles were made for vllmoni-0.1.5.2-py3-none-any.whl:

Publisher: publish.yml on uhh-hcds/vllmonitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
