A CLI tool for monitoring and managing vLLM inference servers in Docker containers

VLLMoni

VLLMoni is a command-line tool for monitoring and managing vLLM inference servers running in Docker containers. It provides real-time GPU usage tracking, model management, and container lifecycle control.

Features

  • Model Management: Easily run and stop vLLM models in Docker containers
  • GPU Monitoring: Track GPU memory usage and utilization in real-time
  • Container Control: Start, stop, and monitor Docker containers running vLLM servers
  • Database Integration: Persistent storage of model and container information using SQLite
  • CLI Interface: Simple command-line interface with rich table outputs
  • Configuration: Flexible configuration using Hydra for different models and settings
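
To make the persistent-storage feature concrete, here is a minimal sketch of a model/container registry backed by SQLite. It uses Python's standard `sqlite3` module rather than SQLAlchemy (which VLLMoni uses), and the table name, columns, and function names are hypothetical, not VLLMoni's actual schema.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) registry table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS models (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            model_name TEXT NOT NULL,
            container_id TEXT NOT NULL,
            port INTEGER NOT NULL,
            status TEXT NOT NULL DEFAULT 'running'
        )
        """
    )
    conn.commit()


def register_model(conn: sqlite3.Connection, model_name: str,
                   container_id: str, port: int) -> int:
    """Record a newly started container and return its registry ID."""
    cur = conn.execute(
        "INSERT INTO models (model_name, container_id, port) VALUES (?, ?, ?)",
        (model_name, container_id, port),
    )
    conn.commit()
    return cur.lastrowid


def mark_stopped(conn: sqlite3.Connection, model_id: int) -> None:
    """Flag a registry entry as stopped instead of deleting it."""
    conn.execute("UPDATE models SET status = 'stopped' WHERE id = ?", (model_id,))
    conn.commit()


def list_running(conn: sqlite3.Connection) -> list[tuple]:
    """Return (id, model_name, port) for every entry still marked running."""
    return conn.execute(
        "SELECT id, model_name, port FROM models "
        "WHERE status = 'running' ORDER BY id"
    ).fetchall()
```

With a registry like this, a command such as `vllmoni stop <ID>` could look up the container by its registry ID, although how VLLMoni resolves IDs internally is not documented here.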

Installation

Prerequisites

  • Python 3.13+
  • Docker
  • NVIDIA GPU (optional, for GPU acceleration)
  • NVIDIA Container Toolkit (for GPU support)

Install from Source

git clone <repository-url>
cd vllmonitor
pip install -e .

Quick Start

  1. Initialize the database:

    vllmoni init
    
  2. Run a model:

    vllmoni run model=llama_3_1
    
  3. List running models:

    vllmoni ls
    
  4. Monitor with live updates:

    vllmoni ls --interval 2
    

Usage

Commands

  • vllmoni init [--override]: Initialize the database
  • vllmoni ls [--full] [--interval SECONDS]: List all registered models
  • vllmoni run [OVERRIDES] [--follow]: Run a vLLM model in a Docker container
  • vllmoni stop <ID>: Stop a specific model container
  • vllmoni stop-all: Stop all running model containers
  • vllmoni logs <ID>: View logs for a specific model

Configuration

Models are configured using YAML files in the conf/model/ directory. Each model config includes:

  • Model name and HuggingFace path
  • GPU memory utilization
  • Generation parameters (temperature, max_tokens, etc.)
  • Quantization settings

Example model config (conf/model/llama_3_1.yaml):

model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"
model_name_short: "llama3.1-8B-Instruct"
gpu_memory_utilization: 0.5
temperature: 0
max_tokens: 1000
max_model_len: 40000
tensor_parallel_size: 1
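
As an illustration of how such a config might reach the server, the sketch below mirrors the YAML fields in a dataclass and maps the server-level ones to vLLM command-line flags. The flags shown (`--model`, `--gpu-memory-utilization`, `--max-model-len`, `--tensor-parallel-size`) are vLLM engine arguments. The `ModelConfig` class, the `to_vllm_args` helper, and the mapping itself are assumptions for illustration, not VLLMoni's actual code.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical in-memory form of a conf/model/*.yaml file."""
    model_name: str
    model_name_short: str
    gpu_memory_utilization: float = 0.5
    temperature: float = 0.0
    max_tokens: int = 1000
    max_model_len: int = 40000
    tensor_parallel_size: int = 1


def to_vllm_args(cfg: ModelConfig) -> list[str]:
    """Translate the server-level fields into vLLM CLI arguments.

    temperature and max_tokens are left out on purpose: in vLLM they are
    sampling parameters sent with each request, not server start-up flags.
    """
    if not 0.0 < cfg.gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "--model", cfg.model_name,
        "--gpu-memory-utilization", str(cfg.gpu_memory_utilization),
        "--max-model-len", str(cfg.max_model_len),
        "--tensor-parallel-size", str(cfg.tensor_parallel_size),
    ]
```

Validating `gpu_memory_utilization` before the container starts surfaces a bad value immediately instead of as a failure inside the container logs.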

Override settings at runtime:

vllmoni run model=llama_3_1 port=8006 devices=0,1
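
The `key=value` form above is Hydra's override syntax, and Hydra handles the parsing itself. Purely to illustrate the idea, the simplified sketch below splits such arguments into a dictionary and layers them over defaults. The function names and the default values are hypothetical, and Hydra's real grammar (nested keys, lists, sweeps, type conversion) is considerably richer than this.

```python
def parse_overrides(args: list[str]) -> dict[str, str]:
    """Split 'key=value' arguments into a dict; values stay as raw strings."""
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a new dict in which overrides take precedence over defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical defaults, standing in for conf/defaults.yaml.
defaults = {"model": "llama_3_1", "port": "8000", "devices": "0"}
settings = apply_overrides(
    defaults, parse_overrides(["model=llama_3_1", "port=8006", "devices=0,1"])
)
# Interpret the comma-separated device string as a list of GPU indices.
device_ids = [int(d) for d in settings["devices"].split(",")]
```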

Architecture

  • CLI: Typer-based command-line interface
  • Database: SQLAlchemy with SQLite backend
  • Configuration: Hydra for flexible config management
  • Container Management: Docker Python SDK
  • Monitoring: Real-time GPU stats via nvidia-smi
  • Logging: Structured logging with Rich console output
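
The monitoring component can be sketched as a thin wrapper around `nvidia-smi`'s CSV query mode. This is a minimal example, assuming the four fields below are the ones of interest. The function names and the returned dictionary keys are hypothetical, and VLLMoni's own implementation may query different fields.

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Parse 'csv,noheader,nounits' output into one dict per GPU.

    Assumes every field is numeric; values such as '[N/A]' are not handled.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        index, used, total, util = (field.strip() for field in line.split(","))
        stats.append({
            "index": int(index),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return stats


def query_gpu_stats() -> list[dict]:
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be tested on machines without a GPU. A live view such as `vllmoni ls --interval 2` could then call `query_gpu_stats()` on each refresh.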

Development

Project Structure

src/
├── cli/           # Command-line interface
├── app/           # Database models and repository
├── container/     # Docker container management
├── utils/         # Utilities (logging, settings)
└── tests/         # Unit tests

conf/
├── defaults.yaml  # Default configuration
└── model/         # Model-specific configs

Running Tests

python -m pytest tests/

Building

python -m build

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

[Add your license here]

Acknowledgments

  • vLLM - Efficient LLM inference
  • Hydra - Configuration management
  • Typer - CLI framework
  • Rich - Terminal formatting
