A CLI tool for monitoring and managing VLLM inference servers in Docker containers

VLLMoni

VLLMoni is a command-line tool for monitoring and managing VLLM inference servers running in Docker containers. It provides real-time GPU usage tracking, model management, and container lifecycle control.

Features

Model Management: Easily run and stop VLLM models in Docker containers
GPU Monitoring: Track GPU memory usage and utilization in real-time
Container Control: Start, stop, and monitor Docker containers running VLLM servers
Database Integration: Persistent storage of model and container information using SQLite
CLI: Simple command-line interface with Rich-formatted table output
Configuration: Flexible configuration using Hydra for different models and settings
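The Database Integration feature is described only at a high level here. As a rough illustration of the bookkeeping involved, the sketch below keeps a registry of launched model containers using Python's built-in `sqlite3` module instead of SQLAlchemy (which the Architecture section says VLLMoni actually uses). The table name, columns, and helper functions are hypothetical and do not reflect VLLMoni's real schema.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical schema: one row per launched model container.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    model_name   TEXT    NOT NULL,
    container_id TEXT    UNIQUE,
    port         INTEGER NOT NULL,
    devices      TEXT    NOT NULL,
    status       TEXT    NOT NULL DEFAULT 'running',
    created_at   TEXT    NOT NULL
)
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def register_model(conn: sqlite3.Connection, model_name: str,
                   container_id: str, port: int, devices: str) -> int:
    """Record a newly started container and return its registry ID."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO models (model_name, container_id, port, devices, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (model_name, container_id, port, devices,
             datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def mark_stopped(conn: sqlite3.Connection, model_id: int) -> None:
    """Flag a model as stopped without deleting its history."""
    with conn:
        conn.execute("UPDATE models SET status = 'stopped' WHERE id = ?", (model_id,))


def list_models(conn: sqlite3.Connection) -> List[Tuple]:
    """Return the rows an `ls`-style table would display."""
    return conn.execute(
        "SELECT id, model_name, port, devices, status FROM models ORDER BY id"
    ).fetchall()
```

Passing a file path such as `"vllmoni.db"` to `init_db` instead of the in-memory default makes the registry persist between CLI invocations.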

Installation

Prerequisites

Python 3.13+
Docker
NVIDIA GPU (optional, for GPU acceleration)
NVIDIA Container Toolkit (for GPU support)
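A quick way to check these prerequisites before installing is sketched below. It is a hypothetical helper, not part of VLLMoni. It only checks the interpreter version and looks for executables on `PATH` (`nvidia-ctk` is the CLI installed by recent NVIDIA Container Toolkit releases); it does not confirm that the Docker daemon is running or that GPU passthrough into containers works.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Hypothetical helper: report which prerequisites appear to be installed.

    Only checks the Python version and whether executables are on PATH;
    it does not test the Docker daemon or actual GPU access.
    """
    return {
        "python>=3.13": sys.version_info >= (3, 13),
        "docker": shutil.which("docker") is not None,
        # nvidia-smi ships with the NVIDIA driver.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        # nvidia-ctk is the CLI of recent NVIDIA Container Toolkit releases.
        "nvidia-ctk": shutil.which("nvidia-ctk") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'missing'}  {name}")
```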

Install from Source

```
git clone <repository-url>
cd vllmonitor
pip install -e .
```

Quick Start

Initialize the database:
```
vllmoni init
```
Run a model:
```
vllmoni run model=llama_3_1
```
List running models:
```
vllmoni ls
```
Monitor with live updates (refreshing every 2 seconds):
```
vllmoni ls --interval 2
```
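`--interval` turns `ls` into a periodically refreshing view. A minimal sketch of such a refresh loop is shown below, assuming a hypothetical `render` callable that returns the table as a string; it is not VLLMoni's implementation. Rich, which VLLMoni uses for console output, also provides `rich.live.Live` for flicker-free updates.

```python
import time
from typing import Callable, Optional

CLEAR_SCREEN = "\033[2J\033[H"  # ANSI: clear the screen, move cursor to top-left


def watch(
    render: Callable[[], str],
    interval: float,
    iterations: Optional[int] = None,
    emit: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Redraw render() every `interval` seconds until Ctrl+C or `iterations` frames."""
    drawn = 0
    try:
        while iterations is None or drawn < iterations:
            emit(CLEAR_SCREEN + render())
            drawn += 1
            # Skip the trailing sleep after the final frame.
            if iterations is None or drawn < iterations:
                sleep(interval)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the watch loop cleanly
```

`emit` and `sleep` are injectable so the loop can be exercised in tests without a terminal or real delays.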

Usage

Commands

vllmoni init [--override]: Initialize the database
vllmoni ls [--full] [--interval SECONDS]: List all registered models
vllmoni run [OVERRIDES] [--follow]: Run a VLLM model in a Docker container
vllmoni stop <ID>: Stop a specific model container
vllmoni stop-all: Stop all running model containers
vllmoni logs <ID>: View logs for a specific model

Configuration

Models are configured using YAML files in the conf/model/ directory. Each model config includes:

Model name and HuggingFace path
GPU memory utilization
Generation parameters (temperature, max_tokens, etc.)
Quantization settings

Example model config (conf/model/llama_3_1.yaml):

```
model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"
model_name_short: "llama3.1-8B-Instruct"
gpu_memory_utilization: 0.5
temperature: 0
max_tokens: 1000
max_model_len: 40000
tensor_parallel_size: 1
```
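Several of these fields correspond directly to vLLM server options. The sketch below shows one plausible way to turn such a config into a `docker run` command for the official `vllm/vllm-openai` image. It is an assumption for illustration, not how VLLMoni builds its containers; the function name, the image tag, and the `--ipc=host` choice are hypothetical defaults.

```python
import shlex
from typing import Any, List, Mapping, Optional, Sequence

# Port the vLLM OpenAI-compatible server listens on inside the container.
CONTAINER_PORT = 8000


def build_docker_command(
    cfg: Mapping[str, Any],
    port: int,
    devices: Sequence[int],
    image: str = "vllm/vllm-openai:latest",
    name: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: turn a model config into a `docker run` argv list."""
    if cfg["tensor_parallel_size"] > len(devices):
        raise ValueError("tensor_parallel_size cannot exceed the number of GPUs")
    device_list = ",".join(str(d) for d in devices)
    cmd = [
        "docker", "run", "-d",
        # Docker parses the --gpus value as CSV, so a multi-GPU list needs literal quotes.
        "--gpus", f'"device={device_list}"',
        "-p", f"{port}:{CONTAINER_PORT}",
        "--ipc=host",  # PyTorch inside vLLM uses shared memory between processes
    ]
    if name:
        cmd += ["--name", name]
    # Arguments after the image name are passed to the vLLM server entrypoint.
    cmd += [
        image,
        "--model", cfg["model_name"],
        "--served-model-name", cfg["model_name_short"],
        "--gpu-memory-utilization", str(cfg["gpu_memory_utilization"]),
        "--max-model-len", str(cfg["max_model_len"]),
        "--tensor-parallel-size", str(cfg["tensor_parallel_size"]),
    ]
    return cmd


if __name__ == "__main__":
    example = {
        "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "model_name_short": "llama3.1-8B-Instruct",
        "gpu_memory_utilization": 0.5,
        "max_model_len": 40000,
        "tensor_parallel_size": 1,
    }
    print(shlex.join(build_docker_command(example, port=8006, devices=[0])))
```

`temperature` and `max_tokens` are left out because vLLM's OpenAI-compatible server normally receives sampling parameters with each request. Gated models such as Llama 3.1 also need a Hugging Face token (for example `-e HF_TOKEN=...`) and usually a mounted model cache.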

Override settings at runtime:

```
vllmoni run model=llama_3_1 port=8006 devices=0,1
```
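Runtime overrides use Hydra's `key=value` syntax. To show the idea, here is a deliberately simplified parser that turns arguments like the ones above into a dictionary. It is a stand-in, not Hydra: Hydra's real override grammar is richer (nested keys, bracketed lists, quoting, and sweeps over comma-separated values in multirun mode), so the way VLLMoni actually interprets `devices=0,1` may differ. The function names are hypothetical.

```python
import re
from typing import Dict, List, Union

Scalar = Union[int, float, str]
Value = Union[Scalar, List[Scalar]]

# Strict patterns so strings like "3_1" are not read as numbers.
_INT = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?")


def _coerce_scalar(raw: str) -> Scalar:
    """Convert numeric-looking strings; leave everything else as text."""
    if _INT.fullmatch(raw):
        return int(raw)
    if _FLOAT.fullmatch(raw):
        return float(raw)
    return raw


def parse_overrides(args: List[str]) -> Dict[str, Value]:
    """Simplified stand-in for Hydra overrides: key=value, commas make a list."""
    result: Dict[str, Value] = {}
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        if "," in raw:
            result[key] = [_coerce_scalar(part) for part in raw.split(",")]
        else:
            result[key] = _coerce_scalar(raw)
    return result
```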

Architecture

CLI: Typer-based command-line interface
Database: SQLAlchemy with SQLite backend
Configuration: Hydra for flexible config management
Container Management: Docker Python SDK
Monitoring: Real-time GPU stats via nvidia-smi
Logging: Structured logging with Rich console output
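The monitoring layer reads GPU stats from `nvidia-smi`. One way to do that is its CSV query mode, sketched below. The `--query-gpu` fields and `--format=csv,noheader,nounits` are standard `nvidia-smi` options, but the `GpuStats` class and the function names are hypothetical and not taken from VLLMoni's source. With `nounits`, memory is reported in MiB and utilization in percent.

```python
import csv
import io
import subprocess
from dataclasses import dataclass
from typing import List, Optional

QUERY_FIELDS = "index,name,memory.used,memory.total,utilization.gpu"


@dataclass
class GpuStats:  # hypothetical container for one GPU's readings
    index: int
    name: str
    memory_used_mib: int
    memory_total_mib: int
    utilization_pct: Optional[int]  # None when nvidia-smi prints e.g. "[N/A]"

    @property
    def memory_fraction(self) -> float:
        return self.memory_used_mib / self.memory_total_mib


def _int_or_none(value: str) -> Optional[int]:
    # nvidia-smi uses bracketed placeholders such as "[N/A]" for unsupported fields.
    return None if value.startswith("[") else int(value)


def parse_gpu_csv(text: str) -> List[GpuStats]:
    """Parse output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    stats = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        index, name, used, total, util = row
        stats.append(GpuStats(int(index), name, int(used), int(total), _int_or_none(util)))
    return stats


def query_gpus() -> List[GpuStats]:
    """Run nvidia-smi once; raises if it is missing or exits with an error."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Keeping `parse_gpu_csv` separate from `query_gpus` lets the parsing be unit-tested on machines without a GPU, using made-up sample output.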

Development

Project Structure

```
src/
├── cli/           # Command-line interface
├── app/           # Database models and repository
├── container/     # Docker container management
├── utils/         # Utilities (logging, settings)
└── tests/         # Unit tests

conf/
├── defaults.yaml  # Default configuration
└── model/         # Model-specific configs
```

Running Tests

```
python -m pytest tests/
```

Building

```
python -m build
```

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

[Add your license here]

Acknowledgments

VLLM - Efficient LLM inference
Hydra - Configuration management
Typer - CLI framework
Rich - Terminal formatting

Release history

0.1.6.2 (Jan 20, 2026)
0.1.5.3 (Jan 20, 2026)
0.1.5.2 (Jan 20, 2026)
0.1.4 (Jan 20, 2026)
0.1.0 (Jan 12, 2026; this version)

Download files

Source Distribution

vllmoni-0.1.0.tar.gz (30.1 kB), uploaded Jan 12, 2026

Built Distribution

vllmoni-0.1.0-py3-none-any.whl (13.6 kB), uploaded Jan 12, 2026, Python 3

File details

vllmoni-0.1.0.tar.gz

Upload date: Jan 12, 2026
Size: 30.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.14

File hashes
SHA256: `252e42ecd2469e9f9529ec176cb35f7f42e41ff70cb2cbce7fec21270d256186`
MD5: `5dc567320876a7c3304f0b520cf54e4b`
BLAKE2b-256: `859f60fd81c93100e338f7ce9a6dafeb8be1f51511c0a3390cfa6370d7f7a29a`

vllmoni-0.1.0-py3-none-any.whl

Upload date: Jan 12, 2026
Size: 13.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.14

File hashes
SHA256: `8995c78c4324a9e26cc1a2913b38538c8ecc10c4b7f845c48c4492d871953f16`
MD5: `e73ee638bb50e33ce596d2331dd781be`
BLAKE2b-256: `a7b0b12ace36a63152ca7fbd50d1d496303c98221fe1d003aad6d9e1fccd0d2c`
