
vLLM Server Manager

A comprehensive Python package for managing vLLM API server lifecycle with proper logging, process management, and multi-server support.

Features

  • Two Startup Modes: Parameterized configuration or full command execution
  • Comprehensive Logging: Automatic log files with model name, date, and server identifier
  • Graceful Shutdown: Proper process termination with timeout and force kill fallback
  • Health Monitoring: Built-in health check and readiness detection
  • Multi-Server Management: Cluster support for running multiple instances
  • Auto-Restart: Automatic server restart on failure (optional)
  • CUDA Management: GPU device selection and resource management
  • Context Manager Support: Clean resource management with the with statement

Installation

# Install the manager (published on PyPI as vllm_manager)
pip install vllm_manager

# vLLM itself must also be installed
pip install vllm

# Import the manager in your code
from vllm_manager import VLLMManager

Quick Start

Basic Usage

from vllm_manager import VLLMManager

# Create manager and start server
manager = VLLMManager(model="facebook/opt-125m")
manager.start(host="0.0.0.0", port=8000)

# Check status
print(manager.get_status())
# {'running': True, 'pid': 12345, 'model': 'facebook/opt-125m', ...}

# Wait for server to be ready
manager.wait_for_ready(timeout=60)

# Stop when done
manager.stop()

Parameterized Startup

manager = VLLMManager(model="meta-llama/Llama-2-7b-hf")

manager.start(
    host="0.0.0.0",
    port=8000,
    tensor_parallel_size=2,      # Multi-GPU
    gpu_memory_utilization=0.9,
    max_model_len=4096,
    dtype="float16",
    quantization="awq",          # Optional: AWQ quantization
    api_key="your-api-key",      # Optional: API authentication
)

Full Command Startup

manager = VLLMManager()

# Start with complete command
manager.start_command(
    "vllm serve facebook/opt-125m --port 8000 --tensor-parallel-size 2"
)

# Or as list
manager.start_command([
    "vllm", "serve", "facebook/opt-125m",
    "--port", "8000",
    "--host", "0.0.0.0"
])

Context Manager

from vllm_manager import VLLMManager

with VLLMManager(model="facebook/opt-125m") as manager:
    manager.start(port=8000)
    # Server runs here
    # Automatically stopped when exiting context

Convenience Function

from vllm_manager import serve

# Quick one-liner
manager = serve("facebook/opt-125m", port=8001)
# ... use the server ...
manager.stop()

Advanced Features

Multi-Server Cluster

from vllm_manager import VLLMCluster

# Create cluster
cluster = VLLMCluster()

# Add multiple servers
cluster.add_server("small", model="facebook/opt-125m", port=8001)
cluster.add_server("medium", model="facebook/opt-350m", port=8002, auto_restart=True)

# Start all
results = cluster.start_all()

# Health check
health = cluster.health_check()

# Get status
status = cluster.get_status()

# Stop all
cluster.stop_all()

Configuration Persistence

from vllm_manager import VLLMCluster

# Save configuration
cluster = VLLMCluster()
cluster.add_server("model1", model="facebook/opt-125m", port=8001)
cluster.save_config("config.json")

# Load configuration
cluster = VLLMCluster.load_config("config.json")

GPU Selection

from vllm_manager import VLLMManager

# Select specific GPUs
manager = VLLMManager(model="facebook/opt-125m")
manager.start(
    port=8000,
    cuda_devices=[0, 1],  # Use GPUs 0 and 1
    tensor_parallel_size=2,
)

Health Monitoring

from vllm_manager import VLLMManager, health_monitor
import threading

manager = VLLMManager(model="facebook/opt-125m")
manager.start(port=8000)

# Start health monitoring in background thread
monitor_thread = threading.Thread(
    target=health_monitor,
    args=(manager,),
    kwargs={"interval": 30, "max_failures": 3},
    daemon=True
)
monitor_thread.start()

API Reference

VLLMManager

Initialization

VLLMManager(
    model: Optional[str] = None,       # Model name/path
    log_dir: Optional[Path] = None,    # Log directory
    server_id: Optional[str] = None,   # Custom server ID
)

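For illustration, a minimal instantiation sketch using all three parameters (the server_id value below is hypothetical; as described under Log Files, it ends up in the log file name):

from pathlib import Path
from vllm_manager import VLLMManager

manager = VLLMManager(
    model="facebook/opt-125m",
    log_dir=Path("/var/log/vllm"),   # defaults to ./vllm_logs/ when omitted
    server_id="opt125m-a",           # hypothetical custom ID; appears in the log file name
)
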
Methods

  • start(**kwargs) - Start server with parameters
  • start_command(cmd) - Start server with full command
  • stop(timeout=30, force=False) - Stop server gracefully
  • is_running() - Check if server is running
  • get_pid() - Get server process ID
  • get_uptime() - Get server uptime string
  • wait_for_ready(timeout=60) - Wait for server readiness
  • get_log_file() - Get log file path
  • get_status() - Get comprehensive status dict

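A minimal sketch stringing several of these methods together (it assumes get_pid(), get_uptime(), and get_log_file() return printable values, as their descriptions above suggest):

from vllm_manager import VLLMManager

manager = VLLMManager(model="facebook/opt-125m")
manager.start(port=8000)

if manager.wait_for_ready(timeout=60):
    print(manager.get_pid())       # process ID of the server
    print(manager.get_uptime())    # human-readable uptime
    print(manager.get_log_file())  # path to the per-server log file
    print(manager.get_status())    # comprehensive status dict

manager.stop(timeout=30)
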
Start Parameters

Parameter                Type       Default    Description
model                    str        -          Model name or path
host                     str        "0.0.0.0"  Bind address
port                     int        8000       Port number
tensor_parallel_size     int        1          GPU count for parallelism
dtype                    str        "auto"     Data type (auto, float16, etc.)
max_model_len            int        None       Maximum context length
gpu_memory_utilization   float      0.9        GPU memory ratio
swap_space               int        4          CPU swap space (GB)
quantization             str        None       Quantization method (awq, gptq, etc.)
api_key                  str        None       API authentication key
cuda_devices             List[int]  None       GPU device IDs
additional_args          List[str]  []         Extra command-line arguments

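Most of these parameters correspond to vllm serve options of the same name; additional_args is a pass-through for anything not covered. A short sketch, assuming the extra arguments are appended verbatim to the serve command (--enable-prefix-caching is just one example of a standard vLLM flag):

manager.start(
    port=8000,
    max_model_len=4096,
    quantization="awq",
    additional_args=["--enable-prefix-caching"],  # forwarded to `vllm serve` as-is
)
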
VLLMCluster

Methods

  • add_server(name, model, port, **kwargs) - Add server instance
  • start_server(name) - Start specific server
  • stop_server(name) - Stop specific server
  • start_all() - Start all servers
  • stop_all() - Stop all servers
  • health_check(timeout=5) - Check all servers' health
  • get_status() - Get all servers' status
  • save_config(path) - Save configuration to JSON
  • load_config(path) - Load configuration from JSON

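A minimal sketch of per-server control, which the cluster examples above do not show (the server names are arbitrary):

from vllm_manager import VLLMCluster

cluster = VLLMCluster()
cluster.add_server("small", model="facebook/opt-125m", port=8001)
cluster.add_server("medium", model="facebook/opt-350m", port=8002)

cluster.start_server("small")           # start a single instance
print(cluster.health_check(timeout=5))  # health of all registered servers
print(cluster.get_status())
cluster.stop_server("small")
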
Log Files

Logs are stored in ./vllm_logs/ by default, using the naming format:

vllm_{model_name}_{YYYYMMDD}_{server_id}.log

Example:

vllm_facebook_opt-125m_20260226_srv_20260226105451.log

Log format:

2026-02-26 10:54:51 [INFO] [vllm.srv_20260226105451] VLLMManager initialized
2026-02-26 10:54:52 [INFO] [vllm.srv_20260226105451] Starting vLLM server: vllm serve ...
2026-02-26 10:54:53 [INFO] [vllm.srv_20260226105451] vLLM server started with PID: 12345

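Since get_log_file() returns the path of this file, tailing the log from Python is straightforward; a small sketch, assuming the returned value is a filesystem path:

from pathlib import Path
from vllm_manager import VLLMManager

manager = VLLMManager(model="facebook/opt-125m")
manager.start(port=8000)

log_path = Path(manager.get_log_file())
print(log_path.name)                 # vllm_facebook_opt-125m_{YYYYMMDD}_{server_id}.log
print(log_path.read_text()[-2000:])  # last ~2 kB of the log
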
Error Handling

from vllm_manager import VLLMManager

manager = VLLMManager(model="facebook/opt-125m")

try:
    manager.start(port=8000)
except RuntimeError as e:
    # Server already running
    print(f"Error: {e}")
except ValueError as e:
    # Model not specified
    print(f"Error: {e}")
except Exception as e:
    # Other errors
    print(f"Error: {e}")
finally:
    manager.stop()

Examples

Example 1: Simple Server

from vllm_manager import VLLMManager

# Start server
manager = VLLMManager(model="facebook/opt-125m")
manager.start(port=8000)

# Keep running
import time
while manager.is_running():
    time.sleep(1)

Example 2: Production Deployment

from vllm_manager import VLLMManager
import logging
import time

# Setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Start with production config
manager = VLLMManager(
    model="meta-llama/Llama-2-70b-hf",
    log_dir="/var/log/vllm",
)

try:
    manager.start(
        host="0.0.0.0",
        port=8000,
        tensor_parallel_size=4,
        gpu_memory_utilization=0.95,
        api_key="secret-key",
        disable_log_requests=False,
    )
    
    # Wait for ready
    if not manager.wait_for_ready(timeout=120):
        raise RuntimeError("Server failed to start")
    
    logger.info("Server is ready!")
    
    # Keep running
    while True:
        time.sleep(1)
        
except KeyboardInterrupt:
    logger.info("Shutting down...")
    manager.stop(timeout=60)
except Exception as e:
    logger.error(f"Error: {e}")
    manager.stop(force=True)
    raise

Example 3: Multiple Models

from vllm_manager import VLLMCluster

# Create cluster for multiple models
cluster = VLLMCluster(log_dir="/var/log/vllm")

# Add models
cluster.add_server(
    "llama-7b",
    model="meta-llama/Llama-2-7b-hf",
    port=8001,
    auto_restart=True,
)
cluster.add_server(
    "llama-70b",
    model="meta-llama/Llama-2-70b-hf",
    port=8002,
    tensor_parallel_size=4,
)

# Start all
cluster.start_all()

# Monitor
import time
while True:
    status = cluster.get_status()
    print(f"Status: {status}")
    time.sleep(10)

License

MIT License

Contributing

Contributions welcome! Please feel free to submit issues and pull requests.
