Multi-instance vLLM cluster orchestration and log management

vLLM Manager

Multi-Instance vLLM Cluster Management & Log Aggregation

English | 中文


📖 About

vLLM Manager provides multi-instance vLLM cluster management, automatic log collection, and load balancing.

  • Start vLLM: Uses official CLI (python -m vllm.entrypoints.openai.api_server)
  • Send Requests: Uses official OpenAI SDK (from openai import OpenAI)
  • Cluster Management: Auto start/stop, health checks, failover
  • Log Collection: All instance logs automatically saved to files
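
Under the hood, starting an instance amounts to assembling the official CLI command listed above. A minimal sketch of how a manager might build it (the helper function and flag set here are illustrative, not the library's internals):

```python
def build_vllm_command(model, port, gpu_memory_utilization=0.9):
    # Assemble the official vLLM OpenAI-compatible server command.
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]

cmd = build_vllm_command("facebook/opt-125m", 8000, 0.5)
# A manager would pass a list like this to subprocess.Popen, capturing
# stdout/stderr into the per-instance log file.
```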

✨ Features

  • 🎯 Multi-Instance Management: Start/stop multiple vLLM instances with one command
  • 📝 Automatic Logging: Log files named by model and port for easy identification
  • 🔄 Failover: Automatically retries on other instances when a request fails
  • ❤️ Health Monitoring: Continuous instance health checks
  • 🔧 OpenAI SDK: Returns a standard OpenAI client for seamless integration
  • ⚖️ Load Balancing: Round-robin request distribution
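
The round-robin and failover behavior described above can be sketched as follows. The class and method names are hypothetical, not the library's internals:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Illustrative round-robin dispatcher with simple failover."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._rotation = cycle(self.endpoints)

    def send(self, request_fn):
        # Try each endpoint at most once, starting from the next in rotation;
        # on failure, fall through to the next instance (failover).
        last_error = None
        for _ in range(len(self.endpoints)):
            endpoint = next(self._rotation)
            try:
                return request_fn(endpoint)
            except ConnectionError as exc:
                last_error = exc
        raise RuntimeError("all instances failed") from last_error

balancer = RoundRobinBalancer(
    ["http://localhost:8000/v1", "http://localhost:8001/v1"])
first = balancer.send(lambda url: url)   # routed to the first instance
second = balancer.send(lambda url: url)  # next request rotates to the second
```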

🛠️ Tech Stack

  • Python 3.8+
  • vLLM - LLM inference engine
  • OpenAI SDK - API client
  • Requests - HTTP client

📦 Installation

# 1. Install vLLM
pip install vllm

# 2. Install dependencies
pip install -r requirements.txt

# Or install individually
pip install openai requests

🚀 Quick Start

Basic Usage

from vllm_manager import VLLMCluster, VLLMInstance

# 1. Create cluster
cluster = VLLMCluster(log_dir="./vllm_logs")

# 2. Add instances
cluster.add_instance(VLLMInstance(
    name="server1",
    model="facebook/opt-125m",
    port=8000,
    gpu_memory_utilization=0.5,
))

# 3. Start all instances
cluster.start_all()

# 4. Get OpenAI client
client = cluster.get_openai_client()

# 5. Send requests (auto load-balanced)
response = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
)
print(response)

# 6. Stop cluster
cluster.stop_all()

Multi-Model Example

from vllm_manager import VLLMCluster, VLLMInstance

cluster = VLLMCluster()

# Add instances with different models
cluster.add_instance(VLLMInstance(
    name="qwen-server",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    port=8000,
))

cluster.add_instance(VLLMInstance(
    name="llama-server",
    model="meta-llama/Llama-2-7b-chat",
    port=8001,
))

cluster.start_all()

# View model name for each instance
for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")
# qwen-server: Qwen2.5-1.5B-Instruct
# llama-server: Llama-2-7b-chat

# Log files automatically include model name
# vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log
# vllm_Llama-2-7b-chat_8001_20260227_101235.log
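
With several models in one cluster, a request can be routed to whichever instance serves a given model. A small sketch of that lookup; the helper function and the `SimpleNamespace` stand-ins are illustrative (no GPU required), not part of the library:

```python
from types import SimpleNamespace

def find_instance(instances, model_name):
    # Scan a cluster.instances-style mapping for a matching served_model_name.
    for inst in instances.values():
        if inst.served_model_name == model_name:
            return inst
    raise KeyError(model_name)

# Stand-in objects mirroring the multi-model example above:
demo_instances = {
    "qwen-server": SimpleNamespace(
        name="qwen-server", served_model_name="Qwen2.5-1.5B-Instruct", port=8000),
    "llama-server": SimpleNamespace(
        name="llama-server", served_model_name="Llama-2-7b-chat", port=8001),
}
match = find_instance(demo_instances, "Llama-2-7b-chat")
```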

📖 API Reference

VLLMInstance

VLLMInstance(
    name: str,                    # Instance name
    model: str,                   # Model name/path
    port: int = 8000,             # Port
    host: str = "0.0.0.0",        # Host
    log_dir: Optional[Path] = None,
    
    # vLLM parameters (inherited from AsyncEngineArgs)
    gpu_memory_utilization: float = 0.9,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    max_model_len: Optional[int] = None,
    quantization: Optional[str] = None,
    dtype: str = "auto",
    # ... supports all AsyncEngineArgs parameters
)

# Properties
instance.served_model_name  # Model name (last path component)
instance.base_url           # http://host:port
instance.api_url            # http://host:port/v1
instance.log_file           # Log file path
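
The derived properties follow the rules documented above. A sketch of those documented rules as plain functions (our reconstruction, not the library's source):

```python
def served_model_name(model):
    # "Last path component" of the model name, per the property docs above.
    return model.rstrip("/").split("/")[-1]

def base_url(host, port):
    return f"http://{host}:{port}"

def api_url(host, port):
    return base_url(host, port) + "/v1"

name = served_model_name("Qwen/Qwen2.5-1.5B-Instruct")
url = api_url("0.0.0.0", 8000)
```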

VLLMCluster

cluster = VLLMCluster(log_dir="./vllm_logs")
cluster.add_instance(instance: VLLMInstance)
cluster.start_all()
cluster.stop_all()
cluster.health_check()
client = cluster.get_openai_client()
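
A health check like `cluster.health_check()` presumably probes each instance's server; vLLM's OpenAI-compatible server exposes a `/health` endpoint, so a standalone probe might look like this (our sketch under that assumption, not the library's implementation):

```python
import urllib.request

def is_healthy(base_url, timeout=2.0):
    # Probe the server's /health endpoint; any network error counts as down.
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```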

📝 Log Management

Log File Naming

Log files are named by model name + port + timestamp for easy identification:

./vllm_logs/
├── vllm_manager_20260227_101234.log          # Manager logs
├── vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101235.log  # Qwen model
└── vllm_Llama-2-7b-chat_8001_20260227_101236.log        # Llama model

View Logs

from vllm_manager import LogAggregator

aggregator = LogAggregator(log_dir="./vllm_logs")

# Get all logs
logs = aggregator.get_all_logs(limit=100)
for log in logs:
    print(f"[{log.timestamp}] {log.instance}: {log.message}")

# Export to JSON
aggregator.export_json("logs.json")

❓ FAQ

Q: Why use vLLM Manager?

A: When you need to run multiple vLLM instances (different models, different GPUs), vLLM Manager provides unified cluster management and log collection.

Q: Which vLLM parameters are supported?

A: All AsyncEngineArgs parameters are supported, since VLLMInstance inherits from AsyncEngineArgs.

Q: How are log files named?

A: Format is vllm_{model_name}_{port}_{timestamp}.log, where model_name is the last component of the model path (e.g., Qwen2.5-1.5B-Instruct).
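
Given that naming format, the parts can be recovered from a filename with a small parser. The regular expression here is our own sketch based on the documented format, not a utility the library ships:

```python
import re

# vllm_{model_name}_{port}_{timestamp}.log, timestamp as YYYYMMDD_HHMMSS.
LOG_NAME = re.compile(r"^vllm_(?P<model>.+)_(?P<port>\d+)_(?P<ts>\d{8}_\d{6})\.log$")

def parse_log_name(filename):
    m = LOG_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized log name: {filename}")
    return m.group("model"), int(m.group("port")), m.group("ts")

model, port, ts = parse_log_name("vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log")
```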

Q: How do I check which model each instance is running?

A: Use the instance.served_model_name property:

for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")

🤝 Contributing

Issues and Pull Requests are welcome!

# 1. Fork the repo
# 2. Create your branch (git checkout -b feature/AmazingFeature)
# 3. Commit your changes (git commit -m 'Add some AmazingFeature')
# 4. Push to the branch (git push origin feature/AmazingFeature)
# 5. Open a Pull Request

📄 License

MIT License - See LICENSE file for details.

📬 Contact

🙏 Acknowledgements
