Multi-instance vLLM cluster orchestration and log management

vLLM Manager

Multi-Instance vLLM Cluster Management & Log Aggregation

English | 中文


📖 About

vLLM Manager provides multi-instance vLLM cluster management, automatic log collection, and load balancing.

  • Start vLLM: Uses official CLI (python -m vllm.entrypoints.openai.api_server)
  • Send Requests: Uses official OpenAI SDK (from openai import OpenAI)
  • Cluster Management: Auto start/stop, health checks, failover
  • Log Collection: All instance logs automatically saved to files
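
Under the hood, starting an instance amounts to assembling the official CLI command listed above. A minimal sketch of how a manager might build it (the helper function and flag set here are illustrative, not the library's internals):

```python
def build_vllm_command(model, port, gpu_memory_utilization=0.9):
    # Assemble the official vLLM OpenAI-compatible server command.
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]

cmd = build_vllm_command("facebook/opt-125m", 8000, 0.5)
# A manager would pass a list like this to subprocess.Popen, capturing
# stdout/stderr into the per-instance log file.
```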

✨ Features

  • 🎯 Multi-Instance Management: Start/stop multiple vLLM instances with one command
  • 📝 Automatic Logging: Log files named by model and port for easy identification
  • 🔄 Failover: Automatically retries on other instances when a request fails
  • ❤️ Health Monitoring: Continuous instance health checks
  • 🔧 OpenAI SDK: Returns a standard OpenAI client for seamless integration
  • ⚖️ Load Balancing: Round-robin request distribution
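
The round-robin and failover behavior described above can be sketched as follows. The class and method names are hypothetical, not the library's internals:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Illustrative round-robin dispatcher with simple failover."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._rotation = cycle(self.endpoints)

    def send(self, request_fn):
        # Try each endpoint at most once, starting from the next in rotation;
        # on failure, fall through to the next instance (failover).
        last_error = None
        for _ in range(len(self.endpoints)):
            endpoint = next(self._rotation)
            try:
                return request_fn(endpoint)
            except ConnectionError as exc:
                last_error = exc
        raise RuntimeError("all instances failed") from last_error

balancer = RoundRobinBalancer(
    ["http://localhost:8000/v1", "http://localhost:8001/v1"])
first = balancer.send(lambda url: url)   # routed to the first instance
second = balancer.send(lambda url: url)  # next request rotates to the second
```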

🛠️ Tech Stack

  • Python 3.8+
  • vLLM - LLM inference engine
  • OpenAI SDK - API client
  • Requests - HTTP client

📦 Installation

# 1. Install vLLM
pip install vllm

# 2. Install dependencies
pip install -r requirements.txt

# Or install individually
pip install openai requests

🚀 Quick Start

Basic Usage

from vllm_manager import VLLMCluster, VLLMInstance

# 1. Create cluster
cluster = VLLMCluster(log_dir="./vllm_logs")

# 2. Add instances
cluster.add_instance(VLLMInstance(
    name="server1",
    model="facebook/opt-125m",
    port=8000,
    gpu_memory_utilization=0.5,
))

# 3. Start all instances
cluster.start_all()

# 4. Get OpenAI client
client = cluster.get_openai_client()

# 5. Send requests (auto load-balanced)
response = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
)
print(response)

# 6. Stop cluster
cluster.stop_all()

Multi-Model Example

from vllm_manager import VLLMCluster, VLLMInstance

cluster = VLLMCluster()

# Add instances with different models
cluster.add_instance(VLLMInstance(
    name="qwen-server",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    port=8000,
))

cluster.add_instance(VLLMInstance(
    name="llama-server",
    model="meta-llama/Llama-2-7b-chat",
    port=8001,
))

cluster.start_all()

# View model name for each instance
for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")
# qwen-server: Qwen2.5-1.5B-Instruct
# llama-server: Llama-2-7b-chat

# Log files automatically include model name
# vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log
# vllm_Llama-2-7b-chat_8001_20260227_101235.log
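
With several models in one cluster, a request can be routed to whichever instance serves a given model. A small sketch of that lookup; the helper function and the `SimpleNamespace` stand-ins are illustrative (no GPU required), not part of the library:

```python
from types import SimpleNamespace

def find_instance(instances, model_name):
    # Scan a cluster.instances-style mapping for a matching served_model_name.
    for inst in instances.values():
        if inst.served_model_name == model_name:
            return inst
    raise KeyError(model_name)

# Stand-in objects mirroring the multi-model example above:
demo_instances = {
    "qwen-server": SimpleNamespace(
        name="qwen-server", served_model_name="Qwen2.5-1.5B-Instruct", port=8000),
    "llama-server": SimpleNamespace(
        name="llama-server", served_model_name="Llama-2-7b-chat", port=8001),
}
match = find_instance(demo_instances, "Llama-2-7b-chat")
```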

📖 API Reference

VLLMInstance

VLLMInstance(
    name: str,                    # Instance name
    model: str,                   # Model name/path
    port: int = 8000,             # Port
    host: str = "0.0.0.0",        # Host
    log_dir: Optional[Path] = None,
    
    # vLLM parameters (inherited from AsyncEngineArgs)
    gpu_memory_utilization: float = 0.9,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    max_model_len: Optional[int] = None,
    quantization: Optional[str] = None,
    dtype: str = "auto",
    # ... supports all AsyncEngineArgs parameters
)

# Properties
instance.served_model_name  # Model name (last path component)
instance.base_url           # http://host:port
instance.api_url            # http://host:port/v1
instance.log_file           # Log file path
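
The derived properties follow the rules documented above. A sketch of those documented rules as plain functions (our reconstruction, not the library's source):

```python
def served_model_name(model):
    # "Last path component" of the model name, per the property docs above.
    return model.rstrip("/").split("/")[-1]

def base_url(host, port):
    return f"http://{host}:{port}"

def api_url(host, port):
    return base_url(host, port) + "/v1"

name = served_model_name("Qwen/Qwen2.5-1.5B-Instruct")
url = api_url("0.0.0.0", 8000)
```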

VLLMCluster

cluster = VLLMCluster(log_dir="./vllm_logs")
cluster.add_instance(instance: VLLMInstance)
cluster.start_all()
cluster.stop_all()
cluster.health_check()
client = cluster.get_openai_client()
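
A health check like `cluster.health_check()` presumably probes each instance's server; vLLM's OpenAI-compatible server exposes a `/health` endpoint, so a standalone probe might look like this (our sketch under that assumption, not the library's implementation):

```python
import urllib.request

def is_healthy(base_url, timeout=2.0):
    # Probe the server's /health endpoint; any network error counts as down.
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```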

📝 Log Management

Log File Naming

Log files are named by model name + port + timestamp for easy identification:

./vllm_logs/
├── vllm_manager_20260227_101234.log          # Manager logs
├── vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101235.log  # Qwen model
└── vllm_Llama-2-7b-chat_8001_20260227_101236.log        # Llama model

View Logs

from vllm_manager import LogAggregator

aggregator = LogAggregator(log_dir="./vllm_logs")

# Get all logs
logs = aggregator.get_all_logs(limit=100)
for log in logs:
    print(f"[{log.timestamp}] {log.instance}: {log.message}")

# Export to JSON
aggregator.export_json("logs.json")

❓ FAQ

Q: Why use vLLM Manager?

A: When you need to run multiple vLLM instances (different models, different GPUs), vLLM Manager provides unified cluster management and log collection.

Q: Which vLLM parameters are supported?

A: All AsyncEngineArgs parameters are supported, since VLLMInstance inherits from AsyncEngineArgs.

Q: How are log files named?

A: Format is vllm_{model_name}_{port}_{timestamp}.log, where model_name is the last component of the model path (e.g., Qwen2.5-1.5B-Instruct).
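
Given that naming format, the parts can be recovered from a filename with a small parser. The regular expression here is our own sketch based on the documented format, not a utility the library ships:

```python
import re

# vllm_{model_name}_{port}_{timestamp}.log, timestamp as YYYYMMDD_HHMMSS.
LOG_NAME = re.compile(r"^vllm_(?P<model>.+)_(?P<port>\d+)_(?P<ts>\d{8}_\d{6})\.log$")

def parse_log_name(filename):
    m = LOG_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized log name: {filename}")
    return m.group("model"), int(m.group("port")), m.group("ts")

model, port, ts = parse_log_name("vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log")
```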

Q: How do I check which model each instance is running?

A: Use the instance.served_model_name property:

for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")

🤝 Contributing

Issues and Pull Requests are welcome!

# 1. Fork the repo
# 2. Create your branch (git checkout -b feature/AmazingFeature)
# 3. Commit your changes (git commit -m 'Add some AmazingFeature')
# 4. Push to the branch (git push origin feature/AmazingFeature)
# 5. Open a Pull Request

📄 License

MIT License - See LICENSE file for details.

📬 Contact

🙏 Acknowledgements
