

vLLM Manager

Multi-Instance vLLM Cluster Management & Log Aggregation


English | 中文


📖 About

vLLM Manager provides multi-instance vLLM cluster management, automatic log collection, and load balancing.

  • Start vLLM: Uses the official CLI (python -m vllm.entrypoints.openai.api_server); see the sketch after this list
  • Send Requests: Uses official OpenAI SDK (from openai import OpenAI)
  • Cluster Management: Auto start/stop, health checks, failover
  • Log Collection: All instance logs automatically saved to files
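
For reference, starting a single instance the way the official CLI does looks roughly like the snippet below. This is a simplified illustration of what the manager presumably automates per instance; the exact flags and process handling inside vLLM Manager are not documented here.

import subprocess

# Launch one OpenAI-compatible vLLM server, as the official entrypoint does.
# Flags shown (--model, --port, --gpu-memory-utilization) are standard vLLM CLI options.
proc = subprocess.Popen([
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "facebook/opt-125m",
    "--port", "8000",
    "--gpu-memory-utilization", "0.5",
])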

✨ Features

  • 🎯 Multi-Instance Management: Start/stop multiple vLLM instances with one command
  • 📝 Automatic Logging: Log files named by model and port for easy identification
  • 🔄 Failover: Automatic retry on other instances when a request fails
  • ❤️ Health Monitoring: Continuous instance health checks
  • 🔧 OpenAI SDK: Returns a standard OpenAI client for seamless integration
  • ⚖️ Load Balancing: Round-robin request distribution (illustrated below)
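
As a conceptual illustration of round-robin distribution (this is just the idea, not the library's internal code), requests simply rotate across the instances' API endpoints in order:

from itertools import cycle

# Rotate across the instances' OpenAI-compatible endpoints.
instance_urls = cycle(["http://localhost:8000/v1", "http://localhost:8001/v1"])
for _ in range(4):
    print(next(instance_urls))
# http://localhost:8000/v1
# http://localhost:8001/v1
# http://localhost:8000/v1
# http://localhost:8001/v1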

🛠️ Tech Stack

  • Python 3.8+
  • vLLM - LLM inference engine
  • OpenAI SDK - API client
  • Requests - HTTP client

📦 Installation

# 1. Install vLLM
pip install vllm

# 2. Install dependencies
pip install -r requirements.txt

# Or install individually
pip install openai requests

🚀 Quick Start

Basic Usage

from vllm_manager import VLLMCluster, VLLMInstance

# 1. Create cluster
cluster = VLLMCluster(log_dir="./vllm_logs")

# 2. Add instances
cluster.add_instance(VLLMInstance(
    name="server1",
    model="facebook/opt-125m",
    port=8000,
    gpu_memory_utilization=0.5,
))

# 3. Start all instances
cluster.start_all()

# 4. Get OpenAI client
client = cluster.get_openai_client()

# 5. Send requests (auto load-balanced)
response = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
)
print(response)
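
# (Optional) Chat-style requests also work, because this is a standard
# OpenAI SDK client. The model name below is illustrative; use a
# chat-capable model that your cluster actually serves.
chat_response = client.chat.completions.create(
    model="Qwen2.5-1.5B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(chat_response.choices[0].message.content)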

# 6. Stop cluster
cluster.stop_all()

Multi-Model Example

from vllm_manager import VLLMCluster, VLLMInstance

cluster = VLLMCluster()

# Add instances with different models
cluster.add_instance(VLLMInstance(
    name="qwen-server",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    port=8000,
))

cluster.add_instance(VLLMInstance(
    name="llama-server",
    model="meta-llama/Llama-2-7b-chat",
    port=8001,
))

cluster.start_all()

# View model name for each instance
for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")
# qwen-server: Qwen2.5-1.5B-Instruct
# llama-server: Llama-2-7b-chat

# Log files automatically include model name
# vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log
# vllm_Llama-2-7b-chat_8001_20260227_101235.log
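
Because start_all() launches real vLLM server processes, it is worth guaranteeing they are shut down even if your own code raises. A minimal pattern, reusing the cluster from the example above and only the documented methods:

cluster.start_all()
try:
    client = cluster.get_openai_client()
    # ... send requests through the client ...
finally:
    # Always stop the vLLM processes, even if an exception occurred above.
    cluster.stop_all()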

📖 API Reference

VLLMInstance

VLLMInstance(
    name: str,                    # Instance name
    model: str,                   # Model name/path
    port: int = 8000,             # Port
    host: str = "0.0.0.0",        # Host
    log_dir: Optional[Path] = None,
    
    # vLLM parameters (inherited from AsyncEngineArgs)
    gpu_memory_utilization: float = 0.9,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    max_model_len: Optional[int] = None,
    quantization: Optional[str] = None,
    dtype: str = "auto",
    # ... supports all AsyncEngineArgs parameters
)

# Properties
instance.served_model_name  # Model name (last path component)
instance.base_url           # http://host:port
instance.api_url            # http://host:port/v1
instance.log_file           # Log file path
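
For example, an instance sharded across two GPUs with a capped context length, using only the parameters and properties listed above (values and expected outputs are illustrative):

from vllm_manager import VLLMInstance

instance = VLLMInstance(
    name="qwen-tp2",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    port=8002,
    tensor_parallel_size=2,       # shard the model across 2 GPUs
    max_model_len=8192,           # cap the context window
    gpu_memory_utilization=0.8,
)
print(instance.served_model_name)  # Qwen2.5-1.5B-Instruct
print(instance.base_url)           # http://0.0.0.0:8002 (default host)
print(instance.log_file)           # path of this instance's log file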

VLLMCluster

cluster = VLLMCluster(log_dir="./vllm_logs")
cluster.add_instance(instance: VLLMInstance)
cluster.start_all()
cluster.stop_all()
cluster.health_check()
client = cluster.get_openai_client()

📝 Log Management

Log File Naming

Log files are named by model name + port + timestamp for easy identification:

./vllm_logs/
├── vllm_manager_20260227_101234.log          # Manager logs
├── vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101235.log  # Qwen model
└── vllm_Llama-2-7b-chat_8001_20260227_101236.log        # Llama model

View Logs

from vllm_manager import LogAggregator

aggregator = LogAggregator(log_dir="./vllm_logs")

# Get all logs
logs = aggregator.get_all_logs(limit=100)
for log in logs:
    print(f"[{log.timestamp}] {log.instance}: {log.message}")

# Export to JSON
aggregator.export_json("logs.json")
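
Each entry exposes timestamp, instance, and message (as used above), so you can filter the aggregated logs in plain Python. A small sketch that isolates one instance's output, assuming the instance field holds the instance name:

# Keep only entries from a single instance, using the fields shown above.
qwen_logs = [
    log for log in aggregator.get_all_logs(limit=1000)
    if log.instance == "qwen-server"
]
for log in qwen_logs[-10:]:
    print(f"[{log.timestamp}] {log.message}")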

❓ FAQ

Q: Why use vLLM Manager?

A: When you need to run multiple vLLM instances (different models, different GPUs), vLLM Manager provides unified cluster management and log collection.

Q: Which vLLM parameters are supported?

A: All AsyncEngineArgs parameters are supported, since VLLMInstance inherits from AsyncEngineArgs.

Q: How are log files named?

A: Format is vllm_{model_name}_{port}_{timestamp}.log, where model_name is the last component of the model path (e.g., Qwen2.5-1.5B-Instruct).

Q: How do I check which model each instance is running?

A: Use the instance.served_model_name property:

for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")

🤝 Contributing

Issues and Pull Requests are welcome!

# 1. Fork the repo
# 2. Create your branch (git checkout -b feature/AmazingFeature)
# 3. Commit your changes (git commit -m 'Add some AmazingFeature')
# 4. Push to the branch (git push origin feature/AmazingFeature)
# 5. Open a Pull Request

📄 License

MIT License - See LICENSE file for details.

📬 Contact

🙏 Acknowledgements
