# vLLM Manager: Multi-instance vLLM cluster orchestration and log management
## 📖 About

vLLM Manager provides multi-instance vLLM cluster management, automatic log collection, and load balancing.

- Start vLLM: Uses the official CLI (`python -m vllm.entrypoints.openai.api_server`)
- Send Requests: Uses the official OpenAI SDK (`from openai import OpenAI`)
- Cluster Management: Auto start/stop, health checks, failover
- Log Collection: All instance logs automatically saved to files
## ✨ Features

- 🎯 Multi-Instance Management: Start/stop multiple vLLM instances with one command
- 📝 Automatic Logging: Log files named by model and port for easy identification
- 🔄 Failover: Automatic retry on other instances when a request fails
- ❤️ Health Monitoring: Continuous instance health checks
- 🔧 OpenAI SDK: Returns a standard OpenAI client for seamless integration
- ⚖️ Load Balancing: Round-robin request distribution (failover and balancing are sketched below)
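The balancing-plus-failover behavior can be pictured with a short sketch. This is illustrative only, not the library's internal code; `instances` and `make_request` are hypothetical stand-ins for the cluster's instance list and an HTTP call:

```python
def send_with_failover(instances, make_request, start=0):
    """Round-robin with failover (illustrative sketch): requests rotate
    across instances; on failure, the next instance is tried until one
    succeeds or all have been exhausted."""
    for offset in range(len(instances)):
        instance = instances[(start + offset) % len(instances)]
        try:
            return make_request(instance)  # e.g. an HTTP POST to instance.api_url
        except ConnectionError:
            continue  # failover: move on to the next instance in the rotation
    raise RuntimeError("all instances failed")
```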
## 🛠️ Tech Stack
- Python 3.8+
- vLLM - LLM inference engine
- OpenAI SDK - API client
- Requests - HTTP client
## 📦 Installation

```bash
# 1. Install vLLM
pip install vllm

# 2. Install dependencies
pip install -r requirements.txt

# Or install individually
pip install openai requests
```
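A quick sanity check after installing (the import path matches the Quick Start below):

```python
# Verify the package imports cleanly after installation.
from vllm_manager import VLLMCluster, VLLMInstance
print("vllm_manager imported OK")
```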
## 🚀 Quick Start

### Basic Usage

```python
from vllm_manager import VLLMCluster, VLLMInstance

# 1. Create cluster
cluster = VLLMCluster(log_dir="./vllm_logs")

# 2. Add instances
cluster.add_instance(VLLMInstance(
    name="server1",
    model="facebook/opt-125m",
    port=8000,
    gpu_memory_utilization=0.5,
))

# 3. Start all instances
cluster.start_all()

# 4. Get OpenAI client
client = cluster.get_openai_client()

# 5. Send requests (auto load-balanced)
response = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
)
print(response)

# 6. Stop cluster
cluster.stop_all()
```
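Since `get_openai_client()` returns a standard OpenAI client, the chat endpoint works the same way. A brief sketch continuing the example above, assuming the served model supports chat completions (opt-125m is a base model, so substitute a chat-tuned model in practice):

```python
# Chat completions through the same load-balanced client.
response = client.chat.completions.create(
    model="facebook/opt-125m",  # replace with a chat-capable model
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```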
### Multi-Model Example

```python
from vllm_manager import VLLMCluster, VLLMInstance

cluster = VLLMCluster()

# Add instances with different models
cluster.add_instance(VLLMInstance(
    name="qwen-server",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    port=8000,
))
cluster.add_instance(VLLMInstance(
    name="llama-server",
    model="meta-llama/Llama-2-7b-chat",
    port=8001,
))

cluster.start_all()

# View model name for each instance
for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")
# qwen-server: Qwen2.5-1.5B-Instruct
# llama-server: Llama-2-7b-chat

# Log files automatically include the model name:
# vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log
# vllm_Llama-2-7b-chat_8001_20260227_101235.log
```
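To pin requests to one specific instance instead of going through the load-balanced client, you can point a plain OpenAI client at that instance's `api_url` (see the API reference below). A sketch continuing the example above; it assumes `cluster.instances` is keyed by instance name and that the vLLM server accepts a placeholder API key:

```python
from openai import OpenAI

# Route requests to the Qwen instance only, bypassing load balancing.
qwen = cluster.instances["qwen-server"]  # assumes a name-keyed dict
direct_client = OpenAI(base_url=qwen.api_url, api_key="EMPTY")
response = direct_client.completions.create(
    model=qwen.served_model_name,
    prompt="San Francisco is a",
)
```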
## 📖 API Reference

### VLLMInstance

```python
VLLMInstance(
    name: str,                       # Instance name
    model: str,                      # Model name/path
    port: int = 8000,                # Port
    host: str = "0.0.0.0",           # Host
    log_dir: Optional[Path] = None,  # Log directory
    # vLLM parameters (inherited from AsyncEngineArgs)
    gpu_memory_utilization: float = 0.9,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    max_model_len: Optional[int] = None,
    quantization: Optional[str] = None,
    dtype: str = "auto",
    # ... supports all AsyncEngineArgs parameters
)

# Properties
instance.served_model_name  # Model name (last path component)
instance.base_url           # http://host:port
instance.api_url            # http://host:port/v1
instance.log_file           # Log file path
```
### VLLMCluster

```python
cluster = VLLMCluster(log_dir="./vllm_logs")

cluster.add_instance(instance: VLLMInstance)
cluster.start_all()
cluster.stop_all()
cluster.health_check()
client = cluster.get_openai_client()
```
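A periodic monitoring loop might look like the following. The return shape of `health_check()` is not documented here, so the name-to-status mapping below is an assumption:

```python
import time

# Hypothetical monitoring loop; assumes health_check() returns a
# mapping of instance name -> healthy flag.
while True:
    status = cluster.health_check()
    for name, healthy in status.items():
        if not healthy:
            print(f"instance {name} is unhealthy")
    time.sleep(30)
```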
## 📝 Log Management

### Log File Naming

Log files are named by model name + port + timestamp for easy identification:

```text
./vllm_logs/
├── vllm_manager_20260227_101234.log                     # Manager logs
├── vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101235.log  # Qwen model
└── vllm_Llama-2-7b-chat_8001_20260227_101236.log        # Llama model
```
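Given the `vllm_{model_name}_{port}_{timestamp}.log` convention (see the FAQ below), a small helper can recover the model and port from a filename. `parse_log_name` is illustrative and not part of the library:

```python
import re

def parse_log_name(filename: str):
    """Extract (model_name, port) from names like
    vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101235.log."""
    m = re.match(r"vllm_(.+)_(\d+)_\d{8}_\d{6}\.log$", filename)
    if m is None:
        return None  # e.g. the manager log, which has no port component
    return m.group(1), int(m.group(2))

print(parse_log_name("vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101235.log"))
# ('Qwen2.5-1.5B-Instruct', 8000)
```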
### View Logs

```python
from vllm_manager import LogAggregator

aggregator = LogAggregator(log_dir="./vllm_logs")

# Get all logs
logs = aggregator.get_all_logs(limit=100)
for log in logs:
    print(f"[{log.timestamp}] {log.instance}: {log.message}")

# Export to JSON
aggregator.export_json("logs.json")
```
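Individual instance logs can also be read directly through the documented `log_file` property using plain file I/O; nothing here is library-specific (continuing from a running cluster as in the Quick Start):

```python
# Print the last 20 lines of one instance's log file.
instance = next(iter(cluster.instances.values()))
with open(instance.log_file, "r", encoding="utf-8") as f:
    for line in f.readlines()[-20:]:
        print(line.rstrip())
```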
## ❓ FAQ

**Q: Why use vLLM Manager?**

A: When you need to run multiple vLLM instances (different models, different GPUs), vLLM Manager provides unified cluster management and log collection.

**Q: Which vLLM parameters are supported?**

A: All AsyncEngineArgs parameters are supported, since VLLMInstance inherits from AsyncEngineArgs.
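For example, engine arguments from the API reference above can be passed straight through when adding an instance (the model name and values here are illustrative):

```python
# Any AsyncEngineArgs parameter can be set on the instance directly.
cluster.add_instance(VLLMInstance(
    name="quantized-server",
    model="Qwen/Qwen2.5-1.5B-Instruct-AWQ",  # illustrative AWQ checkpoint
    port=8002,
    max_model_len=4096,          # cap the context window
    quantization="awq",
    gpu_memory_utilization=0.8,  # leave headroom for other processes
))
```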
**Q: How are log files named?**

A: The format is `vllm_{model_name}_{port}_{timestamp}.log`, where model_name is the last component of the model path (e.g., Qwen2.5-1.5B-Instruct).

**Q: How do I check which model each instance is running?**

A: Use the instance.served_model_name property:

```python
for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")
```
## 🤝 Contributing

Issues and Pull Requests are welcome!

```bash
# 1. Fork the repo
# 2. Create your branch
git checkout -b feature/AmazingFeature
# 3. Commit your changes
git commit -m 'Add some AmazingFeature'
# 4. Push to the branch
git push origin feature/AmazingFeature
# 5. Open a Pull Request
```
## 📄 License
MIT License - See LICENSE file for details.
## 📬 Contact
- Project URL: https://github.com/AiKiAi-stack/vllm_startup
- Issues: https://github.com/AiKiAi-stack/vllm_startup/issues
- Author: AiKiAi-stack
## 🙏 Acknowledgements
- vLLM - LLM inference engine
- OpenAI Python SDK - API client
## Download files
### vllm_manager-0.2.0.tar.gz (source distribution)

- Size: 16.2 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 170f1ff643899644d157299d796ad0f9d59d94751caa25dd2424430cb863d13d |
| MD5 | cb5fc84bf62a81c3e4ce8cc708aef4e1 |
| BLAKE2b-256 | aa13ce195a34d366ceb3d8595b9c4daaec6964fae40d6fe68da7eb91aa01bf62 |
### vllm_manager-0.2.0-py3-none-any.whl (built distribution, Python 3)

- Size: 13.7 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 338c18bb3a25ea7b0534f39d22b87a3a34a033af30b136510301f921d6255475 |
| MD5 | 358e2bad9d153c04e60cd1d8ed6b31e0 |
| BLAKE2b-256 | c5b2347e2b753ca6dc152881ca8ee26bcf2c27ae806dd02b265160519f2d3760 |