The Ultimate Multi-Model LLM Runtime Platform - Deploy, manage, and serve 300+ language models with OpenAI-compatible APIs. Built on ms-swift for production-ready performance.
This project has been archived by its maintainers; no new releases are expected.
PolarisLLM Runtime Engine
The Ultimate Multi-Model LLM Runtime Platform
PolarisLLM is a production-ready, high-performance runtime engine that transforms how you deploy and serve Large Language Models. Built on the robust ms-swift framework, it provides seamless OpenAI-compatible APIs while enabling dynamic multi-model serving, making it the perfect solution for developers, researchers, and enterprises who need flexible, scalable LLM infrastructure.
Why PolarisLLM?
Turn Any Server Into an LLM Powerhouse
- Deploy multiple models simultaneously on a single machine
- Switch between models without restarts or downtime
- Support for 300+ models including Qwen, Llama, DeepSeek, Mistral, and more
Production-Ready Performance
- Built on battle-tested ms-swift framework
- Automatic resource management and optimization
- Real-time health monitoring and auto-recovery
Drop-in OpenAI Compatibility
- Use existing OpenAI client libraries without modification
- Seamless integration with popular frameworks like LangChain, LlamaIndex
- Perfect for migration from proprietary APIs to self-hosted solutions
Key Features
Dynamic Model Management
- Hot-swap models without server restarts
- Concurrent serving of multiple models on different ports
- Intelligent resource allocation and memory management
- Auto-scaling based on demand
Universal Compatibility
- OpenAI API compatible - works with existing tools and libraries
- 300+ supported models from HuggingFace and ModelScope
- Multi-modal support - text, vision, code, and audio models
- Streaming responses for real-time applications
Developer Experience
- Rich CLI interface with beautiful status displays
- RESTful admin APIs for programmatic control
- YAML configuration for easy model definitions
- Comprehensive logging and error handling
Production Ready
- Docker containerization with GPU support
- Health checks and automatic recovery
- Resource monitoring and performance metrics
- Horizontal scaling support
Installation
Quick Install (Recommended)
# Install from PyPI - that's it!
pip install polarisllm
# Start the engine
polaris start
Docker Installation
# Run with Docker
docker run -p 7860:7860 polarisllm/polarisllm
# Or with docker-compose
git clone https://github.com/polarisllm/polarisLLM.git
cd polarisLLM
docker-compose up -d
Development Installation
git clone https://github.com/polarisllm/polarisLLM.git
cd polarisLLM
pip install -e .
pip install ms-swift[llm] --upgrade
Quick Start Guide
Step 1: Install & Start
# Install PolarisLLM (includes ms-swift dependency)
pip install polarisllm
# Start the runtime engine
polarisllm
The server starts on http://localhost:7860 with a web interface.
Step 2: Verify Installation
# Check if server is running
curl http://localhost:7860/health
# View available models
curl http://localhost:7860/v1/models
# Open API documentation
# Visit: http://localhost:7860/docs
Step 3: Load and Use Models
# First, load a model using ms-swift (use python -m swift if swift command not in PATH)
swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct
# OR: python -m swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct
# Then use with OpenAI-compatible API
curl -X POST "http://localhost:7860/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
Step 4: Use with Your Favorite Tools
# Works with OpenAI Python client
import openai
client = openai.OpenAI(
    base_url="http://localhost:7860/v1",
    api_key="not-required"  # No API key needed!
)
response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
print(response.choices[0].message.content)
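The same client can also consume streamed responses, which matters for the real-time applications mentioned earlier. A minimal sketch, assuming the server implements OpenAI-style streaming chunks; the `join_deltas` helper is illustrative and not part of PolarisLLM:

```python
def join_deltas(chunks) -> str:
    """Concatenate the incremental content pieces of a streamed
    chat completion (OpenAI-style chunk objects)."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk usually carries no content
            parts.append(delta)
    return "".join(parts)

# With the client defined above and a running server:
# stream = client.chat.completions.create(
#     model="qwen2.5-7b-instruct",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# print(join_deltas(stream))
```

Passing `stream=True` makes the client yield chunks as they arrive, so you can print tokens incrementally instead of waiting for the full completion.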
Real-World Examples
Example 1: Multi-Model AI Assistant
# Load different specialized models
polaris load qwen2.5-7b-instruct # General chat
polaris load deepseek-coder-6.7b # Code generation
polaris load deepseek-vl-7b-chat # Vision understanding
# Use different models for different tasks
curl -X POST "http://localhost:7860/v1/chat/completions" \
-d '{"model": "deepseek-coder-6.7b", "messages": [{"role": "user", "content": "Write a REST API in FastAPI"}]}'
Example 2: LangChain Integration
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
# Connect to your local PolarisLLM
llm = OpenAI(
    openai_api_base="http://localhost:7860/v1",
    openai_api_key="not-required",
    model_name="qwen2.5-7b-instruct"
)
# Use with LangChain as usual
prompt = PromptTemplate.from_template("Explain {topic} in simple terms")
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(topic="machine learning")
Example 3: Batch Processing
import asyncio
import aiohttp

async def ask(session, model, prompt):
    """POST one chat completion request and return (model, parsed JSON)."""
    async with session.post(
        "http://localhost:7860/v1/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
    ) as response:
        return model, await response.json()

async def process_documents():
    models = ["qwen2.5-7b-instruct", "deepseek-coder-6.7b"]
    async with aiohttp.ClientSession() as session:
        # Fan the requests out concurrently instead of one after another
        results = await asyncio.gather(
            *(ask(session, m, "Analyze this document...") for m in models)
        )
    for model, result in results:
        print(f"{model}: {result}")

asyncio.run(process_documents())
Available Commands
Server Commands
# Start the runtime engine (main command)
polarisllm
# Alternative start commands
polaris-llm
polaris-server
# Start with custom options
polarisllm --host 0.0.0.0 --port 8080
# View help
polarisllm --help
Model Management (via ms-swift)
# List available models
swift list-models
# OR: python -m swift list-models
# Deploy a chat model
swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct
# OR: python -m swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct
# Deploy a vision model
swift deploy --model_type deepseek_vl --model_id deepseek-ai/deepseek-vl-7b-chat
# Deploy a code model
swift deploy --model_type deepseek --model_id deepseek-ai/deepseek-coder-6.7b-instruct
# Check deployment status
swift list
# OR: python -m swift list
Supported Models (300+)
PolarisLLM supports the entire ms-swift model ecosystem. Here are some popular choices:
General Chat Models
- Qwen2.5-7B-Instruct: Alibaba's flagship model - excellent for general tasks
- Llama-3.1-8B-Instruct: Meta's latest - great reasoning capabilities
- Mistral-7B-Instruct: Efficient and fast - perfect for production
- DeepSeek-V2.5: Advanced reasoning and long context support
Code Generation Models
- DeepSeek-Coder-6.7B: State-of-the-art code generation
- CodeQwen1.5-7B: Multi-language programming support
- Qwen2.5-Coder-7B: Latest coding model with enhanced capabilities
Vision-Language Models
- DeepSeek-VL-7B-Chat: Advanced vision understanding
- Qwen2-VL-7B-Instruct: Multi-modal reasoning
- LLaVA-NeXT: Image analysis and description
Multi-Modal Models
- Qwen2-Audio: Speech and audio understanding
- Qwen2.5-Omni: Text, image, and audio in one model
See the complete list of 300+ supported models in our Model Catalog
Configuration
Runtime Configuration (config/runtime.yaml)
host: "0.0.0.0"
port_range_start: 8000
port_range_end: 8100
max_concurrent_models: 5
model_timeout: 300
env_vars:
  CUDA_VISIBLE_DEVICES: "0"
  HF_HUB_CACHE: "./cache/huggingface"
Model Configuration (config/models/*.yaml)
name: "custom-model"
model_id: "path/to/model"
model_type: "qwen2_5"
template: "qwen2_5"
description: "Custom model description"
tags: ["chat", "custom"]
swift_args:
  max_length: 8192
  temperature: 0.7
API Endpoints
OpenAI Compatible Endpoints
- POST /v1/chat/completions - Create a chat completion
- GET /v1/models - List available models
Admin Endpoints
- POST /admin/models/load - Load a model
- POST /admin/models/{model_name}/unload - Unload a model
- GET /admin/models/{model_name}/status - Get model status
- GET /admin/status - Get runtime status
- GET /admin/models/available - List available model configurations
- GET /admin/models/running - List running models
Utility Endpoints
- GET /health - Health check
- GET / - API information
Docker Deployment
Basic Deployment
docker-compose up -d
With GPU Support
- Install nvidia-docker2
- Uncomment GPU section in docker-compose.yml
- Start with GPU access:
docker-compose up -d
With Redis Cache
docker-compose --profile with-cache up -d
Monitoring
Health Checks
The runtime includes built-in health monitoring:
curl http://localhost:7860/health
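The health endpoint is also handy for gating startup scripts: poll it until the server answers. A small sketch using the standard library (the `probe` parameter exists only so the polling logic can be exercised without a live server):

```python
import time
import urllib.request

def wait_for_healthy(url: str, timeout: float = 30.0,
                     interval: float = 0.5, probe=None) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass."""
    if probe is None:
        def probe(u):
            try:
                with urllib.request.urlopen(u, timeout=2) as resp:
                    return resp.status == 200
            except OSError:  # connection refused, DNS failure, timeout, ...
                return False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False

# Example: block a deployment script until PolarisLLM is up
# if not wait_for_healthy("http://localhost:7860/health", timeout=60):
#     raise SystemExit("server did not become healthy in time")
```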
Resource Monitoring
View real-time resource usage:
python cli.py status
Logs
# Local logs
tail -f polaris.log
# Docker logs
docker-compose logs -f polaris-runtime
Integration Examples
Python Client
import openai
client = openai.OpenAI(
    base_url="http://localhost:7860/v1",
    api_key="not-required"
)
response = client.chat.completions.create(
    model="deepseek-vl-7b-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
JavaScript Client
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'http://localhost:7860/v1',
  apiKey: 'not-required'
});
const completion = await client.chat.completions.create({
  model: 'qwen2.5-7b-instruct',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(completion.choices[0].message.content);
Production Considerations
- Resource Management: Monitor GPU/CPU usage and memory consumption
- Load Balancing: Use reverse proxy for multiple runtime instances
- Security: Add authentication for admin endpoints
- Logging: Configure structured logging for production monitoring
- Scaling: Use Kubernetes for large-scale deployments
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Use Cases
Enterprise & Startups
- Private AI Infrastructure: Keep models in-house for data privacy
- Cost Optimization: Reduce API costs by 90% compared to cloud providers
- Multi-tenant Applications: Serve different models to different customers
- A/B Testing: Compare model performance with easy switching
Developers & Researchers
- Local Development: Test AI features without API costs
- Model Comparison: Evaluate different models on the same dataset
- Fine-tuning Pipeline: Deploy custom fine-tuned models
- Prototype Rapidly: Build AI applications with zero setup friction
Educational & Training
- AI Courses: Provide students with hands-on LLM experience
- Research Projects: Access to latest models for academic research
- Hackathons: Quick setup for AI-focused competitions
Why Choose PolarisLLM Over Alternatives?
| Feature | PolarisLLM | Ollama | text-generation-webui | OpenAI API |
|---|---|---|---|---|
| Multi-model serving | Concurrent | Sequential | Single model | Multiple |
| OpenAI compatibility | Full | Limited | None | Native |
| Model variety | 300+ models | GGUF only | Limited | Proprietary |
| Production ready | Yes | Basic | No | Yes |
| Self-hosted | Yes | Yes | Yes | No |
| Cost | Free | Free | Free | Pay per use |
| Setup time | < 2 minutes | 5-10 min | 30+ min | Instant |
Community & Support
Getting Help
- Documentation: Comprehensive guides and API reference
- GitHub Discussions: Community Q&A and feature requests
- Issue Tracking: Bug reports and feature requests
- Email Support: contact@polarisllm.dev
Contributing
- Fork & PR: Contributions welcome!
- Testing: Help test new models and features
- Documentation: Improve guides and examples
- Translation: Help localize for global users
Stay Updated
- Star us on GitHub: Get notifications for releases
- Follow @PolarisLLM: Latest updates and tips
- Newsletter: Monthly model updates and tutorials
Architecture
+------------------+     +------------------+     +------------------+
|    CLI Client    |---->|  FastAPI Server  |---->|   Runtime Core   |
|                  |     |                  |     |                  |
| - Model Mgmt     |     | - OpenAI API     |     | - Model Manager  |
| - Status Check   |     | - Admin API      |     | - Process Mgmt   |
+------------------+     +------------------+     +------------------+
                                  |                        |
                                  v                        v
                         +------------------+     +------------------+
                         |  Model Instance  |     |  Model Instance  |
                         |                  |     |                  |
                         | - ms-swift       |     | - ms-swift       |
                         | - Port 8000      |     | - Port 8001      |
                         +------------------+     +------------------+
Acknowledgments
- Built on the excellent ms-swift framework
- Inspired by OpenAI's API design
- Thanks to the open-source LLM community