
Note: this project has been archived by its maintainers; no new releases are expected.

๐ŸŒŸ PolarisLLM Runtime Engine

The Ultimate Multi-Model LLM Runtime Platform

PolarisLLM is a production-ready, high-performance runtime engine for deploying and serving Large Language Models. Built on the ms-swift framework, it exposes OpenAI-compatible APIs while serving multiple models dynamically, making it a good fit for developers, researchers, and enterprises that need flexible, scalable LLM infrastructure.

๐ŸŽฏ Why PolarisLLM?

๐Ÿš€ Turn Any Server Into an LLM Powerhouse

  • Deploy multiple models simultaneously on a single machine
  • Switch between models without restarts or downtime
  • Support for 300+ models including Qwen, Llama, DeepSeek, Mistral, and more

โšก Production-Ready Performance

  • Built on battle-tested ms-swift framework
  • Automatic resource management and optimization
  • Real-time health monitoring and auto-recovery

๐Ÿ”Œ Drop-in OpenAI Compatibility

  • Use existing OpenAI client libraries without modification
  • Seamless integration with popular frameworks like LangChain, LlamaIndex
  • Perfect for migration from proprietary APIs to self-hosted solutions

โœจ Key Features

๐ŸŽ›๏ธ Dynamic Model Management

  • Hot-swap models without server restarts
  • Concurrent serving of multiple models on different ports
  • Intelligent resource allocation and memory management
  • Auto-scaling based on demand
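Hot-swapping can be driven through the admin API listed later in this README (POST /admin/models/load and POST /admin/models/{model_name}/unload). A minimal sketch, assuming the load endpoint accepts a JSON body carrying the model name (the exact payload shape is an assumption):

```python
# Sketch: hot-swapping models via the PolarisLLM admin API without a restart.
# Endpoint paths follow the Admin Endpoints section; the load payload is assumed.
import json
import urllib.request

BASE = "http://localhost:7860"

def admin_request(path, payload=None):
    """Build a POST request against the admin API."""
    data = json.dumps(payload or {}).encode()
    return urllib.request.Request(
        f"{BASE}{path}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def hot_swap(old_model, new_model):
    """Unload one model and load another; returns the prepared requests."""
    return [
        admin_request(f"/admin/models/{old_model}/unload"),
        admin_request("/admin/models/load", {"name": new_model}),
    ]

# Each request can then be sent with urllib.request.urlopen(req)
# against a running server.
```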

๐Ÿ”— Universal Compatibility

  • OpenAI API compatible - works with existing tools and libraries
  • 300+ supported models from HuggingFace and ModelScope
  • Multi-modal support - text, vision, code, and audio models
  • Streaming responses for real-time applications
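Streaming works through the standard OpenAI SDK by passing stream=True. The helper below shows how the incremental chunks can be consumed; the chunk shape follows the OpenAI SDK's streaming objects (choices[0].delta.content), and the commented client call assumes a model named qwen2.5-7b-instruct is already loaded:

```python
# Sketch: consuming a streamed chat completion from the OpenAI-compatible API.
# Works on any iterable of chunk objects shaped like the OpenAI SDK's
# streaming chunks (choices[0].delta.content).

def collect_stream(chunks):
    """Print deltas as they arrive and return the full reply."""
    pieces = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        pieces.append(delta)
    return "".join(pieces)

# With the real client (assuming a model is loaded):
# import openai
# client = openai.OpenAI(base_url="http://localhost:7860/v1", api_key="not-required")
# stream = client.chat.completions.create(
#     model="qwen2.5-7b-instruct",
#     messages=[{"role": "user", "content": "Hi"}],
#     stream=True,
# )
# print(collect_stream(stream))
```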

๐Ÿ› ๏ธ Developer Experience

  • Rich CLI interface with beautiful status displays
  • RESTful admin APIs for programmatic control
  • YAML configuration for easy model definitions
  • Comprehensive logging and error handling

๐Ÿ—๏ธ Production Ready

  • Docker containerization with GPU support
  • Health checks and automatic recovery
  • Resource monitoring and performance metrics
  • Horizontal scaling support
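The health-check-and-recovery behavior can be pictured as a simple probe loop. This is an illustrative sketch, not PolarisLLM's actual monitor; probe and restart are injected so the logic is testable (in practice the probe would be a GET to http://localhost:7860/health):

```python
# Sketch: a minimal health-check loop with automatic recovery attempts.
# `probe` returns True when the service is healthy; `restart` tries to
# recover it. Both are supplied by the caller.
import time

def monitor(probe, restart, max_failures=3, interval=0.0):
    """Probe until healthy or until max_failures restarts have been tried.

    Returns the number of failed probes observed."""
    failures = 0
    while failures < max_failures:
        if probe():
            return failures
        failures += 1
        restart()  # attempt automatic recovery
        time.sleep(interval)
    return failures
```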

๐Ÿ“ฆ Installation

๐ŸŽ‰ Quick Install (Recommended)

# Install from PyPI - that's it!
pip install polarisllm

# Start the engine
polaris start

๐Ÿณ Docker Installation

# Run with Docker
docker run -p 7860:7860 polarisllm/polarisllm

# Or with docker-compose
git clone https://github.com/polarisllm/polarisLLM.git
cd polarisLLM
docker-compose up -d

๐Ÿ› ๏ธ Development Installation

git clone https://github.com/polarisllm/polarisLLM.git
cd polarisLLM
pip install -e .
pip install ms-swift[llm] --upgrade

๐Ÿš€ Quick Start Guide

Step 1: Install & Start โšก

# Install PolarisLLM (includes ms-swift dependency)
pip install polarisllm

# Start the runtime engine
polarisllm

The server starts at http://localhost:7860 and serves a web interface

Step 2: Verify Installation โœ…

# Check if server is running
curl http://localhost:7860/health

# View available models
curl http://localhost:7860/v1/models

# Open API documentation
# Visit: http://localhost:7860/docs

Step 3: Load and Use Models ๐Ÿค–

# First, load a model using ms-swift (use python -m swift if swift command not in PATH)
swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct
# OR: python -m swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct

# Then use with OpenAI-compatible API
curl -X POST "http://localhost:7860/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b-instruct", 
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'

Step 4: Use with Your Favorite Tools ๐Ÿ”ง

# Works with OpenAI Python client
import openai

client = openai.OpenAI(
    base_url="http://localhost:7860/v1",
    api_key="not-required"  # No API key needed!
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
print(response.choices[0].message.content)

๐ŸŽฎ Real-World Examples

Example 1: Multi-Model AI Assistant

# Load different specialized models
polaris load qwen2.5-7b-instruct      # General chat
polaris load deepseek-coder-6.7b      # Code generation  
polaris load deepseek-vl-7b-chat      # Vision understanding

# Use different models for different tasks
curl -X POST "http://localhost:7860/v1/chat/completions" \
  -d '{"model": "deepseek-coder-6.7b", "messages": [{"role": "user", "content": "Write a REST API in FastAPI"}]}'
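A common pattern on top of this setup is a small task-to-model router on the client side. The mapping below is illustrative and reuses the three models loaded above:

```python
# Sketch: routing requests to specialized models by task type.
# The task names and fallback choice are illustrative.
TASK_MODEL = {
    "chat": "qwen2.5-7b-instruct",    # general conversation
    "code": "deepseek-coder-6.7b",    # code generation
    "vision": "deepseek-vl-7b-chat",  # image understanding
}

def pick_model(task):
    """Return the model for a task, falling back to the general chat model."""
    return TASK_MODEL.get(task, TASK_MODEL["chat"])
```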

Example 2: LangChain Integration

from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Connect to your local PolarisLLM
llm = OpenAI(
    openai_api_base="http://localhost:7860/v1",
    openai_api_key="not-required",
    model_name="qwen2.5-7b-instruct"
)

# Use with LangChain as usual
prompt = PromptTemplate(input_variables=["topic"], template="Explain {topic} in simple terms")
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(topic="machine learning")

Example 3: Batch Processing

import asyncio
import aiohttp

async def query(session, model):
    async with session.post(
        "http://localhost:7860/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": "Analyze this document..."}]
        }
    ) as response:
        return model, await response.json()

async def process_documents():
    models = ["qwen2.5-7b-instruct", "deepseek-coder-6.7b"]
    async with aiohttp.ClientSession() as session:
        # run the requests concurrently instead of one at a time
        results = await asyncio.gather(*(query(session, m) for m in models))
    for model, result in results:
        print(f"{model}: {result}")

asyncio.run(process_documents())

๐Ÿ› ๏ธ Available Commands

Server Commands

# Start the runtime engine (main command)
polarisllm

# Alternative start commands
polaris-llm
polaris-server

# Start with custom options
polarisllm --host 0.0.0.0 --port 8080

# View help
polarisllm --help

Model Management (via ms-swift)

# List available models
swift list-models
# OR: python -m swift list-models

# Deploy a chat model
swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct
# OR: python -m swift deploy --model_type qwen2_5 --model_id Qwen/Qwen2.5-7B-Instruct

# Deploy a vision model  
swift deploy --model_type deepseek_vl --model_id deepseek-ai/deepseek-vl-7b-chat

# Deploy a code model
swift deploy --model_type deepseek --model_id deepseek-ai/deepseek-coder-6.7b-instruct

# Check deployment status
swift list
# OR: python -m swift list

๐Ÿค– Supported Models (300+)

PolarisLLM supports the entire ms-swift model ecosystem. Here are some popular choices:

๐ŸŽฏ General Chat Models

  • Qwen2.5-7B-Instruct: Alibaba's flagship model - excellent for general tasks
  • Llama-3.1-8B-Instruct: Meta's latest - great reasoning capabilities
  • Mistral-7B-Instruct: Efficient and fast - perfect for production
  • DeepSeek-V2.5: Advanced reasoning and long context support

๐Ÿ’ป Code Generation Models

  • DeepSeek-Coder-6.7B: State-of-the-art code generation
  • CodeQwen1.5-7B: Multi-language programming support
  • Qwen2.5-Coder-7B: Latest coding model with enhanced capabilities

๐Ÿ‘๏ธ Vision-Language Models

  • DeepSeek-VL-7B-Chat: Advanced vision understanding
  • Qwen2-VL-7B-Instruct: Multi-modal reasoning
  • LLaVA-NeXT: Image analysis and description

๐ŸŽต Multi-Modal Models

  • Qwen2-Audio: Speech and audio understanding
  • Qwen2.5-Omni: Text, image, and audio in one model

See the complete list of 300+ supported models in our Model Catalog.

๐Ÿ”ง Configuration

Runtime Configuration (config/runtime.yaml)

host: "0.0.0.0"
port_range_start: 8000
port_range_end: 8100
max_concurrent_models: 5
model_timeout: 300
env_vars:
  CUDA_VISIBLE_DEVICES: "0"
  HF_HUB_CACHE: "./cache/huggingface"
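The port_range_start/port_range_end settings imply per-model port assignment (the architecture diagram below shows instances on ports 8000 and 8001). A sketch of how such allocation might work; this is illustrative, not PolarisLLM's actual allocator:

```python
# Sketch: assigning each model instance the first free port in the
# configured range (port_range_start..port_range_end from runtime.yaml).

def next_free_port(in_use, start=8000, end=8100):
    """Return the first port in [start, end] not already assigned."""
    taken = set(in_use)
    for port in range(start, end + 1):
        if port not in taken:
            return port
    raise RuntimeError("port range exhausted; raise port_range_end")
```

For example, with models already serving on 8000 and 8001, the next deploy would get 8002.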

Model Configuration (config/models/*.yaml)

name: "custom-model"
model_id: "path/to/model"
model_type: "qwen2_5"
template: "qwen2_5"
description: "Custom model description"
tags: ["chat", "custom"]
swift_args:
  max_length: 8192
  temperature: 0.7
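For reference, the same configuration can be expressed as a Python structure. The dataclass below mirrors the YAML fields above but is illustrative, not the actual PolarisLLM schema class:

```python
# Sketch: a Python mirror of config/models/*.yaml. Field names follow the
# YAML example above; defaults are assumptions.
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    name: str
    model_id: str
    model_type: str
    template: str
    description: str = ""
    tags: list = field(default_factory=list)
    swift_args: dict = field(default_factory=dict)

cfg = ModelConfig(
    name="custom-model",
    model_id="path/to/model",
    model_type="qwen2_5",
    template="qwen2_5",
    description="Custom model description",
    tags=["chat", "custom"],
    swift_args={"max_length": 8192, "temperature": 0.7},
)
```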

๐ŸŒ API Endpoints

OpenAI Compatible Endpoints

  • POST /v1/chat/completions - Create chat completion
  • GET /v1/models - List available models

Admin Endpoints

  • POST /admin/models/load - Load a model
  • POST /admin/models/{model_name}/unload - Unload a model
  • GET /admin/models/{model_name}/status - Get model status
  • GET /admin/status - Get runtime status
  • GET /admin/models/available - List available model configurations
  • GET /admin/models/running - List running models

Utility Endpoints

  • GET /health - Health check
  • GET / - API information

๐Ÿณ Docker Deployment

Basic Deployment

docker-compose up -d

With GPU Support

  1. Install nvidia-docker2
  2. Uncomment GPU section in docker-compose.yml
  3. Start with GPU access:
docker-compose up -d

With Redis Cache

docker-compose --profile with-cache up -d

๐Ÿ“Š Monitoring

Health Checks

The runtime includes built-in health monitoring:

curl http://localhost:7860/health

Resource Monitoring

View real-time resource usage:

python cli.py status

Logs

# Local logs
tail -f polaris.log

# Docker logs
docker-compose logs -f polaris-runtime

๐Ÿ”Œ Integration Examples

Python Client

import openai

client = openai.OpenAI(
    base_url="http://localhost:7860/v1",
    api_key="not-required"
)

response = client.chat.completions.create(
    model="deepseek-vl-7b-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

JavaScript Client

import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'http://localhost:7860/v1',
    apiKey: 'not-required'
});

const completion = await client.chat.completions.create({
    model: 'qwen2.5-7b-instruct',
    messages: [
        { role: 'user', content: 'Hello!' }
    ]
});

console.log(completion.choices[0].message.content);

๐Ÿ›ก๏ธ Production Considerations

  1. Resource Management: Monitor GPU/CPU usage and memory consumption
  2. Load Balancing: Use reverse proxy for multiple runtime instances
  3. Security: Add authentication for admin endpoints
  4. Logging: Configure structured logging for production monitoring
  5. Scaling: Use Kubernetes for large-scale deployments
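For item 3, the runtime requires no API key by default, so a gate like the one below (illustrative only) can be added in a reverse proxy or middleware in front of the admin endpoints:

```python
# Sketch: a minimal bearer-token check for the admin endpoints.
# ADMIN_TOKEN is a placeholder; load it from an env var or secret store.
import hmac

ADMIN_TOKEN = "change-me"

def is_authorized(auth_header):
    """Accept 'Authorization: Bearer <token>' headers with the right token."""
    scheme, _, token = (auth_header or "").partition(" ")
    # constant-time comparison avoids leaking the token via timing
    return scheme == "Bearer" and hmac.compare_digest(token, ADMIN_TOKEN)
```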

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐ŸŒŸ Use Cases

๐Ÿข Enterprise & Startups

  • Private AI Infrastructure: Keep models in-house for data privacy
  • Cost Optimization: Cut inference costs substantially compared to pay-per-token cloud APIs
  • Multi-tenant Applications: Serve different models to different customers
  • A/B Testing: Compare model performance with easy switching

๐Ÿ‘จโ€๐Ÿ’ป Developers & Researchers

  • Local Development: Test AI features without API costs
  • Model Comparison: Evaluate different models on the same dataset
  • Fine-tuning Pipeline: Deploy custom fine-tuned models
  • Prototype Rapidly: Build AI applications with zero setup friction

๐Ÿ“š Educational & Training

  • AI Courses: Provide students with hands-on LLM experience
  • Research Projects: Access to latest models for academic research
  • Hackathons: Quick setup for AI-focused competitions

๐Ÿ† Why Choose PolarisLLM Over Alternatives?

| Feature | PolarisLLM | Ollama | text-generation-webui | OpenAI API |
| --- | --- | --- | --- | --- |
| Multi-model serving | โœ… Concurrent | โš ๏ธ Sequential | โŒ Single | โœ… Multiple |
| OpenAI compatibility | โœ… Full | โŒ Limited | โŒ None | โœ… Native |
| Model variety | โœ… 300+ models | โš ๏ธ GGUF only | โš ๏ธ Limited | โš ๏ธ Proprietary |
| Production ready | โœ… Yes | โš ๏ธ Basic | โŒ No | โœ… Yes |
| Self-hosted | โœ… Yes | โœ… Yes | โœ… Yes | โŒ No |
| Cost | โœ… Free | โœ… Free | โœ… Free | ๐Ÿ’ฐ Expensive |
| Setup time | โœ… < 2 minutes | โš ๏ธ 5-10 min | โŒ 30+ min | โœ… Instant |

๐Ÿค Community & Support

Getting Help

  • ๐Ÿ“– Documentation: Comprehensive guides and API reference
  • ๐Ÿ’ฌ GitHub Discussions: Community Q&A and feature requests
  • ๐Ÿ› Issue Tracking: Bug reports and feature requests
  • ๐Ÿ“ง Email Support: contact@polarisllm.dev

Contributing

  • ๐Ÿด Fork & PR: Contributions welcome!
  • ๐Ÿงช Testing: Help test new models and features
  • ๐Ÿ“ Documentation: Improve guides and examples
  • ๐ŸŒ Translation: Help localize for global users

Stay Updated

  • โญ Star us on GitHub: Get notifications for releases
  • ๐Ÿฆ Follow @PolarisLLM: Latest updates and tips
  • ๐Ÿ“ฐ Newsletter: Monthly model updates and tutorials

๐Ÿ”„ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   CLI Client    โ”‚    โ”‚   FastAPI Server โ”‚    โ”‚  Runtime Core   โ”‚
โ”‚                 โ”‚โ”€โ”€โ”€โ”€โ”‚                  โ”‚โ”€โ”€โ”€โ”€โ”‚                 โ”‚
โ”‚ - Model Mgmt    โ”‚    โ”‚ - OpenAI API     โ”‚    โ”‚ - Model Manager โ”‚
โ”‚ - Status Check  โ”‚    โ”‚ - Admin API      โ”‚    โ”‚ - Process Mgmt  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚                        โ”‚
                                โ”‚                        โ”‚
                       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                       โ”‚  Model Instance โ”‚    โ”‚  Model Instance   โ”‚
                       โ”‚                 โ”‚    โ”‚                   โ”‚
                       โ”‚ - ms-swift      โ”‚    โ”‚ - ms-swift        โ”‚
                       โ”‚ - Port 8000     โ”‚    โ”‚ - Port 8001       โ”‚
                       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐ŸŽ‰ Acknowledgments

  • Built on the excellent ms-swift framework
  • Inspired by OpenAI's API design
  • Thanks to the open-source LLM community
