
NorthServing

A one-click LLM serving deployment tool for Kubernetes with Volcano job scheduling.

Overview

NorthServing (ๅŒ—ๆœ) is a Python-based tool that simplifies the deployment and management of Large Language Model (LLM) serving infrastructure on Kubernetes. It provides a unified command-line interface for deploying models using various backends (vLLM, SGLang, etc.) with support for multi-node, multi-GPU configurations.

Features

  • 🚀 One-Click Deployment: Launch LLM serving with a single command
  • 🔄 Multiple Backends: Support for vLLM, SGLang, and other inference engines
  • 📊 Performance Benchmarking: Built-in benchmarking tools with Feishu reporting
  • 🌐 Multi-Cluster Support: Deploy across different Kubernetes clusters
  • ⚡ Advanced Configurations:
    • PD (Prefill-Decode) separation mode
    • Multi-node deployments with Ray
    • Tensor/Pipeline parallelism
    • Custom resource scheduling
  • 🧪 Well-Tested: Comprehensive test suite with >80% coverage

Installation

Prerequisites

  • Python >= 3.8
  • Kubernetes cluster with Volcano scheduler
  • kubectl configured
  • Access to Infrawave API (for job management)

Install from PyPI

# Install from internal PyPI server
pip install northserve -i http://10.51.6.7:31624/simple/ --trusted-host 10.51.6.7 --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Or configure pip once (recommended)
mkdir -p ~/.pip
cat > ~/.pip/pip.conf << 'EOF'
[global]
index-url = http://10.51.6.7:31624/simple/
trusted-host = 10.51.6.7
EOF

# Then install normally
pip install northserve

Install from Source

git clone https://github.com/china-qijizhifeng/NorthServing.git
cd NorthServing

# Install dependencies from Tsinghua mirror
pip install -r requirements.txt -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install in development mode
pip install -e .

For a detailed installation guide, see INSTALL.md.

Configuration

Set up your credentials using environment variables:

# Add to your ~/.bashrc or ~/.zshrc
export INFRAWAVES_USERNAME='your_username'
export INFRAWAVES_PASSWORD='your_password'

# Apply changes
source ~/.bashrc  # or source ~/.zshrc

Quick Start

Launch a Model

northserve launch \
  --model-name qwen2-72b-instruct \
  --model-path /gpfs/models/huggingface.co/Qwen/Qwen2-72B-Instruct/ \
  --replicas 1 \
  --gpus-per-pod 8 \
  --profile generation
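Query the Model

Once the deployment is ready it serves an OpenAI-compatible API (the default --protocol openai). The sketch below builds a minimal chat-completion request; the service address, port, and endpoint path are assumptions, so substitute whatever address your deployment actually exposes:

```python
import json
import urllib.request

# The base URL is an assumption -- use the address your deployment actually
# exposes (for example the NodePort created with --standalone).
BASE_URL = "http://<service-address>:8000"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_chat_request("qwen2-72b-instruct", "Hello!")
# Sending the request (uncomment once BASE_URL points at a live service):
# req = urllib.request.Request(
#     BASE_URL + "/v1/chat/completions",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```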

List Running Models

northserve list

Stop a Model

northserve stop --model-name qwen2-72b-instruct

Command Reference

northserve launch

Launch a new LLM serving deployment.

Required Options:

  • --model-name: Model name used to identify the deployment
  • --model-path: Path to model weights (may be omitted for some backends)

Common Options:

  • --backend: Inference backend (default: vllm)
    • vllm: vLLM inference engine
    • sglang: SGLang inference engine
    • bp-vllm: BP-optimized vLLM
    • crossing: Crossing inference engine
  • --protocol: API protocol (default: openai)
    • openai: OpenAI-compatible API
    • anthropic: Anthropic-compatible API
  • --replicas: Number of replicas (default: 1)
  • --gpus-per-pod: GPUs per pod (default: 1)
  • --pods-per-job: Pods per job for multi-node (default: 1)
  • --gpu-type: GPU type (one of gpu, h20, 4090d)
  • --namespace: Kubernetes namespace (default: qiji)
  • --priority-class-name: Priority class (default: low-priority-job)

Advanced Options:

  • --extra-cmds: Additional command-line arguments for the engine
  • --extra-envs: Extra environment variables (KEY=value KEY2=value2)
  • --tensor-parallel-size: Tensor parallelism (defaults to gpus-per-pod)
  • --pipeline-parallel-size: Pipeline parallelism (default: 1)
  • --prefill-nodes: Prefill nodes for PD separation (SGLang only)
  • --decode-nodes: Decode nodes for PD separation (SGLang only)
  • --use-host-network: Use host network
  • --standalone: Create standalone service with NodePort
  • -y, --yes: Skip confirmation prompts
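
The --extra-envs value is a single space-separated string of KEY=value pairs. As an illustrative sketch of how that format maps to a dict of environment variables (this is not NorthServing's internal parser):

```python
def parse_extra_envs(spec: str) -> dict:
    """Split a space-separated 'KEY=value KEY2=value2' string into a dict.

    Illustrates the documented --extra-envs format; a sketch only,
    not the tool's actual implementation.
    """
    envs = {}
    for pair in spec.split():
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got {pair!r}")
        envs[key] = value
    return envs

# e.g. --extra-envs "CUDA_LAUNCH_BLOCKING=1 HF_HUB_OFFLINE=1"
envs = parse_extra_envs("CUDA_LAUNCH_BLOCKING=1 HF_HUB_OFFLINE=1")
```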

Examples:

Simple deployment:

northserve launch --model-name llama2-7b --model-path /gpfs/models/llama2-7b --gpus-per-pod 1

Multi-GPU deployment:

northserve launch \
  --model-name qwen2-72b \
  --model-path /gpfs/models/qwen2-72b \
  --gpus-per-pod 8 \
  --replicas 2

With custom arguments:

northserve launch \
  --model-name mistral-large \
  --model-path /gpfs/models/mistral-large \
  --gpus-per-pod 8 \
  --extra-cmds "--max-num-batched-tokens=16384 --max-model-len=16384 --enforce-eager"

PD separation mode (SGLang):

northserve launch \
  --model-name qwen2-72b \
  --model-path /gpfs/models/qwen2-72b \
  --backend sglang \
  --gpus-per-pod 8 \
  --prefill-nodes 2 \
  --decode-nodes 4 \
  --minilb-replicas 4

northserve stop

Stop a running deployment.

Options:

  • --model-name: Model name to stop (required)
  • --backend: Backend type (default: vllm)
  • --namespace: Kubernetes namespace (default: qiji)
  • --standalone: Stop standalone service
  • -y, --yes: Skip confirmation

Example:

northserve stop --model-name qwen2-72b-instruct

northserve list

List all deployed models and their status.

Example:

northserve list

northserve benchmark

Performance benchmarking commands.

northserve benchmark launch

Launch a benchmark test on a running deployment.

Options:

  • --model-name: Model name to benchmark (required)
  • --model-path: Path to model weights (required)
  • --backend: Backend type
  • --namespace: Kubernetes namespace

Example:

northserve benchmark launch \
  --model-name qwen2-72b \
  --model-path /gpfs/models/qwen2-72b \
  --backend vllm

northserve benchmark report

Report benchmark results to Feishu.

Options:

  • --log-path: Path to benchmark logs (required)
  • --config-file: Path to Feishu config file (required)

Example:

northserve benchmark report \
  --log-path ~/.northserve/logs/qwen2-72b-vllm-server-0 \
  --config-file ~/.northserve/feishu.json
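
The exact schema of the Feishu config file is not documented here; a typical webhook-style config might look like the following, but the key names are assumptions, so check your deployment's documentation:

```json
{
  "webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/<token>",
  "secret": "<signing-secret>"
}
```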

northserve launch_north_llm_api

Launch the North LLM API service.

Options:

  • --version: Version to deploy (default: v0.2.3)
  • --replicas: Number of replicas (default: 1)
  • --namespace: Kubernetes namespace (default: qiji)

Example:

northserve launch_north_llm_api --version v0.2.3 --replicas 2

northserve stop_north_llm_api

Stop the North LLM API service.

Example:

northserve stop_north_llm_api --version v0.2.3

Architecture

NorthServing follows a modular architecture:

┌──────────────────────────────────────┐
│        CLI Interface (Click)         │
└──────────────────┬───────────────────┘
                   │
        ┌──────────┴──────────┐
        │                     │
   ┌────▼────┐         ┌──────▼──────┐
   │Commands │         │ Core Logic  │
   └────┬────┘         └──────┬──────┘
        │                     │
        │    ┌────────────────┴────────────────┐
        │    │                                 │
        │ ┌──▼──────────┐          ┌───────────▼──────┐
        │ │ Job Manager │          │ Config Builder   │
        │ └──┬──────────┘          └───────────┬──────┘
        │    │                                 │
        │ ┌──▼────────────┐        ┌───────────▼──────┐
        └─┤ API Clients   │        │Template Renderer │
          └───────────────┘        └──────────────────┘

Key Components

  • CLI Layer: Click-based command-line interface
  • Commands: Individual command implementations (launch, stop, list, etc.)
  • Core Logic:
    • JobManager: Orchestrates deployment lifecycle
    • ConfigBuilder: Builds deployment configurations
    • TemplateRenderer: Jinja2 template rendering
    • BenchmarkEngine: Performance testing
  • API Clients:
    • InfrawaveClient: Infrawave API integration
    • KubernetesClient: Direct kubectl operations
  • Models: Type-safe data models with validation
  • Utils: Validators, logger, helpers
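
As a much-simplified illustration of how these layers compose (all class and method names below are invented for illustration, string.Template stands in for Jinja2, and job submission is stubbed out):

```python
from string import Template

class ConfigBuilder:
    """Builds a deployment configuration, deriving documented defaults."""
    def build(self, model_name: str, gpus_per_pod: int) -> dict:
        return {
            "model_name": model_name,
            "gpus_per_pod": gpus_per_pod,
            # tensor-parallel-size defaults to gpus-per-pod, as documented above
            "tensor_parallel_size": gpus_per_pod,
        }

class TemplateRenderer:
    """Renders a (toy) manifest; the real package uses Jinja2 templates."""
    TEMPLATE = Template("job: $model_name\ngpus: $gpus_per_pod")
    def render(self, config: dict) -> str:
        return self.TEMPLATE.substitute(
            model_name=config["model_name"],
            gpus_per_pod=config["gpus_per_pod"],
        )

class JobManager:
    """Orchestrates the flow: build config -> render manifest -> submit."""
    def __init__(self):
        self.builder = ConfigBuilder()
        self.renderer = TemplateRenderer()
    def launch(self, model_name: str, gpus_per_pod: int) -> str:
        config = self.builder.build(model_name, gpus_per_pod)
        # The real JobManager would submit this as a Volcano job via the
        # Infrawave API; here we just return the rendered manifest.
        return self.renderer.render(config)

manifest = JobManager().launch("qwen2-72b", 8)
```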

Development

Running Tests

# Install dev dependencies
pip install -r requirements-dev.txt

# Run all tests
pytest

# Run with coverage
pytest --cov=northserve --cov-report=html

# Run specific test file
pytest tests/test_core/test_config_builder.py

Code Quality

# Format code
black northserve tests

# Sort imports
isort northserve tests

# Lint
flake8 northserve tests

# Type checking
mypy northserve

Migration from Shell Version

The Python version maintains backward compatibility with the shell-based version:

  • Same Command Interface: All commands work the same way
  • Same Configuration Files: YAML configs and templates unchanged
  • Same Output: Identical deployment behavior

To use the new version, simply install it and use northserve instead of the old shell script.

Troubleshooting

Common Issues

"Config file not found"

  • Ensure ~/.config/northjob/userinfo.conf exists with valid credentials

"Failed to create job"

  • Check Infrawave API connectivity
  • Verify your credentials are correct
  • Ensure you have permissions for the namespace

"Template not found"

  • Make sure you installed from the repository root
  • YAML templates should be in yaml_templates/ directory

"Invalid backend"

  • Use one of: vllm, sglang, bp-vllm, crossing
  • Note: nla-vllm is deprecated, use bp-vllm

Debug Mode

Enable debug logging:

export NORTHSERVE_LOG_LEVEL=DEBUG
northserve launch ...

Skip auto-update checks:

export NORTHSERVE_SKIP_UPDATE=1
northserve launch ...

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the test suite
  5. Submit a pull request

License

See LICENSE file for details.

Support

For issues and questions, please open an issue on the GitHub repository (https://github.com/china-qijizhifeng/NorthServing).

Why NorthServing?

  • ✅ Training-Inference Unified Scheduling: Uses Volcano jobs compatible with training workloads
  • ✅ Multi-Backend Support: Unified interface for different inference engines
  • ✅ Cross-Cluster Deployment: Deploy to multiple clusters with unified ingress
  • ✅ Production Ready: Mature codebase with comprehensive testing
  • ✅ Easy Automation: Command-line interface perfect for CI/CD pipelines

Download files

Source Distribution

  • northserve-2.0.5.tar.gz (57.9 kB, uploaded via twine/6.2.0 CPython/3.12.12)
    • SHA256: 42f4fe60c7a63f05c856b8645af2da5e87267ef186ee68478a6a713d45be82af
    • MD5: ae876f60dd225fe7b3b5912a7f94c574
    • BLAKE2b-256: fb575b4e5289d58929653d09c77bb37fa45cab9b14596d268a8fec5330743ec0

Built Distribution

  • northserve-2.0.5-py3-none-any.whl (76.2 kB, Python 3, uploaded via twine/6.2.0 CPython/3.12.12)
    • SHA256: 1dfeef03890c290181876a73e61e1bcd381f072851f9d1e296ad1919f8b30712
    • MD5: cb9168f70f22965ffa2989d206b160c8
    • BLAKE2b-256: 9ae058a71347a2ab102b41f1c616379810e063e544d471f1ae91a5d0e052af68
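
To check a downloaded file against the published digests before installing, a streaming SHA256 computation with the standard library suffices:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the published digest, e.g.:
# sha256_of("northserve-2.0.5.tar.gz") should equal
# "42f4fe60c7a63f05c856b8645af2da5e87267ef186ee68478a6a713d45be82af"
```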
