sageLLM: Modular LLM inference engine with prefill/decode (PD) separation for domestic computing power

sageLLM

Protocol Compliance (Mandatory)

🚀 Modular LLM Inference Engine for Domestic Computing Power

Ollama-like experience for Chinese domestic hardware (Huawei Ascend) as well as NVIDIA GPUs


✨ Features

  • 🎯 One-Click Install - pip install isagellm gets you started immediately
  • 🧠 CPU-First - Default CPU engine, no GPU required
  • 🇨🇳 Domestic Hardware - First-class support for Huawei Ascend NPU
  • 📊 Observable - Built-in metrics (TTFT, TBT, throughput, KV usage)
  • 🧩 Plugin System - Extend with custom backends and engines

📦 Quick Install

# Install sageLLM (CPU-first, no GPU required)
pip install isagellm

# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'

# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'

# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'

# With CUDA support
pip install 'isagellm[cuda]'

# All features
pip install 'isagellm[all]'

🚀 Faster PyTorch Install for Users in China (Recommended)

The PyTorch CUDA build (~800 MB) downloads slowly from the official index, so pre-downloaded wheels are provided on GitHub Releases:

# Option 1: use the sage-llm CLI (recommended, simplest)
pip install isagellm
sage-llm install cuda --github     # download from GitHub (fast)
sage-llm install cuda              # download from the official index (default)

# Option 2: use pip --find-links directly
pip install torch==2.5.1+cu121 torchvision torchaudio \
  --find-links https://github.com/intellistream/sagellm-pytorch-wheels/releases/download/v2.5.1-cu121/ \
  --trusted-host github.com

Other supported backends

  • sage-llm install ascend - Huawei Ascend NPU
  • sage-llm install kunlun - Baidu Kunlun XPU
  • sage-llm install haiguang - Hygon DCU
  • sage-llm install cpu - CPU-only (smallest download)

💡 Why use the GitHub mirror?

  • ✅ Fast access from mainland China (GitHub CDN)
  • ✅ No mirror configuration required
  • ✅ Official wheels, 100% trustworthy

📦 Wheels repository: https://github.com/intellistream/sagellm-pytorch-wheels

🚀 Quick Start

CLI (as simple as vLLM/Ollama)

# One-command start (full stack: Gateway + Engine)
pip install 'isagellm[gateway]'
sage-llm serve --model Qwen2-7B

# ✅ The OpenAI-compatible API is available automatically
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2-7B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Show system information
sage-llm info

# Single inference (without starting a server)
sage-llm run -p "What is LLM inference?"

# Advanced: distributed deployment (start each component separately)
sage-llm serve --engine-only --port 9000   # engine only
sage-llm gateway --port 8000                # Gateway only
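
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the gateway is listening on localhost:8000 as in the Quick Start, and that the response follows the OpenAI chat-completions schema (choices[0].message.content), which has not been verified against sageLLM itself. The helper names build_chat_request and chat are hypothetical and not part of sageLLM.

```python
import json
import urllib.request

# Assumption: the gateway from the Quick Start is listening here.
BASE_URL = "http://localhost:8000"


def build_chat_request(
    model: str, user_message: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, user_message: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response body; adjust the indexing below
    if the gateway returns a different shape.
    """
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from the Quick Start running, chat("Qwen2-7B", "Hello!") should return the generated reply as a string.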

Python API (Control Plane - Recommended)

import asyncio

from sagellm import ControlPlaneManager, BackendConfig, EngineConfig

# Install with: pip install 'isagellm[control-plane]'
async def main() -> None:
    manager = ControlPlaneManager(
        backend_config=BackendConfig(kind="cpu", device="cpu"),
        engine_configs=[
            EngineConfig(
                kind="cpu",
                model="sshleifer/tiny-gpt2",
                model_path="sshleifer/tiny-gpt2"
            )
        ]
    )

    await manager.start()
    try:
        # Requests are automatically routed to available engines
        response = await manager.execute_request(
            prompt="Hello, world!",
            max_tokens=128
        )
        print(response.output_text)
        print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
        print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")
    finally:
        await manager.stop()


asyncio.run(main())

⚠️ Important: Direct engine creation (create_engine()) is not exported from the umbrella package. All production code must use ControlPlaneManager for proper request routing, scheduling, and lifecycle management.

Configuration

# ~/.sage-llm/config.yaml
backend:
  kind: cpu  # Options: cpu, pytorch-cuda, pytorch-ascend
  device: cpu

engine:
  kind: cpu
  model: sshleifer/tiny-gpt2

control_plane:
  endpoint: "localhost:8080"
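
Before handing a configuration like the one above to the engine, it can help to validate it early. The sketch below checks a dict with the same shape as the YAML file. The helper parse_config and the ParsedConfig class are hypothetical and are not sageLLM APIs. The set of accepted backend kinds is taken from the comment in the config example, and treating every key as required is an assumption.

```python
from dataclasses import dataclass

# Backend kinds listed in the comment of the config example.
KNOWN_BACKEND_KINDS = {"cpu", "pytorch-cuda", "pytorch-ascend"}


@dataclass(frozen=True)
class ParsedConfig:
    backend_kind: str
    device: str
    engine_kind: str
    model: str
    control_plane_host: str
    control_plane_port: int


def _require(section: dict, key: str, section_name: str) -> str:
    """Return section[key] as a string, or fail with a readable message."""
    if key not in section:
        raise ValueError(f"missing required key: {section_name}.{key}")
    return str(section[key])


def parse_config(raw: dict) -> ParsedConfig:
    """Validate a dict shaped like ~/.sage-llm/config.yaml (hypothetical helper)."""
    backend = raw.get("backend", {})
    engine = raw.get("engine", {})
    control_plane = raw.get("control_plane", {})

    backend_kind = _require(backend, "kind", "backend")
    if backend_kind not in KNOWN_BACKEND_KINDS:
        raise ValueError(f"unknown backend kind: {backend_kind!r}")

    # Split "host:port" on the last colon and check that the port is numeric.
    endpoint = _require(control_plane, "endpoint", "control_plane")
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"endpoint must look like host:port, got {endpoint!r}")

    return ParsedConfig(
        backend_kind=backend_kind,
        device=_require(backend, "device", "backend"),
        engine_kind=_require(engine, "kind", "engine"),
        model=_require(engine, "model", "engine"),
        control_plane_host=host,
        control_plane_port=int(port),
    )
```

The input dict can come from any YAML loader, for example yaml.safe_load from PyYAML, which is a separate dependency.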

📊 Metrics & Validation

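sageLLM reports performance metrics such as time to first token (TTFT), time between tokens (TBT), throughput, peak memory, KV-cache usage, and prefix-cache hit rate. Example output: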

{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85
}
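
To make the latency fields above concrete, here is a minimal sketch of how ttft_ms, tbt_ms, and throughput_tps can be derived from per-token timestamps. The definitions used (TTFT as the delay until the first token, TBT as the mean gap between consecutive tokens, throughput as tokens divided by total elapsed time) are the common ones and are an assumption here. How sageLLM itself computes these values is not stated in this document. The names LatencyMetrics and compute_latency_metrics are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatencyMetrics:
    ttft_ms: float
    tbt_ms: float
    throughput_tps: float


def compute_latency_metrics(
    request_start_s: float, token_times_s: list[float]
) -> LatencyMetrics:
    """Derive latency metrics from the arrival time (in seconds) of each token."""
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")

    # Time to first token: delay between the request and the first token.
    ttft_ms = (token_times_s[0] - request_start_s) * 1000.0

    # Time between tokens: mean gap between consecutive tokens.
    gaps = [later - earlier for earlier, later in zip(token_times_s, token_times_s[1:])]
    tbt_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else 0.0

    # Throughput: generated tokens divided by total elapsed time.
    total_s = token_times_s[-1] - request_start_s
    throughput_tps = len(token_times_s) / total_s if total_s > 0 else 0.0

    return LatencyMetrics(ttft_ms=ttft_ms, tbt_ms=tbt_ms, throughput_tps=throughput_tps)
```

In a streaming client, token_times_s would be filled by recording time.perf_counter() each time a token arrives.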

Run benchmarks:

sage-llm demo --workload year1 --output metrics.json

🏗️ Architecture

isagellm (umbrella package)
├── isagellm-protocol       # Protocol v0.1 types
│   └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-backend        # Hardware abstraction (L1 - Foundation)
│   └── BackendProvider, CPUBackend, (CUDABackend, AscendBackend)
├── isagellm-comm           # Communication primitives (L2 - Infrastructure)
│   └── Topology, CollectiveOps (all_reduce/gather), P2P (send/recv), Overlap
├── isagellm-kv-cache       # KV cache management (L2 - Optional)
│   └── PrefixCache, MemoryPool, EvictionPolicies, Predictor, KV Transfer
├── isagellm-compression    # Inference acceleration (quantization, sparsity, etc.) (L2 - Optional)
│   └── Quantization, Sparsity, SpeculativeDecoding, Fusion
├── isagellm-core           # Engine core & runtime (L3)
│   └── Config, Engine, Factory, DemoRunner, Adapters (vLLM/LMDeploy)
├── isagellm-control-plane  # Request routing & scheduling (L4 - Optional)
│   └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway        # OpenAI-compatible REST API (L5 - Optional)
    └── FastAPI server, /v1/chat/completions, Session management
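
The layer labels in the tree above (L1 to L5) suggest a rule that a package should not depend on a package from a higher layer. The sketch below illustrates one way to check such a rule. It is not the logic of scripts/verify_dependencies.py, which is not shown in this document. The layer numbers come from the tree, except for isagellm-protocol, whose layer 0 is an assumption. Allowing same-layer dependencies is also an assumption, and the dependency map in the usage note is purely illustrative.

```python
# Layer numbers as labeled in the architecture tree.
LAYERS = {
    "isagellm-protocol": 0,  # assumption: the tree gives no layer for this package
    "isagellm-backend": 1,
    "isagellm-comm": 2,
    "isagellm-kv-cache": 2,
    "isagellm-compression": 2,
    "isagellm-core": 3,
    "isagellm-control-plane": 4,
    "isagellm-gateway": 5,
}


def find_layer_violations(
    dependencies: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Return (package, dependency) pairs where the dependency sits in a higher layer."""
    violations = []
    for package, deps in dependencies.items():
        for dep in deps:
            if LAYERS[dep] > LAYERS[package]:
                violations.append((package, dep))
    return violations
```

For example, find_layer_violations({"isagellm-backend": ["isagellm-core"]}) reports one violation, because a layer 1 package would be reaching up into layer 3, while {"isagellm-core": ["isagellm-backend", "isagellm-protocol"]} reports none.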

🔧 Development

Quick Setup (Development Mode)

# Clone all repositories
./scripts/clone-all-repos.sh

# Install all packages in editable mode
./quickstart.sh

# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace

📖 See WORKSPACE_GUIDE.md for Multi-root Workspace usage.

Testing

# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"

# Run tests
pytest -v

# Format & lint
ruff format .
ruff check . --fix

# Type check
mypy src/sagellm/

# Verify dependency hierarchy
python scripts/verify_dependencies.py

📖 Development Resources


📚 Documentation Index

User Documentation

Developer Documentation

API Documentation

Sub-package Documentation

📚 Package Details

Package           PyPI Name          Import Name       Description
sagellm           isagellm           sagellm           Umbrella package (install this)
sagellm-protocol  isagellm-protocol  sagellm_protocol  Protocol v0.1 types
sagellm-core      isagellm-core      sagellm_core      Runtime & config
sagellm-backend   isagellm-backend   sagellm_backend   Hardware abstraction
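
Because the PyPI names and import names in the table differ, a quick check of which packages are importable in the current environment can save some confusion. This is a minimal sketch using importlib from the standard library. The mapping is copied from the table, and the function name installed_packages is hypothetical.

```python
import importlib.util

# PyPI name -> import name, as listed in the package table.
IMPORT_NAMES = {
    "isagellm": "sagellm",
    "isagellm-protocol": "sagellm_protocol",
    "isagellm-core": "sagellm_core",
    "isagellm-backend": "sagellm_backend",
}


def installed_packages() -> dict[str, bool]:
    """Map each PyPI name to whether its import name can be found."""
    return {
        pypi_name: importlib.util.find_spec(import_name) is not None
        for pypi_name, import_name in IMPORT_NAMES.items()
    }
```

Calling installed_packages() returns a dict keyed by PyPI name, so a False entry points directly at the pip install target that is missing.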

📄 License

Proprietary - IntelliStream. Internal use only.


Built with ❤️ by IntelliStream Team for domestic AI infrastructure
