sageLLM: Modular LLM inference engine with PD (prefill/decode) separation for domestic computing power

sageLLM

Protocol Compliance (Mandatory)

MUST follow Protocol v0.1: https://github.com/intellistream/sagellm-docs/blob/main/docs/specs/protocol_v0.1.md
Any globally shared definitions (fields, error codes, metrics, IDs, schemas) MUST be added to Protocol first.

🚀 Modular LLM Inference Engine for Domestic Computing Power

Ollama-like experience for Chinese hardware ecosystems (Huawei Ascend, NVIDIA)

✨ Features

🎯 One-Click Install - pip install isagellm gets you started immediately
🧠 CPU-First - Default CPU engine, no GPU required
🇨🇳 Domestic Hardware - First-class support for Huawei Ascend NPU
📊 Observable - Built-in metrics (TTFT, TBT, throughput, KV usage)
🧩 Plugin System - Extend with custom backends and engines

📦 Quick Install

# Install sageLLM (CPU-first, no GPU required)
pip install isagellm

# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'

# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'

# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'

# With CUDA support
pip install 'isagellm[cuda]'

# All features
pip install 'isagellm[all]'

🚀 Quick Start

CLI (as simple as vLLM/Ollama)

# One-command start (full stack: Gateway + Engine)
pip install 'isagellm[gateway]'
sage-llm serve --model Qwen2-7B

# ✅ The OpenAI-compatible API is available automatically
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2-7B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Show system information
sage-llm info

# Single-shot inference (without starting a server)
sage-llm run -p "What is LLM inference?"

# Advanced usage: distributed deployment (start each component separately)
sage-llm serve --engine-only --port 9000   # engine only
sage-llm gateway --port 8000                # gateway only
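
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the gateway is listening on http://localhost:8000 as in the curl example, and that the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`), which this README does not spell out. The helper names `build_chat_request` and `chat` are illustrative and not part of sageLLM.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "Qwen2-7B",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the reply text.

    Assumes an OpenAI-style response body; adjust if the gateway differs.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running gateway, e.g. `sage-llm serve --model Qwen2-7B`):
#     print(chat("Hello!"))
```

Separating request construction from sending keeps the payload easy to inspect without a running server.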

Python API (Control Plane - Recommended)

import asyncio

from sagellm import ControlPlaneManager, BackendConfig, EngineConfig

# Install with: pip install 'isagellm[control-plane]'
async def main() -> None:
    manager = ControlPlaneManager(
        backend_config=BackendConfig(kind="cpu", device="cpu"),
        engine_configs=[
            EngineConfig(
                kind="cpu",
                model="sshleifer/tiny-gpt2",
                model_path="sshleifer/tiny-gpt2"
            )
        ]
    )

    await manager.start()
    try:
        # Requests are automatically routed to available engines
        response = await manager.execute_request(
            prompt="Hello, world!",
            max_tokens=128
        )
        print(response.output_text)
        print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
        print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")
    finally:
        await manager.stop()


asyncio.run(main())

⚠️ Important: Direct engine creation (create_engine()) is not exported from the umbrella package. All production code must use ControlPlaneManager for proper request routing, scheduling, and lifecycle management.

Configuration

# ~/.sage-llm/config.yaml
backend:
  kind: cpu  # Options: cpu, pytorch-cuda, pytorch-ascend
  device: cpu

engine:
  kind: cpu
  model: sshleifer/tiny-gpt2

control_plane:
  endpoint: "localhost:8080"
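
A configuration like the one above can be sanity-checked before startup. The following is a minimal sketch that validates an already-parsed config (for example, the dict returned by a YAML loader). The allowed backend kinds come from the inline comment in the config; the function name, the choice to treat all three sections as required, and the messages are assumptions for illustration, not sageLLM's own validation.

```python
# Allowed values taken from the comment in config.yaml.
ALLOWED_BACKEND_KINDS = {"cpu", "pytorch-cuda", "pytorch-ascend"}

# Keys shown in the sample config; treating them as required is an assumption.
REQUIRED_KEYS = {
    "backend": ("kind", "device"),
    "engine": ("kind", "model"),
    "control_plane": ("endpoint",),
}

EXAMPLE_CONFIG = {
    "backend": {"kind": "cpu", "device": "cpu"},
    "engine": {"kind": "cpu", "model": "sshleifer/tiny-gpt2"},
    "control_plane": {"endpoint": "localhost:8080"},
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in values:
                problems.append(f"missing key: {section}.{key}")

    backend = config.get("backend")
    if isinstance(backend, dict) and "kind" in backend:
        if backend["kind"] not in ALLOWED_BACKEND_KINDS:
            problems.append(f"unsupported backend.kind: {backend['kind']}")
    return problems


print(validate_config(EXAMPLE_CONFIG))
```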

📊 Metrics & Validation

sageLLM provides comprehensive performance metrics:

{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85
}
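
The sample values above are related to one another. If `throughput_tps` is read as the single-stream decode rate (an assumption; this README does not define it), it should equal 1000 / `tbt_ms`, and the end-to-end latency for N output tokens is roughly `ttft_ms` plus (N - 1) times `tbt_ms`. The sketch below works through that arithmetic; the function names are illustrative.

```python
import json

SAMPLE = json.loads("""
{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85
}
""")


def decode_tps(tbt_ms: float) -> float:
    """Tokens per second implied by the time between tokens."""
    return 1000.0 / tbt_ms


def estimated_latency_ms(ttft_ms: float, tbt_ms: float, output_tokens: int) -> float:
    """First token arrives after TTFT, then one more token every TBT."""
    return ttft_ms + max(output_tokens - 1, 0) * tbt_ms


# 1000 / 12.5 = 80.0, which matches throughput_tps in the sample.
print(decode_tps(SAMPLE["tbt_ms"]))

# 45.2 + 127 * 12.5 = 1632.7 ms for a 128-token reply.
print(round(estimated_latency_ms(SAMPLE["ttft_ms"], SAMPLE["tbt_ms"], 128), 1))
```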

Run benchmarks:

sage-llm demo --workload year1 --output metrics.json

🏗️ Architecture

isagellm (umbrella package)
├── isagellm-protocol       # Protocol v0.1 types
│   └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-backend        # Hardware abstraction (L1 - Foundation)
│   └── BackendProvider, CPUBackend, (CUDABackend, AscendBackend)
├── isagellm-comm           # Communication primitives (L2 - Infrastructure)
│   └── Topology, CollectiveOps (all_reduce/gather), P2P (send/recv), Overlap
├── isagellm-kv-cache       # KV cache management (L2 - Optional)
│   └── PrefixCache, MemoryPool, EvictionPolicies, Predictor, KV Transfer
├── isagellm-compression    # Inference acceleration (quantization, sparsity, etc.) (L2 - Optional)
│   └── Quantization, Sparsity, SpeculativeDecoding, Fusion
├── isagellm-core           # Engine core & runtime (L3)
│   └── Config, Engine, Factory, DemoRunner, Adapters (vLLM/LMDeploy)
├── isagellm-control-plane  # Request routing & scheduling (L4 - Optional)
│   └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway        # OpenAI-compatible REST API (L5 - Optional)
    └── FastAPI server, /v1/chat/completions, Session management
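
The layer labels in the tree (L1 to L5) suggest that a package should not depend on a package in a higher layer. The sketch below checks that rule. The layer numbers are taken from the tree; `isagellm-protocol` carries no label there and is placed at layer 0 by assumption. The dependency edges are hypothetical and exist only to exercise the check, and this is not the logic of `scripts/verify_dependencies.py`, which is not shown here.

```python
LAYERS = {
    "isagellm-protocol": 0,  # no layer label in the tree; 0 is an assumption
    "isagellm-backend": 1,
    "isagellm-comm": 2,
    "isagellm-kv-cache": 2,
    "isagellm-compression": 2,
    "isagellm-core": 3,
    "isagellm-control-plane": 4,
    "isagellm-gateway": 5,
}


def find_layer_violations(deps: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (package, dependency) pairs where the dependency is in a higher layer."""
    return [
        (pkg, dep)
        for pkg, pkg_deps in deps.items()
        for dep in pkg_deps
        if LAYERS[dep] > LAYERS[pkg]
    ]


# Hypothetical dependency edges, for illustration only.
EXAMPLE_DEPS = {
    "isagellm-core": ["isagellm-protocol", "isagellm-backend"],
    "isagellm-gateway": ["isagellm-control-plane"],
    "isagellm-backend": ["isagellm-core"],  # upward edge: should be flagged
}

print(find_layer_violations(EXAMPLE_DEPS))  # [('isagellm-backend', 'isagellm-core')]
```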

🔧 Development

Quick Setup (Development Mode)

# Clone all repositories
./scripts/clone-all-repos.sh

# Install all packages in editable mode
./quickstart.sh

# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace

📖 See WORKSPACE_GUIDE.md for Multi-root Workspace usage.

Testing

# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"

# Run tests
pytest -v

# Format & lint
ruff format .
ruff check . --fix

# Type check
mypy src/sagellm/

# Verify dependency hierarchy
python scripts/verify_dependencies.py

📖 Development Resources

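DEPLOYMENT_GUIDE.md - Complete deployment and configuration guide
TROUBLESHOOTING.md - Troubleshooting quick reference
ENVIRONMENT_VARIABLES.md - Complete environment variable reference
DEVELOPER_GUIDE.md - Developer guide
WORKSPACE_GUIDE.md - Using the Multi-root Workspace
INFERENCE_FLOW.md - Detailed walkthrough of the inference flow
PR_CHECKLIST.md - Pull request checklist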

📚 Documentation Index

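User Documentation

Quick Start - Get started in 5 minutes
Deployment Guide - Production deployment
Configuration Reference - Complete configuration options
Environment Variables - Environment variable reference
Troubleshooting - Solutions to common problems

Developer Documentation

Developer Guide - Contributing code
Architecture Design - System architecture
Workspace Usage - Multi-root workspace
PR Checklist - Pre-submission checks

API Documentation

OpenAI-compatible API - see sagellm-gateway
Python API - see API_REFERENCE.md (to be added)

Sub-package Documentation

sagellm-protocol - Protocol definitions
sagellm-backend - Backend abstraction
sagellm-core - Engine core
sagellm-control-plane - Control plane
sagellm-gateway - API gateway
sagellm-benchmark - Benchmarking
DEVELOPER_GUIDE.md - Architecture conventions and developer guide
PR_CHECKLIST.md - Pull request review checklist
scripts/verify_dependencies.py - Dependency hierarchy verification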

📚 Package Details

Package	PyPI Name	Import Name	Description
sagellm	`isagellm`	`sagellm`	Umbrella package (install this)
sagellm-protocol	`isagellm-protocol`	`sagellm_protocol`	Protocol v0.1 types
sagellm-core	`isagellm-core`	`sagellm_core`	Runtime & config
sagellm-backend	`isagellm-backend`	`sagellm_backend`	Hardware abstraction
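
Since the PyPI names and import names differ, it can help to check which modules are actually importable in the current environment. The following is a small sketch using `importlib.util.find_spec` from the standard library. The mapping is copied from the table above; the `installed` helper is illustrative and not part of sageLLM.

```python
from importlib.util import find_spec

# PyPI name -> import name, copied from the table above.
IMPORT_NAMES = {
    "isagellm": "sagellm",
    "isagellm-protocol": "sagellm_protocol",
    "isagellm-core": "sagellm_core",
    "isagellm-backend": "sagellm_backend",
}


def installed(import_names: dict[str, str] = IMPORT_NAMES) -> dict[str, bool]:
    """Map each PyPI name to whether its top-level module can be found."""
    return {pypi: find_spec(module) is not None for pypi, module in import_names.items()}


for pypi_name, present in installed().items():
    print(f"{pypi_name:<20} {'installed' if present else 'missing'}")
```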

📄 License

Proprietary - IntelliStream. Internal use only.

Built with ❤️ by IntelliStream Team for domestic AI infrastructure

This version: 0.4.0.7 (released Jan 30, 2026)
