sageLLM: Modular LLM inference engine with PD separation for domestic computing power
sageLLM
Protocol Compliance (Mandatory)
- MUST follow Protocol v0.1: https://github.com/intellistream/sagellm-docs/blob/main/docs/specs/protocol_v0.1.md
- Any globally shared definitions (fields, error codes, metrics, IDs, schemas) MUST be added to Protocol first.
🚀 Modular LLM Inference Engine for Domestic Computing Power
Ollama-like experience for Chinese hardware ecosystems (Huawei Ascend, NVIDIA)
✨ Features
- 🎯 One-Click Install - pip install isagellm gets you started immediately
- 🧠 CPU-First - Default CPU engine, no GPU required
- 🇨🇳 Domestic Hardware - First-class support for Huawei Ascend NPU
- 📊 Observable - Built-in metrics (TTFT, TBT, throughput, KV usage)
- 🧩 Plugin System - Extend with custom backends and engines
📦 Quick Install
# Install sageLLM (CPU-first, no GPU required)
pip install isagellm
# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'
# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'
# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'
# With CUDA support
pip install 'isagellm[cuda]'
# All features
pip install 'isagellm[all]'
🚀 Quick Start
CLI (as simple as vLLM/Ollama)
# One-command start (full stack: Gateway + Engine)
pip install 'isagellm[gateway]'
sage-llm serve --model Qwen2-7B
# ✅ The OpenAI API is available automatically
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2-7B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Show system information
sage-llm info
# One-off inference (without starting a server)
sage-llm run -p "What is LLM inference?"
# Advanced usage: distributed deployment (start each component separately)
sage-llm serve --engine-only --port 9000 # Engine only
sage-llm gateway --port 8000 # Gateway only
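The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are illustrative and not part of sageLLM, and the response is assumed to be a JSON object because the gateway is described as OpenAI-compatible.

```python
import json
import urllib.request


def build_chat_request(model: str, content: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body. Requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_chat_request("Qwen2-7B", "Hello!")
    print(req.full_url)
    # reply = send(req)  # uncomment once `sage-llm serve` is running
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.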
Python API (Control Plane - Recommended)
# Install with: pip install 'isagellm[control-plane]'
import asyncio

from sagellm import ControlPlaneManager, BackendConfig, EngineConfig


async def main() -> None:
    manager = ControlPlaneManager(
        backend_config=BackendConfig(kind="cpu", device="cpu"),
        engine_configs=[
            EngineConfig(
                kind="cpu",
                model="sshleifer/tiny-gpt2",
                model_path="sshleifer/tiny-gpt2",
            )
        ],
    )
    await manager.start()
    try:
        # Requests are automatically routed to available engines
        response = await manager.execute_request(
            prompt="Hello, world!",
            max_tokens=128,
        )
        print(response.output_text)
        print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
        print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")
    finally:
        # Always stop the manager, even if the request fails
        await manager.stop()


asyncio.run(main())
⚠️ Important: Direct engine creation (create_engine()) is not exported from the umbrella
package. All production code must use ControlPlaneManager for proper request routing, scheduling,
and lifecycle management.
Configuration
# ~/.sage-llm/config.yaml
backend:
  kind: cpu # Options: cpu, pytorch-cuda, pytorch-ascend
  device: cpu
engine:
  kind: cpu
  model: sshleifer/tiny-gpt2
control_plane:
  endpoint: "localhost:8080"
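A configuration like the one above can be sanity-checked before starting the server. The sketch below works on an already-parsed dictionary and is not sageLLM's own loader: `validate_config` is a hypothetical helper, the allowed backend kinds are taken from the comment in the config, and the rules that `engine.model` is required and that the endpoint has the form host:port are assumptions.

```python
from __future__ import annotations

# Backend kinds listed in the config comment above
ALLOWED_BACKEND_KINDS = {"cpu", "pytorch-cuda", "pytorch-ascend"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems: list[str] = []
    backend = config.get("backend", {})
    engine = config.get("engine", {})

    if backend.get("kind") not in ALLOWED_BACKEND_KINDS:
        problems.append(
            f"backend.kind must be one of {sorted(ALLOWED_BACKEND_KINDS)}"
        )
    # Assumption: a model name is mandatory
    if not engine.get("model"):
        problems.append("engine.model is required")

    # Assumption: the endpoint, when present, is written as host:port
    endpoint = config.get("control_plane", {}).get("endpoint")
    if endpoint is not None:
        host, sep, port = endpoint.rpartition(":")
        if not sep or not host or not port.isdigit():
            problems.append("control_plane.endpoint must look like host:port")
    return problems


if __name__ == "__main__":
    example = {
        "backend": {"kind": "cpu", "device": "cpu"},
        "engine": {"kind": "cpu", "model": "sshleifer/tiny-gpt2"},
        "control_plane": {"endpoint": "localhost:8080"},
    }
    print(validate_config(example))
```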
📊 Metrics & Validation
sageLLM provides comprehensive performance metrics:
{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85
}
Run benchmarks:
sage-llm demo --workload year1 --output metrics.json
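To make the latency fields concrete, the sketch below derives `ttft_ms`, `tbt_ms`, and `throughput_tps` from per-token arrival timestamps. It uses the common definitions (time to first token, mean gap between consecutive tokens, tokens per second over the whole request). How sageLLM itself computes these values is not stated here, so treat the formulas and the names `LatencyMetrics` and `summarize` as assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LatencyMetrics:
    ttft_ms: float         # time from request start to the first token
    tbt_ms: float          # mean time between consecutive tokens
    throughput_tps: float  # generated tokens per second, end to end


def summarize(request_start_s: float, token_times_s: list[float]) -> LatencyMetrics:
    """Compute latency metrics from token arrival times given in seconds."""
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")

    ttft_ms = (token_times_s[0] - request_start_s) * 1000.0

    # Gaps between consecutive tokens; empty when only one token was produced
    gaps = [b - a for a, b in zip(token_times_s, token_times_s[1:])]
    tbt_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else 0.0

    total_s = token_times_s[-1] - request_start_s
    throughput_tps = len(token_times_s) / total_s if total_s > 0 else 0.0
    return LatencyMetrics(ttft_ms, tbt_ms, throughput_tps)


if __name__ == "__main__":
    print(summarize(0.0, [0.5, 0.75, 1.0, 1.25]))
```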
🏗️ Architecture
isagellm (umbrella package)
├── isagellm-protocol # Protocol v0.1 types
│ └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-backend # Hardware abstraction (L1 - Foundation)
│ └── BackendProvider, CPUBackend, (CUDABackend, AscendBackend)
├── isagellm-comm # Communication primitives (L2 - Infrastructure)
│ └── Topology, CollectiveOps (all_reduce/gather), P2P (send/recv), Overlap
├── isagellm-kv-cache # KV cache management (L2 - Optional)
│ └── PrefixCache, MemoryPool, EvictionPolicies, Predictor, KV Transfer
├── isagellm-compression # Inference acceleration (quantization, sparsity, etc.) (L2 - Optional)
│ └── Quantization, Sparsity, SpeculativeDecoding, Fusion
├── isagellm-core # Engine core & runtime (L3)
│ └── Config, Engine, Factory, DemoRunner, Adapters (vLLM/LMDeploy)
├── isagellm-control-plane # Request routing & scheduling (L4 - Optional)
│ └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway # OpenAI-compatible REST API (L5 - Optional)
    └── FastAPI server, /v1/chat/completions, Session management
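The layer labels (L1 to L5) in the tree above suggest a rule that a package should not depend on a package from a higher layer. The sketch below illustrates such a check. It is not the project's scripts/verify_dependencies.py: the rule itself, the layer 0 assigned to isagellm-protocol (which has no label in the tree), and the dependency edges in the usage example are all assumptions.

```python
from __future__ import annotations

# Layer numbers copied from the architecture tree.
# Assumption: the protocol package sits below everything else (layer 0).
LAYERS = {
    "isagellm-protocol": 0,
    "isagellm-backend": 1,
    "isagellm-comm": 2,
    "isagellm-kv-cache": 2,
    "isagellm-compression": 2,
    "isagellm-core": 3,
    "isagellm-control-plane": 4,
    "isagellm-gateway": 5,
}


def find_violations(dependencies: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (package, dependency) pairs where the dependency is in a higher layer."""
    violations = []
    for package, deps in dependencies.items():
        for dep in deps:
            if LAYERS[dep] > LAYERS[package]:
                violations.append((package, dep))
    return violations


if __name__ == "__main__":
    # Hypothetical dependency edges, for illustration only
    hypothetical = {
        "isagellm-core": ["isagellm-protocol", "isagellm-backend"],
        "isagellm-backend": ["isagellm-core"],  # upward dependency
    }
    print(find_violations(hypothetical))
```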
🔧 Development
Quick Setup (Development Mode)
# Clone all repositories
./scripts/clone-all-repos.sh
# Install all packages in editable mode
./quickstart.sh
# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace
📖 See WORKSPACE_GUIDE.md for Multi-root Workspace usage.
Testing
# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"
# Run tests
pytest -v
# Format & lint
ruff format .
ruff check . --fix
# Type check
mypy src/sagellm/
# Verify dependency hierarchy
python scripts/verify_dependencies.py
📖 Development Resources
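- DEPLOYMENT_GUIDE.md - Complete deployment and configuration guide
- TROUBLESHOOTING.md - Troubleshooting quick reference
- ENVIRONMENT_VARIABLES.md - Complete environment variable reference
- DEVELOPER_GUIDE.md - Developer guide
- WORKSPACE_GUIDE.md - Multi-root Workspace usage
- INFERENCE_FLOW.md - Inference flow in detail
- PR_CHECKLIST.md - Pull request checklist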
📚 Documentation Index
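User Documentation
Developer Documentation
- Development guide - Contributing code
- Architecture design - System architecture
- Workspace usage - Multi-root workspace
- PR checklist - Pre-submission checks
API Documentation
- OpenAI-compatible API - See sagellm-gateway
- Python API - See API_REFERENCE.md (to be added)
Sub-package Documentation
- sagellm-protocol - Protocol definitions
- sagellm-backend - Backend abstraction
- sagellm-core - Engine core
- sagellm-control-plane - Control plane
- sagellm-gateway - API gateway
- sagellm-benchmark - Benchmarking
- DEVELOPER_GUIDE.md - Architecture conventions and development guide
- PR_CHECKLIST.md - Pull request review checklist
- scripts/verify_dependencies.py - Dependency hierarchy verification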
📚 Package Details
| Package | PyPI Name | Import Name | Description |
|---|---|---|---|
| sagellm | isagellm | sagellm | Umbrella package (install this) |
| sagellm-protocol | isagellm-protocol | sagellm_protocol | Protocol v0.1 types |
| sagellm-core | isagellm-core | sagellm_core | Runtime & config |
| sagellm-backend | isagellm-backend | sagellm_backend | Hardware abstraction |
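Because the PyPI names and import names in the table differ, it can be useful to check which import names resolve in the current environment. The sketch below uses the standard library's `importlib.util.find_spec`; the mapping is copied from the table, and `installed_packages` is a hypothetical helper, not part of sageLLM.

```python
from __future__ import annotations

import importlib.util

# PyPI name -> import name, as listed in the table above
IMPORT_NAMES = {
    "isagellm": "sagellm",
    "isagellm-protocol": "sagellm_protocol",
    "isagellm-core": "sagellm_core",
    "isagellm-backend": "sagellm_backend",
}


def installed_packages(import_names: dict[str, str] = IMPORT_NAMES) -> dict[str, bool]:
    """Map each PyPI name to whether its top-level import name can be found."""
    return {
        pypi_name: importlib.util.find_spec(module_name) is not None
        for pypi_name, module_name in import_names.items()
    }


if __name__ == "__main__":
    for pypi_name, present in installed_packages().items():
        print(f"{pypi_name}: {'installed' if present else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check has no side effects from package initialization.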
📄 License
Proprietary - IntelliStream. Internal use only.
Built with ❤️ by IntelliStream Team for domestic AI infrastructure