
sageLLM: Modular LLM inference engine for domestic computing power (Huawei Ascend, NVIDIA)


sageLLM

🚀 Modular LLM Inference Engine for Domestic Computing Power

An Ollama-like experience for domestic (Chinese) hardware ecosystems such as Huawei Ascend, with NVIDIA GPUs also supported


✨ Features

  • 🎯 One-Click Install - pip install isagellm gets you started immediately
  • 🔌 Mock-First - Test without GPU, perfect for CI/CD
  • 🇨🇳 Domestic Hardware - First-class support for Huawei Ascend NPU
  • 📊 Observable - Built-in metrics: TTFT (time to first token), TBT (time between tokens), throughput, and KV cache usage
  • 🧩 Plugin System - Extend with custom backends and engines

📦 Quick Install

# Install sageLLM (includes mock backend, no GPU required)
pip install isagellm

# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'

# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'

# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'

# With CUDA support
pip install 'isagellm[cuda]'

# All features
pip install 'isagellm[all]'

🚀 Quick Start

Choosing a Startup Mode

sageLLM supports two startup modes for different scenarios:

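| Mode       | Use case                         | Requirements       | Example command                 |
|------------|----------------------------------|--------------------|---------------------------------|
| Mock       | CI / testing / local development | No GPU required    | sage-llm serve --mock           |
| Production | Real inference serving           | GPU or CPU backend | sage-llm serve --control-plane  |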

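⚠️ Fail-fast guarantee: outside mock mode, if a dependency is missing or the configuration is invalid, the system reports an error and exits immediately. It never silently falls back to mock mode.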
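The fail-fast rule can be illustrated with a small standard-library check that refuses to start instead of degrading. This is a minimal sketch, not sageLLM's actual implementation: the import name sagellm_control_plane and the helpers require_module, select_mode, and MissingDependencyError are hypothetical.

```python
import importlib.util


class MissingDependencyError(RuntimeError):
    """Raised when a non-mock mode lacks a required optional package."""


def require_module(module_name: str, install_hint: str) -> None:
    # find_spec returns None when a top-level module cannot be located,
    # so we can check availability without importing the package.
    if importlib.util.find_spec(module_name) is None:
        raise MissingDependencyError(
            f"{module_name} required but not installed\n   Install: {install_hint}"
        )


def select_mode(mock: bool) -> str:
    if mock:
        return "mock"  # explicit opt-in: no optional packages are needed
    # Hypothetical import name for the isagellm-control-plane package.
    require_module("sagellm_control_plane", "pip install 'isagellm[control-plane]'")
    return "production"  # only reached when the dependency is present
```

The key design choice is that the non-mock branch has no path that returns "mock": a missing dependency always surfaces as an exception the CLI can print before exiting.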

CLI (like Ollama)

# Show system info
sage-llm info

# Mock mode (no GPU dependency)
sage-llm serve --mock
sage-llm run -p "What is LLM inference?" --mock
sage-llm demo --workload year1 --mock

# Production mode (requires the control plane to be installed)
# pip install 'isagellm[server]'
sage-llm serve --control-plane
sage-llm gateway --control-plane --port 8080

# If a dependency is missing, you will see:
# ❌ Error: Control Plane required but not installed
#    Install: pip install 'isagellm[control-plane]'

Python API

from sagellm import Request, MockEngine

# Create mock engine (no GPU needed)
engine = MockEngine()

# Run inference
request = Request(
    request_id="demo-001",
    prompt="Hello, world!",
    max_tokens=128,
)
response = engine.generate(request)

print(f"Response: {response.text}")
print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")

Configuration

# ~/.sage-llm/config.yaml
backend:
  kind: mock  # Options: mock, cpu, cuda, ascend

  # Fail-fast: if a non-mock backend is specified but unavailable, exit with an error
  strict_mode: true  # Defaults to true, as required by the project proposal

engine:
  kind: mock
  model: Qwen/Qwen2-7B

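# Mock mode settings
mock:
  enabled: false      # When true, force mock mode regardless of other settings
  deterministic: true # Whether mock output is fixed (for regression tests)

# Minimum requirements for production mode
production:
  control_plane:
    required: true    # When true, a missing control plane is an error (non-mock modes)
    endpoint: "localhost:8080"
  backend:
    required: true    # When true, a missing real backend is an error
    fallback_to_mock: false  # Never fall back to mock automatically (fail-fast)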

workload:
  segments:
    - short   # 128 in → 128 out
    - long    # 2048 in → 512 out
    - stress  # concurrent requests
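One plausible reading of how the mock and fail-fast keys in the configuration above interact is sketched below, operating on the parsed YAML as a plain dict. This is an illustration under assumptions, not sageLLM's resolver: the function resolve_backend, the error class, and the rule that fallback requires both fallback_to_mock: true and strict_mode: false are hypothetical, and backend detection (the available set) is left to the caller.

```python
from typing import Any


class BackendUnavailableError(RuntimeError):
    """Raised when the configured backend cannot be used and fallback is off."""


def resolve_backend(config: dict[str, Any], available: set[str]) -> str:
    """Return the backend kind to run with (hypothetical precedence rules).

    `available` is the set of backend kinds detected on this machine;
    detecting them is out of scope for this sketch.
    """
    # mock.enabled overrides every other setting.
    if config.get("mock", {}).get("enabled", False):
        return "mock"
    backend = config.get("backend", {})
    kind = backend.get("kind", "mock")
    if kind == "mock" or kind in available:
        return kind
    # Assumption: degrading to mock needs fallback_to_mock AND strict_mode off.
    fallback = config.get("production", {}).get("backend", {}).get("fallback_to_mock", False)
    if fallback and not backend.get("strict_mode", True):
        return "mock"
    raise BackendUnavailableError(f"backend '{kind}' requested but not available")
```

With the defaults shown above (strict_mode: true, fallback_to_mock: false), any unavailable non-mock backend ends in an error rather than a silent switch to mock.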

📊 Year 1 Demo Contract

sageLLM must report the following metrics for validation (example values shown):

{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85,
  "evict_count": 3
}
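The latency and throughput fields above can be derived from per-token timestamps. The sketch below uses common definitions (TTFT from arrival to first token, TBT as the mean gap between consecutive tokens, throughput as output tokens over end-to-end request time, prefix hit rate as cached prompt tokens over all prompt tokens); sageLLM's own definitions, for example throughput aggregated across concurrent requests, may differ. RequestTrace, summarize, and prefix_hit_rate are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record; all timestamps are in seconds."""
    arrival_s: float
    token_times_s: list[float]  # time at which each output token was emitted


def summarize(trace: RequestTrace) -> dict[str, float]:
    times = trace.token_times_s
    if not times:
        raise ValueError("no output tokens recorded")
    # TTFT: request arrival to first output token.
    ttft_ms = (times[0] - trace.arrival_s) * 1000.0
    # TBT: mean gap between consecutive output tokens (0 for a single token).
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    tbt_ms = mean(gaps) * 1000.0 if gaps else 0.0
    # Throughput: output tokens divided by end-to-end request time.
    elapsed_s = times[-1] - trace.arrival_s
    throughput_tps = len(times) / elapsed_s if elapsed_s > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tbt_ms": tbt_ms, "throughput_tps": throughput_tps}


def prefix_hit_rate(hit_tokens: int, prompt_tokens: int) -> float:
    # Fraction of prompt tokens served from a cached shared prefix.
    return hit_tokens / prompt_tokens if prompt_tokens else 0.0
```

For a request arriving at t = 0 s whose four tokens are emitted at 0.05, 0.06, 0.07, and 0.08 s, this gives a TTFT of about 50 ms, a TBT of about 10 ms, and about 50 tokens/s.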

Run validation:

sage-llm demo --workload year1 --output metrics.json
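A quick sanity check on the generated metrics.json can catch missing or malformed fields before comparing numbers. This is a minimal sketch: the key list comes from the demo contract above, while validate_metrics and the specific checks (numeric, non-negative, hit rate at most 1) are assumptions rather than the project's official validator.

```python
import json
from numbers import Real

# Keys taken from the Year 1 demo contract.
REQUIRED_KEYS = (
    "ttft_ms", "tbt_ms", "throughput_tps", "peak_mem_mb",
    "kv_used_tokens", "prefix_hit_rate", "evict_count",
)


def validate_metrics(text: str) -> list[str]:
    """Return the problems found in a metrics JSON document (empty list if OK)."""
    data = json.loads(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if value is None:
            problems.append(f"missing: {key}")
        elif isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"not numeric: {key}")  # bool is a Real subclass
        elif value < 0:
            problems.append(f"negative: {key}")
    rate = data.get("prefix_hit_rate")
    if isinstance(rate, Real) and not isinstance(rate, bool) and rate > 1:
        problems.append("prefix_hit_rate must be in [0, 1]")
    return problems
```

Usage: validate_metrics(Path("metrics.json").read_text()) after running the demo command, with Path imported from pathlib.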

🏗️ Architecture

isagellm (umbrella package)
├── isagellm-protocol       # Protocol v0.1 types
│   └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-core           # Runtime & Demo Runner
│   └── Config, Engine, Factory, DemoRunner
├── isagellm-backend        # Hardware abstraction
│   └── BackendProvider, MockBackend, (CUDABackend, AscendBackend)
├── isagellm-control-plane  # Request routing & scheduling (optional)
│   └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway        # OpenAI-compatible REST API (optional)
    └── FastAPI server, /v1/chat/completions, Session management
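The backend layer and the plugin system suggest a registry in which each backend kind maps to a factory. The sketch below shows that pattern with a deterministic echo backend; it is not the BackendProvider API of isagellm-backend, and Backend, register_backend, create_backend, and EchoMockBackend are all hypothetical names.

```python
from typing import Callable, Protocol


class Backend(Protocol):
    name: str

    def generate(self, prompt: str, max_tokens: int) -> str: ...


# Maps a backend kind (e.g. "mock", "cuda", "ascend") to a zero-argument factory.
_REGISTRY: dict[str, Callable[[], Backend]] = {}


def register_backend(kind: str):
    def decorator(factory):
        if kind in _REGISTRY:
            raise ValueError(f"backend '{kind}' already registered")
        _REGISTRY[kind] = factory
        return factory
    return decorator


def create_backend(kind: str) -> Backend:
    try:
        factory = _REGISTRY[kind]
    except KeyError:
        # Fail fast on unknown kinds instead of substituting a default.
        raise LookupError(f"unknown backend '{kind}'; registered: {sorted(_REGISTRY)}") from None
    return factory()


@register_backend("mock")
class EchoMockBackend:
    name = "mock"

    def generate(self, prompt: str, max_tokens: int) -> str:
        # Deterministic output: echo at most max_tokens whitespace-separated words.
        return " ".join(prompt.split()[:max_tokens])
```

A hardware plugin would register itself the same way (for example under "ascend") when its package is imported, so the core never needs to import vendor SDKs directly.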

🔧 Development

Quick Setup (Development Mode)

# Clone all repositories
./scripts/clone-all-repos.sh

# Install all packages in editable mode
./quickstart.sh

# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace

📖 See WORKSPACE_GUIDE.md for Multi-root Workspace usage.

Testing

# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"

# Run tests
pytest -v

# Format & lint
ruff format .
ruff check . --fix

# Type check
mypy src/sagellm/

# Verify dependency hierarchy
python scripts/verify_dependencies.py

📖 Development Resources


📚 Documentation Index

User Documentation

Developer Documentation

API Documentation

Sub-package Documentation

📚 Package Details

| Package          | PyPI Name         | Import Name      | Description                     |
|------------------|-------------------|------------------|---------------------------------|
| sagellm          | isagellm          | sagellm          | Umbrella package (install this) |
| sagellm-protocol | isagellm-protocol | sagellm_protocol | Protocol v0.1 types             |
| sagellm-core     | isagellm-core     | sagellm_core     | Runtime & config                |
| sagellm-backend  | isagellm-backend  | sagellm_backend  | Hardware abstraction            |

🎯 Roadmap

  • Year 1: Core inference with KV cache, prefix sharing, basic eviction
  • Year 2: Multi-node inference, advanced scheduling
  • Year 3: Full production-ready deployment

📄 License

Proprietary - IntelliStream. Internal use only.


Built with ❤️ by IntelliStream Team for domestic AI infrastructure




Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


isagellm-0.1.0.8-cp311-none-any.whl (53.2 kB)

Uploaded CPython 3.11

File details

Details for the file isagellm-0.1.0.8-cp311-none-any.whl.

File metadata

  • Download URL: isagellm-0.1.0.8-cp311-none-any.whl
  • Upload date:
  • Size: 53.2 kB
  • Tags: CPython 3.11
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for isagellm-0.1.0.8-cp311-none-any.whl
Algorithm Hash digest
SHA256 8d7c48b345e15a674141c39cc75e7f3e55916756a51bbed56976e08ddbcc1fe8
MD5 aaace4e82c587af58a8167b26c91c27d
BLAKE2b-256 9f1f93c5895ba0854f1d284364b68df0f806193a059165767e7178030da76f8a

