sageLLM: Modular LLM inference engine for domestic computing power (Huawei Ascend, NVIDIA)

These details have not been verified by PyPI

Project links

Project description

sageLLM

🚀 Modular LLM Inference Engine for Domestic Computing Power

Ollama-like experience for Chinese hardware ecosystems (Huawei Ascend, NVIDIA)

✨ Features

🎯 One-Click Install - pip install isagellm gets you started immediately
🔌 Mock-First - Test without GPU, perfect for CI/CD
🇨🇳 Domestic Hardware - First-class support for Huawei Ascend NPU
📊 Observable - Built-in metrics (TTFT, TBT, throughput, KV usage)
🧩 Plugin System - Extend with custom backends and engines

📦 Quick Install

# Install sageLLM (includes mock backend, no GPU required)
pip install isagellm

# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'

# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'

# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'

# With CUDA support
pip install 'isagellm[cuda]'

# All features
pip install 'isagellm[all]'

🚀 Quick Start

CLI (like ollama)

# Show system info
sage-llm info

# Start mock server (no GPU required)
sage-llm serve --mock

# Single inference
sage-llm run -p "What is LLM inference?" --mock

# Run Year1 demo validation
sage-llm demo --workload year1 --mock

# Start OpenAI-compatible API gateway
sage-llm gateway --mock --port 8080

Python API

from sagellm import Request, MockEngine

# Create mock engine (no GPU needed)
engine = MockEngine()

# Run inference
request = Request(
    request_id="demo-001",
    prompt="Hello, world!",
    max_tokens=128,
)
response = engine.generate(request)

print(f"Response: {response.text}")
print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")

Configuration

# ~/.sage-llm/config.yaml
backend:
  kind: mock  # or: cuda, ascend

engine:
  kind: mock
  model: Qwen/Qwen2-7B

workload:
  segments:
    - short   # 128 in → 128 out
    - long    # 2048 in → 512 out
    - stress  # concurrent requests

📊 Year 1 Demo Contract

sageLLM must produce these metrics for validation:

{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85,
  "evict_count": 3
}

Run validation:

sage-llm demo --workload year1 --output metrics.json

🏗️ Architecture

isagellm (umbrella package)
├── isagellm-protocol       # Protocol v0.1 types
│   └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-core           # Runtime & Demo Runner
│   └── Config, Engine, Factory, DemoRunner
├── isagellm-backend        # Hardware abstraction
│   └── BackendProvider, MockBackend, (CUDABackend, AscendBackend)
├── isagellm-control-plane  # Request routing & scheduling (optional)
│   └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway        # OpenAI-compatible REST API (optional)
    └── FastAPI server, /v1/chat/completions, Session management

🔧 Development

Quick Setup (Development Mode)

# Clone all repositories
./scripts/clone-all-repos.sh

# Install all packages in editable mode
./quickstart.sh

# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace

📖 See WORKSPACE_GUIDE.md for Multi-root Workspace usage.

Testing

# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"

# Run tests
pytest -v

# Format & lint
ruff format .
ruff check . --fix

# Type check
mypy src/sagellm/

# Verify dependency hierarchy
python scripts/verify_dependencies.py

📖 Development Resources

DEVELOPER_GUIDE.md - 架构规范与开发指南
PR_CHECKLIST.md - Pull Request 审查清单
scripts/verify_dependencies.py - 依赖层次验证

📚 Package Details

Package	PyPI Name	Import Name	Description
sagellm	`isagellm`	`sagellm`	Umbrella package (install this)
sagellm-protocol	`isagellm-protocol`	`sagellm_protocol`	Protocol v0.1 types
sagellm-core	`isagellm-core`	`sagellm_core`	Runtime & config
sagellm-backend	`isagellm-backend`	`sagellm_backend`	Hardware abstraction

🎯 Roadmap

Year 1: Core inference with KV cache, prefix sharing, basic eviction
Year 2: Multi-node inference, advanced scheduling
Year 3: Full production-ready deployment

📄 License

Proprietary - IntelliStream. Internal use only.

_{Built with ❤️ by IntelliStream Team for domestic AI infrastructure}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.5.4.70

Mar 12, 2026

0.5.4.59

Mar 6, 2026

0.5.4.58

Mar 5, 2026

0.5.4.55

Mar 5, 2026

0.5.4.44

Mar 4, 2026

0.5.4.43

Mar 4, 2026

0.5.4.42

Mar 4, 2026

0.5.4.41

Mar 4, 2026

0.5.4.39

Mar 4, 2026

0.5.4.38

Mar 4, 2026

0.5.4.37

Mar 4, 2026

0.5.4.36

Mar 4, 2026

0.5.4.35

Mar 4, 2026

0.5.4.34

Mar 3, 2026

0.5.4.33

Mar 3, 2026

0.5.4.32

Mar 3, 2026

0.5.4.31

Mar 3, 2026

0.5.4.30

Mar 3, 2026

0.5.4.29

Mar 3, 2026

0.5.4.27

Mar 3, 2026

0.5.4.26

Mar 3, 2026

0.5.4.25

Mar 3, 2026

0.5.4.24

Mar 3, 2026

0.5.4.23

Mar 3, 2026

0.5.4.22

Mar 1, 2026

0.5.4.18

Mar 1, 2026

0.5.4.17

Mar 1, 2026

0.5.4.16

Mar 1, 2026

0.5.4.15

Mar 1, 2026

0.5.4.14

Mar 1, 2026

0.5.4.13

Mar 1, 2026

0.5.4.11

Mar 1, 2026

0.5.4.10

Mar 1, 2026

0.5.4.9

Mar 1, 2026

0.5.4.3

Feb 28, 2026

0.5.4.1

Feb 27, 2026

0.5.4.0

Feb 27, 2026

0.5.3.18

Feb 27, 2026

0.5.3.17

Feb 27, 2026

0.5.3.15

Feb 27, 2026

0.5.3.14

Feb 26, 2026

0.5.3.13

Feb 26, 2026

0.5.3.12

Feb 26, 2026

0.5.3.8

Feb 26, 2026

0.5.3.6

Feb 26, 2026

0.5.3.4

Feb 26, 2026

0.5.3.3

Feb 24, 2026

0.5.3.2

Feb 23, 2026

0.5.3.1

Feb 23, 2026

0.5.3.0

Feb 23, 2026

0.5.2.0

Feb 23, 2026

0.5.1.9

Feb 23, 2026

0.5.1.8

Feb 20, 2026

0.5.1.7

Feb 20, 2026

0.5.1.6

Feb 20, 2026

0.5.1.5

Feb 20, 2026

0.5.1.4

Feb 20, 2026

0.5.1.3

Feb 20, 2026

0.5.1.2

Feb 19, 2026

0.5.1.1

Feb 18, 2026

0.5.1.0

Feb 17, 2026

0.4.2.2

Feb 17, 2026

0.4.2.1

Feb 15, 2026

0.4.2.0

Feb 12, 2026

0.4.1.17

Feb 7, 2026

0.4.1.16

Feb 7, 2026

0.4.1.10

Feb 3, 2026

0.4.1.2

Feb 1, 2026

0.4.1.1

Feb 1, 2026

0.4.1.0

Jan 31, 2026

0.4.0.37

Jan 31, 2026

0.4.0.36

Jan 31, 2026

0.4.0.35

Jan 31, 2026

0.4.0.34

Jan 30, 2026

0.4.0.33

Jan 30, 2026

0.4.0.32

Jan 30, 2026

0.4.0.31

Jan 30, 2026

0.4.0.30

Jan 30, 2026

0.4.0.29

Jan 30, 2026

0.4.0.28

Jan 30, 2026

0.4.0.27

Jan 30, 2026

0.4.0.26

Jan 30, 2026

0.4.0.25

Jan 30, 2026

0.4.0.24

Jan 30, 2026

0.4.0.23

Jan 30, 2026

0.4.0.22

Jan 30, 2026

0.4.0.21

Jan 30, 2026

0.4.0.20

Jan 30, 2026

0.4.0.19

Jan 30, 2026

0.4.0.17

Jan 30, 2026

0.4.0.16

Jan 30, 2026

0.4.0.15

Jan 30, 2026

0.4.0.14

Jan 30, 2026

0.4.0.13

Jan 30, 2026

0.4.0.12

Jan 30, 2026

0.4.0.11

Jan 30, 2026

0.4.0.10

Jan 30, 2026

0.4.0.9

Jan 30, 2026

0.4.0.8

Jan 30, 2026

0.4.0.7

Jan 30, 2026

0.4.0.6

Jan 30, 2026

0.4.0.5

Jan 30, 2026

0.4.0.4

Jan 30, 2026

0.4.0.3

Jan 30, 2026

0.4.0.2

Jan 29, 2026

0.4.0.1

Jan 29, 2026

0.3.1.8

Jan 29, 2026

0.3.1.7

Jan 29, 2026

0.3.1.6

Jan 29, 2026

0.3.1.5

Jan 29, 2026

0.3.1.4

Jan 29, 2026

0.3.1.3

Jan 28, 2026

0.3.1.2

Jan 28, 2026

0.3.1.1

Jan 28, 2026

0.3.1.0

Jan 28, 2026

0.3.0.22

Jan 28, 2026

0.3.0.21

Jan 27, 2026

0.3.0.20

Jan 27, 2026

0.3.0.19

Jan 27, 2026

0.3.0.18

Jan 27, 2026

0.3.0.17

Jan 27, 2026

0.3.0.16

Jan 27, 2026

0.3.0.15

Jan 27, 2026

0.3.0.14

Jan 27, 2026

0.3.0.13

Jan 27, 2026

0.3.0.12

Jan 27, 2026

0.3.0.11

Jan 27, 2026

0.3.0.9

Jan 27, 2026

0.3.0.8

Jan 27, 2026

0.3.0.6

Jan 27, 2026

0.3.0.5

Jan 27, 2026

0.3.0.4

Jan 27, 2026

0.3.0.3

Jan 27, 2026

0.3.0.2

Jan 27, 2026

0.3.0.1

Jan 27, 2026

0.3.0.0

Jan 27, 2026

0.2.3.3

Jan 26, 2026

0.2.3.2

Jan 26, 2026

0.2.3.1

Jan 26, 2026

0.2.3.0

Jan 26, 2026

0.2.2.8

Jan 25, 2026

0.2.2.7

Jan 25, 2026

0.2.2.4

Jan 25, 2026

0.2.2.3

Jan 25, 2026

0.2.2.2

Jan 21, 2026

0.2.2.1

Jan 21, 2026

0.2.2.0

Jan 20, 2026

0.2.1.0

Jan 20, 2026

0.2.0.0

Jan 20, 2026

0.1.0.10

Jan 18, 2026

0.1.0.8

Jan 17, 2026

0.1.0.7

Jan 17, 2026

0.1.0.6

Jan 17, 2026

0.1.0.5

Jan 17, 2026

0.1.0.4

Jan 17, 2026

This version

0.1.0.3

Jan 17, 2026

0.1.0.2

Jan 15, 2026

0.1.0.1

Jan 15, 2026

0.1.0

Jan 12, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

isagellm-0.1.0.3-cp311-none-any.whl (51.7 kB view details)

Uploaded Jan 17, 2026 CPython 3.11

File details

Details for the file isagellm-0.1.0.3-cp311-none-any.whl.

File metadata

Download URL: isagellm-0.1.0.3-cp311-none-any.whl
Upload date: Jan 17, 2026
Size: 51.7 kB
Tags: CPython 3.11
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for isagellm-0.1.0.3-cp311-none-any.whl
Algorithm	Hash digest
SHA256	`ade4e649cf0031dd73645ccbd241b6ac9a817c1cb0ee443f679050fb5b2c2f18`
MD5	`bcca36bb231019d9c60b598138a161c0`
BLAKE2b-256	`592ee7b833ed4f7f2b44c761b3b4539f7bef8ef8ab86eb46ff17e055215fd838`

See more details on using hashes here.

isagellm 0.1.0.3

Navigation

Verified details

Owner

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

sageLLM

✨ Features

📦 Quick Install

🚀 Quick Start

CLI (like ollama)

Python API

Configuration

📊 Year 1 Demo Contract

🏗️ Architecture

🔧 Development

Quick Setup (Development Mode)

Testing

📖 Development Resources

📚 Package Details

🎯 Roadmap

📄 License

Project details

Verified details

Owner

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes