sageLLM: Modular LLM inference engine for domestic computing power (Huawei Ascend, NVIDIA)
Project description
sageLLM
๐ Modular LLM Inference Engine for Domestic Computing Power
Ollama-like experience for Chinese hardware ecosystems (Huawei Ascend, NVIDIA)
โจ Features
- ๐ฏ One-Click Install -
pip install isagellmgets you started immediately - ๐ Mock-First - Test without GPU, perfect for CI/CD
- ๐จ๐ณ Domestic Hardware - First-class support for Huawei Ascend NPU
- ๐ Observable - Built-in metrics (TTFT, TBT, throughput, KV usage)
- ๐งฉ Plugin System - Extend with custom backends and engines
๐ฆ Quick Install
# Install sageLLM (includes mock backend, no GPU required)
pip install isagellm
# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'
# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'
# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'
# With CUDA support
pip install 'isagellm[cuda]'
# All features
pip install 'isagellm[all]'
๐ Quick Start
CLI (like ollama)
# Show system info
sage-llm info
# Start mock server (no GPU required)
sage-llm serve --mock
# Single inference
sage-llm run -p "What is LLM inference?" --mock
# Run Year1 demo validation
sage-llm demo --workload year1 --mock
# Start OpenAI-compatible API gateway
sage-llm gateway --mock --port 8080
Python API
from sagellm import Request, MockEngine
# Create mock engine (no GPU needed)
engine = MockEngine()
# Run inference
request = Request(
request_id="demo-001",
prompt="Hello, world!",
max_tokens=128,
)
response = engine.generate(request)
print(f"Response: {response.text}")
print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")
Configuration
# ~/.sage-llm/config.yaml
backend:
kind: mock # or: cuda, ascend
engine:
kind: mock
model: Qwen/Qwen2-7B
workload:
segments:
- short # 128 in โ 128 out
- long # 2048 in โ 512 out
- stress # concurrent requests
๐ Year 1 Demo Contract
sageLLM must produce these metrics for validation:
{
"ttft_ms": 45.2,
"tbt_ms": 12.5,
"throughput_tps": 80.0,
"peak_mem_mb": 24576,
"kv_used_tokens": 4096,
"prefix_hit_rate": 0.85,
"evict_count": 3
}
Run validation:
sage-llm demo --workload year1 --output metrics.json
๐๏ธ Architecture
isagellm (umbrella package)
โโโ isagellm-protocol # Protocol v0.1 types
โ โโโ Request, Response, Metrics, Error, StreamEvent
โโโ isagellm-core # Runtime & Demo Runner
โ โโโ Config, Engine, Factory, DemoRunner
โโโ isagellm-backend # Hardware abstraction
โ โโโ BackendProvider, MockBackend, (CUDABackend, AscendBackend)
โโโ isagellm-control-plane # Request routing & scheduling (optional)
โ โโโ ControlPlaneManager, Router, Policies, Lifecycle
โโโ isagellm-gateway # OpenAI-compatible REST API (optional)
โโโ FastAPI server, /v1/chat/completions, Session management
๐ง Development
Quick Setup (Development Mode)
# Clone all repositories
./scripts/clone-all-repos.sh
# Install all packages in editable mode
./quickstart.sh
# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace
๐ See WORKSPACE_GUIDE.md for Multi-root Workspace usage.
Testing
# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"
# Run tests
pytest -v
# Format & lint
ruff format .
ruff check . --fix
# Type check
mypy src/sagellm/
# Verify dependency hierarchy
python scripts/verify_dependencies.py
๐ Development Resources
- DEVELOPER_GUIDE.md - ๆถๆ่ง่ไธๅผๅๆๅ
- PR_CHECKLIST.md - Pull Request ๅฎกๆฅๆธ ๅ
- scripts/verify_dependencies.py - ไพ่ตๅฑๆฌก้ช่ฏ
๐ Package Details
| Package | PyPI Name | Import Name | Description |
|---|---|---|---|
| sagellm | isagellm |
sagellm |
Umbrella package (install this) |
| sagellm-protocol | isagellm-protocol |
sagellm_protocol |
Protocol v0.1 types |
| sagellm-core | isagellm-core |
sagellm_core |
Runtime & config |
| sagellm-backend | isagellm-backend |
sagellm_backend |
Hardware abstraction |
๐ฏ Roadmap
- Year 1: Core inference with KV cache, prefix sharing, basic eviction
- Year 2: Multi-node inference, advanced scheduling
- Year 3: Full production-ready deployment
๐ License
Proprietary - IntelliStream. Internal use only.
Built with โค๏ธ by IntelliStream Team for domestic AI infrastructure
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file isagellm-0.1.0.3-cp311-none-any.whl.
File metadata
- Download URL: isagellm-0.1.0.3-cp311-none-any.whl
- Upload date:
- Size: 51.7 kB
- Tags: CPython 3.11
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ade4e649cf0031dd73645ccbd241b6ac9a817c1cb0ee443f679050fb5b2c2f18
|
|
| MD5 |
bcca36bb231019d9c60b598138a161c0
|
|
| BLAKE2b-256 |
592ee7b833ed4f7f2b44c761b3b4539f7bef8ef8ab86eb46ff17e055215fd838
|