Skip to main content

sageLLM: Modular LLM inference engine for domestic computing power (Huawei Ascend, NVIDIA)

Project description

sageLLM

Protocol Compliance (Mandatory)

🚀 Modular LLM Inference Engine for Domestic Computing Power

Ollama-like experience for Chinese hardware ecosystems (Huawei Ascend, NVIDIA)


✨ Features

  • 🎯 One-Click Install - pip install isagellm gets you started immediately
  • 🧠 CPU-First - Default CPU engine, no GPU required
  • 🇨🇳 Domestic Hardware - First-class support for Huawei Ascend NPU
  • 📊 Observable - Built-in metrics (TTFT, TBT, throughput, KV usage)
  • 🧩 Plugin System - Extend with custom backends and engines

📦 Quick Install

# Install sageLLM (CPU-first, no GPU required)
pip install isagellm

# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'

# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'

# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'

# With CUDA support
pip install 'isagellm[cuda]'

# All features
pip install 'isagellm[all]'

🚀 Quick Start

CLI (like ollama)

# Show system info
sage-llm info

# Default mode (CPU engine, no GPU required)
sage-llm serve
sage-llm run -p "What is LLM inference?"

# Production mode (requires control-plane)
# pip install 'isagellm[server]'
sage-llm serve --control-plane
sage-llm gateway --port 8080

Python API

import asyncio

from sagellm import BackendConfig, EngineConfig, Request, create_backend, create_engine

# Create CPU backend + engine (no GPU needed)
async def main() -> None:
  backend = create_backend(BackendConfig(kind="cpu", device="cpu"))
  engine = create_engine(
    EngineConfig(kind="cpu", model="sshleifer/tiny-gpt2", model_path="sshleifer/tiny-gpt2"),
    backend,
  )

  await engine.start()
  try:
    request = Request(
      request_id="demo-001",
      prompt="Hello, world!",
      max_tokens=128,
      stream=False,
    )
    response = await engine.execute(request)
    print(response.output_text)
  finally:
    await engine.stop()


asyncio.run(main())

print(f"Response: {response.text}")
print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")

Configuration

# ~/.sage-llm/config.yaml
backend:
  kind: cpu  # Options: cpu, pytorch-cuda, pytorch-ascend
  device: cpu

engine:
  kind: cpu
  model: sshleifer/tiny-gpt2

control_plane:
  endpoint: "localhost:8080"

📊 Metrics & Validation

sageLLM provides comprehensive performance metrics:

{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85
}

Run benchmarks:

sage-llm demo --workload year1 --output metrics.json

🏗️ Architecture

isagellm (umbrella package)
├── isagellm-protocol       # Protocol v0.1 types
│   └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-backend        # Hardware abstraction (L1 - Foundation)
│   └── BackendProvider, CPUBackend, (CUDABackend, AscendBackend)
├── isagellm-comm           # Communication primitives (L2 - Infrastructure)
│   └── Topology, CollectiveOps (all_reduce/gather), P2P (send/recv), Overlap
├── isagellm-kv-cache       # KV cache management (L2 - Optional)
│   └── PrefixCache, MemoryPool, EvictionPolicies, Predictor, KV Transfer
├── isagellm-compression    # Inference acceleration (quantization, sparsity, etc.) (L2 - Optional)
│   └── Quantization, Sparsity, SpeculativeDecoding, Fusion
├── isagellm-core           # Engine core & runtime (L3)
│   └── Config, Engine, Factory, DemoRunner, Adapters (vLLM/LMDeploy)
├── isagellm-control-plane  # Request routing & scheduling (L4 - Optional)
│   └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway        # OpenAI-compatible REST API (L5 - Optional)
    └── FastAPI server, /v1/chat/completions, Session management

🔧 Development

Quick Setup (Development Mode)

# Clone all repositories
./scripts/clone-all-repos.sh

# Install all packages in editable mode
./quickstart.sh

# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace

📖 See WORKSPACE_GUIDE.md for Multi-root Workspace usage.

Testing

# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"

# Run tests
pytest -v

# Format & lint
ruff format .
ruff check . --fix

# Type check
mypy src/sagellm/

# Verify dependency hierarchy
python scripts/verify_dependencies.py

📖 Development Resources


📚 Documentation Index

用户文档

开发者文档

API 文档

子包文档

📚 Package Details

Package PyPI Name Import Name Description
sagellm isagellm sagellm Umbrella package (install this)
sagellm-protocol isagellm-protocol sagellm_protocol Protocol v0.1 types
sagellm-core isagellm-core sagellm_core Runtime & config
sagellm-backend isagellm-backend sagellm_backend Hardware abstraction

📄 License

Proprietary - IntelliStream. Internal use only.


Built with ❤️ by IntelliStream Team for domestic AI infrastructure

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

isagellm-0.2.2.7-cp311-none-any.whl (54.1 kB view details)

Uploaded CPython 3.11

File details

Details for the file isagellm-0.2.2.7-cp311-none-any.whl.

File metadata

  • Download URL: isagellm-0.2.2.7-cp311-none-any.whl
  • Upload date:
  • Size: 54.1 kB
  • Tags: CPython 3.11
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for isagellm-0.2.2.7-cp311-none-any.whl
Algorithm Hash digest
SHA256 249c49a629cab14e36c7a9eef63de7a2af5e095c57c634ed7979ce7f87c776fb
MD5 ed149ad86026a0e65d4a0f28a18d27c9
BLAKE2b-256 9714b5d91ddac2070de8c53a35be9d0b43081c2dcb6b712112fa79d9585a0ea5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page