sageLLM backend provider abstraction and mock implementation

sagellm-backend

BackendProvider abstraction and built-in mock implementation.

Repository name: sagellm-backend
PyPI package name: isagellm-backend
Import name: sagellm_backend

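Architecture layers

Level 0: sagellm-protocol (foundation, no dependencies)
    ↓
Level 1: sagellm-backend (depends on protocol) ← this package
    ↓
Level 2: sagellm-core (depends on protocol + backend)
    ↓
Level 3: feature modules (depend on protocol + backend + core)

Dependencies:

✅ Runtime dependency: isagellm-protocol (protocol only)
✅ Development dependency: isagellm-core (used only to test the entry point mechanism)
❌ Not required: any runtime functionality of sagellm-core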

Entry Points

Group: sagellm.backends
Built-in kind: mock

Other hardware backends (ascend_cann, cuda, and so on) can be added through separate plugin packages or as providers inside this repository.

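Components

BackendProvider abstraction

Defines a unified hardware abstraction interface
Covers streams/events, collective communication, KV block management, and kernel registration
Driven by a capability matrix (CapabilityDescriptor)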
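To make the capability-matrix idea concrete, here is a minimal, self-contained sketch. The classes below are stand-ins, not the real sagellm_backend types: the field names of CapabilityDescriptor, the string values of DType, and the pick_dtype helper are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DType(Enum):
    """Stand-in for sagellm_backend.DType (values assumed)."""
    FP16 = "fp16"
    BF16 = "bf16"
    INT8 = "int8"


@dataclass(frozen=True)
class CapabilityDescriptor:
    """Stand-in capability matrix; field names are assumptions."""
    supported_dtypes: frozenset[DType]
    has_collective: bool = False

    def supports(self, dtype: DType) -> bool:
        return dtype in self.supported_dtypes


def pick_dtype(cap: CapabilityDescriptor, preferred: list[DType]) -> DType:
    """Return the first preferred dtype that the backend advertises."""
    for dtype in preferred:
        if cap.supports(dtype):
            return dtype
    raise ValueError(f"none of {[d.name for d in preferred]} is supported")


cap = CapabilityDescriptor(supported_dtypes=frozenset({DType.FP16, DType.INT8}))
chosen = pick_dtype(cap, [DType.BF16, DType.FP16])
print(chosen.name)  # FP16
```

The point of the pattern is that callers branch on what a backend declares rather than on which backend it is.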

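Mock implementation

Supports CI testing (no real hardware required)
Configurable capability matrix (to simulate the capabilities of different backends)
Fail-fast error handling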
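The fail-fast behavior can be pictured with a toy provider that rejects unsupported requests immediately instead of degrading silently. This is not the actual MockBackendProvider: the class ToyMockProvider, the meaning of the first kv_block_alloc argument (assumed here to be a token count), the all_reduce method, and the exception type are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class DType(Enum):
    FP16 = "fp16"
    BF16 = "bf16"
    INT8 = "int8"


class UnsupportedCapabilityError(RuntimeError):
    """Hypothetical error raised as soon as a request cannot be met."""


@dataclass
class ToyMockProvider:
    supported_dtypes: list[DType]
    has_collective: bool = False
    _next_block_id: int = field(default=0, init=False)

    def kv_block_alloc(self, num_tokens: int, dtype: DType) -> dict:
        # Fail fast: validate everything before touching any state.
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        if dtype not in self.supported_dtypes:
            raise UnsupportedCapabilityError(f"dtype {dtype.name} is not supported")
        block = {"id": self._next_block_id, "num_tokens": num_tokens, "dtype": dtype}
        self._next_block_id += 1
        return block

    def all_reduce(self, values: list[float]) -> float:
        if not self.has_collective:
            raise UnsupportedCapabilityError("collective communication is disabled")
        return sum(values)  # single-process stand-in for a sum all-reduce


provider = ToyMockProvider(supported_dtypes=[DType.FP16])
print(provider.kv_block_alloc(128, DType.FP16)["id"])  # 0
try:
    provider.kv_block_alloc(128, DType.INT8)
except UnsupportedCapabilityError as exc:
    print(f"rejected: {exc}")
```

Configuring the mock with a narrower capability set is what lets CI exercise the error paths of a less capable backend without the hardware.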

Installation

# Install from PyPI (the protocol dependency is installed automatically)
pip install isagellm-backend

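🚀 Developer quick start

git clone git@github.com:intellistream/sagellm-backend.git
cd sagellm-backend
./quickstart.sh   # one-step setup of the development environment (including dependencies)

# Or install manually
pip install -e ".[dev]"

Run the tests:

pytest tests/ -v

💡 Tip: isagellm-protocol is installed from PyPI automatically. To develop against a local checkout of protocol:
pip install -e ../sagellm-protocol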

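Developer guide

This project enforces code quality through the following mechanisms:

✅ Pre-commit hooks: automatic formatting and linting
✅ Unit tests: 17 unit tests plus integration tests
✅ CI/CD: automated testing with GitHub Actions
✅ Type checking: 100% type annotation coverage

See CONTRIBUTING.md for details.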

Usage example

from sagellm_backend import MockBackendProvider, DType

# Create a mock backend
backend = MockBackendProvider(
    supported_dtypes=[DType.FP16, DType.BF16],
    has_collective=True,
)

# Query capabilities
cap = backend.capability()
print(cap.supported_dtypes)

# Allocate a KV block
block = backend.kv_block_alloc(128, DType.FP16)

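HFCudaEngine (HuggingFace CUDA)

A CUDA inference engine built on HuggingFace Transformers that follows the fail-fast and mock-first constraints.

Dependencies: pip install torch transformers accelerate; quantization additionally requires pip install bitsandbytes
Required fields (no implicit defaults): engine_id, model_path, device (must start with "cuda"), dtype (float16/bfloat16/float32/auto), device_map (auto/cuda:X/dict), load_in_8bit and load_in_4bit (mutually exclusive), trust_remote_code, max_new_tokens, max_batch_size
Provider: a CudaBackendProvider can be injected; when none is injected, a CUDA provider is created automatically if one is available, and MockBackendProvider is used automatically when mock_mode=True or the device is not CUDA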
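The field constraints listed above lend themselves to construction-time validation. The following is only a sketch of that idea with a stand-in class: ToyEngineConfig is hypothetical, the positivity check on the two limits is an added assumption, and the real HFCudaEngineConfig may validate differently.

```python
from dataclasses import dataclass
from typing import Union

ALLOWED_DTYPES = {"float16", "bfloat16", "float32", "auto"}


@dataclass(frozen=True)
class ToyEngineConfig:
    """Stand-in for HFCudaEngineConfig; every field is required (no defaults)."""
    engine_id: str
    model_path: str
    device: str
    dtype: str
    device_map: Union[str, dict]
    load_in_8bit: bool
    load_in_4bit: bool
    trust_remote_code: bool
    max_new_tokens: int
    max_batch_size: int

    def __post_init__(self) -> None:
        # Fail fast: reject invalid combinations when the config is built.
        if not self.device.startswith("cuda"):
            raise ValueError(f'device must start with "cuda", got {self.device!r}')
        if self.dtype not in ALLOWED_DTYPES:
            raise ValueError(f"dtype must be one of {sorted(ALLOWED_DTYPES)}")
        if self.load_in_8bit and self.load_in_4bit:
            raise ValueError("load_in_8bit and load_in_4bit are mutually exclusive")
        # Assumption: non-positive limits are treated as configuration errors.
        if self.max_new_tokens <= 0 or self.max_batch_size <= 0:
            raise ValueError("max_new_tokens and max_batch_size must be positive")


base = dict(
    engine_id="hf-001", model_path="mock-model", device="cuda:0", dtype="float16",
    device_map="auto", load_in_8bit=False, load_in_4bit=False,
    trust_remote_code=False, max_new_tokens=64, max_batch_size=2,
)
ToyEngineConfig(**base)  # valid
try:
    ToyEngineConfig(**{**base, "load_in_8bit": True, "load_in_4bit": True})
except ValueError as exc:
    print(exc)  # load_in_8bit and load_in_4bit are mutually exclusive
```

Because no field has a default, leaving one out raises a TypeError at construction, which matches the "no implicit defaults" rule.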

Configuration example

from sagellm_backend.engine.hf_cuda import HFCudaEngine, HFCudaEngineConfig

# Standard FP16 configuration (minimal)
config = HFCudaEngineConfig(
    engine_id="hf-001",
    model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device="cuda:0",
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=256,
    max_batch_size=8,
)
engine = HFCudaEngine(config)
await engine.start()  # coroutine: must be awaited inside an async function

# 4-bit quantized configuration (requires bitsandbytes)
quant_config = HFCudaEngineConfig(
    engine_id="hf-002",
    model_path="meta-llama/Llama-2-7b-hf",
    device="cuda:0",
    dtype="auto",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=True,
    trust_remote_code=False,
    max_new_tokens=512,
    max_batch_size=4,
)

# Mock mode (CI / no GPU)
mock_config = HFCudaEngineConfig(
    engine_id="hf-mock",
    model_path="mock-model",
    device="cuda:0",  # keeps CUDA semantics; MockBackendProvider is used in practice
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=64,
    max_batch_size=2,
    mock_mode=True,
)
mock_engine = HFCudaEngine(mock_config)
await mock_engine.start()

# Inference (request is a request object constructed elsewhere; not shown here)
response = await engine.execute(request)
stream = engine.stream(request)
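The await calls in the example above only work inside a coroutine. The sketch below shows one way to drive such an engine from a script with asyncio.run. ToyEngine is a stand-in that borrows the method names start, execute, and stream; its behavior, the string request, and the assumption that stream returns an async iterator are illustrative only.

```python
import asyncio
from typing import AsyncIterator


class ToyEngine:
    """Stand-in engine; only the method names mirror the example above."""

    def __init__(self) -> None:
        self.started = False

    async def start(self) -> None:
        self.started = True

    async def execute(self, request: str) -> str:
        if not self.started:
            raise RuntimeError("engine not started")  # fail fast
        return f"echo: {request}"

    async def stream(self, request: str) -> AsyncIterator[str]:
        if not self.started:
            raise RuntimeError("engine not started")
        for token in request.split():
            await asyncio.sleep(0)  # hand control back to the event loop
            yield token


async def main() -> list[str]:
    engine = ToyEngine()
    await engine.start()
    response = await engine.execute("hello world")
    # An async generator is consumed with `async for`, not `await`.
    tokens = [tok async for tok in engine.stream("hello world")]
    return [response, *tokens]


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['echo: hello world', 'hello', 'world']
```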

Metrics output

Response.metrics covers the protocol fields ttft_ms, tbt_ms, tpot_ms, throughput_tps, peak_mem_mb, and error_rate, the KV-related statistics kv_used_tokens, kv_used_bytes, prefix_hit_rate, evict_count, and evict_ms, and a set of timestamps (queued/scheduled/executed/completed).
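For readers unfamiliar with the latency fields, the sketch below derives three of them from raw timestamps using commonly used definitions: TTFT as the delay until the first token, TPOT as the average time per token after the first, and throughput as tokens per second over the whole request. These definitions and the latency_metrics helper are assumptions; the protocol's exact definitions may differ.

```python
def latency_metrics(request_ts: float, token_ts: list[float]) -> dict[str, float]:
    """Derive latency metrics from timestamps given in seconds.

    request_ts: when the request was received.
    token_ts:   when each output token was produced, in order.
    """
    if not token_ts:
        raise ValueError("at least one token timestamp is required")
    n = len(token_ts)
    ttft_ms = (token_ts[0] - request_ts) * 1000.0
    # TPOT: decode time averaged over the tokens after the first one.
    # (TBT usually refers to the individual gaps this value averages.)
    tpot_ms = (token_ts[-1] - token_ts[0]) * 1000.0 / (n - 1) if n > 1 else 0.0
    total_s = token_ts[-1] - request_ts
    throughput_tps = n / total_s if total_s > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tpot_ms": tpot_ms, "throughput_tps": throughput_tps}


m = latency_metrics(0.0, [0.5, 0.75, 1.0])
print(m)  # {'ttft_ms': 500.0, 'tpot_ms': 250.0, 'throughput_tps': 3.0}
```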

Extending with a new backend

# Create a new module under providers/
class AscendBackendProvider:
    def capability(self) -> CapabilityDescriptor:
        return CapabilityDescriptor(
            supported_dtypes=[DType.FP16, DType.BF16, DType.INT8],
            # ...
        )

    # Implement the remaining interfaces...

# Register the entry point (pyproject.toml)
[project.entry-points."sagellm.backends"]
ascend_cann = "sagellm_backend.providers.ascend:create_ascend_backend"
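The entry point value follows the "module:attribute" convention, which importlib.metadata resolves to a callable. Since the Ascend provider is only a placeholder here, the sketch below builds an EntryPoint by hand and points it at a standard-library function ("json:dumps") purely to show the resolution step; the name demo_kind is made up.

```python
from importlib.metadata import EntryPoint

# Built by hand for illustration. Installed packages get this object from
# their pyproject.toml metadata instead, and "json:dumps" stands in for a
# real factory such as the create_ascend_backend target registered above.
ep = EntryPoint(name="demo_kind", value="json:dumps", group="sagellm.backends")

factory = ep.load()  # imports the module, then fetches the attribute
print(factory.__name__)            # dumps
print(factory({"kind": ep.name}))  # {"kind": "demo_kind"}
```

A real provider package would expose a factory that returns a provider instance, and a loader would call it after ep.load().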

Development Status
- 3 - Alpha

Release history

0.5.4.17 (Mar 12, 2026)
0.5.4.16 (Mar 8, 2026)
0.5.4.13 (Mar 6, 2026)
0.5.4.11 (Mar 3, 2026)
0.5.4.8 (Mar 1, 2026)
0.5.4.7 (Mar 1, 2026)
0.5.4.4 (Mar 1, 2026)
0.5.4.1 (Feb 28, 2026)
0.5.4.0 (Feb 27, 2026)
0.5.3.26 (Feb 27, 2026)
0.5.3.25 (Feb 27, 2026)
0.5.3.24 (Feb 27, 2026)
0.5.3.23 (Feb 27, 2026)
0.5.3.22 (Feb 27, 2026)
0.5.3.21 (Feb 27, 2026)
0.5.3.19 (Feb 27, 2026)
0.5.3.17 (Feb 26, 2026)
0.5.3.13 (Feb 26, 2026)
0.5.3.9 (Feb 26, 2026)
0.5.3.6 (Feb 26, 2026)
0.5.3.5 (Feb 26, 2026)
0.5.3.3 (Feb 25, 2026)
0.5.3.2 (Feb 25, 2026)
0.5.3.1 (Feb 23, 2026)
0.5.3.0 (Feb 23, 2026)
0.5.2.13 (Feb 23, 2026)
0.5.2.12 (Feb 20, 2026)
0.5.2.10 (Feb 18, 2026)
0.5.2.9 (Feb 17, 2026)
0.5.2.8 (Feb 17, 2026)
0.5.2.7 (Feb 17, 2026)
0.5.2.6 (Feb 17, 2026)
0.5.2.5 (Feb 17, 2026)
0.5.2.4 (Feb 17, 2026)
0.5.2.3 (Feb 17, 2026)
0.5.2.2 (Feb 17, 2026)
0.5.2.1 (Feb 17, 2026)
0.5.2.0 (Feb 17, 2026)
0.5.1.0 (Feb 17, 2026)
0.4.1.6 (Feb 17, 2026)
0.4.1.5 (Feb 17, 2026)
0.4.1.4 (Feb 15, 2026)
0.4.1.3 (Feb 15, 2026)
0.4.1.2 (Feb 15, 2026)
0.4.1.1 (Feb 15, 2026)
0.4.1.0 (Feb 15, 2026)
0.4.0.14 (Feb 8, 2026)
0.4.0.10 (Feb 1, 2026)
0.4.0.9 (Feb 1, 2026)
0.4.0.8 (Jan 31, 2026)
0.4.0.7 (Jan 31, 2026)
0.4.0.6 (Jan 30, 2026)
0.4.0.5 (Jan 30, 2026)
0.4.0.4 (Jan 30, 2026)
0.4.0.3 (Jan 30, 2026)
0.4.0.2 (Jan 30, 2026)
0.4.0.1 (Jan 30, 2026)
0.3.0.11 (Jan 29, 2026)
0.3.0.10 (Jan 29, 2026)
0.3.0.9 (Jan 29, 2026)
0.3.0.8 (Jan 29, 2026)
0.3.0.7 (Jan 29, 2026)
0.3.0.6 (Jan 28, 2026)
0.3.0.5 (Jan 27, 2026)
0.3.0.4 (Jan 27, 2026)
0.3.0.3 (Jan 27, 2026)
0.3.0.1 (Jan 27, 2026)
0.3.0.0 (Jan 27, 2026)
0.2.1.6 (Jan 27, 2026)
0.2.1.5 (Jan 26, 2026)
0.2.1.4 (Jan 26, 2026)
0.2.1.3 (Jan 26, 2026)
0.2.1.2 (Jan 25, 2026)
0.2.1.1 (Jan 21, 2026)
0.2.1.0 (Jan 20, 2026)
0.2.0.0 (Jan 20, 2026)
0.1.1.4 (Jan 18, 2026)
0.1.1.3 (Jan 17, 2026)
0.1.1.2 (Jan 15, 2026) ← this version
0.1.1.1 (Jan 15, 2026)
0.1.1.0 (Jan 15, 2026)
0.1.0.2 (Jan 15, 2026)
0.1.0.1 (Jan 15, 2026)
0.1.0 (Jan 12, 2026)

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

isagellm_backend-0.1.1.2-cp311-none-any.whl (116.2 kB)

Uploaded Jan 15, 2026, CPython 3.11

File details

Details for the file isagellm_backend-0.1.1.2-cp311-none-any.whl.

File metadata

Download URL: isagellm_backend-0.1.1.2-cp311-none-any.whl
Upload date: Jan 15, 2026
Size: 116.2 kB
Tags: CPython 3.11
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for isagellm_backend-0.1.1.2-cp311-none-any.whl
Algorithm	Hash digest
SHA256	`0ce5c649f6a20d7768da9fd57fa19bb41f6872230d0d07f291a4ce4842f8e799`
MD5	`6f2b94c8754aafe5976ef5f50829c14b`
BLAKE2b-256	`0959a5a72071b6c6c4b2597ea4b72e73e572fff0ad67853b76f1a1c99c065936`

isagellm-backend 0.1.1.2
