
sageLLM backend provider abstraction and mock implementation

sagellm-backend

BackendProvider abstraction and built-in mock implementation.

  • Repository name: sagellm-backend
  • PyPI package name: isagellm-backend
  • Import name: sagellm_backend

Architecture Layers

Level 0: sagellm-protocol (base layer, no dependencies)
    ↓
Level 1: sagellm-backend (depends on protocol) ← this package
    ↓
Level 2: sagellm-core (depends on protocol + backend)
    ↓
Level 3: feature modules (depend on protocol + backend + core)

Dependencies

  • ✅ Runtime dependency: isagellm-protocol (protocol only)
  • ✅ Development dependency: isagellm-core (used only to test the entry point mechanism)
  • ❌ Not required: any runtime functionality of sagellm-core

Entry Points

  • group: sagellm.backends
  • Built-in kind: mock

Other hardware backends (ascend_cann, cuda, etc.) can be added through separate plugin packages or as providers inside this repository.

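Components

BackendProvider abstraction

  • Defines a unified hardware abstraction interface
  • Covers stream/event, collective communication, KV block management, and kernel registration
  • Driven by a capability matrix (CapabilityDescriptor)

Mock implementation

  • Supports CI testing (no real hardware required)
  • Configurable capability matrix (simulates the capabilities of different backends)
  • Fail-fast error handling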
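
The idea of a mock with a configurable capability matrix and fail-fast errors can be sketched without the package. Every name below (SketchDType, SketchCapability, SketchMockBackend, all_reduce, the size argument) is a hypothetical stand-in chosen for illustration; the real MockBackendProvider may be structured differently.

```python
from dataclasses import dataclass
from enum import Enum


class SketchDType(Enum):  # hypothetical stand-in for the package's DType
    FP16 = "fp16"
    BF16 = "bf16"
    INT8 = "int8"


@dataclass(frozen=True)
class SketchCapability:  # hypothetical stand-in for CapabilityDescriptor
    supported_dtypes: tuple
    has_collective: bool


class SketchMockBackend:
    """Hardware-free backend whose advertised capabilities are set by the caller."""

    def __init__(self, supported_dtypes, has_collective=False):
        self._cap = SketchCapability(tuple(supported_dtypes), has_collective)
        self._next_block_id = 0

    def capability(self) -> SketchCapability:
        return self._cap

    def kv_block_alloc(self, size: int, dtype: SketchDType) -> dict:
        # Fail fast: reject requests outside the advertised capability matrix.
        if dtype not in self._cap.supported_dtypes:
            raise ValueError(f"dtype {dtype.name} is not supported by this backend")
        if size <= 0:
            raise ValueError("size must be positive")
        self._next_block_id += 1
        return {"block_id": self._next_block_id, "size": size, "dtype": dtype}

    def all_reduce(self, values):
        if not self._cap.has_collective:
            raise NotImplementedError("collective communication is disabled")
        return sum(values)  # single-process stand-in for a real reduction
```

Because unsupported requests raise immediately instead of degrading silently, a CI test can configure the mock to resemble a given backend and check that callers respect its limits.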

Installation

# Install from PyPI (the protocol dependency is installed automatically)
pip install isagellm-backend

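🚀 Developer Quick Start

git clone git@github.com:intellistream/sagellm-backend.git
cd sagellm-backend
./quickstart.sh   # One-step setup of the development environment (including dependencies)

# Or install manually
pip install -e ".[dev]"

Run the tests:

pytest tests/ -v

💡 Tip: isagellm-protocol is installed from PyPI automatically. To develop against a local checkout of protocol:

pip install -e ../sagellm-protocol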

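Developer Guide

This project uses the following code quality safeguards:

  • Pre-commit hooks: automatic formatting and linting
  • Unit tests: 17 unit tests plus integration tests
  • CI/CD: automated testing with GitHub Actions
  • Type checking: 100% type annotation coverage

See CONTRIBUTING.md for details.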

Usage Example

from sagellm_backend import MockBackendProvider, DType

# Create a mock backend
backend = MockBackendProvider(
    supported_dtypes=[DType.FP16, DType.BF16],
    has_collective=True,
)

# Query capabilities
cap = backend.capability()
print(cap.supported_dtypes)

# Allocate a KV block
block = backend.kv_block_alloc(128, DType.FP16)

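HFCudaEngine (HuggingFace CUDA)

A CUDA inference engine built on HuggingFace Transformers that follows the fail-fast and mock-first constraints.

  • Dependencies: pip install torch transformers accelerate; quantization additionally requires pip install bitsandbytes
  • Required fields (no implicit defaults): engine_id, model_path, device (must start with "cuda"), dtype (float16/bfloat16/float32/auto), device_map (auto/cuda:X/dict), load_in_8bit and load_in_4bit (mutually exclusive), trust_remote_code, max_new_tokens, max_batch_size
  • Provider: a CudaBackendProvider can be injected; if none is injected, a CUDA provider is created automatically when one is available; with mock_mode=True or a non-CUDA device, MockBackendProvider is used automatically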
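
The field constraints listed above can be expressed as a small fail-fast check. This is only a sketch over a plain dict: validate_engine_fields and ALLOWED_DTYPES are hypothetical names, and the actual validation inside HFCudaEngineConfig is not shown in this README.

```python
ALLOWED_DTYPES = {"float16", "bfloat16", "float32", "auto"}

REQUIRED_FIELDS = (
    "engine_id", "model_path", "device", "dtype", "device_map",
    "load_in_8bit", "load_in_4bit", "trust_remote_code",
    "max_new_tokens", "max_batch_size",
)


def validate_engine_fields(fields: dict) -> None:
    """Raise ValueError as soon as one of the documented constraints is violated."""
    # No implicit defaults: every required field must be given explicitly.
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not str(fields["device"]).startswith("cuda"):
        raise ValueError("device must start with 'cuda'")
    if fields["dtype"] not in ALLOWED_DTYPES:
        raise ValueError(f"dtype must be one of {sorted(ALLOWED_DTYPES)}")
    # The two quantization modes cannot be enabled together.
    if fields["load_in_8bit"] and fields["load_in_4bit"]:
        raise ValueError("load_in_8bit and load_in_4bit are mutually exclusive")
```

Checking everything up front means a misconfigured engine is rejected before any model weights are loaded.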

Configuration Examples

from sagellm_backend.engine.hf_cuda import HFCudaEngine, HFCudaEngineConfig

# Standard FP16 configuration (minimal)
config = HFCudaEngineConfig(
    engine_id="hf-001",
    model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device="cuda:0",
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=256,
    max_batch_size=8,
)
engine = HFCudaEngine(config)
await engine.start()

# 4-bit quantized configuration (requires bitsandbytes)
quant_config = HFCudaEngineConfig(
    engine_id="hf-002",
    model_path="meta-llama/Llama-2-7b-hf",
    device="cuda:0",
    dtype="auto",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=True,
    trust_remote_code=False,
    max_new_tokens=512,
    max_batch_size=4,
)

# Mock mode (CI / no GPU)
mock_config = HFCudaEngineConfig(
    engine_id="hf-mock",
    model_path="mock-model",
    device="cuda:0",  # Keeps CUDA semantics; MockBackendProvider is used in practice
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=64,
    max_batch_size=2,
    mock_mode=True,
)
mock_engine = HFCudaEngine(mock_config)
await mock_engine.start()

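# Inference (await needs an async context, e.g. a coroutine run with asyncio.run;
# `request` is a request object built by the caller)
response = await engine.execute(request)
stream = engine.stream(request)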
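
Since await is not valid at the top level of an ordinary script, the calls above would normally live inside a coroutine. The sketch below shows that pattern with a StubEngine that is purely hypothetical and not part of sagellm_backend. It also assumes that stream() returns an async iterable, which the README does not confirm.

```python
import asyncio


class StubEngine:  # hypothetical stand-in; not part of sagellm_backend
    async def start(self) -> None:
        self.started = True

    async def execute(self, request: str) -> str:
        return f"echo:{request}"

    async def stream(self, request: str):
        # Async generator: yields one chunk at a time.
        for token in request.split():
            yield token


async def run_once(engine, request):
    """Start the engine, run one full request, then collect the streamed chunks."""
    await engine.start()
    response = await engine.execute(request)
    chunks = [chunk async for chunk in engine.stream(request)]
    return response, chunks
```

A script would drive this with asyncio.run(run_once(engine, request)), passing a real engine and request object in place of the stub.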

Metrics Output

Response.metrics covers the protocol fields ttft_ms, tbt_ms, tpot_ms, throughput_tps, peak_mem_mb, and error_rate, plus the KV statistics kv_used_tokens, kv_used_bytes, prefix_hit_rate, evict_count, and evict_ms. It also includes timestamps (queued/scheduled/executed/completed).
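
To make the latency fields concrete, here is a sketch that derives them from per-token arrival times. The formulas are common conventions (TTFT as time to first token, TBT as mean gap between consecutive tokens, TPOT as decode time divided by the tokens after the first) and are assumptions here; the protocol's exact definitions are not given in this README. The function name latency_metrics is hypothetical.

```python
def latency_metrics(request_ts: float, token_ts: list) -> dict:
    """Derive latency metrics from timestamps in seconds; results are in ms or tokens/s."""
    if not token_ts:
        raise ValueError("at least one token timestamp is required")
    total_s = token_ts[-1] - request_ts
    if total_s <= 0:
        raise ValueError("timestamps must be later than request_ts")
    n = len(token_ts)
    ttft_ms = (token_ts[0] - request_ts) * 1000.0
    # Gaps between consecutive tokens; empty when only one token was produced.
    gaps_ms = [(b - a) * 1000.0 for a, b in zip(token_ts, token_ts[1:])]
    tbt_ms = sum(gaps_ms) / len(gaps_ms) if gaps_ms else 0.0
    # Decode time spread over the tokens after the first one.
    tpot_ms = (token_ts[-1] - token_ts[0]) * 1000.0 / (n - 1) if n > 1 else 0.0
    return {
        "ttft_ms": ttft_ms,
        "tbt_ms": tbt_ms,
        "tpot_ms": tpot_ms,
        "throughput_tps": n / total_s,
    }
```

Under these definitions tbt_ms and tpot_ms coincide when exactly one token arrives per step; they diverge only if an implementation measures gaps per streamed chunk rather than per token.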

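Adding a New Backend

# Create a new module under providers/
# (assumption: CapabilityDescriptor is exported alongside DType; adjust the import if not)
from sagellm_backend import CapabilityDescriptor, DType

class AscendBackendProvider:
    def capability(self) -> CapabilityDescriptor:
        return CapabilityDescriptor(
            supported_dtypes=[DType.FP16, DType.BF16, DType.INT8],
            # ...
        )

    # Implement the remaining interface methods...

# Register the entry point (pyproject.toml)
[project.entry-points."sagellm.backends"]
ascend_cann = "sagellm_backend.providers.ascend:create_ascend_backend"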
