sageLLM backend provider abstraction and mock implementation
sagellm-backend
The BackendProvider abstraction and a built-in mock implementation.
- Repository name: sagellm-backend
- PyPI package name: isagellm-backend
- Import name: sagellm_backend
Architecture layers
Level 0: sagellm-protocol (foundation, no dependencies)
↓
Level 1: sagellm-backend (depends on protocol) ← this package
↓
Level 2: sagellm-core (depends on protocol + backend)
↓
Level 3: feature modules (depend on protocol + backend + core)
Dependencies:
- ✅ Runtime dependency: isagellm-protocol (protocol only)
- ✅ Development dependency: isagellm-core (used only to test the entry point mechanism)
- ❌ No dependency on any runtime functionality of sagellm-core
Entry Points
- Group: sagellm.backends
- Built-in kind: mock
Other hardware backends (ascend_cann, cuda, and so on) can be added through separate plugin packages, or by adding a provider to this repository.
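Components
BackendProvider abstraction
- Defines a unified hardware abstraction interface
- Covers stream/event handling, collective communication, KV block management, and kernel registration
- Driven by a capability matrix (CapabilityDescriptor)
Mock implementation
- Supports CI testing without real hardware
- Configurable capability matrix, so it can simulate the capabilities of different backends
- Fail-fast error handling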
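The combination of a configurable capability matrix and fail-fast behaviour can be sketched as follows. Everything in this block is a stand-in written for illustration: `Capability`, `SketchMockProvider`, `UnsupportedCapabilityError`, the `all_reduce` method, and the meaning of the first `kv_block_alloc` argument (assumed here to be a token count) are assumptions, not the package's actual classes or signatures.

```python
from dataclasses import dataclass
from enum import Enum


class DType(Enum):  # stand-in, not the package's DType
    FP16 = "fp16"
    BF16 = "bf16"
    INT8 = "int8"


@dataclass(frozen=True)
class Capability:  # stand-in for CapabilityDescriptor
    supported_dtypes: frozenset
    has_collective: bool


class UnsupportedCapabilityError(RuntimeError):
    """Raised as soon as a caller requests something outside the matrix."""


class SketchMockProvider:
    """Hypothetical mock whose behaviour is driven by its capability matrix."""

    def __init__(self, supported_dtypes, has_collective: bool) -> None:
        self._cap = Capability(frozenset(supported_dtypes), has_collective)

    def capability(self) -> Capability:
        return self._cap

    def kv_block_alloc(self, num_tokens: int, dtype: DType) -> dict:
        # Fail fast: reject unsupported dtypes instead of silently falling back.
        if dtype not in self._cap.supported_dtypes:
            raise UnsupportedCapabilityError(f"dtype {dtype.name} is not supported")
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        return {"num_tokens": num_tokens, "dtype": dtype}

    def all_reduce(self, values: list) -> float:
        # Hypothetical collective operation, gated on the capability flag.
        if not self._cap.has_collective:
            raise UnsupportedCapabilityError("collective communication is not supported")
        return float(sum(values))
```

A test can then construct the mock with a reduced matrix, for example `SketchMockProvider([DType.FP16], has_collective=False)`, and check that code under test handles the resulting errors rather than assuming every backend supports every feature.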
Installation
# Install from PyPI (the protocol dependency is installed automatically)
pip install isagellm-backend
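🚀 Developer quick start
git clone git@github.com:intellistream/sagellm-backend.git
cd sagellm-backend
./quickstart.sh  # one-step setup of the development environment (including dependencies)
# Or install manually
pip install -e ".[dev]"
Run the tests:
pytest tests/ -v
💡 Tip: isagellm-protocol is installed from PyPI automatically. To develop against a local protocol checkout, run: pip install -e ../sagellm-protocol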
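Developer guide
The project enforces code quality through:
- ✅ Pre-commit hooks: automatic formatting and linting
- ✅ Unit tests: 17 unit tests plus integration tests
- ✅ CI/CD: automated testing with GitHub Actions
- ✅ Type checking: 100% type annotation coverage
See CONTRIBUTING.md for details.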
Usage example
from sagellm_backend import MockBackendProvider, DType
# Create a mock backend
backend = MockBackendProvider(
    supported_dtypes=[DType.FP16, DType.BF16],
    has_collective=True,
)
# Query its capabilities
cap = backend.capability()
print(cap.supported_dtypes)
# Allocate a KV block
block = backend.kv_block_alloc(128, DType.FP16)
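HFCudaEngine (HuggingFace CUDA)
A CUDA inference engine built on HuggingFace Transformers. It follows the project's fail-fast and mock-first constraints.
- Dependencies: pip install torch transformers accelerate; quantization additionally requires pip install bitsandbytes
- Required fields (no implicit defaults): engine_id, model_path, device (must start with "cuda"), dtype (float16/bfloat16/float32/auto), device_map (auto/cuda:X/dict), load_in_8bit and load_in_4bit (mutually exclusive), trust_remote_code, max_new_tokens, max_batch_size
- Provider: a CudaBackendProvider can be injected. If none is injected, a CUDA provider is created automatically when one is available; with mock_mode=True or on a non-CUDA device, MockBackendProvider is used automatically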
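The field rules listed above (no implicit defaults, a device that starts with "cuda", a fixed set of dtypes, and mutually exclusive quantization flags) could be enforced along the following lines. This is a minimal sketch with a hypothetical `EngineConfigSketch` class covering a subset of the fields; it is not the real HFCudaEngineConfig, whose validation logic is not shown in this README.

```python
from dataclasses import dataclass

ALLOWED_DTYPES = {"float16", "bfloat16", "float32", "auto"}


@dataclass(frozen=True)
class EngineConfigSketch:
    """Hypothetical stand-in covering a subset of the documented fields."""

    # No field has a default, so omitting any of them raises TypeError.
    engine_id: str
    model_path: str
    device: str
    dtype: str
    load_in_8bit: bool
    load_in_4bit: bool

    def __post_init__(self) -> None:
        # Fail fast at construction time rather than at model load time.
        if not self.device.startswith("cuda"):
            raise ValueError(f"device must start with 'cuda', got {self.device!r}")
        if self.dtype not in ALLOWED_DTYPES:
            raise ValueError(f"dtype must be one of {sorted(ALLOWED_DTYPES)}, got {self.dtype!r}")
        if self.load_in_8bit and self.load_in_4bit:
            raise ValueError("load_in_8bit and load_in_4bit are mutually exclusive")
```

Validating in `__post_init__` means an invalid configuration never exists as an object, which matches the fail-fast constraint described above.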
Configuration examples
from sagellm_backend.engine.hf_cuda import HFCudaEngine, HFCudaEngineConfig
# Note: the await calls below must run inside an async function
# Standard FP16 configuration (minimal)
config = HFCudaEngineConfig(
    engine_id="hf-001",
    model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device="cuda:0",
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=256,
    max_batch_size=8,
)
engine = HFCudaEngine(config)
await engine.start()
# 4-bit quantized configuration (requires bitsandbytes)
quant_config = HFCudaEngineConfig(
    engine_id="hf-002",
    model_path="meta-llama/Llama-2-7b-hf",
    device="cuda:0",
    dtype="auto",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=True,
    trust_remote_code=False,
    max_new_tokens=512,
    max_batch_size=4,
)
# Mock mode (CI / no GPU)
mock_config = HFCudaEngineConfig(
    engine_id="hf-mock",
    model_path="mock-model",
    device="cuda:0",  # semantically still CUDA; MockBackendProvider is actually used
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=64,
    max_batch_size=2,
    mock_mode=True,
)
mock_engine = HFCudaEngine(mock_config)
await mock_engine.start()
# Inference (request is a request object built elsewhere; its construction is not shown here)
response = await engine.execute(request)
stream = engine.stream(request)
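Because the snippets above use `await`, they need an event loop to run. The sketch below shows one way to drive the same call sequence (start, execute, stream) from a script with `asyncio.run`. `FakeEngine` is a stand-in written for this example, and the assumption that `stream()` returns an async iterator of chunks is not confirmed by this README.

```python
import asyncio
from typing import AsyncIterator


class FakeEngine:
    """Stand-in that only mirrors the method names used in the example above."""

    async def start(self) -> None:
        self.started = True

    async def execute(self, request: str) -> str:
        return f"echo: {request}"

    async def stream(self, request: str) -> AsyncIterator[str]:
        for token in request.split():
            await asyncio.sleep(0)  # hand control back to the event loop between chunks
            yield token


async def main() -> tuple:
    engine = FakeEngine()
    await engine.start()
    response = await engine.execute("hello world")
    # Assumption: stream() is consumed with "async for".
    chunks = [chunk async for chunk in engine.stream("hello world")]
    return response, chunks


if __name__ == "__main__":
    print(asyncio.run(main()))
```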
Metrics output
Response.metrics covers the protocol fields ttft_ms, tbt_ms, tpot_ms, throughput_tps, peak_mem_mb, and error_rate, the KV-related statistics kv_used_tokens, kv_used_bytes, prefix_hit_rate, evict_count, and evict_ms, and a set of timestamps (queued/scheduled/executed/completed).
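To make the latency fields concrete, the following sketch derives ttft_ms, tpot_ms, and throughput_tps from per-token emission times. The formulas follow common conventions (time to first token, mean time per output token after the first, tokens per second over the whole request); whether sagellm computes them in exactly this way is an assumption, and `latency_metrics` is a hypothetical helper.

```python
def latency_metrics(start_s: float, token_times_s: list) -> dict:
    """Derive latency metrics from a request start time and token emission times.

    All inputs are in seconds on the same clock. Hypothetical helper, not
    part of sagellm-backend.
    """
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")
    n = len(token_times_s)
    first, last = token_times_s[0], token_times_s[-1]
    total_s = last - start_s
    # Time to first token: delay between request start and the first token.
    ttft_ms = (first - start_s) * 1000.0
    # Time per output token: mean gap between tokens after the first one.
    tpot_ms = (last - first) / (n - 1) * 1000.0 if n > 1 else 0.0
    # Throughput: generated tokens divided by end-to-end time.
    throughput_tps = n / total_s if total_s > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tpot_ms": tpot_ms, "throughput_tps": throughput_tps}


if __name__ == "__main__":
    # Four tokens emitted at 0.25 s intervals after a request that starts at t = 0.
    print(latency_metrics(0.0, [0.25, 0.5, 0.75, 1.0]))
```

For this input the first token arrives after 250 ms, the remaining three tokens are spaced 250 ms apart, and four tokens over one second give a throughput of 4 tokens per second.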
Adding a new backend
# Create a new module under providers/
class AscendBackendProvider:
    def capability(self) -> CapabilityDescriptor:
        return CapabilityDescriptor(
            supported_dtypes=[DType.FP16, DType.BF16, DType.INT8],
            # ...
        )
    # Implement the remaining interface methods...
# Register the entry point (pyproject.toml)
[project.entry-points."sagellm.backends"]
ascend_cann = "sagellm_backend.providers.ascend:create_ascend_backend"
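The entry point value above has the form "module:attribute". As a rough illustration of how such a string is resolved at load time, the sketch below builds an `EntryPoint` by hand and loads it. The target `json:dumps` is only a stand-in so the example runs without any Ascend code installed, and `resolve_factory` is a hypothetical helper, not part of sagellm-backend.

```python
from importlib.metadata import EntryPoint


def resolve_factory(kind: str, target: str):
    """Resolve a "module:attr" entry point target to the object it names."""
    ep = EntryPoint(name=kind, value=target, group="sagellm.backends")
    # load() imports the module part and then looks up the attribute part.
    return ep.load()


if __name__ == "__main__":
    # Stand-in target; a real plugin would point at its own factory function,
    # such as the create_ascend_backend name used in the registration above.
    factory = resolve_factory("demo_kind", "json:dumps")
    print(factory.__module__, factory.__name__)
```

A factory function registered this way is a natural place to fail fast, for example by raising an error if the required hardware runtime is not available.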