sageLLM protocol types and validation (Protocol v0.1)

sagellm-protocol

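Protocol v0.1 type definitions and validation | Unified protocol definitions for the sageLLM inference engine

📋 Quick reference

PyPI package name: isagellm-protocol
Import namespace: sagellm_protocol
Python version: 3.10+
Current version: 0.4.0.8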

Role

sagellm-protocol is the protocol definition layer of the sageLLM system. It provides every type definition that is shared across packages. Its place in the overall architecture:

┌─────────────────────────────────────────────────────┐
│            sageLLM architecture overview            │
├─────────────────────────────────────────────────────┤
│  Gateway                                            │
│  ├─ Routes requests to a suitable Backend           │
│  └─ Depends on: sagellm-protocol, sagellm-comm      │
├─────────────────────────────────────────────────────┤
│  Backend (inference backend)                        │
│  ├─ Runs LLM inference and collects metrics         │
│  └─ Depends on: sagellm-protocol, sagellm-kv-cache  │
├─────────────────────────────────────────────────────┤
│  Core (core library)                                │
│  ├─ Inference engine, scheduler, plugin system      │
│  └─ Depends on: sagellm-protocol                    │
├─────────────────────────────────────────────────────┤
│  ⭐ sagellm-protocol (this repository)              │
│  ├─ Request/Response definitions                    │
│  ├─ Error codes, metrics, device types              │
│  ├─ KV Cache lifecycle types                        │
│  └─ OpenAI-compatible types                         │
└─────────────────────────────────────────────────────┘

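Dependencies

Packages this package depends on

Package	Purpose
`pydantic` (>=2.0.0)	Data validation and serialization

Repositories that depend on this package

Repository	Description
`sagellm-backend`	Inference backend; uses the Request/Response/Metrics types
`sagellm-core`	Core inference engine; uses all protocol types
`sagellm-gateway`	Gateway; uses the OpenAI-compatible types and routing types
`sagellm-kv-cache`	KV cache management; uses KVAllocateParams, KVHandle, and related types
`sagellm-comm`	Distributed communication; uses the error codes and distributed fields
`sagellm` (umbrella)	Top-level package; re-exports the public types from this package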

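Installation

Install from PyPI (recommended)

# Install a pinned stable release
pip install isagellm-protocol==0.4.0.8

# Install the latest release (may be unstable)
pip install isagellm-protocol

Local development install

# Clone the repository
git clone git@github.com:intellistream/sagellm-protocol.git
cd sagellm-protocol

# Create a virtual environment (recommended)
python3.10 -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install development dependencies
pip install -e ".[dev]"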

Quick start

Basic usage

from sagellm_protocol import Request, Response, Metrics, ErrorCode

# Build a request
req = Request(
    request_id="req-001",
    trace_id="trace-001",
    model="llama2-7b",
    prompt="Hello, world!",
    max_tokens=128,
    stream=False,
    temperature=0.7,
)

# Build the metrics that accompany the response
metrics = Metrics(
    ttft_ms=45.2,
    tbt_ms=12.5,
    throughput_tps=80.0,
    peak_mem_mb=24576,
    error_rate=0.0,
)

# Build the response; the IDs match those of the request
resp = Response(
    request_id="req-001",
    trace_id="trace-001",
    output_text="Hi there!",
    output_tokens=[42, 17],
    finish_reason="stop",
    metrics=metrics,
)

Streaming output

from sagellm_protocol import (
    Metrics,
    StreamEventDelta,
    StreamEventEnd,
    StreamEventStart,
)

# Start-of-stream event
start = StreamEventStart(
    request_id="req-002",
    trace_id="trace-002",
    engine_id="engine-001",
    prompt_tokens=10,
)

# Incremental (delta) event
delta = StreamEventDelta(
    request_id="req-002",
    trace_id="trace-002",
    engine_id="engine-001",
    content="Hi",
    content_tokens=[42],
)

# End-of-stream event, which carries the full output and the metrics
metrics = Metrics(
    ttft_ms=40.0,
    tpot_ms=11.0,
    throughput_tps=75.0,
    peak_mem_mb=20480,
    error_rate=0.0,
)
end = StreamEventEnd(
    request_id="req-002",
    trace_id="trace-002",
    engine_id="engine-001",
    content="Hi there",
    output_tokens=[42, 17],
    finish_reason="stop",
    metrics=metrics,
)

Sampling parameters and decoding strategies

from sagellm_protocol import (
    DecodingStrategy,
    SamplingParams,
    SamplingPreset,
    DEFAULT_SAMPLING_PARAMS,
)

# Option 1: use the defaults (greedy, so output is deterministic)
params = DEFAULT_SAMPLING_PARAMS

# Option 2: use a preset
params = SamplingPreset.get_params(SamplingPreset.BALANCED)

# Option 3: custom configuration
params = SamplingParams(
    strategy=DecodingStrategy.SAMPLING,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
)

More complete examples are in examples/basic_usage.py and examples/sampling_usage.py.

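API reference

Core types

Request - inference request

class Request(BaseModel):
    # Required fields
    request_id: str              # Unique request identifier
    trace_id: str                # Trace identifier
    model: str                   # Model name
    prompt: str                  # Input prompt text
    max_tokens: int              # Maximum number of tokens to generate (> 0)
    stream: bool                 # Whether to stream the output

    # Optional fields
    temperature: float | None    # Sampling temperature, range (0, 2]
    top_p: float | None          # Nucleus sampling probability, range (0, 1]
    kv_budget_tokens: int | None # KV Cache budget, in tokens
    metadata: dict | None        # Pass-through metadata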
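
The range constraints noted in the comments above can be illustrated without installing the package. The following is a minimal sketch that uses a standard-library dataclass as a stand-in. `RequestSketch`, `try_build`, and the `None` defaults are hypothetical and only mirror the documented ranges; the real `Request` is a Pydantic model whose validators and error messages may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSketch:
    """Hypothetical stand-in for Request; not the sagellm_protocol class."""

    request_id: str
    trace_id: str
    model: str
    prompt: str
    max_tokens: int
    stream: bool
    temperature: Optional[float] = None  # assumed default; documented range (0, 2]
    top_p: Optional[float] = None        # assumed default; documented range (0, 1]

    def __post_init__(self) -> None:
        # Mirror the documented constraints with plain range checks.
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be > 0")
        if self.temperature is not None and not 0 < self.temperature <= 2:
            raise ValueError("temperature must be in (0, 2]")
        if self.top_p is not None and not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")


def try_build(**overrides) -> str:
    """Return "ok", or the validation message, for a request with overrides."""
    fields = dict(
        request_id="req-001",
        trace_id="trace-001",
        model="llama2-7b",
        prompt="Hello",
        max_tokens=128,
        stream=False,
    )
    fields.update(overrides)
    try:
        RequestSketch(**fields)
    except ValueError as exc:
        return str(exc)
    return "ok"
```

Calling `try_build(max_tokens=0)` or `try_build(temperature=2.5)` returns the corresponding message, while the upper bounds themselves (`temperature=2.0`, `top_p=1.0`) are accepted because both intervals are closed on the right.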

Response - inference response

class Response(BaseModel):
    request_id: str              # ID of the corresponding request
    trace_id: str                # ID of the corresponding trace
    output_text: str             # Generated text
    output_tokens: list[int]     # Generated token IDs
    finish_reason: str           # Why generation finished (stop/length/error)
    metrics: Metrics             # Performance metrics

Metrics - performance metrics

class Metrics(BaseModel):
    # Timing metrics (milliseconds)
    ttft_ms: float               # Time To First Token
    tbt_ms: float = 0.0          # Time Between Tokens
    tpot_ms: float = 0.0         # Time Per Output Token

    # Throughput and memory
    throughput_tps: float        # Throughput (tokens/sec)
    peak_mem_mb: int             # Peak memory (MB)

    # KV Cache metrics
    kv_used_tokens: int = 0      # Tokens in use
    kv_used_bytes: int = 0       # Bytes in use
    prefix_hit_rate: float = 0.0 # Prefix cache hit rate

    # Other metrics
    error_rate: float            # Error rate, range [0, 1]
    spec_accept_rate: float = 0.0 # Speculative decoding acceptance rate
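
To make the timing fields concrete, here is a minimal sketch of how `ttft_ms`, `tpot_ms`, and `throughput_tps` could be derived from raw timestamps. The function name `derive_metrics`, its inputs, and the formulas are assumptions based on the common meaning of these metrics; they are not taken from the package, whose `Timestamps` type may compute them differently.

```python
def derive_metrics(request_ts: float, token_ts: list[float]) -> dict[str, float]:
    """Derive latency figures from timestamps given in seconds.

    request_ts: when the request arrived (hypothetical input).
    token_ts:   emission time of each output token, in order.
    """
    if not token_ts:
        raise ValueError("at least one output token is required")

    n = len(token_ts)
    # Time To First Token: arrival of the request to the first emitted token.
    ttft_ms = (token_ts[0] - request_ts) * 1000.0
    # Time Per Output Token: average gap between tokens after the first one.
    tpot_ms = (token_ts[-1] - token_ts[0]) * 1000.0 / (n - 1) if n > 1 else 0.0
    # Throughput: tokens produced per second of total request time.
    total_s = token_ts[-1] - request_ts
    throughput_tps = n / total_s if total_s > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tpot_ms": tpot_ms, "throughput_tps": throughput_tps}


metrics = derive_metrics(request_ts=0.0, token_ts=[0.5, 0.75, 1.0])
```

With a single output token there is no inter-token gap to average, so the sketch reports `tpot_ms` as 0.0, which matches the field's documented default.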

ErrorCode - protocol error codes

class ErrorCode(str, Enum):
    INVALID_ARGUMENT = "invalid_argument"           # Missing required field or illegal value
    RESOURCE_EXHAUSTED = "resource_exhausted"       # Insufficient KV cache, GPU memory, or concurrency
    UNAVAILABLE = "unavailable"                     # Backend unavailable
    DEADLINE_EXCEEDED = "deadline_exceeded"         # Request timed out
    NOT_IMPLEMENTED = "not_implemented"             # Interface not implemented
    KV_TRANSFER_FAILED = "kv_transfer_failed"       # KV migration failed
    COMM_TIMEOUT = "comm_timeout"                   # Communication timed out
    DISTRIBUTED_RUNTIME_ERROR = "distributed_runtime_error"  # Distributed runtime error
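
Because `ErrorCode` mixes in `str`, its members compare equal to their wire-format strings, and a received string can be parsed back with `ErrorCode(value)`. The sketch below shows one way a caller might use that to decide on retries. The enum is a local copy of a few documented values so the snippet runs standalone; the `RETRYABLE` set and `should_retry` are hypothetical and do not describe a policy defined by the package.

```python
from enum import Enum


class ErrorCode(str, Enum):
    """Local copy of a few documented values so the sketch runs standalone."""

    INVALID_ARGUMENT = "invalid_argument"
    RESOURCE_EXHAUSTED = "resource_exhausted"
    UNAVAILABLE = "unavailable"
    DEADLINE_EXCEEDED = "deadline_exceeded"
    NOT_IMPLEMENTED = "not_implemented"


# Hypothetical policy: transient failures that a caller might retry.
RETRYABLE = {
    ErrorCode.RESOURCE_EXHAUSTED,
    ErrorCode.UNAVAILABLE,
    ErrorCode.DEADLINE_EXCEEDED,
}


def should_retry(code: str) -> bool:
    """Accept a wire-format string and report whether a retry may help."""
    try:
        parsed = ErrorCode(code)  # lookup by value
    except ValueError:
        return False  # unknown codes are treated as non-retryable
    return parsed in RETRYABLE
```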

StreamEvent - streaming events

class StreamEventStart(BaseModel):
    event: Literal["start"] = "start"
    request_id: str                    # Request identifier
    trace_id: str                      # Trace identifier
    engine_id: str                     # Engine instance identifier
    prompt_tokens: int | None = None   # Number of prompt tokens (optional)

class StreamEventDelta(BaseModel):
    event: Literal["delta"] = "delta"
    request_id: str                    # Request identifier
    trace_id: str                      # Trace identifier
    engine_id: str                     # Engine instance identifier
    content: str                       # Incremental text
    content_tokens: list[int]          # Incremental token IDs

class StreamEventEnd(BaseModel):
    event: Literal["end"] = "end"
    request_id: str                    # Request identifier
    trace_id: str                      # Trace identifier
    engine_id: str                     # Engine instance identifier
    content: str                       # Complete generated text
    output_tokens: list[int]           # Complete generated token IDs
    finish_reason: str                 # Why generation finished (stop/length/error)
    metrics: Metrics                   # Performance metrics
    error: Error | None = None         # Error object, if any
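
The `event` field acts as a discriminator, so a consumer can dispatch on it while reading a stream. The following sketch shows one possible consumer loop. `Delta`, `End`, and `consume` are hypothetical dataclass stand-ins that carry only the fields used here; treating the end event as authoritative is an assumption drawn from its "complete generated text" comment.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    """Stand-in for StreamEventDelta with only the fields used here."""

    content: str
    content_tokens: list[int]
    event: str = "delta"


@dataclass
class End:
    """Stand-in for StreamEventEnd with only the fields used here."""

    content: str
    output_tokens: list[int]
    finish_reason: str
    event: str = "end"


def consume(events):
    """Accumulate deltas and return (text, tokens, finish_reason)."""
    parts: list[str] = []
    tokens: list[int] = []
    for ev in events:
        if ev.event == "delta":
            parts.append(ev.content)
            tokens.extend(ev.content_tokens)
        elif ev.event == "end":
            # The end event carries the complete text and token IDs, so it
            # is treated as authoritative over the accumulated deltas.
            return ev.content, ev.output_tokens, ev.finish_reason
        # Any other event kind (for example "start") is ignored.
    # The stream stopped without an end event: return what was received.
    return "".join(parts), tokens, None
```

If the stream is cut off before the end event arrives, the sketch falls back to the concatenated deltas and reports `None` as the finish reason, which lets the caller distinguish a truncated stream from a completed one.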

DecodingStrategy - decoding strategies

class DecodingStrategy(str, Enum):
    GREEDY = "greedy"           # Greedy decoding (deterministic)
    SAMPLING = "sampling"       # Temperature sampling (diverse)
    BEAM_SEARCH = "beam_search" # Beam search
    CONTRASTIVE = "contrastive" # Contrastive search
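
For readers unfamiliar with the first two strategies, the sketch below contrasts greedy decoding with temperature sampling on a toy logit vector, using the `temperature`, `top_k`, and `top_p` parameters shown earlier. It illustrates the generic techniques only: all function names are hypothetical, the filtering order (top-k, then top-p) is an assumption, and nothing here reflects how a sageLLM backend implements decoding.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token id."""
    return max(range(len(logits)), key=logits.__getitem__)


def candidates(logits, temperature, top_k, top_p):
    """Return the token ids kept after top-k, then top-p, filtering."""
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    kept, mass = [], 0.0
    for tok in ranked:
        kept.append(tok)
        mass += probs[tok]
        if mass >= top_p:  # smallest prefix whose probability mass reaches top_p
            break
    return kept, probs


def sample(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Temperature sampling restricted to the filtered candidates."""
    kept, probs = candidates(logits, temperature, top_k, top_p)
    weights = [probs[t] for t in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.1, -1.0]
greedy_choice = greedy(toy_logits)    # same result on every call
sampled_choice = sample(toy_logits)   # may vary between calls
```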

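Other core types

Timestamps - observation timestamps used to compute inference metrics
StreamEvent - base class for streaming events (StreamEventStart/Delta/End)
KVAllocateParams - KV Cache allocation parameters
KVHandle - KV Cache handle and lifecycle management
CapabilityDescriptor - Backend capability description (DType, KernelKind, device)
ChatCompletionRequest/Response - OpenAI-compatible types

See the module sources under src/sagellm_protocol/ for details.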

Development guide

Environment setup

# Use Python 3.10+ (3.10, 3.11, or 3.12 recommended)
python --version

# Create a virtual environment
python3.10 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -e ".[dev]"

Running tests

# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_types.py -v

# Run with coverage
pytest --cov=sagellm_protocol --cov-report=term-missing

# Generate an HTML coverage report
pytest --cov=sagellm_protocol --cov-report=html
# View the report: open htmlcov/index.html

Linting and formatting

# Format the code
ruff format .

# Lint the code (import order and other rules) and apply automatic fixes
ruff check . --fix

# Type-check
mypy src/sagellm_protocol

# Run all checks
ruff format . && ruff check . --fix && mypy src/sagellm_protocol

Test coverage

This repository maintains 100% test coverage:

7 test files
62+ test cases
1100+ lines of test code

See docs/TESTING.md for details.

Contribution workflow

Create an issue - describe the bug or feature request

gh issue create --title "[Bug] description" --label "sagellm-protocol"

Create a branch - branch off main-dev for the fix

git checkout -b fix/#123-description origin/main-dev

Develop and test

# Write code and tests, then run the checks
pytest tests/ -v
ruff format . && ruff check . --fix
mypy src/sagellm_protocol

Open a PR - target main-dev

git commit -m "fix: description (#123)"
gh pr create --base main-dev --title "Fix: description" --body "Closes #123"

Merge - merge into main-dev once the PR is approved

Commit conventions

Use the Conventional Commits format:

fix: bug fix
feat: new feature
docs: documentation update
test: test cases
refactor: code refactoring
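
A commit subject can be checked against the prefixes above with a small script, for example from a local Git hook. This is a minimal sketch under stated assumptions: `check_commit`, `ALLOWED_TYPES`, and the regular expression are hypothetical, the repository is not known to ship such a check, and the Conventional Commits specification permits more types than the five listed here.

```python
import re

# Types listed in this README; the Conventional Commits spec allows others.
ALLOWED_TYPES = ("fix", "feat", "docs", "test", "refactor")

# type, optional "(scope)", optional "!", then ": " and a non-empty subject.
PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def check_commit(message: str) -> bool:
    """Check the first line of a commit message against the listed types."""
    first_line = message.splitlines()[0] if message else ""
    match = PATTERN.match(first_line)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

For instance, `check_commit("fix: handle empty prompt (#123)")` passes, while a capitalized prefix such as `"Fix: handle empty prompt"` or a missing space after the colon is rejected.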

Releases

Version number format: MAJOR.MINOR.PATCH.BUILD (e.g., 0.4.0.8)

Keep versions in sync with sagellm-backend, sagellm-core, and sagellm-comm
Update the version entry in CHANGELOG.md
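
Four-part version strings compare correctly once parsed into integer tuples, whereas plain string comparison would rank "0.4.0.10" below "0.4.0.8". The sketch below shows this. `parse_version` and `same_release_line` are hypothetical helpers; padding a three-part version such as 0.3.0 with a zero BUILD and comparing only MAJOR.MINOR for the sync check are both assumptions, since the exact sync rule is not stated here.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse MAJOR.MINOR.PATCH.BUILD; a missing BUILD is assumed to be 0."""
    parts = [int(p) for p in text.split(".")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"expected 3 or 4 numeric parts, got {text!r}")
    while len(parts) < 4:
        parts.append(0)
    return tuple(parts)


def same_release_line(a: str, b: str) -> bool:
    """True when two packages share MAJOR.MINOR (hypothetical sync rule)."""
    return parse_version(a)[:2] == parse_version(b)[:2]
```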

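Documentation and examples

Example files

examples/basic_usage.py - basic usage examples (Request/Response/Metrics/Errors)
examples/sampling_usage.py - sampling parameter and decoding strategy examples

Documentation files

docs/TESTING.md - testing guide
docs/SAMPLING_PARAMS.md - detailed description of the sampling parameters
docs/kv_cache_protocol_fields.md - KV Cache field reference
CHANGELOG.md - release changelog

External documentation

Protocol v0.1 - protocol specification
Architecture design document - overall architecture
Package dependencies - dependency diagram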

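Version information

Current version: 0.4.0.8
Release date: 2026-01-30
Python support: 3.10, 3.11, 3.12
Pydantic: >= 2.0.0

Version history

See CHANGELOG.md for details.

Major versions:

0.4.0 (2026-01-30) - Ascend backend support; version alignment with the other core packages
0.3.0 (2026-01-27) - alignment with sageLLM 0.3
0.1.0 (2026-01-20) - Protocol v0.1 base definitions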

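FAQ

Q: How do I use the protocol types in a Backend?

A: Import them and use them directly:

from sagellm_protocol import Request, Response, Metrics

def process_request(req: Request) -> Response:
    # Use the Request and Response types
    ...

Q: Can I define my own request/response types?

A: No. Every globally shared type must be defined in sagellm-protocol and imported from there by the other packages. This ensures that all packages in the system use the same type definitions.

Q: How do I add a new error code?

A: Add it to the ErrorCode enum in src/sagellm_protocol/errors.py, then open a PR.

Q: What units do the metric fields use?

A: The units are:

Time: milliseconds (ms)
Throughput: tokens/sec (tps)
Memory: megabytes (MB)

See the docstring of the Metrics class for details.

Q: How do I validate Request/Response objects?

A: Pydantic v2 validates them automatically when they are constructed. Example:

from sagellm_protocol import Request

try:
    req = Request(
        request_id="req-001",
        trace_id="trace-001",
        model="llama2-7b",
        prompt="Hello",
        max_tokens=128,  # must be > 0
        stream=False,
    )
except ValueError as e:  # pydantic.ValidationError is a subclass of ValueError
    print(f"Validation failed: {e}")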

License

Proprietary - IntelliStream

Need help? See docs/TESTING.md or open an issue.
