
anyServe - Capability-Oriented Serving Runtime for LLM inference


anyserve

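A serving runtime for large-scale LLM inference.

Project Status

POC stage - the core skeleton is implemented; MVP features are under development.

Core Features

  • Capability-driven: requests are routed by arbitrary key-value pairs rather than a fixed model name
  • Dynamic Worker start/stop: Workers are managed dynamically according to load, so resources can be reused flexibly
  • Control-flow/data-flow separation: control flow uses the KServe protocol; data flow goes through the Object System
  • C++ Dispatcher + Python Worker: a high-performance control plane plus a flexible execution plane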
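
To make the capability-driven routing idea in the feature list concrete, here is a minimal sketch. It assumes a request matches a replica when every requested key-value pair appears in one of the capabilities the replica advertises; the README does not spell out the real matching rule, and `capability_matches`, `route`, and the `REPLICAS` table are hypothetical names used only for illustration.

```python
from typing import Dict, List, Mapping

Capability = Mapping[str, str]


def capability_matches(requested: Capability, offered: Capability) -> bool:
    """Return True if every requested key-value pair is present in `offered`.

    Assumption: subset matching. The actual anyserve rule may differ.
    """
    return all(offered.get(key) == value for key, value in requested.items())


def route(requested: Capability, replicas: Mapping[str, List[Capability]]) -> List[str]:
    """Return the names of replicas that advertise a matching capability."""
    return [
        name
        for name, capabilities in replicas.items()
        if any(capability_matches(requested, cap) for cap in capabilities)
    ]


# Hypothetical replica table: replica name -> advertised capabilities.
REPLICAS: Dict[str, List[Capability]] = {
    "replica-a": [{"type": "echo"}],
    "replica-b": [{"type": "chat", "model": "demo-7b"}],
    "replica-c": [{"type": "chat", "model": "demo-7b"}, {"type": "echo"}],
}

echo_targets = route({"type": "echo"}, REPLICAS)
chat_targets = route({"type": "chat", "model": "demo-7b"}, REPLICAS)
```

Because the request carries key-value pairs instead of a model name, the same table can route on any attribute (type, model, hardware, and so on) without changing the routing code.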

Architecture Overview

┌──────────────────────────────────────────┐
│      API Server (separate project)       │
│      Routes requests by Capability       │
└──────────────────────┬───────────────────┘
                       │
          ┌────────────┼────────────┐
          ↓            ↓            ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Replica A   │ │  Replica B   │ │  Replica C   │
│  (anyserve)  │ │  (anyserve)  │ │  (anyserve)  │
│              │ │              │ │              │
│ Dispatcher   │ │ Dispatcher   │ │ Dispatcher   │
│     ↓        │ │     ↓        │ │     ↓        │
│  Workers     │ │  Workers     │ │  Workers     │
└──────────────┘ └──────────────┘ └──────────────┘

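For the detailed design, see the files under docs/ (listed in the Documentation section).

Quick Start

Requirements

  • Python 3.11+
  • A C++ compiler with C++17 support
  • CMake 3.20+
  • Conan 2.0+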
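
Before building, it can help to confirm that the toolchain is present. The following is a small sketch, not part of anyserve: `check_environment` is a hypothetical helper that verifies the running Python version and only checks that `cmake`, `conan`, and `just` are on the PATH. It does not verify the CMake or Conan versions.

```python
import shutil
import sys
from typing import Dict, Sequence, Tuple


def check_environment(
    min_python: Tuple[int, int] = (3, 11),
    tools: Sequence[str] = ("cmake", "conan", "just"),
) -> Dict[str, bool]:
    """Report whether each prerequisite appears to be available."""
    # Compare (major, minor) of the running interpreter against the minimum.
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```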

Installation

# Install dependencies and build
just setup
just build

# Install the Python package
pip install -e python/

Running the Example

# Start the server
anyserve examples.basic.app:app --port 8000 --workers 1

# Test
python examples/basic/run_example.py

Defining a Capability Handler

from anyserve import AnyServe, ModelInferRequest, ModelInferResponse

app = AnyServe()

@app.capability(type="echo")
def echo_handler(request: ModelInferRequest) -> ModelInferResponse:
    response = ModelInferResponse(
        model_name=request.model_name,
        id=request.id
    )
    for inp in request.inputs:
        out = response.add_output(
            name=f"output_{inp.name}",
            datatype=inp.datatype,
            shape=inp.shape
        )
        out.contents = inp.contents
    return response
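
The decorator pattern in the handler above can be illustrated with a self-contained sketch. `MiniApp` below is a hypothetical stand-in and not the real `AnyServe` class: it assumes the decorator stores each handler under its keyword arguments and that dispatch looks for an exact match, which may not be how anyserve works internally.

```python
from typing import Any, Callable, Dict, FrozenSet, Tuple

CapabilityKey = FrozenSet[Tuple[str, str]]


class MiniApp:
    """Hypothetical registry mimicking the `@app.capability(...)` pattern."""

    def __init__(self) -> None:
        self._handlers: Dict[CapabilityKey, Callable[[Any], Any]] = {}

    def capability(self, **attrs: str) -> Callable:
        # The keyword arguments form the capability; freeze them into a dict key.
        key = frozenset(attrs.items())

        def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self._handlers[key] = func
            return func  # Return the function unchanged so it stays callable.

        return decorator

    def dispatch(self, capability: Dict[str, str], payload: Any) -> Any:
        # Assumption: exact-match lookup on the full set of key-value pairs.
        handler = self._handlers.get(frozenset(capability.items()))
        if handler is None:
            raise LookupError(f"no handler for capability {capability!r}")
        return handler(payload)


mini_app = MiniApp()


@mini_app.capability(type="echo")
def echo(payload: Any) -> Any:
    """Return the payload unchanged, like the echo handler above."""
    return payload


echoed = mini_app.dispatch({"type": "echo"}, ["hello"])
```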

Client Connection Modes

The Client supports two connection modes:

from anyserve.worker.client import Client

# Direct mode - connect directly to a specific Worker
client = Client(endpoint="localhost:50051")

# Discovery mode - discover a Worker automatically through the API Server
client = Client(
    api_server="http://localhost:8080",
    capability={"type": "echo"}
)

result = client.infer("echo", {"text": ["hello"]})
client.close()

See the examples/multi_server/ example for details.
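
The difference between the two modes comes down to how the target endpoint is chosen. The sketch below shows one way that choice could be made; `resolve_endpoint` and `fake_discover` are hypothetical, and the real Client presumably queries the API Server over the network rather than calling a local function.

```python
from typing import Callable, Mapping, Optional, Sequence

Discover = Callable[[Mapping[str, str]], Sequence[str]]


def resolve_endpoint(
    endpoint: Optional[str] = None,
    capability: Optional[Mapping[str, str]] = None,
    discover: Optional[Discover] = None,
) -> str:
    """Pick a Worker endpoint for Direct or Discovery mode."""
    if endpoint is not None:
        # Direct mode: the caller already knows which Worker to use.
        return endpoint
    if capability is None or discover is None:
        raise ValueError("provide either an endpoint or a capability plus a discovery function")
    # Discovery mode: ask the lookup service for Workers offering the capability.
    candidates = discover(capability)
    if not candidates:
        raise LookupError(f"no Worker offers capability {dict(capability)!r}")
    return candidates[0]  # Assumption: take the first candidate, no load balancing.


def fake_discover(capability: Mapping[str, str]) -> Sequence[str]:
    """Stand-in for an API Server lookup, backed by a static table."""
    table = {"echo": ["worker-1:50051", "worker-2:50051"]}
    return table.get(capability.get("type", ""), [])


direct = resolve_endpoint(endpoint="localhost:50051")
discovered = resolve_endpoint(capability={"type": "echo"}, discover=fake_discover)
```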

Calls Between Workers (context.call)

A Worker can call other services through context.call() to build a processing pipeline:

# Context, tokenize, and build_response are not shown in this snippet;
# import or define them in your application.
@app.capability(type="tokenize")
def handler(request: ModelInferRequest, context: Context) -> ModelInferResponse:
    # Process the input
    text = request.get_input("text").bytes_contents[0].decode()
    tokens = tokenize(text)

    # Call another service
    result = context.call(
        model_name="analyze",
        capability={"type": "analyze"},  # routed through the API Server
        inputs={"tokens": [",".join(tokens)]}
    )

    return build_response(result)

See the examples/pipeline/ example for details.
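
The pipeline idea can be tried without a running server. The sketch below replaces the real `Context` with a hypothetical `LocalContext` that resolves `call()` through an in-process table keyed by the capability's `type`; in anyserve the call is routed through the API Server, and the request/response types are KServe messages rather than plain dicts.

```python
from typing import Callable, Dict, List, Mapping

Inputs = Dict[str, List[str]]
Handler = Callable[[Inputs, "LocalContext"], Inputs]


class LocalContext:
    """Hypothetical in-process stand-in for the context passed to handlers."""

    def __init__(self, handlers: Mapping[str, Handler]) -> None:
        self._handlers = handlers

    def call(self, capability: Mapping[str, str], inputs: Inputs) -> Inputs:
        # Look up the downstream handler by capability type and invoke it.
        handler = self._handlers[capability["type"]]
        return handler(inputs, self)


def tokenize_handler(inputs: Inputs, context: LocalContext) -> Inputs:
    # Stage 1: split the text on whitespace, then hand the tokens downstream.
    tokens = inputs["text"][0].split()
    return context.call(
        capability={"type": "analyze"},
        inputs={"tokens": [",".join(tokens)]},
    )


def analyze_handler(inputs: Inputs, context: LocalContext) -> Inputs:
    # Stage 2: count the comma-separated tokens produced by stage 1.
    tokens = inputs["tokens"][0].split(",")
    return {"count": [str(len(tokens))]}


context = LocalContext({"tokenize": tokenize_handler, "analyze": analyze_handler})
pipeline_result = context.call({"type": "tokenize"}, {"text": ["a b c"]})
```

Each stage only names the capability it needs, so a downstream stage can be moved to another Worker without changing the caller.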

Project Structure

anyserve/
├── cpp/                    # C++ Dispatcher
│   └── server/             # Core components
├── python/anyserve/        # Python Worker
│   ├── cli.py              # CLI entry point
│   ├── kserve.py           # KServe protocol
│   └── worker/             # Worker implementation
├── proto/                  # Protocol definitions
├── examples/               # Examples
└── docs/                   # Documentation
    ├── architecture.md     # Architecture design
    ├── runtime.md          # Runtime implementation
    └── mvp.md              # MVP plan

Development

just setup    # Install dependencies
just build    # Build
just clean    # Clean

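Documentation

Document          Contents
architecture.md   Architecture design, core concepts, design principles
runtime.md        Implementation details, code structure, protocols
mvp.md            MVP goals, current status, development plan
agents.md         Guide for collaborating with AI assistants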

License

[TBD]
