
anyServe - Capability-Oriented Serving Runtime for LLM inference


anyserve

A serving runtime for large-scale LLM inference.

Project Status

POC stage: the core skeleton is implemented, and MVP features are under development.

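Core Features

  • Capability-driven: requests are routed on arbitrary key-value pairs rather than a fixed model name
  • Dynamic Worker start/stop: Workers are started and stopped according to load, so resources can be reused flexibly
  • Control-flow/data-flow separation: control flow uses the KServe protocol, data flow goes through the Object System
  • C++ Dispatcher + Python Worker: a high-performance control plane plus a flexible execution plane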
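
To make the Capability-driven routing idea concrete, here is a minimal sketch of key-value matching. It is not anyserve's implementation: the subset-match rule, the function names `matches` and `route`, and the worker table are all assumptions made for illustration.

```python
from typing import Mapping


def matches(required: Mapping[str, str], offered: Mapping[str, str]) -> bool:
    """True when every key-value pair the request requires is offered."""
    return all(offered.get(key) == value for key, value in required.items())


def route(required: Mapping[str, str], workers: Mapping[str, Mapping[str, str]]) -> list:
    """Return the names of all workers whose capabilities satisfy the request."""
    return [name for name, offered in workers.items() if matches(required, offered)]


# Hypothetical worker table: names and capability keys are made up.
WORKERS = {
    "worker-a": {"type": "chat", "model": "demo-7b"},
    "worker-b": {"type": "embed", "model": "demo-embed"},
    "worker-c": {"type": "chat", "model": "demo-7b", "lora": "support"},
}

print(route({"type": "chat"}, WORKERS))
print(route({"type": "chat", "lora": "support"}, WORKERS))
```

Because the request carries arbitrary keys, a caller can be as specific as it needs (for example, adding `lora`) without the router knowing those keys in advance.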

Architecture Overview

┌──────────────────────────────────────────┐
│      API Server (separate project)       │
│      Routes requests by Capability       │
└──────────────────────┬───────────────────┘
                       │
          ┌────────────┼────────────┐
          ↓            ↓            ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Replica A   │ │  Replica B   │ │  Replica C   │
│  (anyserve)  │ │  (anyserve)  │ │  (anyserve)  │
│              │ │              │ │              │
│ Dispatcher   │ │ Dispatcher   │ │ Dispatcher   │
│     ↓        │ │     ↓        │ │     ↓        │
│  Workers     │ │  Workers     │ │  Workers     │
└──────────────┘ └──────────────┘ └──────────────┘
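
The two levels in the diagram (API Server picks a Replica, the Replica's Dispatcher picks a Worker) can be sketched as a toy model. Everything here is an assumption for illustration: the class names `Replica` and `ApiServer`, the round-robin choice between Replicas, and the "start a Worker on first use" rule are not taken from anyserve's code.

```python
from dataclasses import dataclass, field
from itertools import count


def _key(capability: dict) -> frozenset:
    """Hashable form of a capability dict."""
    return frozenset(capability.items())


@dataclass
class Replica:
    name: str
    capabilities: list                            # capabilities this replica can serve
    workers: dict = field(default_factory=dict)   # capability key -> worker id
    _ids: count = field(default_factory=count)

    def can_serve(self, capability: dict) -> bool:
        return capability in self.capabilities

    def dispatch(self, capability: dict) -> str:
        """Dispatcher step: reuse a running worker or start one on demand."""
        key = _key(capability)
        if key not in self.workers:
            self.workers[key] = f"{self.name}/worker-{next(self._ids)}"
        return self.workers[key]


class ApiServer:
    def __init__(self, replicas: list):
        self.replicas = replicas
        self._turn = 0

    def route(self, capability: dict) -> str:
        """Pick a replica that serves the capability, round-robin."""
        candidates = [r for r in self.replicas if r.can_serve(capability)]
        if not candidates:
            raise LookupError(f"no replica serves {capability}")
        replica = candidates[self._turn % len(candidates)]
        self._turn += 1
        return replica.dispatch(capability)


server = ApiServer([
    Replica("A", [{"type": "echo"}]),
    Replica("B", [{"type": "echo"}, {"type": "tokenize"}]),
])
print(server.route({"type": "echo"}))
print(server.route({"type": "echo"}))
print(server.route({"type": "tokenize"}))
```

A real Dispatcher would also stop idle Workers and account for load; the sketch only shows where those decisions would sit.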

For the detailed design, see the documents under docs/.

Quick Start

Requirements

  • Python 3.11+
  • C++ compiler (with C++17 support)
  • CMake 3.20+
  • Conan 2.0+
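
Before building, it can help to check the requirements above in one go. The script below is a small sketch, not part of anyserve: the function name `missing_prerequisites` is made up, `just` is included only because the install commands use it, and the check covers presence on PATH rather than the CMake/Conan versions or the C++ compiler.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 11)
REQUIRED_TOOLS = ("cmake", "conan", "just")


def missing_prerequisites(python_version=None, which=shutil.which) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    version = tuple(python_version or sys.version_info[:2])
    if version < REQUIRED_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.11+ required")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `python_version` and `which` parameters exist only so the check can be exercised without changing the machine it runs on.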

Installation

# Install dependencies and build
just setup
just build

# Install the Python package
pip install -e python/

Running the Example

# Start the server
anyserve examples.basic.app:app --port 8000 --workers 1

# Test it
python examples/basic/run_example.py

Defining a Capability Handler

from anyserve import AnyServe, ModelInferRequest, ModelInferResponse

app = AnyServe()

@app.capability(type="echo")
def echo_handler(request: ModelInferRequest) -> ModelInferResponse:
    response = ModelInferResponse(
        model_name=request.model_name,
        id=request.id
    )
    for inp in request.inputs:
        out = response.add_output(
            name=f"output_{inp.name}",
            datatype=inp.datatype,
            shape=inp.shape
        )
        out.contents = inp.contents
    return response
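
For readers wondering how a decorator such as `@app.capability(type="echo")` can tie a handler to a set of key-value pairs, here is a toy registry. `MiniApp` is a hypothetical stand-in written for this explanation, not the real `AnyServe` class, and it uses exact matching of the key-value set, which may differ from what anyserve does.

```python
from typing import Callable


class MiniApp:
    """Toy stand-in for an app object: maps capability key-values to handlers."""

    def __init__(self) -> None:
        self._handlers = {}

    def capability(self, **attrs: str) -> Callable:
        """Decorator factory: register the function under the given attributes."""
        key = frozenset(attrs.items())

        def register(func: Callable) -> Callable:
            if key in self._handlers:
                raise ValueError(f"handler already registered for {dict(key)}")
            self._handlers[key] = func
            return func  # leave the function itself unchanged

        return register

    def dispatch(self, capability: dict, request):
        """Look up the handler registered for this exact capability and call it."""
        handler = self._handlers.get(frozenset(capability.items()))
        if handler is None:
            raise LookupError(f"no handler for {capability}")
        return handler(request)


app = MiniApp()


@app.capability(type="echo")
def echo(request):
    return request


@app.capability(type="upper", lang="en")
def upper(request):
    return request.upper()


print(app.dispatch({"type": "echo"}, "hello"))
print(app.dispatch({"lang": "en", "type": "upper"}, "hello"))
```

Using a `frozenset` of items as the key makes the lookup independent of the order in which the capability keys are written.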

Client Connection Modes

The Client supports two connection modes:

from anyserve.worker.client import Client

# Direct mode: connect straight to a specific Worker
client = Client(endpoint="localhost:50051")

# Discovery mode: find a Worker automatically through the API Server
client = Client(
    api_server="http://localhost:8080",
    capability={"type": "echo"}
)

result = client.infer("echo", {"text": ["hello"]})
client.close()

See the examples/multi_server/ example for details.
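
The difference between the two modes comes down to how the target address is obtained. The sketch below illustrates that with a hypothetical `MiniClient`; the `discover` hook, the `resolve` method, and the in-memory registry are assumptions for illustration and do not describe the real `Client` internals.

```python
from typing import Callable, Optional


class MiniClient:
    """Toy client that resolves its target either directly or via discovery."""

    def __init__(
        self,
        endpoint: Optional[str] = None,
        api_server: Optional[str] = None,
        capability: Optional[dict] = None,
        discover: Optional[Callable[[str, dict], str]] = None,
    ) -> None:
        if (endpoint is None) == (api_server is None):
            raise ValueError("pass exactly one of endpoint or api_server")
        if api_server is not None and not capability:
            raise ValueError("discovery mode needs a capability")
        self.endpoint = endpoint
        self.api_server = api_server
        self.capability = capability
        self._discover = discover

    def resolve(self) -> str:
        """Direct mode returns the endpoint; discovery mode asks the lookup hook."""
        if self.endpoint is not None:
            return self.endpoint
        if self._discover is None:
            raise RuntimeError("no discovery function configured")
        return self._discover(self.api_server, self.capability)


# Stand-in for an API Server lookup: an in-memory table instead of a network call.
REGISTRY = {"echo": "localhost:50051", "tokenize": "localhost:50052"}


def fake_discover(api_server: str, capability: dict) -> str:
    return REGISTRY[capability["type"]]


direct = MiniClient(endpoint="localhost:50051")
discovered = MiniClient(
    api_server="http://localhost:8080",
    capability={"type": "tokenize"},
    discover=fake_discover,
)
print(direct.resolve())
print(discovered.resolve())
```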

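Calls Between Workers (context.call)

A Worker can call other services through context.call() to build a processing pipeline:

# Context, tokenize() and build_response() are assumed to be imported
# or defined elsewhere; they are not shown in this snippet.
@app.capability(type="tokenize")
def handler(request: ModelInferRequest, context: Context) -> ModelInferResponse:
    # Process the input
    text = request.get_input("text").bytes_contents[0].decode()
    tokens = tokenize(text)

    # Call another service
    result = context.call(
        model_name="analyze",
        capability={"type": "analyze"},  # routed through the API Server
        inputs={"tokens": [",".join(tokens)]}
    )

    return build_response(result)

See the examples/pipeline/ example for details.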
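
To see the pipeline shape without running any servers, the same tokenize-then-analyze flow can be mimicked in a single process. `LocalContext` and both handlers below are hypothetical: the stub resolves `call()` against local functions, whereas the real `context.call()` is described above as routing through the API Server.

```python
class LocalContext:
    """Toy context that resolves call() against in-process handlers."""

    def __init__(self, handlers: dict) -> None:
        self._handlers = handlers

    def call(self, model_name: str, capability: dict, inputs: dict) -> dict:
        # Select the handler by capability type; model_name is kept only
        # to mirror the call signature used in the README snippet.
        handler = self._handlers[capability["type"]]
        return handler(inputs, self)


def tokenize_handler(inputs: dict, context: LocalContext) -> dict:
    """First stage: split the text, then hand the tokens to the next stage."""
    tokens = inputs["text"][0].split()
    return context.call(
        model_name="analyze",
        capability={"type": "analyze"},
        inputs={"tokens": [",".join(tokens)]},
    )


def analyze_handler(inputs: dict, context: LocalContext) -> dict:
    """Second stage: report simple statistics about the tokens."""
    tokens = inputs["tokens"][0].split(",")
    return {"count": [len(tokens)], "longest": [max(tokens, key=len)]}


context = LocalContext({"tokenize": tokenize_handler, "analyze": analyze_handler})
result = context.call(
    model_name="tokenize",
    capability={"type": "tokenize"},
    inputs={"text": ["capability oriented serving"]},
)
print(result)  # {'count': [3], 'longest': ['capability']}
```

Each stage only names the capability it needs next, so stages can be moved to other Workers without changing the handler code.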

Project Structure

anyserve/
├── cpp/                    # C++ Dispatcher
│   └── server/             # Core components
├── python/anyserve/        # Python Worker
│   ├── cli.py              # CLI entry point
│   ├── kserve.py           # KServe protocol
│   └── worker/             # Worker implementation
├── proto/                  # Protocol definitions
├── examples/               # Examples
└── docs/                   # Documentation
    ├── architecture.md     # Architecture design
    ├── runtime.md          # Runtime implementation
    └── mvp.md              # MVP plan

Development

just setup    # Install dependencies
just build    # Build
just clean    # Clean build artifacts

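Documentation

Document          Contents
architecture.md   Architecture design, core concepts, design principles
runtime.md        Implementation details, code structure, protocols
mvp.md            MVP goals, current status, development plan
agents.md         Collaboration guide for AI assistants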

License

[TBD]


Built Distributions


anyserve-0.1.2-cp313-cp313-macosx_14_0_arm64.whl (11.9 MB): CPython 3.13, macOS 14.0+ ARM64
anyserve-0.1.2-cp312-cp312-macosx_14_0_arm64.whl (11.9 MB): CPython 3.12, macOS 14.0+ ARM64
anyserve-0.1.2-cp311-cp311-macosx_14_0_arm64.whl (11.9 MB): CPython 3.11, macOS 14.0+ ARM64
