anyServe - Capability-Oriented Serving Runtime for LLM inference
anyserve
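A serving runtime for large-scale LLM inference.
Project status
POC stage: the core skeleton is implemented, and MVP features are under development.
Core features
- Capability-driven: requests are routed on arbitrary key-value pairs rather than a fixed model name
- Dynamic Worker start/stop: Workers are managed dynamically according to load, so resources can be reused flexibly
- Separated control flow and data flow: control flow uses the KServe protocol, data flow uses the Object System
- C++ Dispatcher + Python Worker: a high-performance control plane with a flexible execution plane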
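The capability-driven routing in the list above can be illustrated with a small sketch. It assumes that a request matches a replica when every requested key-value pair appears in one of the capabilities the replica advertises. The source does not specify the actual matching rule, so `matches`, `route`, and the `replicas` registry are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional

Capability = Dict[str, str]


def matches(offered: Capability, requested: Capability) -> bool:
    """Return True if every requested key-value pair is present in the offer."""
    return all(offered.get(key) == value for key, value in requested.items())


def route(requested: Capability, replicas: Dict[str, List[Capability]]) -> Optional[str]:
    """Return the name of the first replica that advertises a matching capability."""
    for name, offered_list in replicas.items():
        if any(matches(offered, requested) for offered in offered_list):
            return name
    return None


# Hypothetical registry: replica name -> capabilities it advertises.
replicas = {
    "replica-a": [{"type": "echo"}],
    "replica-b": [{"type": "chat", "model": "demo-7b", "precision": "fp16"}],
}

print(route({"type": "chat", "precision": "fp16"}, replicas))  # replica-b
print(route({"type": "embed"}, replicas))                      # None
```

Because the key is a set of key-value pairs rather than a model name, a caller can ask for any combination of attributes without the router knowing the attribute names in advance.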
Architecture overview
   ┌──────────────────────────────────────────┐
   │      API Server (separate project)       │
   │      Routes requests by Capability       │
   └─────────────────────┬────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        ↓                ↓                ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Replica A   │ │  Replica B   │ │  Replica C   │
│  (anyserve)  │ │  (anyserve)  │ │  (anyserve)  │
│              │ │              │ │              │
│  Dispatcher  │ │  Dispatcher  │ │  Dispatcher  │
│      ↓       │ │      ↓       │ │      ↓       │
│   Workers    │ │   Workers    │ │   Workers    │
└──────────────┘ └──────────────┘ └──────────────┘
For the detailed design, see the documents under docs/.
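Within each replica, the Dispatcher sits in front of a set of Workers that are started and stopped according to load. The sketch below shows one possible sizing policy, assuming each Worker can absorb a fixed number of pending requests. The real policy is not described in this README; `WorkerPool`, `requests_per_worker`, and the string worker names are hypothetical, and a real implementation would start and stop processes instead of editing a list.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkerPool:
    """Toy stand-in for the Workers managed by one Dispatcher."""

    min_workers: int = 0
    max_workers: int = 4
    requests_per_worker: int = 8  # assumed capacity of a single Worker
    workers: List[str] = field(default_factory=list)
    _next_id: int = 0

    def target_size(self, pending_requests: int) -> int:
        """Workers needed for the pending load, clamped to [min, max]."""
        wanted = math.ceil(pending_requests / self.requests_per_worker)
        return max(self.min_workers, min(self.max_workers, wanted))

    def reconcile(self, pending_requests: int) -> None:
        """Add or drop workers until the pool matches the target size."""
        target = self.target_size(pending_requests)
        while len(self.workers) < target:
            self.workers.append(f"worker-{self._next_id}")
            self._next_id += 1
        while len(self.workers) > target:
            self.workers.pop()


pool = WorkerPool()
pool.reconcile(pending_requests=20)
print(pool.workers)  # ['worker-0', 'worker-1', 'worker-2']
pool.reconcile(pending_requests=0)
print(pool.workers)  # []
```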
Quick start
Requirements
- Python 3.11+
- A C++ compiler with C++17 support
- CMake 3.20+
- Conan 2.0+
Installation
# Install dependencies and build
just setup
just build
# Install the Python package
pip install -e python/
Run the example
# Start the server
anyserve examples.basic.app:app --port 8000 --workers 1
# Test
python examples/basic/run_example.py
Define a Capability Handler
from anyserve import AnyServe, ModelInferRequest, ModelInferResponse

app = AnyServe()

@app.capability(type="echo")
def echo_handler(request: ModelInferRequest) -> ModelInferResponse:
    response = ModelInferResponse(
        model_name=request.model_name,
        id=request.id,
    )
    # Copy every input to an output with the same datatype, shape, and contents
    for inp in request.inputs:
        out = response.add_output(
            name=f"output_{inp.name}",
            datatype=inp.datatype,
            shape=inp.shape,
        )
        out.contents = inp.contents
    return response
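To make the decorator pattern above easier to follow, here is a minimal, self-contained sketch of how a `capability(...)` decorator could register handlers under a key-value key. It is not the anyserve implementation: `MiniApp`, its `dispatch` method, and the plain-dict request format are hypothetical, and the lookup uses an exact match on the key-value pairs for simplicity.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Handler = Callable[[dict], dict]
Key = FrozenSet[Tuple[str, str]]


class MiniApp:
    """Hypothetical, simplified stand-in for a capability handler registry."""

    def __init__(self) -> None:
        self._handlers: Dict[Key, Handler] = {}

    def capability(self, **capability: str) -> Callable[[Handler], Handler]:
        """Return a decorator that registers a handler for these key-value pairs."""
        key = frozenset(capability.items())

        def register(func: Handler) -> Handler:
            self._handlers[key] = func
            return func  # the function itself is left unchanged

        return register

    def dispatch(self, capability: Dict[str, str], request: dict) -> dict:
        """Look up the handler registered for exactly this capability and call it."""
        handler = self._handlers.get(frozenset(capability.items()))
        if handler is None:
            raise LookupError(f"no handler for capability {capability}")
        return handler(request)


mini_app = MiniApp()


@mini_app.capability(type="echo")
def echo(request: dict) -> dict:
    return {"outputs": request["inputs"]}


print(mini_app.dispatch({"type": "echo"}, {"inputs": ["hello"]}))
# {'outputs': ['hello']}
```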
Client connection modes
The Client supports two connection modes:
from anyserve.worker.client import Client

# Direct mode: connect straight to a specific Worker
client = Client(endpoint="localhost:50051")

# Discovery mode: find a Worker automatically through the API Server
client = Client(
    api_server="http://localhost:8080",
    capability={"type": "echo"},
)

result = client.infer("echo", {"text": ["hello"]})
client.close()
See the examples/multi_server/ example for details.
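The difference between the two modes comes down to where the Worker address comes from. The sketch below assumes that Direct mode uses the given endpoint as-is, while Discovery mode asks the API Server for a Worker that serves the requested capability. `resolve_endpoint`, the `lookup` callback, and `fake_lookup` are hypothetical helpers for illustration; the README does not show how the real `Client` performs discovery.

```python
from typing import Callable, Dict, Optional

# A lookup takes (api_server_url, capability) and returns a Worker address.
Lookup = Callable[[str, Dict[str, str]], str]


def resolve_endpoint(
    endpoint: Optional[str] = None,
    api_server: Optional[str] = None,
    capability: Optional[Dict[str, str]] = None,
    lookup: Optional[Lookup] = None,
) -> str:
    """Pick the Worker address for Direct or Discovery mode."""
    if endpoint is not None:
        return endpoint  # Direct mode: the caller already knows the Worker
    if api_server is None or capability is None or lookup is None:
        raise ValueError("pass endpoint, or api_server with capability and lookup")
    return lookup(api_server, capability)  # Discovery mode


def fake_lookup(api_server: str, capability: Dict[str, str]) -> str:
    """Stand-in for a query to the API Server; uses a fixed table instead."""
    table = {"echo": "localhost:50051"}
    return table[capability["type"]]


print(resolve_endpoint(endpoint="localhost:50051"))
print(resolve_endpoint(
    api_server="http://localhost:8080",
    capability={"type": "echo"},
    lookup=fake_lookup,
))
```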
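Calling between Workers (context.call)
A Worker can call other services through context.call() to build processing pipelines:
# Context, tokenize, and build_response are not shown in this excerpt;
# import or define them before using this handler.
@app.capability(type="tokenize")
def handler(request: ModelInferRequest, context: Context) -> ModelInferResponse:
    # Process the input
    text = request.get_input("text").bytes_contents[0].decode()
    tokens = tokenize(text)
    # Call another service
    result = context.call(
        model_name="analyze",
        capability={"type": "analyze"},  # routed through the API Server
        inputs={"tokens": [",".join(tokens)]},
    )
    return build_response(result)
See the examples/pipeline/ example for details.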
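The pipeline idea can be tried without a running server. The sketch below replaces the real `Context` with a hypothetical in-process `LocalContext` whose `call` looks up a handler by its `type` key and invokes it directly, where the real system would route through the API Server. Requests and responses are plain dictionaries here, which is a simplification of the KServe-style messages used above.

```python
from typing import Callable, Dict, List

Inputs = Dict[str, List[str]]
Handler = Callable[[Inputs, "LocalContext"], Inputs]


class LocalContext:
    """Hypothetical in-process stand-in for the Context passed to handlers."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self._handlers = handlers

    def call(self, capability: Dict[str, str], inputs: Inputs) -> Inputs:
        """Invoke the handler registered for capability['type'] directly."""
        return self._handlers[capability["type"]](inputs, self)


def analyze(inputs: Inputs, context: LocalContext) -> Inputs:
    # Second stage: count the comma-separated tokens it receives.
    tokens = inputs["tokens"][0].split(",")
    return {"count": [str(len(tokens))]}


def tokenize(inputs: Inputs, context: LocalContext) -> Inputs:
    # First stage: split on whitespace, then hand the tokens to "analyze".
    tokens = inputs["text"][0].split()
    return context.call({"type": "analyze"}, {"tokens": [",".join(tokens)]})


context = LocalContext({"tokenize": tokenize, "analyze": analyze})
print(context.call({"type": "tokenize"}, {"text": ["hello capability world"]}))
# {'count': ['3']}
```

Keeping the call site keyed on a capability rather than a concrete address is what lets the first stage stay unaware of where the second stage runs.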
Project structure
anyserve/
├── cpp/                    # C++ Dispatcher
│   └── server/             # Core components
├── python/anyserve/        # Python Worker
│   ├── cli.py              # CLI entry point
│   ├── kserve.py           # KServe protocol
│   └── worker/             # Worker implementation
├── proto/                  # Protocol definitions
├── examples/               # Examples
└── docs/                   # Documentation
    ├── architecture.md     # Architecture design
    ├── runtime.md          # Runtime implementation
    └── mvp.md              # MVP plan
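Development
just setup   # Install dependencies
just build   # Build
just clean   # Clean
Documentation
| Document | Contents |
|---|---|
| architecture.md | Architecture design, core concepts, design principles |
| runtime.md | Implementation details, code structure, protocols |
| mvp.md | MVP goals, current status, development plan |
| agents.md | Collaboration guide for AI assistants |
License
[TBD]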