Relay: minimal LLM inference server for heterogeneous devices
Project description
RelayServe
RelayServe is a minimal LLM inference server that adapts to heterogeneous devices.
Quick start
Install from PyPI:
pip install relayserve
relayserve
Defaults:
- HTTP server:
:8080 - Endpoints:
GET /healthzGET /v1/modelsPOST /v1/chat/completionsGET /metricsGET /debug/shardPOST /v1/chat/pretty(colorized text response)
- Backends: set
RELAYSERVE_BACKENDSto comma-separated llama.cpp servers
Environment
RELAYSERVE_PORT(default8080)RELAYSERVE_MODEL_ID(defaultrelay-gguf)RELAYSERVE_BACKENDS(comma-separated, e.g.http://localhost:8081,http://localhost:8082)RELAYSERVE_BATCH_SIZE(default4)RELAYSERVE_BATCH_WAIT_MS(default10)RELAYSERVE_METRICS_MAX_ITEMS(default1000)RELAYSERVE_TOTAL_LAYERS(default32)RELAYSERVE_PRETTY_JSON(set1for readable JSON responses)RELAYSERVE_PRETTY_DEFAULT(default1, set0for JSON by default)
Spawning llama.cpp backends
export LLAMA_SERVER_PATH=/path/to/llama.cpp/server
export LLAMA_MODEL_PATH=/path/to/models/phi-3-mini.gguf
export LLAMA_PORTS=8081,8082
python scripts/spawn_backends.py
Then run the RelayServe server with:
export RELAYSERVE_BACKENDS=http://localhost:8081,http://localhost:8082
relayserve
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
relayserve-1.1.tar.gz
(13.7 kB
view details)
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
relayserve-1.1-py3-none-any.whl
(17.1 kB
view details)
File details
Details for the file relayserve-1.1.tar.gz.
File metadata
- Download URL: relayserve-1.1.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
566f3744885974e773d1f5c855541796ca6f38fe5ba44958a957f611619b13a8
|
|
| MD5 |
ed035d0c7f2a7fc5a7914bd68c8e8fe9
|
|
| BLAKE2b-256 |
a9e3effd4b6cd089dd6bf0012d881516aae566a994fe651c3d49434d370620f3
|
File details
Details for the file relayserve-1.1-py3-none-any.whl.
File metadata
- Download URL: relayserve-1.1-py3-none-any.whl
- Upload date:
- Size: 17.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f1ac140ac51d29de9c3546bd165c6304ad4068ff6b9173fca2e2a08a20b5f02b
|
|
| MD5 |
6f3d04d502332dc1d2e4f6289fd3cf5d
|
|
| BLAKE2b-256 |
d7ec4d059074f3f78bd165b5ddfc600e6e73e1206768b980530037b0e5e847af
|