Skip to main content

Relay: minimal LLM inference server for heterogeneous devices

Project description

RelayServe

RelayServe is a minimal LLM inference server that adapts to heterogeneous devices.

Quick start

Install from PyPI:

pip install relayserve
relayserve

Defaults:

  • HTTP server: :8080
  • Endpoints:
    • GET /healthz
    • GET /v1/models
    • POST /v1/chat/completions
    • GET /metrics
    • GET /debug/shard
    • POST /v1/chat/pretty (colorized text response)
  • Backends: set RELAYSERVE_BACKENDS to comma-separated llama.cpp servers

Environment

  • RELAYSERVE_PORT (default 8080)
  • RELAYSERVE_MODEL_ID (default relay-gguf)
  • RELAYSERVE_BACKENDS (comma-separated, e.g. http://localhost:8081,http://localhost:8082)
  • RELAYSERVE_BATCH_SIZE (default 4)
  • RELAYSERVE_BATCH_WAIT_MS (default 10)
  • RELAYSERVE_METRICS_MAX_ITEMS (default 1000)
  • RELAYSERVE_TOTAL_LAYERS (default 32)
  • RELAYSERVE_PRETTY_JSON (set 1 for readable JSON responses)
  • RELAYSERVE_PRETTY_DEFAULT (default 1, set 0 for JSON by default)

Spawning llama.cpp backends

export LLAMA_SERVER_PATH=/path/to/llama.cpp/server
export LLAMA_MODEL_PATH=/path/to/models/phi-3-mini.gguf
export LLAMA_PORTS=8081,8082
python scripts/spawn_backends.py

Then run the RelayServe server with:

export RELAYSERVE_BACKENDS=http://localhost:8081,http://localhost:8082
relayserve

Project details


Release history Release notifications | RSS feed

This version

1.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

relayserve-1.1.tar.gz (13.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

relayserve-1.1-py3-none-any.whl (17.1 kB view details)

Uploaded Python 3

File details

Details for the file relayserve-1.1.tar.gz.

File metadata

  • Download URL: relayserve-1.1.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for relayserve-1.1.tar.gz
Algorithm Hash digest
SHA256 566f3744885974e773d1f5c855541796ca6f38fe5ba44958a957f611619b13a8
MD5 ed035d0c7f2a7fc5a7914bd68c8e8fe9
BLAKE2b-256 a9e3effd4b6cd089dd6bf0012d881516aae566a994fe651c3d49434d370620f3

See more details on using hashes here.

File details

Details for the file relayserve-1.1-py3-none-any.whl.

File metadata

  • Download URL: relayserve-1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for relayserve-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f1ac140ac51d29de9c3546bd165c6304ad4068ff6b9173fca2e2a08a20b5f02b
MD5 6f3d04d502332dc1d2e4f6289fd3cf5d
BLAKE2b-256 d7ec4d059074f3f78bd165b5ddfc600e6e73e1206768b980530037b0e5e847af

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page