Skip to main content

Local LLM-aware balancer/gateway. Relays Ollama, OpenAI and Anthropic requests across multiple upstream providers with failover.

Project description

LM Relay

Local LLM-aware balancer / gateway. One endpoint, three wire protocols (Ollama / OpenAI / Anthropic), eight upstream providers with health-sorted failover, multi-key rotation, per-key model allow-lists, free-vs-paid catalog toggles, and a TOML config that hot-reloads.

Previously known as freellama. The "free-tier only" framing was too narrow — lmrelay now handles paid keys and BYOK accounts just as well, and is the foundation for upcoming profiles (named routing presets) and tokens (per-user auth / quotas) work.

lmrelay speaks three wire protocols on the same port:

  • Ollama /api/* (so Open WebUI, LobeChat, Continue, Page Assist, n8n, AnythingLLM, Cherry Studio just work),
  • OpenAI /v1/* (so Aider, Continue, Cursor, OpenClaw, Codex just work),
  • Anthropic /anthropic/v1/messages (so Claude Code just works).

Requests are relayed across eight cloud providers — OpenRouter, Groq, NVIDIA NIM, HuggingFace, Cerebras, Cloudflare Workers AI, Google Gemini, and a local Ollama you may already have — with automatic failover, per-(provider, key) cooldown matrix, health-sorted candidate chains, and multi-key rotation.

Install

pipx install lmrelay        # recommended for end users
pip install lmrelay         # if pipx is unavailable
uv tool install lmrelay     # if you prefer uv

Developing lmrelay itself? make install creates .venv and installs the package with dev extras; make check runs ruff + mypy + pytest. See all targets with make.

Quickstart

lmrelay init           # interactive wizard: writes ~/.lmrelay/.env
lmrelay serve          # starts the gateway on :11434

# In another terminal — point any Ollama client at us:
curl http://localhost:11434/api/tags

# Claude Code with a free backend:
lmrelay run claude

# OpenClaw / Aider / Continue:
lmrelay bind aider
lmrelay bind continue

Providers

Provider chat stream tools embed vision json-mode
OpenRouter
Groq
NVIDIA NIM
HuggingFace Router partial partial
Cerebras
Cloudflare WAI partial
Gemini
Local Ollama via format=json

Virtual aliases

Instead of typing a full model id, use:

  • free — any free model, health-sorted
  • fast — lowest TTFT (Groq / Cerebras first)
  • quality — widest catalog of strong models (OpenRouter first)
  • coding — code-tuned models with reliable tool use
  • vision — multimodal models accepting image input
  • embed — embedding models

CLI

lmrelay init           Interactive wizard: write ~/.lmrelay/.env
lmrelay serve          Run the gateway
lmrelay reload         Re-read lmrelay.toml + .env without restart
lmrelay keys           Show recognised provider keys and health
lmrelay list           List available models (per provider / alias)
lmrelay doctor         Pre-flight checks (--claude / --openclaw)
lmrelay ping           List providers (enabled/disabled) + tiny pong probe
lmrelay audit-models   Probe each model with each key
lmrelay bench          Benchmark p50/p95 latency per provider
lmrelay run <agent>    Launch claude / openclaw / codex / gemini / ...
lmrelay bind <agent>   Write persistent agent config (aider / continue / cursor / lobechat / ...)
lmrelay migrate-ollama Move local Ollama to :11435 and register it as backend
lmrelay telemetry      on / off / status
lmrelay dashboard      TUI live dashboard
lmrelay-watcher        Background head-of-chain probe (separate binary)

Configuration (lmrelay.toml)

lmrelay looks for a TOML config in two well-known locations:

  1. ./lmrelay.toml — when running from a checkout (already in .gitignore).
  2. ~/.lmrelay/lmrelay.toml — when installed via pipx / pip / docker.

Process env vars take precedence over values in the file, so Docker / systemd / CI can override without editing it. See lmrelay.toml.example for the full schema. Each provider gets its own section and one [[provider.X.keys]] block per credential:

[server]
host       = "0.0.0.0"
port       = 11434
log_level  = "INFO"

[runtime]
disabled_providers = ["cloudflare_wai"]

[provider.openrouter]
[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label   = "personal"

[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label   = "work"

Per-key model filter

Each [[provider.X.keys]] block optionally takes a models = [...] list of fnmatch globs. The router only routes a (provider, key, model) triple through this key if at least one pattern matches the resolved model id. Missing or empty → no filter. Useful when one key has a quota only for a model family, or when different keys belong to different paid sub-accounts:

[[provider.openrouter.keys]]
api_key = "sk-or-v1-aaa"
label   = "free-tier"
models  = ["*:free"]              # only OpenRouter free models

[[provider.openrouter.keys]]
api_key = "sk-or-v1-bbb"
label   = "work-paid"
# no `models` → this key handles everything else

lmrelay keys shows the filter in the models column.

Free vs paid models per provider

By default each provider exposes only its free-tier models. Opt into paid models with include_paid = true, cherry-pick specific paid models with include_extra, or hide individual ones with exclude:

[provider.openrouter]
include_paid  = false                       # default — free models only
include_extra = ["openai/gpt-4o-mini"]      # let one paid model through
exclude       = ["meta-llama/llama-3.2-1b:free"]

All four fields are optional. include_extra and exclude are fnmatch globs (* matches anything). exclude wins over the others. Only OpenRouter currently surfaces mixed free/paid tiers — the rest of the providers tag their catalogs as free by default. Use lmrelay reload after editing the file.

Enabling and disabling providers

A provider is enabled when:

  • at least one of its <PROVIDER>_API_KEY[_N] env vars is set OR its [provider.X] section in lmrelay.toml carries at least one [[provider.X.keys]] block, AND
  • its name does not appear in LMRELAY_DISABLED_PROVIDERS (or the TOML runtime.disabled_providers list).

lmrelay ping prints the per-provider on/off table and, unless you pass --no-probe, sends a 4-token Reply with one word: pong request to the first model in each enabled provider's catalog to verify it is actually reachable:

lmrelay ping                       # full table + live probes
lmrelay ping --no-probe            # just list status, no network calls
lmrelay ping --provider groq       # narrow to one provider
lmrelay ping --json

To temporarily disable a provider without removing its key:

LMRELAY_DISABLED_PROVIDERS=cloudflare_wai,huggingface lmrelay serve

Disabled providers are dropped from app.state.providers at startup, so the router never picks them.

Reload

After editing the file on a running gateway, hot-reload without restart:

lmrelay reload                       # local instance
sudo systemctl reload lmrelay        # systemd-managed
docker compose exec frl_app lmrelay reload

Reload re-reads lmrelay.toml + .env, rebuilds the key ring and the active provider list, and clears the cooldown matrix. Host/port stay bound until a full restart.

Deploying

  • systemd: see deploy/systemd/ and deploy/README.md for unit files and the install procedure (system account, WorkingDirectory, ExecReload).
  • Docker: docker compose up -d. Drop your lmrelay.toml into ./.lmrelay/lmrelay.toml (mounted at /root/.lmrelay).

Multi-key rotation

Every provider's env var accepts _2, _3, ... suffixes:

OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_API_KEY_2=sk-or-v1-...
OPENROUTER_API_KEY_3=sk-or-v1-...

When the active key hits a 429 / quota, the router automatically rotates to the next key and puts the offender in cooldown.

Security

  • Default bind is 0.0.0.0:11434 so the LAN can see it (consistent with Ollama). When LMRELAY_TOKEN is not set, a loud banner is printed at startup.
  • Set LMRELAY_TOKEN=$(openssl rand -hex 16) to require Authorization: Bearer <token> for /api/*, /v1/*, /anthropic/v1/*. /health, /ready, /metrics, /docs always remain open.
  • API keys are never logged in full — only the last 4 characters.

Telemetry

Off by default. lmrelay telemetry on enables sending install_uuid, lmrelay version, OS, python version, and aggregate counters. Never prompts, completions, hostnames, IPs, or API keys.

Docker

docker compose up -d

docker-compose.yml runs two services:

  • frl_app — the gateway on :11434
  • frl_watcher — the head-of-chain probe daemon

License

MIT — see LICENSE.

lmrelay is not affiliated with Meta, Ollama Inc., OpenAI, Anthropic, Groq, NVIDIA, HuggingFace, Cerebras, or Cloudflare. All trademarks are property of their respective owners.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lmrelay-0.0.0.tar.gz (80.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lmrelay-0.0.0-py3-none-any.whl (91.5 kB view details)

Uploaded Python 3

File details

Details for the file lmrelay-0.0.0.tar.gz.

File metadata

  • Download URL: lmrelay-0.0.0.tar.gz
  • Upload date:
  • Size: 80.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lmrelay-0.0.0.tar.gz
Algorithm Hash digest
SHA256 1374d05d00b4f7b8499cb0540a607155dc8e14943e0dae0411f325e906d168ff
MD5 c135fa7f38b87e3de066b05415a3a16e
BLAKE2b-256 4ca9f4955e3c89b3de2fd3f533a62c98ebc83b00cb2fd851b196d3e625a53657

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmrelay-0.0.0.tar.gz:

Publisher: publish.yml on KPbICO6Ou/lmrelay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmrelay-0.0.0-py3-none-any.whl.

File metadata

  • Download URL: lmrelay-0.0.0-py3-none-any.whl
  • Upload date:
  • Size: 91.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lmrelay-0.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c42ca100c52bd634d46da6d2efb8b8d2dab2bfe47334d13811a8a71475b76624
MD5 dd97f677fc8a429454b815d9c8569bd9
BLAKE2b-256 d96e381aef2a86ad1b7f9139b8e07bf6e929a6bfd5be8267e4057d7db3c00f08

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmrelay-0.0.0-py3-none-any.whl:

Publisher: publish.yml on KPbICO6Ou/lmrelay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page