Local LLM-aware balancer/gateway. Relays Ollama, OpenAI and Anthropic requests across multiple upstream providers with failover.
LM Relay
Local LLM-aware balancer / gateway. One endpoint, three wire protocols (Ollama / OpenAI / Anthropic), eight upstream providers with health-sorted failover, multi-key rotation, per-key model allow-lists, free-vs-paid catalog toggles, and a TOML config that hot-reloads.
Previously known as freellama. The "free-tier only" framing was too narrow: lmrelay now handles paid keys and BYOK accounts just as well, and is the foundation for upcoming profiles (named routing presets) and tokens (per-user auth / quotas) work.
lmrelay speaks three wire protocols on the same port:
- Ollama /api/* (so Open WebUI, LobeChat, Continue, Page Assist, n8n, AnythingLLM, Cherry Studio just work),
- OpenAI /v1/* (so Aider, Continue, Cursor, OpenClaw, Codex just work),
- Anthropic /anthropic/v1/messages (so Claude Code just works).
Requests are relayed across eight upstream providers (seven cloud services: OpenRouter, Groq, NVIDIA NIM, HuggingFace, Cerebras, Cloudflare Workers AI, and Google Gemini, plus a local Ollama you may already have) with automatic failover, a per-(provider, key) cooldown matrix, health-sorted candidate chains, and multi-key rotation.
Install
pipx install lmrelay # recommended for end users
pip install lmrelay # if pipx is unavailable
uv tool install lmrelay # if you prefer uv
Developing lmrelay itself? make install creates .venv and
installs the package with dev extras; make check runs ruff + mypy +
pytest. See all targets with make.
Quickstart
lmrelay init # interactive wizard: writes ~/.lmrelay/.env
lmrelay serve # starts the gateway on :11434
# In another terminal — point any Ollama client at us:
curl http://localhost:11434/api/tags
# Claude Code with a free backend:
lmrelay run claude
# OpenClaw / Aider / Continue:
lmrelay bind aider
lmrelay bind continue
Providers
| Provider | chat | stream | tools | embed | vision | json-mode |
|---|---|---|---|---|---|---|
| OpenRouter | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Groq | ✓ | ✓ | ✓ | — | — | ✓ |
| NVIDIA NIM | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| HuggingFace Router | ✓ | ✓ | partial | ✓ | — | partial |
| Cerebras | ✓ | ✓ | ✓ | — | — | ✓ |
| Cloudflare WAI | ✓ | ✓ | — | ✓ | — | partial |
| Gemini | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Local Ollama | ✓ | ✓ | ✓ | ✓ | ✓ | via format=json |
Virtual aliases
Instead of typing a full model id, use:
- free: any free model, health-sorted
- fast: lowest TTFT (Groq / Cerebras first)
- quality: widest catalog of strong models (OpenRouter first)
- coding: code-tuned models with reliable tool use
- vision: multimodal models accepting image input
- embed: embedding models
CLI
lmrelay init            Interactive wizard: write ~/.lmrelay/.env
lmrelay serve           Run the gateway
lmrelay reload          Re-read lmrelay.toml + .env without restart
lmrelay keys            Show recognised provider keys and health
lmrelay list            List available models (per provider / alias)
lmrelay doctor          Pre-flight checks (--claude / --openclaw)
lmrelay ping            List providers (enabled/disabled) + tiny pong probe
lmrelay audit-models    Probe each model with each key
lmrelay bench           Benchmark p50/p95 latency per provider
lmrelay run <agent>     Launch claude / openclaw / codex / gemini / ...
lmrelay bind <agent>    Write persistent agent config (aider / continue / cursor / lobechat / ...)
lmrelay migrate-ollama  Move local Ollama to :11435 and register it as backend
lmrelay telemetry       on / off / status
lmrelay dashboard       TUI live dashboard
lmrelay-watcher         Background head-of-chain probe (separate binary)
Configuration (lmrelay.toml)
lmrelay looks for a TOML config in two well-known locations:
- ./lmrelay.toml: when running from a checkout (already in .gitignore).
- ~/.lmrelay/lmrelay.toml: when installed via pipx / pip / docker.
Process env vars take precedence over values in the file, so Docker /
systemd / CI can override without editing it. See
lmrelay.toml.example for the full
schema. Each provider gets its own section and one
[[provider.X.keys]] block per credential:
[server]
host = "0.0.0.0"
port = 11434
log_level = "INFO"
[runtime]
disabled_providers = ["cloudflare_wai"]
[provider.openrouter]
[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label = "personal"
[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label = "work"
Per-key model filter
Each [[provider.X.keys]] block optionally takes a models = [...]
list of fnmatch globs. The router sends a request through a key only
if at least one pattern matches the resolved model id. A missing or
empty list means no filter. This is useful when one key has a quota
only for a model family, or when different keys belong to
different paid sub-accounts:
[[provider.openrouter.keys]]
api_key = "sk-or-v1-aaa"
label = "free-tier"
models = ["*:free"] # only OpenRouter free models
[[provider.openrouter.keys]]
api_key = "sk-or-v1-bbb"
label = "work-paid"
# no `models` → this key handles everything else
lmrelay keys shows the filter in the models column.
Free vs paid models per provider
By default each provider exposes only its free-tier models. Opt
into paid models with include_paid = true, cherry-pick specific
paid models with include_extra, or hide individual ones with
exclude:
[provider.openrouter]
include_paid = false # default — free models only
include_extra = ["openai/gpt-4o-mini"] # let one paid model through
exclude = ["meta-llama/llama-3.2-1b:free"]
All of these fields are optional. include_extra and exclude are
fnmatch globs (* matches anything). exclude wins over the
others. Only OpenRouter currently surfaces mixed free/paid tiers;
the rest of the providers tag their catalogs as free by default.
Use lmrelay reload after editing the file.
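One plausible reading of how these fields combine is sketched below. The function visible and the is_free flag are hypothetical (how the catalog marks a model as free is not specified here); the only rule taken directly from the text is that exclude wins.

```python
from fnmatch import fnmatchcase


def visible(
    model_id: str,
    is_free: bool,
    *,
    include_paid: bool = False,
    include_extra: tuple[str, ...] = (),
    exclude: tuple[str, ...] = (),
) -> bool:
    """Decide whether a catalog entry is exposed to clients."""
    if any(fnmatchcase(model_id, p) for p in exclude):
        return False  # exclude wins over everything else
    if is_free or include_paid:
        return True  # free models are on by default; paid ones need the opt-in
    # A paid model can still be let through individually.
    return any(fnmatchcase(model_id, p) for p in include_extra)


cfg = dict(
    include_paid=False,
    include_extra=("openai/gpt-4o-mini",),
    exclude=("meta-llama/llama-3.2-1b:free",),
)
print(visible("openai/gpt-4o-mini", False, **cfg))
print(visible("meta-llama/llama-3.2-1b:free", True, **cfg))
```

Checking exclude first keeps the precedence obvious: a model that matches both include_extra and exclude stays hidden.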
Enabling and disabling providers
A provider is enabled when:
- at least one of its <PROVIDER>_API_KEY[_N] env vars is set OR its [provider.X] section in lmrelay.toml carries at least one [[provider.X.keys]] block, AND
- its name does not appear in LMRELAY_DISABLED_PROVIDERS (or the TOML runtime.disabled_providers list).
lmrelay ping prints the per-provider on/off table and, unless you
pass --no-probe, sends a 4-token "Reply with one word: pong" request to
the first model in each enabled provider's catalog to verify it is
actually reachable:
lmrelay ping # full table + live probes
lmrelay ping --no-probe # just list status, no network calls
lmrelay ping --provider groq # narrow to one provider
lmrelay ping --json
To temporarily disable a provider without removing its key:
LMRELAY_DISABLED_PROVIDERS=cloudflare_wai,huggingface lmrelay serve
Disabled providers are dropped from app.state.providers at startup, so
the router never picks them.
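The enable rule from this section can be written as a small predicate. This sketch makes several assumptions that are not confirmed here: the env var prefix is the provider id upper-cased, the env and TOML disable lists are merged (the real precedence may differ), and has_env_key / enabled_providers are hypothetical names.

```python
import re


def has_env_key(provider: str, env: dict[str, str]) -> bool:
    """True if <PROVIDER>_API_KEY or <PROVIDER>_API_KEY_<N> is set and non-empty."""
    pattern = re.compile(rf"^{re.escape(provider.upper())}_API_KEY(_\d+)?$")
    return any(bool(pattern.match(name)) and bool(value) for name, value in env.items())


def has_toml_key(provider: str, config: dict) -> bool:
    """True if [provider.X] carries at least one [[provider.X.keys]] block."""
    return bool(config.get("provider", {}).get(provider, {}).get("keys"))


def enabled_providers(names: list[str], env: dict[str, str], config: dict) -> list[str]:
    # Assumption: both disable lists apply together (set union).
    disabled = set(config.get("runtime", {}).get("disabled_providers", []))
    raw = env.get("LMRELAY_DISABLED_PROVIDERS", "")
    disabled |= {p.strip() for p in raw.split(",") if p.strip()}
    return [
        n for n in names
        if n not in disabled and (has_env_key(n, env) or has_toml_key(n, config))
    ]


env = {"GROQ_API_KEY": "gsk-x", "CEREBRAS_API_KEY_2": "csk-y",
       "LMRELAY_DISABLED_PROVIDERS": "huggingface"}
config = {"provider": {"openrouter": {"keys": [{"api_key": "k"}]}},
          "runtime": {"disabled_providers": ["groq"]}}
print(enabled_providers(["openrouter", "groq", "huggingface", "cerebras"], env, config))
```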
Reload
After editing the file on a running gateway, hot-reload without restart:
lmrelay reload # local instance
sudo systemctl reload lmrelay # systemd-managed
docker compose exec frl_app lmrelay reload
Reload re-reads lmrelay.toml + .env, rebuilds the key ring and
the active provider list, and clears the cooldown matrix. Host/port
stay bound until a full restart.
Deploying
- systemd: see deploy/systemd/ and deploy/README.md for unit files and the install procedure (system account, WorkingDirectory, ExecReload).
- Docker: docker compose up -d. Drop your lmrelay.toml into ./.lmrelay/lmrelay.toml (mounted at /root/.lmrelay).
Multi-key rotation
Every provider's env var accepts _2, _3, ... suffixes:
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_API_KEY_2=sk-or-v1-...
OPENROUTER_API_KEY_3=sk-or-v1-...
When the active key hits a 429 / quota, the router automatically rotates to the next key and puts the offender in cooldown.
Security
- Default bind is 0.0.0.0:11434 so the LAN can see it (consistent with Ollama). When LMRELAY_TOKEN is not set, a loud banner is printed at startup.
- Set LMRELAY_TOKEN=$(openssl rand -hex 16) to require Authorization: Bearer <token> for /api/*, /v1/*, /anthropic/v1/*. /health, /ready, /metrics, /docs always remain open.
- API keys are never logged in full, only the last 4 characters.
Telemetry
Off by default. lmrelay telemetry on enables sending install_uuid,
lmrelay version, OS, Python version, and aggregate counters. It never sends
prompts, completions, hostnames, IPs, or API keys.
Docker
docker compose up -d
docker-compose.yml runs two services:
- frl_app: the gateway on :11434
- frl_watcher: the head-of-chain probe daemon
License
MIT — see LICENSE.
lmrelay is not affiliated with Meta, Ollama Inc., OpenAI, Anthropic, Groq,
NVIDIA, HuggingFace, Cerebras, or Cloudflare. All trademarks are property of
their respective owners.