Local LLM-aware balancer/gateway. Relays Ollama, OpenAI and Anthropic requests across multiple upstream providers with failover.

These details have not been verified by PyPI

Project links

Project description

LM Relay

Local LLM-aware balancer / gateway. One endpoint, three wire protocols (Ollama / OpenAI / Anthropic), eight upstream providers with health-sorted failover, multi-key rotation, per-key model allow-lists, free-vs-paid catalog toggles, and a TOML config that hot-reloads.

Previously known as freellama. The "free-tier only" framing was too narrow — lmrelay now handles paid keys and BYOK accounts just as well, and is the foundation for upcoming profiles (named routing presets) and tokens (per-user auth / quotas) work.

lmrelay speaks three wire protocols on the same port:

Ollama /api/* (so Open WebUI, LobeChat, Continue, Page Assist, n8n, AnythingLLM, Cherry Studio just work),
OpenAI /v1/* (so Aider, Continue, Cursor, OpenClaw, Codex just work),
Anthropic /anthropic/v1/messages (so Claude Code just works).

Requests are relayed across eight cloud providers — OpenRouter, Groq, NVIDIA NIM, HuggingFace, Cerebras, Cloudflare Workers AI, Google Gemini, and a local Ollama you may already have — with automatic failover, per-(provider, key) cooldown matrix, health-sorted candidate chains, and multi-key rotation.

Install

pipx install lmrelay        # recommended for end users
pip install lmrelay         # if pipx is unavailable
uv tool install lmrelay     # if you prefer uv

Developing lmrelay itself? make install creates .venv and installs the package with dev extras; make check runs ruff + mypy + pytest. See all targets with make.

Quickstart

lmrelay init           # interactive wizard: writes ~/.lmrelay/.env
lmrelay serve          # starts the gateway on :11434

# In another terminal — point any Ollama client at us:
curl http://localhost:11434/api/tags

# Claude Code with a free backend:
lmrelay run claude

# OpenClaw / Aider / Continue:
lmrelay bind aider
lmrelay bind continue

Providers

Provider	chat	stream	tools	embed	vision	json-mode
OpenRouter	✓	✓	✓	✓	✓	✓
Groq	✓	✓	✓	—	—	✓
NVIDIA NIM	✓	✓	✓	✓	✓	✓
HuggingFace Router	✓	✓	partial	✓	—	partial
Cerebras	✓	✓	✓	—	—	✓
Cloudflare WAI	✓	✓	—	✓	—	partial
Gemini	✓	✓	✓	✓	✓	✓
Local Ollama	✓	✓	✓	✓	✓	via format=json

Virtual aliases

Instead of typing a full model id, use:

free — any free model, health-sorted
fast — lowest TTFT (Groq / Cerebras first)
quality — widest catalog of strong models (OpenRouter first)
coding — code-tuned models with reliable tool use
vision — multimodal models accepting image input
embed — embedding models

CLI

lmrelay init           Interactive wizard: write ~/.lmrelay/.env
lmrelay serve          Run the gateway
lmrelay reload         Re-read lmrelay.toml + .env without restart
lmrelay keys           Show recognised provider keys and health
lmrelay list           List available models (per provider / alias)
lmrelay doctor         Pre-flight checks (--claude / --openclaw)
lmrelay ping           List providers (enabled/disabled) + tiny pong probe
lmrelay audit-models   Probe each model with each key
lmrelay bench          Benchmark p50/p95 latency per provider
lmrelay run <agent>    Launch claude / openclaw / codex / gemini / ...
lmrelay bind <agent>   Write persistent agent config (aider / continue / cursor / lobechat / ...)
lmrelay migrate-ollama Move local Ollama to :11435 and register it as backend
lmrelay telemetry      on / off / status
lmrelay dashboard      TUI live dashboard
lmrelay-watcher        Background head-of-chain probe (separate binary)

Configuration (lmrelay.toml)

lmrelay looks for a TOML config in two well-known locations:

./lmrelay.toml — when running from a checkout (already in .gitignore).
~/.lmrelay/lmrelay.toml — when installed via pipx / pip / docker.

Process env vars take precedence over values in the file, so Docker / systemd / CI can override without editing it. See lmrelay.toml.example for the full schema. Each provider gets its own section and one [[provider.X.keys]] block per credential:

[server]
host       = "0.0.0.0"
port       = 11434
log_level  = "INFO"

[runtime]
disabled_providers = ["cloudflare_wai"]

[provider.openrouter]
[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label   = "personal"

[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label   = "work"

Per-key model filter

Each [[provider.X.keys]] block optionally takes a models = [...] list of fnmatch globs. The router only routes a (provider, key, model) triple through this key if at least one pattern matches the resolved model id. Missing or empty → no filter. Useful when one key has a quota only for a model family, or when different keys belong to different paid sub-accounts:

[[provider.openrouter.keys]]
api_key = "sk-or-v1-aaa"
label   = "free-tier"
models  = ["*:free"]              # only OpenRouter free models

[[provider.openrouter.keys]]
api_key = "sk-or-v1-bbb"
label   = "work-paid"
# no `models` → this key handles everything else

lmrelay keys shows the filter in the models column.

Free vs paid models per provider

By default each provider exposes only its free-tier models. Opt into paid models with include_paid = true, cherry-pick specific paid models with include_extra, or hide individual ones with exclude:

[provider.openrouter]
include_paid  = false                       # default — free models only
include_extra = ["openai/gpt-4o-mini"]      # let one paid model through
exclude       = ["meta-llama/llama-3.2-1b:free"]

All four fields are optional. include_extra and exclude are fnmatch globs (* matches anything). exclude wins over the others. Only OpenRouter currently surfaces mixed free/paid tiers — the rest of the providers tag their catalogs as free by default. Use lmrelay reload after editing the file.

Enabling and disabling providers

A provider is enabled when:

at least one of its <PROVIDER>_API_KEY[_N] env vars is set OR its [provider.X] section in lmrelay.toml carries at least one [[provider.X.keys]] block, AND
its name does not appear in LMRELAY_DISABLED_PROVIDERS (or the TOML runtime.disabled_providers list).

lmrelay ping prints the per-provider on/off table and, unless you pass --no-probe, sends a 4-token Reply with one word: pong request to the first model in each enabled provider's catalog to verify it is actually reachable:

lmrelay ping                       # full table + live probes
lmrelay ping --no-probe            # just list status, no network calls
lmrelay ping --provider groq       # narrow to one provider
lmrelay ping --json

To temporarily disable a provider without removing its key:

LMRELAY_DISABLED_PROVIDERS=cloudflare_wai,huggingface lmrelay serve

Disabled providers are dropped from app.state.providers at startup, so the router never picks them.

Reload

After editing the file on a running gateway, hot-reload without restart:

lmrelay reload                       # local instance
sudo systemctl reload lmrelay        # systemd-managed
docker compose exec frl_app lmrelay reload

Reload re-reads lmrelay.toml + .env, rebuilds the key ring and the active provider list, and clears the cooldown matrix. Host/port stay bound until a full restart.

Deploying

systemd: see deploy/systemd/ and deploy/README.md for unit files and the install procedure (system account, WorkingDirectory, ExecReload).
Docker: docker compose up -d. Drop your lmrelay.toml into ./.lmrelay/lmrelay.toml (mounted at /root/.lmrelay).

Multi-key rotation

Every provider's env var accepts _2, _3, ... suffixes:

OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_API_KEY_2=sk-or-v1-...
OPENROUTER_API_KEY_3=sk-or-v1-...

When the active key hits a 429 / quota, the router automatically rotates to the next key and puts the offender in cooldown.

Security

Default bind is 0.0.0.0:11434 so the LAN can see it (consistent with Ollama). When LMRELAY_TOKEN is not set, a loud banner is printed at startup.
Set LMRELAY_TOKEN=$(openssl rand -hex 16) to require Authorization: Bearer <token> for /api/*, /v1/*, /anthropic/v1/*. /health, /ready, /metrics, /docs always remain open.
API keys are never logged in full — only the last 4 characters.

Telemetry

Off by default. lmrelay telemetry on enables sending install_uuid, lmrelay version, OS, python version, and aggregate counters. Never prompts, completions, hostnames, IPs, or API keys.

Docker

docker compose up -d

docker-compose.yml runs two services:

frl_app — the gateway on :11434
frl_watcher — the head-of-chain probe daemon

License

MIT — see LICENSE.

lmrelay is not affiliated with Meta, Ollama Inc., OpenAI, Anthropic, Groq, NVIDIA, HuggingFace, Cerebras, or Cloudflare. All trademarks are property of their respective owners.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.0.0

May 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lmrelay-0.0.0.tar.gz (80.1 kB view details)

Uploaded May 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lmrelay-0.0.0-py3-none-any.whl (91.5 kB view details)

Uploaded May 19, 2026 Python 3

File details

Details for the file lmrelay-0.0.0.tar.gz.

File metadata

Download URL: lmrelay-0.0.0.tar.gz
Upload date: May 19, 2026
Size: 80.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lmrelay-0.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1374d05d00b4f7b8499cb0540a607155dc8e14943e0dae0411f325e906d168ff`
MD5	`c135fa7f38b87e3de066b05415a3a16e`
BLAKE2b-256	`4ca9f4955e3c89b3de2fd3f533a62c98ebc83b00cb2fd851b196d3e625a53657`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmrelay-0.0.0.tar.gz:

Publisher: publish.yml on KPbICO6Ou/lmrelay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmrelay-0.0.0.tar.gz
- Subject digest: 1374d05d00b4f7b8499cb0540a607155dc8e14943e0dae0411f325e906d168ff
- Sigstore transparency entry: 1575371318
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: KPbICO6Ou/lmrelay@19328eec4f10ec9c29aa052f66e74a1ef9c6a8cc
- Branch / Tag: refs/tags/0.0.0
- Owner: https://github.com/KPbICO6Ou
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@19328eec4f10ec9c29aa052f66e74a1ef9c6a8cc
- Trigger Event: push

File details

Details for the file lmrelay-0.0.0-py3-none-any.whl.

File metadata

Download URL: lmrelay-0.0.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 91.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lmrelay-0.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c42ca100c52bd634d46da6d2efb8b8d2dab2bfe47334d13811a8a71475b76624`
MD5	`dd97f677fc8a429454b815d9c8569bd9`
BLAKE2b-256	`d96e381aef2a86ad1b7f9139b8e07bf6e929a6bfd5be8267e4057d7db3c00f08`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmrelay-0.0.0-py3-none-any.whl:

Publisher: publish.yml on KPbICO6Ou/lmrelay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmrelay-0.0.0-py3-none-any.whl
- Subject digest: c42ca100c52bd634d46da6d2efb8b8d2dab2bfe47334d13811a8a71475b76624
- Sigstore transparency entry: 1575371378
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: KPbICO6Ou/lmrelay@19328eec4f10ec9c29aa052f66e74a1ef9c6a8cc
- Branch / Tag: refs/tags/0.0.0
- Owner: https://github.com/KPbICO6Ou
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@19328eec4f10ec9c29aa052f66e74a1ef9c6a8cc
- Trigger Event: push

lmrelay 0.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LM Relay

Install

Quickstart

Providers

Virtual aliases

CLI

Configuration (lmrelay.toml)

Per-key model filter

Free vs paid models per provider

Enabling and disabling providers

Reload

Deploying

Multi-key rotation

Security

Telemetry

Docker

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance