Unified LLM usage management — API proxy, session diagnostics, multi-CLI orchestration.

llm-relay

Korean | llms.txt

Features

Proxy: Transparent API proxy with cache/token monitoring and 12-strategy pruning
Detect: 7 detectors (orphan, stuck, bloat, synthetic, cache, resume, microcompact)
Recover: Session recovery and doctor (7 health checks)
Guard: 4-tier threshold daemon with dual-zone classification
Cost: Per-1% cost calculation and rate-limit header analysis
Orch: Multi-CLI orchestration (Claude Code, Codex CLI, Gemini CLI)
Display: Multi-CLI session monitor with provider badges and liveness detection
I18n: Multi-language support (English, Korean) with browser auto-detection and the LLM_RELAY_LANG environment variable
MCP: 8 tools via stdio transport (cli_delegate, cli_status, cli_probe, orch_delegate, orch_history, relay_stats, session_turns, session_history)

Install

One-command install (Windows native)

For Windows users who just want it running, after Python 3.9+ is installed:

irm https://raw.githubusercontent.com/ArkNill/llm-relay/main/scripts/install.ps1 | iex

That script installs llm-relay[all] with pip, starts the proxy as a Windows background daemon, waits for it to pass a health check, and only then routes Claude Code through it. Routing is never activated unless the proxy actually responds, so the script is safe to run on a machine where Claude Code is already configured: in the worst case the install aborts with a clear message and leaves your existing setup untouched. See Prerequisites below for the Python requirement and venv guidance.
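The health-gating idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of install.ps1 itself: the `probe` and `activate_routing` callables, the retry count, and the delay are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 0.5) -> bool:
    """Return True as soon as probe() succeeds, False if every attempt fails."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or timed out: the proxy is not up yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def install_with_health_gate(
    probe: Callable[[], bool],
    activate_routing: Callable[[], None],
    attempts: int = 10,
    delay: float = 0.5,
) -> bool:
    """Call activate_routing() only after the probe has succeeded."""
    if not wait_until_healthy(probe, attempts, delay):
        # Abort without touching the existing configuration.
        return False
    activate_routing()
    return True
```

In practice `probe` could be a small function that issues an HTTP request to the proxy with `urllib.request.urlopen` and returns True on a successful response; connection errors raised there are subclasses of `OSError`, so the loop above treats them as "not ready yet".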

If you would rather do it by hand (Linux, macOS, or just to see each step), keep reading.

Prerequisites

Python 3.9 or newer (3.12 recommended). We do not bundle a Python runtime; install it once and llm-relay reuses it.
- Windows: winget install Python.Python.3.12 or python.org/downloads
- macOS: brew install python@3.12
- Linux: your distribution's package manager (apt install python3.12, dnf install python3.12, etc.)
(Recommended) A virtual environment. Clean uninstall, no PATH surprises, isolated dependency tree.

1. Set up Python environment

Windows (pip)

python -m venv .venv
.venv\Scripts\activate

Windows (conda)

conda create -n llm-relay python=3.12
conda activate llm-relay

Linux / macOS (pip)

python3 -m venv .venv
source .venv/bin/activate

2. Install llm-relay

# Default (SQLite, zero-config)
pip install llm-relay

# Extras are quoted so that shells such as zsh do not treat the brackets as a glob pattern

# With proxy + web dashboard
pip install "llm-relay[proxy]"

# With PostgreSQL support (long-term analytics + vector search)
pip install "llm-relay[pg]"

# With MCP server (Python 3.10+)
pip install "llm-relay[mcp]"

# Everything
pip install "llm-relay[all]"

3. Choose database

	SQLite (default)	PostgreSQL
Setup	Zero-config	Requires PG server
Best for	Getting started, light usage	Long-term data analytics, vector search
Install	`pip install llm-relay`	`pip install "llm-relay[pg]"`
Config	(none needed)	`LLM_RELAY_DB=postgresql://user:pass@host/db`
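When the PostgreSQL password contains characters such as `@` or `/`, the connection URL has to be percent-encoded to stay unambiguous. The helper below is a small sketch of how the `LLM_RELAY_DB` value could be assembled; `build_pg_url` is a hypothetical name, and it assumes llm-relay accepts a standard PostgreSQL connection URI (including an optional port), which the table above suggests but does not spell out.

```python
from typing import Optional
from urllib.parse import quote


def build_pg_url(
    user: str,
    password: str,
    host: str,
    database: str,
    port: Optional[int] = None,
) -> str:
    """Build a postgresql:// URL with percent-encoded credentials."""
    netloc = host if port is None else f"{host}:{port}"
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{netloc}/{quote(database, safe='')}"
    )


if __name__ == "__main__":
    # Example values only; substitute your own credentials.
    print(build_pg_url("relay", "p@ss/word", "localhost", "relaydb"))
```

The printed value can then be exported as `LLM_RELAY_DB` before running `llm-relay init`.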

4. Initialize

llm-relay init

Quick Start

One-command setup

llm-relay init              # Auto-detect CLIs, configure proxy, start server

CLI commands

llm-relay scan              # Session health check (7 detectors)
llm-relay doctor            # Configuration health check (7 checks)
llm-relay recover           # Extract session context for resumption
llm-relay serve             # Start proxy server + web dashboard
llm-relay top               # Live terminal monitor (btop-style)
llm-relay service install   # Windows: background service + auto-start (no console window)
llm-relay service stop      # Windows: stop background service
llm-relay service uninstall # Windows: remove service + cleanup

Web dashboard

# Native (Linux/macOS/Windows)
llm-relay serve --port 8080

Then open:

/dashboard/: CLI status, cost, delegation history, and Turn Monitor (alive sessions only; append ?include_dead=1 to include dead sessions)
/display/: Turn counter with Claude Code/Codex/Gemini session cards (alive filter: Claude Code via cc_pid with TTY fallback, Codex/Gemini via open file descriptors; Windows uses mtime plus process detection)
/history/: Session conversation history browser
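For scripting or bookmarking, the three pages can be turned into full URLs. This is only a convenience sketch: `page_urls` is a hypothetical helper, and the `http` scheme and `localhost` host are assumptions; only the paths, the port from the `serve` example, and the `include_dead=1` parameter come from the text above.

```python
from typing import Dict
from urllib.parse import urlencode

# Page paths as listed above.
PAGES = ("dashboard", "display", "history")


def page_urls(port: int = 8080, host: str = "localhost", include_dead: bool = False) -> Dict[str, str]:
    """Return a mapping of page name to URL for a locally running server."""
    urls = {name: f"http://{host}:{port}/{name}/" for name in PAGES}
    if include_dead:
        # Only the dashboard is documented as accepting include_dead.
        urls["dashboard"] += "?" + urlencode({"include_dead": 1})
    return urls


if __name__ == "__main__":
    for name, url in page_urls(include_dead=True).items():
        print(f"{name}: {url}")
```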

MCP server

llm-relay-mcp               # stdio transport, 8 tools

API proxy for Claude Code

# Route Claude Code through the proxy
llm-relay connect   # Auto-configures Claude Code proxy

Agent-driven setup

If you would rather have your existing coding agent (Claude Code, Codex, Gemini) run the install for you, point it at docs/AGENT_SETUP.md. It is a structured playbook the agent follows step by step, using llm-relay env-fingerprint and llm-relay verify to probe and check each step without scraping output.

llm-relay env-fingerprint --format json        # state snapshot
llm-relay verify install --format json         # is the package usable?
llm-relay verify config --format json          # is local state set up?
llm-relay verify integration --cli claude-code # is the CLI wired?
llm-relay verify all                            # everything at once

Exit code is 0 on pass/warn, 1 on fail.
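Because the result is carried by the exit code, a wrapper script does not need to parse any output. The sketch below shows one way to call `llm-relay verify` from Python and branch on the code. `describe_exit_code` and `run_verify` are hypothetical helper names, and the "unexpected" label for codes other than 0 and 1 is an assumption added for completeness, since only 0 and 1 are documented.

```python
import subprocess


def describe_exit_code(code: int) -> str:
    """Map the documented exit codes to a label."""
    if code == 0:
        return "pass-or-warn"
    if code == 1:
        return "fail"
    # Anything else is outside what the documentation describes.
    return "unexpected"


def run_verify(target: str = "all", runner=subprocess.run) -> str:
    """Run `llm-relay verify <target>` and classify its exit code."""
    completed = runner(
        ["llm-relay", "verify", target],
        capture_output=True,
        text=True,
    )
    return describe_exit_code(completed.returncode)


if __name__ == "__main__":
    # Requires llm-relay to be installed and on PATH.
    print(run_verify("install"))
```

The `runner` parameter exists only so the function can be exercised without llm-relay installed; normal use relies on the `subprocess.run` default.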

CLI Status

CLI	Status
Claude Code	Fully supported
OpenAI Codex	Fully supported
Gemini CLI	Display supported; oauth-personal has a known server-side 403 bug (#25425)

Platform Support

Platform	Mode	Notes
Linux	Native	Full feature set, systemd recommended
macOS	Native	Full feature set
Windows	Native	`llm-relay service install` for background daemon (no console window)

Requirements

Python >= 3.9
MCP tools require Python >= 3.10
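The two version floors above can be checked before installing. This is a small illustrative sketch; `check_python` is a hypothetical helper and not part of llm-relay.

```python
import sys
from typing import Dict, Optional, Tuple

MIN_CORE = (3, 9)   # minimum for llm-relay itself
MIN_MCP = (3, 10)   # minimum for the MCP tools


def check_python(version: Optional[Tuple[int, ...]] = None) -> Dict[str, bool]:
    """Report which feature sets the given (or running) interpreter supports."""
    if version is None:
        version = tuple(sys.version_info)
    major_minor = tuple(version[:2])
    return {
        "core": major_minor >= MIN_CORE,
        "mcp": major_minor >= MIN_MCP,
    }


if __name__ == "__main__":
    print(check_python())
```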

License

MIT

Ecosystem

Part of the QuartzUnit open-source ecosystem.

Release history

This version

0.9.6

May 21, 2026

0.9.5

May 21, 2026

0.9.4

May 21, 2026

0.9.1

Apr 29, 2026

0.9.0

Apr 29, 2026

0.8.6 yanked

Apr 28, 2026

Reason this release was yanked: