llm-relay
Unified LLM usage management — API proxy, session diagnostics, multi-CLI orchestration.
Features
- Proxy: Transparent API proxy with cache/token monitoring and 12-strategy pruning
- Detect: 7 detectors (orphan, stuck, bloat, synthetic, cache, resume, microcompact)
- Recover: Session recovery and doctor (7 health checks)
- Guard: 4-tier threshold daemon with dual-zone classification
- Cost: Per-1% cost calculation and rate-limit header analysis
- Orch: Multi-CLI orchestration (Claude Code, Codex CLI, Gemini CLI)
- Display: Multi-CLI session monitor with provider badges and liveness detection
- I18n: Multi-language support (English, Korean) with browser auto-detection and the LLM_RELAY_LANG env variable
- MCP: 8 tools via stdio transport (cli_delegate, cli_status, cli_probe, orch_delegate, orch_history, relay_stats, session_turns, session_history)
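The I18n bullet names two inputs for choosing a language: the browser and the LLM_RELAY_LANG environment variable. The sketch below shows one plausible way to combine them. The precedence (environment variable first), the `en`/`ko` codes, the English fallback, and the function name `resolve_language` are all assumptions made for illustration; llm-relay's actual logic may differ.

```python
import os

SUPPORTED = ("en", "ko")  # English and Korean, per the feature list (codes assumed)


def resolve_language(accept_language="", env=None):
    """Pick a UI language: LLM_RELAY_LANG first (assumed), then the browser header."""
    env = os.environ if env is None else env
    forced = env.get("LLM_RELAY_LANG", "").strip().lower()
    if forced in SUPPORTED:
        return forced
    # An Accept-Language header looks like "ko-KR,ko;q=0.9,en;q=0.8".
    # This sketch walks the entries in listed order and ignores q-weights.
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()
        primary = tag.split("-")[0]
        if primary in SUPPORTED:
            return primary
    return "en"  # assumed fallback when nothing matches
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.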
Install
1. Set up Python environment
Windows (pip)
python -m venv .venv
.venv\Scripts\activate
Windows (conda)
conda create -n llm-relay python=3.12
conda activate llm-relay
Linux / macOS (pip)
python3 -m venv .venv
source .venv/bin/activate
2. Install llm-relay
# Default (SQLite, zero-config)
pip install llm-relay
# Extras are quoted so that shells such as zsh do not treat [...] as a glob pattern
# With proxy + web dashboard
pip install "llm-relay[proxy]"
# With PostgreSQL support (long-term analytics + vector search)
pip install "llm-relay[pg]"
# With MCP server (Python 3.10+)
pip install "llm-relay[mcp]"
# Everything
pip install "llm-relay[all]"
3. Choose database
| | SQLite (default) | PostgreSQL |
|---|---|---|
| Setup | Zero-config | Requires PG server |
| Best for | Getting started, light usage | Long-term data analytics, vector search |
| Install | pip install llm-relay | pip install llm-relay[pg] |
| Config | (none needed) | LLM_RELAY_DB=postgresql://user:pass@host/db |
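To make the Config row concrete, here is a minimal sketch of how a LLM_RELAY_DB value of the documented shape breaks down into its parts. It is not llm-relay's own code: the function name `describe_database` and the returned dictionary are hypothetical, and treating an unset variable as SQLite simply mirrors the "(none needed)" cell above.

```python
import os
from urllib.parse import urlsplit


def describe_database(env=None):
    """Summarize the backend implied by LLM_RELAY_DB (unset is treated as SQLite)."""
    env = os.environ if env is None else env
    url = env.get("LLM_RELAY_DB", "")
    if not url:
        return {"backend": "sqlite"}
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # The password is deliberately left out so the summary is safe to log.
    return {
        "backend": "postgresql",
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }
```

A quick check like this before running `llm-relay init` can catch a mistyped connection string early.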
4. Initialize
llm-relay init
Quick Start
One-command setup
llm-relay init # Auto-detect CLIs, configure proxy, start server
CLI commands
llm-relay scan # Session health check (7 detectors)
llm-relay doctor # Configuration health check (7 checks)
llm-relay recover # Extract session context for resumption
llm-relay serve # Start proxy server + web dashboard
llm-relay top # Live terminal monitor (btop-style)
llm-relay service install # Windows: background service + auto-start (no console window)
llm-relay service stop # Windows: stop background service
llm-relay service uninstall # Windows: remove service + cleanup
Web dashboard
# Native (Linux/macOS/Windows)
llm-relay serve --port 8080
Then open:
- /dashboard/: CLI status, cost, delegation history, Turn Monitor (alive sessions only; ?include_dead=1 to bypass)
- /display/: Turn counter with CC/Codex/Gemini session cards (alive filter: CC via cc_pid+TTY fallback, Codex/Gemini via fd-open; Windows uses mtime+process detection)
- /history/: Session conversation history browser
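As a small illustration of the paths above, the following sketch builds the URL for each page, including the `?include_dead=1` switch on the dashboard. The host `localhost` and the helper name `relay_url` are assumptions; the port matches the `--port 8080` example.

```python
from urllib.parse import urlencode, urlunsplit

PAGES = ("dashboard", "display", "history")  # the three paths listed above


def relay_url(page="dashboard", host="localhost", port=8080, include_dead=False):
    """Build the URL of one page served by `llm-relay serve`."""
    if page not in PAGES:
        raise ValueError(f"unknown page: {page!r}")
    if include_dead and page != "dashboard":
        # The source documents include_dead only for /dashboard/.
        raise ValueError("include_dead is only documented for /dashboard/")
    query = urlencode({"include_dead": 1}) if include_dead else ""
    return urlunsplit(("http", f"{host}:{port}", f"/{page}/", query, ""))
```

For example, `relay_url(include_dead=True)` yields a dashboard link that also shows sessions no longer alive.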
MCP server
llm-relay-mcp # stdio transport, 8 tools
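For readers new to MCP, the sketch below shows what a client writes to the server's stdin under the stdio transport: JSON-RPC 2.0 messages, one per line. It only builds the messages and does not start `llm-relay-mcp`. The helper name `jsonrpc_request` is hypothetical, and the empty `arguments` object for `relay_stats` is an assumption, since the tool's parameters are not documented here. A real session also starts with an `initialize` handshake, which is omitted.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def jsonrpc_request(method, params=None):
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # Compact separators keep the message on one line, as stdio framing requires.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Ask the server which tools it exposes, then call one of them by name.
list_tools = jsonrpc_request("tools/list")
call_stats = jsonrpc_request(
    "tools/call",
    {"name": "relay_stats", "arguments": {}},  # arguments assumed empty
)
```

In practice an MCP client library handles this framing; the sketch is only meant to show what "stdio transport" means on the wire.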
API proxy for Claude Code
# Set in Claude Code
llm-relay connect # Auto-configures Claude Code proxy
Agent-driven setup
If you would rather have your existing coding agent (Claude Code, Codex,
Gemini) run the install for you, point it at
docs/AGENT_SETUP.md. It is a structured playbook
the agent follows step by step, using llm-relay env-fingerprint and
llm-relay verify to probe and check each step without scraping output.
llm-relay env-fingerprint --format json # state snapshot
llm-relay verify install --format json # is the package usable?
llm-relay verify config --format json # is local state set up?
llm-relay verify integration --cli claude-code # is the CLI wired?
llm-relay verify all # everything at once
Exit code is 0 on pass/warn, 1 on fail.
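Because the exit code carries the result, a script or agent can gate on it without parsing any output. The wrapper below is a minimal sketch of that idea; the function name `run_verify` and the shape of the returned dictionary are hypothetical, and the output is passed through as opaque text because its format is not specified here.

```python
import subprocess


def run_verify(target="all", runner=subprocess.run):
    """Run `llm-relay verify <target>` and report whether it passed."""
    result = runner(
        ["llm-relay", "verify", target],
        capture_output=True,
        text=True,
    )
    # Per the rule above: 0 means pass or warn, 1 means fail.
    return {
        "ok": result.returncode == 0,
        "returncode": result.returncode,
        "output": result.stdout,
    }
```

The `runner` parameter exists only so the function can be exercised with a stand-in when llm-relay is not installed.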
CLI Status
| CLI | Status |
|---|---|
| Claude Code | Fully supported |
| OpenAI Codex | Fully supported |
| Gemini CLI | Display supported; oauth-personal has a known server-side 403 bug (#25425) |
Platform Support
| Platform | Mode | Notes |
|---|---|---|
| Linux | Native | Full feature set, systemd recommended |
| macOS | Native | Full feature set |
| Windows | Native | llm-relay service install for background daemon (no console window) |
Requirements
- Python >= 3.9
- MCP tools require Python >= 3.10
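The two version floors can be checked up front in a setup script. This is a minimal sketch; the feature labels `core` and `mcp` and the function name `supports` are made up for illustration.

```python
import sys

# Minimum Python versions taken from the requirements list above.
MIN_PYTHON = {"core": (3, 9), "mcp": (3, 10)}


def supports(feature, version=None):
    """Return True if the given (or running) interpreter meets the minimum."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    return version >= MIN_PYTHON[feature]
```

Calling `supports("mcp")` before installing the MCP extra avoids a failed install on Python 3.9.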
License
MIT
Ecosystem
Part of the QuartzUnit open-source ecosystem.