Unified LLM usage management — API proxy, session diagnostics, multi-CLI orchestration.
Reason this release was yanked:
Superseded by v0.9.1. Use: pip install llm-relay
llm-relay
Why
This project started from a need to escape deep vendor lock-in with a single AI coding tool. After investigating hidden behaviors in Claude Code — silent token inflation, false rate limits, context stripping, and opaque feature flags — it became clear that relying on one vendor's black box was a risk. llm-relay was built to take back visibility and control: monitor what's actually happening, diagnose problems independently, and orchestrate across multiple CLI tools (Claude Code, Codex, Gemini) so no single provider becomes a single point of failure.
Features
- Proxy: Transparent API proxy with cache/token monitoring and 12-strategy pruning
- Detect: 7 detectors (orphan, stuck, synthetic, bloat, cache, resume, microcompact)
- Recover: Session recovery and doctor (7 health checks)
- Guard: 4-tier threshold daemon with dual-zone classification
- Cost: Per-1% cost calculation and rate-limit header analysis
- Orch: Multi-CLI orchestration (Claude Code, Codex CLI, Gemini CLI)
- Display: Multi-CLI session monitor with context composition pie chart, connection type badges (SSH/tmux/tailscale/mosh), and provider liveness detection
- History: Proxy-level conversation capture with delta/full storage, compaction detection, and web replay viewer
- Composition: Real-time context window analysis — classifies content into 6 categories (user/assistant/tool_use/tool_result/thinking/system) with SNR metrics and duplicate read tracking
- Monitoring: Quota utilization (Q5h/Q7d), cache hit rate, error rate (2xx/4xx/5xx/429), TTL tier detection (1h/5m) — all surfaced from data already collected by the proxy
- TUI: llm-relay top, a btop-style terminal monitor with Rich Live (works over SSH, no browser needed)
- i18n: Browser locale detection with en/ko support; server-side override via LLM_RELAY_LANG
- MCP: 8 tools via stdio transport (cli_delegate, cli_status, cli_probe, orch_delegate, orch_history, relay_stats, session_turns, session_history)
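To make the error-rate breakdown from the Monitoring bullet (2xx/4xx/5xx/429) concrete, here is a minimal sketch. It is not llm-relay's own code: the function names `bucket_status` and `error_breakdown` are hypothetical, and the sketch assumes that 429 responses are counted in their own bucket rather than folded into 4xx.

```python
from collections import Counter
from typing import Dict, Iterable


def bucket_status(code: int) -> str:
    """Map an HTTP status code to one of the buckets 2xx/4xx/5xx/429/other."""
    if code == 429:
        # Assumption: rate-limit responses are tracked separately from other 4xx.
        return "429"
    if 200 <= code < 300:
        return "2xx"
    if 400 <= code < 500:
        return "4xx"
    if 500 <= code < 600:
        return "5xx"
    return "other"


def error_breakdown(codes: Iterable[int]) -> Dict[str, float]:
    """Return the fraction of responses (0..1) that fall into each bucket."""
    counts = Counter(bucket_status(c) for c in codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: n / total for bucket, n in counts.items()}
```

Keeping 429 apart from the other 4xx codes lets rate limiting be read separately from client errors such as 400 or 404.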
Quick Start (Docker — recommended)
Runs on Linux, macOS, and Windows with Docker. No Python or pip required on the host.
# 1. Download docker-compose.yml
curl -sL https://raw.githubusercontent.com/ArkNill/llm-relay/main/docker-compose.yml \
-o docker-compose.yml
# 2. Start the proxy
docker compose up -d
# 3. Open the dashboard
# http://localhost:8080/dashboard/
To route Claude Code through the proxy, add to ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8080"
}
}
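If ~/.claude/settings.json already holds other settings, the env entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge and of its reversal; the helper names `set_proxy_base_url` and `unset_proxy_base_url` are hypothetical and are not part of llm-relay.

```python
import json
from pathlib import Path

KEY = "ANTHROPIC_BASE_URL"


def _load(settings_path: Path) -> dict:
    """Read the settings file, treating a missing or empty file as {}."""
    if not settings_path.exists():
        return {}
    text = settings_path.read_text(encoding="utf-8")
    return json.loads(text) if text.strip() else {}


def _save(settings_path: Path, settings: dict) -> None:
    """Write the settings back as indented JSON, creating the directory if needed."""
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")


def set_proxy_base_url(settings_path: Path, base_url: str = "http://localhost:8080") -> dict:
    """Set env.ANTHROPIC_BASE_URL while keeping every other key untouched."""
    settings = _load(settings_path)
    settings.setdefault("env", {})[KEY] = base_url
    _save(settings_path, settings)
    return settings


def unset_proxy_base_url(settings_path: Path) -> dict:
    """Remove env.ANTHROPIC_BASE_URL so Claude Code stops using the proxy."""
    settings = _load(settings_path)
    settings.get("env", {}).pop(KEY, None)
    _save(settings_path, settings)
    return settings
```

A call such as `set_proxy_base_url(Path.home() / ".claude" / "settings.json")` would apply the change to the real file, so it is worth backing the file up first.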
Web pages:
- /dashboard/: CLI status, cost, quota, error rate, cache hit rate, Turn Monitor
- /display/: Turn counter with context composition, connection type badges
- /history/: Session conversation replay with compaction timeline
Clean removal
docker compose down -v # Stops container + removes data volume
rm docker-compose.yml # Remove compose file
No files are left on the host. To stop routing Claude Code, remove ANTHROPIC_BASE_URL from ~/.claude/settings.json.
Install (CLI only — optional)
For lightweight diagnostics without the proxy server:
pip install llm-relay # Core diagnostics
pip install llm-relay[cli] # With Rich TUI (llm-relay top)
pip install llm-relay[mcp] # MCP server (Python 3.10+)
llm-relay scan # Session health check (7 detectors)
llm-relay doctor # Configuration health check (7 checks)
llm-relay top # Live terminal monitor (btop-style TUI)
llm-relay init # Check Docker status + setup guide
MCP server
llm-relay-mcp # stdio transport, 8 tools
API Endpoints
All endpoints are served by the proxy at http://localhost:8080/api/v1/.
| Endpoint | Description |
|---|---|
| GET /api/v1/turns | Turn counts + token metrics + zone classification for active sessions |
| GET /api/v1/turns/{session_id} | Per-session metrics with cache hit rate and TTL tier |
| GET /api/v1/display | Session cards with prompts, terminal info, composition |
| GET /api/v1/quota | Anthropic Q5h/Q7d quota utilization and overage status |
| GET /api/v1/errors | Error rate breakdown (2xx/4xx/5xx/429) |
| GET /api/v1/cache | Cache hit rate (global or per-session) |
| GET /api/v1/ttl | Cache TTL tier detection (1h/5m/mixed) |
| GET /api/v1/health | CLI + proxy + orchestration DB health |
| GET /api/v1/cost | Cost breakdown by model |
| GET /api/v1/sessions | Proxy session summaries |
| GET /api/v1/cli/status | CLI installation and auth status |
| GET /api/v1/delegations | Multi-CLI delegation history |
| GET /api/v1/delegations/stats | Delegation aggregate statistics |
| GET /api/v1/history | Sessions with conversation history |
| GET /api/v1/history/{session_id} | Conversation turns for a session |
| GET /api/v1/history/{session_id}/compactions | Compaction events |
| GET /api/v1/history/{session_id}/composition | Per-turn context composition |
| GET /api/v1/i18n | Locale-specific UI messages |
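The endpoints above can be queried from a script with nothing but the standard library. The sketch below is illustrative only: the helper names `endpoint_url` and `get_json` are hypothetical, and it assumes the proxy answers with JSON bodies, which this page does not state explicitly.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8080/api/v1"


def endpoint_url(*parts: str, base_url: str = BASE_URL) -> str:
    """Join path segments onto the API base, percent-encoding each segment."""
    return "/".join([base_url.rstrip("/")] + [quote(p, safe="") for p in parts])


def get_json(*parts: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(*parts, base_url=base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the proxy running, `get_json("health")` would fetch /api/v1/health, and `get_json("history", session_id, "compactions")` would fetch the compaction events for one session. The field names in the responses are not documented here, so inspect the returned object before relying on any key.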
CLI Status
| CLI | Status |
|---|---|
| Claude Code | Fully supported |
| OpenAI Codex | Fully supported |
| Gemini CLI | Display supported, oauth-personal has known 403 server-side bug (#25425) |
Development
For local development, build the image locally instead of pulling it from GHCR:
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d --build
Requirements
- Docker (recommended) — Linux, macOS, or Windows with Docker Desktop
- Python >= 3.9 (CLI diagnostics only)
- MCP tools require Python >= 3.10
License
MIT
Ecosystem
Part of the QuartzUnit open-source ecosystem.
File details
Details for the file llm_relay-0.8.0.tar.gz.
File metadata
- Download URL: llm_relay-0.8.0.tar.gz
- Upload date:
- Size: 174.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c0e8fe05e3a6063b85de952251f4db79353d9cfa4cfef13e23c952c4d2e9c9c |
| MD5 | 8fb217ad6f7a8e17efb2d0ca6b01ea7d |
| BLAKE2b-256 | e2b6d5485a771c2a1c7b4222eccf822ca8b781bbbf0b5dfdb10d685c146b3eaf |
Provenance
The following attestation bundles were made for llm_relay-0.8.0.tar.gz:
Publisher: publish.yml on ArkNill/llm-relay
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_relay-0.8.0.tar.gz
- Subject digest: 7c0e8fe05e3a6063b85de952251f4db79353d9cfa4cfef13e23c952c4d2e9c9c
- Sigstore transparency entry: 1395969115
- Permalink: ArkNill/llm-relay@24855dda7eee6501073f1100379e65e9904bff31
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ArkNill
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@24855dda7eee6501073f1100379e65e9904bff31
- Trigger Event: workflow_dispatch
File details
Details for the file llm_relay-0.8.0-py3-none-any.whl.
File metadata
- Download URL: llm_relay-0.8.0-py3-none-any.whl
- Upload date:
- Size: 151.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e03809f0888fe5805280f7159344a570d15f05562f0da1dbd9b26284d11bd302 |
| MD5 | 6129e666d34ad0c12095f03c36b08a17 |
| BLAKE2b-256 | 0f0ab13c7044a92dbe474219ab99b772940beaf04f4f47593536a47f6b50d3b7 |
Provenance
The following attestation bundles were made for llm_relay-0.8.0-py3-none-any.whl:
Publisher: publish.yml on ArkNill/llm-relay
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_relay-0.8.0-py3-none-any.whl
- Subject digest: e03809f0888fe5805280f7159344a570d15f05562f0da1dbd9b26284d11bd302
- Sigstore transparency entry: 1395969147
- Permalink: ArkNill/llm-relay@24855dda7eee6501073f1100379e65e9904bff31
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ArkNill
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@24855dda7eee6501073f1100379e65e9904bff31
- Trigger Event: workflow_dispatch