
Unified LLM usage management — API proxy, session diagnostics, multi-CLI orchestration.

Reason this release was yanked:

Superseded by v0.9.1. Use: pip install llm-relay

Project description

llm-relay


Korean | llms.txt

Why

This project started from a need to escape deep vendor lock-in with a single AI coding tool. After investigating hidden behaviors in Claude Code — silent token inflation, false rate limits, context stripping, and opaque feature flags — it became clear that relying on one vendor's black box was a risk. llm-relay was built to take back visibility and control: monitor what's actually happening, diagnose problems independently, and orchestrate across multiple CLI tools (Claude Code, Codex, Gemini) so no single provider becomes a single point of failure.

Features

  • Proxy: Transparent API proxy with cache/token monitoring and 12-strategy pruning
  • Detect: 7 detectors (orphan, stuck, synthetic, bloat, cache, resume, microcompact)
  • Recover: Session recovery and doctor (7 health checks)
  • Guard: 4-tier threshold daemon with dual-zone classification
  • Cost: Per-1% cost calculation and rate-limit header analysis
  • Orch: Multi-CLI orchestration (Claude Code, Codex CLI, Gemini CLI)
  • Display: Multi-CLI session monitor with context composition pie chart, connection type badges (SSH/tmux/tailscale/mosh), and provider liveness detection
  • History: Proxy-level conversation capture with delta/full storage, compaction detection, and web replay viewer
  • Composition: Real-time context window analysis — classifies content into 6 categories (user/assistant/tool_use/tool_result/thinking/system) with SNR metrics and duplicate read tracking
  • Monitoring: Quota utilization (Q5h/Q7d), cache hit rate, error rate (2xx/4xx/5xx/429), TTL tier detection (1h/5m) — all surfaced from data already collected by the proxy
  • TUI: llm-relay top — btop-style terminal monitor with Rich Live (works over SSH, no browser needed)
  • i18n: Browser locale detection with en/ko support; server-side override via LLM_RELAY_LANG
  • MCP: 8 tools via stdio transport (cli_delegate, cli_status, cli_probe, orch_delegate, orch_history, relay_stats, session_turns, session_history)
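The Monitoring bullet names three derived metrics: cache hit rate, error-rate buckets (2xx/4xx/5xx/429), and the 1h/5m cache TTL tier. The sketch below shows one plausible way to compute them from Anthropic Messages API `usage` objects and HTTP status codes. It is not llm-relay's implementation: the hit-rate formula, counting 429 separately from 4xx, and the helper names are assumptions. It assumes the API's `usage` fields `input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, and the `cache_creation` breakdown with `ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens`.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def cache_hit_rate(usages: Iterable[Mapping[str, Any]]) -> float:
    """Fraction of prompt tokens read from cache (assumed definition:
    cache reads / (cache reads + cache writes + uncached input))."""
    read = written = uncached = 0
    for usage in usages:
        read += usage.get("cache_read_input_tokens", 0)
        written += usage.get("cache_creation_input_tokens", 0)
        uncached += usage.get("input_tokens", 0)
    total = read + written + uncached
    return read / total if total else 0.0


def status_buckets(codes: Iterable[int]) -> dict:
    """Bucket HTTP status codes; 429 is kept out of 4xx (an assumption)."""
    counts: Counter = Counter()
    for code in codes:
        if code == 429:
            counts["429"] += 1
        elif 200 <= code < 300:
            counts["2xx"] += 1
        elif 400 <= code < 500:
            counts["4xx"] += 1
        elif 500 <= code < 600:
            counts["5xx"] += 1
        else:
            counts["other"] += 1
    return dict(counts)


def ttl_tier(usage: Mapping[str, Any]) -> str:
    """Classify one response's cache writes by TTL via usage.cache_creation."""
    breakdown = usage.get("cache_creation") or {}
    has_1h = breakdown.get("ephemeral_1h_input_tokens", 0) > 0
    has_5m = breakdown.get("ephemeral_5m_input_tokens", 0) > 0
    if has_1h and has_5m:
        return "mixed"
    if has_1h:
        return "1h"
    if has_5m:
        return "5m"
    return "none"  # this response wrote nothing to the cache
```

For a single response with 10 uncached, 20 cache-written, and 70 cache-read input tokens, `cache_hit_rate` returns 0.7.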

Quick Start (Docker — recommended)

Runs on Linux, macOS, and Windows with Docker. No Python or pip required on the host.

# 1. Download docker-compose.yml
curl -sL https://raw.githubusercontent.com/ArkNill/llm-relay/main/docker-compose.yml -o docker-compose.yml

# 2. Start the proxy
docker compose up -d

# 3. Open the dashboard
#    http://localhost:8080/dashboard/

To route Claude Code through the proxy, add to ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8080"
  }
}
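If you prefer to script this step, the sketch below merges `ANTHROPIC_BASE_URL` into an existing `~/.claude/settings.json` without discarding other keys, and can remove it again (the un-routing step described under Clean removal). It assumes the file is plain JSON; `enable_proxy` and `disable_proxy` are hypothetical helpers, not part of llm-relay.

```python
import json
from pathlib import Path

# Path and proxy URL taken from the README; adjust if you changed the port.
SETTINGS_PATH = Path.home() / ".claude" / "settings.json"
PROXY_URL = "http://localhost:8080"


def _load(path: Path) -> dict:
    """Read settings as JSON; a missing or empty file counts as {}."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    return json.loads(text) if text.strip() else {}


def _save(path: Path, data: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


def enable_proxy(path: Path = SETTINGS_PATH, url: str = PROXY_URL) -> None:
    """Set env.ANTHROPIC_BASE_URL, preserving every other setting."""
    data = _load(path)
    data.setdefault("env", {})["ANTHROPIC_BASE_URL"] = url
    _save(path, data)


def disable_proxy(path: Path = SETTINGS_PATH) -> None:
    """Remove env.ANTHROPIC_BASE_URL and drop "env" if it ends up empty."""
    if not path.exists():
        return  # nothing to undo
    data = _load(path)
    env = data.get("env", {})
    env.pop("ANTHROPIC_BASE_URL", None)
    if not env:
        data.pop("env", None)
    _save(path, data)
```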

Web pages:

  • /dashboard/ — CLI status, cost, quota, error rate, cache hit rate, Turn Monitor
  • /display/ — Turn counter with context composition, connection type badges
  • /history/ — Session conversation replay with compaction timeline

Clean removal

docker compose down -v    # Stops container + removes data volume
rm docker-compose.yml     # Remove compose file

No files are left on the host. To stop routing Claude Code, remove ANTHROPIC_BASE_URL from ~/.claude/settings.json.

Install (CLI only — optional)

For lightweight diagnostics without the proxy server:

pip install llm-relay          # Core diagnostics
pip install "llm-relay[cli]"   # With Rich TUI (llm-relay top); quotes keep zsh from globbing []
pip install "llm-relay[mcp]"   # MCP server (Python 3.10+)
llm-relay scan                 # Session health check (7 detectors)
llm-relay doctor               # Configuration health check (7 checks)
llm-relay top                  # Live terminal monitor (btop-style TUI)
llm-relay init                 # Check Docker status + setup guide

MCP server

llm-relay-mcp                  # stdio transport, 8 tools
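An MCP client normally launches `llm-relay-mcp` for you, but the stdio transport is simple enough to probe by hand: newline-delimited JSON-RPC 2.0 messages on stdin/stdout. The sketch below performs the MCP handshake (`initialize`, then the `notifications/initialized` notification) and calls `tools/list`. The protocol version string and client name are assumptions; `rpc`, `wait_for`, and `list_tools` are hypothetical helpers.

```python
import json
import subprocess
from typing import Iterable, Sequence


def rpc(method: str, msg_id=None, params=None) -> str:
    """Encode one JSON-RPC 2.0 line; omitting msg_id makes it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def wait_for(lines: Iterable[str], msg_id: int) -> dict:
    """Return the first message carrying msg_id, skipping anything else."""
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("id") == msg_id:
            return msg
    raise RuntimeError(f"server closed stdout before answering id {msg_id}")


def list_tools(argv: Sequence[str] = ("llm-relay-mcp",)) -> list:
    init_params = {
        "protocolVersion": "2024-11-05",  # assumed; the server may negotiate another
        "capabilities": {},
        "clientInfo": {"name": "tool-lister", "version": "0.1"},  # hypothetical
    }
    with subprocess.Popen(list(argv), stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True) as proc:
        lines = iter(proc.stdout.readline, "")
        proc.stdin.write(rpc("initialize", 1, init_params))
        proc.stdin.flush()
        wait_for(lines, 1)  # must see the initialize result first
        proc.stdin.write(rpc("notifications/initialized"))
        proc.stdin.write(rpc("tools/list", 2))
        proc.stdin.flush()
        reply = wait_for(lines, 2)
        proc.stdin.close()  # closing stdin signals a stdio server to exit
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.terminate()  # fallback if the server ignores EOF
    return [tool["name"] for tool in reply["result"]["tools"]]
```

With `llm-relay[mcp]` installed, `list_tools()` should return the eight tool names listed under Features. The sketch ignores `tools/list` pagination (`nextCursor`), which is unlikely to matter for eight tools.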

API Endpoints

All endpoints are served by the proxy at http://localhost:8080/api/v1/.

Endpoint | Description
--- | ---
GET /api/v1/turns | Turn counts + token metrics + zone classification for active sessions
GET /api/v1/turns/{session_id} | Per-session metrics with cache hit rate and TTL tier
GET /api/v1/display | Session cards with prompts, terminal info, composition
GET /api/v1/quota | Anthropic Q5h/Q7d quota utilization and overage status
GET /api/v1/errors | Error rate breakdown (2xx/4xx/5xx/429)
GET /api/v1/cache | Cache hit rate (global or per-session)
GET /api/v1/ttl | Cache TTL tier detection (1h/5m/mixed)
GET /api/v1/health | CLI + proxy + orchestration DB health
GET /api/v1/cost | Cost breakdown by model
GET /api/v1/sessions | Proxy session summaries
GET /api/v1/cli/status | CLI installation and auth status
GET /api/v1/delegations | Multi-CLI delegation history
GET /api/v1/delegations/stats | Delegation aggregate statistics
GET /api/v1/history | Sessions with conversation history
GET /api/v1/history/{session_id} | Conversation turns for a session
GET /api/v1/history/{session_id}/compactions | Compaction events
GET /api/v1/history/{session_id}/composition | Per-turn context composition
GET /api/v1/i18n | Locale-specific UI messages
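A minimal standard-library client for these endpoints might look like the sketch below. It assumes each endpoint returns a JSON body; the response schemas are not documented in this README, so it returns the decoded JSON as-is rather than guessing field names. `endpoint` and `get_json` are hypothetical helper names, and session IDs are percent-encoded in case they contain reserved characters.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

# Default base URL from the Quick Start (proxy on localhost:8080).
BASE_URL = "http://localhost:8080/api/v1"


def endpoint(*parts: str, base: str = BASE_URL) -> str:
    """Join path segments onto the base URL, percent-encoding each one."""
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def get_json(*parts: str, base: str = BASE_URL, timeout: float = 5.0) -> Any:
    """GET an endpoint and decode its body as JSON (urlopen raises on HTTP errors)."""
    with urlopen(endpoint(*parts, base=base), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("health")` requests GET /api/v1/health, and `get_json("history", session_id, "compactions")` requests that session's compaction events.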

CLI Status

CLI | Status
--- | ---
Claude Code | Fully supported
OpenAI Codex | Fully supported
Gemini CLI | Display supported; with oauth-personal authentication, a known server-side bug returns 403 (#25425)

Development

For local development, build the image locally instead of pulling it from GHCR:

docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d --build

Requirements

  • Docker (recommended) — Linux, macOS, or Windows with Docker Desktop
  • Python >= 3.9 (CLI diagnostics only)
  • MCP tools require Python >= 3.10

License

MIT

Ecosystem

Part of the QuartzUnit open-source ecosystem.


Download files


Source Distribution

llm_relay-0.8.2.tar.gz (175.3 kB view details)

Uploaded Source

Built Distribution


llm_relay-0.8.2-py3-none-any.whl (152.7 kB view details)

Uploaded Python 3

File details

Details for the file llm_relay-0.8.2.tar.gz.

File metadata

  • Download URL: llm_relay-0.8.2.tar.gz
  • Upload date:
  • Size: 175.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_relay-0.8.2.tar.gz

Algorithm | Hash digest
--- | ---
SHA256 | 69eaaac4c99a44ed43c3a531f9fddabc6273dc54f1a1b9e8158a7a8536c3409f
MD5 | f5ee433eede452dab7c68c81fffd01bb
BLAKE2b-256 | c3690ddb04e0c940701f0c0addc9c688b7be969cd78839c437b86ce0e6731de2
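To check a downloaded file against the SHA256 digests listed for this release, a streaming hash with Python's standard `hashlib` is enough. `sha256_file` and `matches` are hypothetical helper names; reading in chunks keeps memory use flat regardless of file size.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in 64 KiB chunks and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare against a published hex digest, ignoring case and whitespace."""
    return sha256_file(path) == expected_hex.strip().lower()
```

For the source distribution, `matches(Path("llm_relay-0.8.2.tar.gz"), "69eaaac4c99a44ed43c3a531f9fddabc6273dc54f1a1b9e8158a7a8536c3409f")` should return True for an unmodified download.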


Provenance

The following attestation bundles were made for llm_relay-0.8.2.tar.gz:

Publisher: publish.yml on ArkNill/llm-relay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llm_relay-0.8.2-py3-none-any.whl.

File metadata

  • Download URL: llm_relay-0.8.2-py3-none-any.whl
  • Upload date:
  • Size: 152.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_relay-0.8.2-py3-none-any.whl

Algorithm | Hash digest
--- | ---
SHA256 | c3a1a1a06602969828c385b97e2c46715fad51595bc1d18d431f0f78a135c9a8
MD5 | 70008be41992b94459cad7420e45cdca
BLAKE2b-256 | 00fbbafcf24b9b9205e7e8344f3affa8f3a18ee6452d0a441db6c5d15c101e69


Provenance

The following attestation bundles were made for llm_relay-0.8.2-py3-none-any.whl:

Publisher: publish.yml on ArkNill/llm-relay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
