
Relay

The fastest, lightest BYOK relay for any LLM, open source.

CI Apache-2.0 Python 3.10+ Sponsor

A Python library that gives you one interface to every major LLM — chat, streaming, tool calls, structured output, batch, MCP — defined in a YAML file you check into your repo. Production-grade, enterprise-ready, OSS.

Faster than LiteLLM at every percentile, with 8.5× faster cold start (benchmarks).

pip install ai5labs-relay

from relay import Hub

async with Hub.from_yaml("models.yaml") as hub:
    resp = await hub.chat(
        "fast-cheap",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(resp.text)
    print(resp.cost_usd, resp.cost.source)
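
README snippets use top-level await for brevity. The same call as a standalone script, using only the API shown above:

import asyncio

from relay import Hub

async def main() -> None:
    # Load the model catalog defined in models.yaml (see Quickstart below).
    async with Hub.from_yaml("models.yaml") as hub:
        resp = await hub.chat(
            "fast-cheap",
            messages=[{"role": "user", "content": "What is 2+2?"}],
        )
        print(resp.text)

asyncio.run(main())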

Why Relay

How Relay compares with LiteLLM and LangChain:

  • YAML model catalog
  • Built-in pricing snapshot with provenance (partial elsewhere)
  • Live pricing (Bedrock, Azure, OpenRouter)
  • Tool-call streaming deltas keyed by index, not id (LiteLLM: bug #20711; LangChain: n/a)
  • MCP universal tool layer (any MCP server → any provider)
  • Cross-provider tool-schema compiler with Mastra-style fallback
  • Pydantic structured output, compiled per-provider rather than text-coerced (partial elsewhere)
  • Hub-level cache + Anthropic prompt-cache passthrough (partial elsewhere)
  • Circuit breakers with cooldown + half-open probes
  • OpenTelemetry GenAI semantic conventions (opt-in)
  • Reasoning budget unification across OpenAI/Anthropic/Gemini
  • OpenAI Responses API opt-in (alongside Chat Completions)
  • Batch API wrapper (OpenAI Batch + Anthropic Message Batches, ~50% off)
  • Native Bedrock / Azure / Gemini / Vertex / Cohere adapters (LiteLLM: OpenAI-compat shims; LangChain: partial)
  • PII redaction pipeline (regex + Presidio hooks)
  • Audit logging (OTel-aligned schema, pluggable sinks; enterprise SKU elsewhere)
  • Pre/post guardrails (max-input, blocked-keywords, pluggable; enterprise SKU elsewhere)
  • Anthropic thinking blocks preserved (flattened in both LiteLLM and LangChain)
  • Typed errors (rate-limit / context-window / content-policy distinct; partial elsewhere)
  • mypy --strict clean
  • Apache-2.0 with explicit patent grant (LiteLLM and LangChain: MIT)

Quickstart

1. Define your models

Create models.yaml:

# yaml-language-server: $schema=https://relay.ai5labs.com/schema/v1.json
version: 1

models:
  fast-cheap:
    target: groq/llama-3.3-70b-versatile
    credential: $env.GROQ_API_KEY

  smart:
    target: anthropic/claude-sonnet-4-5
    credential: $env.ANTHROPIC_API_KEY
    params:
      max_tokens: 4096

  cheap-vision:
    target: openai/gpt-4o-mini
    credential: $env.OPENAI_API_KEY

groups:
  default:
    strategy: fallback
    members: [smart, fast-cheap]    # try smart first, fall back to fast-cheap

Then point your editor at the schema URL on line 1 — the Red Hat YAML extension for VS Code will give you autocomplete and inline validation while editing.

2. Use it

from relay import Hub

async with Hub.from_yaml("models.yaml") as hub:
    # Single model
    resp = await hub.chat("fast-cheap", messages=[
        {"role": "user", "content": "Hello"}
    ])

    # Group with fallback
    resp = await hub.chat("default", messages=[...])

    # Streaming
    async for ev in hub.stream("smart", messages=[...]):
        if ev.type == "text_delta":
            print(ev.text, end="", flush=True)
        elif ev.type == "thinking_delta":     # Anthropic extended thinking
            ...
        elif ev.type == "end":
            print(f"\nDone in {ev.response.latency_ms:.0f}ms, "
                  f"${ev.response.cost_usd:.4f}")

    # Bound handle for hot loops
    model = hub.get("fast-cheap")
    for prompt in prompts:
        resp = await model.chat(messages=[{"role": "user", "content": prompt}])

3. CLI

relay schema --out relay.schema.json     # JSON Schema for editors / docs
relay validate models.yaml               # validate config
relay models list                        # list configured aliases
relay models inspect smart               # show one alias's full config + catalog row
relay models compare sonnet 4o flash     # side-by-side: price, speed, MMLU, GPQA, HumanEval...
relay models recommend --task code --budget cheap --needs tools  # which model for the job?
relay catalog list --provider anthropic  # browse the built-in catalog
relay providers                          # list all supported providers

Supported providers

OpenAI-compatible (one adapter): OpenAI, Groq, Together, DeepSeek, xAI, Mistral, Fireworks, Perplexity, OpenRouter, Ollama, vLLM, LM Studio.

Native (proper, lossless adapters): Anthropic.

Scaffolded for v0.2: Azure OpenAI, Google Gemini direct, Vertex AI, AWS Bedrock, Cohere.

Pricing & cost tracking

Every response carries a Cost object with full provenance:

resp.cost.total_usd        # 0.00234
resp.cost.source           # "live_api" | "snapshot" | "user_override" | "estimated" | "unknown"
resp.cost.confidence       # "exact" | "list_price" | "estimated" | "unknown"
resp.cost.fetched_at       # ISO 8601 timestamp (when fetched live)
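
Because provenance is explicit, callers can act on it rather than trust a single number. A minimal sketch using only the fields above:

resp = await hub.chat("smart", messages=[{"role": "user", "content": "Summarize."}])

if resp.cost.source == "unknown":
    # No price found at any tier: the total is None, never a silent guess.
    print("cost unknown; consider a cost override or pricing_profile")
elif resp.cost.confidence != "exact":
    # Snapshot / list-price figures are estimates, not invoiced amounts.
    print(f"~${resp.cost.total_usd:.4f} ({resp.cost.source}, {resp.cost.confidence})")
else:
    print(f"${resp.cost.total_usd:.4f}")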

Tier order (first match wins):

  1. User override — explicit cost: block on a model entry, or a pricing_profile.
  2. Live APIs (cached 6h in-process):
    • AWS Pricing API for Bedrock
    • Azure Retail Prices API for Azure OpenAI
    • OpenRouter /api/v1/models for ~400 models from OpenAI, Anthropic, Google, Groq, etc. at list price
  3. Snapshot — JSON shipped with each release, regenerated weekly via CI.
  4. Unknown: cost_usd = None, never wrong-by-default.

Negotiated rates

No public API exposes enterprise discounts (AWS EDP, Azure committed-use, OpenAI custom tiers). Configure them yourself:

pricing_profiles:
  acme-aws-prod:
    description: "15% EDP discount"
    input_multiplier: 0.85
    output_multiplier: 0.85

  openai-team-tier:
    fixed_overrides:
      openai/gpt-4o:
        input_per_1m: 1.25
        output_per_1m: 5.00

models:
  bedrock-sonnet:
    target: bedrock/anthropic.claude-sonnet-4-5-20250929-v1:0
    region: us-east-1
    credential: { type: aws_profile, profile: prod }
    pricing_profile: acme-aws-prod
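
When a profile is attached, the override wins at tier 1 of the resolution order above, and the provenance fields say so. A sketch; that a multiplier profile reports its source as "user_override" is an assumption based on the tier list:

resp = await hub.chat(
    "bedrock-sonnet",
    messages=[{"role": "user", "content": "ping"}],
)
# Per the tier order, the profile should take precedence over live/snapshot
# prices; expected source is "user_override" (assumption, see lead-in).
print(resp.cost.source, resp.cost.total_usd)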

Production-grade design

  • Connection pooling: one httpx.AsyncClient per (provider, base_url), HTTP/2 enabled, keep-alive tuned for streaming workloads.
  • Lazy SDK imports: boto3 and other heavy deps only load when their first call happens.
  • Streaming hot path uses orjson and dicts — no Pydantic validation per-token. Pydantic only runs on the final assembled response.
  • Tool-call delta merging keyed by index, not id. (LiteLLM keys by id and drops ~90% of argument deltas — issue #20711.)
  • Provider-specific blocks preserved: Anthropic thinking, Gemini grounding, citations — emitted as typed events, not flattened.
  • Classified errors: RateLimitError, ContextWindowError, ContentPolicyError, and AuthenticationError are distinct types, so fallback vs retry vs fail-fast can be decided automatically (see the sketch after this list).
  • OpenTelemetry GenAI semantic conventions (opt-in): emits gen_ai.* spans + metrics that Datadog, Honeycomb, Langfuse, and Arize all consume.
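
A minimal fallback-vs-retry sketch for the classified errors; the exact import path of the error types is an assumption (only the class names are documented above):

import asyncio

# Assumed import location; the README names the classes but not the module.
from relay import ContentPolicyError, ContextWindowError, Hub, RateLimitError

async def ask(hub: Hub, prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    try:
        resp = await hub.chat("smart", messages=messages)
    except RateLimitError:
        # Transient: brief backoff, then retry the same alias once.
        await asyncio.sleep(1.0)
        resp = await hub.chat("smart", messages=messages)
    except ContextWindowError:
        # Permanent for this input/model pair: fall back to another alias.
        resp = await hub.chat("fast-cheap", messages=messages)
    except ContentPolicyError:
        # Fail fast: a retry will not change the outcome.
        raise
    return resp.text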

Security

  • Keys never inline in YAML — credentials are reified objects (env var, AWS Secrets Manager, GCP Secret Manager, Vault).
  • Library, not a hosted proxy by default. Your API keys stay in your process. (Compare: the LiteLLM proxy PyPI compromise of March 2026 leaked keys from every centralized deployment.)
  • Releases will be Sigstore-signed via OIDC Trusted Publishing.
  • See SECURITY.md for vulnerability reporting.

Status

v0.1 (alpha) — chat + streaming + tool calls + cost tracking + the OpenAI-compatible adapter (12 providers) + native Anthropic. Bedrock / Azure / Vertex / direct Gemini / Cohere are scaffolded and slated for v0.2.

The public API surface is stable; everything under _internal/ and other _-prefixed modules is not.

Development

uv sync --all-groups
uv run pytest
uv run ruff check
uv run mypy
uv run pyright

Contributing

See CONTRIBUTING.md. Please read CODE_OF_CONDUCT.md before opening a PR.

Support & sponsorship

Relay is free, Apache-2.0, and actively maintained by ai5labs Research OPC Pvt Ltd. If your team uses it in production, please consider sponsoring or becoming a design partner. See SUPPORT.md for full details.

Sponsors

This space is for our sponsors — be the first.

Design partners

This space is for our design partners — currently accepting our first cohort.

License

Apache-2.0. See LICENSE. Copyright © 2026 ai5labs Research OPC Pvt Ltd.
