Relay — production-grade multi-provider LLM client. One YAML, one interface, every model.

These details have not been verified by PyPI


Relay

The fastest, lightest open-source BYOK relay for any LLM.

A Python library that gives you one interface to every major LLM — chat, streaming, tool calls, structured output, batch, MCP — defined in a YAML file you check into your repo. Production-grade, enterprise-ready, OSS.

~5–19× faster cold start than LiteLLM, ~20% faster streaming TTFT, and tied at the median on chat overhead with more consistent tails (reproducible benchmarks).

pip install ai5labs-relay

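import asyncio

from relay import Hub


async def main() -> None:
    async with Hub.from_yaml("models.yaml") as hub:
        resp = await hub.chat(
            "fast-cheap",
            messages=[{"role": "user", "content": "What is 2+2?"}],
        )
        print(resp.text)
        print(resp.cost_usd, resp.cost.source)


# `async with` only works inside a coroutine, so run everything through asyncio.
asyncio.run(main())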

Why Relay

Feature	LiteLLM	LangChain	Relay
YAML model catalog	✓	—	✓
Built-in pricing snapshot with provenance	partial	—	✓
Live pricing (Bedrock, Azure, OpenRouter)	—	—	✓
Tool-call streaming deltas keyed by `index` (not `id`)	bug (#20711)	n/a	✓
MCP universal tool layer (any MCP server → any provider)	—	—	✓
Cross-provider tool-schema compiler with Mastra-style fallback	—	—	✓
Pydantic structured output (compiles per-provider, not text-coerced)	—	partial	✓
Hub-level cache + Anthropic prompt-cache passthrough	partial	—	✓
Circuit breakers with cooldown + half-open probes	—	—	✓
OpenTelemetry GenAI semantic conventions (opt-in)	—	—	✓
Reasoning budget unification across OpenAI/Anthropic/Gemini	—	—	✓
OpenAI Responses API opt-in (alongside Chat Completions)	—	—	✓
Batch API wrapper (OpenAI Batch + Anthropic Message Batches, ~50% off)	—	—	✓
Native Bedrock / Azure / Gemini / Vertex / Cohere adapters	OpenAI-compat shims	partial	✓ native
PII redaction pipeline (regex + Presidio hooks)	—	—	✓
Audit logging (OTel-aligned schema, pluggable sinks)	enterprise SKU	—	✓
Pre/post guardrails (max-input, blocked-keywords, plugin-able)	enterprise SKU	—	✓
Anthropic `thinking` blocks preserved	flattened	flattened	✓
Typed errors (rate-limit / context-window / content-policy distinct)	partial	—	✓
`mypy --strict` (3 codes opted-out, see `pyproject.toml`)	—	—	✓
Apache-2.0 with explicit patent grant	MIT	MIT	✓

Quickstart

1. Define your models

Create models.yaml:

# yaml-language-server: $schema=./relay.schema.json
# (generate the schema file once with: `relay schema --out relay.schema.json`)
version: 1

models:
  fast-cheap:
    target: groq/llama-3.3-70b-versatile
    credential: $env.GROQ_API_KEY

  smart:
    target: anthropic/claude-sonnet-4-5
    credential: $env.ANTHROPIC_API_KEY
    params:
      max_tokens: 4096

  cheap-vision:
    target: openai/gpt-4o-mini
    credential: $env.OPENAI_API_KEY

groups:
  default:
    strategy: fallback
    members: [smart, fast-cheap]    # try smart first, fall back to fast-cheap

The first line of the file is a yaml-language-server comment that points at the generated schema file. With it in place, the Red Hat YAML extension for VS Code provides autocomplete and inline validation while you edit.
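The fallback strategy used by the default group can be pictured as trying members in order and moving on when one fails. The sketch below is not Relay's implementation: call_with_fallback and the two stub coroutines are hypothetical names used only to illustrate the idea.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelCall = Callable[[str], Awaitable[str]]


async def call_with_fallback(
    members: list[tuple[str, ModelCall]], prompt: str
) -> tuple[str, str]:
    """Try each (alias, call) in order; return (alias, text) from the first success."""
    errors: list[str] = []
    for alias, call in members:
        try:
            return alias, await call(prompt)
        except Exception as exc:  # a real client would fall back only on selected error types
            errors.append(f"{alias}: {exc}")
    raise RuntimeError("all group members failed: " + "; ".join(errors))


async def smart(prompt: str) -> str:
    # Simulate a failing primary model.
    raise TimeoutError("upstream timed out")


async def fast_cheap(prompt: str) -> str:
    # Simulate a healthy secondary model.
    return f"echo: {prompt}"


result = asyncio.run(
    call_with_fallback([("smart", smart), ("fast-cheap", fast_cheap)], "Hello")
)
print(result)
```

Because smart raises, the call is answered by fast-cheap, which mirrors the members: [smart, fast-cheap] ordering in the YAML.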

2. Use it

import asyncio

from relay import Hub


async def main() -> None:
    async with Hub.from_yaml("models.yaml") as hub:
        # Single model
        resp = await hub.chat("fast-cheap", messages=[
            {"role": "user", "content": "Hello"}
        ])

        # Group with fallback
        resp = await hub.chat("default", messages=[...])

        # Streaming
        async for ev in hub.stream("smart", messages=[...]):
            if ev.type == "text_delta":
                print(ev.text, end="", flush=True)
            elif ev.type == "thinking_delta":     # Anthropic extended thinking
                ...
            elif ev.type == "end":
                print(f"\nDone in {ev.response.latency_ms:.0f}ms, "
                      f"${ev.response.cost_usd:.4f}")

        # Bound handle for hot loops
        model = hub.get("fast-cheap")
        prompts = ["Hello", "What is 2+2?"]   # example inputs
        for prompt in prompts:
            resp = await model.chat(messages=[{"role": "user", "content": prompt}])


# `async with` only works inside a coroutine, so run everything through asyncio.
asyncio.run(main())

3. CLI

relay schema --out relay.schema.json     # JSON Schema for editors / docs
relay validate models.yaml               # validate config
relay models list                        # list configured aliases
relay models inspect smart               # show one alias's full config + catalog row
relay models compare sonnet 4o flash     # side-by-side: price, speed, MMLU, GPQA, HumanEval...
relay models recommend --task code --budget cheap --needs tools  # which model for the job?
relay catalog list --provider anthropic  # browse the built-in catalog
relay providers                          # list all supported providers

Supported providers

OpenAI-compatible (one adapter): OpenAI, Groq, Together, DeepSeek, xAI, Mistral, Fireworks, Perplexity, OpenRouter, Ollama, vLLM, LM Studio.

Native (proper, lossless adapters): Anthropic, Azure OpenAI, AWS Bedrock, Cohere, Google Gemini direct, Vertex AI.

Routing

relay.routing is the public extension point for picking a model per call. Two implementations ship with v0.2:

RuleBasedRouter — deterministic, constraint-driven, in-process. Same scoring logic as relay models recommend, free.
SemanticRouter — HTTP client for the hosted semantic router (paid, optional). Wire protocol documented in docs/routing/api-spec.md.

Attach a router and call chat_routed instead of chat — Relay picks the alias, falls back through alternates on error, and stamps the decision onto response.metadata["routing"]. Custom routers satisfying the Router Protocol are accepted. See docs/routing/usage.md for examples.
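As a rough illustration of what "satisfying the Router Protocol" could look like, here is a minimal structural-typing sketch. The Protocol shape below (a single route method returning aliases in preference order) is an assumption made for this example; the real signature is defined in relay.routing and may differ. LengthRouter is likewise a hypothetical toy rule.

```python
from typing import Protocol, runtime_checkable

Message = dict[str, str]


@runtime_checkable
class Router(Protocol):
    """Assumed shape for illustration; Relay's actual Router Protocol may differ."""

    def route(self, messages: list[Message]) -> list[str]:
        """Return aliases in preference order: primary first, then alternates."""
        ...


class LengthRouter:
    """Toy rule: short prompts prefer the cheap alias, long prompts the smart one."""

    def __init__(self, threshold_chars: int = 200) -> None:
        self.threshold_chars = threshold_chars

    def route(self, messages: list[Message]) -> list[str]:
        total = sum(len(m["content"]) for m in messages)
        if total <= self.threshold_chars:
            return ["fast-cheap", "smart"]
        return ["smart", "fast-cheap"]


router = LengthRouter()
short_choice = router.route([{"role": "user", "content": "What is 2+2?"}])
long_choice = router.route([{"role": "user", "content": "x" * 500}])
print(short_choice, long_choice)
```

LengthRouter never inherits from Router; it qualifies purely by having a matching method, which is the point of using a Protocol as an extension point.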

Pricing & cost tracking

Every response carries a Cost object with full provenance:

resp.cost.total_usd        # 0.00234
resp.cost.source           # "live_api" | "snapshot" | "user_override" | "estimated" | "unknown"
resp.cost.confidence       # "exact" | "list_price" | "estimated" | "unknown"
resp.cost.fetched_at       # ISO 8601 timestamp (when fetched live)

Tier order (first match wins):

1. User override: an explicit cost: block on a model entry, or a pricing_profile.
2. Live APIs (cached 6h in-process):
   - AWS Pricing API for Bedrock
   - Azure Retail Prices API for Azure OpenAI
   - OpenRouter /api/v1/models for ~400 models from OpenAI, Anthropic, Google, Groq, etc. at list price
3. Snapshot: JSON shipped with each release, regenerated weekly via CI.
4. Unknown: cost_usd = None, never wrong-by-default.
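The "first match wins" tier order can be sketched as a simple chain of lookups. This is an illustration under assumptions, not Relay's code: Price, resolve_price, and the dictionaries are hypothetical, and the numbers are placeholder data rather than real prices.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Price:
    input_per_1m: float
    output_per_1m: float
    source: str  # "user_override" | "live_api" | "snapshot"


# Each tier maps a target string to a Price, or returns None when it has no entry.
Tier = Callable[[str], Optional[Price]]


def resolve_price(target: str, tiers: list[Tier]) -> Optional[Price]:
    """Walk the tiers in priority order; the first non-None answer wins."""
    for tier in tiers:
        price = tier(target)
        if price is not None:
            return price
    return None  # unknown: report no cost rather than guess


# Placeholder data for illustration only.
overrides = {"openai/gpt-4o": Price(1.25, 5.00, "user_override")}
live: dict[str, Price] = {}  # pretend the live APIs returned nothing
snapshot = {
    "openai/gpt-4o": Price(2.00, 8.00, "snapshot"),
    "groq/example-model": Price(0.50, 0.75, "snapshot"),
}
tiers: list[Tier] = [overrides.get, live.get, snapshot.get]

print(resolve_price("openai/gpt-4o", tiers))
print(resolve_price("groq/example-model", tiers))
print(resolve_price("unknown/model", tiers))
```

Returning None for the last lookup matches the stated design choice: an unknown price is surfaced as missing instead of being silently estimated.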

Negotiated rates

No public API exposes enterprise discounts (AWS EDP, Azure committed-use, OpenAI custom tiers). Configure them yourself:

pricing_profiles:
  acme-aws-prod:
    description: "15% EDP discount"
    input_multiplier: 0.85
    output_multiplier: 0.85

  openai-team-tier:
    fixed_overrides:
      openai/gpt-4o:
        input_per_1m: 1.25
        output_per_1m: 5.00

models:
  bedrock-sonnet:
    target: bedrock/anthropic.claude-sonnet-4-5-20250929-v1:0
    region: us-east-1
    credential: { type: aws_profile, profile: prod }
    pricing_profile: acme-aws-prod
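To make the multiplier arithmetic concrete, here is a small worked example. It assumes the multipliers scale the per-million-token list price, which is the natural reading of the profile above but is not confirmed by the source; discounted_cost_usd and the list prices of 3.00 and 15.00 are hypothetical.

```python
def discounted_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_1m: float,
    output_per_1m: float,
    input_multiplier: float = 1.0,
    output_multiplier: float = 1.0,
) -> float:
    """Cost in USD when per-1M-token list prices are scaled by profile multipliers."""
    input_cost = input_tokens / 1_000_000 * input_per_1m * input_multiplier
    output_cost = output_tokens / 1_000_000 * output_per_1m * output_multiplier
    return round(input_cost + output_cost, 6)


# Hypothetical list prices; the 0.85 multipliers come from the acme-aws-prod profile.
list_price = discounted_cost_usd(200_000, 50_000, 3.00, 15.00)
with_edp = discounted_cost_usd(200_000, 50_000, 3.00, 15.00, 0.85, 0.85)
print(list_price, with_edp)
```

With 200,000 input tokens and 50,000 output tokens, the list-price cost is 0.60 + 0.75 = 1.35 USD, and applying the 15% discount to both sides gives 1.35 × 0.85 = 1.1475 USD.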

Production-grade design

Connection pooling: one httpx.AsyncClient per (provider, base_url), HTTP/2 enabled, keep-alive tuned for streaming workloads.
Lazy SDK imports: boto3 and other heavy deps only load when their first call happens.
Streaming hot path uses orjson and dicts — no Pydantic validation per-token. Pydantic only runs on the final assembled response.
Tool-call delta merging keyed by index, not id. (LiteLLM keys by id and drops ~90% of argument deltas — issue #20711.)
Provider-specific blocks preserved: Anthropic thinking, Gemini grounding, citations — emitted as typed events, not flattened.
Classified errors: RateLimitError, ContextWindowError, ContentPolicyError, AuthenticationError are distinct types — fall back vs retry vs fail-fast can be decided automatically.
OpenTelemetry GenAI semantic conventions (opt-in): emits gen_ai.* spans + metrics that Datadog, Honeycomb, Langfuse, and Arize all consume.
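The index-keyed merging mentioned in the list above matters because, in OpenAI-style streaming, a tool call's id and function name typically arrive only on its first chunk, while later chunks carry just an index and a fragment of the arguments string. The following is a minimal sketch of that merge, not Relay's actual code; merge_tool_call_deltas and the sample chunks are made up for illustration.

```python
import json
from typing import Any


def merge_tool_call_deltas(deltas: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Assemble complete tool calls from streamed fragments, keyed by `index`."""
    calls: dict[int, dict[str, Any]] = {}
    for delta in deltas:
        call = calls.setdefault(
            delta["index"], {"id": None, "name": None, "arguments": ""}
        )
        if delta.get("id"):  # usually present only on the first fragment
            call["id"] = delta["id"]
        fn = delta.get("function") or {}
        if fn.get("name"):
            call["name"] = fn["name"]
        call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Two interleaved tool calls; only the first fragment of each carries an id.
deltas = [
    {"index": 0, "id": "call_a", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": '}},
    {"index": 1, "id": "call_b", "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'}},
    {"index": 0, "function": {"arguments": '"Paris"}'}},
]
merged = merge_tool_call_deltas(deltas)
for call in merged:
    print(call["id"], call["name"], json.loads(call["arguments"]))
```

A merger keyed by id would have nowhere to put the second and fourth fragments, since they carry no id at all.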

Security

Keys never inline in YAML — credentials are reified objects (env var, AWS Secrets Manager, GCP Secret Manager, Vault).
Library, not a hosted proxy by default. Your API keys stay in your process. (Compare: the LiteLLM proxy PyPI compromise of March 2026 leaked keys from every centralized deployment.)
Releases will be Sigstore-signed via OIDC Trusted Publishing.
See SECURITY.md for vulnerability reporting.
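The $env.NAME references used in models.yaml can be thought of as pointers that are resolved at load time, so the secret itself never appears in the file. Below is a minimal sketch of such a resolver, assuming only the $env. prefix shown in the Quickstart; resolve_credential is a hypothetical helper, and Relay's own credential objects (Secrets Manager, Vault, and so on) are not modeled here.

```python
import os
import re
from typing import Mapping, Optional

# Matches references such as "$env.GROQ_API_KEY".
_ENV_REF = re.compile(r"^\$env\.([A-Za-z_][A-Za-z0-9_]*)$")


def resolve_credential(ref: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a "$env.NAME" reference; reject anything that looks like an inline key."""
    env = os.environ if environ is None else environ
    match = _ENV_REF.match(ref)
    if match is None:
        raise ValueError("expected a reference like $env.NAME, not an inline secret")
    name = match.group(1)
    if name not in env:
        raise LookupError(f"environment variable {name} is not set")
    return env[name]


# A fake environment keeps the example independent of the real process environment.
fake_env = {"GROQ_API_KEY": "gsk-example"}
key = resolve_credential("$env.GROQ_API_KEY", fake_env)
print(key)
```

Rejecting non-reference strings outright is one way to enforce the "keys never inline" rule at config-load time rather than by convention.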

Status

v0.2.2 (alpha) — chat, streaming, tool calls, structured output, batch (OpenAI Batch + Anthropic Message Batches), MCP, Hub-level cache + provider-cache passthrough, PII redaction, audit logging, pre/post guardrails, OpenTelemetry GenAI semantic conventions, cost tracking with live pricing, 12 OpenAI-compatible providers + 6 native adapters (Anthropic, Azure OpenAI, AWS Bedrock, Cohere, Google Gemini direct, Vertex AI), plus opt-in OpenAI Responses API.

API surface is stable; everything under _internal/ and _* modules is not.

Development

uv sync --all-groups
uv run pytest
uv run ruff check
uv run mypy
uv run pyright

Contributing

See CONTRIBUTING.md. Please read CODE_OF_CONDUCT.md before opening a PR.

Support

Relay is free, Apache-2.0, and actively maintained by ai5labs Research OPC Pvt Ltd. If your team uses it in production, please consider:

⭐ Star the repo — actually helps a lot at this stage
🤝 Become a design partner — direct line to maintainers, roadmap influence, free for the program duration
🏢 Enterprise support (planned for v0.3, Q3 2026) — SLAs, custom features, VPC deployment, SOC 2, BAA/DPA on the roadmap. Email engineering@ai5labs.com to be a design partner.

See SUPPORT.md for full details.

License

Project details


Release history

Version	Released	Status
2.0.0	Jun 8, 2026	yanked
1.0.0	Jun 7, 2026	yanked
0.4.0	Jun 7, 2026	yanked
0.3.0	Jun 6, 2026	yanked
0.2.3 (this version)	May 24, 2026	yanked
0.2.2	May 16, 2026	yanked
0.2.1	May 16, 2026	yanked
0.2.0	May 13, 2026	yanked
0.1.0	May 3, 2026	yanked

Download files


Source Distribution

ai5labs_relay-0.2.3.tar.gz (126.2 kB view details)

Uploaded May 24, 2026 Source

Built Distribution


ai5labs_relay-0.2.3-py3-none-any.whl (155.6 kB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file ai5labs_relay-0.2.3.tar.gz.

File metadata

Download URL: ai5labs_relay-0.2.3.tar.gz
Upload date: May 24, 2026
Size: 126.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai5labs_relay-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`54326486d7c82f71c36758395db7a032e9e8553f31231d28f63d06f1fbc67881`
MD5	`c476d0dcaa9b6187be41e306a353387c`
BLAKE2b-256	`a525fba68df879f928ab82e37410fb1e0197601250b542bd0351fe625337f714`


Provenance

The following attestation bundles were made for ai5labs_relay-0.2.3.tar.gz:

Publisher: release.yml on ai5labs/relay-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai5labs_relay-0.2.3.tar.gz
- Subject digest: 54326486d7c82f71c36758395db7a032e9e8553f31231d28f63d06f1fbc67881
- Sigstore transparency entry: 1625592564
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: ai5labs/relay-llm@f7a493fc632e964807a6e5637b27cb1e297d408c
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/ai5labs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f7a493fc632e964807a6e5637b27cb1e297d408c
- Trigger Event: push

File details

Details for the file ai5labs_relay-0.2.3-py3-none-any.whl.

File metadata

Download URL: ai5labs_relay-0.2.3-py3-none-any.whl
Upload date: May 24, 2026
Size: 155.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai5labs_relay-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d13dcbe24bd671aff60547a821ce2980725c537b471b72255bed9540e7b35a5f`
MD5	`288fb3e2eda48c5cd13eefe69f83de02`
BLAKE2b-256	`e685fcafd7392d7be91ae94d5a7317aa2fa9b19f560637c9ff8ead2587811ef5`


Provenance

The following attestation bundles were made for ai5labs_relay-0.2.3-py3-none-any.whl:

Publisher: release.yml on ai5labs/relay-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai5labs_relay-0.2.3-py3-none-any.whl
- Subject digest: d13dcbe24bd671aff60547a821ce2980725c537b471b72255bed9540e7b35a5f
- Sigstore transparency entry: 1625592590
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: ai5labs/relay-llm@f7a493fc632e964807a6e5637b27cb1e297d408c
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/ai5labs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f7a493fc632e964807a6e5637b27cb1e297d408c
- Trigger Event: push

ai5labs-relay 0.2.3
