
Composed reliability for multi-model LLM calls — quorum fan-out + primary/failover, category-dispatched, transparent degradation. Built on keel-llm-protocol + keel-circuit-breaker.


keel-llm-reliability

Production-grade reliability for multi-model LLM calls — quorum fan-out and primary/failover, category-dispatched, with transparent degradation. The composed solution, not a parts bin.

Part of the Keel toolkit. Composes keel-llm-protocol (the error taxonomy) + keel-circuit-breaker into the consumer-side machinery that acts on typed errors — so you don't hand-write the fan-out/failover loop.

Why it exists

A typed error taxonomy tells you what failed; this tells your app what to do about it. The core lesson (measured in production): a rate-limited model is healthy, not failing — defer it, don't trip its circuit. Acting on that one distinction moved a throttled model from 3/10 to 10/10 availability. This package generalizes it into two strategies and makes every decision visible.
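A toy replay makes the core lesson concrete. This is not the package's code or keel-circuit-breaker's API: `ToyBreaker` and `replay` are hypothetical, and the toy breaker simply opens after three consecutive failures. It also ignores what a real breaker does once open, which is to stop sending traffic. The same throttled-but-healthy status sequence is fed through it twice. Counting 429s as failures leaves the circuit open, while deferring them keeps it closed.

```python
from dataclasses import dataclass


@dataclass
class ToyBreaker:
    """Hypothetical breaker: opens after `threshold` consecutive failures."""
    threshold: int = 3
    consecutive_failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.threshold

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    def record_success(self) -> None:
        self.consecutive_failures = 0


def replay(statuses: list[int], count_429_as_failure: bool) -> ToyBreaker:
    """Feed HTTP statuses to a fresh breaker; optionally treat 429 as a health failure."""
    breaker = ToyBreaker()
    for status in statuses:
        if status == 429 and not count_429_as_failure:
            continue  # backpressure: defer, leave breaker state untouched
        if status >= 400:
            breaker.record_failure()
        else:
            breaker.record_success()
    return breaker


# A throttled-but-healthy model: bursts of 429s between successful calls.
statuses = [200, 429, 429, 429, 200, 429, 429, 429]
naive = replay(statuses, count_429_as_failure=True)
aware = replay(statuses, count_429_as_failure=False)
print(naive.is_open, aware.is_open)  # True False
```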

Install

# this package + at least one adapter for the providers you call:
pip install keel-llm-reliability keel-llm-adapter-openai
#   keel-llm-adapter-anthropic / keel-llm-adapter-google also available;
#   reliability itself pulls in keel-llm-protocol + keel-circuit-breaker.

Quickstart (copy-paste runnable)

import asyncio
from keel_llm_reliability import ResilientClient, Request
from keel_llm_adapter_openai import OpenAIAdapter
from keel_llm_protocol import user

# Adapters are plain objects implementing keel-llm-protocol. Any OpenAI-compatible
# endpoint works (OpenAI, Groq, OpenRouter, Mistral, vLLM, Ollama, …); mix providers freely.
primary  = OpenAIAdapter(model="llama-3.3-70b-versatile", api_key="gsk_…",
                         base_url="https://api.groq.com/openai/v1", provider="groq")
fallback = OpenAIAdapter(model="llama-3.1-8b", base_url="http://localhost:11434/v1",
                         provider="local")

client = ResilientClient([primary, fallback])     # ordered: primary, then fallbacks

async def main() -> None:
    result = await client.failover(Request(messages=[user("One-line summary of TCP.")]))
    if result.succeeded:
        print(result.response.text)
    for a in result.attempts:                     # every decision is visible data
        print(a.model_key, a.outcome, f"{a.latency_ms}ms")

asyncio.run(main())

Two strategies

import asyncio
from keel_llm_reliability import ResilientClient, Request
from keel_llm_protocol import user

client = ResilientClient([primary, fallback])      # adapters built as in the Quickstart
req = Request(messages=[user("Summarize this in one line.")])

async def main() -> None:
    # Primary + ordered failover: the single-good-answer case (most apps).
    result = await client.failover(req)
    if result.succeeded:
        print(result.response.text)

    # Quorum / parallel fan-out: the ensemble case.
    result = await client.fan_out(req)
    for r in result.successes:    # every model that answered
        ...

asyncio.run(main())

Both are also available as plain functions (fan_out, failover) if you'd rather wire collaborators yourself.
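To show the shape of the two strategies, here is a deliberately simplified sketch using plain `asyncio`, with fake async callables standing in for adapters. `toy_failover` and `toy_fan_out` are hypothetical names, not the package's `failover`/`fan_out` functions. They also skip the breaker, the limiter, and category dispatch entirely. Failover tries candidates in order and stops at the first answer. Fan-out calls every candidate concurrently and keeps whatever came back.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical stand-in: a "model" is just an async callable returning text or raising.
Model = Callable[[str], Awaitable[str]]


async def toy_failover(models: list[tuple[str, Model]], prompt: str) -> tuple[Optional[str], list[str]]:
    """Try models in order; return the first answer plus a log of every attempt."""
    log: list[str] = []
    for key, model in models:
        try:
            text = await model(prompt)
        except Exception as exc:  # a real client would dispatch on the error category here
            log.append(f"{key}: failed ({exc})")
            continue
        log.append(f"{key}: success")
        return text, log
    return None, log  # exhaustion is returned visibly, not raised


async def toy_fan_out(models: list[tuple[str, Model]], prompt: str) -> dict[str, str]:
    """Call every model concurrently; keep each answer that came back."""
    results = await asyncio.gather(*(m(prompt) for _, m in models), return_exceptions=True)
    return {key: r for (key, _), r in zip(models, results) if not isinstance(r, BaseException)}


async def broken(prompt: str) -> str:
    raise RuntimeError("503")


async def echo(prompt: str) -> str:
    return prompt.upper()


async def main() -> None:
    models = [("a", broken), ("b", echo), ("c", echo)]
    answer, log = await toy_failover(models, "hi")
    print(answer, log)                      # HI ['a: failed (503)', 'b: success']
    print(await toy_fan_out(models, "hi"))  # {'b': 'HI', 'c': 'HI'}

asyncio.run(main())
```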

Transparent degradation — every decision is data

No silent retries, no hidden fallbacks. Every provider interaction is a visible Attempt:

result = await client.failover(req)          # inside an async function, as in the Quickstart
for a in result.attempts:
    print(a.model_key, a.outcome, a.latency_ms, a.error and a.error.category)
# groq:…     deferred_backpressure  120   backpressure   (throttled — skipped, NOT failed)
# gemini:…   failed                 310   transient      (5xx — counted, failed over)
# openai:…   success                420   None

outcome is one of success / preempted_open / preempted_limited / deferred_backpressure / failed. A failed attempt carries its error.category (transient vs terminal) so you can tell "flaky" from "broken config." Degradation you can see and operate on — not a black box.
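Because attempts are plain data, you can work with them in ordinary Python. The sketch below assumes records with the fields printed above. `Attempt`, `Err`, and `triage` are hypothetical stand-ins, not the package's classes, and the `:demo` model keys are made up. The sketch sorts failed attempts by `error.category` into flaky models (transient) and broken configuration (terminal). It ignores deferred and preempted attempts, because those are not failures.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Err:  # hypothetical stand-in for a typed error
    category: str  # "backpressure" | "transient" | "terminal"


@dataclass
class Attempt:  # hypothetical stand-in shaped like the printed fields
    model_key: str
    outcome: str  # success / preempted_open / preempted_limited / deferred_backpressure / failed
    latency_ms: int
    error: Optional[Err] = None


def triage(attempts: list[Attempt]) -> dict[str, list[str]]:
    """Split failed attempts into flaky (transient) and broken config (terminal)."""
    report: dict[str, list[str]] = {"flaky": [], "broken_config": []}
    for a in attempts:
        if a.outcome != "failed" or a.error is None:
            continue  # deferred/preempted/successful attempts are not failures
        if a.error.category == "transient":
            report["flaky"].append(a.model_key)
        elif a.error.category == "terminal":
            report["broken_config"].append(a.model_key)
    return report


attempts = [
    Attempt("groq:demo", "deferred_backpressure", 120, Err("backpressure")),
    Attempt("gemini:demo", "failed", 310, Err("transient")),
    Attempt("anthropic:demo", "failed", 95, Err("terminal")),
    Attempt("openai:demo", "success", 420),
]
print(Counter(a.outcome for a in attempts))
print(triage(attempts))  # {'flaky': ['gemini:demo'], 'broken_config': ['anthropic:demo']}
```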

How it behaves (category-dispatched)

| Error category | fan_out (quorum) | failover |
|---|---|---|
| backpressure (429) | defer: contributes nothing this round; no breaker failure | route to the next candidate immediately; no breaker failure |
| transient (5xx, timeout) | record a breaker failure; that model contributes nothing | record a breaker failure; fail over (optionally retry the same model up to transient_retries) |
| terminal (auth/bad-request/context/content) | visible failed; no breaker failure (request-level, not model health) | visible failed; fail over |
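The table can be read as a lookup keyed by error category. The sketch below is a hypothetical encoding, not the package's internals. The breaker decision is the same in both strategies, and only the next step differs: fan_out drops that model for the round, while failover moves to the next candidate. `should_retry_same_model` is also hypothetical. It assumes `transient_retries` counts extra tries on the same model after a transient failure.

```python
from enum import Enum
from typing import NamedTuple


class Category(Enum):
    BACKPRESSURE = "backpressure"  # 429
    TRANSIENT = "transient"        # 5xx, timeout
    TERMINAL = "terminal"          # auth / bad request / context / content


class Decision(NamedTuple):
    outcome: str           # how the attempt is reported
    breaker_failure: bool  # does it count against the model's health?


# Hypothetical encoding of the table: the same per-category decision in both strategies.
DISPATCH: dict[Category, Decision] = {
    Category.BACKPRESSURE: Decision("deferred_backpressure", breaker_failure=False),
    Category.TRANSIENT: Decision("failed", breaker_failure=True),
    Category.TERMINAL: Decision("failed", breaker_failure=False),  # request-level, not model health
}


def should_retry_same_model(category: Category, retries_used: int, transient_retries: int) -> bool:
    """Failover only: retry the same model on transient errors, at most transient_retries times."""
    return category is Category.TRANSIENT and retries_used < transient_retries


for cat in Category:
    d = DISPATCH[cat]
    print(f"{cat.value:<12} -> outcome={d.outcome}, breaker_failure={d.breaker_failure}")
```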

Before any dispatch, both strategies preempt: a model whose breaker is open (preempted_open) or whose limiter predicts it's full (preempted_limited) is skipped — predict, don't block. There are no hidden sleeps; exhaustion returns visibly (empty successes / response=None).
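The preemption step can be sketched as a pure filter that runs before any request is sent. `Candidate` and `preempt` are hypothetical. The sketch reduces the breaker and limiter to synchronous booleans, whereas the real collaborators are async. Checking the breaker before the limiter is also an assumption. Skipped models come back as visible preempted attempts rather than being queued or slept on. If every candidate is skipped, the dispatch list is simply empty.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    model_key: str
    breaker_open: bool = False  # stand-in for "breaker reports open"
    limiter_full: bool = False  # stand-in for "limiter predicts no capacity"


def preempt(candidates: list[Candidate]) -> tuple[list[Candidate], list[tuple[str, str]]]:
    """Split candidates into ones worth dispatching and visible preempted attempts.

    Nothing sleeps or waits for capacity: a model predicted to fail or to be
    throttled is skipped now and reported, never queued.
    """
    dispatch: list[Candidate] = []
    skipped: list[tuple[str, str]] = []
    for c in candidates:
        if c.breaker_open:
            skipped.append((c.model_key, "preempted_open"))
        elif c.limiter_full:
            skipped.append((c.model_key, "preempted_limited"))
        else:
            dispatch.append(c)
    return dispatch, skipped


ready, skipped = preempt([Candidate("a", breaker_open=True), Candidate("b", limiter_full=True), Candidate("c")])
print([c.model_key for c in ready], skipped)  # ['c'] [('a', 'preempted_open'), ('b', 'preempted_limited')]
```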

Injected collaborators — born ready for scale

The Breaker and (optional) Limiter are injected async protocols, never owned:

from keel_llm_reliability import InProcessBreaker, ResilientClient

adapters = [primary, fallback]                           # built as in the Quickstart

# Default: zero-config in-process breaker (wraps keel-circuit-breaker).
client = ResilientClient(adapters)                       # default breaker: InProcessBreaker()

# At scale: swap in a Redis-backed breaker/limiter (same protocol) for cross-worker
# state; the orchestrator code doesn't change. my_redis_breaker / my_redis_limiter
# are your own implementations of those protocols.
client = ResilientClient(adapters, breaker=my_redis_breaker, limiter=my_redis_limiter)

The protocols are async precisely so a Redis-backed implementation (which does network I/O) can satisfy them — the in-process default just returns immediately.
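Here is a minimal sketch of what "async protocol, in-process default" can look like. It assumes a three-method breaker shape. `BreakerLike`, `DictBreaker`, and the method names are hypothetical and are not keel-llm-reliability's actual `Breaker` protocol. The in-process class has coroutine signatures but never awaits I/O, so each call completes immediately. A Redis-backed class could implement the same coroutines with network round-trips, and callers typed against the protocol would not change. The toy also omits cooldown and half-open recovery, so it is not a complete breaker.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class BreakerLike(Protocol):
    """Hypothetical shape of an injected breaker; method names are illustrative."""
    async def is_open(self, model_key: str) -> bool: ...
    async def record_success(self, model_key: str) -> None: ...
    async def record_failure(self, model_key: str) -> None: ...


class DictBreaker:
    """In-process sketch: async signatures, but no call ever awaits I/O."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    async def is_open(self, model_key: str) -> bool:
        return self.failures.get(model_key, 0) >= self.threshold

    async def record_success(self, model_key: str) -> None:
        self.failures[model_key] = 0

    async def record_failure(self, model_key: str) -> None:
        self.failures[model_key] = self.failures.get(model_key, 0) + 1


async def main() -> None:
    breaker: BreakerLike = DictBreaker(threshold=2)
    for _ in range(2):
        await breaker.record_failure("groq:demo")
    print(await breaker.is_open("groq:demo"), await breaker.is_open("openai:demo"))  # True False

asyncio.run(main())
```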

Status

0.1.1: quorum semantics are grounded in a production multi-model deployment; failover covers the broader single-answer case. Versions stay at 0.x while the API stabilizes through the first year: breaking changes are possible at minor bumps and are documented in the CHANGELOG, so pin exact versions.

The Keel toolkit

Composable, vendor-neutral LLM reliability libraries on PyPI: keel-llm-reliability · keel-llm-protocol · keel-llm-adapter-openai · keel-llm-adapter-anthropic · keel-llm-adapter-google · keel-circuit-breaker

MIT licensed.


Download files


Source Distribution

keel_llm_reliability-0.1.1.tar.gz (11.9 kB)

Uploaded Source

Built Distribution


keel_llm_reliability-0.1.1-py3-none-any.whl (10.7 kB)

Uploaded Python 3

File details

Details for the file keel_llm_reliability-0.1.1.tar.gz.

File metadata

  • Download URL: keel_llm_reliability-0.1.1.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for keel_llm_reliability-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | a50d080668d8da78022f706f84e7803540d7c68f07a1f1d0ff03b6e8224ad909 |
| MD5 | a6ba79ca22325625ff83e88c718fce83 |
| BLAKE2b-256 | fd772554770b88aa6a3f77ee9f4362e6986520ef42dc52593f4b035bd58398ae |


Provenance

The following attestation bundles were made for keel_llm_reliability-0.1.1.tar.gz:

Publisher: publish-py.yml on keelplatform/keel


File details

Details for the file keel_llm_reliability-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for keel_llm_reliability-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2005bc7ece9eebfa6eecceb998ca7116d252c93b22b2b228a0470d555a1314e0 |
| MD5 | b4b541d8df546b571fc7180f10a2c6e3 |
| BLAKE2b-256 | ee7d022c747e4aac17c2da71066e250791ecacc2a6e62c06e9213cddb4f39315 |


Provenance

The following attestation bundles were made for keel_llm_reliability-0.1.1-py3-none-any.whl:

Publisher: publish-py.yml on keelplatform/keel

