Composed reliability for multi-model LLM calls — quorum fan-out + primary/failover, category-dispatched, transparent degradation. Built on keel-llm-protocol + keel-circuit-breaker.

keel-llm-reliability

Production-grade reliability for multi-model LLM calls — quorum fan-out and primary/failover, category-dispatched, with transparent degradation. The composed solution, not a parts bin.

Part of the Keel toolkit. Composes keel-llm-protocol (the error taxonomy) + keel-circuit-breaker into the consumer-side machinery that acts on typed errors — so you don't hand-write the fan-out/failover loop.

Why it exists

A typed error taxonomy tells you what failed; this tells your app what to do about it. The core lesson (measured in production): a rate-limited model is healthy, not failing — defer it, don't trip its circuit. Acting on that one distinction moved a throttled model from 3/10 to 10/10 availability. This package generalizes it into two strategies and makes every decision visible.
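To make the distinction concrete, here is a minimal, self-contained sketch. It does not use this package: `ToyBreaker` and `ends_open` are hypothetical names, and the threshold of three consecutive failures is an assumption chosen for illustration. It compares a naive policy that counts a 429 as a failure with a category-aware policy that leaves the breaker alone.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyBreaker:
    """Hypothetical breaker: open after `threshold` consecutive counted failures."""
    threshold: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, category: Optional[str]) -> None:
        if category is None:
            self.failures = 0        # success resets the count
        elif category == "transient":
            self.failures += 1       # a health signal: count it
        # "backpressure" and "terminal" leave the count untouched


def ends_open(categories: Sequence[Optional[str]], count_429: bool) -> bool:
    """Replay a run of outcomes and report whether the breaker ends up open."""
    breaker = ToyBreaker()
    for category in categories:
        if count_429 and category == "backpressure":
            category = "transient"   # naive policy: a 429 counts as a failure
        breaker.record(category)
    return breaker.is_open


throttled = ["backpressure"] * 5     # five rate-limited calls in a row
print("naive policy trips the breaker:", ends_open(throttled, count_429=True))
print("category-aware policy trips it:", ends_open(throttled, count_429=False))
```

Under the naive policy the throttled model is locked out even though it is healthy; under the category-aware policy it stays available for the next round.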

Is this for you?

Adopt when: multi-model apps that need failover or ensemble; production traffic where rate-limit handling matters; operators who'll read the visible-degradation trail.

Skip when: single-provider apps (the SDK + a basic retry is enough); you already have working in-tree reliability; prototypes / scripts; you need a runtime/framework rather than called helpers.

Deciding test: does this deliver a capability your codebase genuinely lacks, or could you get the same outcome with the SDK + a library you already trust? If the latter, skip.

Install

# this package + at least one adapter for the providers you call:
pip install keel-llm-reliability keel-llm-adapter-openai
#   keel-llm-adapter-anthropic / keel-llm-adapter-google also available;
#   reliability itself pulls in keel-llm-protocol + keel-circuit-breaker.

Quickstart (copy-paste runnable)

import asyncio
from keel_llm_reliability import ResilientClient, Request
from keel_llm_adapter_openai import OpenAIAdapter
from keel_llm_protocol import user

# Adapters are plain objects implementing keel-llm-protocol. Any OpenAI-compatible
# endpoint works (OpenAI, Groq, OpenRouter, Mistral, vLLM, Ollama, …); mix providers freely.
primary  = OpenAIAdapter(model="llama-3.3-70b-versatile", api_key="gsk_…",
                         base_url="https://api.groq.com/openai/v1", provider="groq")
fallback = OpenAIAdapter(model="llama-3.1-8b", base_url="http://localhost:11434/v1",
                         provider="local")

client = ResilientClient([primary, fallback])     # ordered: primary, then fallbacks

async def main() -> None:
    result = await client.failover(Request(messages=[user("One-line summary of TCP.")]))
    if result.succeeded:
        print(result.response.text)
    for a in result.attempts:                     # every decision is visible data
        print(a.model_key, a.outcome, f"{a.latency_ms}ms")

asyncio.run(main())

Two strategies

import asyncio
from keel_llm_reliability import ResilientClient, Request
from keel_llm_protocol import user

client = ResilientClient([primary, fallback])      # adapters built as in the Quickstart
req = Request(messages=[user("Summarize this in one line.")])

async def main() -> None:
    # Primary + ordered failover: the single-good-answer case (most apps).
    result = await client.failover(req)
    if result.succeeded:
        print(result.response.text)

    # Quorum / parallel fan-out: the ensemble case.
    result = await client.fan_out(req)
    for r in result.successes:        # every model that answered
        ...

asyncio.run(main())                   # `await` needs a running event loop

Both are also available as plain functions (fan_out, failover) if you'd rather wire collaborators yourself.
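For readers new to the two shapes, the following is a rough sketch of the underlying patterns in plain asyncio. It is not this package's implementation: `call_model`, `toy_fan_out`, and `toy_failover` are hypothetical stand-ins with no network I/O, and they omit the breaker, limiter, and category dispatch.

```python
import asyncio


async def call_model(name: str, delay: float, fail: bool) -> str:
    """Stand-in for one provider call (hypothetical; no network I/O)."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return f"{name}: ok"


async def toy_fan_out(models):
    """Call every model concurrently; keep the successes, surface the failures."""
    results = await asyncio.gather(
        *(call_model(*m) for m in models), return_exceptions=True
    )
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures


async def toy_failover(models):
    """Try models in order; return the first success, or None if all fail."""
    for m in models:
        try:
            return await call_model(*m)
        except RuntimeError:
            continue
    return None


# (name, simulated latency in seconds, whether the call fails)
MODELS = [("a", 0.01, True), ("b", 0.0, False), ("c", 0.0, False)]

if __name__ == "__main__":
    print(asyncio.run(toy_fan_out(MODELS)))
    print(asyncio.run(toy_failover(MODELS)))
```

Fan-out pays for every call and returns everything that answered; failover stops at the first answer, so later candidates are only contacted when earlier ones fail.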

Transparent degradation — every decision is data

No silent retries, no hidden fallbacks. Every provider interaction is a visible Attempt:

result = await client.failover(req)
for a in result.attempts:
    print(a.model_key, a.outcome, a.latency_ms, a.error and a.error.category)
# groq:…     deferred_backpressure  120   backpressure   (throttled — skipped, NOT failed)
# gemini:…   failed                 310   transient      (5xx — counted, failed over)
# openai:…   success                420   None

outcome is one of success / preempted_open / preempted_limited / deferred_backpressure / failed. A failed attempt carries its error.category (transient vs terminal) so you can tell "flaky" from "broken config." Degradation you can see and operate on — not a black box.
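Because attempts are plain data, they can be aggregated like any other records. The sketch below assumes a simplified stand-in, `ToyAttempt`, which mirrors the fields printed above but flattens `error.category` into `error_category`; the model keys and the `summarize` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyAttempt:
    # Hypothetical stand-in for an attempt record, flattened for brevity.
    model_key: str
    outcome: str
    latency_ms: int
    error_category: Optional[str] = None


def summarize(attempts: List[ToyAttempt]):
    """Count outcomes and separate flaky models from broken configuration."""
    by_outcome = Counter(a.outcome for a in attempts)
    failed = [a for a in attempts if a.outcome == "failed"]
    flaky = [a.model_key for a in failed if a.error_category == "transient"]
    broken = [a.model_key for a in failed if a.error_category == "terminal"]
    return by_outcome, flaky, broken


attempts = [
    ToyAttempt("model-a", "deferred_backpressure", 120, "backpressure"),
    ToyAttempt("model-b", "failed", 310, "transient"),
    ToyAttempt("model-c", "failed", 95, "terminal"),
    ToyAttempt("model-d", "success", 420),
]
by_outcome, flaky, broken = summarize(attempts)
print(dict(by_outcome), "flaky:", flaky, "broken config:", broken)
```

A summary like this could feed a dashboard or alert: transient failures point at provider health, while terminal ones usually point at the request or credentials.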

How it behaves (category-dispatched)

| Error category | fan_out (quorum) | failover |
| --- | --- | --- |
| backpressure (429) | defer: contributes nothing this round; no breaker failure | route to the next candidate immediately; no breaker failure |
| transient (5xx, timeout) | record a breaker failure; that model contributes nothing | record a breaker failure; fail over (optionally retry the same model up to transient_retries) |
| terminal (auth/bad-request/context/content) | visible failed; no breaker failure (request-level, not model health) | visible failed; fail over |
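The dispatch rules above can be read as a lookup keyed on strategy and error category. The sketch below encodes them that way; `Action` and `DISPATCH` are hypothetical names, not part of this package. Two entries are assumptions because the source does not state them: that a terminal error under failover records no breaker failure, and that a terminal error under fan_out contributes nothing.

```python
from typing import NamedTuple


class Action(NamedTuple):
    outcome: str            # label the visible attempt would carry
    breaker_failure: bool   # whether the breaker records a failure
    next_step: str          # what the strategy does next


DISPATCH = {
    ("fan_out", "backpressure"): Action(
        "deferred_backpressure", False, "contribute nothing this round"),
    ("fan_out", "transient"): Action(
        "failed", True, "contribute nothing"),
    ("fan_out", "terminal"): Action(
        "failed", False, "contribute nothing"),        # next_step is assumed
    ("failover", "backpressure"): Action(
        "deferred_backpressure", False, "route to next candidate"),
    ("failover", "transient"): Action(
        "failed", True, "fail over"),
    ("failover", "terminal"): Action(
        "failed", False, "fail over"),                 # breaker_failure is assumed
}


def decide(strategy: str, category: str) -> Action:
    """Look up what a strategy does with an error of the given category."""
    return DISPATCH[(strategy, category)]


print(decide("failover", "backpressure"))
```

Only the transient rows touch the breaker, which is the point of dispatching on category: throttling and request-level errors never count against a model's health.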

Before any dispatch, both strategies preempt: a model whose breaker is open (preempted_open) or whose limiter predicts it's full (preempted_limited) is skipped — predict, don't block. There are no hidden sleeps; exhaustion returns visibly (empty successes / response=None).
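As an illustration of "predict, don't block", the sketch below splits candidates into those to call and those to skip, without sleeping or waiting. `plan_dispatch` is a hypothetical helper, and checking the breaker before the limiter is an assumption; the source does not specify the order.

```python
from typing import Dict, Iterable, List, Set, Tuple


def plan_dispatch(
    candidates: Iterable[str],
    open_breakers: Set[str],
    predicted_full: Set[str],
) -> Tuple[List[str], Dict[str, str]]:
    """Return (models to call, {skipped model: preemption outcome})."""
    to_call: List[str] = []
    skipped: Dict[str, str] = {}
    for key in candidates:
        if key in open_breakers:
            skipped[key] = "preempted_open"       # breaker says: do not call
        elif key in predicted_full:
            skipped[key] = "preempted_limited"    # limiter predicts a 429
        else:
            to_call.append(key)
    return to_call, skipped


to_call, skipped = plan_dispatch(
    ["model-a", "model-b", "model-c"],
    open_breakers={"model-a"},
    predicted_full={"model-b"},
)
print(to_call, skipped)
```

If every candidate is preempted, `to_call` is empty and the caller sees that immediately, which corresponds to the visible exhaustion described above.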

Injected collaborators — born ready for scale

The Breaker and (optional) Limiter are injected async protocols, never owned:

from keel_llm_reliability import InProcessBreaker, ResilientClient

# Default: zero-config in-process breaker (wraps keel-circuit-breaker).
client = ResilientClient(adapters)                       # InProcessBreaker()

# At scale: swap in a Redis-backed breaker/limiter (same protocol) for cross-worker
# state — the orchestrator code doesn't change.
client = ResilientClient(adapters, breaker=my_redis_breaker, limiter=my_redis_limiter)

The protocols are async precisely so a Redis-backed implementation (which does network I/O) can satisfy them — the in-process default just returns immediately.
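The sketch below shows why an async interface accommodates both cases. It is not the package's actual protocol: `AsyncToyBreaker`, its method names, and the failure threshold are all assumptions for illustration. The in-memory class returns immediately, while a networked implementation could await I/O behind the same methods without the caller changing.

```python
import asyncio
from typing import Dict, List, Protocol


class AsyncToyBreaker(Protocol):
    """Hypothetical async breaker interface (method names are assumptions)."""

    async def allows(self, model_key: str) -> bool: ...
    async def record_failure(self, model_key: str) -> None: ...
    async def record_success(self, model_key: str) -> None: ...


class InMemoryToyBreaker:
    """In-process implementation: no I/O, so every await completes at once."""

    def __init__(self, threshold: int = 2) -> None:
        self._threshold = threshold
        self._failures: Dict[str, int] = {}

    async def allows(self, model_key: str) -> bool:
        return self._failures.get(model_key, 0) < self._threshold

    async def record_failure(self, model_key: str) -> None:
        self._failures[model_key] = self._failures.get(model_key, 0) + 1

    async def record_success(self, model_key: str) -> None:
        self._failures.pop(model_key, None)


async def demo(breaker: AsyncToyBreaker) -> List[bool]:
    """Drive any conforming breaker through the same awaits."""
    seen = [await breaker.allows("m")]
    await breaker.record_failure("m")
    await breaker.record_failure("m")
    seen.append(await breaker.allows("m"))   # two failures reach the threshold
    await breaker.record_success("m")
    seen.append(await breaker.allows("m"))   # a success clears the count
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo(InMemoryToyBreaker())))
```

Because `demo` depends only on the protocol, swapping in a shared-state implementation would not require changing it, which is the property the injected design relies on.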

Status

0.1.2 — quorum semantics are grounded in a production multi-model deployment; failover serves the single-answer broad base. 0.x while the API stabilizes through year one (breaking changes possible at minor bumps, documented in the CHANGELOG; pin exact versions).

The Keel toolkit

Composable, vendor-neutral LLM reliability libraries on PyPI: keel-llm-reliability · keel-llm-protocol · keel-llm-adapter-openai · keel-llm-adapter-anthropic · keel-llm-adapter-google · keel-circuit-breaker

MIT licensed.
