Composed reliability for multi-model LLM calls — quorum fan-out + primary/failover, category-dispatched, transparent degradation. Built on keel-llm-protocol + keel-circuit-breaker.
keel-llm-reliability
Production-grade reliability for multi-model LLM calls — quorum fan-out and primary/failover, category-dispatched, with transparent degradation. The composed solution, not a parts bin.
Part of Keel. Composes keel-llm-protocol (the error taxonomy) + keel-circuit-breaker into the consumer-side machinery that acts on typed errors — so you don't hand-write the fan-out/failover loop.
Why it exists
A typed error taxonomy tells you what failed; this tells your app what to do about it. The core lesson (measured in production): a rate-limited model is healthy, not failing — defer it, don't trip its circuit. Acting on that one distinction moved a throttled model from 3/10 to 10/10 availability. This package generalizes it into two strategies and makes every decision visible.
Install
pip install keel-llm-reliability # pulls in keel-llm-protocol + keel-circuit-breaker
Two strategies
from keel_llm_reliability import ResilientClient, Request
from keel_llm_protocol import user
client = ResilientClient([groq_adapter, gemini_adapter, openai_adapter])
req = Request(messages=[user("Summarize this in one line.")])
# Primary + ordered failover — the single-good-answer case (most apps):
result = await client.failover(req)
if result.succeeded:
    print(result.response.text)
# Quorum / parallel fan-out — the ensemble/council case:
result = await client.fan_out(req)
for r in result.successes: # every model that answered
    ...
Both are also available as plain functions (fan_out, failover) if you'd rather wire collaborators yourself.
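To make the fan-out idea concrete, here is a toy sketch built on asyncio.gather. It is not the package's implementation: toy_fan_out, FanOutResult, Backpressure, and the three model coroutines are hypothetical names invented for illustration, and the real fan_out also consults a breaker and an optional limiter, which this sketch leaves out.

```python
import asyncio
from dataclasses import dataclass


class Backpressure(Exception):
    """Hypothetical stand-in for a 429-style 'throttled' error."""


@dataclass
class FanOutResult:
    successes: list  # (model_key, text) for every model that answered
    skipped: list    # (model_key, reason) for every model that did not


async def toy_fan_out(models: dict, prompt: str) -> FanOutResult:
    """Call every model concurrently and keep whatever answers come back."""
    keys = list(models)
    # return_exceptions=True keeps one failing model from cancelling the rest.
    outcomes = await asyncio.gather(
        *(models[k](prompt) for k in keys), return_exceptions=True
    )
    result = FanOutResult(successes=[], skipped=[])
    for key, outcome in zip(keys, outcomes):
        if isinstance(outcome, Backpressure):
            # Throttled is not broken: record it as deferred, not failed.
            result.skipped.append((key, "deferred_backpressure"))
        elif isinstance(outcome, BaseException):
            result.skipped.append((key, "failed"))
        else:
            result.successes.append((key, outcome))
    return result


# Three hypothetical models: two answer, one is rate-limited.
async def model_a(prompt: str) -> str:
    return "answer from a"


async def model_b(prompt: str) -> str:
    raise Backpressure("429")


async def model_c(prompt: str) -> str:
    return "answer from c"


demo = asyncio.run(toy_fan_out({"a": model_a, "b": model_b, "c": model_c}, "hi"))
```

The throttled model lands in skipped with a visible reason instead of disappearing, which is the same idea the package expresses through its attempts list.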
Transparent degradation — every decision is data
No silent retries, no hidden fallbacks. Every provider interaction is a visible Attempt:
result = await client.failover(req)
for a in result.attempts:
    print(a.model_key, a.outcome, a.latency_ms, a.error and a.error.category)
# groq:… deferred_backpressure 120 backpressure (throttled — skipped, NOT failed)
# gemini:… failed 310 transient (5xx — counted, failed over)
# openai:… success 420 None
outcome is one of success / preempted_open / preempted_limited / deferred_backpressure / failed. A failed attempt carries its error.category (transient vs terminal) so you can tell "flaky" from "broken config." This generalizes a council's judges_count — degradation you can see and operate on.
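Because attempts are plain data, they are easy to aggregate for dashboards or alerts. The sketch below tallies outcomes and splits failures by error category. ToyAttempt and ToyError are hypothetical stand-ins that mirror only the fields shown above (model_key, outcome, latency_ms, error.category); the package's own types may carry more.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyError:
    category: str  # "backpressure", "transient", or "terminal"


@dataclass
class ToyAttempt:
    model_key: str
    outcome: str
    latency_ms: int
    error: Optional[ToyError] = None


def summarize(attempts):
    """Count attempts per outcome and split failed attempts by error category."""
    by_outcome = Counter(a.outcome for a in attempts)
    failed_by_category = Counter(
        a.error.category
        for a in attempts
        if a.outcome == "failed" and a.error is not None
    )
    return by_outcome, failed_by_category


# Same shape as the printed example above, with made-up model keys.
attempts = [
    ToyAttempt("groq", "deferred_backpressure", 120, ToyError("backpressure")),
    ToyAttempt("gemini", "failed", 310, ToyError("transient")),
    ToyAttempt("openai", "success", 420),
]
by_outcome, failed_by_category = summarize(attempts)
```

A rising count of terminal failures usually points at configuration (keys, request shape), while transient failures point at provider health, so keeping the two tallies separate is the useful part.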
How it behaves (category-dispatched)
| Error category | fan_out (quorum) | failover |
|---|---|---|
| backpressure (429) | defer: contributes nothing this round; no breaker failure | route to the next candidate immediately; no breaker failure |
| transient (5xx, timeout) | record a breaker failure; that model contributes nothing | record a breaker failure; fail over (optionally retry the same model up to transient_retries) |
| terminal (auth/bad-request/context/content) | visible failed; no breaker failure (request-level, not model health) | visible failed; fail over |
Before any dispatch, both strategies preempt: a model whose breaker is open (preempted_open) or whose limiter predicts it's full (preempted_limited) is skipped — predict, don't block. There are no hidden sleeps; exhaustion returns visibly (empty successes / response=None).
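The failover column and the preemption rule can be sketched as a short loop. This is a simplified illustration under stated assumptions, not the package's code: ToyError, ToyBreaker, toy_failover, and the three model coroutines are hypothetical, the breaker is synchronous and opens after a single failure, and both the limiter and transient_retries are omitted.

```python
import asyncio


class ToyError(Exception):
    def __init__(self, category: str):
        super().__init__(category)
        self.category = category  # "backpressure" | "transient" | "terminal"


class ToyBreaker:
    """Opens a model's circuit after `threshold` recorded failures."""

    def __init__(self, threshold: int = 1):
        self.threshold = threshold
        self.failures = {}

    def is_open(self, key: str) -> bool:
        return self.failures.get(key, 0) >= self.threshold

    def record_failure(self, key: str) -> None:
        self.failures[key] = self.failures.get(key, 0) + 1


async def toy_failover(candidates, prompt, breaker):
    """Try candidates in order; return (response or None, attempts)."""
    attempts = []
    for key, call in candidates:
        if breaker.is_open(key):
            attempts.append((key, "preempted_open"))  # skipped, never called
            continue
        try:
            response = await call(prompt)
        except ToyError as err:
            if err.category == "backpressure":
                # Throttled but healthy: move on without touching the breaker.
                attempts.append((key, "deferred_backpressure"))
            else:
                if err.category == "transient":
                    # Only transient errors count against model health.
                    breaker.record_failure(key)
                attempts.append((key, "failed"))
            continue
        attempts.append((key, "success"))
        return response, attempts
    return None, attempts  # exhaustion is returned visibly, not raised


async def throttled(prompt):
    raise ToyError("backpressure")


async def flaky(prompt):
    raise ToyError("transient")


async def healthy(prompt):
    return "ok"


breaker = ToyBreaker()
candidates = [("groq", throttled), ("gemini", flaky), ("openai", healthy)]
first = asyncio.run(toy_failover(candidates, "hi", breaker))
second = asyncio.run(toy_failover(candidates, "hi", breaker))
```

On the second call the flaky model is preempted without being contacted, while the throttled model is tried again because backpressure never tripped its circuit.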
Injected collaborators — born ready for scale
The Breaker and (optional) Limiter are injected async protocols, never owned:
from keel_llm_reliability import InProcessBreaker, ResilientClient
# Default: zero-config in-process breaker (wraps keel-circuit-breaker).
client = ResilientClient(adapters) # InProcessBreaker()
# At scale: swap in a Redis-backed breaker/limiter (same protocol) for cross-worker
# state — the orchestrator code doesn't change.
client = ResilientClient(adapters, breaker=my_redis_breaker, limiter=my_redis_limiter)
The protocols are async precisely so a Redis-backed implementation (which does network I/O) can satisfy them — the in-process default just returns immediately.
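As a rough picture of what an injected async protocol looks like, the sketch below defines one with typing.Protocol and satisfies it with an in-memory class. The method names (is_open, record_failure, record_success) and the names BreakerLike and InMemoryBreaker are assumptions made for illustration; check the package source for the actual Breaker protocol before implementing your own.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class BreakerLike(Protocol):
    # Hypothetical method names, not the package's documented protocol.
    async def is_open(self, model_key: str) -> bool: ...
    async def record_failure(self, model_key: str) -> None: ...
    async def record_success(self, model_key: str) -> None: ...


class InMemoryBreaker:
    """Satisfies BreakerLike without I/O, so each await returns immediately."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._failures: dict[str, int] = {}

    async def is_open(self, model_key: str) -> bool:
        return self._failures.get(model_key, 0) >= self.threshold

    async def record_failure(self, model_key: str) -> None:
        self._failures[model_key] = self._failures.get(model_key, 0) + 1

    async def record_success(self, model_key: str) -> None:
        # A success clears the failure count for that model.
        self._failures.pop(model_key, None)


async def demo() -> list[bool]:
    breaker: BreakerLike = InMemoryBreaker(threshold=2)
    states = [await breaker.is_open("groq")]      # closed at start
    await breaker.record_failure("groq")
    await breaker.record_failure("groq")
    states.append(await breaker.is_open("groq"))  # open after two failures
    await breaker.record_success("groq")
    states.append(await breaker.is_open("groq"))  # closed again
    return states


states = asyncio.run(demo())
```

A Redis-backed class would keep the same three method signatures and do its network round trips inside them, which is why the calling code would not need to change.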
Status
0.1.0 — first release. Quorum semantics are grounded in LLMCouncil's production fan-out (PR #77); failover serves the single-answer broad base. Pin exact versions while in 0.x. Source: Keel monorepo.
License
MIT — see LICENSE.
File details
Details for the file keel_llm_reliability-0.1.0.tar.gz.
File metadata
- Download URL: keel_llm_reliability-0.1.0.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b267e5137388038575eabe0216e3b8079156077bcec40786919f11ed96be845b |
| MD5 | 2a5973998d6fa1f08472b6c119341f08 |
| BLAKE2b-256 | ecca1bc32d97b482f8af6afffa19564f704cf71e61a19ab7b0122b16b7bc60b0 |
Provenance
The following attestation bundles were made for keel_llm_reliability-0.1.0.tar.gz:
Publisher: publish-py.yml on keelplatform/keel
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: keel_llm_reliability-0.1.0.tar.gz
- Subject digest: b267e5137388038575eabe0216e3b8079156077bcec40786919f11ed96be845b
- Sigstore transparency entry: 1608407833
- Permalink: keelplatform/keel@c39e5e08eeba3595ec85cad3340735c748be8b0e
- Branch / Tag: refs/tags/py-llm-reliability-v0.1.0
- Owner: https://github.com/keelplatform
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-py.yml@c39e5e08eeba3595ec85cad3340735c748be8b0e
- Trigger Event: push
File details
Details for the file keel_llm_reliability-0.1.0-py3-none-any.whl.
File metadata
- Download URL: keel_llm_reliability-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b8c59f0c0293653e25901dd4dc900e2d468a44581e80d5a31458fbf548c01e6 |
| MD5 | ddd1a23cf8ef5a78961f2b0c1e968e21 |
| BLAKE2b-256 | 0742108935965908ada23f879daa0c9825e4b29cda54048430a4bd5a69f5c829 |
Provenance
The following attestation bundles were made for keel_llm_reliability-0.1.0-py3-none-any.whl:
Publisher: publish-py.yml on keelplatform/keel
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: keel_llm_reliability-0.1.0-py3-none-any.whl
- Subject digest: 7b8c59f0c0293653e25901dd4dc900e2d468a44581e80d5a31458fbf548c01e6
- Sigstore transparency entry: 1608407902
- Permalink: keelplatform/keel@c39e5e08eeba3595ec85cad3340735c748be8b0e
- Branch / Tag: refs/tags/py-llm-reliability-v0.1.0
- Owner: https://github.com/keelplatform
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-py.yml@c39e5e08eeba3595ec85cad3340735c748be8b0e
- Trigger Event: push