Composed reliability for multi-model LLM calls — quorum fan-out + primary/failover, category-dispatched, transparent degradation. Built on keel-llm-protocol + keel-circuit-breaker.

keel-llm-reliability

Production-grade reliability for multi-model LLM calls — quorum fan-out and primary/failover, category-dispatched, with transparent degradation. The composed solution, not a parts bin.

Part of Keel. Composes keel-llm-protocol (the error taxonomy) + keel-circuit-breaker into the consumer-side machinery that acts on typed errors — so you don't hand-write the fan-out/failover loop.

Why it exists

A typed error taxonomy tells you what failed; this tells your app what to do about it. The core lesson (measured in production): a rate-limited model is healthy, not failing — defer it, don't trip its circuit. Acting on that one distinction moved a throttled model from 3/10 to 10/10 availability. This package generalizes it into two strategies and makes every decision visible.

Install

pip install keel-llm-reliability     # pulls in keel-llm-protocol + keel-circuit-breaker

Two strategies

import asyncio

from keel_llm_reliability import ResilientClient, Request
from keel_llm_protocol import user

async def main():
    # groq_adapter, gemini_adapter, openai_adapter: provider adapters you have already constructed.
    client = ResilientClient([groq_adapter, gemini_adapter, openai_adapter])
    req = Request(messages=[user("Summarize this in one line.")])

    # Primary + ordered failover: the single-good-answer case (most apps).
    result = await client.failover(req)
    if result.succeeded:
        print(result.response.text)

    # Quorum / parallel fan-out: the ensemble/council case.
    result = await client.fan_out(req)
    for r in result.successes:        # every model that answered
        ...

asyncio.run(main())

Both are also available as plain functions (fan_out, failover) if you'd rather wire collaborators yourself.

Transparent degradation — every decision is data

No silent retries, no hidden fallbacks. Every provider interaction is a visible Attempt:

result = await client.failover(req)
for a in result.attempts:
    print(a.model_key, a.outcome, a.latency_ms, a.error and a.error.category)
# groq:…     deferred_backpressure  120   backpressure   (throttled — skipped, NOT failed)
# gemini:…   failed                 310   transient      (5xx — counted, failed over)
# openai:…   success                420   None

outcome is one of success / preempted_open / preempted_limited / deferred_backpressure / failed. A failed attempt carries its error.category (transient vs terminal) so you can tell "flaky" from "broken config." This generalizes a council's judges_count — degradation you can see and operate on.
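
Because attempts are plain data, they can be tallied and alerted on. The following is a minimal sketch of that idea, not the package's API: the Attempt dataclass here is a hypothetical stand-in that models only the fields printed above, with error flattened to a category string, and summarize is an illustrative helper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the package's Attempt (assumption): only the
# fields shown in the README output are modeled.
@dataclass
class Attempt:
    model_key: str
    outcome: str
    latency_ms: int
    error_category: Optional[str] = None


def summarize(attempts):
    """Count attempts per outcome and split failures into flaky vs broken."""
    counts = Counter(a.outcome for a in attempts)
    flaky = [a.model_key for a in attempts
             if a.outcome == "failed" and a.error_category == "transient"]
    broken = [a.model_key for a in attempts
              if a.outcome == "failed" and a.error_category == "terminal"]
    return counts, flaky, broken


attempts = [
    Attempt("groq", "deferred_backpressure", 120, "backpressure"),
    Attempt("gemini", "failed", 310, "transient"),
    Attempt("openai", "success", 420),
]
counts, flaky, broken = summarize(attempts)
print(dict(counts), flaky, broken)
```

A deferred model does not show up in either failure list, which mirrors the distinction the package draws between throttled and failing.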

How it behaves (category-dispatched)

| Error category | fan_out (quorum) | failover |
| --- | --- | --- |
| backpressure (429) | defer: contributes nothing this round; no breaker failure | route to the next candidate immediately; no breaker failure |
| transient (5xx, timeout) | record a breaker failure; that model contributes nothing | record a breaker failure; fail over (optionally retry the same model up to transient_retries) |
| terminal (auth/bad-request/context/content) | visible failed; no breaker failure (request-level, not model health) | visible failed; fail over |
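
The fan_out column can be sketched in a few lines of asyncio. This is an illustration under stated assumptions, not the package's implementation: ModelError, call_one, and fan_out_sketch are hypothetical names, the real error types come from keel-llm-protocol, and the breaker is reduced to a failure counter.

```python
import asyncio


# Hypothetical error type (assumption); the real taxonomy lives in keel-llm-protocol.
class ModelError(Exception):
    def __init__(self, category):
        super().__init__(category)
        self.category = category  # "backpressure" | "transient" | "terminal"


async def call_one(name, call, breaker_failures):
    """Run one model call and map its error category to an outcome."""
    try:
        text = await call()
        return name, "success", text
    except ModelError as err:
        if err.category == "backpressure":
            # Throttled but healthy: defer, and do not count it against the breaker.
            return name, "deferred_backpressure", None
        if err.category == "transient":
            # Model-health signal: count it against the breaker.
            breaker_failures[name] = breaker_failures.get(name, 0) + 1
        # Terminal errors are request-level, so the breaker is left untouched.
        return name, "failed", None


async def fan_out_sketch(calls):
    """Call every model in parallel; results keep the input order."""
    breaker_failures = {}
    results = await asyncio.gather(
        *(call_one(name, call, breaker_failures) for name, call in calls.items())
    )
    return list(results), breaker_failures


async def ok():
    return "answer"


async def throttled():
    raise ModelError("backpressure")


async def flaky():
    raise ModelError("transient")


async def bad_request():
    raise ModelError("terminal")


results, breaker_failures = asyncio.run(
    fan_out_sketch({"a": ok, "b": throttled, "c": flaky, "d": bad_request})
)
print(results, breaker_failures)
```

Only the transient failure touches the breaker counter; the throttled and bad-request calls leave it alone, as in the table.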

Before any dispatch, both strategies preempt: a model whose breaker is open (preempted_open) or whose limiter predicts it's full (preempted_limited) is skipped — predict, don't block. There are no hidden sleeps; exhaustion returns visibly (empty successes / response=None).
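
As a rough sketch of "predict, don't block" in the failover case: the loop below skips preempted candidates, moves on after an error, and returns None on exhaustion without ever sleeping. All names are hypothetical; the breaker and limiter are reduced to plain sets of model keys, and breaker bookkeeping and transient_retries are deliberately left out.

```python
import asyncio


# Hypothetical error type (assumption); the real taxonomy lives in keel-llm-protocol.
class ModelError(Exception):
    def __init__(self, category):
        super().__init__(category)
        self.category = category


async def failover_sketch(candidates, open_breakers, full_limiters):
    """Try candidates in order; return (response, attempts) with no sleeps."""
    attempts = []
    for name, call in candidates:
        if name in open_breakers:
            attempts.append((name, "preempted_open"))      # skipped, never called
            continue
        if name in full_limiters:
            attempts.append((name, "preempted_limited"))   # skipped, never called
            continue
        try:
            response = await call()
        except ModelError as err:
            outcome = ("deferred_backpressure"
                       if err.category == "backpressure" else "failed")
            attempts.append((name, outcome))
            continue                                       # fail over to the next candidate
        attempts.append((name, "success"))
        return response, attempts
    return None, attempts                                  # visible exhaustion


async def ok():
    return "answer"


async def throttled():
    raise ModelError("backpressure")


candidates = [("a", ok), ("b", ok), ("c", throttled), ("d", ok)]
response, attempts = asyncio.run(
    failover_sketch(candidates, open_breakers={"a"}, full_limiters={"b"})
)
exhausted, _ = asyncio.run(
    failover_sketch([("a", ok)], open_breakers={"a"}, full_limiters=set())
)
print(response, attempts, exhausted)
```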

Injected collaborators — born ready for scale

The Breaker and (optional) Limiter are injected async protocols, never owned:

from keel_llm_reliability import InProcessBreaker, ResilientClient

# Default: zero-config in-process breaker (wraps keel-circuit-breaker).
client = ResilientClient(adapters)                       # InProcessBreaker()

# At scale: swap in a Redis-backed breaker/limiter (same protocol) for cross-worker
# state — the orchestrator code doesn't change.
client = ResilientClient(adapters, breaker=my_redis_breaker, limiter=my_redis_limiter)

The protocols are async precisely so a Redis-backed implementation (which does network I/O) can satisfy them — the in-process default just returns immediately.
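
To illustrate why an async protocol makes the swap possible, here is a small sketch using typing.Protocol. The method names (is_open, record_failure, record_success) are assumptions made for this example and may not match the package's actual Breaker protocol; CountingBreaker is a toy with no timeout or half-open state.

```python
import asyncio
from typing import Protocol


class Breaker(Protocol):
    # Hypothetical method names (assumption); the package's protocol may differ.
    async def is_open(self, model_key: str) -> bool: ...
    async def record_failure(self, model_key: str) -> None: ...
    async def record_success(self, model_key: str) -> None: ...


class CountingBreaker:
    """Toy in-memory breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._failures: dict[str, int] = {}

    async def is_open(self, model_key: str) -> bool:
        return self._failures.get(model_key, 0) >= self.threshold

    async def record_failure(self, model_key: str) -> None:
        self._failures[model_key] = self._failures.get(model_key, 0) + 1

    async def record_success(self, model_key: str) -> None:
        self._failures.pop(model_key, None)  # a success resets the count


async def demo(breaker: Breaker):
    """Caller code depends only on the protocol, not on the storage behind it."""
    await breaker.record_failure("m")
    await breaker.record_failure("m")
    before = await breaker.is_open("m")
    await breaker.record_failure("m")
    after = await breaker.is_open("m")
    await breaker.record_success("m")
    reset = await breaker.is_open("m")
    return before, after, reset


before, after, reset = asyncio.run(demo(CountingBreaker(threshold=3)))
print(before, after, reset)
```

A networked implementation would await real I/O inside the same three methods, and demo would not need to change.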

Status

0.1.0 — first release. Quorum semantics are grounded in LLMCouncil's production fan-out (PR #77); failover serves the single-answer broad base. Pin exact versions while in 0.x. Source: Keel monorepo.

License

MIT — see LICENSE.

