The universal ModelAdapter Protocol for LLM clients — a vendor-neutral adapter interface (the "JDBC of LLMs"). Zero dependencies.

keel-llm-protocol

The vendor-neutral LLM adapter standard — the "JDBC of LLMs." A small, zero-dependency interface any LLM adapter implements, plus the standard types adapters exchange and a standardized error taxonomy. Implement it once; plug into anything that speaks it.

Part of the Keel toolkit. No provider lock-in, no base class to inherit, no framework. A reference implementation against the OpenAI-compatible wire format ships as keel-llm-adapter-openai, so the standard is grounded in a working adapter — not just an interface document.

Why a standard

Every product talking to more than one LLM provider rebuilds the same shim: a common interface so the rest of the code doesn't care which provider answered. Everyone's shim is slightly different, so nothing composes. keel-llm-protocol is that interface, written once, vendor-neutral, with the part that actually matters for reliability standardized too:

  • A standardized error taxonomy — every adapter raises typed, retryable-tagged failures. A 429 is RateLimitError whether it came from Groq, Gemini, or a local model, so circuit breakers, retries, and failover act on types instead of parsing provider strings. (This is the difference that lets a circuit breaker not trip on a 429 — a healthy-but-throttled model — and defer to a rate limiter instead.)
  • Composable capability protocols — implement only what your backend supports.
  • Normalized results: Usage, FinishReason, and ToolCall mean the same thing everywhere.

Is this for you?

Adopt when: you're building tooling across multiple LLM providers (a router, evaluator, observability layer, gateway); writing a vendor-neutral adapter; or want a shared error vocabulary across LLM clients.

Skip when: you only ever call one provider in a simple app (the official SDK is enough); or you're already invested in another abstraction (LangChain's BaseChatModel, LlamaIndex's LLM).

Install

pip install keel-llm-protocol     # or: uv add keel-llm-protocol

Zero runtime dependencies (stdlib only).

The composable protocols

from keel_llm_protocol import ModelAdapter, StreamingModelAdapter, ToolCallingModelAdapter

  • ModelAdapter — the core. model_key, generate(messages, ...) -> AdapterResponse, health_check(). Every adapter implements this.
  • StreamingModelAdapter — adds stream(...) -> AsyncIterator[StreamChunk]. Implement only if your backend streams.
  • ToolCallingModelAdapter — adds generate_with_tools(messages, tools, ...). Implement only if your backend supports tools.

Consumers type against the capability they need — a plain text model and a streaming tool-using model both fit, and the type checker tells a consumer at compile time whether the adapter it was handed can stream.
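As a rough illustration of typing against a capability, the sketch below uses local stand-in protocols rather than the package's own. `TextModel`, `StreamingTextModel`, `PlainModel`, `StreamingModel`, and `collect` are hypothetical names with simplified signatures, chosen only to show how a consumer that needs streaming can say so in its parameter type.

```python
import asyncio
from typing import AsyncIterator, Protocol, runtime_checkable


@runtime_checkable
class TextModel(Protocol):
    """Stand-in for the core capability (hypothetical, simplified)."""

    async def generate(self, prompt: str) -> str: ...


@runtime_checkable
class StreamingTextModel(TextModel, Protocol):
    """Stand-in for the streaming capability: the core plus stream()."""

    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class PlainModel:
    """Supports generate() only; no inheritance from the protocols."""

    async def generate(self, prompt: str) -> str:
        return prompt.upper()


class StreamingModel(PlainModel):
    """Adds stream(), so it also satisfies StreamingTextModel."""

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in prompt.split():
            yield word


async def collect(model: StreamingTextModel, prompt: str) -> list[str]:
    # A static type checker would reject collect(PlainModel(), ...)
    # because PlainModel has no stream() method.
    return [piece async for piece in model.stream(prompt)]


pieces = asyncio.run(collect(StreamingModel(), "hello streaming world"))
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the named methods exist, not their signatures, so the static check is the stronger of the two guarantees.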

Implement an adapter (structural — no inheritance)

from keel_llm_protocol import (
    ModelAdapter, Message, AdapterResponse, HealthStatus, Usage, user,
)

class MyAdapter:
    @property
    def model_key(self) -> str:
        return "myprovider:my-model"

    async def generate(self, messages, *, temperature=None, max_tokens=None,
                       stop=None, response_format=None) -> AdapterResponse:
        # ... call the provider, map failures to keel_llm_protocol.errors.* ...
        return AdapterResponse(
            text="...", model_key=self.model_key, model_id="my-model",
            usage=Usage(input_tokens=12, output_tokens=8), finish_reason="stop",
        )

    async def health_check(self) -> HealthStatus:
        return HealthStatus(model_key=self.model_key, healthy=True)

adapter: ModelAdapter = MyAdapter()
assert isinstance(adapter, ModelAdapter)        # @runtime_checkable
response = await adapter.generate([user("hello")])   # run inside an async function or an asyncio REPL

The error taxonomy (the reliability core)

Every adapter maps its provider's failures to these types, so consuming reliability logic is provider-agnostic:

import asyncio

from keel_llm_protocol.errors import (
    AdapterError,          # base — catch this for "any adapter failure"
    RateLimitError,        # 429 / quota — retryable=True, carries retry_after
    AuthenticationError,   # 401/403 — retryable=False
    AdapterTimeoutError,   # timed out — retryable=True
    TransientError,        # 5xx / connection — retryable=True
    ContentFilterError,    # content policy — retryable=False
    ContextLengthError,    # context window exceeded — retryable=False
    BadRequestError,       # 400 invalid request — retryable=False
    ProviderError,         # unexpected — retryable=False
)

try:
    resp = await adapter.generate(messages)
except RateLimitError as e:
    await asyncio.sleep(e.retry_after or 1.0)   # healthy but throttled — back off, don't trip the breaker
except AdapterError as e:
    if e.retryable:
        ...   # retry / failover
    else:
        raise

err.retryable lets generic retry/breaker logic decide without knowing the provider. That single fact is what every product otherwise reimplements per provider.
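To make the adapter side of this concrete, here is a minimal sketch of how an adapter might translate HTTP status codes into the taxonomy. The exception classes are local stand-ins that reuse the names listed above so the block runs on its own; their constructor signature and the `classify_status` helper are assumptions for illustration, not the package's API.

```python
from typing import Optional


class AdapterError(Exception):
    """Local stand-in for the taxonomy's base class (not the real one)."""

    retryable = False

    def __init__(self, message: str, *, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after


class RateLimitError(AdapterError):
    retryable = True


class AuthenticationError(AdapterError):
    pass


class TransientError(AdapterError):
    retryable = True


class BadRequestError(AdapterError):
    pass


class ProviderError(AdapterError):
    pass


def classify_status(status: int, retry_after: Optional[float] = None) -> AdapterError:
    """Hypothetical helper: pick an error type from an HTTP status code.

    Some providers also report context-length or content-policy failures
    as a 400; telling those apart needs the response body, which this
    sketch ignores.
    """
    if status == 429:
        return RateLimitError("rate limited", retry_after=retry_after)
    if status in (401, 403):
        return AuthenticationError("authentication failed")
    if status == 400:
        return BadRequestError("invalid request")
    if status >= 500:
        return TransientError("provider unavailable")
    return ProviderError(f"unexpected status {status}")


err = classify_status(429, retry_after=2.5)
```

A real adapter would raise the returned error from inside `generate`, so callers see the same types regardless of provider.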

Consuming the taxonomy (how to act on a typed error)

Typed errors only improve reliability if reliability machinery acts on them correctly. The key is that there are three reactions, not two — and a plain retryable boolean can't express the most important one. Dispatch on error.category:

  • "backpressure" (e.g. 429) — a rate-limited model is healthy, not failing. Defer (let a rate limiter pace it); do not retry now, and do not record a circuit-breaker failure. Tripping a breaker on a 429 skips a working model — the opposite of what you want.
  • "transient" (5xx, timeout) — a real but temporary failure → retry / fail over (this one does count as a breaker failure).
  • "terminal" (auth, bad request, context-length, content-filter) — won't change on retry → fail fast; retrying just burns quota.

from keel_llm_protocol.errors import AdapterError

# generate_with_dispatch is a hypothetical wrapper name; `return` needs an enclosing function
async def generate_with_dispatch(adapter, messages):
    try:
        return await adapter.generate(messages)
    except AdapterError as e:
        if e.category == "backpressure":
            ...        # defer to the rate limiter; do NOT record a breaker failure
        elif e.category == "transient":
            ...        # retry / fail over (this one DOES count as a breaker failure)
        else:          # "terminal"
            raise      # fail fast: won't change on retry

Grounded result: the backpressure-vs-transient split — defer the 429 instead of retrying-or-failing it — moved a throttled model from 3/10 to 10/10 availability in a real multi-model fan-out (the model stayed in rotation instead of being spuriously circuit-broken). retryable remains as a convenience (category != "terminal"), but category is the source of truth because only it distinguishes defer from retry-now. The machinery that implements this dispatch lives in keel-llm-reliability.
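The relationship between `retryable` and `category` can be sketched in a few lines. The classes below are local stand-ins (the subclass names `Throttled`, `Unavailable`, and `Unauthorized` and the `REACTIONS` table are hypothetical), and deriving `retryable` through a property is an assumption about one way to express `category != "terminal"`, not a description of the package's internals.

```python
class AdapterError(Exception):
    """Local stand-in, not the package's class; only `category` is modeled."""

    category = "terminal"

    @property
    def retryable(self) -> bool:
        # Convenience flag derived from the category.
        return self.category != "terminal"


class Throttled(AdapterError):      # hypothetical subclass, e.g. a 429
    category = "backpressure"


class Unavailable(AdapterError):    # hypothetical subclass, e.g. a 503
    category = "transient"


class Unauthorized(AdapterError):   # hypothetical subclass, e.g. a 401
    category = "terminal"


# The three reactions named in the text, keyed by category.
REACTIONS = {"backpressure": "defer", "transient": "retry", "terminal": "fail"}

for err in (Throttled(), Unavailable(), Unauthorized()):
    print(type(err).__name__, err.retryable, REACTIONS[err.category])
```

`Throttled` and `Unavailable` both report `retryable` as true, yet they map to different reactions, which is why the boolean alone cannot drive the dispatch.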

This is guidance, not behavior baked into the protocol — routing, retry, and breaker policy live in your code (the protocol never decides or holds state). The taxonomy gives you the typed signal; this is how to use it well.
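Since the policy lives in consumer code, one possible shape for it is a small retry loop that keeps breaker bookkeeping separate from pacing. Everything here is a sketch under stated assumptions: `AdapterError` is a local stand-in, `FailureCounter`, `call_with_policy`, and `make_call` are hypothetical helpers, and sleeping for `retry_after` stands in for a real rate limiter.

```python
import asyncio


class AdapterError(Exception):
    """Local stand-in carrying only `category` and `retry_after`."""

    def __init__(self, category, retry_after=None):
        super().__init__(category)
        self.category = category
        self.retry_after = retry_after


class FailureCounter:
    """Hypothetical breaker bookkeeping: counts consecutive failures."""

    def __init__(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0


async def call_with_policy(call, breaker, *, max_attempts=3, default_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            result = await call()
        except AdapterError as err:
            if err.category == "transient":
                breaker.record_failure()      # real failure: counts against the breaker
            if err.category == "terminal" or attempt == max_attempts:
                raise                         # fail fast, or give up after the last attempt
            if err.category == "backpressure" and err.retry_after:
                delay = err.retry_after       # throttled but healthy: wait as instructed
            else:
                delay = default_delay
            await asyncio.sleep(delay)
        else:
            breaker.record_success()
            return result


def make_call(errors):
    """Build a fake provider call that raises the given errors, then succeeds."""
    queue = list(errors)

    async def call():
        if queue:
            raise queue.pop(0)
        return "ok"

    return call


breaker = FailureCounter()
throttled_twice = make_call(
    [AdapterError("backpressure", retry_after=0.01), AdapterError("backpressure")]
)
result = asyncio.run(call_with_policy(throttled_twice, breaker))
```

In this run the call is throttled twice and then succeeds, and the breaker never records a failure, which is the behavior the backpressure category is meant to enable.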

Standard types

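from dataclasses import dataclass, field
from typing import Any

# Schematic listing: FinishReason, Usage, and ToolCall are the package's own types.
@dataclass
class AdapterResponse:
    text: str
    model_key: str
    model_id: str
    finish_reason: FinishReason = "stop"     # "stop" | "length" | "tool_calls" | "content_filter" | "error"
    usage: Usage = field(default_factory=Usage)   # input_tokens, output_tokens, cost_usd?, .total_tokens
    tool_calls: list[ToolCall] = field(default_factory=list)   # mutable defaults need a factory
    latency_ms: int = 0
    raw_response: dict[str, Any] | None = None   # escape hatch; not part of the stable contract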
@dataclass
class StreamChunk:
    delta: str = ""
    finish_reason: FinishReason | None = None
    usage: Usage | None = None               # typically only on the final chunk

Message helpers keep call sites clean: system(...), user(...), assistant(...), tool(..., tool_call_id=...).
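Because the protocol leaves stream reassembly to the consumer, a consumer might fold StreamChunk values into a final result along these lines. The `Usage` and `StreamChunk` classes below are local stand-ins modeled on the listing above, `total_tokens` being the sum of input and output tokens is an assumption, and `fake_stream` and `collect` are hypothetical helpers.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class Usage:
    """Stand-in; total_tokens as a simple sum is an assumption."""

    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


@dataclass
class StreamChunk:
    """Stand-in mirroring the fields shown in the listing."""

    delta: str = ""
    finish_reason: Optional[str] = None
    usage: Optional[Usage] = None


async def fake_stream() -> AsyncIterator[StreamChunk]:
    # Simulated backend: text deltas first, metadata on the final chunk.
    yield StreamChunk(delta="Hel")
    yield StreamChunk(delta="lo")
    yield StreamChunk(finish_reason="stop", usage=Usage(12, 2))


async def collect(chunks: AsyncIterator[StreamChunk]):
    parts = []
    finish_reason = None
    usage = None
    async for chunk in chunks:
        parts.append(chunk.delta)
        if chunk.finish_reason is not None:
            finish_reason = chunk.finish_reason
        if chunk.usage is not None:
            usage = chunk.usage      # typically only present on the last chunk
    return "".join(parts), finish_reason, usage


text, finish_reason, usage = asyncio.run(collect(fake_stream()))
```

Keeping the last non-empty `finish_reason` and `usage` rather than assuming they arrive on a specific chunk makes the loop tolerant of backends that report them at different points.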

What this is not (the framework line)

A Protocol describes the shape of a call and the type of a result. It deliberately does not: execute tools or run agent loops, decide retry/backoff policy, route failover, buffer/reassemble streams, or hold conversation state. Those are the consumer's to own. If it makes a decision or holds state, it isn't in this package.

Status

0.1.2 — grounded by keel-llm-adapter-openai as a working reference implementation. An interface is most valuable when stable; the surface is intentionally minimal-but-complete and stays in 0.x through year one (changes documented in the CHANGELOG; pin exact versions).

The Keel toolkit

Composable, vendor-neutral LLM reliability libraries on PyPI: keel-llm-reliability · keel-llm-protocol · keel-llm-adapter-openai · keel-llm-adapter-anthropic · keel-llm-adapter-google · keel-circuit-breaker

MIT licensed.
