
kaos-llm-client


Thin, provider-native LLM client for the Kelvin Agentic OS — direct model calls across OpenAI, Anthropic, Google, xAI, Groq, Mistral, OpenRouter, Azure OpenAI (api-key + AAD/Entra), and AWS Bedrock (OpenAI-compatible Responses API), with one interface.

Install

uv add kaos-llm-client
# or
pip install kaos-llm-client

# Azure OpenAI with Microsoft Entra ID / DefaultAzureCredential
# (api-key auth works without this extra — only needed for AAD).
uv add 'kaos-llm-client[azure]'

# MCP server runtime (requires kaos-mcp; deferred to 0.1.0a2 — install
# kaos-mcp manually from source until then).
# uv add 'kaos-llm-client[mcp]'

Set at least one provider API key (KAOS_LLM_OPENAI_API_KEY, KAOS_LLM_ANTHROPIC_API_KEY, KAOS_LLM_GOOGLE_API_KEY, …). Standard names (OPENAI_API_KEY, etc.) are accepted as fallbacks. For Azure with AAD, see the Quick start below.

Features

  • Direct providers — OpenAI, Anthropic, Google, xAI, Groq, Mistral, OpenRouter, plus a generic OpenAI-compatible client (VLLM, Ollama, LiteLLM, custom endpoints)
  • Cloud-hosted gateways — Azure OpenAI (chat completions + Responses API; api-key OR Microsoft Entra ID via DefaultAzureCredential) and AWS Bedrock (OpenAI-compatible Responses API on bedrock-mantle.<region>.api.aws)
  • Multimodal — images (URL, path, bytes), audio input, document input (PDF, text)
  • Streaming, tools, structured output — SSE StreamAccumulator; ToolDefinition / ToolChoice; json() and pydantic() with native/tool/prompted modes and validation retries
  • Embeddings — embed() / embed_async() for embedding-capable providers
  • Composition wrappers — FallbackClient, ConcurrencyLimitedClient, InstrumentedClient
  • Response caching — pluggable CacheBackend with BLAKE2b-keyed FileCache
  • Profile-driven behavior — ModelProfile encodes provider/model differences (no if provider == branches)
  • Lifecycle hooks — RequestHooks(on_request, on_response, on_error, on_retry) for observability
  • Per-call observability — every successful call emits one LLM call complete structured info-log with provider, model, request_id, token counts, and estimated_usd cost
  • CLI + MCP — kaos-llm-client CLI with --json output and kaos-llm-serve MCP server
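The SSE streaming behind StreamAccumulator can be sketched with the stdlib alone. The event shape below ({"delta": ...} terminated by [DONE]) is an assumption for illustration, not the library's actual wire format — real provider payloads differ per API:

```python
import json

def accumulate_sse(lines):
    """Fold SSE 'data:' events carrying text deltas into one string."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        parts.append(event.get("delta", ""))
    return "".join(parts)

stream = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo!"}',
    "data: [DONE]",
]
print(accumulate_sse(stream))  # -> Hello!
```

The real accumulator additionally tracks tool-call fragments and usage metadata, but the fold-deltas-into-text core is the same idea.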

Quick start

from kaos_llm_client import create_client

# Direct OpenAI (or Anthropic, Google, xAI, Groq, Mistral, OpenRouter)
client = create_client("openai:gpt-5.4-mini")
response = client.chat([{"role": "user", "content": "Hello!"}])
print(response.text)
# logs: INFO LLM call complete provider=openai model=gpt-5.4-mini request_id=... estimated_usd=...
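The "prompted" structured-output mode with validation retries (see Features) boils down to a re-prompt loop. This stdlib-only sketch uses a stand-in model function and a hand-rolled key check instead of the library's real json()/pydantic() internals:

```python
import json

def json_with_retries(call_model, required_keys, max_retries=3):
    """Ask the model for JSON; feed the validation error back until it parses."""
    prompt = "Reply with a JSON object."
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)  # JSONDecodeError subclasses ValueError
            missing = [k for k in required_keys if k not in data]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except ValueError as err:
            prompt = f"Invalid JSON ({err}). Reply with a JSON object."
    raise RuntimeError("model never produced valid JSON")

# Stand-in model: fails once, then returns valid JSON.
replies = iter(["not json", '{"answer": 42}'])
result = json_with_retries(lambda p: next(replies), ["answer"])
print(result)  # -> {'answer': 42}
```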

Azure OpenAI with Microsoft Entra ID (AAD)

Install the [azure] extra first: uv add 'kaos-llm-client[azure]'. This pulls in Microsoft's azure-identity SDK (~16 MB transitive, mostly cryptography). Without the extra, api-key auth still works on every Azure endpoint — only AAD needs azure-identity.

DefaultAzureCredential gives you managed-identity / az login / service-principal auth without storing static keys. The Responses-API client (azure-responses:) is the recommended path for gpt-5.4+ deployments where tool calling is required.

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from kaos_llm_client import create_client

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)
client = create_client(
    "azure-responses:gpt-5.4-mini",
    azure_ad_token_provider=token_provider,
)
response = client.chat([{"role": "user", "content": "Hello!"}])

azure-identity ships 22 credential classes — ManagedIdentityCredential, WorkloadIdentityCredential, ClientSecretCredential, CertificateCredential, etc. Any of them works as the first argument to get_bearer_token_provider. The async variants live in azure.identity.aio and are awaited automatically by the kaos-llm-client provider.

AAD requires a custom-subdomain endpoint (https://<resource>.openai.azure.com/); regional endpoints accept api-key only. Both forms work for azure: (chat completions) and azure-responses: (Responses API).

AWS Bedrock (OpenAI-compatible Responses API)

import os
from kaos_llm_client import create_client

# Bearer token from `aws bedrock create-bearer-token` or your AWS auth flow
os.environ["KAOS_LLM_BEDROCK_API_KEY"] = "..."
client = create_client("bedrock:openai.gpt-oss-120b")
response = client.chat([{"role": "user", "content": "Hello!"}])

Providers

Direct-API clients:

| Prefix | Client | Models | Auth |
|---|---|---|---|
| openai: | OpenAIClient | GPT-5.5/5.4/5/4.1, o1/o3/o4 reasoning | KAOS_LLM_OPENAI_API_KEY |
| anthropic: | AnthropicClient | Claude 4.7 Opus, 4.6 Sonnet, 4.5 Haiku, 3.5/3.7 | KAOS_LLM_ANTHROPIC_API_KEY |
| google: | GoogleClient | Gemini 2.5/3.x Pro/Flash | KAOS_LLM_GOOGLE_API_KEY |
| xai: | XAIClient | Grok-3, Grok-4 | KAOS_LLM_XAI_API_KEY |
| groq: | GroqClient | LLaMA, Mixtral (OpenAI-compat) | KAOS_LLM_GROQ_API_KEY |
| mistral: | MistralClient | Mistral, Mixtral | KAOS_LLM_MISTRAL_API_KEY |
| openrouter: | OpenRouterClient | Any model via OpenRouter | KAOS_LLM_OPENROUTER_API_KEY |
| openai-compatible: | OpenAICompatibleClient | VLLM, Ollama, LiteLLM, custom | varies (base_url=...) |

Cloud-hosted gateways:

| Prefix | Client | Notes |
|---|---|---|
| azure: / azure-openai: | AzureOpenAIClient (chat completions) | Legacy path; works for any deployment |
| azure-responses: / azure-foundry: | AzureOpenAIResponsesClient | Recommended for gpt-5.4+ — chat-completions tool calling with reasoning: none is unsupported by Azure on those models |
| bedrock: | BedrockClient | OpenAI-compatible Responses API on bedrock-mantle.<region>.api.aws |

Azure auth is api-key (works on regional + custom-subdomain endpoints) or AAD/Entra (Authorization: Bearer <token> — custom-subdomain endpoint required). Use azure_ad_token=... for a static bearer or azure_ad_token_provider=... for DefaultAzureCredential / managed identity / az login flows. See the Quick start for the canonical Entra ID example.

Model strings use provider:model format. If no prefix is given, the provider is inferred from the model name:

create_client("openai:gpt-5.4-mini")          # explicit provider
create_client("claude-sonnet-4-6")            # inferred: anthropic
create_client("gemini-2.5-pro")               # inferred: google
create_client("grok-3")                       # inferred: xai
create_client("azure-responses:gpt-5.4-mini") # Azure Responses API
create_client("bedrock:openai.gpt-oss-120b")  # AWS Bedrock
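Prefix parsing and name-based inference can be sketched as follows. The inference table is a guess reconstructed from the examples above; the real resolution logic (backed by ModelProfile) is richer:

```python
def resolve_provider(model: str) -> tuple[str, str]:
    """Split 'provider:model'; infer the provider from the name if no prefix."""
    if ":" in model:
        provider, name = model.split(":", 1)
        return provider, name
    # Hypothetical inference table, derived from the examples above.
    prefixes = {
        "gpt-": "openai",
        "claude": "anthropic",
        "gemini": "google",
        "grok": "xai",
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider, model
    raise ValueError(f"cannot infer provider for {model!r}")

print(resolve_provider("claude-sonnet-4-6"))    # -> ('anthropic', 'claude-sonnet-4-6')
print(resolve_provider("openai:gpt-5.4-mini"))  # -> ('openai', 'gpt-5.4-mini')
```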

Compatibility & status

| Item | Value |
|---|---|
| Python | 3.13, 3.14 |
| OS | Linux, macOS, Windows |
| Maturity | Alpha (Development Status :: 3 - Alpha); SemVer, pre-1.0 minor bumps may break the public API |
| Tests | 924 unit + 5 live integration |
| Type checker | ty (clean) |

Configuration

All settings use the KAOS_LLM_ prefix via KaosLLMSettings (ModuleSettings subclass). Each provider key has a legacy fallback (e.g. OPENAI_API_KEY) for backward compatibility.

| Variable | Default | Description |
|---|---|---|
| KAOS_LLM_{OPENAI,ANTHROPIC,GOOGLE,XAI,GROQ,MISTRAL,OPENROUTER}_API_KEY | | Direct-provider API key (SecretStr) |
| KAOS_LLM_OPENAI_BASE_URL | https://api.openai.com | Override for proxies / local models (per-provider variants exist) |
| KAOS_LLM_AZURE_OPENAI_ENDPOINT | | Azure resource URL (e.g. https://my-resource.openai.azure.com/) |
| KAOS_LLM_AZURE_OPENAI_API_KEY | | Azure resource subscription key (alternative to AAD) |
| KAOS_LLM_AZURE_OPENAI_AD_TOKEN | | Static AAD bearer (use azure_ad_token_provider= for refresh) |
| KAOS_LLM_AZURE_OPENAI_API_VERSION | 2024-12-01-preview | Azure API version (bump to 2025-04-01-preview for newer Responses-API features) |
| KAOS_LLM_BEDROCK_API_KEY | | AWS Bedrock bearer token; legacy fallback AWS_BEARER_TOKEN_BEDROCK |
| KAOS_LLM_BEDROCK_BASE_URL | https://bedrock-mantle.us-east-2.api.aws | Bedrock endpoint (override for other regions) |
| KAOS_LLM_DEFAULT_TIMEOUT | 120.0 | Request timeout (seconds) |
| KAOS_LLM_DEFAULT_MAX_RETRIES | 3 | Max retry attempts |
| KAOS_LLM_MAX_RESPONSE_BYTES | 33554432 | 32 MiB cap on non-streaming responses |
| KAOS_LLM_STREAM_MAX_DURATION | 600.0 | Wall-clock cap on a streaming response (seconds) |
| KAOS_LLM_CACHE_ENABLED | false | Enable response caching |
| KAOS_LLM_CACHE_PATH | ~/.cache/kaos/llm | Cache directory |

Per-request overrides flow through KaosContext._config for MCP callers.
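The BLAKE2b-keyed FileCache needs a stable key per request; a minimal stdlib sketch of how such a key can be derived (the actual key schema and fields are not documented here, so the ones below are assumptions):

```python
import hashlib
import json

def cache_key(provider: str, model: str, messages: list, **params) -> str:
    """Stable BLAKE2b digest over the canonicalized request."""
    canonical = json.dumps(
        {"provider": provider, "model": model,
         "messages": messages, "params": params},
        sort_keys=True, separators=(",", ":"),  # key order and spacing fixed
    )
    return hashlib.blake2b(canonical.encode(), digest_size=16).hexdigest()

k1 = cache_key("openai", "gpt-5.4-mini", [{"role": "user", "content": "Hi"}])
k2 = cache_key("openai", "gpt-5.4-mini", [{"role": "user", "content": "Hi"}])
assert k1 == k2  # identical requests hit the same cache entry
```

Canonical JSON (sorted keys, fixed separators) is what makes logically identical requests hash to the same file name regardless of dict ordering.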

CLI

kaos-llm-client check [--provider openai,anthropic] [--json]   # verify credentials
kaos-llm-client chat --model openai:gpt-5 --message "Hello!" [--system "..."] [--json]
kaos-llm-client profiles [--json]                              # list known model profiles
kaos-llm-client config [--json]                                # resolved settings (redacted)

All commands support --json with a consistent envelope: {"command": "...", ...}.

MCP Server

kaos-llm-serve                                                # stdio (Claude Code / Desktop)
kaos-llm-serve --http --port 8000                             # streamable HTTP
kaos-llm-serve --model openai:gpt-5 --http --debug            # default model + debug logging

Exposes kaos-llm-chat, kaos-llm-json, and kaos-llm-embed MCP tools.
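For stdio use with Claude Desktop, an mcpServers entry along these lines is typical (the server name "kaos-llm" and the assumption that kaos-llm-serve is on PATH are illustrative, not prescribed by the package):

```json
{
  "mcpServers": {
    "kaos-llm": {
      "command": "kaos-llm-serve",
      "args": []
    }
  }
}
```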

Security: the HTTP transport has no built-in authentication or rate limiting. The default --host 127.0.0.1 binds to loopback, which is the safe default. Do not bind to a non-loopback interface unless you put an authenticated reverse proxy (mTLS, OAuth, IP allowlist, etc.) in front of it — anyone who can reach the port can spend your configured LLM credits. The server emits a startup warning when --host is not loopback. See kaos_llm_client/serve.py module docstring for the full guidance.

Companion packages

Direct dependencies in the KAOS stack:

  • kaos-core — runtime, ModuleSettings, KaosContext, structured logging
  • kaos-mcp (optional via [mcp]) — FastMCP bridge for kaos-llm-serve

Higher layers consume kaos-llm-client for inference: kaos-llm-core (typed programs), kaos-agents (runtime), kaos-citations (verification). Full module roster at docs.kelvin.legal/kaos-llm-client.

Development

uv sync --group dev
uv run ruff format kaos_llm_client/ tests/
uv run ruff check kaos_llm_client/ tests/
uv run ty check kaos_llm_client/ tests/
uv run pytest tests/unit/ -q
# live tier requires provider keys; see tests/integration/
uv run pytest tests/integration/ -q

Build from source

uv build
uv pip install dist/kaos_llm_client-*.whl

Contributing

Issues and pull requests are welcome. By contributing you certify the Developer Certificate of Origin v1.1 — sign every commit with git commit -s. Please open an issue before starting on a non-trivial change so we can align on scope.

Security

For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.

License

Apache License 2.0 — see LICENSE and NOTICE.

Copyright 2026 273 Ventures LLC. Built for kelvin.legal.
