kaos-llm-client
Thin, provider-native LLM client for the Kelvin Agentic OS — direct model calls across OpenAI, Anthropic, Google, xAI, Groq, Mistral, OpenRouter, Azure OpenAI (api-key + AAD/Entra), and AWS Bedrock (OpenAI-compatible Responses API), with one interface.
Install
uv add kaos-llm-client
# or
pip install kaos-llm-client
# Azure OpenAI with Microsoft Entra ID / DefaultAzureCredential
# (api-key auth works without this extra — only needed for AAD).
uv add 'kaos-llm-client[azure]'
# MCP server runtime (requires kaos-mcp; deferred to 0.1.0a2 — install
# kaos-mcp manually from source until then).
# uv add 'kaos-llm-client[mcp]'
Set at least one provider API key (KAOS_LLM_OPENAI_API_KEY, KAOS_LLM_ANTHROPIC_API_KEY, KAOS_LLM_GOOGLE_API_KEY, …). Standard names (OPENAI_API_KEY, etc.) are accepted as fallbacks. For Azure with AAD, see the Quick start below.
Features
- Direct providers — OpenAI, Anthropic, Google, xAI, Groq, Mistral, OpenRouter, plus a generic OpenAI-compatible client (vLLM, Ollama, LiteLLM, custom endpoints)
- Cloud-hosted gateways — Azure OpenAI (chat completions + Responses API; api-key OR Microsoft Entra ID via DefaultAzureCredential) and AWS Bedrock (OpenAI-compatible Responses API on bedrock-mantle.<region>.api.aws)
- Multimodal — images (URL, path, bytes), audio input, document input (PDF, text)
- Streaming, tools, structured output — SSE StreamAccumulator; ToolDefinition/ToolChoice; json() and pydantic() with native/tool/prompted modes and validation retries (see the sketch after this list)
- Embeddings — embed()/embed_async() for embedding-capable providers
- Composition wrappers — FallbackClient, ConcurrencyLimitedClient, InstrumentedClient
- Response caching — pluggable CacheBackend with BLAKE2b-keyed FileCache
- Profile-driven behavior — ModelProfile encodes provider/model differences (no if provider == branches)
- Lifecycle hooks — RequestHooks (on_request, on_response, on_error, on_retry) for observability
- Per-call observability — every successful call emits one "LLM call complete" structured info-log with provider, model, request_id, token counts, and estimated_usd cost
- CLI + MCP — kaos-llm-client CLI with --json output and kaos-llm-serve MCP server
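As a minimal sketch of the structured-output path referenced in the list above: the call shape below is an assumption; the method name pydantic(), its argument order, and the return type are illustrative rather than the package's confirmed signature.

from pydantic import BaseModel
from kaos_llm_client import create_client

class CityFact(BaseModel):
    city: str
    population: int

client = create_client("openai:gpt-5.4-mini")
# Hypothetical call shape: messages plus a Pydantic model class; per the
# feature list, the client validates the output and retries until it parses.
fact = client.pydantic(
    [{"role": "user", "content": "Give one fact: largest city in France."}],
    CityFact,
)
print(fact.city, fact.population)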
Quick start
from kaos_llm_client import create_client
# Direct OpenAI (or Anthropic, Google, xAI, Groq, Mistral, OpenRouter)
client = create_client("openai:gpt-5.4-mini")
response = client.chat([{"role": "user", "content": "Hello!"}])
print(response.text)
# logs: INFO LLM call complete provider=openai model=gpt-5.4-mini request_id=... estimated_usd=...
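Streaming reuses the same client. A hedged sketch follows; the stream() method name and the chunk's text attribute are assumptions, since the feature list only promises SSE streaming with a StreamAccumulator.

# Hypothetical streaming surface; names are illustrative, not confirmed API.
for chunk in client.stream([{"role": "user", "content": "Tell me a story."}]):
    print(chunk.text, end="", flush=True)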
Azure OpenAI with Microsoft Entra ID (AAD)
Install the [azure] extra first: uv add 'kaos-llm-client[azure]'. This pulls in Microsoft's azure-identity SDK (~16 MB transitive, mostly cryptography). Without the extra, api-key auth still works on every Azure endpoint — only AAD needs azure-identity.
DefaultAzureCredential gives you managed-identity / az login /
service-principal auth without storing static keys. The Responses-API
client (azure-responses:) is the recommended path for gpt-5.4+
deployments where tool calling is required.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from kaos_llm_client import create_client
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)
client = create_client(
    "azure-responses:gpt-5.4-mini",
    azure_ad_token_provider=token_provider,
)
response = client.chat([{"role": "user", "content": "Hello!"}])
azure-identity ships 22 credential classes — ManagedIdentityCredential, WorkloadIdentityCredential, ClientSecretCredential, CertificateCredential, etc. Any of them works as the first argument to get_bearer_token_provider. The async variants live in azure.identity.aio and are awaited automatically by the kaos-llm-client provider.
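For instance, on an Azure VM or App Service with a user-assigned managed identity, the same wiring looks like this (the client_id value is a placeholder):

from azure.identity import ManagedIdentityCredential, get_bearer_token_provider
from kaos_llm_client import create_client

# Pin to a specific user-assigned managed identity instead of walking the
# full DefaultAzureCredential chain.
token_provider = get_bearer_token_provider(
    ManagedIdentityCredential(client_id="<user-assigned-identity-client-id>"),
    "https://cognitiveservices.azure.com/.default",
)
client = create_client(
    "azure-responses:gpt-5.4-mini",
    azure_ad_token_provider=token_provider,
)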
AAD requires a custom-subdomain endpoint
(https://<resource>.openai.azure.com/); regional endpoints accept
api-key only. Both forms work for azure: (chat completions) and
azure-responses: (Responses API).
AWS Bedrock (OpenAI-compatible Responses API)
import os
from kaos_llm_client import create_client
# Bearer token from `aws bedrock create-bearer-token` or your AWS auth flow
os.environ["KAOS_LLM_BEDROCK_API_KEY"] = "..."
client = create_client("bedrock:openai.gpt-oss-120b")
response = client.chat([{"role": "user", "content": "Hello!"}])
Providers
Direct-API clients:
| Prefix | Client | Models | Auth |
|---|---|---|---|
| openai: | OpenAIClient | GPT-5.5/5.4/5/4.1, o1/o3/o4 reasoning | KAOS_LLM_OPENAI_API_KEY |
| anthropic: | AnthropicClient | Claude 4.7 Opus, 4.6 Sonnet, 4.5 Haiku, 3.5/3.7 | KAOS_LLM_ANTHROPIC_API_KEY |
| google: | GoogleClient | Gemini 2.5/3.x Pro/Flash | KAOS_LLM_GOOGLE_API_KEY |
| xai: | XAIClient | Grok-3, Grok-4 | KAOS_LLM_XAI_API_KEY |
| groq: | GroqClient | LLaMA, Mixtral (OpenAI-compat) | KAOS_LLM_GROQ_API_KEY |
| mistral: | MistralClient | Mistral, Mixtral | KAOS_LLM_MISTRAL_API_KEY |
| openrouter: | OpenRouterClient | Any model via OpenRouter | KAOS_LLM_OPENROUTER_API_KEY |
| openai-compatible: | OpenAICompatibleClient | vLLM, Ollama, LiteLLM, custom | varies (base_url=...) |
Cloud-hosted gateways:
| Prefix | Client | Notes |
|---|---|---|
| azure: / azure-openai: | AzureOpenAIClient (chat completions) | Legacy path; works for any deployment |
| azure-responses: / azure-foundry: | AzureOpenAIResponsesClient | Recommended for gpt-5.4+ — chat-completions tool calling with reasoning: none is unsupported by Azure on those models |
| bedrock: | BedrockClient | OpenAI-compatible Responses API on bedrock-mantle.<region>.api.aws |
Azure auth is api-key (works on regional + custom-subdomain endpoints) or AAD/Entra (Authorization: Bearer <token> — custom-subdomain endpoint required). Use azure_ad_token=... for a static bearer or azure_ad_token_provider=... for DefaultAzureCredential / managed identity / az login flows. See the Quick start for the canonical Entra ID example.
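A quick sketch of the static-token alternative mentioned above; the token value is a placeholder, and since static tokens expire, prefer azure_ad_token_provider= for anything long-running:

from kaos_llm_client import create_client

client = create_client(
    "azure:gpt-5.4-mini",
    azure_ad_token="<short-lived-bearer-token>",  # placeholder; not refreshed
)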
Model strings use provider:model format. If no prefix is given, the provider is inferred from the model name:
create_client("openai:gpt-5.4-mini") # explicit provider
create_client("claude-sonnet-4-6") # inferred: anthropic
create_client("gemini-2.5-pro") # inferred: google
create_client("grok-3") # inferred: xai
create_client("azure-responses:gpt-5.4-mini") # Azure Responses API
create_client("bedrock:openai.gpt-oss-120b") # AWS Bedrock
Compatibility & status
| Item | Value |
|---|---|
| Python | 3.13, 3.14 |
| OS | Linux, macOS, Windows |
| Maturity | Alpha (Development Status :: 3 - Alpha); SemVer, pre-1.0 minor bumps may break public API |
| Tests | 924 unit + 5 live integration |
| Type checker | ty (clean) |
Configuration
All settings use the KAOS_LLM_ prefix via KaosLLMSettings (ModuleSettings subclass). Each provider key has a legacy fallback (e.g. OPENAI_API_KEY) for backward compatibility.
| Variable | Default | Description |
|---|---|---|
| KAOS_LLM_{OPENAI,ANTHROPIC,GOOGLE,XAI,GROQ,MISTRAL,OPENROUTER}_API_KEY | — | Direct-provider API key (SecretStr) |
| KAOS_LLM_OPENAI_BASE_URL | https://api.openai.com | Override for proxies / local models (per-provider variants exist) |
| KAOS_LLM_AZURE_OPENAI_ENDPOINT | — | Azure resource URL (e.g. https://my-resource.openai.azure.com/) |
| KAOS_LLM_AZURE_OPENAI_API_KEY | — | Azure resource subscription key (alternative to AAD) |
| KAOS_LLM_AZURE_OPENAI_AD_TOKEN | — | Static AAD bearer (use azure_ad_token_provider= for refresh) |
| KAOS_LLM_AZURE_OPENAI_API_VERSION | 2024-12-01-preview | Azure API version (bump to 2025-04-01-preview for newer Responses-API features) |
| KAOS_LLM_BEDROCK_API_KEY | — | AWS Bedrock bearer token; legacy fallback AWS_BEARER_TOKEN_BEDROCK |
| KAOS_LLM_BEDROCK_BASE_URL | https://bedrock-mantle.us-east-2.api.aws | Bedrock endpoint (override for other regions) |
| KAOS_LLM_DEFAULT_TIMEOUT | 120.0 | Request timeout (seconds) |
| KAOS_LLM_DEFAULT_MAX_RETRIES | 3 | Max retry attempts |
| KAOS_LLM_MAX_RESPONSE_BYTES | 33554432 | 32 MiB cap on non-streaming responses |
| KAOS_LLM_STREAM_MAX_DURATION | 600.0 | Wall-clock cap on a streaming response (seconds) |
| KAOS_LLM_CACHE_ENABLED | false | Enable response caching |
| KAOS_LLM_CACHE_PATH | ~/.cache/kaos/llm | Cache directory |
Per-request overrides flow through KaosContext._config for MCP callers.
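For example, enabling the file cache takes only the two cache variables from the table above (path illustrative; this sketch assumes settings are read when the client is constructed):

import os

# Documented settings from the table above; set before creating a client.
os.environ["KAOS_LLM_CACHE_ENABLED"] = "true"
os.environ["KAOS_LLM_CACHE_PATH"] = "/var/cache/kaos-llm"

from kaos_llm_client import create_client

client = create_client("openai:gpt-5.4-mini")  # repeat calls can now hit the FileCache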
CLI
kaos-llm-client check [--provider openai,anthropic] [--json] # verify credentials
kaos-llm-client chat --model openai:gpt-5 --message "Hello!" [--system "..."] [--json]
kaos-llm-client profiles [--json] # list known model profiles
kaos-llm-client config [--json] # resolved settings (redacted)
All commands support --json with a consistent envelope: {"command": "...", ...}.
MCP Server
kaos-llm-serve # stdio (Claude Code / Desktop)
kaos-llm-serve --http --port 8000 # streamable HTTP
kaos-llm-serve --model openai:gpt-5 --http --debug # default model + debug logging
Exposes kaos-llm-chat, kaos-llm-json, and kaos-llm-embed MCP tools.
Security: the HTTP transport has no built-in authentication or rate limiting. The default --host 127.0.0.1 binds to loopback, which is the safe default. Do not bind to a non-loopback interface unless you put an authenticated reverse proxy (mTLS, OAuth, IP allowlist, etc.) in front of it — anyone who can reach the port can spend your configured LLM credits. The server emits a startup warning when --host is not loopback. See the kaos_llm_client/serve.py module docstring for the full guidance.
Companion packages
Direct dependencies in the KAOS stack:
- kaos-core — runtime, ModuleSettings, KaosContext, structured logging
- kaos-mcp (optional via [mcp]) — FastMCP bridge for kaos-llm-serve
Higher layers consume kaos-llm-client for inference: kaos-llm-core (typed programs), kaos-agents (runtime), kaos-citations (verification). Full module roster at docs.kelvin.legal/kaos-llm-client.
Development
uv sync --group dev
uv run ruff format kaos_llm_client/ tests/
uv run ruff check kaos_llm_client/ tests/
uv run ty check kaos_llm_client/ tests/
uv run pytest tests/unit/ -q
# live tier requires provider keys; see tests/integration/
uv run pytest tests/integration/ -q
Build from source
uv build
uv pip install dist/kaos_llm_client-*.whl
Contributing
Issues and pull requests are welcome. By contributing you certify the
Developer Certificate of Origin v1.1 —
sign every commit with git commit -s. Please open an issue before starting
on a non-trivial change so we can align on scope.
Security
For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.
License
Apache License 2.0 — see LICENSE and NOTICE.
Copyright 2026 273 Ventures LLC. Built for kelvin.legal.