Thin, provider-native LLM client for direct model calls within the KAOS ecosystem

kaos-llm-client

Thin, provider-native LLM client for the Kelvin Agentic OS — direct model calls across OpenAI, Anthropic, Google, xAI, Groq, Mistral, OpenRouter, Azure OpenAI (api-key + AAD/Entra), and AWS Bedrock (OpenAI-compatible Responses API), with one interface.

Install

uv add "kaos-llm-client>=0.1.0"
# or
pip install "kaos-llm-client>=0.1.0"

# Azure OpenAI with Microsoft Entra ID / DefaultAzureCredential
# (api-key auth works without this extra — only needed for AAD).
uv add 'kaos-llm-client[azure]>=0.1.0'

# MCP server runtime (pulls in kaos-mcp)
uv add 'kaos-llm-client[mcp]>=0.1.0'

Set at least one provider API key (KAOS_LLM_OPENAI_API_KEY, KAOS_LLM_ANTHROPIC_API_KEY, KAOS_LLM_GOOGLE_API_KEY, …). Standard names (OPENAI_API_KEY, etc.) are accepted as fallbacks. For Azure with AAD, see the Quick start below.

Features

  • Direct providers: OpenAI, Anthropic, Google, xAI, Groq, Mistral, OpenRouter, plus a generic OpenAI-compatible client (vLLM, Ollama, LiteLLM, custom endpoints)
  • Cloud-hosted gateways: Azure OpenAI (chat completions + Responses API; api-key or Microsoft Entra ID via DefaultAzureCredential) and AWS Bedrock (OpenAI-compatible Responses API on bedrock-mantle.<region>.api.aws)
  • Multimodal: images (URL, path, bytes), audio input, document input (PDF, text)
  • Streaming, tools, structured output: SSE StreamAccumulator; ToolDefinition / ToolChoice; json() and pydantic() with native/tool/prompted modes and validation retries
  • Embeddings: embed() / embed_async() for embedding-capable providers
  • Composition wrappers: FallbackClient, ConcurrencyLimitedClient, InstrumentedClient
  • Response caching: pluggable CacheBackend with BLAKE2b-keyed FileCache
  • Profile-driven behavior: ModelProfile encodes provider/model differences (no if provider == branches)
  • Lifecycle hooks: RequestHooks(on_request, on_response, on_error, on_retry) for observability
  • Per-call observability: every successful call emits one "LLM call complete" structured info log with provider, model, request_id, token counts, and estimated_usd cost
  • CLI + MCP: kaos-llm-client CLI with --json output and kaos-llm-serve MCP server
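
To make the "BLAKE2b-keyed" caching idea concrete, here is a minimal sketch of deriving a stable cache key from a request. It is an assumption about the general technique, not the key derivation FileCache actually uses; cache_key and the payload layout are hypothetical.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict], **params) -> str:
    """Hash a request into a hex key that is stable across dict ordering.

    Hypothetical layout: the model, the messages, and any extra sampling
    parameters are serialized as canonical JSON before hashing.
    """
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys plus fixed separators give one canonical byte string per request.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=32).hexdigest()
```

Canonical serialization matters here: without sorted keys, two logically identical requests could hash to different keys and miss the cache.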

Quick start

from kaos_llm_client import create_client

# Direct OpenAI (or Anthropic, Google, xAI, Groq, Mistral, OpenRouter)
client = create_client("openai:gpt-5.4-mini")
response = client.chat([{"role": "user", "content": "Hello!"}])
print(response.text)
# logs: INFO LLM call complete provider=openai model=gpt-5.4-mini request_id=... estimated_usd=...

Azure OpenAI with Microsoft Entra ID (AAD)

Install the [azure] extra first: uv add 'kaos-llm-client[azure]'. This pulls in Microsoft's azure-identity SDK (~16 MB transitive, mostly cryptography). Without the extra, api-key auth still works on every Azure endpoint — only AAD needs azure-identity.

DefaultAzureCredential gives you managed-identity / az login / service-principal auth without storing static keys. The Responses-API client (azure-responses:) is the recommended path for gpt-5.4+ deployments where tool calling is required.

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from kaos_llm_client import create_client

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)
client = create_client(
    "azure-responses:gpt-5.4-mini",
    azure_ad_token_provider=token_provider,
)
response = client.chat([{"role": "user", "content": "Hello!"}])

azure-identity ships 22 credential classes — ManagedIdentityCredential, WorkloadIdentityCredential, ClientSecretCredential, CertificateCredential, etc. Any of them works as the first argument to get_bearer_token_provider. The async variants live in azure.identity.aio and are awaited automatically by the kaos-llm-client provider.

AAD requires a custom-subdomain endpoint (https://<resource>.openai.azure.com/); regional endpoints accept api-key only. Both forms work for azure: (chat completions) and azure-responses: (Responses API).
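
A quick pre-flight check can catch the endpoint mismatch before the first request. The sketch below assumes that a custom-subdomain endpoint is any host of the form <resource>.openai.azure.com, as stated above; supports_aad is a hypothetical helper, and the regional URL in the usage note is only an example of a non-matching host.

```python
from urllib.parse import urlparse

_CUSTOM_SUFFIX = ".openai.azure.com"


def supports_aad(endpoint: str) -> bool:
    """Return True if `endpoint` looks like https://<resource>.openai.azure.com/.

    Hypothetical helper: only inspects the hostname, it does not contact Azure.
    """
    host = (urlparse(endpoint).hostname or "").lower()
    if not host.endswith(_CUSTOM_SUFFIX):
        return False
    resource = host.removesuffix(_CUSTOM_SUFFIX)
    # A single non-empty label is expected in front of the suffix.
    return bool(resource) and "." not in resource
```

For example, supports_aad("https://my-resource.openai.azure.com/") passes, while a regional host such as https://eastus.api.cognitive.microsoft.com/ does not and would need api-key auth.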

AWS Bedrock (OpenAI-compatible Responses API)

import os
from kaos_llm_client import create_client

# Bearer token from `aws bedrock create-bearer-token` or your AWS auth flow
os.environ["KAOS_LLM_BEDROCK_API_KEY"] = "..."
client = create_client("bedrock:openai.gpt-oss-120b")
response = client.chat([{"role": "user", "content": "Hello!"}])
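
For regions other than the default, the base URL follows the bedrock-mantle.<region>.api.aws pattern used throughout this README. The helper below is a small sketch for building that URL; bedrock_base_url is hypothetical, and whether the endpoint is actually available in a given region is not something this code can confirm.

```python
def bedrock_base_url(region: str) -> str:
    """Build the Bedrock endpoint URL for an AWS region such as "us-east-2".

    Hypothetical helper: validates the shape of the region string only.
    """
    region = region.strip().lower()
    if not region or not all(ch.isalnum() or ch == "-" for ch in region):
        raise ValueError(f"unexpected region name: {region!r}")
    return f"https://bedrock-mantle.{region}.api.aws"
```

The result can be exported as KAOS_LLM_BEDROCK_BASE_URL before calling create_client.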

Providers

Direct-API clients:

| Prefix | Client | Models | Auth |
|---|---|---|---|
| openai: | OpenAIClient | GPT-5.5/5.4/5/4.1, o1/o3/o4 reasoning | KAOS_LLM_OPENAI_API_KEY |
| anthropic: | AnthropicClient | Claude 4.7 Opus, 4.6 Sonnet, 4.5 Haiku, 3.5/3.7 | KAOS_LLM_ANTHROPIC_API_KEY |
| google: | GoogleClient | Gemini 2.5/3.x Pro/Flash | KAOS_LLM_GOOGLE_API_KEY |
| xai: | XAIClient | Grok-3, Grok-4 | KAOS_LLM_XAI_API_KEY |
| groq: | GroqClient | LLaMA, Mixtral (OpenAI-compat) | KAOS_LLM_GROQ_API_KEY |
| mistral: | MistralClient | Mistral, Mixtral | KAOS_LLM_MISTRAL_API_KEY |
| openrouter: | OpenRouterClient | Any model via OpenRouter | KAOS_LLM_OPENROUTER_API_KEY |
| openai-compatible: | OpenAICompatibleClient | vLLM, Ollama, LiteLLM, custom | varies (base_url=...) |

Cloud-hosted gateways:

| Prefix | Client | Notes |
|---|---|---|
| azure: / azure-openai: | AzureOpenAIClient (chat completions) | Legacy path; works for any deployment |
| azure-responses: / azure-foundry: | AzureOpenAIResponsesClient | Recommended for gpt-5.4+, because chat-completions tool calling with reasoning: none is unsupported by Azure on those models |
| bedrock: | BedrockClient | OpenAI-compatible Responses API on bedrock-mantle.<region>.api.aws |

Azure auth is api-key (works on regional + custom-subdomain endpoints) or AAD/Entra (Authorization: Bearer <token> — custom-subdomain endpoint required). Use azure_ad_token=... for a static bearer or azure_ad_token_provider=... for DefaultAzureCredential / managed identity / az login flows. See the Quick start for the canonical Entra ID example.

Model strings use provider:model format. If no prefix is given, the provider is inferred from the model name:

create_client("openai:gpt-5.4-mini")          # explicit provider
create_client("claude-sonnet-4-6")            # inferred: anthropic
create_client("gemini-2.5-pro")               # inferred: google
create_client("grok-3")                       # inferred: xai
create_client("azure-responses:gpt-5.4-mini") # Azure Responses API
create_client("bedrock:openai.gpt-oss-120b")  # AWS Bedrock
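
The provider:model convention can be sketched as a small parser. This is an illustration under stated assumptions, not the package's resolver: split_model_string is hypothetical, the prefix set is copied from the tables above, and the name hints cover only the three inferred cases shown in the examples.

```python
# Prefixes listed in the provider tables of this README.
KNOWN_PREFIXES = frozenset({
    "openai", "anthropic", "google", "xai", "groq", "mistral", "openrouter",
    "openai-compatible", "azure", "azure-openai", "azure-responses",
    "azure-foundry", "bedrock",
})

# Model-name hints taken from the inferred examples above (assumed, not exhaustive).
NAME_HINTS = (("claude", "anthropic"), ("gemini", "google"), ("grok", "xai"))


def split_model_string(model: str) -> tuple[str, str]:
    """Return (provider, model_name), inferring the provider when no prefix is given."""
    prefix, sep, rest = model.partition(":")
    # Only treat the text before the first colon as a provider if it is a known prefix.
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    for hint, provider in NAME_HINTS:
        if model.startswith(hint):
            return provider, model
    raise ValueError(f"cannot infer provider for {model!r}")
```

Checking the prefix against a known set avoids misreading model names that themselves contain a colon.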

Compatibility & status

| Item | Value |
|---|---|
| Python | 3.13, 3.14 |
| OS | Linux, macOS, Windows |
| Maturity | 0.1.0 GA; SemVer, pre-1.0 minor bumps may break public API |
| Tests | 924 unit + 5 live integration |
| Type checker | ty (clean) |

Configuration

All settings use the KAOS_LLM_ prefix via KaosLLMSettings (ModuleSettings subclass). Each provider key has a legacy fallback (e.g. OPENAI_API_KEY) for backward compatibility.

| Variable | Default | Description |
|---|---|---|
| KAOS_LLM_{OPENAI,ANTHROPIC,GOOGLE,XAI,GROQ,MISTRAL,OPENROUTER}_API_KEY | | Direct-provider API key (SecretStr) |
| KAOS_LLM_OPENAI_BASE_URL | https://api.openai.com | Override for proxies / local models (per-provider variants exist) |
| KAOS_LLM_AZURE_OPENAI_ENDPOINT | | Azure resource URL (e.g. https://my-resource.openai.azure.com/) |
| KAOS_LLM_AZURE_OPENAI_API_KEY | | Azure resource subscription key (alternative to AAD) |
| KAOS_LLM_AZURE_OPENAI_AD_TOKEN | | Static AAD bearer (use azure_ad_token_provider= for refresh) |
| KAOS_LLM_AZURE_OPENAI_API_VERSION | 2024-12-01-preview | Azure API version (bump to 2025-04-01-preview for newer Responses-API features) |
| KAOS_LLM_BEDROCK_API_KEY | | AWS Bedrock bearer token; legacy fallback AWS_BEARER_TOKEN_BEDROCK |
| KAOS_LLM_BEDROCK_BASE_URL | https://bedrock-mantle.us-east-2.api.aws | Bedrock endpoint (override for other regions) |
| KAOS_LLM_DEFAULT_TIMEOUT | 120.0 | Request timeout (seconds) |
| KAOS_LLM_DEFAULT_MAX_RETRIES | 3 | Max retry attempts |
| KAOS_LLM_MAX_RESPONSE_BYTES | 33554432 | 32 MiB cap on non-streaming responses |
| KAOS_LLM_STREAM_MAX_DURATION | 600.0 | Wall-clock cap on a streaming response (seconds) |
| KAOS_LLM_CACHE_ENABLED | false | Enable response caching |
| KAOS_LLM_CACHE_PATH | ~/.cache/kaos/llm | Cache directory |

Per-request overrides flow through KaosContext._config for MCP callers.

CLI

kaos-llm-client check [--provider openai,anthropic] [--json]   # verify credentials
kaos-llm-client chat --model openai:gpt-5 --message "Hello!" [--system "..."] [--json]
kaos-llm-client profiles [--json]                              # list known model profiles
kaos-llm-client config [--json]                                # resolved settings (redacted)

All commands support --json with a consistent envelope: {"command": "...", ...}.
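
Scripts that consume the --json output can validate the envelope before using it. The sketch below assumes only what is stated above, namely a JSON object with a "command" key; parse_envelope is a hypothetical helper, and any other field names in the usage note are made up for illustration.

```python
import json


def parse_envelope(stdout: str) -> tuple[str, dict]:
    """Split CLI JSON output into (command, remaining fields).

    Hypothetical helper: raises ValueError if the output is not a JSON object
    carrying a string "command" key.
    """
    envelope = json.loads(stdout)
    if not isinstance(envelope, dict) or not isinstance(envelope.get("command"), str):
        raise ValueError("output is not a JSON envelope with a 'command' key")
    fields = {key: value for key, value in envelope.items() if key != "command"}
    return envelope["command"], fields
```

For instance, parse_envelope('{"command": "check", "ok": true}') would return ("check", {"ok": True}), where "ok" is an invented field rather than a documented one.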

MCP Server

kaos-llm-serve                                                # stdio (Claude Code / Desktop)
kaos-llm-serve --http --port 8000                             # streamable HTTP
kaos-llm-serve --model openai:gpt-5 --http --debug            # default model + debug logging

Exposes 7 MCP tools: kaos-llm-chat, kaos-llm-json, kaos-llm-embed, kaos-llm-tools, kaos-llm-pydantic, kaos-llm-provider-check, and kaos-llm-cost-estimate. Pinned in tests/unit/test_tools.py:541-592.

Security: the HTTP transport has no built-in authentication or rate limiting. The default --host 127.0.0.1 binds to loopback, which is the safe default. Do not bind to a non-loopback interface unless you put an authenticated reverse proxy (mTLS, OAuth, IP allowlist, etc.) in front of it — anyone who can reach the port can spend your configured LLM credits. The server emits a startup warning when --host is not loopback. See kaos_llm_client/serve.py module docstring for the full guidance.
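
Deployment scripts can apply the same loopback rule before launching the server. This is a minimal sketch using the standard library; is_loopback_host is a hypothetical helper and is not the check kaos-llm-serve performs internally.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True for loopback addresses (127.0.0.0/8, ::1) and "localhost"."""
    if host.strip().lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host.strip()).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as non-loopback.
        return False
```

Note that 0.0.0.0 is not loopback: it binds every interface, which is exactly the case the warning above is about.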

Documentation

Per-package reference: see the in-tree docstrings and the CHANGELOG.md.

Cross-cutting KAOS guides (agentic patterns, persona presets, settings policy, citations, MCP data flow, migration to 0.1.0 GA) live in kaos-modules/docs/guides/.

Companion packages

Direct dependencies in the KAOS stack:

  • kaos-core — runtime, ModuleSettings, KaosContext, structured logging
  • kaos-mcp (optional via [mcp]) — FastMCP bridge for kaos-llm-serve

Higher layers consume kaos-llm-client for inference: kaos-llm-core (typed programs), kaos-agents (runtime), kaos-citations (verification). Full module roster at docs.kelvin.legal/kaos-llm-client.

Development

uv sync --group dev
uv run ruff format kaos_llm_client/ tests/
uv run ruff check kaos_llm_client/ tests/
uv run ty check kaos_llm_client/ tests/
uv run pytest tests/unit/ -q
# live tier requires provider keys; see tests/integration/
uv run pytest tests/integration/ -q

Build from source

uv build
uv pip install dist/kaos_llm_client-*.whl

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for setup, quality gates, pull request expectations, and engineering standards. By contributing you agree to follow the project conduct expectations and certify the Developer Certificate of Origin v1.1 — sign every commit with git commit -s. Please open an issue before starting on a non-trivial change so we can align on scope.

Security

For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.

License

Apache License 2.0 — see LICENSE and NOTICE.

Copyright 2026 273 Ventures LLC. Built for kelvin.legal.
