Unified Python client for OpenAI, Azure OpenAI, Vertex AI, Anthropic, Gemini, DeepSeek, OpenRouter, Cerebras, Fireworks, Together AI, LM Studio, Bedrock, LiteLLM, and ChatGPT.

llmai

llmai is a Python library for working with OpenAI, Azure OpenAI, Vertex AI, Anthropic, Google Gemini, DeepSeek, OpenRouter, Cerebras, Fireworks, Together AI, LM Studio, Bedrock, LiteLLM, and ChatGPT through a shared set of message, tool, schema, and response primitives.

Today the repository includes adapters for:

ChatGPT
OpenAI
Azure OpenAI
Vertex AI
DeepSeek
OpenRouter
Cerebras
Fireworks
Together AI
LM Studio
Anthropic
Google Gemini
Amazon Bedrock
LiteLLM

Each provider client exposes the same core entrypoint:

generate(..., stream=False)

Why This Exists

Provider SDKs differ in how they represent messages, tool calls, structured output, and streaming events. llmai smooths those differences out so application code can stay closer to one mental model.

Installation

From a checkout of the repository, install the project and its dependencies with uv:

uv sync

Or install it in editable mode with pip:

pip install -e .

Quick Start

from llmai import OpenAIClient, OpenAIClientConfig
from llmai.shared import UserMessage

client = OpenAIClient(
    config=OpenAIClientConfig(api_key="<your-openai-api-key>"),
)

result = client.generate(
    model="your-openai-model",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)
print(result.usage)
print(result.duration_seconds)

For text-only prompts, UserMessage(content="...") is the simplest form. SystemMessage(content="...") also takes a plain string. Use explicit content parts like TextContentPart only when you need mixed multimodal input or tighter control over user message structure.

If you want to swap providers, the overall call shape stays the same. In most cases you only need to change the client class, credentials, and model name.
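
Because every client exposes the same generate(...) entrypoint, application code can depend on that call shape instead of on a concrete client class. The following is a minimal sketch of that idea, not part of llmai: SupportsGenerate, FakeClient, FakeResult, and ask are hypothetical names, and the fake client stands in for a real provider so the snippet runs offline.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class SupportsGenerate(Protocol):
    """Hypothetical protocol: any object with a generate(model=..., messages=...) method."""

    def generate(self, *, model: str, messages: list[Any], **kwargs: Any) -> Any: ...


@dataclass
class FakeResult:
    # Stand-in for a response object; real llmai results carry more fields.
    content: str


class FakeClient:
    """Offline stand-in for a provider client, used only in this sketch."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, *, model: str, messages: list[Any], **kwargs: Any) -> FakeResult:
        # Echo the provider name, model, and last message instead of calling an API.
        return FakeResult(content=f"[{self.name}/{model}] {messages[-1]}")


def ask(client: SupportsGenerate, model: str, prompt: str) -> Any:
    # The helper never names a concrete provider; swapping providers means
    # passing a different client and model name.
    return client.generate(model=model, messages=[prompt]).content


print(ask(FakeClient("openai"), "model-a", "hello"))
print(ask(FakeClient("anthropic"), "model-b", "hello"))
```

With real clients, the prompt string would be wrapped in UserMessage(content=...) as in the Quick Start example.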

Azure OpenAI

from llmai import AzureOpenAIClient, AzureOpenAIClientConfig
from llmai.shared import UserMessage


client = AzureOpenAIClient(
    config=AzureOpenAIClientConfig(
        api_key="<your-azure-openai-api-key>",
        endpoint="https://your-resource.openai.azure.com",
        api_version="2024-10-21",
    ),
)

result = client.generate(
    model="your-azure-deployment",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

AzureOpenAIClient uses the official OpenAI SDK's Azure client and requires an explicit AzureOpenAIClientConfig. The config supports API-key auth or Entra token auth and accepts endpoint or base_url, api_version, and optional deployment. Azure is always routed through chat completions.

Vertex AI

from llmai import VertexAIClient, VertexAIClientConfig
from llmai.shared import UserMessage


client = VertexAIClient(
    config=VertexAIClientConfig(
        project="your-gcp-project",
        location="us-central1",
    ),
)

result = client.generate(
    model="gemini-2.5-flash",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

VertexAIClient uses the google-genai Vertex AI path internally and requires an explicit VertexAIClientConfig. Use either api_key for Vertex express mode or project/location/credentials for standard Vertex auth; do not combine them. base_url remains optional.
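
The "either express mode or standard auth, not both" rule can be pictured as a small validation step. The sketch below is an illustration only: VertexAuthSketch is a hypothetical stand-in, not VertexAIClientConfig, and the exact checks and error messages llmai applies are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VertexAuthSketch:
    """Hypothetical stand-in for a Vertex config; not llmai's VertexAIClientConfig."""

    api_key: Optional[str] = None
    project: Optional[str] = None
    location: Optional[str] = None

    def __post_init__(self) -> None:
        uses_express = self.api_key is not None
        uses_standard = self.project is not None or self.location is not None
        # Reject configs that mix express-mode and standard-auth fields.
        if uses_express and uses_standard:
            raise ValueError("Use api_key or project/location, not both.")

    @property
    def mode(self) -> str:
        # Assumption for this sketch: anything without an api_key is standard auth.
        return "express" if self.api_key is not None else "standard"


print(VertexAuthSketch(api_key="<key>").mode)
print(VertexAuthSketch(project="your-gcp-project", location="us-central1").mode)
```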

ChatGPT

from llmai import ChatGPTClient, ChatGPTClientConfig
from llmai.shared import UserMessage


client = ChatGPTClient(
    config=ChatGPTClientConfig(access_token="<your-chatgpt-access-token>"),
)

result = client.generate(
    model="chatgpt-4o-latest",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

ChatGPTClient targets ChatGPT's Codex backend at https://chatgpt.com/backend-api/codex and always uses the Responses API internally. Credentials and optional overrides are passed through ChatGPTClientConfig, which authenticates with access_token. When you include SystemMessage entries, the client collects them in order and sends them through the Responses API instructions field; otherwise it falls back to instructions="Follow the prompt". The ChatGPT backend requires stream=True, so generate(stream=False) streams internally and returns the aggregated final response. The backend also does not support the Responses API temperature or max_output_tokens parameters, so temperature and max_tokens are ignored for this client.
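
The instructions handling described above can be sketched as follows. This is an illustration under stated assumptions, not llmai's code: Msg and build_instructions are hypothetical names, and the blank-line separator used to join several system messages is an assumption, since the actual separator is not documented here.

```python
from dataclasses import dataclass

# Fallback text quoted in the paragraph above.
DEFAULT_INSTRUCTIONS = "Follow the prompt"


@dataclass
class Msg:
    """Hypothetical stand-in for SystemMessage / UserMessage."""

    role: str  # "system" or "user"
    content: str


def build_instructions(messages: list[Msg]) -> str:
    # Collect system messages in their original order.
    system_texts = [m.content for m in messages if m.role == "system"]
    # Assumption: multiple system messages are joined with a blank line.
    return "\n\n".join(system_texts) if system_texts else DEFAULT_INSTRUCTIONS


print(build_instructions([Msg("system", "Be brief."), Msg("user", "Hi")]))
print(build_instructions([Msg("user", "Hi")]))
```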

DeepSeek

from llmai import DeepSeekClient, DeepSeekClientConfig
from llmai.shared import JSONSchemaResponse, UserMessage


client = DeepSeekClient(
    config=DeepSeekClientConfig(api_key="<your-deepseek-api-key>"),
)

result = client.generate(
    model="deepseek-chat",
    messages=[
        UserMessage(content="Return a JSON object with one field named answer."),
    ],
    response_format=JSONSchemaResponse(
        name="final_answer",
        json_schema={
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
            },
            "required": ["answer"],
        },
    ),
)

print(result.content)

DeepSeekClient uses the OpenAI SDK against DeepSeek's OpenAI-compatible API and requires an explicit DeepSeekClientConfig. For structured output, it always uses an internal function-tool schema because DeepSeek does not support response_format={"type":"json_schema"}. During streaming, the internal response tool is surfaced as incremental JSON content chunks, and the stream still ends with parsed JSON on the final completion chunk's content. If you need DeepSeek's server-side strict tool enforcement, point base_url at https://api.deepseek.com/beta.
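
To make the function-tool approach more concrete, here is a rough sketch of how a JSON schema can be wrapped as an OpenAI-style chat-completions tool and how the returned arguments are decoded. schema_as_response_tool and parse_tool_arguments are hypothetical helpers, and the tool description string is made up; the payload llmai actually sends may differ.

```python
import json
from typing import Any


def schema_as_response_tool(name: str, json_schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a JSON schema as a function tool and force the model to call it."""
    tool = {
        "type": "function",
        "function": {
            "name": name,
            # Hypothetical description text, chosen for this sketch.
            "description": "Return the final answer as the tool arguments.",
            "parameters": json_schema,
        },
    }
    return {
        "tools": [tool],
        # Naming the function in tool_choice asks the model to call exactly this tool.
        "tool_choice": {"type": "function", "function": {"name": name}},
    }


def parse_tool_arguments(arguments: str) -> dict[str, Any]:
    # In the chat-completions format, tool-call arguments arrive as a JSON string.
    return json.loads(arguments)


payload = schema_as_response_tool(
    "final_answer",
    {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
)
print(json.dumps(payload, indent=2))
print(parse_tool_arguments('{"answer": "42"}'))
```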

OpenRouter

from llmai import OpenRouterClient, OpenRouterClientConfig
from llmai.shared import UserMessage


client = OpenRouterClient(
    config=OpenRouterClientConfig(api_key="<your-openrouter-api-key>"),
)

result = client.generate(
    model="openai/gpt-5.4-mini",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

OpenRouterClient uses the OpenAI SDK against OpenRouter's OpenAI-compatible chat-completions API. The default base URL is https://openrouter.ai/api/v1.

Fireworks

from llmai import FireworksClient, FireworksClientConfig
from llmai.shared import UserMessage


client = FireworksClient(
    config=FireworksClientConfig(api_key="<your-fireworks-api-key>"),
)

result = client.generate(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

FireworksClient uses the OpenAI SDK against Fireworks' OpenAI-compatible chat-completions API. The default base URL is https://api.fireworks.ai/inference/v1.

Together AI

from llmai import TogetherAIClient, TogetherAIClientConfig
from llmai.shared import UserMessage


client = TogetherAIClient(
    config=TogetherAIClientConfig(api_key="<your-together-api-key>"),
)

result = client.generate(
    model="openai/gpt-oss-20b",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

TogetherAIClient uses the OpenAI SDK against Together AI's OpenAI-compatible chat-completions API. The default base URL is https://api.together.ai/v1. Together AI does not support the OpenAI Responses API, so this adapter always uses chat completions.

LM Studio

from llmai import LMStudioClient, LMStudioClientConfig
from llmai.shared import UserMessage


client = LMStudioClient(
    config=LMStudioClientConfig(
        base_url="http://localhost:1234",
    ),
)

result = client.generate(
    model="openai/gpt-oss-20b",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

LMStudioClient uses the OpenAI SDK against LM Studio's OpenAI-compatible chat-completions endpoint. The default base URL is http://localhost:1234/v1. Custom base URLs have /v1 appended automatically when it is omitted, and api_key is optional. Schemas are reduced to LM Studio's grammar-friendly core fields, so unsupported regex pattern constraints are removed before sending.
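
The two normalizations described above can be sketched roughly like this. normalize_base_url and strip_pattern are hypothetical helpers written for illustration; llmai's real schema reduction may remove more than the pattern keyword.

```python
from typing import Any


def normalize_base_url(base_url: str) -> str:
    # Append /v1 only when the URL does not already end with it.
    trimmed = base_url.rstrip("/")
    return trimmed if trimmed.endswith("/v1") else trimmed + "/v1"


def strip_pattern(schema: Any) -> Any:
    """Recursively drop string-valued "pattern" keywords from a JSON schema."""
    if isinstance(schema, list):
        return [strip_pattern(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if key == "pattern" and isinstance(value, str):
            continue  # regex constraint: removed
        if key == "properties" and isinstance(value, dict):
            # Keys here are property names, so a property called "pattern" is kept.
            cleaned[key] = {name: strip_pattern(sub) for name, sub in value.items()}
        else:
            cleaned[key] = strip_pattern(value)
    return cleaned


print(normalize_base_url("http://localhost:1234"))
print(strip_pattern({"type": "string", "pattern": "^[a-z]+$"}))
```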

Cerebras

from llmai import CerebrasClient, CerebrasClientConfig
from llmai.shared import UserMessage


client = CerebrasClient(
    config=CerebrasClientConfig(api_key="<your-cerebras-api-key>"),
)

result = client.generate(
    model="llama-4-scout-17b-16e-instruct",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

CerebrasClient uses the OpenAI SDK against Cerebras' OpenAI-compatible API. The default base URL is https://api.cerebras.ai/v1.

Amazon Bedrock

from llmai import BedrockClient, BedrockClientConfig
from llmai.shared import UserMessage


client = BedrockClient(
    config=BedrockClientConfig(
        region="us-east-1",
        aws_access_key_id="<your-aws-access-key-id>",
        aws_secret_access_key="<your-aws-secret-access-key>",
    ),
)

# Or use Bedrock API-key auth:
# client = BedrockClient(
#     config=BedrockClientConfig(region="us-east-1", api_key="<your-bedrock-api-key>")
# )

result = client.generate(
    model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[
        UserMessage(content="Say hello."),
    ],
)

print(result.content)

LiteLLM

from llmai import LiteLLMClient, LiteLLMClientConfig
from llmai.shared import UserMessage


client = LiteLLMClient(
    config=LiteLLMClientConfig(
        api_key="litellm-proxy-key",
        base_url="https://litellm.example/v1",
    )
)

result = client.generate(
    model="gpt-5.4-mini",
    messages=[
        UserMessage(content="Write a two-line poem about clean interfaces."),
    ],
)

print(result.content)

LiteLLMClient uses the OpenAI Python client against an OpenAI-compatible LiteLLM proxy. Set LiteLLMClientConfig(api_type=OpenAIApiType.RESPONSES) to call the proxy's Responses API instead of chat completions. Pass the proxy key and URL through api_key and base_url; config-level extra_kwargs and per-call generate(..., extra_body={...}) are forwarded as request extra_body.
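
One way to picture the forwarding of config-level extra_kwargs and per-call extra_body is a simple dictionary merge. The sketch below assumes that per-call values win on key conflicts; that precedence is an assumption, not documented behavior, and merge_extra_body is a hypothetical helper.

```python
from typing import Any, Optional


def merge_extra_body(
    config_extra: Optional[dict[str, Any]],
    call_extra: Optional[dict[str, Any]],
) -> dict[str, Any]:
    # Start from config-level values, then overlay per-call values.
    # Assumption: per-call keys override config-level keys.
    merged = dict(config_extra or {})
    merged.update(call_extra or {})
    return merged


print(merge_extra_body({"user": "team-a", "cache": True}, {"cache": False}))
```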

Structured Output

from pydantic import BaseModel

from llmai import GoogleClient, GoogleClientConfig
from llmai.shared import JSONSchemaResponse, UserMessage


class Summary(BaseModel):
    title: str
    bullets: list[str]


client = GoogleClient(config=GoogleClientConfig(api_key="<your-google-api-key>"))

result = client.generate(
    model="your-google-model",
    messages=[
        UserMessage(content="Summarize retrieval-augmented generation in simple terms."),
    ],
    response_format=JSONSchemaResponse(json_schema=Summary),
)

print(result.content)

Use JSONSchemaResponse, JSONObjectResponse, or TextResponse to request different response shapes.

Multimodal Content

from llmai import GoogleClient, GoogleClientConfig
from llmai.shared import ImageContentPart, TextContentPart, UserMessage


client = GoogleClient(config=GoogleClientConfig(api_key="<your-google-api-key>"))

result = client.generate(
    model="your-google-model",
    messages=[
        UserMessage(
            content=[
                TextContentPart(text="Describe this image."),
                ImageContentPart(url="https://example.com/cat.png"),
            ]
        ),
    ],
)

print(result.content)
print(result.thinking)

Use explicit content parts when you need multimodal inputs or want to mix text with images in one message. Normal completion content is surfaced as list[TextContentPart | ImageContentPart] when the provider returns message content, including text-only replies. Reasoning is exposed on ResponseContent.thinking as list[str] when the provider returns one or more thinking blocks, and the same value is also available on the final AssistantMessage.
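
Since completion content arrives as a list of parts, callers that only want the text need to pull it out of that list. A minimal sketch, using hypothetical stand-in classes (TextPart, ImagePart) in place of TextContentPart and ImageContentPart so that it runs without llmai installed:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextPart:
    """Stand-in for TextContentPart (the text field appears in the example above)."""

    text: str


@dataclass
class ImagePart:
    """Stand-in for ImageContentPart (the url field appears in the example above)."""

    url: str


def text_of(parts: list[Union[TextPart, ImagePart]]) -> str:
    # Keep only text parts and concatenate them in order; image parts are skipped.
    return "".join(part.text for part in parts if isinstance(part, TextPart))


reply = [TextPart("A cat "), ImagePart("https://example.com/cat.png"), TextPart("on a sofa.")]
print(text_of(reply))
```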

Tool Calling

from pydantic import BaseModel

from llmai import OpenAIClient, OpenAIClientConfig
from llmai.shared import Tool, ToolResponseMessage, UserMessage


class WeatherArgs(BaseModel):
    city: str


weather_tool = Tool(
    name="get_weather",
    description="Look up the weather for a city.",
    schema=WeatherArgs,
)

client = OpenAIClient(config=OpenAIClientConfig(api_key="<your-openai-api-key>"))

first = client.generate(
    model="your-openai-model",
    messages=[
        UserMessage(content="What is the weather in Kathmandu?"),
    ],
    tools=[weather_tool],
    tool_choice={"tools": ["get_weather"]},
)

for tool_call in first.tool_calls:
    if tool_call.name != "get_weather":
        continue

    follow_up = client.generate(
        model="your-openai-model",
        messages=[
            *first.messages,
            ToolResponseMessage(
                id=tool_call.id,
                content=["It is sunny in Kathmandu."],
            ),
        ],
        tools=[weather_tool],
    )
    print(follow_up.content)

llmai returns tool calls in first.tool_calls and leaves execution to the caller.
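
Because execution is left to the caller, a small dispatch table is one common way to route tool calls to local functions. The sketch below is self-contained and makes assumptions: ToolCallSketch is a hypothetical stand-in for an llmai tool call, and its arguments field (a JSON string) is assumed, since only name and id are used in the example above.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCallSketch:
    """Hypothetical stand-in for a tool call returned by generate()."""

    id: str
    name: str
    arguments: str  # Assumption: JSON-encoded arguments.


def get_weather(city: str) -> str:
    # Placeholder implementation; a real handler would call a weather service.
    return f"It is sunny in {city}."


# Map tool names to the local functions that implement them.
HANDLERS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[ToolCallSketch]) -> dict[str, str]:
    results: dict[str, str] = {}
    for call in tool_calls:
        handler = HANDLERS.get(call.name)
        if handler is None:
            continue  # Unknown tools are skipped, as in the loop above.
        results[call.id] = handler(**json.loads(call.arguments))
    return results


calls = [ToolCallSketch(id="call_1", name="get_weather", arguments='{"city": "Kathmandu"}')]
print(run_tool_calls(calls))
```

Each result keyed by call id could then be sent back in a ToolResponseMessage, as the follow-up request in the example above does.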

Hosted Web Search

llmai also supports a provider-hosted web search tool that is not a function tool:

from llmai import OpenAIApiType, OpenAIClient, OpenAIClientConfig
from llmai.shared import UserMessage, WebSearchTool

client = OpenAIClient(
    config=OpenAIClientConfig(
        api_key="<your-openai-api-key>",
        api_type=OpenAIApiType.RESPONSES,
    )
)

result = client.generate(
    model="your-openai-model",
    messages=[
        UserMessage(content="What was a positive news story from today? Cite sources."),
    ],
    tools=[WebSearchTool()],
)

print(result.content)
print(result.thinking)

You can also target it explicitly in tool_choice:

tool_choice = {
    "mode": "required",
    "tools": ["web_search"],
}

Current llmai behavior:

OpenAI Responses: attaches built-in web_search
Azure OpenAI: follows the same OpenAI adapter surface; service support depends on your Azure API version and deployment
Vertex AI: attaches google_search
ChatGPT/Codex: attaches built-in web_search
Anthropic: attaches Anthropic's hosted web-search tool
Google Gemini: attaches google_search
OpenAI Chat Completions: ignores hosted web_search
DeepSeek: ignores hosted web_search
Amazon Bedrock: ignores hosted web_search

web_search can be mixed with normal function tools in the same request.
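
If application code needs to know up front whether a hosted search request will be honored, the behavior list above can be encoded as a lookup table. This is a sketch only: the provider keys and value labels are made up for illustration and are not llmai identifiers, and Azure OpenAI is left out because its support depends on the API version and deployment.

```python
from typing import Optional

# Provider key -> label of the hosted tool that gets attached, or None when
# hosted web search is ignored. Keys and labels are hypothetical.
WEB_SEARCH_BEHAVIOR: dict[str, Optional[str]] = {
    "openai-responses": "web_search",
    "vertex": "google_search",
    "chatgpt": "web_search",
    "anthropic": "anthropic hosted web search",
    "gemini": "google_search",
    "openai-chat-completions": None,
    "deepseek": None,
    "bedrock": None,
}


def hosted_search_tool(provider: str) -> Optional[str]:
    # Fail loudly for providers the list above does not cover.
    if provider not in WEB_SEARCH_BEHAVIOR:
        raise KeyError(f"No documented web-search behavior for {provider!r}")
    return WEB_SEARCH_BEHAVIOR[provider]


for name in ("gemini", "deepseek"):
    tool = hosted_search_tool(name)
    print(name, "->", tool if tool is not None else "ignored")
```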

If you use OpenAIClient with api_type=OpenAIApiType.RESPONSES, OpenAIClientConfig(provide_system_message_as_instructions=True) lifts all SystemMessage values into the top-level Responses API instructions field. The default is False, which keeps system messages inline in input.

Streaming

from llmai import AnthropicClient, AnthropicClientConfig
from llmai.shared import UserMessage

client = AnthropicClient(
    config=AnthropicClientConfig(api_key="<your-anthropic-api-key>"),
)

for chunk in client.generate(
    model="your-anthropic-model",
    messages=[
        UserMessage(content="Explain recursion in one paragraph."),
    ],
    stream=True,
):
    if chunk.type == "content":
        print(chunk.chunk, end="")
    elif chunk.type == "completion":
        print("\nDone:", chunk.usage)

generate(..., stream=True) yields marker chunks with type="event" and event="start" / event="end" around each content, thinking, and tool section. If a provider returns multiple reasoning blocks, each block gets its own thinking start/end pair. The final chunk has type="completion" and includes top-level content, thinking, usage, and accumulated messages.
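
The marker chunks make it possible to rebuild separate thinking blocks while streaming. The following sketch runs on fake chunks and makes assumptions: ChunkSketch is a hypothetical stand-in for the stream chunk types, and type="thinking" together with a chunk text field on thinking chunks is assumed by analogy with the content chunks shown above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ChunkSketch:
    """Hypothetical stand-in for llmai stream chunks."""

    type: str                    # "event", "content", "thinking", or "completion"
    chunk: Optional[str] = None  # text delta for content/thinking chunks
    event: Optional[str] = None  # "start" or "end" for event chunks


def collect_stream(chunks: Iterable[ChunkSketch]) -> tuple[str, list[str]]:
    content_parts: list[str] = []
    thinking_blocks: list[str] = []
    section_started = False  # True right after a start marker
    for c in chunks:
        if c.type == "event":
            if c.event == "start":
                section_started = True
        elif c.type == "thinking":
            # Each thinking block has its own start marker, so the first
            # thinking chunk after a start marker opens a new block.
            if section_started or not thinking_blocks:
                thinking_blocks.append("")
            thinking_blocks[-1] += c.chunk or ""
            section_started = False
        elif c.type == "content":
            content_parts.append(c.chunk or "")
            section_started = False
    return "".join(content_parts), thinking_blocks


fake_stream = [
    ChunkSketch("event", event="start"), ChunkSketch("thinking", "a"),
    ChunkSketch("thinking", "b"), ChunkSketch("event", event="end"),
    ChunkSketch("event", event="start"), ChunkSketch("thinking", "c"),
    ChunkSketch("event", event="end"),
    ChunkSketch("event", event="start"), ChunkSketch("content", "Hi"),
    ChunkSketch("content", "!"), ChunkSketch("event", event="end"),
    ChunkSketch("completion"),
]
print(collect_stream(fake_stream))
```

In practice the final completion chunk already carries the aggregated content and thinking, so manual collection like this is mainly useful for incremental display.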

Package Layout

llmai/openai: OpenAI adapter
llmai/azure: Azure OpenAI adapter
llmai/vertex: Vertex AI adapter
llmai/deepseek: DeepSeek adapter
llmai/openrouter: OpenRouter adapter
llmai/cerebras: Cerebras adapter
llmai/fireworks: Fireworks adapter
llmai/anthropic: Anthropic adapter
llmai/google: Google Gemini adapter
llmai/bedrock: Amazon Bedrock adapter
llmai/litellm: LiteLLM adapter
llmai/shared: common message, tool, schema, and response models

Core Types

The shared layer includes the main primitives you will use across providers:

UserMessage, SystemMessage, AssistantMessage
TextContentPart, ImageContentPart
Tool, WebSearchTool, ToolResponseMessage
JSONSchemaResponse, JSONObjectResponse, TextResponse
ResponseContent, ResponseStreamChunk, ResponseStreamContentChunk, ResponseStreamThinkingChunk, ResponseStreamToolChunk, ResponseStreamToolCompleteChunk, ResponseStreamCompletionChunk
ResponseUsage

UserMessage.content accepts either a plain string or explicit content parts. SystemMessage.content is always a plain string.

Release history

0.2.5 (this version): May 19, 2026
0.2.4: May 12, 2026
0.2.3: May 11, 2026
0.2.2: Apr 27, 2026
0.2.1: Apr 24, 2026
0.2.0: Apr 24, 2026
0.1.9: Apr 23, 2026
0.1.8: Apr 23, 2026
0.1.7: Apr 23, 2026
0.1.6: Apr 23, 2026
0.1.5: Apr 23, 2026
0.1.4: Apr 23, 2026
0.1.3: Apr 22, 2026
0.1.2: Apr 19, 2026
0.1.1: Apr 19, 2026
0.1.0: Apr 19, 2026
0.0.1: Feb 10, 2026

Download files

Source Distribution

llmai-0.2.5.tar.gz (54.9 kB), uploaded May 19, 2026 (Source)

Built Distribution

llmai-0.2.5-py3-none-any.whl (71.1 kB), uploaded May 19, 2026 (Python 3)

File details

Details for the file llmai-0.2.5.tar.gz.

File metadata

Download URL: llmai-0.2.5.tar.gz
Upload date: May 19, 2026
Size: 54.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmai-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`f24d4e9b6f02d0a92aa45cd715d870c648b1eaac1cfe9648e947b98bf7ffea9b`
MD5	`3a2a1492bae612afa04f8335cdb1a953`
BLAKE2b-256	`9bada7e4b65d5660d906b9298ca06f0c618b06bacbdd89ada68b65ffb33740cd`

Provenance

The following attestation bundles were made for llmai-0.2.5.tar.gz:

Publisher: publish.yml on presenton/llmai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmai-0.2.5.tar.gz
- Subject digest: f24d4e9b6f02d0a92aa45cd715d870c648b1eaac1cfe9648e947b98bf7ffea9b
- Sigstore transparency entry: 1571633930
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: presenton/llmai@975f6712006921a0d5cf3450ad114e6cb874c055
- Branch / Tag: refs/tags/v0.2.5
- Owner: https://github.com/presenton
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@975f6712006921a0d5cf3450ad114e6cb874c055
- Trigger Event: push

File details

Details for the file llmai-0.2.5-py3-none-any.whl.

File metadata

Download URL: llmai-0.2.5-py3-none-any.whl
Upload date: May 19, 2026
Size: 71.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmai-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`625032cde871f5f3518c57d8083137a6455a12c4f1ad3aaa0e5e5d9b46ff51ec`
MD5	`4ee517ecffbf9c9fd46f7f1055a874e3`
BLAKE2b-256	`5181789e04023b7d0dcfa43cc110fc33f1a09b565fa21187482c3004a47bf0aa`

Provenance

The following attestation bundles were made for llmai-0.2.5-py3-none-any.whl:

Publisher: publish.yml on presenton/llmai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmai-0.2.5-py3-none-any.whl
- Subject digest: 625032cde871f5f3518c57d8083137a6455a12c4f1ad3aaa0e5e5d9b46ff51ec
- Sigstore transparency entry: 1571633963
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: presenton/llmai@975f6712006921a0d5cf3450ad114e6cb874c055
- Branch / Tag: refs/tags/v0.2.5
- Owner: https://github.com/presenton
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@975f6712006921a0d5cf3450ad114e6cb874c055
- Trigger Event: push
