Client and Tools for LLMs

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

llmskit

llmskit provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.

The current codebase exposes:

Unified sync and async chat wrappers
OpenAI-style streaming and completion responses
Provider adapters for openai, gemini, and claude
Canonical multimodal message parts and tool definitions
OpenAI-compatible embeddings helpers
Generic reranker clients

Installation

pip install llmskit

Public API

from llmskit import (
    AsyncChatLLM,
    AsyncOpenAIEmbeddings,
    AsyncReranker,
    ChatLLM,
    OpenAIEmbeddings,
    Reranker,
)

Chat Quick Start

Synchronous chat

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",  # replace with your OpenAI-compatible endpoint
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

response = chat.complete(messages=messages)
message = response["choices"][0]["message"]

print(message["content"])
print(message.get("reasoning_content"))  # may be None or absent for models without reasoning output
print(response["usage"])

ChatLLM is intended for blocking synchronous code paths and now uses native sync provider clients. In Jupyter notebooks, async web frameworks, or inside async def code, prefer AsyncChatLLM so you do not block the active event loop.

Asynchronous chat

import asyncio

from llmskit import AsyncChatLLM


async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )

    response = await chat.complete(
        messages=[
            {"role": "system", "content": "Answer briefly."},
            {"role": "user", "content": "What is llmskit?"},
        ]
    )

    print(response["choices"][0]["message"]["content"])


asyncio.run(main())

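If your runtime already has a running event loop, such as Jupyter Notebook or FastAPI / Starlette request handlers, await AsyncChatLLM calls directly inside that loop instead of wrapping them in asyncio.run(...), which raises RuntimeError when a loop is already running.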

Provider Factories

Use explicit factory methods when you already know the backend:

from llmskit import ChatLLM

openai_chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

gemini_chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

claude_chat = ChatLLM.from_claude(
    model="claude-sonnet-4-20250514",
    api_key="YOUR_API_KEY",
)

Or choose the provider dynamically:

from llmskit import ChatLLM

chat = ChatLLM.create(
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

Supported provider names for create(...):

openai
gemini
claude

Deprecated aliases still exist in code, but new code should prefer:

from_openai(...) instead of from_gpt(...) or from_local(...)
from_claude(...) instead of from_anthropic(...)

Factory methods are for construction-time options such as base_url, client_logger, and retry_config. Request options such as temperature, max_tokens, or provider response_format belong on complete(...) / stream(...). Use result_format when you want llmskit itself to return the legacy compatibility object.

You can also register custom chat providers without editing llmskit.chat:

from typing import Any, AsyncIterator

from llmskit import AsyncChatLLM
from llmskit.clients import AsyncLLMClient
from llmskit.core import register_chat_provider
from llmskit.types import Message, ProviderEvent, ToolDefinition


class MyChatClient(AsyncLLMClient):
    provider = "my-provider"
    model = "demo-model"
    capabilities = {
        "tool_calling": False,
        "reasoning": False,
        "streaming": True,
        "vision": False,
        "audio_input": False,
        "audio_output": False,
        "document_input": False,
        "video_input": False,
        "native_multimodal_output": False,
    }

    async def events(
        self,
        messages: list[Message],
        *,
        tools: list[ToolDefinition] | None = None,
        **kwargs: Any,
    ) -> AsyncIterator[ProviderEvent]:
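        # Stub implementation: ignore the inputs and emit no events.
        del messages, tools, kwargs
        # The unreachable yield makes this method an async generator.
        if False:  # pragma: no cover
            yield ProviderEvent()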


register_chat_provider(name="my-provider", async_client_factory=MyChatClient, replace=True)
chat = AsyncChatLLM.create("my-provider", model="demo-model")

If you also want ChatLLM.create("my-provider", ...) support, register a native sync client with sync_client_factory=... as well.

If your custom provider needs per-model capability differences, declare provider_capability_defaults and model_capability_catalog on the client class. For OpenAI-compatible private models, you can also override the shared model capability snapshot via from_openai(..., capability_overrides={...}).

Response Formats

ChatLLM.complete(...) and AsyncChatLLM.complete(...) return an OpenAI-style response by default.

response = chat.complete(messages=messages)

print(response["object"])  # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])

If you still need the old compatibility object, request result_format="legacy":

legacy_response = chat.complete(
    messages=messages,
    result_format="legacy",
)

print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)

Provider request formatting still uses response_format, for example:

response = chat.complete(
    messages=messages,
    response_format={"type": "json_object"},
)

Provider-native request options should go inside provider_options, for example:

chat.complete(
    messages=messages,
    provider_options={"reasoning_effort": "high"},  # OpenAI native
)

chat.complete(
    messages=messages,
    provider_options={"thinking": {"type": "enabled", "budget_tokens": 1024}},  # Claude native
)

chat.complete(
    messages=messages,
    provider_options={"candidate_count": 2},  # Gemini native
)

Keep shared llmskit options such as temperature, max_tokens, modalities, audio, and response_format at the top level. Unknown top-level provider kwargs now raise a validation error instead of being silently ignored, and provider_options cannot override llmskit-managed keys such as model, messages, or stream.

response_format="legacy" still works as a deprecated compatibility alias for older code, but new code should prefer result_format="legacy".
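The split between top-level options and provider_options can be checked before a request is sent. The sketch below is not part of llmskit: build_request_kwargs and MANAGED_KEYS are hypothetical names, and the key set only covers the three managed keys named above (model, messages, stream), which may not be the full list the library enforces.

```python
from typing import Any, Optional

# Assumption: only the managed keys named in the text; the real list may be longer.
MANAGED_KEYS = frozenset({"model", "messages", "stream"})


def build_request_kwargs(
    *,
    provider_options: Optional[dict[str, Any]] = None,
    **shared: Any,
) -> dict[str, Any]:
    """Combine shared top-level options with provider-native options."""
    options = dict(provider_options or {})
    clashes = MANAGED_KEYS.intersection(options)
    if clashes:
        raise ValueError(
            f"provider_options must not set managed keys: {sorted(clashes)}"
        )

    kwargs = dict(shared)
    if options:
        kwargs["provider_options"] = options
    return kwargs


request_kwargs = build_request_kwargs(
    temperature=0.2,
    max_tokens=256,
    provider_options={"reasoning_effort": "high"},
)
print(request_kwargs)
```

The resulting dictionary could then be unpacked into a call such as chat.complete(messages=messages, **request_kwargs).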

Streaming

stream(...) yields OpenAI-style chat completion chunks.

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    choice = chunk["choices"][0]
    delta = choice["delta"]

    if delta.get("role"):
        print("role:", delta["role"])
    if delta.get("content"):
        print(delta["content"], end="")
    if delta.get("reasoning_content"):
        print("\nreasoning:", delta["reasoning_content"])
    if delta.get("tool_calls"):
        print("\ntool_calls:", delta["tool_calls"])
    if choice.get("finish_reason"):
        print("\nfinish_reason:", choice["finish_reason"])
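When the full reply is needed after streaming, the chunks can be folded into a single result. The following is a minimal sketch that assumes only the chunk shape shown above (choices[0].delta with optional content and reasoning_content, plus finish_reason). collect_stream is a hypothetical helper, and the sample chunks are hand-written stand-ins for what chat.stream(...) yields.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold OpenAI-style chunks into final text, reasoning, and finish reason."""
    content: list[str] = []
    reasoning: list[str] = []
    finish_reason = None

    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        choice = choices[0]
        delta = choice.get("delta") or {}
        if delta.get("content"):
            content.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]

    return {
        "content": "".join(content),
        "reasoning_content": "".join(reasoning),
        "finish_reason": finish_reason,
    }


# Hand-written chunks standing in for chat.stream(...) output.
sample_chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "1, 2"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": ", 3"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

result = collect_stream(sample_chunks)
print(result["content"])
```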

Tool Calling

Tool definitions use one canonical schema across providers:

tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    }
]

Pass them to complete(...) or stream(...):

response = chat.complete(
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
)

tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)

Returned tool calls are normalized to an OpenAI-style structure:

[
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]
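Because arguments arrives as a JSON string, a caller has to decode it before invoking local code. The sketch below shows one way to dispatch normalized tool calls to Python functions. TOOL_REGISTRY, run_tool_calls, and the get_weather implementation are hypothetical and not part of llmskit, and how the outputs are sent back to the model is not covered here.

```python
import json
from typing import Any, Callable, Optional


def get_weather(city: str) -> str:
    """Hypothetical local implementation of the get_weather tool."""
    return f"Sunny in {city}"


# Maps tool names to local callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(
    tool_calls: Optional[list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Decode each call's JSON arguments and invoke the matching handler."""
    outputs = []
    for call in tool_calls or []:
        function = call["function"]
        handler = TOOL_REGISTRY.get(function["name"])
        if handler is None:
            raise KeyError(f"no local handler for tool {function['name']!r}")
        arguments = json.loads(function["arguments"] or "{}")
        outputs.append(
            {
                "tool_call_id": call["id"],
                "name": function["name"],
                "output": handler(**arguments),
            }
        )
    return outputs


sample_calls = [
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]

outputs = run_tool_calls(sample_calls)
print(outputs)
```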

Multimodal Messages

Message content can be either a plain string or a list of structured content parts.

Supported canonical content part types:

text
image_url
input_audio
file
video_url

Vision example

from llmskit import ChatLLM

chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",
                    "format": "image/png",
                },
            },
        ],
    }
]

response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])
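A small helper can assemble the same message shape for any image URL. This is a sketch rather than an llmskit API: image_message is a hypothetical name, the MIME type is guessed from the URL's file extension with the standard library, and omitting "format" when the type cannot be guessed is an assumption about what the library accepts.

```python
import mimetypes
from typing import Any


def image_message(prompt: str, image_url: str) -> dict[str, Any]:
    """Build a user message with one text part and one image_url part."""
    mime_type, _ = mimetypes.guess_type(image_url)

    image_part: dict[str, Any] = {"url": image_url}
    if mime_type:
        # Assumption: "format" is optional and can be left out when unknown.
        image_part["format"] = mime_type

    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": image_part},
        ],
    }


message = image_message("Describe this image.", "https://example.com/cat.png")
print(message)
```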

Provider Capability Overview

The table below describes default model-family capabilities for built-in providers. At runtime, the authoritative behavior is the model-level capability snapshot, not class-level static constants:

Provider	Tool calling	Reasoning	Vision	Audio input	Audio output	Document input	Video input
OpenAI-compatible	Yes	Yes	Yes	Yes	Yes	No	No
Claude	Yes	Yes	Yes	No	No	Yes	No
Gemini	Yes	Yes	Yes	Yes	Yes	Yes	Yes
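The table can be mirrored as data to pick a provider for a given input type. The sketch below hard-codes the defaults from the table; it does not read llmskit's model-level capability snapshot, so it only reflects these family-level defaults. DEFAULT_CAPABILITIES and providers_supporting are hypothetical names, and "openai" stands for the OpenAI-compatible row.

```python
# Family-level defaults copied from the table above (not read from llmskit).
DEFAULT_CAPABILITIES: dict[str, dict[str, bool]] = {
    "openai": {
        "tool_calling": True, "reasoning": True, "vision": True,
        "audio_input": True, "audio_output": True,
        "document_input": False, "video_input": False,
    },
    "claude": {
        "tool_calling": True, "reasoning": True, "vision": True,
        "audio_input": False, "audio_output": False,
        "document_input": True, "video_input": False,
    },
    "gemini": {
        "tool_calling": True, "reasoning": True, "vision": True,
        "audio_input": True, "audio_output": True,
        "document_input": True, "video_input": True,
    },
}


def providers_supporting(*required: str) -> list[str]:
    """Return providers whose defaults include every required capability."""
    return [
        name
        for name, capabilities in DEFAULT_CAPABILITIES.items()
        if all(capabilities.get(key, False) for key in required)
    ]


print(providers_supporting("document_input"))
print(providers_supporting("vision", "audio_input"))
```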

Embeddings

OpenAIEmbeddings and AsyncOpenAIEmbeddings target OpenAI-compatible embedding endpoints.

Synchronous embeddings

from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
    [
        "llmskit wraps multiple chat providers.",
        "It also includes embeddings and reranking helpers.",
    ]
)

print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())
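A common next step is to compare the query vector with the document vectors. The sketch below ranks documents by cosine similarity using only the standard library. cosine_similarity and rank_documents are hypothetical helpers, not llmskit functions, and the toy two-dimensional vectors stand in for the output of embed_query and embed_documents.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vector: Sequence[float],
    document_vectors: Sequence[Sequence[float]],
) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
toy_query = [1.0, 0.0]
toy_documents = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

ranking = rank_documents(toy_query, toy_documents)
print(ranking)
```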

Asynchronous embeddings

import asyncio

from llmskit import AsyncOpenAIEmbeddings


async def main() -> None:
    embeddings = AsyncOpenAIEmbeddings(
        base_url="https://api.openai.com/v1",
        model="text-embedding-3-small",
        api_key="YOUR_API_KEY",
    )

    vector = await embeddings.embed_query("hello")
    print(len(vector))


asyncio.run(main())

Embedding helpers include:

batching
retry with exponential backoff
max input length truncation
cached dimension detection

Reranker

Reranker and AsyncReranker call a rerank service that exposes a /rerank endpoint.

Synchronous reranking

from llmskit import Reranker

reranker = Reranker(
    base_url="https://your-reranker-service",
    model="bge-reranker-v2-m3",
    api_key="YOUR_API_KEY",
)

result = reranker.rerank(
    query="python async http client",
    documents=[
        "httpx supports both sync and async clients",
        "Redis is an in-memory database",
        "Python generators can yield values lazily",
    ],
    top_n=2,
    threshold=0.0,
)

print(result.results)
print(result.usage)

Asynchronous reranking

import asyncio

from llmskit import AsyncReranker


async def main() -> None:
    reranker = AsyncReranker(
        base_url="https://your-reranker-service",
        model="bge-reranker-v2-m3",
        api_key="YOUR_API_KEY",
    )

    result = await reranker.rerank(
        query="python async http client",
        documents=[
            "httpx supports both sync and async clients",
            "Redis is an in-memory database",
        ],
        top_n=1,
    )
    print(result.results)


asyncio.run(main())

Notes

ChatLLM and AsyncChatLLM normalize provider responses into OpenAI-style chunks and completion payloads.
The default non-streaming response is the new OpenAI-style dictionary, not the legacy dataclass.
OpenAIEmbeddings works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
Retries are built in for transient network and server-side failures.
For local development and CI, run python -m pytest -q from the repository root.
The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.


Release history

This version

0.2.1

Apr 10, 2026

0.2.0

Apr 10, 2026

0.1.0

Apr 9, 2026

0.0.9

Jan 13, 2026

0.0.8

Jan 13, 2026

0.0.7

Jan 13, 2026

0.0.6

Jan 5, 2026

0.0.5

Dec 26, 2025

0.0.4

Dec 24, 2025

0.0.3

Dec 16, 2025

0.0.2

Dec 16, 2025

0.0.1

Dec 15, 2025

Download files

Source Distribution

llmskit-0.2.1.tar.gz (78.5 kB)

Uploaded Apr 10, 2026 Source

Built Distribution

llmskit-0.2.1-py3-none-any.whl (67.7 kB)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file llmskit-0.2.1.tar.gz.

File metadata

Download URL: llmskit-0.2.1.tar.gz
Upload date: Apr 10, 2026
Size: 78.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for llmskit-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`df9d816bb86106fb61448c7f60701a8fb1e6485938626eaf37a95a4a7fa3549d`
MD5	`7331684aecc97a8b5b56978be7724795`
BLAKE2b-256	`50935f73cfcc29de696ab5d359e464c7e21dd1dcac6c0d378932a1145c83e263`


File details

Details for the file llmskit-0.2.1-py3-none-any.whl.

File metadata

Download URL: llmskit-0.2.1-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 67.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for llmskit-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b4f8c457ab1bbbe52671bfaf461fdf3209df7a4b6469ea74ffad461a91e19801`
MD5	`841f8f739de1d440dcafe33887da9d05`
BLAKE2b-256	`875fc7680aaa21256d884fa805fe267ef8aba6d9213f4a981a64ee19d3dbc9df`
