Client and Tools for LLMs

llmskit

llmskit provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.

The library currently exposes:

  • Unified sync and async chat wrappers
  • OpenAI-style streaming and completion responses
  • Provider adapters for openai, gemini, and claude
  • Canonical multimodal message parts and tool definitions
  • OpenAI-compatible embeddings helpers
  • Generic reranker clients

Installation

pip install llmskit

Public API

from llmskit import (
    AsyncChatLLM,
    AsyncOpenAIEmbeddings,
    AsyncReranker,
    ChatLLM,
    OpenAIEmbeddings,
    Reranker,
)

Chat Quick Start

Synchronous chat

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",  # replace with your OpenAI-compatible endpoint
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

response = chat.complete(messages=messages)
message = response["choices"][0]["message"]

print(message["content"])
print(message["reasoning_content"])
print(response["usage"])

ChatLLM is intended for blocking synchronous code paths and now uses native sync provider clients. In Jupyter notebooks, async web frameworks, or inside async def code, prefer AsyncChatLLM so you do not block the active event loop.

Asynchronous chat

import asyncio

from llmskit import AsyncChatLLM


async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )

    response = await chat.complete(
        messages=[
            {"role": "system", "content": "Answer briefly."},
            {"role": "user", "content": "What is llmskit?"},
        ]
    )

    print(response["choices"][0]["message"]["content"])


asyncio.run(main())

If your runtime already has a running event loop, as in Jupyter notebooks or FastAPI / Starlette request handlers, prefer AsyncChatLLM so you never block that loop.
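
For example, in a FastAPI handler (a sketch: the app wiring is illustrative, and it assumes AsyncChatLLM exposes the same from_openai factory shown for ChatLLM above):

from fastapi import FastAPI

from llmskit import AsyncChatLLM

app = FastAPI()
chat = AsyncChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)


@app.post("/ask")
async def ask(question: str) -> dict:
    # Awaiting the call keeps the request handler from blocking the loop.
    response = await chat.complete(
        messages=[{"role": "user", "content": question}]
    )
    return {"answer": response["choices"][0]["message"]["content"]}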

Provider Factories

Use explicit factory methods when you already know the backend:

from llmskit import ChatLLM

openai_chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

gemini_chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

claude_chat = ChatLLM.from_claude(
    model="claude-sonnet-4-20250514",
    api_key="YOUR_API_KEY",
)

Or choose the provider dynamically:

from llmskit import ChatLLM

chat = ChatLLM.create(
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

Supported provider names for create(...):

  • openai
  • gemini
  • claude

Deprecated aliases still exist in code, but new code should prefer:

  • from_openai(...) instead of from_gpt(...) or from_local(...)
  • from_claude(...) instead of from_anthropic(...)

Factory methods are for construction-time options such as base_url, client_logger, and retry_config. Request options such as temperature, max_tokens, or provider response_format belong on complete(...) / stream(...). Use result_format when you want llmskit itself to return the legacy compatibility object.
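
For example (a sketch: client_logger is assumed to accept a standard logging.Logger, and retry_config is omitted because its exact shape is not shown here):

import logging

from llmskit import ChatLLM

# Construction-time options live on the factory.
chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
    client_logger=logging.getLogger("llmskit"),
)

# Request-time options live on complete(...) / stream(...).
response = chat.complete(
    messages=[{"role": "user", "content": "Say hi."}],
    temperature=0.2,
    max_tokens=64,
)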

You can also register custom chat providers without editing llmskit.chat:

from typing import Any, AsyncIterator

from llmskit import AsyncChatLLM
from llmskit.clients import AsyncLLMClient
from llmskit.core import register_chat_provider
from llmskit.types import Message, ProviderEvent, ToolDefinition


class MyChatClient(AsyncLLMClient):
    provider = "my-provider"
    model = "demo-model"
    capabilities = {
        "tool_calling": False,
        "reasoning": False,
        "streaming": True,
        "vision": False,
        "audio_input": False,
        "audio_output": False,
        "document_input": False,
        "video_input": False,
        "native_multimodal_output": False,
    }

    async def events(
        self,
        messages: list[Message],
        *,
        tools: list[ToolDefinition] | None = None,
        **kwargs: Any,
    ) -> AsyncIterator[ProviderEvent]:
        del messages, tools, kwargs
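        # The unreachable yield below turns this method into an async
        # generator even though the stub emits no events.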
        if False:  # pragma: no cover
            yield ProviderEvent()


register_chat_provider(name="my-provider", async_client_factory=MyChatClient, replace=True)
chat = AsyncChatLLM.create("my-provider", model="demo-model")

If you also want ChatLLM.create("my-provider", ...) support, register a native sync client with sync_client_factory=... as well.

If your custom provider needs per-model capability differences, declare provider_capability_defaults and model_capability_catalog on the client class. For OpenAI-compatible private models, you can also override the shared model capability snapshot via from_openai(..., capability_overrides={...}).
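
For example, a private OpenAI-compatible deployment might advertise vision support like this (a sketch: the model name and endpoint are placeholders, and the override keys are assumed to mirror the capability flags listed earlier):

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="my-private-model",
    api_key="YOUR_API_KEY",
    base_url="https://llm.internal.example.com/v1",
    capability_overrides={
        "vision": True,
        "video_input": False,
    },
)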

Response Formats

ChatLLM.complete(...) and AsyncChatLLM.complete(...) return an OpenAI-style response by default.

response = chat.complete(messages=messages)

print(response["object"])  # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])

If you still need the old compatibility object, request result_format="legacy":

legacy_response = chat.complete(
    messages=messages,
    result_format="legacy",
)

print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)

Provider request formatting still uses response_format, for example:

response = chat.complete(
    messages=messages,
    response_format={"type": "json_object"},
)

Provider-native request options should go inside provider_options, for example:

chat.complete(
    messages=messages,
    provider_options={"reasoning_effort": "high"},  # OpenAI native
)

chat.complete(
    messages=messages,
    provider_options={"thinking": {"type": "enabled", "budget_tokens": 1024}},  # Claude native
)

chat.complete(
    messages=messages,
    provider_options={"candidate_count": 2},  # Gemini native
)

Keep shared llmskit options such as temperature, max_tokens, modalities, audio, and response_format at the top level. Unknown top-level provider kwargs now raise a validation error instead of being silently ignored, and provider_options cannot override llmskit-managed keys such as model, messages, or stream.
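
Putting both together, a single call might look like this (an OpenAI-compatible backend is assumed):

response = chat.complete(
    messages=messages,
    temperature=0.3,                          # shared option, top level
    max_tokens=256,                           # shared option, top level
    response_format={"type": "json_object"},  # shared option, top level
    provider_options={"reasoning_effort": "low"},  # OpenAI-native option
)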

response_format="legacy" still works as a deprecated compatibility alias for older code, but new code should prefer result_format="legacy".

Streaming

stream(...) yields OpenAI-style chat completion chunks.

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    choice = chunk["choices"][0]
    delta = choice["delta"]

    if delta.get("role"):
        print("role:", delta["role"])
    if delta.get("content"):
        print(delta["content"], end="")
    if delta.get("reasoning_content"):
        print("\nreasoning:", delta["reasoning_content"])
    if delta.get("tool_calls"):
        print("\ntool_calls:", delta["tool_calls"])
    if choice.get("finish_reason"):
        print("\nfinish_reason:", choice["finish_reason"])

Tool Calling

Tool definitions use one canonical schema across providers:

tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    }
]

Pass them to complete(...) or stream(...):

response = chat.complete(
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
)

tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)

Returned tool calls are normalized to an OpenAI-style structure:

[
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]
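
To complete the round trip, execute the tool yourself and send the result back. The message shapes below follow the OpenAI convention (an assistant message carrying tool_calls, then a role "tool" message with the matching tool_call_id); the excerpt above does not spell out llmskit's exact expectations, so treat this as a sketch:

import json

tool_call = tool_calls[0]
args = json.loads(tool_call["function"]["arguments"])
weather_report = f"Sunny in {args['city']}"  # stand-in for a real lookup

followup = chat.complete(
    messages=[
        {"role": "user", "content": "What is the weather in Beijing?"},
        {"role": "assistant", "content": None, "tool_calls": tool_calls},
        {
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": weather_report,
        },
    ],
    tools=tools,
)
print(followup["choices"][0]["message"]["content"])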

Multimodal Messages

Message content can be either a plain string or a list of structured content parts.

Supported canonical content part types:

  • text
  • image_url
  • input_audio
  • file
  • video_url

Vision example

from llmskit import ChatLLM

chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",
                    "format": "image/png",
                },
            },
        ],
    }
]

response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])

Other content part shapes

audio_part = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-audio-data>",
        "format": "wav",
    },
}

file_part = {
    "type": "file",
    "file": {
        "file_id": "gs://bucket/report.pdf",
        "format": "application/pdf",
    },
}

video_part = {
    "type": "video_url",
    "video_url": {
        "url": "gs://bucket/demo.mp4",
        "format": "video/mp4",
    },
}

The wrapper validates unsupported modalities early, so provider/model mismatches fail fast instead of being silently forwarded.

You can inspect capabilities at runtime:

print(chat.capabilities)
print(chat.capability_snapshot())
print(chat.refresh_capabilities())
print(chat.supports_vision())
print(chat.supports_audio_input())
print(chat.supports_audio_output())
print(chat.supports_document_input())
print(chat.supports_video_input())

Where:

  • chat.capabilities is the backward-compatible boolean view.
  • chat.capability_snapshot() returns a model-level snapshot with state / source metadata.
  • chat.refresh_capabilities() re-resolves the shared snapshot for the current provider + model + base_url tuple and preserves runtime-learned corrections by default.
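
For example, you can gate a multimodal request on the snapshot before sending it (vision_messages and text_only_messages are placeholders for your own message lists):

if chat.supports_vision():
    response = chat.complete(messages=vision_messages)
else:
    response = chat.complete(messages=text_only_messages)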

Provider Capability Overview

The table below lists the default model-family capabilities for the built-in providers. At runtime, the authoritative source is the model-level capability snapshot, not these class-level defaults:

Provider           Tool calling  Reasoning  Vision  Audio input  Audio output  Document input  Video input
OpenAI-compatible  Yes           Yes        Yes     Yes          Yes           No              No
Claude             Yes           Yes        Yes     No           No            Yes             No
Gemini             Yes           Yes        Yes     Yes          Yes           Yes             Yes

Embeddings

OpenAIEmbeddings and AsyncOpenAIEmbeddings target OpenAI-compatible embedding endpoints.

Synchronous embeddings

from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
    [
        "llmskit wraps multiple chat providers.",
        "It also includes embeddings and reranking helpers.",
    ]
)

print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())

Asynchronous embeddings

import asyncio

from llmskit import AsyncOpenAIEmbeddings


async def main() -> None:
    embeddings = AsyncOpenAIEmbeddings(
        base_url="https://api.openai.com/v1",
        model="text-embedding-3-small",
        api_key="YOUR_API_KEY",
    )

    vector = await embeddings.embed_query("hello")
    print(len(vector))


asyncio.run(main())

Embedding helpers include:

  • batching
  • retry with exponential backoff
  • max input length truncation
  • cached dimension detection
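
For example, with batch_size=16 a larger corpus is split into ceil(100 / 16) = 7 requests transparently:

from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

docs = [f"document {i}" for i in range(100)]
vectors = embeddings.embed_documents(docs)  # issued as 7 batched requests
assert len(vectors) == len(docs)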

Reranker

Reranker and AsyncReranker call a rerank service that exposes a /rerank endpoint.

Synchronous reranking

from llmskit import Reranker

reranker = Reranker(
    base_url="https://your-reranker-service",
    model="bge-reranker-v2-m3",
    api_key="YOUR_API_KEY",
)

result = reranker.rerank(
    query="python async http client",
    documents=[
        "httpx supports both sync and async clients",
        "Redis is an in-memory database",
        "Python generators can yield values lazily",
    ],
    top_n=2,
    threshold=0.0,
)

print(result.results)
print(result.usage)
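
To apply the scores, map each result entry back to the input documents. The field names below are assumptions (index and relevance_score are common in bge-style /rerank responses); check the shape your service actually returns:

for entry in result.results:
    # Assumed entry shape: {"index": <position in documents>, "relevance_score": <float>}
    print(entry["index"], entry["relevance_score"])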

Asynchronous reranking

import asyncio

from llmskit import AsyncReranker


async def main() -> None:
    reranker = AsyncReranker(
        base_url="https://your-reranker-service",
        model="bge-reranker-v2-m3",
        api_key="YOUR_API_KEY",
    )

    result = await reranker.rerank(
        query="python async http client",
        documents=[
            "httpx supports both sync and async clients",
            "Redis is an in-memory database",
        ],
        top_n=1,
    )
    print(result.results)


asyncio.run(main())

Notes

  • ChatLLM and AsyncChatLLM normalize provider responses into OpenAI-style chunks and completion payloads.
  • The default non-streaming response is the new OpenAI-style dictionary, not the legacy dataclass.
  • OpenAIEmbeddings works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
  • Retries are built in for transient network and server-side failures.
  • For local development and CI, run python -m pytest -q from the repository root.
  • The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.
