Client and Tools for LLMs

Project description

llmskit

llmskit provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.

The current codebase exposes:

  • Unified sync and async chat wrappers
  • OpenAI-style streaming and completion responses
  • Provider adapters for openai, gemini, and claude
  • Canonical multimodal message parts and tool definitions
  • OpenAI-compatible embeddings helpers
  • Generic reranker clients

Installation

pip install llmskit

Public API

from llmskit import (
    AsyncChatLLM,
    AsyncOpenAIEmbeddings,
    AsyncReranker,
    ChatLLM,
    OpenAIEmbeddings,
    Reranker,
)

Chat Quick Start

Synchronous chat

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",  # replace with your OpenAI-compatible endpoint
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

response = chat.complete(messages=messages)
message = response["choices"][0]["message"]

print(message["content"])
print(message["reasoning_content"])
print(response["usage"])

ChatLLM is intended for blocking synchronous code paths and now uses native sync provider clients. In Jupyter notebooks, async web frameworks, or inside async def code, prefer AsyncChatLLM so you do not block the active event loop.

Asynchronous chat

import asyncio

from llmskit import AsyncChatLLM


async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )

    response = await chat.complete(
        messages=[
            {"role": "system", "content": "Answer briefly."},
            {"role": "user", "content": "What is llmskit?"},
        ]
    )

    print(response["choices"][0]["message"]["content"])


asyncio.run(main())

If your runtime already has an event loop, such as Jupyter Notebook or FastAPI / Starlette request handlers, prefer AsyncChatLLM to keep that loop non-blocking.

Provider Factories

Use explicit factory methods when you already know the backend:

from llmskit import ChatLLM

openai_chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

gemini_chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

claude_chat = ChatLLM.from_claude(
    model="claude-sonnet-4-20250514",
    api_key="YOUR_API_KEY",
)

Or choose the provider dynamically:

from llmskit import ChatLLM

chat = ChatLLM.create(
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

Supported provider names for create(...):

  • openai
  • gemini
  • claude

Deprecated aliases still exist in code, but new code should prefer:

  • from_openai(...) instead of from_gpt(...) or from_local(...)
  • from_claude(...) instead of from_anthropic(...)

Factory methods are for construction-time options such as base_url, client_logger, and retry_config. Request options such as temperature, max_tokens, or provider response_format belong on complete(...) / stream(...). Use result_format when you want llmskit itself to return the legacy compatibility object.

You can also register custom chat providers without editing llmskit.chat:

from typing import Any, AsyncIterator

from llmskit import AsyncChatLLM
from llmskit.clients import AsyncLLMClient
from llmskit.core import register_chat_provider
from llmskit.types import Message, ProviderEvent, ToolDefinition


class MyChatClient(AsyncLLMClient):
    provider = "my-provider"
    model = "demo-model"
    capabilities = {
        "tool_calling": False,
        "reasoning": False,
        "streaming": True,
        "vision": False,
        "audio_input": False,
        "audio_output": False,
        "document_input": False,
        "video_input": False,
        "native_multimodal_output": False,
    }

    async def events(
        self,
        messages: list[Message],
        *,
        tools: list[ToolDefinition] | None = None,
        **kwargs: Any,
    ) -> AsyncIterator[ProviderEvent]:
        del messages, tools, kwargs
        if False:  # pragma: no cover - trick that makes this function an async generator
            yield ProviderEvent()


register_chat_provider(name="my-provider", async_client_factory=MyChatClient, replace=True)
chat = AsyncChatLLM.create("my-provider", model="demo-model")

If you also want ChatLLM.create("my-provider", ...) support, register a native sync client with sync_client_factory=... as well.

Response Formats

ChatLLM.complete(...) and AsyncChatLLM.complete(...) return an OpenAI-style response by default.

response = chat.complete(messages=messages)

print(response["object"])  # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])

If you still need the old compatibility object, request result_format="legacy":

legacy_response = chat.complete(
    messages=messages,
    result_format="legacy",
)

print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)

Provider request formatting still uses response_format, for example:

response = chat.complete(
    messages=messages,
    response_format={"type": "json_object"},
)

Provider-native request options should go inside provider_options, for example:

chat.complete(
    messages=messages,
    provider_options={"reasoning_effort": "high"},  # OpenAI native
)

chat.complete(
    messages=messages,
    provider_options={"thinking": {"type": "enabled", "budget_tokens": 1024}},  # Claude native
)

chat.complete(
    messages=messages,
    provider_options={"candidate_count": 2},  # Gemini native
)

Keep shared llmskit options such as temperature, max_tokens, modalities, audio, and response_format at the top level. Unknown top-level provider kwargs now raise a validation error instead of being silently ignored, and provider_options cannot override llmskit-managed keys such as model, messages, or stream.
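The merge rules above can be illustrated with a small standalone sketch. This is not llmskit's actual implementation; `build_request`, `MANAGED_KEYS`, and `SHARED_KEYS` are hypothetical names used only to show the behavior described: unknown top-level kwargs raise, and `provider_options` cannot shadow managed keys.

```python
# Illustrative sketch of the option-merging rules described above.
# NOT llmskit's implementation; all names here are hypothetical.
MANAGED_KEYS = {"model", "messages", "stream"}
SHARED_KEYS = {"temperature", "max_tokens", "modalities", "audio", "response_format"}

def build_request(messages, provider_options=None, **kwargs):
    # Unknown top-level options fail fast instead of being silently dropped.
    unknown = set(kwargs) - SHARED_KEYS
    if unknown:
        raise ValueError(f"unknown top-level options: {sorted(unknown)}")
    provider_options = dict(provider_options or {})
    # Provider-native options may not override llmskit-managed keys.
    clashes = set(provider_options) & MANAGED_KEYS
    if clashes:
        raise ValueError(f"provider_options cannot override: {sorted(clashes)}")
    request = {"messages": messages, **kwargs}
    request.update(provider_options)
    return request
```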

response_format="legacy" still works as a deprecated compatibility alias for older code, but new code should prefer result_format="legacy".

Streaming

stream(...) yields OpenAI-style chat completion chunks.

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    choice = chunk["choices"][0]
    delta = choice["delta"]

    if delta.get("role"):
        print("role:", delta["role"])
    if delta.get("content"):
        print(delta["content"], end="")
    if delta.get("reasoning_content"):
        print("\nreasoning:", delta["reasoning_content"])
    if delta.get("tool_calls"):
        print("\ntool_calls:", delta["tool_calls"])
    if choice.get("finish_reason"):
        print("\nfinish_reason:", choice["finish_reason"])
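A common pattern is to fold the streamed deltas back into a single message. The sketch below does this over hand-built sample chunks rather than a live call; with a real provider, the chunk dicts would come from `chat.stream(...)`.

```python
# Accumulate OpenAI-style streaming chunks into one final message.
# sample_chunks are hand-built stand-ins for chunks from chat.stream(...).
def accumulate(chunks):
    message = {"role": None, "content": "", "finish_reason": None}
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice["delta"]
        if delta.get("role"):
            message["role"] = delta["role"]
        if delta.get("content"):
            message["content"] += delta["content"]
        if choice.get("finish_reason"):
            message["finish_reason"] = choice["finish_reason"]
    return message

sample_chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "1 2 "}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "3"}, "finish_reason": "stop"}]},
]

print(accumulate(sample_chunks))
# {'role': 'assistant', 'content': '1 2 3', 'finish_reason': 'stop'}
```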

Tool Calling

Tool definitions use one canonical schema across providers:

tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    }
]

Pass them to complete(...) or stream(...):

response = chat.complete(
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
)

tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)

Returned tool calls are normalized to an OpenAI-style structure:

[
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]
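Because the `arguments` field is a JSON string, executing a tool call means parsing it and dispatching to a local function. A minimal sketch, where `get_weather`, `TOOLS`, and `run_tool_calls` are stand-ins you would replace with your own tools:

```python
import json

# Dispatch normalized OpenAI-style tool calls to local Python functions.
# get_weather and run_tool_calls are illustrative stand-ins, not llmskit APIs.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls):
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        # Feed each result back as a role="tool" message on the next turn.
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": fn(**args)}
        )
    return results

calls = [
    {
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\":\"Beijing\"}"},
    }
]
print(run_tool_calls(calls))
```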

Multimodal Messages

Message content can be either a plain string or a list of structured content parts.

Supported canonical content part types:

  • text
  • image_url
  • input_audio
  • file
  • video_url

Vision example

from llmskit import ChatLLM

chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",
                    "format": "image/png",
                },
            },
        ],
    }
]

response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])

Other content part shapes

audio_part = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-audio-data>",
        "format": "wav",
    },
}

file_part = {
    "type": "file",
    "file": {
        "file_id": "gs://bucket/report.pdf",
        "format": "application/pdf",
    },
}

video_part = {
    "type": "video_url",
    "video_url": {
        "url": "gs://bucket/demo.mp4",
        "format": "video/mp4",
    },
}
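Any of these parts can be combined with a text part in a single user message. The helper below is a local convenience, not part of llmskit's API:

```python
# Combine structured content parts into one multimodal user message.
# build_message is a local helper for illustration, not a llmskit API.
def build_message(text, *parts):
    return {"role": "user", "content": [{"type": "text", "text": text}, *parts]}

audio_part = {
    "type": "input_audio",
    "input_audio": {"data": "<base64-audio-data>", "format": "wav"},
}

msg = build_message("Transcribe this clip.", audio_part)
```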

The wrapper validates unsupported modalities early, so provider/model mismatches fail fast instead of being silently forwarded.

You can inspect capabilities at runtime:

print(chat.capabilities)
print(chat.supports_vision())
print(chat.supports_audio_input())
print(chat.supports_audio_output())
print(chat.supports_document_input())
print(chat.supports_video_input())

Provider Capability Overview

The current provider adapters expose the following capability flags in code:

Provider           Tool calling  Reasoning  Vision  Audio input  Audio output  Document input  Video input
OpenAI-compatible  Yes           Yes        Yes     Yes          Yes           No              No
Claude             Yes           Yes        Yes     No           No            Yes             No
Gemini             Yes           Yes        Yes     Yes          Yes           Yes             Yes

Embeddings

OpenAIEmbeddings and AsyncOpenAIEmbeddings target OpenAI-compatible embedding endpoints.

Synchronous embeddings

from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
    [
        "llmskit wraps multiple chat providers.",
        "It also includes embeddings and reranking helpers.",
    ]
)

print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())
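The returned vectors can be compared with cosine similarity, for example to rank documents against a query. The helpers below are plain Python, not part of llmskit, and the toy vectors stand in for real embedding output:

```python
import math

# Rank documents by cosine similarity to a query vector.
# cosine and rank_by_similarity are local helpers, not llmskit APIs;
# the two-dimensional vectors below are toy stand-ins for real embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vector, document_vectors):
    scores = [cosine(query_vector, d) for d in document_vectors]
    # Indices of documents, most similar first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

order = rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]])
print(order)  # [1, 0]
```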

Asynchronous embeddings

import asyncio

from llmskit import AsyncOpenAIEmbeddings


async def main() -> None:
    embeddings = AsyncOpenAIEmbeddings(
        base_url="https://api.openai.com/v1",
        model="text-embedding-3-small",
        api_key="YOUR_API_KEY",
    )

    vector = await embeddings.embed_query("hello")
    print(len(vector))


asyncio.run(main())

Embedding helpers include:

  • batching
  • retry with exponential backoff
  • max input length truncation
  • cached dimension detection
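The batching and retry behavior follows the usual pattern, sketched below in plain Python. This is an illustration of the pattern, not llmskit's internals; in the real helpers the body of the retried function would be one HTTP call to the embeddings endpoint.

```python
import itertools
import time

# Sketch of batching + exponential backoff, as commonly implemented.
# Illustrative only - not llmskit's actual internals.
def batched(items, batch_size):
    it = iter(items)
    while batch := list(itertools.islice(it, batch_size)):
        yield batch

def with_retries(fn, *, attempts=3, base_delay=0.5):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

texts = [f"doc {i}" for i in range(5)]
batches = list(batched(texts, 2))  # [['doc 0', 'doc 1'], ['doc 2', 'doc 3'], ['doc 4']]
```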

Reranker

Reranker and AsyncReranker call a rerank service that exposes a /rerank endpoint.

Synchronous reranking

from llmskit import Reranker

reranker = Reranker(
    base_url="https://your-reranker-service",
    model="bge-reranker-v2-m3",
    api_key="YOUR_API_KEY",
)

result = reranker.rerank(
    query="python async http client",
    documents=[
        "httpx supports both sync and async clients",
        "Redis is an in-memory database",
        "Python generators can yield values lazily",
    ],
    top_n=2,
    threshold=0.0,
)

print(result.results)
print(result.usage)
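To map the scores back to the original documents, you can post-process `result.results` locally. The sketch below assumes each entry carries `index` and `relevance_score` fields, which is the common shape for rerank APIs; check your service's actual response before relying on it:

```python
# Map rerank results back to the original documents.
# Assumes entries shaped like {"index": int, "relevance_score": float},
# a common rerank response format - verify against your service.
def top_documents(documents, results, threshold=0.0):
    kept = [r for r in results if r["relevance_score"] >= threshold]
    kept.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in kept]

docs = [
    "httpx supports both sync and async clients",
    "Redis is an in-memory database",
]
sample = [
    {"index": 1, "relevance_score": 0.1},
    {"index": 0, "relevance_score": 0.9},
]
print(top_documents(docs, sample, threshold=0.5))
# ['httpx supports both sync and async clients']
```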

Asynchronous reranking

import asyncio

from llmskit import AsyncReranker


async def main() -> None:
    reranker = AsyncReranker(
        base_url="https://your-reranker-service",
        model="bge-reranker-v2-m3",
        api_key="YOUR_API_KEY",
    )

    result = await reranker.rerank(
        query="python async http client",
        documents=[
            "httpx supports both sync and async clients",
            "Redis is an in-memory database",
        ],
        top_n=1,
    )
    print(result.results)


asyncio.run(main())

Notes

  • ChatLLM and AsyncChatLLM normalize provider responses into OpenAI-style chunks and completion payloads.
  • The default non-streaming response is the new OpenAI-style dictionary, not the legacy dataclass.
  • OpenAIEmbeddings works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
  • Retries are built in for transient network and server-side failures.
  • For local development and CI, run python -m pytest -q from the repository root.
  • The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.

Project details


Download files

Download the file for your platform.

Source Distribution

llmskit-0.2.0.tar.gz (65.5 kB)

Built Distribution


llmskit-0.2.0-py3-none-any.whl (58.1 kB)

File details

Details for the file llmskit-0.2.0.tar.gz.

File metadata

  • Download URL: llmskit-0.2.0.tar.gz
  • Size: 65.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for llmskit-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       fdde6e51145af4fe9b5e690f208541fd0aac730ea9a506ed753f267d6fd2c844
MD5          174adfa111325cd8970a08449fcb89e3
BLAKE2b-256  e05b9df6d941cf9ea0de8a5e904a71838d1c253a85ecac7a89e935b28ea5b1a8


File details

Details for the file llmskit-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: llmskit-0.2.0-py3-none-any.whl
  • Size: 58.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for llmskit-0.2.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       7d911e3db5d56390563390bfb5bd4484f89517db943983abd6beafb58e1e441c
MD5          d74e6b4638c5aef6cbe57cf766a0199e
BLAKE2b-256  a07b62a7d642eaf1802d3daba01625bdecb56836253008c165e1bd040dbf57fd

