
Client and Tools for LLMs


llmskit

llmskit provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.

The current codebase exposes:

  • Unified sync and async chat wrappers
  • OpenAI-style streaming and completion responses
  • Provider adapters for openai, gemini, and claude
  • Canonical multimodal message parts and tool definitions
  • OpenAI-compatible embeddings helpers
  • Generic reranker clients

Installation

pip install llmskit

Public API

from llmskit import (
    AsyncChatLLM,
    AsyncOpenAIEmbeddings,
    AsyncReranker,
    ChatLLM,
    OpenAIEmbeddings,
    Reranker,
)

Chat Quick Start

Synchronous chat

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",  # replace with your OpenAI-compatible endpoint
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

response = chat.complete(messages=messages)
message = response["choices"][0]["message"]

print(message["content"])
print(message["reasoning_content"])
print(response["usage"])

ChatLLM is intended for purely synchronous scripts. In Jupyter notebooks, async web frameworks, or inside async def code, use AsyncChatLLM instead to avoid conflicts with an already running event loop.

Asynchronous chat

import asyncio

from llmskit import AsyncChatLLM


async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )

    response = await chat.complete(
        messages=[
            {"role": "system", "content": "Answer briefly."},
            {"role": "user", "content": "What is llmskit?"},
        ]
    )

    print(response["choices"][0]["message"]["content"])


asyncio.run(main())

If your runtime already has a running event loop (for example, Jupyter notebooks or FastAPI / Starlette request handlers), prefer AsyncChatLLM.
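
For example, a minimal FastAPI handler might reuse a module-level AsyncChatLLM instance. The endpoint name and request model below are illustrative, not part of llmskit:

from fastapi import FastAPI
from pydantic import BaseModel

from llmskit import AsyncChatLLM

app = FastAPI()

# One shared client per process; construction-time options go here.
chat = AsyncChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)


class ChatRequest(BaseModel):
    prompt: str


@app.post("/chat")
async def chat_endpoint(request: ChatRequest) -> dict:
    response = await chat.complete(
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"reply": response["choices"][0]["message"]["content"]}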

Provider Factories

Use explicit factory methods when you already know the backend:

from llmskit import ChatLLM

openai_chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

gemini_chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

claude_chat = ChatLLM.from_claude(
    model="claude-sonnet-4-20250514",
    api_key="YOUR_API_KEY",
)

Or choose the provider dynamically:

from llmskit import ChatLLM

chat = ChatLLM.create(
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

Supported provider names for create(...):

  • openai
  • gemini
  • claude

Deprecated aliases are still available, but new code should prefer:

  • from_openai(...) instead of from_gpt(...) or from_local(...)
  • from_claude(...) instead of from_anthropic(...)

Factory methods are for construction-time options such as base_url, client_logger, and retry_config. Request options such as temperature, max_tokens, or response_format belong on complete(...) / stream(...).
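
As a rough sketch of that split (temperature and max_tokens are assumed to be accepted as keyword arguments on complete(...), as described above):

from llmskit import ChatLLM

# Construction-time options: where and how to connect.
chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

# Request-time options: how this particular completion should behave.
response = chat.complete(
    messages=[{"role": "user", "content": "Summarize llmskit in one line."}],
    temperature=0.2,
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])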

Response Formats

ChatLLM.complete(...) and AsyncChatLLM.complete(...) return an OpenAI-style response by default.

response = chat.complete(messages=messages)

print(response["object"])  # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])

If you still need the old compatibility object, request response_format="legacy":

legacy_response = chat.complete(
    messages=messages,
    response_format="legacy",
)

print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)

Streaming

stream(...) yields OpenAI-style chat completion chunks.

from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    choice = chunk["choices"][0]
    delta = choice["delta"]

    if delta.get("role"):
        print("role:", delta["role"])
    if delta.get("content"):
        print(delta["content"], end="")
    if delta.get("reasoning_content"):
        print("\nreasoning:", delta["reasoning_content"])
    if delta.get("tool_calls"):
        print("\ntool_calls:", delta["tool_calls"])
    if choice.get("finish_reason"):
        print("\nfinish_reason:", choice["finish_reason"])

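To assemble the streamed deltas back into a full reply, accumulate the content fields from each chunk. This is a small sketch based on the chunk shape shown above:

parts = []
for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        parts.append(delta["content"])

full_reply = "".join(parts)
print(full_reply)
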
Tool Calling

Tool definitions use one canonical schema across providers:

tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    }
]

Pass them to complete(...) or stream(...):

response = chat.complete(
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
)

tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)

Returned tool calls are normalized to an OpenAI-style structure:

[
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]
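
A typical follow-up step parses the arguments, runs your own function, and sends the result back. The shape of the tool result message below (role "tool" with tool_call_id) is an OpenAI-style assumption and is not guaranteed by the docs above:

import json


def get_weather(city: str) -> str:
    # Placeholder implementation; call your real weather API here.
    return f"It is sunny in {city}."


call = tool_calls[0]
arguments = json.loads(call["function"]["arguments"])
tool_result = get_weather(**arguments)

follow_up = chat.complete(
    messages=[
        {"role": "user", "content": "What is the weather in Beijing?"},
        response["choices"][0]["message"],  # assistant message carrying tool_calls
        {"role": "tool", "tool_call_id": call["id"], "content": tool_result},
    ],
    tools=tools,
)
print(follow_up["choices"][0]["message"]["content"])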

Multimodal Messages

Message content can be either a plain string or a list of structured content parts.

Supported canonical content part types:

  • text
  • image_url
  • input_audio
  • file
  • video_url

Vision example

from llmskit import ChatLLM

chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",
                    "format": "image/png",
                },
            },
        ],
    }
]

response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])

Other content part shapes

audio_part = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-audio-data>",
        "format": "wav",
    },
}

file_part = {
    "type": "file",
    "file": {
        "file_id": "gs://bucket/report.pdf",
        "format": "application/pdf",
    },
}

video_part = {
    "type": "video_url",
    "video_url": {
        "url": "gs://bucket/demo.mp4",
        "format": "video/mp4",
    },
}

The wrapper validates unsupported modalities early, so provider/model mismatches fail fast instead of being silently forwarded.

You can inspect capabilities at runtime:

print(chat.capabilities)
print(chat.supports_vision())
print(chat.supports_audio_input())
print(chat.supports_audio_output())
print(chat.supports_document_input())
print(chat.supports_video_input())
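
These checks can gate optional content parts before a request is sent. For example, reusing the audio_part shape from above:

content = [{"type": "text", "text": "Transcribe this clip."}]

# Only attach the audio part if the selected model accepts audio input.
if chat.supports_audio_input():
    content.append(audio_part)

response = chat.complete(messages=[{"role": "user", "content": content}])
print(response["choices"][0]["message"]["content"])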

Provider Capability Overview

The current provider adapters expose the following capability flags in code:

Provider           Tool calling  Reasoning  Vision  Audio input  Audio output  Document input  Video input
OpenAI-compatible  Yes           Yes        Yes     Yes          Yes           No              No
Claude             Yes           Yes        Yes     No           No            Yes             No
Gemini             Yes           Yes        Yes     Yes          Yes           Yes             Yes

Embeddings

OpenAIEmbeddings and AsyncOpenAIEmbeddings target OpenAI-compatible embedding endpoints.

Synchronous embeddings

from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model_name="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
    [
        "llmskit wraps multiple chat providers.",
        "It also includes embeddings and reranking helpers.",
    ]
)

print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())

Asynchronous embeddings

import asyncio

from llmskit import AsyncOpenAIEmbeddings


async def main() -> None:
    embeddings = AsyncOpenAIEmbeddings(
        base_url="https://api.openai.com/v1",
        model_name="text-embedding-3-small",
        api_key="YOUR_API_KEY",
    )

    vector = await embeddings.embed_query("hello")
    print(len(vector))


asyncio.run(main())

Embedding helpers include:

  • batching
  • retry with exponential backoff
  • max input length truncation
  • cached dimension detection
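
A common use of these helpers is simple similarity search. The cosine-similarity ranking below is plain Python layered on top of the embed_query / embed_documents calls shown above; it is not an llmskit feature:

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


documents = [
    "llmskit wraps multiple chat providers.",
    "It also includes embeddings and reranking helpers.",
]
scores = [cosine(query_vector, vec) for vec in document_vectors]
best = max(range(len(documents)), key=lambda i: scores[i])
print(documents[best], scores[best])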

Reranker

Reranker and AsyncReranker call a reranking service that exposes a /rerank endpoint.

Synchronous reranking

from llmskit import Reranker

reranker = Reranker(
    base_url="https://your-reranker-service",
    model_name="bge-reranker-v2-m3",
    api_key="YOUR_API_KEY",
)

result = reranker.rerank(
    query="python async http client",
    documents=[
        "httpx supports both sync and async clients",
        "Redis is an in-memory database",
        "Python generators can yield values lazily",
    ],
    top_n=2,
    threshold=0.0,
)

print(result.results)
print(result.usage)

Asynchronous reranking

import asyncio

from llmskit import AsyncReranker


async def main() -> None:
    reranker = AsyncReranker(
        base_url="https://your-reranker-service",
        model_name="bge-reranker-v2-m3",
        api_key="YOUR_API_KEY",
    )

    result = await reranker.rerank(
        query="python async http client",
        documents=[
            "httpx supports both sync and async clients",
            "Redis is an in-memory database",
        ],
        top_n=1,
    )
    print(result.results)


asyncio.run(main())

Notes

  • ChatLLM and AsyncChatLLM normalize provider responses into OpenAI-style chunks and completion payloads.
  • The default non-streaming response is the new OpenAI-style dictionary, not the legacy dataclass.
  • OpenAIEmbeddings works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
  • Retries are built in for transient network and server-side failures.
  • The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.
