llmskit
Client and Tools for LLMs
llmskit provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.
The current codebase exposes:
- Unified sync and async chat wrappers
- OpenAI-style streaming and completion responses
- Provider adapters for openai, gemini, and claude
- Canonical multimodal message parts and tool definitions
- OpenAI-compatible embeddings helpers
- Generic reranker clients
Installation
pip install llmskit
Public API
from llmskit import (
AsyncChatLLM,
AsyncOpenAIEmbeddings,
AsyncReranker,
ChatLLM,
OpenAIEmbeddings,
Reranker,
)
Chat Quick Start
Synchronous chat
from llmskit import ChatLLM
chat = ChatLLM.from_openai(
model="gpt-4o-mini",
api_key="YOUR_API_KEY",
base_url="https://api.openai.com/v1", # replace with your OpenAI-compatible endpoint
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Introduce yourself in one sentence."},
]
response = chat.complete(messages=messages)
message = response["choices"][0]["message"]
print(message["content"])
print(message["reasoning_content"])
print(response["usage"])
ChatLLM is intended for blocking synchronous code paths and now uses native sync provider clients. In Jupyter notebooks, async web frameworks, or inside async def code, prefer AsyncChatLLM so you do not block the active event loop.
Asynchronous chat
import asyncio
from llmskit import AsyncChatLLM
async def main() -> None:
chat = AsyncChatLLM.from_gemini(
model="gemini-2.5-flash",
api_key="YOUR_API_KEY",
)
response = await chat.complete(
messages=[
{"role": "system", "content": "Answer briefly."},
{"role": "user", "content": "What is llmskit?"},
]
)
print(response["choices"][0]["message"]["content"])
asyncio.run(main())
If your runtime already has an event loop, such as Jupyter Notebook or FastAPI / Starlette request handlers, prefer AsyncChatLLM to keep that loop non-blocking.
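A practical payoff of the async client is request fan-out. A minimal sketch, assuming complete(...) can be awaited concurrently on a single AsyncChatLLM instance:

```python
import asyncio

from llmskit import AsyncChatLLM

async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )
    questions = ["What is llmskit?", "What is reranking?"]
    # Fan the requests out on the event loop instead of awaiting them one by one.
    responses = await asyncio.gather(
        *(chat.complete(messages=[{"role": "user", "content": q}]) for q in questions)
    )
    for response in responses:
        print(response["choices"][0]["message"]["content"])

asyncio.run(main())
```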
Provider Factories
Use explicit factory methods when you already know the backend:
from llmskit import ChatLLM
openai_chat = ChatLLM.from_openai(
model="gpt-4o-mini",
api_key="YOUR_API_KEY",
base_url="https://api.openai.com/v1",
)
gemini_chat = ChatLLM.from_gemini(
model="gemini-2.5-flash",
api_key="YOUR_API_KEY",
)
claude_chat = ChatLLM.from_claude(
model="claude-sonnet-4-20250514",
api_key="YOUR_API_KEY",
)
Or choose the provider dynamically:
from llmskit import ChatLLM
chat = ChatLLM.create(
provider="openai",
model="gpt-4o-mini",
api_key="YOUR_API_KEY",
base_url="https://api.openai.com/v1",
)
Supported provider names for create(...):
openai, gemini, and claude
Deprecated aliases still exist in code, but new code should prefer:
- from_openai(...) instead of from_gpt(...) or from_local(...)
- from_claude(...) instead of from_anthropic(...)
Factory methods are for construction-time options such as base_url,
client_logger, and retry_config. Request options such as temperature,
max_tokens, or provider response_format belong on complete(...) / stream(...).
Use result_format="legacy" when you want llmskit itself to return the legacy compatibility object.
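To make the split concrete, the sketch below keeps endpoint configuration on the factory and generation options on the request; the temperature and max_tokens values are illustrative:

```python
from llmskit import ChatLLM

# Construction-time: where the client points and how it connects.
chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

# Request-time: how this particular completion is generated.
response = chat.complete(
    messages=[{"role": "user", "content": "Summarize llmskit in one line."}],
    temperature=0.2,
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```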
You can also register custom chat providers without editing llmskit.chat:
from typing import Any, AsyncIterator
from llmskit import AsyncChatLLM
from llmskit.clients import AsyncLLMClient
from llmskit.core import register_chat_provider
from llmskit.types import Message, ProviderEvent, ToolDefinition
class MyChatClient(AsyncLLMClient):
provider = "my-provider"
model = "demo-model"
capabilities = {
"tool_calling": False,
"reasoning": False,
"streaming": True,
"vision": False,
"audio_input": False,
"audio_output": False,
"document_input": False,
"video_input": False,
"native_multimodal_output": False,
}
async def events(
self,
messages: list[Message],
*,
tools: list[ToolDefinition] | None = None,
**kwargs: Any,
) -> AsyncIterator[ProviderEvent]:
del messages, tools, kwargs
if False: # pragma: no cover
yield ProviderEvent()
register_chat_provider(name="my-provider", async_client_factory=MyChatClient, replace=True)
chat = AsyncChatLLM.create("my-provider", model="demo-model")
If you also want ChatLLM.create("my-provider", ...) support, register a
native sync client with sync_client_factory=... as well.
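For example, reusing MyChatClient from above; MySyncChatClient is a hypothetical sync counterpart, since the sync base class is not shown here:

```python
from llmskit import ChatLLM
from llmskit.core import register_chat_provider

# MySyncChatClient is hypothetical: a sync client implementing the same
# provider, built on whatever sync base class your llmskit version exposes.
register_chat_provider(
    name="my-provider",
    async_client_factory=MyChatClient,
    sync_client_factory=MySyncChatClient,
    replace=True,
)

chat = ChatLLM.create("my-provider", model="demo-model")
```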
Response Formats
ChatLLM.complete(...) and AsyncChatLLM.complete(...) return an OpenAI-style response by default.
response = chat.complete(messages=messages)
print(response["object"]) # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])
If you still need the old compatibility object, request result_format="legacy":
legacy_response = chat.complete(
messages=messages,
result_format="legacy",
)
print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)
Provider request formatting still uses response_format, for example:
response = chat.complete(
messages=messages,
response_format={"type": "json_object"},
)
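JSON mode constrains the provider's output, but the message content still arrives as a string, so parse it yourself. A minimal sketch, assuming the provider honors the format request:

```python
import json

response = chat.complete(
    messages=[{"role": "user", "content": "Return a JSON object describing llmskit."}],
    response_format={"type": "json_object"},
)
data = json.loads(response["choices"][0]["message"]["content"])
print(data)
```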
Provider-native request options should go inside provider_options, for example:
chat.complete(
messages=messages,
provider_options={"reasoning_effort": "high"}, # OpenAI native
)
chat.complete(
messages=messages,
provider_options={"thinking": {"type": "enabled", "budget_tokens": 1024}}, # Claude native
)
chat.complete(
messages=messages,
provider_options={"candidate_count": 2}, # Gemini native
)
Keep shared llmskit options such as temperature, max_tokens, modalities,
audio, and response_format at the top level. Unknown top-level provider kwargs
now raise a validation error instead of being silently ignored, and
provider_options cannot override llmskit-managed keys such as model,
messages, or stream.
response_format="legacy" still works as a deprecated compatibility alias for
older code, but new code should prefer result_format="legacy".
Streaming
stream(...) yields OpenAI-style chat completion chunks.
from llmskit import ChatLLM
chat = ChatLLM.from_openai(
model="gpt-4o-mini",
api_key="YOUR_API_KEY",
base_url="https://api.openai.com/v1",
)
for chunk in chat.stream(
messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
choice = chunk["choices"][0]
delta = choice["delta"]
if delta.get("role"):
print("role:", delta["role"])
if delta.get("content"):
print(delta["content"], end="")
if delta.get("reasoning_content"):
print("\nreasoning:", delta["reasoning_content"])
if delta.get("tool_calls"):
print("\ntool_calls:", delta["tool_calls"])
if choice.get("finish_reason"):
print("\nfinish_reason:", choice["finish_reason"])
Tool Calling
Tool definitions use one canonical schema across providers:
tools = [
{
"name": "get_weather",
"description": "Get the weather for a city",
"input_schema": {
"type": "object",
"properties": {
"city": {"type": "string"},
},
"required": ["city"],
},
}
]
Pass them to complete(...) or stream(...):
response = chat.complete(
messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
tools=tools,
)
tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)
Returned tool calls are normalized to an OpenAI-style structure:
[
{
"id": "call_123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Beijing\"}",
},
}
]
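Because the structure is OpenAI-style, the usual round trip applies: run the tool, then send its output back as a tool message. A sketch, assuming llmskit accepts the OpenAI-style "role": "tool" message shape:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup.
    return json.dumps({"city": city, "forecast": "sunny"})

tool_call = tool_calls[0]
arguments = json.loads(tool_call["function"]["arguments"])

followup = chat.complete(
    messages=[
        {"role": "user", "content": "What is the weather in Beijing?"},
        response["choices"][0]["message"],  # assistant turn carrying tool_calls
        {
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": get_weather(**arguments),
        },
    ],
    tools=tools,
)
print(followup["choices"][0]["message"]["content"])
```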
Multimodal Messages
Message content can be either a plain string or a list of structured content parts.
Supported canonical content part types:
text, image_url, input_audio, file, and video_url
Vision example
from llmskit import ChatLLM
chat = ChatLLM.from_gemini(
model="gemini-2.5-flash",
api_key="YOUR_API_KEY",
)
messages = [
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/cat.png",
"format": "image/png",
},
},
],
}
]
response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])
Other content part shapes
audio_part = {
"type": "input_audio",
"input_audio": {
"data": "<base64-audio-data>",
"format": "wav",
},
}
file_part = {
"type": "file",
"file": {
"file_id": "gs://bucket/report.pdf",
"format": "application/pdf",
},
}
video_part = {
"type": "video_url",
"video_url": {
"url": "gs://bucket/demo.mp4",
"format": "video/mp4",
},
}
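These parts drop into content lists exactly like the vision example. A sketch sending the audio part above to the Gemini chat client, which supports audio input per the capability table below:

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this recording."},
            audio_part,
        ],
    }
]
response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])
```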
The wrapper validates unsupported modalities early, so provider/model mismatches fail fast instead of being silently forwarded.
You can inspect capabilities at runtime:
print(chat.capabilities)
print(chat.supports_vision())
print(chat.supports_audio_input())
print(chat.supports_audio_output())
print(chat.supports_document_input())
print(chat.supports_video_input())
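A typical use is gating a request on the active model's capabilities before building the message. A minimal sketch:

```python
# Send a vision request only when the active model accepts images.
if chat.supports_vision():
    content = [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ]
else:
    content = "Describe a typical cat photo in one sentence."

response = chat.complete(messages=[{"role": "user", "content": content}])
```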
Provider Capability Overview
The current provider adapters expose the following capability flags in code:
| Provider | Tool calling | Reasoning | Vision | Audio input | Audio output | Document input | Video input |
|---|---|---|---|---|---|---|---|
| OpenAI-compatible | Yes | Yes | Yes | Yes | Yes | No | No |
| Claude | Yes | Yes | Yes | No | No | Yes | No |
| Gemini | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Embeddings
OpenAIEmbeddings and AsyncOpenAIEmbeddings target OpenAI-compatible embedding endpoints.
Synchronous embeddings
from llmskit import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
base_url="https://api.openai.com/v1",
model="text-embedding-3-small",
api_key="YOUR_API_KEY",
batch_size=16,
)
query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
[
"llmskit wraps multiple chat providers.",
"It also includes embeddings and reranking helpers.",
]
)
print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())
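The returned vectors are plain lists of floats, so a quick similarity check needs no extra dependencies. A minimal sketch ranking the documents above against the query:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

scores = [cosine_similarity(query_vector, v) for v in document_vectors]
best = max(range(len(scores)), key=scores.__getitem__)
print("best match:", best, scores[best])
```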
Asynchronous embeddings
import asyncio
from llmskit import AsyncOpenAIEmbeddings
async def main() -> None:
embeddings = AsyncOpenAIEmbeddings(
base_url="https://api.openai.com/v1",
model="text-embedding-3-small",
api_key="YOUR_API_KEY",
)
vector = await embeddings.embed_query("hello")
print(len(vector))
asyncio.run(main())
Embedding helpers include:
- batching
- retry with exponential backoff
- max input length truncation
- cached dimension detection
Reranker
Reranker and AsyncReranker call a reranking service that exposes a /rerank endpoint.
Synchronous reranking
from llmskit import Reranker
reranker = Reranker(
base_url="https://your-reranker-service",
model="bge-reranker-v2-m3",
api_key="YOUR_API_KEY",
)
result = reranker.rerank(
query="python async http client",
documents=[
"httpx supports both sync and async clients",
"Redis is an in-memory database",
"Python generators can yield values lazily",
],
top_n=2,
threshold=0.0,
)
print(result.results)
print(result.usage)
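The entry schema depends on your rerank service; the sketch below assumes each result is a mapping with index and relevance_score fields, as bge-style rerankers commonly return, so adjust to your service's actual shape:

```python
documents = [
    "httpx supports both sync and async clients",
    "Redis is an in-memory database",
    "Python generators can yield values lazily",
]
# Map each reranked entry back to its source document.
for entry in result.results:
    print(entry["relevance_score"], documents[entry["index"]])
```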
Asynchronous reranking
import asyncio
from llmskit import AsyncReranker
async def main() -> None:
reranker = AsyncReranker(
base_url="https://your-reranker-service",
model="bge-reranker-v2-m3",
api_key="YOUR_API_KEY",
)
result = await reranker.rerank(
query="python async http client",
documents=[
"httpx supports both sync and async clients",
"Redis is an in-memory database",
],
top_n=1,
)
print(result.results)
asyncio.run(main())
Notes
- ChatLLM and AsyncChatLLM normalize provider responses into OpenAI-style chunks and completion payloads.
- The default non-streaming response is the new OpenAI-style dictionary, not the legacy dataclass.
- OpenAIEmbeddings works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
- Retries are built in for transient network and server-side failures.
- For local development and CI, run python -m pytest -q from the repository root.
- The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.