# llmskit

Client and Tools for LLMs
llmskit provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.

The current codebase exposes:

- Unified sync and async chat wrappers
- OpenAI-style streaming and completion responses
- Provider adapters for `openai`, `gemini`, and `claude`
- Canonical multimodal message parts and tool definitions
- OpenAI-compatible embeddings helpers
- Generic reranker clients
## Installation

```bash
pip install llmskit
```
## Public API

```python
from llmskit import (
    AsyncChatLLM,
    AsyncOpenAIEmbeddings,
    AsyncReranker,
    ChatLLM,
    OpenAIEmbeddings,
    Reranker,
)
```
## Chat Quick Start

### Synchronous chat

```python
from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",  # replace with your OpenAI-compatible endpoint
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

response = chat.complete(messages=messages)
message = response["choices"][0]["message"]
print(message["content"])
print(message["reasoning_content"])
print(response["usage"])
```

`ChatLLM` is intended for purely synchronous scripts. In Jupyter notebooks, async web frameworks, or inside `async def` code, use `AsyncChatLLM` instead to avoid conflicts with an already running event loop.
### Asynchronous chat

```python
import asyncio

from llmskit import AsyncChatLLM


async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )
    response = await chat.complete(
        messages=[
            {"role": "system", "content": "Answer briefly."},
            {"role": "user", "content": "What is llmskit?"},
        ]
    )
    print(response["choices"][0]["message"]["content"])


asyncio.run(main())
```

If your runtime already has an event loop, such as a Jupyter notebook or FastAPI / Starlette request handlers, prefer `AsyncChatLLM`.
## Provider Factories

Use explicit factory methods when you already know the backend:

```python
from llmskit import ChatLLM

openai_chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

gemini_chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

claude_chat = ChatLLM.from_claude(
    model="claude-sonnet-4-20250514",
    api_key="YOUR_API_KEY",
)
```

Or choose the provider dynamically:

```python
from llmskit import ChatLLM

chat = ChatLLM.create(
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)
```
Supported provider names for `create(...)`:

- `openai`
- `gemini`
- `claude`

Deprecated aliases still exist in code, but new code should prefer:

- `from_openai(...)` instead of `from_gpt(...)` or `from_local(...)`
- `from_claude(...)` instead of `from_anthropic(...)`

Factory methods are for construction-time options such as `base_url`, `client_logger`, and `retry_config`. Request options such as `temperature`, `max_tokens`, or `response_format` belong on `complete(...)` / `stream(...)`.
## Response Formats

`ChatLLM.complete(...)` and `AsyncChatLLM.complete(...)` return an OpenAI-style response by default.

```python
response = chat.complete(messages=messages)

print(response["object"])  # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])
```

If you still need the old compatibility object, request `response_format="legacy"`:

```python
legacy_response = chat.complete(
    messages=messages,
    response_format="legacy",
)

print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)
```
## Streaming

`stream(...)` yields OpenAI-style chat completion chunks.

```python
from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    choice = chunk["choices"][0]
    delta = choice["delta"]
    if delta.get("role"):
        print("role:", delta["role"])
    if delta.get("content"):
        print(delta["content"], end="")
    if delta.get("reasoning_content"):
        print("\nreasoning:", delta["reasoning_content"])
    if delta.get("tool_calls"):
        print("\ntool_calls:", delta["tool_calls"])
    if choice.get("finish_reason"):
        print("\nfinish_reason:", choice["finish_reason"])
```
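When you need the full message rather than incremental prints, the streamed deltas can be folded back into a single assistant message. The following is a minimal sketch over the OpenAI-style chunk dictionaries shown above; `accumulate_stream` is a hypothetical helper, not part of llmskit:

```python
def accumulate_stream(chunks):
    """Fold OpenAI-style streaming chunks into one assistant message."""
    message = {"role": "assistant", "content": "", "reasoning_content": ""}
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice["delta"]
        if delta.get("role"):
            message["role"] = delta["role"]
        if delta.get("content"):
            message["content"] += delta["content"]
        if delta.get("reasoning_content"):
            message["reasoning_content"] += delta["reasoning_content"]
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return message, finish_reason


# Simulated chunks in the shape yielded by stream(...):
chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "1, 2, "}}]},
    {"choices": [{"delta": {"content": "3"}, "finish_reason": "stop"}]},
]
message, reason = accumulate_stream(chunks)
print(message["content"], reason)  # 1, 2, 3 stop
```

The same loop works unchanged on a live `chat.stream(...)` iterator, since it only touches the `choices[0].delta` and `finish_reason` fields documented above.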
## Tool Calling

Tool definitions use one canonical schema across providers:

```python
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    }
]
```

Pass them to `complete(...)` or `stream(...)`:

```python
response = chat.complete(
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
)

tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)
```

Returned tool calls are normalized to an OpenAI-style structure:

```python
[
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]
```
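Because `arguments` arrives as a JSON string, executing a tool call amounts to decoding it and dispatching by name. A minimal sketch over the normalized structure above; the `get_weather` implementation and the dispatch table are illustrative, not part of llmskit:

```python
import json


def get_weather(city: str) -> str:
    # Stand-in implementation; a real tool would call a weather API.
    return f"Sunny in {city}"


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Decode and execute normalized OpenAI-style tool calls."""
    outputs = []
    for call in tool_calls:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        outputs.append({"tool_call_id": call["id"], "result": fn(**args)})
    return outputs


tool_calls = [
    {
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\":\"Beijing\"}"},
    }
]
print(run_tool_calls(tool_calls))
```

How the tool results are fed back into the conversation depends on the provider's tool-result message shape; check your provider's documentation for the follow-up message format.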
## Multimodal Messages

Message content can be either a plain string or a list of structured content parts.

Supported canonical content part types:

- `text`
- `image_url`
- `input_audio`
- `file`
- `video_url`

### Vision example

```python
from llmskit import ChatLLM

chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",
                    "format": "image/png",
                },
            },
        ],
    }
]

response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])
```
### Other content part shapes

```python
audio_part = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-audio-data>",
        "format": "wav",
    },
}

file_part = {
    "type": "file",
    "file": {
        "file_id": "gs://bucket/report.pdf",
        "format": "application/pdf",
    },
}

video_part = {
    "type": "video_url",
    "video_url": {
        "url": "gs://bucket/demo.mp4",
        "format": "video/mp4",
    },
}
```
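Since content parts are plain dictionaries, assembling a multimodal message is simple list construction. A small sketch using the canonical part shapes above; `user_message` is a hypothetical convenience helper, not part of llmskit:

```python
def user_message(text, *parts):
    """Build a multimodal user message: a text part followed by any
    additional canonical content parts (image_url, input_audio, ...)."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *parts]}


image_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/cat.png", "format": "image/png"},
}

msg = user_message("Describe this image.", image_part)
print([part["type"] for part in msg["content"]])  # ['text', 'image_url']
```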
The wrapper validates unsupported modalities early, so provider/model mismatches fail fast instead of being silently forwarded.

You can inspect capabilities at runtime:

```python
print(chat.capabilities)
print(chat.supports_vision())
print(chat.supports_audio_input())
print(chat.supports_audio_output())
print(chat.supports_document_input())
print(chat.supports_video_input())
```
## Provider Capability Overview

The current provider adapters expose the following capability flags in code:
| Provider | Tool calling | Reasoning | Vision | Audio input | Audio output | Document input | Video input |
|---|---|---|---|---|---|---|---|
| OpenAI-compatible | Yes | Yes | Yes | Yes | Yes | No | No |
| Claude | Yes | Yes | Yes | No | No | Yes | No |
| Gemini | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
## Embeddings

`OpenAIEmbeddings` and `AsyncOpenAIEmbeddings` target OpenAI-compatible embedding endpoints.

### Synchronous embeddings

```python
from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model_name="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
    [
        "llmskit wraps multiple chat providers.",
        "It also includes embeddings and reranking helpers.",
    ]
)

print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())
```
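Embedding vectors come back as plain lists of floats, so ranking documents against a query is a short cosine-similarity loop. A minimal sketch with toy vectors standing in for the `embed_query` / `embed_documents` output; `cosine_similarity` is a hypothetical helper, not part of llmskit:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy vectors standing in for real embedding output:
query_vector = [1.0, 0.0, 1.0]
document_vectors = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

ranked = sorted(
    range(len(document_vectors)),
    key=lambda i: cosine_similarity(query_vector, document_vectors[i]),
    reverse=True,
)
print(ranked)  # [0, 1]
```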
### Asynchronous embeddings

```python
import asyncio

from llmskit import AsyncOpenAIEmbeddings


async def main() -> None:
    embeddings = AsyncOpenAIEmbeddings(
        base_url="https://api.openai.com/v1",
        model_name="text-embedding-3-small",
        api_key="YOUR_API_KEY",
    )
    vector = await embeddings.embed_query("hello")
    print(len(vector))


asyncio.run(main())
```

Embedding helpers include:

- batching
- retry with exponential backoff
- max input length truncation
- cached dimension detection
## Reranker

`Reranker` and `AsyncReranker` call a rerank service that exposes a `/rerank` endpoint.

### Synchronous reranking

```python
from llmskit import Reranker

reranker = Reranker(
    base_url="https://your-reranker-service",
    model_name="bge-reranker-v2-m3",
    api_key="YOUR_API_KEY",
)

result = reranker.rerank(
    query="python async http client",
    documents=[
        "httpx supports both sync and async clients",
        "Redis is an in-memory database",
        "Python generators can yield values lazily",
    ],
    top_n=2,
    threshold=0.0,
)

print(result.results)
print(result.usage)
```
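If you post-process the scores yourself, filtering and re-sorting is straightforward. This sketch assumes each entry in `result.results` carries an `index` and a `relevance_score` field, which is a common rerank-API response shape but should be verified against your service:

```python
def top_documents(results, documents, min_score=0.5):
    """Keep results at or above min_score, highest score first
    (assumes each result dict has "index" and "relevance_score" keys)."""
    kept = [r for r in results if r["relevance_score"] >= min_score]
    kept.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in kept]


documents = [
    "httpx supports both sync and async clients",
    "Redis is an in-memory database",
]
# Simulated rerank output in the assumed shape:
results = [
    {"index": 0, "relevance_score": 0.92},
    {"index": 1, "relevance_score": 0.08},
]
print(top_documents(results, documents))
```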
### Asynchronous reranking

```python
import asyncio

from llmskit import AsyncReranker


async def main() -> None:
    reranker = AsyncReranker(
        base_url="https://your-reranker-service",
        model_name="bge-reranker-v2-m3",
        api_key="YOUR_API_KEY",
    )
    result = await reranker.rerank(
        query="python async http client",
        documents=[
            "httpx supports both sync and async clients",
            "Redis is an in-memory database",
        ],
        top_n=1,
    )
    print(result.results)


asyncio.run(main())
```
## Notes

- `ChatLLM` and `AsyncChatLLM` normalize provider responses into OpenAI-style chunks and completion payloads.
- The default non-streaming response is the new OpenAI-style dictionary, not the legacy dataclass.
- `OpenAIEmbeddings` works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
- Retries are built in for transient network and server-side failures.
- The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.