# yuullm

Unified streaming LLM interface with provider-agnostic reasoning / tool-call abstraction.
## Overview
yuullm provides a standardised streaming abstraction layer over different LLM providers. It has two core responsibilities:
- **Stream standardisation**: normalises differences in thinking formats (`reasoning_content`, `thinking`, …) and tool-call protocols across providers, outputting a uniform `AsyncIterator[Reasoning | ToolCall | Response]` stream.
- **Usage + Cost collection**: after the stream ends, a structured `Usage` (from the API) and `Cost` (calculated by yuullm) are available via a store dict.
yuullm is stateless — it has no session concept and does not maintain conversation history.
## Design Philosophy
yuullm intentionally avoids heavy abstractions:
- **Messages are tuples, not classes.** `("role", [items])`; no `SystemMessage` or `UserMessage` imports needed (see the sketch below).
- **Tools are dicts, not a custom `ToolSpec`.** Pass `list[dict]` directly; works seamlessly with `yuutools.ToolManager.specs()`, but with zero dependency.
- **Helper functions** `system()`, `user()`, `assistant()`, `tool()` for ergonomic one-liner message construction.
- **Multimodal native.** `Item = str | dict`, so images, audio, and structured content are first-class.
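Concretely, the helpers are thin constructors over the tuple form. A minimal sketch, assuming the helpers return plain `(role, items)` tuples as the points above imply:

```python
import yuullm

# Helper form and raw tuple form should be interchangeable
assert yuullm.user("Hello!") == ("user", ["Hello!"])
assert yuullm.system("You are helpful.") == ("system", ["You are helpful."])

# Multimodal: string and dict items mix freely in one message
image = {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}}
assert yuullm.user("Look:", image) == ("user", ["Look:", image])
```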
## Installation
```bash
pip install yuullm
```
## Quick Start
### Basic Chat (with helpers)
```python
import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    yuullm.system("You are a helpful assistant."),
    yuullm.user("What is 2+2?"),
]

stream, store = await client.stream(messages)
async for item in stream:
    match item:
        case yuullm.Reasoning(item=str() as t):
            print(f"[thinking] {t}", end="")
        case yuullm.Response(item=str() as t):
            print(t, end="")

# After the stream ends
usage = store["usage"]
print(f"\nTokens: {usage.input_tokens} in / {usage.output_tokens} out")
```
### Basic Chat (raw tuples)
Messages are just `(role, items)` tuples; no imports needed beyond `yuullm`:
```python
import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    ("system", ["You are a helpful assistant."]),
    ("user", ["What is 2+2?"]),
]

stream, store = await client.stream(messages)
async for stream_item in stream:
    match stream_item:
        case yuullm.Reasoning(item=i):
            if isinstance(i, str):
                print(f"[thinking] {i}", end="")
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")
```
### Multimodal (with helpers)
```python
messages = [
    yuullm.system("You are a vision assistant."),
    yuullm.user("What is in this image?", {
        "type": "image_url",
        "image_url": {"url": "https://example.com/photo.png"},
    }),
]
```
### Multimodal (raw tuples)
```python
messages = [
    ("system", ["You are a vision assistant."]),
    ("user", [
        "What is in this image?",
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ]),
]
```
### Tool Calling (with helpers)
Tools are plain `list[dict]`; pass JSON-Schema dicts directly:
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
    tools=tools,
)

messages = [yuullm.user("What's the weather in Tokyo?")]

stream, store = await client.stream(messages)
async for stream_item in stream:
    match stream_item:
        case yuullm.ToolCall() as tc:
            print(f"[tool call] {tc.name}({tc.arguments})")
        case yuullm.Reasoning(item=i):
            if isinstance(i, str):
                print(f"[thinking] {i}", end="")
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")
```
Or override tools per-request:
```python
stream, store = await client.stream(messages, tools=other_tools)
```
### Integration with yuutools
```python
import yuutools as yt
import yuullm

manager = yt.ToolManager([search_tool, calculator_tool])

# manager.specs() returns list[dict] in OpenAI function-calling format;
# pass it directly to yuullm, no conversion needed.
stream, store = await client.stream(messages, tools=manager.specs())
```
### Multi-turn Conversation (with helpers)
yuullm is stateless — you manage the message list yourself:
```python
messages = [
    yuullm.system("You are a helpful assistant."),
    yuullm.user("Hi, my name is Alice."),
]

# First turn
stream, store = await client.stream(messages)
reply = ""
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            reply += stream_item.item

# Append the assistant reply and the next user message
messages.append(yuullm.assistant(reply))
messages.append(yuullm.user("What's my name?"))

# Second turn
stream, store = await client.stream(messages)
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            print(stream_item.item, end="")
```
### Multi-turn Conversation (raw tuples)
```python
messages = [
    ("system", ["You are a helpful assistant."]),
    ("user", ["Hi, my name is Alice."]),
]

# First turn
stream, store = await client.stream(messages)
reply = ""
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            reply += stream_item.item

# Append the assistant reply and the next user message
messages.append(("assistant", [reply]))
messages.append(("user", ["What's my name?"]))

# Second turn
stream, store = await client.stream(messages)
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            print(stream_item.item, end="")
```
### Tool Call Round-trip (with helpers)
A full tool-use loop: the model calls a tool, you execute it, then you feed the result back:
```python
import json

messages = [yuullm.user("What's the weather in Paris?")]

stream, store = await client.stream(messages)
tool_calls = []
async for stream_item in stream:
    match stream_item:
        case yuullm.ToolCall() as tc:
            tool_calls.append(tc)
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")

if tool_calls:
    # Append assistant message with tool calls as dicts
    messages.append(yuullm.assistant(
        *[{"type": "tool_call", "id": tc.id, "name": tc.name, "arguments": tc.arguments}
          for tc in tool_calls]
    ))
    # Execute each tool and append results
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))  # your function
        messages.append(yuullm.tool(tc.id, json.dumps(result)))

    # Continue the conversation
    stream, store = await client.stream(messages)
    async for stream_item in stream:
        if isinstance(stream_item, yuullm.Response):
            if isinstance(stream_item.item, str):
                print(stream_item.item, end="")
```
### Tool Call Round-trip (raw tuples)
```python
import json

messages = [("user", ["What's the weather in Paris?"])]

stream, store = await client.stream(messages)
tool_calls = []
async for stream_item in stream:
    match stream_item:
        case yuullm.ToolCall() as tc:
            tool_calls.append(tc)
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")

if tool_calls:
    # Append assistant message with tool call dicts
    messages.append(("assistant", [
        {"type": "tool_call", "id": tc.id, "name": tc.name, "arguments": tc.arguments}
        for tc in tool_calls
    ]))
    # Execute each tool and append results
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))
        messages.append(("tool", [
            {"type": "tool_result", "tool_call_id": tc.id, "content": json.dumps(result)}
        ]))

    # Continue the conversation
    stream, store = await client.stream(messages)
    async for stream_item in stream:
        if isinstance(stream_item, yuullm.Response):
            if isinstance(stream_item.item, str):
                print(stream_item.item, end="")
```
## Cost Tracking
```python
client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
    price_calculator=yuullm.PriceCalculator(
        yaml_path="./custom_prices.yaml",  # optional, for custom pricing
    ),
)

stream, store = await client.stream(messages)
async for item in stream:
    ...  # consume the stream

usage: yuullm.Usage = store["usage"]
cost: yuullm.Cost | None = store["cost"]

print(f"Tokens: {usage.input_tokens} in / {usage.output_tokens} out")
print(f"Cache: {usage.cache_read_tokens} read / {usage.cache_write_tokens} write")
if cost:
    print(f"Cost: ${cost.total_cost:.6f} (source: {cost.source})")
else:
    print("Cost: unavailable (model price not found)")
```
## Providers
### OpenAI / OpenAI-compatible
```python
provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",  # or any compatible endpoint
    provider_name="openai",                # used for price lookup
)
```
Works with any OpenAI-compatible API (Azure, OpenRouter, vLLM, etc.) by setting `base_url` and `provider_name`.
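For example, pointing the client at OpenRouter is just a constructor change. A sketch; the API key format and model id are illustrative, only `base_url` and `provider_name` come from the constructor above:

```python
import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(
        api_key="sk-or-...",                      # OpenRouter key (illustrative)
        base_url="https://openrouter.ai/api/v1",  # OpenAI-compatible endpoint
        provider_name="openrouter",               # used for price lookup
    ),
    default_model="openai/gpt-4o",  # OpenRouter-style model id (illustrative)
)
```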
### Anthropic
```python
provider = yuullm.providers.AnthropicProvider(
    api_key="sk-ant-...",
    provider_name="anthropic",
)
```
Handles Anthropic-specific streaming events, including `thinking_delta` for extended thinking and `tool_use` content blocks.
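A sketch of consuming extended thinking output. The `thinking` kwarg is an assumption: it presumes `client.stream(**kwargs)` forwards extra parameters to the underlying Anthropic API call, which this README does not state explicitly:

```python
import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.AnthropicProvider(api_key="sk-ant-..."),
    default_model="claude-sonnet-4-20250514",
)

messages = [yuullm.user("How many primes are there below 100?")]

# Assumption: extra kwargs reach the Anthropic API, enabling extended thinking.
stream, store = await client.stream(
    messages, thinking={"type": "enabled", "budget_tokens": 2048}
)
async for item in stream:
    match item:
        case yuullm.Reasoning(item=str() as t):  # produced from thinking_delta events
            print(f"[thinking] {t}", end="")
        case yuullm.Response(item=str() as t):
            print(t, end="")
```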
## Development Setup
To set up the development environment and install all project-specific git hooks:
```bash
./scripts/setup-dev.sh
```
This script installs git hooks for code quality and release safety. Currently includes:
- **pre-push**: validates that the git tag version matches the `pyproject.toml` version before pushing tags
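The tag check itself is small. A minimal Python sketch of the kind of comparison the pre-push hook performs; the repository's actual hook script may differ:

```python
# check_tag_version.py -- illustrative, not the repository's actual hook
import subprocess
import sys
import tomllib

with open("pyproject.toml", "rb") as f:
    pkg_version = tomllib.load(f)["project"]["version"]

# Most recent tag reachable from HEAD, e.g. "v0.3.1"
tag = subprocess.run(
    ["git", "describe", "--tags", "--abbrev=0"],
    capture_output=True, text=True, check=True,
).stdout.strip()

if tag.lstrip("v") != pkg_version:
    sys.exit(f"tag {tag} does not match pyproject.toml version {pkg_version}")
```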
Future development tools (linting hooks, commit message validation, etc.) will be added to this centralized setup script.
## Pricing
Cost is calculated using a three-level priority system:
| Priority | Source | Description |
|---|---|---|
| 1 (highest) | Provider-supplied | Aggregators like OpenRouter / LiteLLM return cost in the API response |
| 2 | YAML config | User-supplied price table for custom / negotiated pricing |
| 3 (lowest) | genai-prices | Community-maintained database via pydantic/genai-prices |
If none of the sources can determine the price, `store["cost"]` is `None`.
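Conceptually, the selection works like this (an illustrative sketch of the priority order, not yuullm's internal code):

```python
def resolve_cost(provider_cost, yaml_cost, genai_cost):
    """Pick the highest-priority price source that is available."""
    if provider_cost is not None:  # 1. aggregator returned cost in the response
        return provider_cost, "provider"
    if yaml_cost is not None:      # 2. user-supplied YAML price table
        return yaml_cost, "yaml"
    if genai_cost is not None:     # 3. community genai-prices database
        return genai_cost, "genai-prices"
    return None, None              # -> store["cost"] is None
```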
### YAML Price File Format
```yaml
- provider: openai
  models:
    - id: gpt-4o
      prices:
        input_mtok: 2.5        # USD per million input tokens
        output_mtok: 10        # USD per million output tokens
        cache_read_mtok: 1.25  # optional
- provider: anthropic
  models:
    - id: claude-sonnet-4-20250514
      prices:
        input_mtok: 3
        output_mtok: 15
        cache_read_mtok: 0.3
        cache_write_mtok: 3.75
```
Matching is exact on `(provider, model_id)`; there is no fuzzy matching. For example, a YAML entry for `gpt-4o` does not cover `gpt-4o-2024-08-06`, so list every model id you use.
## API Reference
### YLLMClient
```python
YLLMClient(
    provider: Provider,
    default_model: str,
    tools: list[dict] | None = None,  # JSON-Schema tool dicts
    price_calculator: PriceCalculator | None = None,
)
```
```python
client.stream(messages, *, model=None, tools=None, **kwargs)
```
Returns `(AsyncIterator[StreamItem], store)`. The `model` and `tools` params override the defaults set at init.
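For example, a per-request override (the alternate model id is illustrative):

```python
# Use a different model for this request only; init-time tools still apply.
stream, store = await client.stream(messages, model="gpt-4o-mini")
```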
### Messages
```python
Message = tuple[str, list[Item]]  # (role, items)
Item = str | dict[str, Any]       # text or structured content
History = list[Message]
```
Helper functions:
| Function | Signature | Example |
|---|---|---|
| `system` | `system(content: str)` | `system("You are helpful.")` |
| `user` | `user(*items: Item)` | `user("Hello!")` / `user("Look:", {"type": "image_url", ...})` |
| `assistant` | `assistant(*items: Item)` | `assistant("Sure!", {"type": "tool_call", ...})` |
| `tool` | `tool(tool_call_id: str, content: str)` | `tool("tc_1", '{"result": 42}')` |
Tool call items in assistant messages use this dict shape:

```python
{"type": "tool_call", "id": "...", "name": "...", "arguments": "..."}
```

Tool result items in tool messages use this dict shape:

```python
{"type": "tool_result", "tool_call_id": "...", "content": "..."}
```
### Stream Items
| Type | Fields | Description |
|---|---|---|
| `Reasoning` | `item: Item` | Chain-of-thought / extended thinking fragment (text or multimodal) |
| `ToolCall` | `id: str`, `name: str`, `arguments: str` | Tool invocation request (`arguments` is raw JSON) |
| `Response` | `item: Item` | Final reply fragment (text or multimodal) |
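An exhaustive consumer over all three item types, built only from the fields in the table above (a sketch; the helper name is ours):

```python
import yuullm

async def consume(stream) -> tuple[str, str, list[yuullm.ToolCall]]:
    """Collect reasoning text, response text, and tool calls from one stream."""
    reasoning, response, tool_calls = "", "", []
    async for item in stream:
        match item:
            case yuullm.Reasoning(item=str() as t):
                reasoning += t
            case yuullm.ToolCall() as tc:
                tool_calls.append(tc)
            case yuullm.Response(item=str() as t):
                response += t
            case _:
                pass  # multimodal (dict) fragments, ignored in this sketch
    return reasoning, response, tool_calls
```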
### Usage
```python
Usage(
    provider: str,
    model: str,
    request_id: str | None = None,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    total_tokens: int | None = None,
)
```
### Cost
```python
Cost(
    input_cost: float,
    output_cost: float,
    total_cost: float,
    cache_read_cost: float = 0.0,
    cache_write_cost: float = 0.0,
    source: str = "",  # "provider" | "yaml" | "genai-prices"
)
```
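Because the client is stateless, aggregating spend across requests is up to the caller. A small sketch (`batches` is a hypothetical list of message histories):

```python
total = 0.0
for messages in batches:
    stream, store = await client.stream(messages)
    async for _ in stream:
        pass  # usage/cost are populated only after the stream is consumed
    if (cost := store["cost"]) is not None:
        total += cost.total_cost
print(f"Session total: ${total:.6f}")
```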