yuullm
Unified streaming LLM interface with provider-agnostic reasoning / tool-call abstraction.
Overview
yuullm provides a standardised streaming abstraction layer over different LLM providers. It has two core responsibilities:
- Stream standardisation — normalises differences in thinking formats (reasoning_content / thinking / …) and tool-call protocols across providers, outputting a uniform AsyncIterator[Reasoning | ToolCall | Response] stream.
- Usage + Cost collection — after the stream ends, structured Usage (from the API) and Cost (calculated by yuullm) are available via a store dict.
yuullm is stateless — it has no session concept and does not maintain conversation history.
Installation
pip install yuullm
Quick Start
Basic Chat
import yuullm
client = yuullm.YLLMClient(
provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
default_model="gpt-4o",
)
messages = [
yuullm.SystemMessage(content="You are a helpful assistant."),
yuullm.UserMessage(content="What is 2+2?"),
]
stream, store = await client.stream(messages)
async for item in stream:
match item:
case yuullm.Reasoning(text=t):
print(f"[thinking] {t}", end="")
case yuullm.Response(text=t):
print(t, end="")
# After stream ends
usage = store["usage"]
print(f"\nTokens: {usage.input_tokens} in / {usage.output_tokens} out")
Tool Calling
Tools are defined using JSON Schema (OpenAI format) and passed at client init time:
tools = [
yuullm.ToolSpec(
name="get_weather",
description="Get current weather for a city",
parameters={
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
},
"required": ["city"],
},
),
]
client = yuullm.YLLMClient(
provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
default_model="gpt-4o",
tools=tools,
)
messages = [yuullm.UserMessage(content="What's the weather in Tokyo?")]
stream, store = await client.stream(messages)
async for item in stream:
match item:
case yuullm.ToolCall(id=tid, name=name, arguments=args):
print(f"Tool call: {name}({args})")
# Execute the tool, then continue the conversation:
# messages.append(yuullm.AssistantMessage(tool_calls=[item]))
# messages.append(yuullm.ToolResultMessage(tool_call_id=tid, content='{"temp": 22}'))
case yuullm.Response(text=t):
print(t, end="")
You can also override tools per-request:
stream, store = await client.stream(messages, tools=other_tools)
Multi-turn Conversation
yuullm is stateless — you manage the message list yourself:
messages = [
yuullm.SystemMessage(content="You are a helpful assistant."),
yuullm.UserMessage(content="Hi, my name is Alice."),
]
# First turn
stream, store = await client.stream(messages)
reply = ""
async for item in stream:
if isinstance(item, yuullm.Response):
reply += item.text
# Append assistant reply and next user message
messages.append(yuullm.AssistantMessage(content=reply))
messages.append(yuullm.UserMessage(content="What's my name?"))
# Second turn
stream, store = await client.stream(messages)
async for item in stream:
if isinstance(item, yuullm.Response):
print(item.text, end="")
Tool Call Round-trip
A full tool-use loop: model calls a tool, you execute it, then feed the result back:
import json
messages = [yuullm.UserMessage(content="What's the weather in Paris?")]
stream, store = await client.stream(messages)
tool_calls = []
async for item in stream:
match item:
case yuullm.ToolCall() as tc:
tool_calls.append(tc)
case yuullm.Response(text=t):
print(t, end="")
if tool_calls:
# Append the assistant message with tool calls
messages.append(yuullm.AssistantMessage(tool_calls=tool_calls))
# Execute each tool and append results
for tc in tool_calls:
result = execute_tool(tc.name, json.loads(tc.arguments)) # your function
messages.append(yuullm.ToolResultMessage(
tool_call_id=tc.id,
content=json.dumps(result),
))
# Continue the conversation — model will use the tool results
stream, store = await client.stream(messages)
async for item in stream:
if isinstance(item, yuullm.Response):
print(item.text, end="")
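The example above leaves execute_tool to you. One common shape is a plain dict-based dispatcher; the sketch below is illustrative (get_weather and its stub return value are not part of yuullm):

```python
# A minimal tool dispatcher: maps tool names to plain Python functions.
# get_weather is a stub — a real implementation would call a weather API.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 22}

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool(name: str, arguments: dict) -> dict:
    """Look up the tool by name and call it with the decoded arguments."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        # Feeding an error result back lets the model recover gracefully.
        return {"error": f"unknown tool: {name}"}
    return func(**arguments)
```

Returning a structured error for unknown tools (instead of raising) keeps the round-trip loop alive, since the model can read the error from the ToolResultMessage and retry.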
Cost Tracking
client = yuullm.YLLMClient(
provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
default_model="gpt-4o",
price_calculator=yuullm.PriceCalculator(
yaml_path="./custom_prices.yaml", # optional, for custom pricing
),
)
stream, store = await client.stream(messages)
async for item in stream:
... # consume the stream
usage: yuullm.Usage = store["usage"]
cost: yuullm.Cost | None = store["cost"]
print(f"Tokens: {usage.input_tokens} in / {usage.output_tokens} out")
print(f"Cache: {usage.cache_read_tokens} read / {usage.cache_write_tokens} write")
if cost:
print(f"Cost: ${cost.total_cost:.6f} (source: {cost.source})")
else:
print("Cost: unavailable (model price not found)")
Providers
OpenAI / OpenAI-compatible
provider = yuullm.providers.OpenAIProvider(
api_key="sk-...",
base_url="https://api.openai.com/v1", # or any compatible endpoint
provider_name="openai", # used for price lookup
)
Works with any OpenAI-compatible API (Azure, OpenRouter, vLLM, etc.) by setting base_url and provider_name.
Anthropic
provider = yuullm.providers.AnthropicProvider(
api_key="sk-ant-...",
provider_name="anthropic",
)
Handles Anthropic-specific streaming events including thinking_delta for extended thinking and tool_use content blocks.
Pricing
Cost is calculated using a three-level priority system:
| Priority | Source | Description |
|---|---|---|
| 1 (highest) | Provider-supplied | Aggregators like OpenRouter / LiteLLM return cost in the API response |
| 2 | YAML config | User-supplied price table for custom / negotiated pricing |
| 3 (lowest) | genai-prices | Community-maintained database via pydantic/genai-prices |
If none of the sources can determine the price, store["cost"] is None.
YAML Price File Format
- provider: openai
models:
- id: gpt-4o
prices:
input_mtok: 2.5 # USD per million input tokens
output_mtok: 10 # USD per million output tokens
cache_read_mtok: 1.25 # optional
- provider: anthropic
models:
- id: claude-sonnet-4-20250514
prices:
input_mtok: 3
output_mtok: 15
cache_read_mtok: 0.3
cache_write_mtok: 3.75
Matching is exact on (provider, model_id). No fuzzy matching.
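The mtok fields are USD per million tokens, so the cost arithmetic is straightforward. A minimal sketch, assuming each price applies independently to its token bucket (compute_cost is illustrative, not yuullm's API):

```python
# Per-million-token ("mtok") price arithmetic matching the YAML fields above.
# Cache prices are optional and default to zero when absent from the table.
def compute_cost(prices: dict, input_tokens: int, output_tokens: int,
                 cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    cost = input_tokens / 1_000_000 * prices["input_mtok"]
    cost += output_tokens / 1_000_000 * prices["output_mtok"]
    cost += cache_read_tokens / 1_000_000 * prices.get("cache_read_mtok", 0.0)
    cost += cache_write_tokens / 1_000_000 * prices.get("cache_write_mtok", 0.0)
    return cost

gpt4o = {"input_mtok": 2.5, "output_mtok": 10, "cache_read_mtok": 1.25}
compute_cost(gpt4o, 1000, 500)  # 0.0025 + 0.005 = 0.0075 USD
```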
API Reference
YLLMClient
YLLMClient(
provider: Provider, # LLM provider instance
default_model: str, # default model name
tools: list[ToolSpec] | None = None, # tool definitions (JSON Schema)
price_calculator: PriceCalculator | None = None,
)
client.stream(messages, *, model=None, tools=None, **kwargs)
Returns (AsyncIterator[StreamItem], store). The model and tools params override the defaults set at init.
Stream Items
| Type | Fields | Description |
|---|---|---|
| Reasoning | text: str | Chain-of-thought / extended thinking fragment |
| ToolCall | id: str, name: str, arguments: str | Tool invocation request (arguments is raw JSON) |
| Response | text: str | Final text reply fragment |
Messages
| Type | Fields |
|---|---|
| SystemMessage | content: str |
| UserMessage | content: str |
| AssistantMessage | content: str \| None, tool_calls: list[ToolCall] \| None |
| ToolResultMessage | tool_call_id: str, content: str |
Usage
Usage(
provider: str,
model: str,
request_id: str | None = None,
input_tokens: int = 0,
output_tokens: int = 0,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
total_tokens: int | None = None,
)
Cost
Cost(
input_cost: float,
output_cost: float,
total_cost: float,
cache_read_cost: float = 0.0,
cache_write_cost: float = 0.0,
source: str = "", # "provider" | "yaml" | "genai-prices"
)