# yuullm

Unified streaming LLM interface with provider-agnostic reasoning / tool-call abstraction.
## What It Does

yuullm normalises the streaming differences across LLM providers (OpenAI, Anthropic, and any OpenAI-compatible API) into a uniform `AsyncIterator[Reasoning | ToolCall | Response]`. It also collects `Usage` and `Cost` after the stream ends.

yuullm is stateless — no sessions, no history management. You own the message list.
## Design in One Sentence

Messages are tuples, tools are dicts, output items are typed structs — minimal abstraction, maximum interop.
## Installation

```bash
pip install yuullm
```
## Quick Start

```python
import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    yuullm.system("You are a helpful assistant."),
    yuullm.user("What is 2+2?"),
]

stream, store = await client.stream(messages)
async for item in stream:
    match item:
        case yuullm.Reasoning(item=t):
            print(f"[thinking] {t}", end="")
        case yuullm.Response(item=t):
            print(t, end="")

# After the stream ends
usage = store["usage"]
cost = store["cost"]  # Cost | None
```
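Because yuullm is stateless, continuing a conversation means appending the assistant's reply to your own message list. A minimal sketch of a second turn: collecting `Response` fragments into one string is an illustrative pattern, not a library feature.

```python
# Collect the assistant's reply while streaming, then append it yourself.
parts: list[str] = []
stream, store = await client.stream(messages)
async for item in stream:
    match item:
        case yuullm.Response(item=t) if isinstance(t, str):
            parts.append(t)

messages.append(yuullm.assistant("".join(parts)))  # you own the history
messages.append(yuullm.user("And what is 3+3?"))   # next turn

# Usage is always available after the stream ends
usage = store["usage"]
print(usage.input_tokens, usage.output_tokens, usage.total_tokens)
```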
## Best Practice: Tool-Call Round-Trip

When you give tools to an LLM, the model may respond with `ToolCall` items instead of (or alongside) text. You need to execute those calls and feed the results back. Here's the idiomatic pattern:
```python
import json

import yuullm

messages = [yuullm.user("What's the weather in Paris?")]

while True:
    stream, store = await client.stream(messages, tools=tools)
    tool_calls: list[yuullm.ToolCall] = []

    async for item in stream:
        match item:
            case yuullm.Reasoning(item=text) if isinstance(text, str):
                print(f"[thinking] {text}", end="")
            case yuullm.ToolCall() as tc:
                tool_calls.append(tc)
            case yuullm.Response(item=text) if isinstance(text, str):
                print(text, end="")
            case yuullm.Tick():
                pass  # heartbeat during tool-call streaming, safe to ignore

    if not tool_calls:
        break  # model replied with text, done

    # Append an assistant message containing the tool calls
    messages.append(yuullm.assistant(
        *[{"type": "tool_call", "id": tc.id,
           "name": tc.name, "arguments": tc.arguments}
          for tc in tool_calls]
    ))

    # Execute each tool and append its result
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))
        messages.append(yuullm.tool(tc.id, json.dumps(result)))
```
Key points:

- Use `match`/`case` to dispatch all four stream item types. `Tick` carries no payload — ignore it unless you have a reason not to.
- The `while True` loop handles multi-round tool use (the model may chain multiple tool calls before producing a final text response).
- `yuullm.assistant(...)` and `yuullm.tool(...)` are helpers that build the correct `(role, items)` tuples.
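The round-trip example above assumes `tools` and `execute_tool` already exist. A minimal sketch of both: the tool schema shown is an assumption (OpenAI-style function definitions; since tools are plain dicts, use whatever shape your provider expects), and `execute_tool` is a hypothetical dispatcher.

```python
# Assumed OpenAI-style tool schema; adapt to your provider's expected shape.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def execute_tool(name: str, arguments: dict) -> dict:
    # Hypothetical dispatcher: route the model's call to your own functions.
    if name == "get_weather":
        return {"city": arguments["city"], "temp_c": 18, "conditions": "cloudy"}
    raise ValueError(f"unknown tool: {name}")
```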
## Hooks: Provider-Level Visibility

### Motivation

yuullm abstracts raw provider chunks into `Reasoning | ToolCall | Response`. But sometimes you need the raw chunks — for example, to forward SSE events to a frontend in real time, or to detect a specific tool-call name before the full arguments finish streaming.

The `on_raw_chunk` hook gives you provider-level visibility without abandoning the streaming abstraction.
### on_raw_chunk

Pass a callback to `client.stream()`. It fires on every raw provider chunk before yuullm processes it:

```python
def forward_to_frontend(chunk):
    # chunk type depends on the provider:
    #   OpenAI:    openai.types.chat.ChatCompletionChunk
    #   Anthropic: event object with a .type attribute
    sse_queue.put(chunk)

stream, store = await client.stream(
    messages,
    on_raw_chunk=forward_to_frontend,
)
```
### Tick Heartbeat

Problem: during tool-call streaming, the provider accumulates argument deltas internally and yields nothing to the async-for loop. If your `on_raw_chunk` hook pushes SSE events into a queue, the consumer loop never gets a chance to flush them until the tool call finishes — SSE events arrive in a burst instead of in real time.

Solution: when `on_raw_chunk` is registered, yuullm yields `Tick()` items during tool-call argument accumulation. `Tick` carries no data; it just keeps your async-for loop spinning so side-channel work (like flushing an SSE queue) can proceed promptly.

If you don't use `on_raw_chunk`, no `Tick` is ever emitted — fully backward compatible.
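For example, a consumer loop can drain an SSE queue on every iteration; `Tick` guarantees those iterations keep coming while arguments accumulate. A sketch, where `send_sse` stands in for your transport (hypothetical name):

```python
import asyncio

sse_queue: asyncio.Queue = asyncio.Queue()

stream, store = await client.stream(
    messages,
    tools=tools,
    on_raw_chunk=sse_queue.put_nowait,  # push every raw chunk, non-blocking
)
async for item in stream:
    # Flush raw chunks on every iteration; Tick items keep this loop
    # spinning even while tool-call arguments are accumulating.
    while not sse_queue.empty():
        await send_sse(sse_queue.get_nowait())
    if isinstance(item, yuullm.Tick):
        continue  # heartbeat only, nothing else to do
    ...  # handle Reasoning / ToolCall / Response as usual
```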
### on_tool_call_name Helper

Fires a callback when a specific tool-call name is detected in the raw stream, useful for early UI feedback (e.g., showing a "Searching..." indicator before the arguments finish streaming):

```python
def on_search_start(index: int):
    notify_ui("search_started")

stream, store = await client.stream(
    messages,
    on_raw_chunk=yuullm.on_tool_call_name("search", on_search_start),
)
```
## Providers

### OpenAI / OpenAI-compatible

```python
provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",  # or any compatible endpoint
    provider_name="openai",                # used for price lookup
)
```

Works with any OpenAI-compatible API (DeepSeek, OpenRouter, vLLM, etc.) by setting `base_url` and `provider_name`:

```python
# DeepSeek
provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.deepseek.com/v1",
    provider_name="deepseek",
)
```
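The same pattern works for a locally served model. A sketch for vLLM, assuming its usual OpenAI-compatible endpoint on port 8000 (the key can be a placeholder unless your server enforces one):

```python
# Local vLLM server (assumed default endpoint)
provider = yuullm.providers.OpenAIProvider(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    provider_name="vllm",
)
```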
### Anthropic

```python
provider = yuullm.providers.AnthropicProvider(
    api_key="sk-ant-...",
    provider_name="anthropic",
)
```
## Pricing

Cost is calculated using a three-level fallback:
| Priority | Source | Description |
|---|---|---|
| 1 (highest) | Provider-supplied | Aggregators like OpenRouter return cost in the API response |
| 2 | YAML config | User-supplied price table for custom / negotiated pricing |
| 3 (lowest) | genai-prices | Community-maintained database via pydantic/genai-prices |
If none match, `store["cost"]` is `None` — cost lookup never blocks or fails the stream.
```python
client = yuullm.YLLMClient(
    provider=...,
    default_model="gpt-4o",
    price_calculator=yuullm.PriceCalculator(
        yaml_path="./custom_prices.yaml",  # optional
    ),
)
```
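Since `store["cost"]` can legitimately be `None`, guard before reading it. The `source` field presumably names the fallback level that matched (it is listed among the `Cost` fields below):

```python
cost = store["cost"]
if cost is None:
    # No provider-supplied figure, no YAML match, no genai-prices entry.
    print("cost unknown for this model")
else:
    print(f"${cost.total_cost:.6f} (source: {cost.source})")
```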
### YAML price file format

```yaml
- provider: openai
  models:
    - id: gpt-4o
      prices:
        input_mtok: 2.5        # USD per million input tokens
        output_mtok: 10
        cache_read_mtok: 1.25  # optional
- provider: anthropic
  models:
    - id: claude-sonnet-4-20250514
      prices:
        input_mtok: 3
        output_mtok: 15
        cache_read_mtok: 0.3
        cache_write_mtok: 3.75
```

Matching is exact on `(provider, model_id)`. No fuzzy matching.
## API Reference

### Messages

```python
Message = tuple[str, list[Item]]  # (role, items)
Item = str | dict[str, Any]       # text or structured content (image, audio, tool_call, ...)
```

Helper functions: `system(content)`, `user(*items)`, `assistant(*items)`, `tool(tool_call_id, content)`.

Messages are plain tuples — you can also write `("user", ["Hello!"])` directly, without helpers.
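Since the helpers just build `(role, items)` tuples, the two forms should compare equal. A small sanity check, assuming the helpers return plain tuples exactly as described:

```python
assert yuullm.user("Hello!") == ("user", ["Hello!"])
assert yuullm.system("Be brief.") == ("system", ["Be brief."])
```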
### Stream Items

| Type | Fields | Description |
|---|---|---|
| `Reasoning` | `item: Item` | Chain-of-thought fragment |
| `ToolCall` | `id`, `name`, `arguments` | Tool invocation (`arguments` is a raw JSON string) |
| `Response` | `item: Item` | Final reply fragment |
| `Tick` | (none) | Heartbeat during tool-call streaming (only emitted when `on_raw_chunk` is set) |
### YLLMClient

```python
YLLMClient(
    provider: Provider,
    default_model: str,
    tools: list[dict] | None = None,
    price_calculator: PriceCalculator | None = None,
)
```

`client.stream(messages, *, model=None, tools=None, on_raw_chunk=None, **kwargs)`

Returns `(AsyncIterator[StreamItem], store)`. `model` and `tools` override the client defaults. After the iterator is exhausted, `store["usage"]` is a `Usage` and `store["cost"]` is `Cost | None`.
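Extra keyword arguments ride along with the request via `**kwargs`, so per-call sampling parameters can be passed through. A sketch (`my_tools` is a placeholder, and whether a given parameter is honoured depends on the provider):

```python
stream, store = await client.stream(
    messages,
    model="gpt-4o-mini",  # overrides default_model for this call
    tools=my_tools,       # overrides client-level tools
    temperature=0.2,      # forwarded to the provider via **kwargs
)
```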
### Usage & Cost

```python
Usage(provider, model, request_id, input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, total_tokens)
Cost(input_cost, output_cost, total_cost, cache_read_cost, cache_write_cost, source)
```
## Development Setup

```bash
./scripts/setup-dev.sh
```

Installs git hooks (currently: pre-push tag/version validation).