
yuullm

Unified streaming LLM interface with provider-agnostic reasoning / tool-call abstraction.

What It Does

yuullm normalises the streaming differences across LLM providers (OpenAI, Anthropic, and any OpenAI-compatible API) into a uniform AsyncIterator[Reasoning | ToolCall | Response]. It also collects Usage and Cost after the stream ends.

yuullm is stateless — no session, no history management. You own the message list.

Design in One Sentence

Messages are tuples, tools are dicts, output items are typed structs — minimal abstraction, maximum interop.

Installation

pip install yuullm

Quick Start

import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    yuullm.system("You are a helpful assistant."),
    yuullm.user("What is 2+2?"),
]

stream, store = await client.stream(messages)
async for item in stream:
    match item:
        case yuullm.Reasoning(item=t):
            print(f"[thinking] {t}", end="")
        case yuullm.Response(item=t):
            print(t, end="")

# After stream ends
usage = store["usage"]
cost = store["cost"]  # Cost | None

Best Practice: Tool-Call Round-Trip

When you give tools to an LLM, the model may respond with ToolCall items instead of (or alongside) text. You need to execute those calls and feed results back. Here's the idiomatic pattern:

import json
import yuullm

messages = [yuullm.user("What's the weather in Paris?")]

while True:
    stream, store = await client.stream(messages, tools=tools)

    tool_calls: list[yuullm.ToolCall] = []
    async for item in stream:
        match item:
            case yuullm.Reasoning(item=text) if isinstance(text, str):
                print(f"[thinking] {text}", end="")
            case yuullm.ToolCall() as tc:
                tool_calls.append(tc)
            case yuullm.Response(item=text) if isinstance(text, str):
                print(text, end="")
            case yuullm.Tick():
                pass  # heartbeat during tool-call streaming, safe to ignore

    if not tool_calls:
        break  # model replied with text, done

    # Append assistant message containing tool calls
    messages.append(yuullm.assistant(
        *[{"type": "tool_call", "id": tc.id,
           "name": tc.name, "arguments": tc.arguments}
          for tc in tool_calls]
    ))

    # Execute each tool and append results
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))
        messages.append(yuullm.tool(tc.id, json.dumps(result)))

Key points:

  • Use match/case to dispatch all four stream item types. Tick carries no payload — ignore it unless you have a reason not to.
  • The while True loop handles multi-round tool use (the model may chain multiple tool calls before producing a final text response).
  • yuullm.assistant(...) and yuullm.tool(...) are helpers that build the correct (role, items) tuples.
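The loop above passes tools to client.stream() and calls execute_tool without defining either; both are yours to supply. Here is a minimal sketch, assuming OpenAI-style function-tool dicts (yuullm itself only requires that tools are dicts) and a single hypothetical get_weather tool:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def execute_tool(name: str, arguments: dict) -> dict:
    # Dispatch on the tool name and return a JSON-serialisable result,
    # so json.dumps(result) in the loop above works unchanged.
    if name == "get_weather":
        return {"city": arguments["city"], "temp_c": 18, "condition": "cloudy"}
    return {"error": f"unknown tool: {name}"}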

Hooks: Provider-Level Visibility

Motivation

yuullm abstracts away raw provider chunks into Reasoning | ToolCall | Response. But sometimes you need the raw chunks — for example, to forward SSE events to a frontend in real time, or to detect a specific tool call name before the full arguments finish streaming.

The on_raw_chunk hook gives you provider-level visibility without abandoning the streaming abstraction.

on_raw_chunk

Pass a callback to client.stream(). It fires on every raw provider chunk before yuullm processes it:

def forward_to_frontend(chunk):
    # chunk type depends on provider:
    #   OpenAI:    openai.types.chat.ChatCompletionChunk
    #   Anthropic: event object with .type attribute
    sse_queue.put(chunk)

stream, store = await client.stream(
    messages,
    on_raw_chunk=forward_to_frontend,
)

Tick Heartbeat

Problem: During tool-call streaming, the provider accumulates argument deltas internally and yields nothing to the async-for loop. If your on_raw_chunk hook pushes SSE events into a queue, the consumer loop never gets a chance to flush them until the tool call finishes — SSE events arrive in a burst instead of in real time.

Solution: When on_raw_chunk is registered, yuullm yields Tick() items during tool-call argument accumulation. Tick carries no data; it just keeps your async-for loop spinning so side-channel work (like flushing an SSE queue) can proceed promptly.

If you don't use on_raw_chunk, no Tick is ever emitted — fully backward compatible.
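Here is a sketch of the pattern Tick enables: the hook enqueues raw chunks, and the consuming loop flushes them on every yielded item, Tick included. The queue and send_sse are illustrative, not part of yuullm; client, messages, and tools are assumed from the earlier examples.

import asyncio

import yuullm

raw_chunks: asyncio.Queue = asyncio.Queue()

stream, store = await client.stream(
    messages,
    tools=tools,
    on_raw_chunk=raw_chunks.put_nowait,  # the hook just enqueues, nothing blocking
)

async for item in stream:
    # Flush queued raw chunks on every item -- Tick() keeps this loop
    # spinning even while tool-call arguments are still accumulating.
    while not raw_chunks.empty():
        await send_sse(raw_chunks.get_nowait())

    match item:
        case yuullm.Tick():
            pass  # heartbeat only; the flush above already ran
        case yuullm.Response(item=text) if isinstance(text, str):
            print(text, end="")
        case _:
            pass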

on_tool_call_name Helper

Fires a callback when a specific tool call name is detected in the raw stream, useful for early UI feedback (e.g., showing a "Searching..." indicator before arguments finish streaming):

def on_search_start(index: int):
    notify_ui("search_started")

stream, store = await client.stream(
    messages,
    on_raw_chunk=yuullm.on_tool_call_name("search", on_search_start),
)

Providers

OpenAI / OpenAI-compatible

provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",  # or any compatible endpoint
    provider_name="openai",                 # used for price lookup
)

Works with any OpenAI-compatible API (DeepSeek, OpenRouter, vLLM, etc.) by setting base_url and provider_name:

# DeepSeek
provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.deepseek.com/v1",
    provider_name="deepseek",
)

Anthropic

provider = yuullm.providers.AnthropicProvider(
    api_key="sk-ant-...",
    provider_name="anthropic",
)

Pricing

Cost is calculated using a three-level fallback:

Priority      Source              Description
1 (highest)   Provider-supplied   Aggregators like OpenRouter return cost in the API response
2             YAML config         User-supplied price table for custom / negotiated pricing
3 (lowest)    genai-prices        Community-maintained database via pydantic/genai-prices

If none match, store["cost"] is None — never blocks.

client = yuullm.YLLMClient(
    provider=...,
    default_model="gpt-4o",
    price_calculator=yuullm.PriceCalculator(
        yaml_path="./custom_prices.yaml",  # optional
    ),
)
YAML price file format

- provider: openai
  models:
    - id: gpt-4o
      prices:
        input_mtok: 2.5        # USD per million input tokens
        output_mtok: 10
        cache_read_mtok: 1.25   # optional

- provider: anthropic
  models:
    - id: claude-sonnet-4-20250514
      prices:
        input_mtok: 3
        output_mtok: 15
        cache_read_mtok: 0.3
        cache_write_mtok: 3.75

Matching is exact on (provider, model_id). No fuzzy matching.

API Reference

Messages

Message = tuple[str, list[Item]]   # (role, items)
Item = str | dict[str, Any]        # text or structured content (image, audio, tool_call, ...)

Helper functions: system(content), user(*items), assistant(*items), tool(tool_call_id, content).

Messages are plain tuples — you can also write ("user", ["Hello!"]) directly without helpers.
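For instance, assuming the helpers do no more than wrap their arguments into that (role, items) shape:

# Helper form and plain-tuple form build the same Message
assert yuullm.user("Hello!") == ("user", ["Hello!"])
assert yuullm.system("Be brief.") == ("system", ["Be brief."])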

Stream Items

Type       Fields                Description
Reasoning  item: Item            Chain-of-thought fragment
ToolCall   id, name, arguments   Tool invocation (arguments is a raw JSON string)
Response   item: Item            Final reply fragment
Tick       (none)                Heartbeat during tool-call streaming (only when on_raw_chunk is set)

YLLMClient

YLLMClient(
    provider: Provider,
    default_model: str,
    tools: list[dict] | None = None,
    price_calculator: PriceCalculator | None = None,
)

client.stream(messages, *, model=None, tools=None, on_raw_chunk=None, **kwargs)

Returns (AsyncIterator[StreamItem], store). model and tools override the defaults. After the iterator is exhausted, store["usage"] is a Usage and store["cost"] is Cost | None.
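For example, a per-call override (the model name here is illustrative, and extra keyword arguments are presumably forwarded to the underlying provider request):

stream, store = await client.stream(
    messages,
    model="gpt-4o-mini",   # overrides default_model for this call only
    temperature=0.2,       # passed through **kwargs
)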

Usage & Cost

Usage(provider, model, request_id, input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, total_tokens)
Cost(input_cost, output_cost, total_cost, cache_read_cost, cache_write_cost, source)
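A short sketch of reading both after the stream is exhausted, using only the fields listed above (costs assumed to be in USD, consistent with the pricing table):

usage = store["usage"]
cost = store["cost"]

print(f"{usage.model}: {usage.input_tokens} in / {usage.output_tokens} out "
      f"({usage.total_tokens} total)")

if cost is not None:
    # cost.source records which pricing level matched (see Pricing above)
    print(f"cost: ${cost.total_cost:.6f} via {cost.source}")
else:
    print("cost: unknown (no pricing match)")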

Development Setup

./scripts/setup-dev.sh

Installs git hooks (currently: pre-push tag/version validation).

Download files

Download the file for your platform.

Source Distribution

yuullm-0.6.0.tar.gz (21.0 kB)

Uploaded Source

Built Distribution


yuullm-0.6.0-py3-none-any.whl (28.5 kB)

Uploaded Python 3

File details

Details for the file yuullm-0.6.0.tar.gz.

File metadata

  • Download URL: yuullm-0.6.0.tar.gz
  • Upload date:
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yuullm-0.6.0.tar.gz
Algorithm Hash digest
SHA256 5fe6e710ac8fc68770ee6a75dec352db24b313257b56246cb835a2a4f7ff4dbd
MD5 86f020a81f0fe0be8a505ed11778aa03
BLAKE2b-256 28669cc27375d3df8dee00a78432ab4de32b3d129c16f2c68e302eb9db9db620


Provenance

The following attestation bundles were made for yuullm-0.6.0.tar.gz:

Publisher: publish.yml on yuulabs/yuullm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yuullm-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: yuullm-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yuullm-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 89c2880cd2c982998aa56ba2aa186a07fe526c93b918edc4997e2177675d98f6
MD5 b184026580bf12393a0cba49b62f7af0
BLAKE2b-256 9cf339cf130c7c8e863d031498ab1d14d99c3a6b15fdafbe8401a43d23d470ad


Provenance

The following attestation bundles were made for yuullm-0.6.0-py3-none-any.whl:

Publisher: publish.yml on yuulabs/yuullm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
