
yuullm

Unified streaming LLM interface with provider-agnostic reasoning / tool-call abstraction.

Overview

yuullm provides a standardised streaming abstraction layer over different LLM providers. It has two core responsibilities:

  1. Stream standardisation — normalises differences in thinking formats (reasoning_content / thinking / …) and tool-call protocols across providers, outputting a uniform AsyncIterator[Reasoning | ToolCall | Response] stream.
  2. Usage + Cost collection — after the stream ends, structured Usage (from the API) and Cost (calculated by yuullm) are available via a store dict.

yuullm is stateless — it has no session concept and does not maintain conversation history.

Installation

pip install yuullm

Quick Start

Basic Chat

import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    yuullm.SystemMessage(content="You are a helpful assistant."),
    yuullm.UserMessage(content="What is 2+2?"),
]

stream, store = await client.stream(messages)
async for item in stream:
    match item:
        case yuullm.Reasoning(text=t):
            print(f"[thinking] {t}", end="")
        case yuullm.Response(text=t):
            print(t, end="")

# After stream ends
usage = store["usage"]
print(f"\nTokens: {usage.input_tokens} in / {usage.output_tokens} out")

Tool Calling

Tools are defined using JSON Schema (OpenAI format) and passed at client init time:

tools = [
    yuullm.ToolSpec(
        name="get_weather",
        description="Get current weather for a city",
        parameters={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    ),
]

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
    tools=tools,
)

messages = [yuullm.UserMessage(content="What's the weather in Tokyo?")]
stream, store = await client.stream(messages)

async for item in stream:
    match item:
        case yuullm.ToolCall(id=tid, name=name, arguments=args):
            print(f"Tool call: {name}({args})")
            # Execute the tool, then continue the conversation:
            # messages.append(yuullm.AssistantMessage(tool_calls=[item]))
            # messages.append(yuullm.ToolResultMessage(tool_call_id=tid, content='{"temp": 22}'))
        case yuullm.Response(text=t):
            print(t, end="")

You can also override tools per-request:

stream, store = await client.stream(messages, tools=other_tools)

Multi-turn Conversation

yuullm is stateless — you manage the message list yourself:

messages = [
    yuullm.SystemMessage(content="You are a helpful assistant."),
    yuullm.UserMessage(content="Hi, my name is Alice."),
]

# First turn
stream, store = await client.stream(messages)
reply = ""
async for item in stream:
    if isinstance(item, yuullm.Response):
        reply += item.text

# Append assistant reply and next user message
messages.append(yuullm.AssistantMessage(content=reply))
messages.append(yuullm.UserMessage(content="What's my name?"))

# Second turn
stream, store = await client.stream(messages)
async for item in stream:
    if isinstance(item, yuullm.Response):
        print(item.text, end="")

Tool Call Round-trip

A full tool-use loop: the model requests a tool call, you execute it, then you feed the result back:

import json

messages = [yuullm.UserMessage(content="What's the weather in Paris?")]

stream, store = await client.stream(messages)
tool_calls = []
async for item in stream:
    match item:
        case yuullm.ToolCall() as tc:
            tool_calls.append(tc)
        case yuullm.Response(text=t):
            print(t, end="")

if tool_calls:
    # Append the assistant message with tool calls
    messages.append(yuullm.AssistantMessage(tool_calls=tool_calls))

    # Execute each tool and append results
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))  # your function
        messages.append(yuullm.ToolResultMessage(
            tool_call_id=tc.id,
            content=json.dumps(result),
        ))

    # Continue the conversation — model will use the tool results
    stream, store = await client.stream(messages)
    async for item in stream:
        if isinstance(item, yuullm.Response):
            print(item.text, end="")
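
The execute_tool call above is a placeholder for your own dispatch logic; it is not part of yuullm. A toy implementation (the handler body and return shape are purely illustrative) might look like:

def execute_tool(name: str, args: dict) -> dict:
    """Dispatch a tool call to a local handler (illustrative only)."""
    if name == "get_weather":
        # A real implementation would call a weather API here.
        return {"city": args["city"], "temp_c": 22, "conditions": "clear"}
    raise ValueError(f"Unknown tool: {name}")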

Cost Tracking

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
    price_calculator=yuullm.PriceCalculator(
        yaml_path="./custom_prices.yaml",  # optional, for custom pricing
    ),
)

stream, store = await client.stream(messages)
async for item in stream:
    ...  # consume the stream

usage: yuullm.Usage = store["usage"]
cost: yuullm.Cost | None = store["cost"]

print(f"Tokens: {usage.input_tokens} in / {usage.output_tokens} out")
print(f"Cache:  {usage.cache_read_tokens} read / {usage.cache_write_tokens} write")
if cost:
    print(f"Cost: ${cost.total_cost:.6f} (source: {cost.source})")
else:
    print("Cost: unavailable (model price not found)")

Providers

OpenAI / OpenAI-compatible

provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",  # or any compatible endpoint
    provider_name="openai",                 # used for price lookup
)

Works with any OpenAI-compatible API (Azure, OpenRouter, vLLM, etc.) by setting base_url and provider_name.
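
For example, a sketch pointing the same provider class at OpenRouter (the provider_name value here is an assumption — use whatever key your price table and cost lookup expect):

provider = yuullm.providers.OpenAIProvider(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    provider_name="openrouter",
)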

Anthropic

provider = yuullm.providers.AnthropicProvider(
    api_key="sk-ant-...",
    provider_name="anthropic",
)

Handles Anthropic-specific streaming events including thinking_delta for extended thinking and tool_use content blocks.
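
Extended-thinking output surfaces as Reasoning items in the unified stream. A sketch, assuming extra keyword arguments to stream are forwarded to the underlying Anthropic request (the thinking parameter below is Anthropic's, not yuullm's):

client = yuullm.YLLMClient(provider=provider, default_model="claude-sonnet-4-20250514")

stream, store = await client.stream(
    messages,
    thinking={"type": "enabled", "budget_tokens": 2048},  # assumed kwargs pass-through
)
async for item in stream:
    if isinstance(item, yuullm.Reasoning):
        print(f"[thinking] {item.text}", end="")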

Pricing

Cost is calculated using a three-level priority system:

Priority     Source             Description
1 (highest)  Provider-supplied  Aggregators like OpenRouter / LiteLLM return cost in the API response
2            YAML config        User-supplied price table for custom / negotiated pricing
3 (lowest)   genai-prices       Community-maintained database via pydantic/genai-prices

If none of the sources can determine the price, store["cost"] is None.
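
The winning tier is recorded on the Cost object, so you can check where a figure came from after consuming the stream:

cost = store["cost"]
if cost is not None:
    print(f"${cost.total_cost:.6f} via {cost.source}")  # "provider", "yaml", or "genai-prices"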

YAML Price File Format

- provider: openai
  models:
    - id: gpt-4o
      prices:
        input_mtok: 2.5         # USD per million input tokens
        output_mtok: 10         # USD per million output tokens
        cache_read_mtok: 1.25   # optional

- provider: anthropic
  models:
    - id: claude-sonnet-4-20250514
      prices:
        input_mtok: 3
        output_mtok: 15
        cache_read_mtok: 0.3
        cache_write_mtok: 3.75

Matching is exact on (provider, model_id); an entry for gpt-4o, for example, will not match a dated variant such as gpt-4o-2024-08-06. There is no fuzzy matching.

API Reference

YLLMClient

YLLMClient(
    provider: Provider,          # LLM provider instance
    default_model: str,          # default model name
    tools: list[ToolSpec] | None = None,           # tool definitions (JSON Schema)
    price_calculator: PriceCalculator | None = None,
)

client.stream(messages, *, model=None, tools=None, **kwargs)

Returns an (AsyncIterator[StreamItem], store) tuple. The model and tools params override the defaults set at init. store is a plain dict; once the stream has been fully consumed, it holds the "usage" and "cost" entries described below.
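
Per-request overrides in action (model name illustrative):

# Swap the model and tool set for this request only; client defaults are untouched.
stream, store = await client.stream(messages, model="gpt-4o-mini", tools=other_tools)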

Stream Items

Type       Fields                              Description
Reasoning  text: str                           Chain-of-thought / extended thinking fragment
ToolCall   id: str, name: str, arguments: str  Tool invocation request (arguments is raw JSON)
Response   text: str                           Final text reply fragment
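
A single match statement can dispatch on all three item types — a condensed sketch of the patterns used throughout this page:

async for item in stream:
    match item:
        case yuullm.Reasoning(text=t):
            ...  # thinking fragment
        case yuullm.ToolCall(id=tid, name=name, arguments=args):
            ...  # tool invocation; parse args with json.loads before executing
        case yuullm.Response(text=t):
            ...  # final reply fragment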

Messages

Type               Fields
SystemMessage      content: str
UserMessage        content: str
AssistantMessage   content: str | None, tool_calls: list[ToolCall] | None
ToolResultMessage  tool_call_id: str, content: str

Usage

Usage(
    provider: str,
    model: str,
    request_id: str | None = None,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    total_tokens: int | None = None,
)

Cost

Cost(
    input_cost: float,
    output_cost: float,
    total_cost: float,
    cache_read_cost: float = 0.0,
    cache_write_cost: float = 0.0,
    source: str = "",  # "provider" | "yaml" | "genai-prices"
)


Download files

Source Distribution

yuullm-0.1.0.tar.gz (9.7 kB)

Built Distribution

yuullm-0.1.0-py3-none-any.whl (13.9 kB)

File details

Details for the file yuullm-0.1.0.tar.gz.

File metadata

  • Filename: yuullm-0.1.0.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

  • SHA256: a9b41866262a9a1d1e4d52b3df13e0af48a0f1f08aeab3651ec9c7778d3b8863
  • MD5: 18fd626d2b6d61df03d04867757baadd
  • BLAKE2b-256: edf947b3c330431d38a0ce60217e0a9c4d39ba3006bfc37478f36123d2fdf777

Provenance

The following attestation bundles were made for yuullm-0.1.0.tar.gz:

Publisher: publish.yml on yuulabs/yuullm


File details

Details for the file yuullm-0.1.0-py3-none-any.whl.

File metadata

  • Filename: yuullm-0.1.0-py3-none-any.whl
  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

  • SHA256: b661beabd37b04e0da7fc48d5565d3632e48fcdd014c63bf1d1f1747d3524a23
  • MD5: 584957337b6e0434e523d3dcf4f417d9
  • BLAKE2b-256: a8b5f8db9d489a9641bc8f97c2fd6465fad0c02a4d6aa88c2fc327e5073e3ce2

Provenance

The following attestation bundles were made for yuullm-0.1.0-py3-none-any.whl:

Publisher: publish.yml on yuulabs/yuullm

