yuullm

Unified streaming LLM interface with provider-agnostic reasoning / tool-call abstraction.

Overview

yuullm provides a standardised streaming abstraction layer over different LLM providers. It has two core responsibilities:

  1. Stream standardisation — normalises differences in thinking formats (reasoning_content / thinking / …) and tool-call protocols across providers, outputting a uniform AsyncIterator[Reasoning | ToolCall | Response] stream.
  2. Usage + Cost collection — after the stream ends, structured Usage (from the API) and Cost (calculated by yuullm) are available via a store dict.

yuullm is stateless — it has no session concept and does not maintain conversation history.

Design Philosophy

yuullm intentionally avoids heavy abstractions:

  • Messages are tuples, not classes. ("role", [items]) — no SystemMessage, UserMessage imports needed.
  • Tools are dicts, not a custom ToolSpec. Pass list[dict] directly — works seamlessly with yuutools.ToolManager.specs(), but with zero dependency.
  • Helper functions system(), user(), assistant(), tool() for ergonomic one-liner message construction (they build the same raw tuples; see the sketch below).
  • Multimodal is native: Item = str | dict, so images, audio, and structured content are first-class.
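
The helpers and raw tuples are interchangeable. A minimal sketch of the equivalence (this assumes the helpers simply return (role, [items]) tuples, which is what the Message type in the API reference implies):

import yuullm

# Helpers are ergonomic constructors for plain (role, [items]) tuples.
assert yuullm.user("Hello!") == ("user", ["Hello!"])
assert yuullm.system("Be concise.") == ("system", ["Be concise."])

# Multimodal items mix str and dict in the same items list:
image = {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}}
assert yuullm.user("Look:", image) == ("user", ["Look:", image])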

Installation

pip install yuullm

Quick Start

Basic Chat (with helpers)

import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    yuullm.system("You are a helpful assistant."),
    yuullm.user("What is 2+2?"),
]

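# Note: stream() is a coroutine; run this inside an async function (e.g. via asyncio.run).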
stream, store = await client.stream(messages)
async for item in stream:
    match item:
        case yuullm.Reasoning(item=i):
            print(f"[thinking] {i}", end="")
        case yuullm.Response(item=i):
            print(i, end="")

# After stream ends
usage = store["usage"]
print(f"\nTokens: {usage.input_tokens} in / {usage.output_tokens} out")

Basic Chat (raw tuples)

Messages are just (role, items) tuples — no imports needed beyond yuullm:

import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
)

messages = [
    ("system", ["You are a helpful assistant."]),
    ("user", ["What is 2+2?"]),
]

stream, store = await client.stream(messages)
async for stream_item in stream:
    match stream_item:
        case yuullm.Reasoning(item=i):
            if isinstance(i, str):
                print(f"[thinking] {i}", end="")
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")

Multimodal (with helpers)

messages = [
    yuullm.system("You are a vision assistant."),
    yuullm.user("What is in this image?", {
        "type": "image_url",
        "image_url": {"url": "https://example.com/photo.png"},
    }),
]

Multimodal (raw tuples)

messages = [
    ("system", ["You are a vision assistant."]),
    ("user", [
        "What is in this image?",
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ]),
]

Tool Calling (with helpers)

Tools are plain list[dict] — pass json_schema dicts directly:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
    tools=tools,
)

messages = [yuullm.user("What's the weather in Tokyo?")]
stream, store = await client.stream(messages)

async for stream_item in stream:
    match stream_item:
        case yuullm.ToolCall() as tc:
            print(f"\n[tool call] {tc.name}({tc.arguments})")
        case yuullm.Reasoning(item=i):
            if isinstance(i, str):
                print(f"[thinking] {i}", end="")
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")

Or override tools per-request:

stream, store = await client.stream(messages, tools=other_tools)

Integration with yuutools

import yuutools as yt
import yuullm

manager = yt.ToolManager([search_tool, calculator_tool])

# manager.specs() returns list[dict] in OpenAI function-calling format
# pass directly to yuullm — no conversion needed
stream, store = await client.stream(messages, tools=manager.specs())

Multi-turn Conversation (with helpers)

yuullm is stateless — you manage the message list yourself:

messages = [
    yuullm.system("You are a helpful assistant."),
    yuullm.user("Hi, my name is Alice."),
]

# First turn
stream, store = await client.stream(messages)
reply = ""
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            reply += stream_item.item

# Append assistant reply and next user message
messages.append(yuullm.assistant(reply))
messages.append(yuullm.user("What's my name?"))

# Second turn
stream, store = await client.stream(messages)
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            print(stream_item.item, end="")

Multi-turn Conversation (raw tuples)

messages = [
    ("system", ["You are a helpful assistant."]),
    ("user", ["Hi, my name is Alice."]),
]

# First turn
stream, store = await client.stream(messages)
reply = ""
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            reply += stream_item.item

# Append assistant reply and next user message
messages.append(("assistant", [reply]))
messages.append(("user", ["What's my name?"]))

# Second turn
stream, store = await client.stream(messages)
async for stream_item in stream:
    if isinstance(stream_item, yuullm.Response):
        if isinstance(stream_item.item, str):
            print(stream_item.item, end="")

Tool Call Round-trip (with helpers)

A full tool-use loop: model calls a tool, you execute it, then feed the result back:

import json

messages = [yuullm.user("What's the weather in Paris?")]

stream, store = await client.stream(messages)
tool_calls = []
async for stream_item in stream:
    match stream_item:
        case yuullm.ToolCall() as tc:
            tool_calls.append(tc)
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")

if tool_calls:
    # Append assistant message with tool calls as dicts
    messages.append(yuullm.assistant(
        *[{"type": "tool_call", "id": tc.id, "name": tc.name, "arguments": tc.arguments}
          for tc in tool_calls]
    ))

    # Execute each tool and append results
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))  # your function
        messages.append(yuullm.tool(tc.id, json.dumps(result)))

    # Continue the conversation
    stream, store = await client.stream(messages)
    async for stream_item in stream:
        if isinstance(stream_item, yuullm.Response):
            if isinstance(stream_item.item, str):
                print(stream_item.item, end="")

Tool Call Round-trip (raw tuples)

import json

messages = [("user", ["What's the weather in Paris?"])]

stream, store = await client.stream(messages)
tool_calls = []
async for stream_item in stream:
    match stream_item:
        case yuullm.ToolCall() as tc:
            tool_calls.append(tc)
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")

if tool_calls:
    # Append assistant message with tool call dicts
    messages.append(("assistant", [
        {"type": "tool_call", "id": tc.id, "name": tc.name, "arguments": tc.arguments}
        for tc in tool_calls
    ]))

    # Execute each tool and append results
    for tc in tool_calls:
        result = execute_tool(tc.name, json.loads(tc.arguments))
        messages.append(("tool", [
            {"type": "tool_result", "tool_call_id": tc.id, "content": json.dumps(result)}
        ]))

    # Continue the conversation
    stream, store = await client.stream(messages)
    async for stream_item in stream:
        if isinstance(stream_item, yuullm.Response):
            if isinstance(stream_item.item, str):
                print(stream_item.item, end="")

Cost Tracking

client = yuullm.YLLMClient(
    provider=yuullm.providers.OpenAIProvider(api_key="sk-..."),
    default_model="gpt-4o",
    price_calculator=yuullm.PriceCalculator(
        yaml_path="./custom_prices.yaml",  # optional, for custom pricing
    ),
)

stream, store = await client.stream(messages)
async for item in stream:
    ...  # consume the stream

usage: yuullm.Usage = store["usage"]
cost: yuullm.Cost | None = store["cost"]

print(f"Tokens: {usage.input_tokens} in / {usage.output_tokens} out")
print(f"Cache:  {usage.cache_read_tokens} read / {usage.cache_write_tokens} write")
if cost:
    print(f"Cost: ${cost.total_cost:.6f} (source: {cost.source})")
else:
    print("Cost: unavailable (model price not found)")

Providers

OpenAI / OpenAI-compatible

provider = yuullm.providers.OpenAIProvider(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",  # or any compatible endpoint
    provider_name="openai",                 # used for price lookup
)

Works with any OpenAI-compatible API (Azure, OpenRouter, vLLM, etc.) by setting base_url and provider_name.
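
For example, a hypothetical OpenRouter setup (the base_url is OpenRouter's public OpenAI-compatible endpoint; the provider_name and model id shown are illustrative):

provider = yuullm.providers.OpenAIProvider(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    provider_name="openrouter",               # used for price lookup
)

client = yuullm.YLLMClient(provider=provider, default_model="openai/gpt-4o")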

Anthropic

provider = yuullm.providers.AnthropicProvider(
    api_key="sk-ant-...",
    provider_name="anthropic",
)

Handles Anthropic-specific streaming events including thinking_delta for extended thinking and tool_use content blocks.
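
A minimal sketch of consuming extended thinking through the unified stream (the model id mirrors the pricing example below; any provider-specific kwargs needed to enable extended thinking are not shown):

import yuullm

client = yuullm.YLLMClient(
    provider=yuullm.providers.AnthropicProvider(api_key="sk-ant-..."),
    default_model="claude-sonnet-4-20250514",
)

stream, store = await client.stream([yuullm.user("Prove that sqrt(2) is irrational.")])
async for item in stream:
    match item:
        case yuullm.Reasoning(item=i):  # normalised from Anthropic thinking_delta events
            if isinstance(i, str):
                print(f"[thinking] {i}", end="")
        case yuullm.Response(item=i):
            if isinstance(i, str):
                print(i, end="")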

Development Setup

To set up the development environment and install all project-specific git hooks:

./scripts/setup-dev.sh

This script installs git hooks for code quality and release safety. Currently includes:

  • pre-push: Validates that the git tag version matches the pyproject.toml version before pushing tags (see the sketch below)
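
As a rough illustration of what the version check involves (a hypothetical Python sketch, not the actual hook installed by scripts/setup-dev.sh):

import subprocess
import tomllib  # Python 3.11+

# Version declared in pyproject.toml.
with open("pyproject.toml", "rb") as f:
    declared = tomllib.load(f)["project"]["version"]

# Tag pointing at HEAD (assumes tags of the form v0.3.1).
tag = subprocess.check_output(
    ["git", "describe", "--tags", "--exact-match"], text=True
).strip()

if tag.lstrip("v") != declared:
    raise SystemExit(f"tag {tag} does not match pyproject.toml version {declared}")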

Future development tools (linting hooks, commit message validation, etc.) will be added to this centralized setup script.

Pricing

Cost is calculated using a three-level priority system:

Priority    | Source            | Description
1 (highest) | Provider-supplied | Aggregators like OpenRouter / LiteLLM return cost in the API response
2           | YAML config       | User-supplied price table for custom / negotiated pricing
3 (lowest)  | genai-prices      | Community-maintained database via pydantic/genai-prices

If none of the sources can determine the price, store["cost"] is None.
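
In code, the fallback order amounts to something like the following (a hypothetical sketch; the actual resolution presumably lives inside PriceCalculator):

# Hypothetical illustration of the three-level priority.
def resolve_cost(provider_cost, yaml_cost, genai_cost):
    for cost, source in [
        (provider_cost, "provider"),   # 1: cost returned in the API response
        (yaml_cost, "yaml"),           # 2: user-supplied YAML price table
        (genai_cost, "genai-prices"),  # 3: community price database
    ]:
        if cost is not None:
            return cost, source
    return None, None  # store["cost"] ends up None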

YAML Price File Format

- provider: openai
  models:
    - id: gpt-4o
      prices:
        input_mtok: 2.5        # USD per million input tokens
        output_mtok: 10         # USD per million output tokens
        cache_read_mtok: 1.25   # optional

- provider: anthropic
  models:
    - id: claude-sonnet-4-20250514
      prices:
        input_mtok: 3
        output_mtok: 15
        cache_read_mtok: 0.3
        cache_write_mtok: 3.75

Matching is exact on (provider, model_id); there is no fuzzy matching. For example, a request for model "gpt-4o-mini" would not match a YAML entry for "gpt-4o".

API Reference

YLLMClient

YLLMClient(
    provider: Provider,
    default_model: str,
    tools: list[dict] | None = None,              # json_schema tool dicts
    price_calculator: PriceCalculator | None = None,
)

client.stream(messages, *, model=None, tools=None, **kwargs)

Returns (AsyncIterator[StreamItem], store). The model and tools params override the defaults set at init.

Messages

Message = tuple[str, list[Item]]  # (role, items)
Item = str | dict[str, Any]       # text or structured content
History = list[Message]

Helper functions:

Function  | Signature                              | Example
system    | system(content: str)                   | system("You are helpful.")
user      | user(*items: Item)                     | user("Hello!") / user("Look:", {"type": "image_url", ...})
assistant | assistant(*items: Item)                | assistant("Sure!", {"type": "tool_call", ...})
tool      | tool(tool_call_id: str, content: str)  | tool("tc_1", '{"result": 42}')

Tool call items in assistant messages use this dict shape:

{"type": "tool_call", "id": "...", "name": "...", "arguments": "..."}

Tool result items in tool messages use this dict shape:

{"type": "tool_result", "tool_call_id": "...", "content": "..."}

Stream Items

Type      | Fields                              | Description
Reasoning | item: Item                          | Chain-of-thought / extended thinking fragment (text or multimodal)
ToolCall  | id: str, name: str, arguments: str  | Tool invocation request (arguments is raw JSON)
Response  | item: Item                          | Final reply fragment (text or multimodal)

Usage

Usage(
    provider: str,
    model: str,
    request_id: str | None = None,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    total_tokens: int | None = None,
)

Cost

Cost(
    input_cost: float,
    output_cost: float,
    total_cost: float,
    cache_read_cost: float = 0.0,
    cache_write_cost: float = 0.0,
    source: str = "",  # "provider" | "yaml" | "genai-prices"
)
