Chat Completion Stream Handler

chat-cmpl-stream-handler

You've reimplemented the tool call loop for the fifth time. So have I. Never again.

Why

OpenAI Responses API? Still evolving. Agents SDK? Promising — frameworks always are, at first. Chat Completions API? Boring, stable, everywhere.

This library does exactly two things that everyone keeps copy-pasting across projects:

Stream a chat completion and handle events
Keep looping tool calls until the model is done

That's it. No magic. No framework. Just the loop.

Installation

pip install chat-cmpl-stream-handler

Quick Start

import asyncio
import json
from openai import AsyncOpenAI
from chat_cmpl_stream_handler import stream_until_user_input

client = AsyncOpenAI(api_key="...")

GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}


async def get_weather(arguments: str, context) -> str:
    args = json.loads(arguments)
    return f"The weather in {args['city']} is sunny and 25°C."


async def main():
    result = await stream_until_user_input(
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        model="gpt-4.1-nano",
        openai_client=client,
        tool_invokers={"get_weather": get_weather},
        stream_kwargs={
            "tools": [GET_WEATHER_TOOL],
            "stream_options": {"include_usage": True},
        },
    )

    # user → assistant (tool_calls) → tool → assistant (final answer)
    for msg in result.to_input_list():
        # content may be None on an assistant message that only carries tool_calls
        print(msg["role"], "->", msg.get("content") or "")

    for usage in result.usages:
        print(f"total tokens: {usage.total_tokens}")


asyncio.run(main())

Listening to stream events

Subclass ChatCompletionStreamHandler and override whatever you care about:

from chat_cmpl_stream_handler import ChatCompletionStreamHandler
from openai.lib.streaming.chat._events import ContentDeltaEvent, FunctionToolCallArgumentsDoneEvent


class PrintingHandler(ChatCompletionStreamHandler):
    async def on_content_delta(self, event: ContentDeltaEvent) -> None:
        print(event.delta, end="", flush=True)

    async def on_tool_calls_function_arguments_done(
        self, event: FunctionToolCallArgumentsDoneEvent
    ) -> None:
        print(f"\n[calling] {event.name}({event.arguments})")

Building tools from MCP servers

If you already expose capabilities through an MCP server, you can turn them into OpenAI-compatible tools plus tool_invokers in one step:

from chat_cmpl_stream_handler.utils.mcp import MCPServerConfig, build_mcp_tools_and_invokers


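# Run inside an async function; `client` and `stream_until_user_input` are set up as in Quick Start.
mcp_tools, mcp_tool_invokers = await build_mcp_tools_and_invokers(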
    [
        MCPServerConfig(
            server_url="https://marketplace-mcp.us-east-1.api.aws/mcp",
            server_label="aws",
        )
    ]
)

result = await stream_until_user_input(
    messages=[{"role": "user", "content": "Use aws__get_cost_and_usage and summarize it."}],
    model="gpt-4.1",
    openai_client=client,
    tool_invokers=mcp_tool_invokers,
    stream_kwargs={"tools": mcp_tools},
)

Notes:

server_label="aws" prefixes each discovered tool name, for example aws__tool_name.
If you pass an initialized ClientSession into MCPServerConfig(session=...), tool discovery and tool calls reuse that session instead of reconnecting.
The runtime context from stream_until_user_input(..., context=...) is forwarded into the MCP call's meta["context"].
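
The prefixing rule in the first note can be pictured with a small sketch. Both helpers below are hypothetical and not part of the library; they only assume that the label and the tool name are joined by a double underscore, as the aws__tool_name example suggests.

```python
def prefix_tool_name(server_label: str, tool_name: str) -> str:
    """Hypothetical helper: join a server label and a tool name with '__'."""
    return f"{server_label}__{tool_name}"


def split_tool_name(prefixed: str) -> tuple[str, str]:
    """Hypothetical inverse: split on the first '__' into (label, tool name)."""
    label, _, name = prefixed.partition("__")
    return label, name


print(prefix_tool_name("aws", "get_cost_and_usage"))  # aws__get_cost_and_usage
print(split_tool_name("aws__get_cost_and_usage"))     # ('aws', 'get_cost_and_usage')
```

A prefix like this keeps tool names from different servers from colliding when several MCPServerConfig entries are combined.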

Building tools from Pydantic models

For local tools with typed inputs, use the Pydantic helpers in chat_cmpl_stream_handler.utils.pydantic_to_tool:

from typing import Any

from pydantic import BaseModel

from chat_cmpl_stream_handler.utils.pydantic_to_tool import (
    PydanticToolConfig,
    build_pydantic_tools_and_invokers,
)


class EchoRequest(BaseModel):
    """Echo text back to the user."""

    text: str


async def echo_tool(arguments: EchoRequest, context: Any) -> str:
    return f"{context}: {arguments.text}"


pydantic_tools, pydantic_tool_invokers = build_pydantic_tools_and_invokers(
    [
        PydanticToolConfig(
            model=EchoRequest,
            invoker=echo_tool,
        )
    ]
)

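# Run inside an async function; `client` and `stream_until_user_input` are set up as in Quick Start.
result = await stream_until_user_input(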
    messages=[{"role": "user", "content": "Call echo_request with text=hello"}],
    model="gpt-4.1",
    openai_client=client,
    tool_invokers=pydantic_tool_invokers,
    stream_kwargs={"tools": pydantic_tools},
    context="demo",
)

The generated invoker validates the tool arguments with model_validate_json(...) before calling your handler.
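
To make the validation step concrete, here is a minimal sketch of what such a wrapper could look like. make_validating_invoker is a hypothetical name and this is not the library's implementation; it only assumes Pydantic v2 and the (arguments: str, context) invoker shape described in this README.

```python
import asyncio
from typing import Any, Awaitable, Callable

from pydantic import BaseModel, ValidationError


class EchoRequest(BaseModel):
    """Echo text back to the user."""

    text: str


async def echo_tool(arguments: EchoRequest, context: Any) -> str:
    return f"{context}: {arguments.text}"


def make_validating_invoker(
    model: type[BaseModel],
    handler: Callable[[Any, Any], Awaitable[str]],
) -> Callable[[str, Any], Awaitable[str]]:
    """Hypothetical wrapper: parse the raw JSON arguments, then call the handler."""

    async def invoker(arguments: str, context: Any) -> str:
        # Raises pydantic.ValidationError if the JSON does not match the model.
        parsed = model.model_validate_json(arguments)
        return await handler(parsed, context)

    return invoker


invoker = make_validating_invoker(EchoRequest, echo_tool)
print(asyncio.run(invoker('{"text": "hello"}', "demo")))  # demo: hello
```

With a wrapper of this shape, a handler never sees malformed arguments; a missing or mistyped field fails before the handler runs.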

API Reference

`stream_until_user_input`

async def stream_until_user_input(
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    openai_client: AsyncOpenAI,
    *,
    stream_handler: ChatCompletionStreamHandler[ResponseFormatT] | None = None,
    tool_invokers: dict[str, ToolInvokerFn] | None = None,
    stream_kwargs: dict[str, Any] | None = None,
    context: Any | None = None,
    max_iterations: int = 10,
) -> StreamResult

Streams a completion, executes any tool calls, feeds the results back, and repeats until the model stops asking for tools. Raises MaxIterationsReached if the model is still requesting tools when max_iterations is exhausted, which guards against an infinite tool call loop (it happens).

Parameter	Description
`messages`	Initial message list
`model`	Model name
`openai_client`	`AsyncOpenAI` instance
`stream_handler`	Receives stream events. Default: a no-op `ChatCompletionStreamHandler()`
`tool_invokers`	`{"tool_name": async_fn}` — each fn takes `(arguments: str, context)` and returns `str`
`stream_kwargs`	Passed directly to `beta.chat.completions.stream()` (e.g. `tools`, `stream_options`)
`context`	Forwarded to every tool invoker as-is
`max_iterations`	Safety cap. Default: 10
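
The loop described above can be sketched without any network access. Everything here is a stand-in written for illustration: fake_model replaces the streamed completion, run_tool_loop is a hypothetical name, and the MaxIterationsReached class only mirrors the name of the library's exception. The real function also streams events and records usage, which this sketch leaves out.

```python
import asyncio
import json
from typing import Any


class MaxIterationsReached(RuntimeError):
    """Stand-in that mirrors the name of the library's exception."""


async def fake_model(messages: list[dict]) -> dict:
    """Stand-in for a completion: request a tool once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        call = {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
        }
        return {"role": "assistant", "content": None, "tool_calls": [call]}
    return {"role": "assistant", "content": "It is sunny in Tokyo."}


async def run_tool_loop(messages, tool_invokers, context=None, max_iterations=10):
    history = list(messages)
    for _ in range(max_iterations):
        reply = await fake_model(history)
        history.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return history  # the model is done asking for tools
        for call in tool_calls:
            fn = call["function"]
            output = await tool_invokers[fn["name"]](fn["arguments"], context)
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": output}
            )
    raise MaxIterationsReached(f"still calling tools after {max_iterations} iterations")


async def get_weather(arguments: str, context: Any) -> str:
    city = json.loads(arguments)["city"]
    return f"The weather in {city} is sunny and 25°C."


history = asyncio.run(
    run_tool_loop(
        [{"role": "user", "content": "What's the weather in Tokyo?"}],
        {"get_weather": get_weather},
    )
)
print([m["role"] for m in history])  # ['user', 'assistant', 'tool', 'assistant']
```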

`StreamResult`

Attribute / Method	Description
`.to_input_list()`	Full message history as a JSON-serializable list, ready for the next round
`.usages`	`list[CompletionUsage]` — one per iteration, so you can watch the bill grow
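
As a rough illustration of how these two members might be used together, the sketch below continues a conversation and totals token usage. It runs on stand-in data rather than a real StreamResult: the message list plays the role of result.to_input_list(), the SimpleNamespace objects play the role of result.usages, and the token counts are made up.

```python
from types import SimpleNamespace

# Stand-in for result.to_input_list() after a finished run.
previous_messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "content": "It is sunny and 25°C in Tokyo."},
]

# Stand-in for result.usages, one entry per iteration (made-up numbers).
usages = [
    SimpleNamespace(prompt_tokens=60, completion_tokens=15, total_tokens=75),
    SimpleNamespace(prompt_tokens=95, completion_tokens=20, total_tokens=115),
]


def next_round(history: list[dict], user_text: str) -> list[dict]:
    """Build the messages for the next call without mutating the old history."""
    return [*history, {"role": "user", "content": user_text}]


def total_tokens(usage_list) -> int:
    """Sum total_tokens across all iterations of one run."""
    return sum(u.total_tokens for u in usage_list)


messages = next_round(previous_messages, "And in Osaka?")
print(len(messages), total_tokens(usages))  # 3 190
```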

`ChatCompletionStreamHandler`

All methods are no-ops by default. Override only what you need.

Method	When it fires
`on_event(event)`	Every event, before more specific hooks
`on_chunk(event)`	Every raw SSE chunk
`on_content_delta(event)`	Each content token
`on_content_done(event)`	Full content string complete
`on_refusal_delta(event)`	Each refusal token
`on_refusal_done(event)`	Full refusal string complete
`on_tool_calls_function_arguments_delta(event)`	Each incremental tool argument fragment
`on_tool_calls_function_arguments_done(event)`	Full tool argument JSON available
`on_logprobs_content_delta(event)`	Each logprobs content token
`on_logprobs_content_done(event)`	All logprobs content tokens done
`on_logprobs_refusal_delta(event)`	Each logprobs refusal token
`on_logprobs_refusal_done(event)`	All logprobs refusal tokens done
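
The hook names in this table line up with the OpenAI SDK's dotted event types (for example, content.delta corresponds to on_content_delta). The sketch below shows one way such a dispatch could work. It is an assumption made for illustration, not the library's actual code; FakeEvent, SketchHandler, and dispatch are hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Stand-in for an SDK stream event; only `type` and `delta` are modelled."""

    type: str
    delta: str = ""


class SketchHandler:
    """Records which hooks fire, in order."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def on_event(self, event: FakeEvent) -> None:
        self.calls.append("on_event")

    async def on_content_delta(self, event: FakeEvent) -> None:
        self.calls.append(f"on_content_delta:{event.delta}")


async def dispatch(handler: SketchHandler, event: FakeEvent) -> None:
    # The catch-all hook runs first, then the specific hook if one is defined.
    await handler.on_event(event)
    hook = getattr(handler, "on_" + event.type.replace(".", "_"), None)
    if hook is not None:
        await hook(event)


handler = SketchHandler()
asyncio.run(dispatch(handler, FakeEvent(type="content.delta", delta="Hi")))
asyncio.run(dispatch(handler, FakeEvent(type="refusal.delta")))
print(handler.calls)  # ['on_event', 'on_content_delta:Hi', 'on_event']
```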

Provider Compatibility

Works with any OpenAI-compatible endpoint. Some providers are more compatible than others.

Anthropic

Anthropic's Messages API is not OpenAI-compatible. Use the included AnthropicOpenAI adapter — a drop-in AsyncOpenAI subclass that translates requests under the hood (no extra dependencies required):

from chat_cmpl_stream_handler._anthropic import AnthropicOpenAI

client = AnthropicOpenAI(api_key="sk-ant-...")
# Run inside an async function; get_weather and GET_WEATHER_TOOL are defined in Quick Start.
result = await stream_until_user_input(
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    model="claude-haiku-4-5-20251001",
    openai_client=client,
    tool_invokers={"get_weather": get_weather},
    stream_kwargs={"tools": [GET_WEATHER_TOOL]},
)

A few differences from OpenAI to be aware of:

Usage is always returned — no need to pass stream_options: {"include_usage": True}.
The strict field in tool definitions is silently ignored (Anthropic doesn't support it).
OpenAI-only keys (stream_options, response_format) are stripped before the request is sent.
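
The last two points amount to a filtering step on the request arguments. The sketch below shows what that kind of filtering could look like. strip_openai_only is a hypothetical helper written for illustration, not the adapter's code, and how the adapter actually does this is not shown in this README.

```python
import copy

# Keys named in the list above as OpenAI-only.
OPENAI_ONLY_KEYS = {"stream_options", "response_format"}


def strip_openai_only(stream_kwargs: dict) -> dict:
    """Hypothetical helper: drop OpenAI-only keys and the `strict` tool field."""
    cleaned = {
        key: copy.deepcopy(value)  # copy so the caller's dict is left untouched
        for key, value in stream_kwargs.items()
        if key not in OPENAI_ONLY_KEYS
    }
    for tool in cleaned.get("tools", []):
        tool.get("function", {}).pop("strict", None)
    return cleaned


kwargs = {
    "tools": [{"type": "function", "function": {"name": "get_weather", "strict": True}}],
    "stream_options": {"include_usage": True},
}
cleaned = strip_openai_only(kwargs)
print(sorted(cleaned))                 # ['tools']
print(cleaned["tools"][0]["function"])  # {'name': 'get_weather'}
```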

Gemini

Gemini's streaming API sends tool_call_delta.index = None, which the OpenAI SDK does not appreciate. Apply the included patch once at startup:

from chat_cmpl_stream_handler._patch_stream_tool_call_index import apply
apply()  # safe to call multiple times

Put it at the top of main.py, or in conftest.py if you're testing. This is opt-in — the library won't silently monkey-patch anything on import.
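
The "safe to call multiple times" property is typically achieved with a guard flag. The sketch below shows that general pattern on a stand-in class. It does not reproduce the library's patch: Accumulator is hypothetical, and replacing a None index with 0 is an assumption chosen only to make the example concrete.

```python
class Accumulator:
    """Stand-in for third-party code that assumes `index` is always an int."""

    def slot(self, index):
        return index + 0  # would raise TypeError for None


_PATCH_FLAG = "_index_patch_applied"


def apply() -> None:
    """Idempotent monkey-patch: wrap Accumulator.slot once, then do nothing."""
    if getattr(Accumulator, _PATCH_FLAG, False):
        return  # already patched, so a second call is a no-op

    original = Accumulator.slot

    def slot(self, index):
        # Assumption for illustration: treat a missing index as 0.
        return original(self, 0 if index is None else index)

    Accumulator.slot = slot
    setattr(Accumulator, _PATCH_FLAG, True)


apply()
apply()  # safe to call again; the wrapper is not stacked
print(Accumulator().slot(None))  # 0
```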

Gemini 3 thought signatures: Gemini 3 models require a thought_signature to be echoed back during multi-turn function calling. stream_until_user_input preserves these signatures automatically — no action needed on your side.

License

MIT

Release history

Version	Released
0.5.0	Apr 24, 2026
0.4.2	Apr 20, 2026
0.4.1	Apr 6, 2026
0.4.0	Apr 3, 2026
0.3.1	Apr 2, 2026
0.3.0	Apr 1, 2026
0.2.2	Mar 30, 2026 (this version)
0.2.1	Mar 25, 2026
0.2.0	Mar 24, 2026
0.1.0	Mar 24, 2026

File details

Details for the file chat_cmpl_stream_handler-0.2.2.tar.gz.

File metadata

Download URL: chat_cmpl_stream_handler-0.2.2.tar.gz
Upload date: Mar 30, 2026
Size: 18.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.3 CPython/3.12.13 Darwin/25.3.0

File hashes

Hashes for chat_cmpl_stream_handler-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`62a25c4f928be51e83100386fdcc1863c447d9a05658f6649cb20aed9c58b7a0`
MD5	`12a42c20aca77f2817c7fcbba16c7134`
BLAKE2b-256	`d2602d49df3e304600fcc8bbcbb32f88c5dfe189bdcd5039eedb6835261b0c51`

File details

Details for the file chat_cmpl_stream_handler-0.2.2-py3-none-any.whl.

File metadata

Download URL: chat_cmpl_stream_handler-0.2.2-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 19.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.3 CPython/3.12.13 Darwin/25.3.0

File hashes

Hashes for chat_cmpl_stream_handler-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`845c259be17cc70d028ee4e890a4ce5ddb336214ac1a073923151329ec174245`
MD5	`f0e1bf1a9c33a7c8b74d2211699143ee`
BLAKE2b-256	`f8f75f4334cd22e3c54d107d62841ee7390cb3cb638ab1977f7ba012af0a6ede`
