SUNWÆE gen — multi-provider LLM engine library.

All LLMs, one response format, one dependency (httpx). Supports switching models mid-conversation (e.g. draft with GPT, refine with Claude).

Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

# reasoning_effort=None (default) — no override; swap to non_reasoning_id if reasoning_mode="always"
engine = get_engine("anthropic", "claude-sonnet-4-6")

# reasoning_effort="high" — pick any value from model.reasoning_efforts
engine_reasoning = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider   provider=    Env var
Anthropic  "anthropic"  ANTHROPIC_API_KEY
OpenAI     "openai"     OPENAI_API_KEY
Google     "google"     GOOGLE_API_KEY
DeepSeek   "deepseek"   DEEPSEEK_API_KEY
xAI        "xai"        XAI_API_KEY
Moonshot   "moonshot"   MOONSHOT_API_KEY

Directory structure

sunwaee/
├── core/
│   ├── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
│   └── tools.py                  # @tool decorator, ok(), err()
└── modules/gen/
    ├── __init__.py               # public re-exports (get_engine, run, stream_run, …)
    ├── agent.py                  # ReAct loop — run() + stream_run()
    ├── tools.py                  # TOOLS list
    └── engine/
        ├── __init__.py           # get_engine, Message, Response, Tool, …
        ├── base.py               # BaseEngine ABC
        ├── factory.py            # get_engine() — provider routing + connection pooling
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, ToolCall, Usage, Cost, Performance, …
        ├── models/               # model registry per provider
        │   ├── __init__.py       # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── completions.py    # CompletionsEngine — DeepSeek, Moonshot, OpenAI-compat fallbacks (/v1/chat/completions)
            ├── responses.py      # ResponsesEngine — OpenAI + xAI (/v1/responses)
            └── google.py         # GoogleEngine

tests/gen/
├── test_agent.py / test_stream_agent.py / test_tools.py
└── engine/
    ├── test_types.py / test_factory.py / test_model.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/
        ├── _shared.py            # shared config, data, helpers for all live tests
        ├── test_scenarios.py     # all providers × all scenarios × chat + stream
        ├── test_tool_call_result.py  # TOOL_CALL → execute → reply, all providers
        ├── test_attachments.py   # image attachments, vision-capable providers
        ├── test_chain.py         # three-provider conversation chain
        ├── test_caching.py       # prompt-cache hit on turn 2
        ├── test_reasoning.py     # reasoning ON / OFF per model category
        └── run/                  # JSON snapshots (gitignored)

Core types (engine/types.py)

class Role(Enum):       SYSTEM, USER, ASSISTANT, TOOL, CONTEXT
class StopReason(Enum): END_TURN, TOOL_USE, MAX_TOKENS

@dataclass class Message:
    role: Role
    content: str | None
    reasoning_content: str | None       # thinking for models that support it
    reasoning_signature: str | None     # opaque blob — echo back verbatim
    tool_call_id: str | None            # set on Role.TOOL messages
    tool_calls: list[ToolCall] | None
    attachments: list[FileAttachment] | None   # Role.USER only

@dataclass class Response:
    provider: str; model: str; streaming: bool; synthetic: bool
    content: str | None; reasoning_content: str | None; reasoning_signature: str | None
    tool_calls: list[ToolCall] | None; stop_reason: StopReason | None; error: Error | None
    usage: Usage | None; cost: Cost | None; performance: Performance | None

@dataclass class ToolCall:
    id: str; name: str; arguments: dict
    thought_signature: str | None    # Google only — echo back every subsequent turn
    error: str | None; duration: float; results: list[dict]

@dataclass class Usage:
    input_tokens: int; output_tokens: int; total_tokens: int
    cache_read_tokens: int; cache_write_tokens: int

@dataclass class Cost:
    input: float; output: float; cache_read: float; cache_write: float; total: float

@dataclass class Performance:
    latency: float            # seconds to first chunk
    reasoning_duration: float; content_duration: float; total_duration: float
    throughput: int           # output tokens / second

@dataclass class FileAttachment:
    data: bytes; filename: str; media_type: str = ""
    # text/* → <file name="…">…</file> block
    # image/jpeg|png|gif|webp → base64 inline
    # application/pdf|json + OOXML (docx/xlsx/pptx) → extracted text
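
As an illustration of how these fields compose in a tool round-trip (a hedged sketch; response is assumed to be a prior Response whose stop_reason was TOOL_USE, and unset Message fields are assumed to default to None, as the quick start's two-field construction suggests):

from sunwaee.modules.gen.engine.types import Message, Role

# Append the tool result as a TOOL turn, correlating it with the
# model's request via tool_call_id:
tool_turn = Message(
    role=Role.TOOL,
    content='{"ok": true, "data": {"time": "2025-01-01T00:00:00+00:00"}}',
    tool_call_id=response.tool_calls[0].id,
)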

get_engine() — reasoning + routing control

engine = get_engine(
    provider,
    model,
    api_key=None,           # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

Connection pool

get_engine() reuses a single httpx.AsyncClient per (event_loop, base_url). The pool is a WeakKeyDictionary keyed by the loop object so that dead loops (common in tests) drop their clients automatically — this avoids "Event loop is closed" errors when Python reuses an integer id() for a freshly created loop. Clients are configured with Timeout(connect=5s, read=300s, write=30s) and Limits(max_connections=50). On graceful shutdown, call:

from sunwaee.modules.gen.engine import close_all_clients

await close_all_clients()
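
For reference, the pooling pattern amounts to something like the sketch below (illustrative only, not the actual factory code; the pool timeout value is an assumption — the text above documents only connect/read/write):

import asyncio
import weakref

import httpx

# One dict of clients per live event loop; when a loop is garbage-collected,
# its entry disappears too — that is what WeakKeyDictionary buys us.
_POOL: weakref.WeakKeyDictionary = weakref.WeakKeyDictionary()

def _client_for(base_url: str) -> httpx.AsyncClient:
    loop = asyncio.get_running_loop()      # keyed by the loop object, not id()
    clients = _POOL.setdefault(loop, {})
    if base_url not in clients:
        clients[base_url] = httpx.AsyncClient(
            base_url=base_url,
            timeout=httpx.Timeout(connect=5.0, read=300.0, write=30.0, pool=5.0),
            limits=httpx.Limits(max_connections=50),
        )
    return clients[base_url]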

Resolution order in get_engine()

  1. Model swap — reasoning_mode="always" + reasoning_effort=None → swap to non_reasoning_id. reasoning_mode=None + non-null effort → swap to reasoning_id.
  2. Drop effort on non-reasoning models — if the resolved model has supports_reasoning=False, effort is coerced to None (silently — the caller can't be expected to know every model's status).
  3. Off-coercion — reasoning_effort=None on a model with reasoning_disabled_payload and "off" in its efforts list is coerced to "off" so the disabled payload merges (kimi-k2.5).
  4. Validation — non-null effort must be in model.reasoning_efforts (raises ValueError).
  5. Routing — for OpenAI-compat providers: "responses" in model.api_type → ResponsesEngine; else CompletionsEngine. Anthropic → AnthropicEngine. Google → GoogleEngine.
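
Condensed into pseudocode (a simplification; the real get_engine() also maps budget-style efforts to integer token counts and handles provider quirks):

from sunwaee.modules.gen.engine.models import get_model
from sunwaee.modules.gen.engine.providers.anthropic import AnthropicEngine
from sunwaee.modules.gen.engine.providers.completions import CompletionsEngine
from sunwaee.modules.gen.engine.providers.google import GoogleEngine
from sunwaee.modules.gen.engine.providers.responses import ResponsesEngine

def resolve(provider, model, effort):
    # 1. model swap
    if model.reasoning_mode == "always" and effort is None and model.non_reasoning_id:
        model = get_model(model.non_reasoning_id)
    elif model.reasoning_mode is None and effort is not None and model.reasoning_id:
        model = get_model(model.reasoning_id)
    # 2. drop effort on non-reasoning models
    if not model.supports_reasoning:
        effort = None
    # 3. off-coercion (e.g. kimi-k2.5)
    if effort is None and model.reasoning_disabled_payload and "off" in (model.reasoning_efforts or []):
        effort = "off"
    # 4. validation
    if effort is not None and effort not in (model.reasoning_efforts or []):
        raise ValueError(f"invalid reasoning_effort {effort!r} for {model.name}")
    # 5. routing
    if provider == "anthropic":
        engine_cls = AnthropicEngine
    elif provider == "google":
        engine_cls = GoogleEngine
    elif "responses" in (model.api_type or []):
        engine_cls = ResponsesEngine
    else:
        engine_cls = CompletionsEngine
    return model, effort, engine_cls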

Per-engine behavior

  • ResponsesEngine — None / "off" / "auto": omits the reasoning block. Any other effort: reasoning = {effort, summary: "auto"}.
  • CompletionsEngine — "off" + reasoning_disabled_payload → merge; otherwise omit the keys. Any other effort: payload["reasoning_effort"] = value.
  • AnthropicEngine — None → no thinking block. Any other effort: effort pass-through (newer) / budget (older).
  • GoogleEngine — None → no thinking config (or thinkingBudget=0 on Gemini 2.5). Any other effort: thinkingLevel (Gemini 3) / budget (Gemini 2.5).

Model dataclass (engine/model.py)

@dataclass class Model:
    name: str; provider: str; display_name: str; description: str | None

    # specs
    context_window: int; max_output_tokens: int | None

    # features
    supports_vision: bool
    supports_tools: bool
    supports_reasoning: bool           # True if model has any reasoning capability

    # reasoning config
    reasoning_mode: str | None         # "always" | "dynamic" | None
    reasoning_efforts: list[str] | None  # valid effort levels (e.g. ["off","low","medium","high"])
    reasoning_uses_budget: bool        # True = effort strings map to integer token budgets in factory
    reasoning_tokens_type: str | None  # "raw" | "summary" | None
    reasoning_disabled_payload: dict | None  # merged into request when reasoning is disabled
    reasoning_id: str | None           # for non-reasoning variants: name of reasoning counterpart
    non_reasoning_id: str | None       # for reasoning models: name of non-reasoning counterpart
    api_type: list[str] | None         # ["responses"] | ["completions"] | both; None for non-OpenAI-compat providers

    cache_min_tokens: int | None

    # pricing (per million tokens)
    input_price_per_mtok: float | None; output_price_per_mtok: float | None
    cache_read_price_per_mtok: float | None; cache_write_price_per_mtok: float | None
    input_price_per_mtok_128k: float | None   # xAI only
    output_price_per_mtok_128k: float | None
    input_price_per_mtok_200k: float | None   # most providers
    output_price_per_mtok_200k: float | None; ...
    input_price_per_mtok_272k: float | None   # OpenAI only
    output_price_per_mtok_272k: float | None; ...

    release_date: str | None; deprecated_at: str | None; sunset_at: str | None

reasoning_mode:

  • "always" — model always reasons; disable by swapping to non_reasoning_id (xAI models, DeepSeek Reasoner)
  • "dynamic" — reasoning can be toggled on/off via provider-specific config (Anthropic, Google, OpenAI, Moonshot)
  • None — no reasoning capability

reasoning_tokens_type:

  • "raw" — full reasoning content returned in response.reasoning_content (Anthropic, DeepSeek, kimi-k2.5, some xAI)
  • "summary" — summarised thought returned (Google, OpenAI gpt-5.x via Responses, grok-4.20)
  • None — reasoning tokens tracked internally only; content not exposed (xAI grok-4/grok-4-1-fast/grok-4-fast/grok-3-mini). Engines emit a synthetic "Reasoning in progress…" stub so consumers aren't left in the dark.

reasoning_efforts / reasoning_uses_budget:

Schema rule: "always" models → ["auto"] only, or list of levels (no "off"). "dynamic" models → ["off", ...]. None models → reasoning_efforts=None.

  • "off" always first for dynamic models; coercion in factory sets it when reasoning_effort=None is passed.
  • "auto" means "reason at model default; no effort keyword sent to the API".
  • reasoning_uses_budget=True — the efforts list is a UI/validation construct; the factory maps "low"/"medium"/"high" to integer token budgets rather than passing the string to the API. Used for Anthropic budget models (Opus/Haiku/Sonnet 4.5) and Gemini 2.5 series (flash, flash-lite).
  • Always-reasoning models with only ["auto"] (gemini-2.5-pro, grok-4, grok-3-mini, grok-code-fast-1): effort is not sent; API reasons at its own default.
  • None for models with no reasoning capability.

api_type (OpenAI-compat only): routing hint consumed by get_engine(). ["responses", "completions"] means either endpoint works; the factory picks ResponsesEngine when "responses" is present. ["completions"] (DeepSeek, Moonshot, legacy OpenAI gpt-4.x, older xAI) forces CompletionsEngine. None on Anthropic/Google (routed by provider name).


Usage

With tools

from sunwaee.core.tools import tool, ok, err
from sunwaee.modules.gen.engine.types import Tool

@tool("Return the current UTC time.")
async def get_time() -> str:
    from datetime import datetime, timezone
    return ok({"time": datetime.now(timezone.utc).isoformat()})

response = await engine.chat(messages, tools=[get_time._tool])

Tools must be async def. The agent refuses to run synchronous tools because asyncio.wait_for cannot cancel a thread once it has entered the sync body — a hung sync tool would otherwise leak threads from the default executor.

File and image attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([Message(role=Role.USER, content="Summarise.", attachments=[att])])

Supported: text/*, application/json, image/jpeg|png|gif|webp, application/pdf, .docx, .xlsx, .pptx

Size caps enforced at construction: 10 MB for images, 20 MB for every other supported type. Oversized payloads raise ValueError before any extraction or base64 encoding runs.
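
The cap fires in the constructor, so oversized data fails fast (illustrative; the 25 MB payload is chosen to clear the 20 MB cap whether it is counted in decimal or binary megabytes):

from sunwaee.modules.gen.engine.types import FileAttachment

try:
    FileAttachment(data=b"\x00" * (25 * 1024 * 1024), filename="huge.pdf")
except ValueError as exc:
    print(exc)  # raised before any extraction or base64 encoding runs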

ReAct agent loop

from sunwaee.modules.gen.agent import stream_run

new_messages = []
async for chunk in stream_run(
    messages,
    tools,
    engine,
    new_messages=new_messages,
    tool_timeout=60.0,          # seconds per individual tool call (default: 60)
    max_concurrent_tools=8,     # max tools running simultaneously (default: 8)
):
    if chunk.content:
        print(chunk.content, end="", flush=True)
# new_messages has all assistant + tool turns appended during the run

Up to 10 iterations by default. Concurrent tool calls via asyncio.gather, bounded by max_concurrent_tools. Tools must be async def — sync callables are rejected because asyncio.wait_for cannot cancel a thread once it has entered the sync body, which would leak executor threads on timeout. Unknown keyword arguments supplied by the model are silently filtered before calling the tool function.
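
The bounded fan-out described above is roughly equivalent to this sketch (illustrative only; the real loop also filters unknown kwargs and wraps results into Role.TOOL messages):

import asyncio

async def run_tools(tool_calls, tools_by_name, tool_timeout=60.0, max_concurrent_tools=8):
    sem = asyncio.Semaphore(max_concurrent_tools)

    async def run_one(call):
        async with sem:  # at most max_concurrent_tools coroutines in flight
            fn = tools_by_name[call.name]
            return await asyncio.wait_for(fn(**call.arguments), tool_timeout)

    # gather preserves order; exceptions (incl. TimeoutError) come back as values
    return await asyncio.gather(*(run_one(c) for c in tool_calls), return_exceptions=True)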

Error types

All provider errors subclass EngineError(RuntimeError), so existing except RuntimeError handlers continue to work. Import subclasses to handle specific cases:

from sunwaee.modules.gen.engine import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError as e:   # 429 — back off and retry
    ...
except AuthError as e:        # 401 / 403 — invalid key
    ...
except TransientError as e:   # 5xx — server-side; may be retried
    ...
except EngineError as e:      # other 4xx
    print(e.status_code)
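
A typical retry wrapper built on these classes (an example pattern, not part of the library):

import asyncio
from sunwaee.modules.gen.engine import RateLimitError, TransientError

async def chat_with_retry(engine, messages, attempts=3):
    for attempt in range(attempts):
        try:
            return await engine.chat(messages)
        except (RateLimitError, TransientError):
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, ... exponential backoff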

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None
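
Registry entries are plain Model dataclasses, so filtering is ordinary Python:

vision_models = [m for m in list_models() if m.supports_vision]
anthropic_reasoning = [
    m for m in list_models()
    if m.provider == "anthropic" and m.supports_reasoning
]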

Testing

pytest tests/gen/ -m "not live"                                        # unit (no keys needed)
pytest tests/gen/ -m live                                              # live (real API calls)
pytest tests/gen/ -m "not live" --cov=sunwaee --cov-report=term-missing

# run a single live test file
pytest -m live tests/gen/engine/live/test_caching.py
pytest -m live tests/gen/engine/live/test_reasoning.py

Unit test conventions:

  • Mock httpx.AsyncClient — never make real HTTP calls
  • Assert response.cost, response.usage, response.performance populated on final chunk
  • For streaming, use an async generator as mock transport

Live test files and what they cover:

File                      What it tests
test_scenarios.py         6 scenarios × 6 providers × chat + stream (72 tests)
test_tool_call_result.py  Full TOOL_CALL → execute → reply loop, all providers
test_attachments.py       PNG image attachment, vision-capable providers
test_chain.py             Three-provider conversation chain with shared history
test_caching.py           Prompt-cache hit on turn 2, static system prompt
test_reasoning.py         enable_reasoning ON / OFF per model category

Live scenarios:

Scenario          What it tests
ONLY_SYSTEM       System-only input edge case; lenient assertions
ONLY_USER         Single user message
SYSTEM_AND_USER   System prompt respected in response
TOOL_CALL         Model must issue at least one tool call
TOOL_CALL_RESULT  Full multi-turn with real tool IDs/signatures
FILE_ATTACHMENT   Text file attached; asserts content populated
CONTEXT_ROLE      Role.CONTEXT message handled without errors

All live tests default to enable_reasoning=False. test_reasoning.py is the only file that explicitly passes enable_reasoning=True.


How to add a model

File: sunwaee/modules/gen/engine/models/<provider>.py

Model(
    name="provider-model-name",
    display_name="Human Readable Name",
    provider="anthropic",
    context_window=200_000,
    max_output_tokens=64_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    cache_read_price_per_mtok=0.3,
    cache_write_price_per_mtok=3.75,
    input_price_per_mtok_200k=6.0,       # omit if no >200k tier
    output_price_per_mtok_200k=22.5,
    supports_vision=True,
    supports_tools=True,
    supports_reasoning=True,
    reasoning_mode="dynamic",             # "always" | "dynamic" | None
    reasoning_efforts=["off", "low", "medium", "high", "max"],  # dynamic: starts with "off"
    reasoning_uses_budget=False,          # True only for Anthropic 4.5 and Gemini 2.5 budget models
    reasoning_tokens_type="raw",          # "raw" | "summary" | None
    non_reasoning_id="model-non-reasoning",  # omit if no non-reasoning variant
    cache_min_tokens=1_024,              # omit (None) if caching is undocumented
    release_date="2025-01-01",
)

For a non-reasoning variant that pairs with a reasoning model:

Model(
    name="model-non-reasoning",
    ...same pricing...,
    supports_reasoning=False,
    reasoning_id="model",                 # points to the reasoning counterpart
)

Pricing tiers (engine/model.py): base required; _128k when input_tokens > 128_000 (xAI only); _200k when > 200_000; _272k when > 272_000 (OpenAI only). Thresholds are strict > — exactly at the boundary uses the lower tier.
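
In code, the tier rule reads roughly like this (an illustrative sketch; compute_cost() in engine/model.py is authoritative, and output pricing follows the same shape):

def tiered_input_price(model, input_tokens: int) -> float | None:
    per_mtok = model.input_price_per_mtok
    if model.input_price_per_mtok_128k is not None and input_tokens > 128_000:
        per_mtok = model.input_price_per_mtok_128k
    if model.input_price_per_mtok_200k is not None and input_tokens > 200_000:
        per_mtok = model.input_price_per_mtok_200k
    if model.input_price_per_mtok_272k is not None and input_tokens > 272_000:
        per_mtok = model.input_price_per_mtok_272k
    # strict >: exactly 200_000 tokens still bills at the tier below
    return None if per_mtok is None else input_tokens * per_mtok / 1_000_000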

cache_min_tokens — minimum tokens required at a cache breakpoint for prompt caching to activate. None = no caching. 0 = no minimum (caches everything). Known values:

Provider   Minimum  Models
Anthropic  4,096    Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5
Anthropic  2,048    Sonnet 4.6
Anthropic  1,024    Sonnet 4.5
OpenAI     1,024    All models (automatic prefix caching)
Google     1,024    All models (explicit context caching)
xAI        0        All models (automatic, no minimum)
DeepSeek   64       All models (automatic prefix caching)
Moonshot   0        All models (automatic, no minimum)
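
A minimal eligibility check under these semantics (a sketch, not the library's code):

def caching_eligible(model, breakpoint_tokens: int) -> bool:
    if model.cache_min_tokens is None:
        return False  # caching undocumented for this model
    return breakpoint_tokens >= model.cache_min_tokens  # 0 = no minimum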

How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py — MODELS list
  2. engine/models/__init__.py — import + add to _ALL
  3. engine/factory.py — add to _OPENAI_COMPATIBLE: dict[str, str] (env var auto-derived as PROVIDER_API_KEY)
  4. tests/gen/engine/live/_shared.py — add ("provider", "cheapest-model") to ENGINES

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py
  2. engine/providers/<provider>.py — implement BaseEngine:
    • async def chat(self, messages, tools=None) -> Response
    • async def stream(self, messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None
    • Call resolve_tokens(usage) before compute_cost
    • Strip reasoning_content/reasoning_signature from all but the last assistant turn
    • Handle system-only input: promote to Role.USER
    • On 4xx/5xx in streaming: read full body before raising
    • Buffer tool call JSON across SSE chunks; parse only on stop
  3. engine/factory.py — wire into get_engine(), handle enable_reasoning for the new provider
  4. Tests: unit (providers/test_<provider>.py) + live entry in _shared.py
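
Step 2 might start from a skeleton like this (AcmeEngine is hypothetical; only the interface comes from the checklist above, and the exact constructor shape depends on BaseEngine):

from collections.abc import AsyncIterator

from sunwaee.modules.gen.engine.base import BaseEngine
from sunwaee.modules.gen.engine.types import Message, Response

class AcmeEngine(BaseEngine):
    # Constructor should accept client: httpx.AsyncClient | None = None,
    # per the checklist.

    async def chat(self, messages: list[Message], tools=None) -> Response:
        raise NotImplementedError  # build payload, POST, resolve_tokens, compute_cost

    async def stream(self, messages: list[Message], tools=None) -> AsyncIterator[Response]:
        raise NotImplementedError  # SSE loop; buffer tool-call JSON until stop
        yield  # unreachable, but marks this as an async generator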

How to add a tool to the agent

from typing import Annotated
from sunwaee.core.tools import tool, ok, err

@tool("Search the web for current information.")
async def web_search(
    query: Annotated[str, "The search query"],
    num_results: Annotated[int, "Number of results"] = 5,
) -> str:
    try:
        return ok(_do_search(query, num_results))
    except Exception as e:
        return err(str(e))

Register: add web_search._tool to TOOLS in sunwaee/modules/gen/tools.py.

Tests: tests/gen/test_<tool_name>.py — call directly, assert JSON output shape, test error path. Never call real external APIs.


@tool decorator

Introspects signature to build JSON Schema parameters automatically.

Supports: str, int, float, bool, list[T], Literal[...], Optional[T], Annotated[T, "description"]

  • Parameters with defaults → not required
  • Both sync and async supported by the decorator (the agent itself runs async tools only)
  • Must return JSON string: ok(data) / err(message) / json.dumps(...)
ok({"id": "123"})   # '{"ok": true, "data": {"id": "123"}}'
err("Not found")    # '{"ok": false, "error": "Not found"}'

Provider-specific quirks

# Rule
1 resolve_tokens() before compute_cost() — xAI/Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates from total_tokens. Called unconditionally — it's a no-op when the counts already match.
2 Strip reasoning from all but last assistant turn — stale reasoning_signature breaks APIs and blocks mid-session provider switches.
3 OpenAI uses max_completion_tokens, not max_tokens.
4 OpenAI reasoning models: yield synthetic chunk immediately — stream is silent during thinking; Response(reasoning_content="Reasoning in progress…", synthetic=True). Never treat synthetic=True as real content.
5 Google: thoughtSignature on functionCall part → ToolCall.thought_signature; echo every subsequent turn.
6 Google: no tool call IDs — use function name as correlation ID.
7 Google streaming: ?alt=sse required on streamGenerateContent.
8 System-only input — promote system message to Role.USER (Anthropic + Google).
9 Anthropic reasoning: two paths. Newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort: X} + thinking: {type: "adaptive"}. Older models (Opus 4.5, Haiku 4.5, Sonnet 4.5) use thinking: {type: "enabled", budget_tokens: N} with 1024 ≤ budget < max_tokens. The factory selects the path via model.reasoning_uses_budget (True = budget path).
10 Connection pooling: one httpx.AsyncClient per (event loop, base_url) in factory.py.
11 Role.CONTEXT mapping: all providers wrap content in <context> tags automatically — Anthropic → {"role":"user","content":"<context>…</context>"}; OpenAI → {"role":"system","content":"<context>…</context>"}; Google → {"role":"user","parts":[{"text":"<context>…</context>"}]}.
12 Google Gemini 3 uses thinkingLevel (string: "minimal"/"low"/"medium"/"high"); Gemini 2.5 flash/flash-lite use thinkingBudget (int: 0 = off, N = fixed). Factory selects via model.reasoning_uses_budget. Gemini 2.5 Pro is always-reasoning (["auto"]) — no thinking params sent. Gemini 3.1 Pro always-reasoning with level control — thinkingLevel set when effort specified.
13 kimi-k2.5 (Moonshot) reasons by default — disabling thinking requires an explicit payload {"thinking": {"type": "disabled"}}. Set via Model.reasoning_disabled_payload; the OpenAI engine merges it when reasoning_effort is None.
14 xAI always-reasoning models (grok-4.20, grok-4-1-fast, grok-4-fast) route to a non-reasoning variant on enable_reasoning=False via non_reasoning_id. Models without a non_reasoning_id (grok-4, grok-3-mini, grok-code-fast-1) cannot have reasoning disabled.
15 grok-4.20 returns reasoning_content on chat/completions — reasoning_tokens_type="summary" refers to the /v1/responses endpoint only; on chat/completions the field carries full raw reasoning text.
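
For quirk 1, the back-calculation amounts to roughly the following (a sketch under the description above; cache-token bookkeeping is omitted):

def resolve_tokens(usage):
    # Providers that exclude reasoning tokens from output_tokens still report
    # an authoritative total_tokens; recover the true output count from it.
    reported = usage.input_tokens + usage.output_tokens
    if usage.total_tokens > reported:
        usage.output_tokens = usage.total_tokens - usage.input_tokens
    return usage  # no-op when the counts already match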
