SUNWÆE gen — multi-provider LLM engine library.


All LLMs, one response format, one dependency (httpx). Supports switching models mid-conversation (e.g. draft with GPT, refine with Claude).

Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

# reasoning_effort=None (default) — no override; swap to non_reasoning_id if reasoning_mode="always"
engine = get_engine("anthropic", "claude-sonnet-4-6")

# reasoning_effort="high" — pick any value from model.reasoning_efforts
engine_reasoning = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider    provider=     Env var
Anthropic   "anthropic"   ANTHROPIC_API_KEY
OpenAI      "openai"      OPENAI_API_KEY
Google      "google"      GOOGLE_API_KEY
DeepSeek    "deepseek"    DEEPSEEK_API_KEY
xAI         "xai"         XAI_API_KEY
Moonshot    "moonshot"    MOONSHOT_API_KEY

Directory structure

sunwaee/
├── core/
│   ├── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
│   └── tools.py                  # @tool decorator, ok(), err()
└── modules/gen/
    ├── __init__.py               # public re-exports (get_engine, run, stream_run, …)
    ├── agent.py                  # ReAct loop — run() + stream_run()
    ├── tools.py                  # TOOLS list
    └── engine/
        ├── __init__.py           # get_engine, Message, Response, Tool, …
        ├── base.py               # BaseEngine ABC
        ├── factory.py            # get_engine() — provider routing + connection pooling
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, ToolCall, Usage, Cost, Performance, …
        ├── models/               # model registry per provider
        │   ├── __init__.py       # get_model(), list_models()
        │   ├── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── completions.py    # CompletionsEngine — DeepSeek, Moonshot, OpenAI-compat fallbacks (/v1/chat/completions)
            ├── responses.py      # ResponsesEngine — OpenAI + xAI (/v1/responses)
            └── google.py         # GoogleEngine

tests/gen/
├── test_agent.py / test_stream_agent.py / test_tools.py
└── engine/
    ├── test_types.py / test_factory.py / test_model.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/
        ├── _shared.py            # shared config, data, helpers for all live tests
        ├── test_scenarios.py     # all providers × all scenarios × chat + stream
        ├── test_tool_call_result.py  # TOOL_CALL → execute → reply, all providers
        ├── test_attachments.py   # image attachments, vision-capable providers
        ├── test_chain.py         # three-provider conversation chain
        ├── test_caching.py       # prompt-cache hit on turn 2
        ├── test_reasoning.py     # reasoning ON / OFF per model category
        └── run/                  # JSON snapshots (gitignored)

Core types (engine/types.py)

class Role(Enum):       SYSTEM, USER, ASSISTANT, TOOL, CONTEXT
class StopReason(Enum): END_TURN, TOOL_USE, MAX_TOKENS

@dataclass class Message:
    role: Role
    content: str | None
    reasoning_content: str | None       # thinking for models that support it
    reasoning_signature: str | None     # opaque blob — echo back verbatim
    tool_call_id: str | None            # set on Role.TOOL messages
    tool_calls: list[ToolCall] | None
    attachments: list[FileAttachment] | None   # Role.USER only

@dataclass class Response:
    provider: str; model: str; streaming: bool; synthetic: bool
    content: str | None; reasoning_content: str | None; reasoning_signature: str | None
    tool_calls: list[ToolCall] | None; stop_reason: StopReason | None; error: Error | None
    usage: Usage | None; cost: Cost | None; performance: Performance | None

@dataclass class ToolCall:
    id: str; name: str; arguments: dict
    thought_signature: str | None    # Google only — echo back every subsequent turn
    error: str | None; duration: float; results: list[dict]

@dataclass class Usage:
    input_tokens: int; output_tokens: int; total_tokens: int
    cache_read_tokens: int; cache_write_tokens: int

@dataclass class Cost:
    input: float; output: float; cache_read: float; cache_write: float; total: float

@dataclass class Performance:
    latency: float            # seconds to first chunk
    reasoning_duration: float; content_duration: float; total_duration: float
    throughput: int           # output tokens / second

@dataclass class FileAttachment:
    data: bytes; filename: str; media_type: str = ""
    # text/* → <file name="…">…</file> block
    # image/jpeg|png|gif|webp → base64 inline
    # application/pdf|json + OOXML (docx/xlsx/pptx) → extracted text

get_engine() — reasoning + routing control

engine = get_engine(
    provider,
    model,
    api_key=None,           # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

Connection pool

get_engine() reuses a single httpx.AsyncClient per (event_loop, base_url). The pool is a WeakKeyDictionary keyed by the loop object so that dead loops (common in tests) drop their clients automatically — this avoids "Event loop is closed" errors when Python reuses an integer id() for a freshly created loop. Clients are configured with Timeout(connect=5s, read=300s, write=30s) and Limits(max_connections=50). On graceful shutdown, call:

from sunwaee.modules.gen.engine import close_all_clients

await close_all_clients()
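A minimal sketch of the pooling scheme described above, using a stand-in client class rather than httpx so the WeakKeyDictionary keying is the focus (names here are illustrative, not the library's internals):

```python
import asyncio
import weakref

class StubClient:
    """Stand-in for a configured httpx.AsyncClient."""
    def __init__(self, base_url: str):
        self.base_url = base_url
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True

# one client per (event loop, base_url); when a loop dies, its
# WeakKeyDictionary entry disappears automatically
_POOL: "weakref.WeakKeyDictionary" = weakref.WeakKeyDictionary()

def get_client(base_url: str) -> StubClient:
    loop = asyncio.get_running_loop()
    per_loop = _POOL.setdefault(loop, {})
    if base_url not in per_loop:
        per_loop[base_url] = StubClient(base_url)
    return per_loop[base_url]

async def close_all_clients() -> None:
    loop = asyncio.get_running_loop()
    for client in _POOL.pop(loop, {}).values():
        await client.aclose()
```

Keying on the loop object itself (not its id()) is what prevents a freshly created loop that happens to reuse an integer id from inheriting a dead loop's clients.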

Resolution order in get_engine()

  1. Model swap — reasoning_mode="always" + reasoning_effort=None → swap to non_reasoning_id. reasoning_mode=None + non-null effort → swap to reasoning_id.
  2. Drop effort on non-reasoning models — if the resolved model has supports_reasoning=False, effort is coerced to None (silently — the caller can't be expected to know every model's status).
  3. Off-coercion — reasoning_effort=None on a model with reasoning_disabled_payload and "off" in its efforts list is coerced to "off" so the disabled payload merges (kimi-k2.5).
  4. Validation — non-null effort must be in model.reasoning_efforts (raises ValueError).
  5. Routing — for OpenAI-compat providers: "responses" in model.api_type → ResponsesEngine; else CompletionsEngine. Anthropic → AnthropicEngine. Google → GoogleEngine.

Per-engine behavior

  • ResponsesEngine — None / "off" / "auto": omits the reasoning block. Any other effort: reasoning = {effort, summary: "auto"}.
  • CompletionsEngine — "off" + reasoning_disabled_payload: merges the payload; otherwise omits reasoning keys. Any other effort: payload["reasoning_effort"] = value.
  • AnthropicEngine — None: no thinking block. Any other effort: pass-through (newer models) / token budget (older models).
  • GoogleEngine — None: no thinking config (or thinkingBudget=0 on Gemini 2.5). Any other effort: thinkingLevel (Gemini 3) / budget (Gemini 2.5).

Model dataclass (engine/model.py)

@dataclass class Model:
    name: str; provider: str; display_name: str; description: str | None

    # specs
    context_window: int; max_output_tokens: int | None

    # features
    supports_vision: bool
    supports_tools: bool
    supports_reasoning: bool           # True if model has any reasoning capability

    # reasoning config
    reasoning_mode: str | None         # "always" | "dynamic" | None
    reasoning_efforts: list[str] | None  # valid effort levels (e.g. ["low","medium","high"])
    reasoning_tokens_type: str | None  # "raw" | "summary" | None
    reasoning_disabled_payload: dict | None  # merged into request when reasoning is disabled
    reasoning_id: str | None           # for non-reasoning variants: name of reasoning counterpart
    non_reasoning_id: str | None       # for reasoning models: name of non-reasoning counterpart
    api_type: list[str] | None         # ["responses"] | ["completions"] | both; None for non-OpenAI-compat providers

    cache_min_tokens: int | None

    # pricing (per million tokens)
    input_price_per_mtok: float | None; output_price_per_mtok: float | None
    cache_read_price_per_mtok: float | None; cache_write_price_per_mtok: float | None
    input_price_per_mtok_128k: float | None   # xAI only
    output_price_per_mtok_128k: float | None
    input_price_per_mtok_200k: float | None   # most providers
    output_price_per_mtok_200k: float | None; ...
    input_price_per_mtok_272k: float | None   # OpenAI only
    output_price_per_mtok_272k: float | None; ...

    release_date: str | None; deprecated_at: str | None; sunset_at: str | None

reasoning_mode:

  • "always" — model always reasons; disable by swapping to non_reasoning_id (xAI models, DeepSeek Reasoner)
  • "dynamic" — reasoning can be toggled on/off via provider-specific config (Anthropic, Google, OpenAI, Moonshot)
  • None — no reasoning capability

reasoning_tokens_type:

  • "raw" — full reasoning content returned in response.reasoning_content (Anthropic, DeepSeek, kimi-k2.5, some xAI)
  • "summary" — summarised thought returned (Google, OpenAI gpt-5.x via Responses, grok-4.20)
  • None — reasoning tokens tracked internally only; content not exposed (xAI grok-4/grok-4-1-fast/grok-4-fast/grok-3-mini). Engines emit a synthetic "Reasoning in progress…" stub so consumers aren't left in the dark.

reasoning_efforts:

  • List of valid effort strings accepted by get_engine(reasoning_effort=...).
  • Anthropic: ["low","medium","high","max"] or ["low","medium","high","xhigh","max"] for Opus 4.7.
  • Google Gemini 3: ["minimal","low","medium","high"] or ["low","medium","high"] depending on model.
  • OpenAI gpt-5.x: ["off","low","medium","high","xhigh"] (reasoning models) or ["minimal","low","medium","high"] (gpt-5 / gpt-5-mini / gpt-5-nano).
  • xAI always-reasoning (grok-4.20, grok-4-1-fast, grok-4-fast, grok-4, grok-3-mini, grok-code-fast-1): ["auto"] — "auto" means "reason at the model's default, no keyword sent" (always-reasoning models reject the param).
  • Moonshot (kimi-k2.5): ["off","auto"] — "off" triggers reasoning_disabled_payload; "auto" lets it reason by default.
  • None for Gemini 2.5 (uses integer budgets internally) or models with no reasoning capability.

api_type (OpenAI-compat only): routing hint consumed by get_engine(). ["responses", "completions"] means either endpoint works; the factory picks ResponsesEngine when "responses" is present. ["completions"] (DeepSeek, Moonshot, legacy OpenAI gpt-4.x, older xAI) forces CompletionsEngine. None on Anthropic/Google (routed by provider name).


Usage

With tools

from sunwaee.core.tools import tool, ok, err
from sunwaee.modules.gen.engine.types import Tool

@tool("Return the current UTC time.")
async def get_time() -> str:
    from datetime import datetime, timezone
    return ok({"time": datetime.now(timezone.utc).isoformat()})

response = await engine.chat(messages, tools=[get_time._tool])

Tools must be async def. The agent refuses to run synchronous tools because asyncio.wait_for cannot cancel a thread once it has entered the sync body — a hung sync tool would otherwise leak threads from the default executor.

File and image attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([Message(role=Role.USER, content="Summarise.", attachments=[att])])

Supported: text/*, application/json, image/jpeg|png|gif|webp, application/pdf, .docx, .xlsx, .pptx

Size caps enforced at construction: 10 MB for images, 20 MB for every other supported type. Oversized payloads raise ValueError before any extraction or base64 encoding runs.
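The construction-time check amounts to something like this (a sketch; the constant and function names are hypothetical, only the caps come from the text above):

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024   # 10 MB cap for image/* payloads
MAX_OTHER_BYTES = 20 * 1024 * 1024   # 20 MB cap for every other supported type

def check_attachment_size(data: bytes, media_type: str) -> None:
    """Raise before any extraction or base64 encoding runs."""
    cap = MAX_IMAGE_BYTES if media_type.startswith("image/") else MAX_OTHER_BYTES
    if len(data) > cap:
        raise ValueError(f"{media_type} attachment exceeds {cap} bytes")
```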

ReAct agent loop

from sunwaee.modules.gen.agent import stream_run

new_messages = []
async for chunk in stream_run(
    messages,
    tools,
    engine,
    new_messages=new_messages,
    tool_timeout=60.0,          # seconds per individual tool call (default: 60)
    max_concurrent_tools=8,     # max tools running simultaneously (default: 8)
):
    if chunk.content:
        print(chunk.content, end="", flush=True)
# new_messages has all assistant + tool turns appended during the run

Up to 10 iterations by default. Concurrent tool calls via asyncio.gather, bounded by max_concurrent_tools. Tools must be async def — sync callables are rejected because asyncio.wait_for cannot cancel a thread once it has entered the sync body, which would leak executor threads on timeout. Unknown keyword arguments supplied by the model are silently filtered before calling the tool function.

Error types

All provider errors subclass EngineError(RuntimeError), so existing except RuntimeError handlers continue to work. Import subclasses to handle specific cases:

from sunwaee.modules.gen.engine import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError as e:   # 429 — back off and retry
    ...
except AuthError as e:        # 401 / 403 — invalid key
    ...
except TransientError as e:   # 5xx — server-side; may be retried
    ...
except EngineError as e:      # other 4xx
    print(e.status_code)

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None

Testing

pytest tests/gen/ -m "not live"                                        # unit (no keys needed)
pytest tests/gen/ -m live                                              # live (real API calls)
pytest tests/gen/ -m "not live" --cov=sunwaee --cov-report=term-missing

# run a single live test file
pytest -m live tests/gen/engine/live/test_caching.py
pytest -m live tests/gen/engine/live/test_reasoning.py

Unit test conventions:

  • Mock httpx.AsyncClient — never make real HTTP calls
  • Assert response.cost, response.usage, response.performance populated on final chunk
  • For streaming, use an async generator as mock transport
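The async-generator-as-mock-transport convention can be exercised like this (names are illustrative, not the test suite's actual fixtures):

```python
import asyncio

async def fake_stream(chunks):
    """Async generator standing in for the streaming transport."""
    for chunk in chunks:
        yield chunk

async def consume(stream) -> str:
    parts = []
    async for chunk in stream:
        if chunk.get("content"):
            parts.append(chunk["content"])
    return "".join(parts)
```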

Live test files and what they cover:

File What it tests
test_scenarios.py 6 scenarios × 6 providers × chat + stream (72 tests)
test_tool_call_result.py Full TOOL_CALL → execute → reply loop, all providers
test_attachments.py PNG image attachment, vision-capable providers
test_chain.py Three-provider conversation chain with shared history
test_caching.py Prompt-cache hit on turn 2, static system prompt
test_reasoning.py enable_reasoning ON / OFF per model category

Live scenarios:

Scenario What it tests
ONLY_SYSTEM System-only input edge case; lenient assertions
ONLY_USER Single user message
SYSTEM_AND_USER System prompt respected in response
TOOL_CALL Model must issue at least one tool call
TOOL_CALL_RESULT Full multi-turn with real tool IDs/signatures
FILE_ATTACHMENT Text file attached; asserts content populated
CONTEXT_ROLE Role.CONTEXT message handled without errors

All live tests default to enable_reasoning=False. test_reasoning.py is the only file that explicitly passes enable_reasoning=True.


How to add a model

File: sunwaee/modules/gen/engine/models/<provider>.py

Model(
    name="provider-model-name",
    display_name="Human Readable Name",
    provider="anthropic",
    context_window=200_000,
    max_output_tokens=64_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    cache_read_price_per_mtok=0.3,
    cache_write_price_per_mtok=3.75,
    input_price_per_mtok_200k=6.0,       # omit if no >200k tier
    output_price_per_mtok_200k=22.5,
    supports_vision=True,
    supports_tools=True,
    supports_reasoning=True,
    reasoning_mode="dynamic",             # "always" | "dynamic" | None
    reasoning_efforts=["low", "medium", "high", "max"],  # omit if not applicable
    reasoning_tokens_type="raw",          # "raw" | "summary" | None
    non_reasoning_id="model-non-reasoning",  # omit if no non-reasoning variant
    cache_min_tokens=1_024,              # omit (None) if caching is undocumented
    release_date="2025-01-01",
)

For a non-reasoning variant that pairs with a reasoning model:

Model(
    name="model-non-reasoning",
    ...same pricing...,
    supports_reasoning=False,
    reasoning_id="model",                 # points to the reasoning counterpart
)

Pricing tiers (engine/model.py): base required; _128k when input_tokens > 128_000 (xAI only); _200k when > 200_000; _272k when > 272_000 (OpenAI only). Thresholds are strict > — exactly at the boundary uses the lower tier.

cache_min_tokens — minimum tokens required at a cache breakpoint for prompt caching to activate. None = no caching. 0 = no minimum (caches everything). Known values:

Provider Minimum Models
Anthropic 4,096 Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5
Anthropic 2,048 Sonnet 4.6
Anthropic 1,024 Sonnet 4.5
OpenAI 1,024 All models (automatic prefix caching)
Google 1,024 All models (explicit context caching)
xAI 0 All models (automatic, no minimum)
DeepSeek 64 All models (automatic prefix caching)
Moonshot 0 All models (automatic, no minimum)

How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py — MODELS list
  2. engine/models/__init__.py — import + add to _ALL
  3. engine/factory.py — add to _OPENAI_COMPATIBLE: dict[str, str] (env var auto-derived as PROVIDER_API_KEY)
  4. tests/gen/engine/live/_shared.py — add ("provider", "cheapest-model") to ENGINES

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py
  2. engine/providers/<provider>.py — implement BaseEngine:
    • async def chat(self, messages, tools=None) -> Response
    • async def stream(self, messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None
    • Call resolve_tokens(usage) before compute_cost
    • Strip reasoning_content/reasoning_signature from all but the last assistant turn
    • Handle system-only input: promote to Role.USER
    • On 4xx/5xx in streaming: read full body before raising
    • Buffer tool call JSON across SSE chunks; parse only on stop
  3. engine/factory.py — wire into get_engine(), handle enable_reasoning for the new provider
  4. Tests: unit (providers/test_<provider>.py) + live entry in _shared.py
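The "buffer tool call JSON across SSE chunks; parse only on stop" rule can be sketched as follows (the chunk shape here is illustrative, not any provider's exact wire format):

```python
import json

def parse_streamed_tool_calls(chunks):
    """Accumulate tool-call argument fragments; parse JSON only once the stream stops."""
    buffers: dict[str, dict] = {}
    order: list[str] = []
    for chunk in chunks:
        for frag in chunk.get("tool_calls", []):
            cid = frag["id"]
            if cid not in buffers:
                buffers[cid] = {"name": frag.get("name"), "args": ""}
                order.append(cid)
            if frag.get("name"):
                buffers[cid]["name"] = frag["name"]
            buffers[cid]["args"] += frag.get("arguments_delta", "")
    return [
        {"id": cid, "name": buffers[cid]["name"],
         "arguments": json.loads(buffers[cid]["args"] or "{}")}
        for cid in order
    ]
```

Parsing eagerly fails on the first fragment, since each delta is an arbitrary slice of a JSON document.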

How to add a tool to the agent

from typing import Annotated
from sunwaee.core.tools import tool, ok, err

@tool("Search the web for current information.")
async def web_search(
    query: Annotated[str, "The search query"],
    num_results: Annotated[int, "Number of results"] = 5,
) -> str:
    try:
        return ok(_do_search(query, num_results))
    except Exception as e:
        return err(str(e))

Register: add web_search._tool to TOOLS in sunwaee/modules/gen/tools.py.

Tests: tests/gen/test_<tool_name>.py — call directly, assert JSON output shape, test error path. Never call real external APIs.


@tool decorator

Introspects signature to build JSON Schema parameters automatically.

Supports: str, int, float, bool, list[T], Literal[...], Optional[T], Annotated[T, "description"]

  • Parameters with defaults → not required
  • Both sync and async functions are accepted by the decorator (the agent loop itself only runs async tools)
  • Must return JSON string: ok(data) / err(message) / json.dumps(...)
ok({"id": "123"})   # '{"ok": true, "data": {"id": "123"}}'
err("Not found")    # '{"ok": false, "error": "Not found"}'
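A minimal sketch of the kind of signature introspection the decorator performs (the helper name is hypothetical; only str/int/float/bool and Annotated descriptions are handled here):

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_parameters_schema(fn) -> dict:
    """Build a JSON Schema 'parameters' object from a tool function's signature."""
    hints = get_type_hints(fn, include_extras=True)
    props: dict = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        hint = hints.get(name, str)
        description = None
        if get_origin(hint) is Annotated:
            hint, description, *_ = get_args(hint)
        prop = {"type": _JSON_TYPES.get(hint, "string")}
        if description:
            prop["description"] = description
        props[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)   # parameters with defaults are not required
    return {"type": "object", "properties": props, "required": required}
```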

Provider-specific quirks

# Rule
1 resolve_tokens() before compute_cost() — xAI/Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates them from total_tokens. Call it unconditionally — it's a no-op when the counts already match.
2 Strip reasoning from all but last assistant turn — stale reasoning_signature breaks APIs and blocks mid-session provider switches.
3 OpenAI uses max_completion_tokens, not max_tokens.
4 OpenAI reasoning models: yield synthetic chunk immediately — stream is silent during thinking; Response(reasoning_content="Reasoning in progress…", synthetic=True). Never treat synthetic=True as real content.
5 Google: thoughtSignature on functionCall part → ToolCall.thought_signature; echo every subsequent turn.
6 Google: no tool call IDs — use function name as correlation ID.
7 Google streaming: ?alt=sse required on streamGenerateContent.
8 System-only input — promote system message to Role.USER (Anthropic + Google).
9 Anthropic reasoning: two paths. Newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort: X} + thinking: {type: "adaptive"}. Older models (Opus 4.5, Haiku 4.5, Sonnet 4.5) use thinking: {type: "enabled", budget_tokens: N} with 1024 ≤ budget < max_tokens. The factory selects the path based on whether the model has reasoning_efforts.
10 Connection pooling: one httpx.AsyncClient per (event_loop, base_url) in factory.py — keyed on the loop object itself, not its id().
11 Role.CONTEXT mapping: all providers wrap content in <context> tags automatically — Anthropic → {"role":"user","content":"<context>…</context>"}; OpenAI → {"role":"system","content":"<context>…</context>"}; Google → {"role":"user","parts":[{"text":"<context>…</context>"}]}.
12 Google Gemini 3 uses thinkingLevel (string: "minimal"/"low"/"medium"/"high"); Gemini 2.5 uses thinkingBudget (int: -1 = dynamic, 0 = off, N = fixed). The engine selects based on whether the model has reasoning_efforts. Gemini 3.1 Pro and 2.5 Pro cannot disable thinking (reasoning_mode="always").
13 kimi-k2.5 (Moonshot) reasons by default — disabling thinking requires an explicit payload {"thinking": {"type": "disabled"}}. Set via Model.reasoning_disabled_payload; the OpenAI engine merges it when reasoning_effort is None.
14 xAI always-reasoning models (grok-4.20, grok-4-1-fast, grok-4-fast) route to a non-reasoning variant on enable_reasoning=False via non_reasoning_id. Models without a non_reasoning_id (grok-4, grok-3-mini, grok-code-fast-1) cannot have reasoning disabled.
15 grok-4.20 returns reasoning_content on chat/completions — reasoning_tokens_type="summary" refers to the /v1/responses endpoint only; on chat/completions the field carries full raw reasoning text.
