
SUNWÆE gen — multi-provider LLM engine library.


All LLMs, one response format, one dependency (httpx). Supports switching providers mid-conversation.

Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # adds pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine.factory import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

engine = get_engine("anthropic", "claude-sonnet-4-6")
# or with explicit reasoning effort:
engine = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    # non-streaming
    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    # streaming
    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider    provider=     Env var
Anthropic   "anthropic"   ANTHROPIC_API_KEY
OpenAI      "openai"      OPENAI_API_KEY
Google      "google"      GOOGLE_API_KEY
DeepSeek    "deepseek"    DEEPSEEK_API_KEY
xAI         "xai"         XAI_API_KEY
Moonshot    "moonshot"    MOONSHOT_API_KEY

API key falls back to the env var when api_key= is not passed.


Directory structure

sunwaee/
├── utils/
│   └── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
└── modules/gen/
    └── engine/
        ├── base.py               # BaseEngine ABC — chat() + stream()
        ├── factory.py            # get_engine(), close_all_clients(), connection pool
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, Tool, ToolCall, Usage, Cost, Performance, FileAttachment
        ├── errors.py             # EngineError hierarchy
        ├── models/               # per-provider model registries
        │   ├── __init__.py       # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── completions.py    # CompletionsEngine  (/v1/chat/completions)
            ├── responses.py      # ResponsesEngine    (/v1/responses)
            └── google.py         # GoogleEngine

tests/gen/
└── engine/
    ├── test_types.py / test_factory.py / test_model.py / test_errors.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/                     # real API calls, excluded from CI (-m live)
        ├── _shared.py            # engines, fixtures, system prompt shared across files
        ├── test_scenarios.py     # all providers x scenarios x chat + stream
        ├── test_tool_call_result.py
        ├── test_attachments.py
        ├── test_chain.py         # three-provider conversation chain with shared history
        ├── test_caching.py
        └── test_reasoning.py

Core types

All types are defined in engine/types.py. Key ones:

Message — one turn in a conversation. role is a Role enum (SYSTEM, USER, ASSISTANT, TOOL, CONTEXT). attachments only applies to Role.USER. reasoning_content / reasoning_signature are provider-opaque — echo them back verbatim.

Response — what chat() returns and what stream() yields per chunk. Text arrives in content; reasoning in reasoning_content. The final streaming chunk carries stop_reason, usage, cost, and performance. Chunks with synthetic=True are engine-generated stubs (e.g. silent-reasoning placeholder) — never treat them as real model output.

Tool — a function the model can call. name, description, and parameters (JSON Schema object) are sent to the provider. The optional fn field is not used by the engine itself.

FileAttachment — wraps bytes + filename. Supported types: text/*, application/json, images (jpeg/png/gif/webp), and documents (pdf/docx/xlsx/pptx, requires [files] extra). Size caps enforced at construction: 10 MB for images, 20 MB for documents. See types.py for the full list of accepted MIME types.

Usage / Cost / Performance — token counts, dollar cost, and timing (latency, throughput, reasoning vs content split). Field names are in types.py.
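The cost computation reduces to token counts times per-million-token prices. A minimal sketch, assuming per-million pricing; the field names here mirror the docs' description, but the real definitions live in engine/types.py and engine/model.py:

```python
from dataclasses import dataclass

# Illustrative stand-in for the library's Usage type (real one: engine/types.py).
@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

def compute_cost(usage: Usage, price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost from token counts and per-million-token prices."""
    return (usage.input_tokens * price_in_per_m
            + usage.output_tokens * price_out_per_m) / 1_000_000

# 1,000 input tokens at $3/M plus 500 output tokens at $15/M
cost = compute_cost(Usage(input_tokens=1_000, output_tokens=500), 3.0, 15.0)
```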


get_engine()

from sunwaee.modules.gen.engine.factory import get_engine, close_all_clients

engine = get_engine(
    provider,           # "anthropic" | "openai" | "google" | "deepseek" | "xai" | "moonshot"
    model,              # model name string
    api_key=None,       # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

# call once on graceful shutdown to drain all pooled connections
await close_all_clients()

get_engine() reuses a single httpx.AsyncClient per (event_loop, base_url) (WeakKeyDictionary — dead loops drop their clients automatically). See factory.py for timeout and pool limits.
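The per-(event_loop, base_url) reuse can be sketched with a WeakKeyDictionary keyed by the loop, so a dead loop drops its clients automatically. This is a shape sketch, not the library's code; a plain dict stands in for httpx.AsyncClient:

```python
import weakref

# One inner dict of clients per live event loop; dead loops are garbage-collected
# out of the WeakKeyDictionary along with their clients.
_POOL: "weakref.WeakKeyDictionary" = weakref.WeakKeyDictionary()

def get_pooled_client(loop, base_url: str, make_client=dict):
    """Return the client for (loop, base_url), creating it on first use."""
    per_loop = _POOL.setdefault(loop, {})
    if base_url not in per_loop:
        per_loop[base_url] = make_client()  # e.g. httpx.AsyncClient(base_url=base_url)
    return per_loop[base_url]
```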

Resolution order

  1. Effort coercion — reasoning_effort=None on a dynamic model that lists "off" is coerced to "off" (e.g. kimi-k2.5; coercion merges reasoning_disabled_payload to disable thinking). Models that use "none" as the disable wire value (gpt-5.x) do not coerce — None leaves the reasoning block absent, which lets the model use its default.
  2. Wire-model swap — for reasoning_mode="dynamic" models: effort in (None, "off") swaps to non_reasoning_id; any other effort swaps to reasoning_id. No swap occurs when the target variant is not defined.
  3. Validation — non-null effort must appear in model.reasoning_efforts (raises ValueError).
  4. Routing — OpenAI-compat: "responses" in model.api_type -> ResponsesEngine, else CompletionsEngine. Anthropic -> AnthropicEngine. Google -> GoogleEngine.

Model dataclass

Defined in engine/model.py. Reasoning-relevant fields:

Field                             Meaning
reasoning_mode                    "always" / "dynamic" / None
reasoning_efforts                 valid effort strings; "always" models have no "off"; "dynamic" models that disable via model swap start with "off"; OpenAI gpt-5.x use "none" as the wire disable value
reasoning_uses_budget             True = factory maps effort strings to integer token budgets (Anthropic 4.5, Gemini 2.5 flash)
reasoning_tokens_type             "raw" / "summary" / None (None = silent reasoning; the engine emits a synthetic stub)
reasoning_disabled_payload        merged into the request when reasoning is explicitly disabled
reasoning_id / non_reasoning_id   paired variant names for model swapping
api_type                          ["responses"] / ["completions"] / both -- routing hint for OpenAI-compat providers

Pricing fields and the full field list are in engine/model.py.
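For shape, an illustrative registry entry using the fields from the table above. Every value here is hypothetical; consult engine/model.py for the actual field list and defaults:

```python
# engine/models/<provider>.py (illustrative fragment, all values made up)
EXAMPLE = Model(
    name="example-model",
    reasoning_mode="dynamic",
    reasoning_efforts=["off", "low", "medium", "high"],  # "off" first: disables via swap
    reasoning_id="example-model-thinking",
    non_reasoning_id="example-model",
    reasoning_tokens_type="raw",
    api_type=["completions"],
)
```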


Usage

Tool calls

Construct Tool objects with a JSON Schema parameters dict and pass them to chat() / stream():

from sunwaee.modules.gen.engine.types import Tool

weather_tool = Tool(
    name="get_weather",
    description="Return current weather for a location.",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)

response = await engine.chat(messages, tools=[weather_tool])
if response.tool_calls:
    for tc in response.tool_calls:
        print(tc.name, tc.arguments)

File attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([
    Message(role=Role.USER, content="Summarise this.", attachments=[att])
])

Error handling

All provider errors subclass EngineError(RuntimeError). Import from engine/errors.py:

from sunwaee.modules.gen.engine.errors import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError:   # 429
    ...
except AuthError:        # 401 / 403
    ...
except TransientError:   # 5xx
    ...
except EngineError as e:
    print(e.status_code)

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None

Logging

Set SUNWAEE_LOG_LEVEL=debug (or info / warning / error) to enable logs. All engine logs are at DEBUG -- request start/completion, model resolution decisions. See utils/logger.py.
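Since loggers are scoped under "sunwaee.*", setting a level on the parent logger is enough for every module to inherit it. A stdlib sketch of how the env var might map onto logging levels (the helper name is illustrative; the real one is utils/logger.py):

```python
import logging
import os

def configure_sunwaee_logging() -> None:
    """Map SUNWAEE_LOG_LEVEL (debug/info/warning/error) onto the 'sunwaee' parent logger."""
    level_name = os.environ.get("SUNWAEE_LOG_LEVEL", "warning").upper()
    logging.getLogger("sunwaee").setLevel(getattr(logging, level_name, logging.WARNING))

os.environ["SUNWAEE_LOG_LEVEL"] = "debug"
configure_sunwaee_logging()
# Child loggers such as "sunwaee.modules.gen" inherit the parent's level.
effective = logging.getLogger("sunwaee.modules.gen").getEffectiveLevel()
```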


Testing

venv/bin/pytest                 # unit tests (no API keys needed)
venv/bin/pytest -m live         # live tests (real API calls)
venv/bin/pytest -m "not live"   # explicit unit-only

Unit test conventions: mock httpx.AsyncClient, never make real HTTP calls. Assert usage, cost, and performance are populated on the final streaming chunk.
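A stand-in client for that convention can be built with unittest.mock.AsyncMock. The response shape below is a generic stub, not the library's fixture code:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

def make_fake_client(payload: dict):
    """Build a stand-in for httpx.AsyncClient whose post() returns a canned response."""
    response = MagicMock()
    response.status_code = 200
    response.json.return_value = payload
    client = AsyncMock()
    client.post.return_value = response  # await client.post(...) -> response
    return client

async def demo():
    client = make_fake_client({"content": "hi"})
    resp = await client.post("https://api.example/v1/chat")
    return resp.json()
```

An engine under test would receive such a client via its client= parameter, so no real HTTP call is ever made.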

Live test files:

File                        What it covers
test_scenarios.py           all providers x scenarios x chat() + stream()
test_tool_call_result.py    full tool call -> execute -> reply loop
test_attachments.py         image attachments, vision-capable providers
test_chain.py               three-provider conversation chain with shared history
test_caching.py             prompt-cache hit on turn 2
test_reasoning.py           reasoning_effort on/off per model category

How to add a model

Add a Model(...) entry to engine/models/<provider>.py and ensure it is included in that file's MODELS list (imported by engine/models/__init__.py). Field reference is in engine/model.py. Then run psql/scripts/sync_models.py to mirror the change to the database.

Key rules:

  • reasoning_mode="dynamic" models that disable reasoning by swapping to a non-reasoning variant list "off" first in reasoning_efforts (e.g. kimi-k2.5). OpenAI gpt-5.x models that disable reasoning via {"reasoning": {"effort": "none"}} on the same model list "none" first instead — do NOT use "off" for these.
  • reasoning_uses_budget=True only for Anthropic 4.5 series and Gemini 2.5 flash/flash-lite.
  • api_type=["responses", "completions"] for OpenAI models that support both endpoints; ["completions"] for OpenAI-compat providers (xAI, DeepSeek, Moonshot). Omit for Anthropic and Google.
  • Pricing tiers: base required; _200k when context > 200k tokens; _128k for xAI; _272k for OpenAI. Thresholds are strict >.
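The strict > thresholds in the last rule can be sketched as a small selector. The tier suffixes come from the rule above; the function name and dict are illustrative, not the library's implementation:

```python
# Provider-specific thresholds; everything else uses the generic 200k tier.
_THRESHOLDS = {"xai": (128_000, "_128k"), "openai": (272_000, "_272k")}

def pricing_tier(provider: str, input_tokens: int) -> str:
    """Pick a pricing tier suffix; thresholds are strict '>' per the rule above."""
    threshold, suffix = _THRESHOLDS.get(provider, (200_000, "_200k"))
    return suffix if input_tokens > threshold else "base"
```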

How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py -- MODELS list.
  2. engine/models/__init__.py -- import and add to _ALL.
  3. engine/factory.py -- add to _OPENAI_COMPATIBLE dict ("provider": "https://base-url/v1"). The env var is derived automatically as PROVIDER_API_KEY.
  4. tests/gen/engine/live/_shared.py -- add ("provider", "model-name") to ENGINES.
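Step 3 amounts to one dict entry. An illustrative fragment; the "acme" provider and its URL are made up, and the existing entry is shown only to indicate the shape:

```python
# engine/factory.py (illustrative fragment)
_OPENAI_COMPATIBLE = {
    "xai": "https://api.x.ai/v1",           # existing entry, for shape
    "acme": "https://api.acme.example/v1",  # new provider -> ACME_API_KEY derived automatically
}
```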

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py.
  2. engine/providers/<provider>.py -- implement BaseEngine:
    • async def chat(messages, tools=None) -> Response
    • async def stream(messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None -- factory.py injects a pooled client.
    • Call resolve_tokens() before compute_cost() -- some providers exclude reasoning tokens from output_tokens.
    • Strip reasoning_content / reasoning_signature from all but the last assistant turn.
    • Promote system-only input to Role.USER if the provider rejects system-only requests.
    • On 4xx/5xx during streaming: read the full body before raising.
    • Buffer tool call JSON across SSE chunks; parse only on stop.
  3. engine/factory.py -- wire into get_engine().
  4. Tests: unit (providers/test_<provider>.py) + live entry in _shared.py.
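The "buffer tool call JSON across SSE chunks" rule can be sketched in isolation. The delta shape below (index / name / arguments fragments) loosely follows the Chat Completions streaming format and is an assumption, not the library's internal type:

```python
import json

def buffer_tool_calls(deltas):
    """Accumulate per-index tool-call argument fragments across SSE chunks,
    parsing each buffered JSON string only once the stream stops."""
    buffers: dict[int, dict] = {}
    for delta in deltas:
        slot = buffers.setdefault(delta["index"], {"name": None, "arguments": ""})
        if delta.get("name"):
            slot["name"] = delta["name"]
        slot["arguments"] += delta.get("arguments", "")
    # Parse only now: individual fragments are rarely valid JSON on their own.
    return [
        {"name": b["name"], "arguments": json.loads(b["arguments"])}
        for b in buffers.values()
    ]
```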

Provider-specific notes

  • resolve_tokens() before compute_cost() -- xAI and Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates from total_tokens.
  • Strip reasoning from all but the last assistant turn -- stale reasoning_signature breaks APIs.
  • OpenAI uses max_completion_tokens, not max_tokens (CompletionsEngine); max_output_tokens for ResponsesEngine.
  • Silent-reasoning models (grok-4, grok-4-1-fast, grok-4-fast, grok-3-mini) -- stream is silent during thinking; engines yield a synthetic Response(reasoning_content="Reasoning in progress...", synthetic=True) immediately.
  • Google: no tool call IDs -- function name used as correlation ID. thoughtSignature on functionCall parts must be echoed back on every subsequent assistant turn.
  • Google streaming -- ?alt=sse required on streamGenerateContent.
  • Anthropic reasoning: two paths -- newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort} + thinking: {type: "adaptive"}; older budget models use thinking: {type: "enabled", budget_tokens: N}. Selected via model.reasoning_uses_budget.
  • Anthropic top-level cache_control -- payload["cache_control"] = {"type": "ephemeral"} at request root enables auto-caching. Do not remove.
  • Foreign reasoning_signature detection -- Anthropic and Google drop signatures that start with [ (ResponsesEngine JSON list format). Echoing them causes base64 decode failures.
  • OpenAI ResponsesEngine caching -- the Responses API does not do automatic prefix caching by server-side routing without a hint. ResponsesEngine computes a prompt_cache_key as SHA-256[:32] of the system prompt content and sends it with every request. This pins all requests sharing the same system prompt to the same cache server, enabling prefix-cache hits without previous_response_id or store=True.
  • DeepSeek cache tokens -- DeepSeek exposes prompt_cache_hit_tokens at the top level of the usage object instead of prompt_tokens_details.cached_tokens. CompletionsEngine reads both fields, preferring the standard OpenAI field.
  • OpenAI gpt-5.x reasoning effort -- "none" is the wire value to disable reasoning (not "off"). Sending no reasoning block defaults to the model's built-in default effort. "off" is rejected by these models' reasoning_efforts list and raises ValueError at the factory.
  • OpenAI reasoning effort: xhigh -- gpt-5.x and gpt-5.4.x support "none" | "low" | "medium" | "high" | "xhigh". The effort is forwarded verbatim in {"reasoning": {"effort": value}} by ResponsesEngine.
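The ResponsesEngine cache-key note above reduces to a few lines of hashlib; a sketch, assuming "SHA-256[:32]" means the first 32 hex characters of the digest:

```python
import hashlib

def prompt_cache_key(system_prompt: str) -> str:
    """Deterministic key: first 32 hex chars of SHA-256 of the system prompt."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:32]

key = prompt_cache_key("You are a helpful assistant.")
```

Because the key depends only on the system prompt, every request sharing that prompt carries the same prompt_cache_key and routes to the same cache server.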
