
SUNWÆE gen — multi-provider LLM engine library.


All LLMs, one response format, one dependency (httpx). Supports switching providers mid-conversation.

Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # adds pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine.factory import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

engine = get_engine("anthropic", "claude-sonnet-4-6")
# or with explicit reasoning effort:
engine = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    # non-streaming
    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    # streaming
    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider    provider=     Env var
Anthropic   "anthropic"   ANTHROPIC_API_KEY
OpenAI      "openai"      OPENAI_API_KEY
Google      "google"      GOOGLE_API_KEY
DeepSeek    "deepseek"    DEEPSEEK_API_KEY
xAI         "xai"         XAI_API_KEY
Moonshot    "moonshot"    MOONSHOT_API_KEY

API key falls back to the env var when api_key= is not passed.


Directory structure

sunwaee/
├── utils/
│   └── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
└── modules/gen/
    └── engine/
        ├── base.py               # BaseEngine ABC — chat() + stream()
        ├── factory.py            # get_engine(), close_all_clients(), connection pool
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, Tool, ToolCall, Usage, Cost, Performance, FileAttachment
        ├── errors.py             # EngineError hierarchy
        ├── models/               # per-provider model registries
        │   ├── __init__.py       # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── completions.py    # CompletionsEngine  (/v1/chat/completions)
            ├── responses.py      # ResponsesEngine    (/v1/responses)
            └── google.py         # GoogleEngine

tests/gen/
└── engine/
    ├── test_types.py / test_factory.py / test_model.py / test_errors.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/                     # real API calls, excluded from CI (-m live)
        ├── _shared.py            # engines, fixtures, system prompt shared across files
        ├── test_scenarios.py     # all providers x scenarios x chat + stream
        ├── test_tool_call_result.py
        ├── test_attachments.py
        ├── test_chain.py         # three-provider conversation chain with shared history
        ├── test_caching.py
        └── test_reasoning.py

Core types

All types are defined in engine/types.py. Key ones:

Message — one turn in a conversation. role is a Role enum (SYSTEM, USER, ASSISTANT, TOOL, CONTEXT). attachments only applies to Role.USER. reasoning_content / reasoning_signature are provider-opaque — echo them back verbatim.

Response — what chat() returns and what stream() yields per chunk. Text arrives in content; reasoning in reasoning_content. The final streaming chunk carries stop_reason, usage, cost, and performance. Chunks with synthetic=True are engine-generated stubs (e.g. silent-reasoning placeholder) — never treat them as real model output.

Tool — a function the model can call. name, description, and parameters (JSON Schema object) are sent to the provider. The optional fn field is not used by the engine itself.

FileAttachment — wraps bytes + filename. Supported types: text/*, application/json, images (jpeg/png/gif/webp), and documents (pdf/docx/xlsx/pptx, requires [files] extra). Size caps enforced at construction: 10 MB for images, 20 MB for documents. See types.py for the full list of accepted MIME types.

Usage / Cost / Performance — token counts, dollar cost, and timing (latency, throughput, reasoning vs content split). Field names are in types.py.
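The cost computation reduces to token counts times per-million-token prices. A minimal sketch, assuming per-million pricing; the field names here mirror the docs' description, but the real definitions live in engine/types.py and engine/model.py:

```python
from dataclasses import dataclass

# Illustrative stand-in for the library's Usage type (real one: engine/types.py).
@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

def compute_cost(usage: Usage, price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost from token counts and per-million-token prices."""
    return (usage.input_tokens * price_in_per_m
            + usage.output_tokens * price_out_per_m) / 1_000_000

# 1,000 input tokens at $3/M plus 500 output tokens at $15/M
cost = compute_cost(Usage(input_tokens=1_000, output_tokens=500), 3.0, 15.0)
```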


get_engine()

from sunwaee.modules.gen.engine.factory import get_engine, close_all_clients

engine = get_engine(
    provider,           # "anthropic" | "openai" | "google" | "deepseek" | "xai" | "moonshot"
    model,              # model name string
    api_key=None,       # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

# call once on graceful shutdown to drain all pooled connections
await close_all_clients()

get_engine() reuses a single httpx.AsyncClient per (event_loop, base_url) (WeakKeyDictionary — dead loops drop their clients automatically). See factory.py for timeout and pool limits.
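The per-(event_loop, base_url) reuse can be sketched with a WeakKeyDictionary keyed by the loop, so a dead loop drops its clients automatically. This is a shape sketch, not the library's code; a plain dict stands in for httpx.AsyncClient:

```python
import weakref

# One inner dict of clients per live event loop; dead loops are garbage-collected
# out of the WeakKeyDictionary along with their clients.
_POOL: "weakref.WeakKeyDictionary" = weakref.WeakKeyDictionary()

def get_pooled_client(loop, base_url: str, make_client=dict):
    """Return the client for (loop, base_url), creating it on first use."""
    per_loop = _POOL.setdefault(loop, {})
    if base_url not in per_loop:
        per_loop[base_url] = make_client()  # e.g. httpx.AsyncClient(base_url=base_url)
    return per_loop[base_url]
```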

Resolution order

  1. Effort coercion — reasoning_effort=None on a dynamic model that lists "off" is coerced to "off" (e.g. kimi-k2.5; coercion merges reasoning_disabled_payload to disable thinking). Models that use "none" as the disable wire value (gpt-5.x) do not coerce — None leaves the reasoning block absent, which lets the model use its default.
  2. Wire-model swap — for reasoning_mode="dynamic" models: effort in (None, "off") swaps to non_reasoning_id; any other effort swaps to reasoning_id. No swap occurs when the target variant is not defined.
  3. Validation — non-null effort must appear in model.reasoning_efforts (raises ValueError).
  4. Routing — OpenAI-compat: "responses" in model.api_type -> ResponsesEngine, else CompletionsEngine. Anthropic -> AnthropicEngine. Google -> GoogleEngine.

Model dataclass

Defined in engine/model.py. Reasoning-relevant fields:

Field                             Meaning
reasoning_mode                    "always" / "dynamic" / None
reasoning_efforts                 valid effort strings; "always" models have no "off"; "dynamic" models that disable via model swap start with "off"; OpenAI gpt-5.x use "none" as the wire disable value
reasoning_uses_budget             True = factory maps effort strings to integer token budgets (Anthropic 4.5, Gemini 2.5 flash)
reasoning_tokens_type             "raw" / "summary" / None (None = silent reasoning; the engine emits a synthetic stub)
reasoning_disabled_payload        merged into the request when reasoning is explicitly disabled
reasoning_id / non_reasoning_id   paired variant names for model swapping
api_type                          ["responses"] / ["completions"] / both -- routing hint for OpenAI-compat providers

Pricing fields and the full field list are in engine/model.py.
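For shape, an illustrative registry entry using the fields from the table above. Every value here is hypothetical; consult engine/model.py for the actual field list and defaults:

```python
# engine/models/<provider>.py (illustrative fragment, all values made up)
EXAMPLE = Model(
    name="example-model",
    reasoning_mode="dynamic",
    reasoning_efforts=["off", "low", "medium", "high"],  # "off" first: disables via swap
    reasoning_id="example-model-thinking",
    non_reasoning_id="example-model",
    reasoning_tokens_type="raw",
    api_type=["completions"],
)
```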


Usage

Tool calls

Construct Tool objects with a JSON Schema parameters dict and pass them to chat() / stream():

from sunwaee.modules.gen.engine.types import Tool

weather_tool = Tool(
    name="get_weather",
    description="Return current weather for a location.",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)

response = await engine.chat(messages, tools=[weather_tool])
if response.tool_calls:
    for tc in response.tool_calls:
        print(tc.name, tc.arguments)

File attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([
    Message(role=Role.USER, content="Summarise this.", attachments=[att])
])

Error handling

All provider errors subclass EngineError(RuntimeError). Import from engine/errors.py:

from sunwaee.modules.gen.engine.errors import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError:   # 429
    ...
except AuthError:        # 401 / 403
    ...
except TransientError:   # 5xx
    ...
except EngineError as e:
    print(e.status_code)

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None

Logging

Set SUNWAEE_LOG_LEVEL=debug (or info / warning / error) to enable logs. All engine logs are at DEBUG -- request start/completion, model resolution decisions. See utils/logger.py.
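Since loggers are scoped under "sunwaee.*", setting a level on the parent logger is enough for every module to inherit it. A stdlib sketch of how the env var might map onto logging levels (the helper name is illustrative; the real one is utils/logger.py):

```python
import logging
import os

def configure_sunwaee_logging() -> None:
    """Map SUNWAEE_LOG_LEVEL (debug/info/warning/error) onto the 'sunwaee' parent logger."""
    level_name = os.environ.get("SUNWAEE_LOG_LEVEL", "warning").upper()
    logging.getLogger("sunwaee").setLevel(getattr(logging, level_name, logging.WARNING))

os.environ["SUNWAEE_LOG_LEVEL"] = "debug"
configure_sunwaee_logging()
# Child loggers such as "sunwaee.modules.gen" inherit the parent's level.
effective = logging.getLogger("sunwaee.modules.gen").getEffectiveLevel()
```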


Testing

venv/bin/pytest                 # unit tests (no API keys needed)
venv/bin/pytest -m live         # live tests (real API calls)
venv/bin/pytest -m "not live"   # explicit unit-only

Unit test conventions: mock httpx.AsyncClient, never make real HTTP calls. Assert usage, cost, and performance are populated on the final streaming chunk.
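A stand-in client for that convention can be built with unittest.mock.AsyncMock. The response shape below is a generic stub, not the library's fixture code:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

def make_fake_client(payload: dict):
    """Build a stand-in for httpx.AsyncClient whose post() returns a canned response."""
    response = MagicMock()
    response.status_code = 200
    response.json.return_value = payload
    client = AsyncMock()
    client.post.return_value = response  # await client.post(...) -> response
    return client

async def demo():
    client = make_fake_client({"content": "hi"})
    resp = await client.post("https://api.example/v1/chat")
    return resp.json()
```

An engine under test would receive such a client via its client= parameter, so no real HTTP call is ever made.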

Live test files:

File                        What it covers
test_scenarios.py           all providers x scenarios x chat() + stream()
test_tool_call_result.py    full tool call -> execute -> reply loop
test_attachments.py         image attachments, vision-capable providers
test_chain.py               three-provider conversation chain with shared history
test_caching.py             prompt-cache hit on turn 2
test_reasoning.py           reasoning_effort on/off per model category

How to add a model

Add a Model(...) entry to engine/models/<provider>.py and ensure it is included in that file's MODELS list (imported by engine/models/__init__.py). Field reference is in engine/model.py. Then run psql/scripts/sync_models.py to mirror the change to the database.

Key rules:

  • reasoning_mode="dynamic" models that disable reasoning by swapping to a non-reasoning variant list "off" first in reasoning_efforts (e.g. kimi-k2.5). OpenAI gpt-5.x models that disable reasoning via {"reasoning": {"effort": "none"}} on the same model list "none" first instead — do NOT use "off" for these.
  • reasoning_uses_budget=True only for Anthropic 4.5 series and Gemini 2.5 flash/flash-lite.
  • api_type=["responses", "completions"] for OpenAI models that support both endpoints; ["completions"] for OpenAI-compat providers (xAI, DeepSeek, Moonshot). Omit for Anthropic and Google.
  • Pricing tiers: base required; _200k when context > 200k tokens; _128k for xAI; _272k for OpenAI. Thresholds are strict >.
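The strict > thresholds in the last rule can be sketched as a small selector. The tier suffixes come from the rule above; the function name and dict are illustrative, not the library's implementation:

```python
# Provider-specific thresholds; everything else uses the generic 200k tier.
_THRESHOLDS = {"xai": (128_000, "_128k"), "openai": (272_000, "_272k")}

def pricing_tier(provider: str, input_tokens: int) -> str:
    """Pick a pricing tier suffix; thresholds are strict '>' per the rule above."""
    threshold, suffix = _THRESHOLDS.get(provider, (200_000, "_200k"))
    return suffix if input_tokens > threshold else "base"
```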

How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py -- MODELS list.
  2. engine/models/__init__.py -- import and add to _ALL.
  3. engine/factory.py -- add to _OPENAI_COMPATIBLE dict ("provider": "https://base-url/v1"). The env var is derived automatically as PROVIDER_API_KEY.
  4. tests/gen/engine/live/_shared.py -- add ("provider", "model-name") to ENGINES.
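Step 3 amounts to one dict entry. An illustrative fragment; the "acme" provider and its URL are made up, and the existing entry is shown only to indicate the shape:

```python
# engine/factory.py (illustrative fragment)
_OPENAI_COMPATIBLE = {
    "xai": "https://api.x.ai/v1",           # existing entry, for shape
    "acme": "https://api.acme.example/v1",  # new provider -> ACME_API_KEY derived automatically
}
```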

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py.
  2. engine/providers/<provider>.py -- implement BaseEngine:
    • async def chat(messages, tools=None) -> Response
    • async def stream(messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None -- factory.py injects a pooled client.
    • Call resolve_tokens() before compute_cost() -- some providers exclude reasoning tokens from output_tokens.
    • Strip reasoning_content / reasoning_signature from all but the last assistant turn.
    • Promote system-only input to Role.USER if the provider rejects system-only requests.
    • On 4xx/5xx during streaming: read the full body before raising.
    • Buffer tool call JSON across SSE chunks; parse only on stop.
  3. engine/factory.py -- wire into get_engine().
  4. Tests: unit (providers/test_<provider>.py) + live entry in _shared.py.
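The "buffer tool call JSON across SSE chunks" rule can be sketched in isolation. The delta shape below (index / name / arguments fragments) loosely follows the Chat Completions streaming format and is an assumption, not the library's internal type:

```python
import json

def buffer_tool_calls(deltas):
    """Accumulate per-index tool-call argument fragments across SSE chunks,
    parsing each buffered JSON string only once the stream stops."""
    buffers: dict[int, dict] = {}
    for delta in deltas:
        slot = buffers.setdefault(delta["index"], {"name": None, "arguments": ""})
        if delta.get("name"):
            slot["name"] = delta["name"]
        slot["arguments"] += delta.get("arguments", "")
    # Parse only now: individual fragments are rarely valid JSON on their own.
    return [
        {"name": b["name"], "arguments": json.loads(b["arguments"])}
        for b in buffers.values()
    ]
```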

Provider-specific notes

  • resolve_tokens() before compute_cost() -- xAI and Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates from total_tokens.
  • Strip reasoning from all but the last assistant turn -- stale reasoning_signature breaks APIs.
  • OpenAI uses max_completion_tokens, not max_tokens (CompletionsEngine); max_output_tokens for ResponsesEngine.
  • Silent-reasoning models (grok-4, grok-4-1-fast, grok-4-fast, grok-3-mini) -- stream is silent during thinking; engines yield a synthetic Response(reasoning_content="Reasoning in progress...", synthetic=True) immediately.
  • Google: no tool call IDs -- function name used as correlation ID. thoughtSignature on functionCall parts must be echoed back on every subsequent assistant turn.
  • Google streaming -- ?alt=sse required on streamGenerateContent.
  • Anthropic reasoning: two paths -- newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort} + thinking: {type: "adaptive"}; older budget models use thinking: {type: "enabled", budget_tokens: N}. Selected via model.reasoning_uses_budget.
  • Anthropic top-level cache_control -- payload["cache_control"] = {"type": "ephemeral"} at request root enables auto-caching. Do not remove.
  • Foreign reasoning_signature detection -- Anthropic and Google drop signatures that start with [ (ResponsesEngine JSON list format). Echoing them causes base64 decode failures.
  • OpenAI ResponsesEngine caching -- the Responses API does not do automatic prefix caching by server-side routing without a hint. ResponsesEngine computes a prompt_cache_key as SHA-256[:32] of the system prompt content and sends it with every request. This pins all requests sharing the same system prompt to the same cache server, enabling prefix-cache hits without previous_response_id or store=True.
  • DeepSeek cache tokens -- DeepSeek exposes prompt_cache_hit_tokens at the top level of the usage object instead of prompt_tokens_details.cached_tokens. CompletionsEngine reads both fields, preferring the standard OpenAI field.
  • OpenAI gpt-5.x reasoning effort -- "none" is the wire value to disable reasoning (not "off"). Sending no reasoning block defaults to the model's built-in default effort. "off" is rejected by these models' reasoning_efforts list and raises ValueError at the factory.
  • OpenAI reasoning effort: xhigh -- gpt-5.x and gpt-5.4.x support "none" | "low" | "medium" | "high" | "xhigh". The effort is forwarded verbatim in {"reasoning": {"effort": value}} by ResponsesEngine.
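The ResponsesEngine cache-key note above reduces to a few lines of hashlib; a sketch, assuming "SHA-256[:32]" means the first 32 hex characters of the digest:

```python
import hashlib

def prompt_cache_key(system_prompt: str) -> str:
    """Deterministic key: first 32 hex chars of SHA-256 of the system prompt."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:32]

key = prompt_cache_key("You are a helpful assistant.")
```

Because the key depends only on the system prompt, every request sharing that prompt carries the same prompt_cache_key and routes to the same cache server.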
