SUNWÆE gen — multi-provider LLM engine library.

All LLMs, one response format, one dependency (httpx). Supports switching providers mid-conversation.

Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # adds pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine.factory import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

engine = get_engine("anthropic", "claude-sonnet-4-6")
# or with explicit reasoning effort:
engine = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    # non-streaming
    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    # streaming
    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider   provider=    Env var
Anthropic  "anthropic"  ANTHROPIC_API_KEY
OpenAI     "openai"     OPENAI_API_KEY
Google     "google"     GOOGLE_API_KEY
DeepSeek   "deepseek"   DEEPSEEK_API_KEY
xAI        "xai"        XAI_API_KEY
Moonshot   "moonshot"   MOONSHOT_API_KEY

The API key falls back to the matching <PROVIDER>_API_KEY env var when api_key= is not passed.


Directory structure

sunwaee/
├── utils/
│   └── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
└── modules/gen/
    └── engine/
        ├── base.py               # BaseEngine ABC — chat() + stream()
        ├── factory.py            # get_engine(), close_all_clients(), connection pool
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, Tool, ToolCall, Usage, Cost, Performance, FileAttachment
        ├── errors.py             # EngineError hierarchy
        ├── models/               # per-provider model registries
        │   ├── __init__.py       # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── completions.py    # CompletionsEngine  (/v1/chat/completions)
            ├── responses.py      # ResponsesEngine    (/v1/responses)
            └── google.py         # GoogleEngine

tests/gen/
└── engine/
    ├── test_types.py / test_factory.py / test_model.py / test_errors.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/                     # real API calls, excluded from CI (-m live)
        ├── _shared.py            # engines, fixtures, system prompt shared across files
        ├── test_scenarios.py     # all providers x scenarios x chat + stream
        ├── test_tool_call_result.py
        ├── test_attachments.py
        ├── test_chain.py         # three-provider conversation chain with shared history
        ├── test_caching.py
        └── test_reasoning.py

Core types

All types are defined in engine/types.py. Key ones:

Message — one turn in a conversation. role is a Role enum (SYSTEM, USER, ASSISTANT, TOOL, CONTEXT). attachments only applies to Role.USER. reasoning_content / reasoning_signature are provider-opaque — echo them back verbatim.

Response — what chat() returns and what stream() yields per chunk. Text arrives in content; reasoning in reasoning_content. The final streaming chunk carries stop_reason, usage, cost, and performance. Chunks with synthetic=True are engine-generated stubs (e.g. silent-reasoning placeholder) — never treat them as real model output.

Tool — a function the model can call. name, description, and parameters (JSON Schema object) are sent to the provider. The optional fn field is not used by the engine itself.

FileAttachment — wraps bytes + filename. Supported types: text/*, application/json, images (jpeg/png/gif/webp), and documents (pdf/docx/xlsx/pptx, requires [files] extra). Size caps enforced at construction: 10 MB for images, 20 MB for documents. See types.py for the full list of accepted MIME types.
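The construction-time size check might look like the sketch below. AttachmentSketch is a hypothetical stand-in that infers the cap from the filename extension; the real validation and MIME handling are in types.py:

```python
from dataclasses import dataclass

IMAGE_CAP = 10 * 1024 * 1024      # 10 MB for images
DOCUMENT_CAP = 20 * 1024 * 1024   # 20 MB for documents
IMAGE_EXTS = {"jpeg", "jpg", "png", "gif", "webp"}

@dataclass
class AttachmentSketch:
    data: bytes
    filename: str

    def __post_init__(self):
        ext = self.filename.rsplit(".", 1)[-1].lower()
        cap = IMAGE_CAP if ext in IMAGE_EXTS else DOCUMENT_CAP
        if len(self.data) > cap:
            raise ValueError(f"{self.filename}: {len(self.data)} bytes exceeds {cap} byte cap")
```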

Usage / Cost / Performance — token counts, dollar cost, and timing (latency, throughput, reasoning vs content split). Field names are in types.py.


get_engine()

from sunwaee.modules.gen.engine.factory import get_engine, close_all_clients

engine = get_engine(
    provider,           # "anthropic" | "openai" | "google" | "deepseek" | "xai" | "moonshot"
    model,              # model name string
    api_key=None,       # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

# call once on graceful shutdown to drain all pooled connections
await close_all_clients()

get_engine() reuses a single httpx.AsyncClient per (event_loop, base_url) (WeakKeyDictionary — dead loops drop their clients automatically). See factory.py for timeout and pool limits.
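The pooling scheme can be sketched with a WeakKeyDictionary keyed by event loop, so a dead loop's clients are garbage-collected with it. Here `_make_client` is a placeholder for `httpx.AsyncClient(base_url=...)` to keep the sketch dependency-free:

```python
import asyncio
import weakref

# one client dict per live event loop, keyed by base_url
_pool: "weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, dict]" = (
    weakref.WeakKeyDictionary()
)

def _make_client(base_url: str) -> object:
    return {"base_url": base_url}          # placeholder for httpx.AsyncClient

def get_client(base_url: str) -> object:
    loop = asyncio.get_running_loop()
    clients = _pool.setdefault(loop, {})
    if base_url not in clients:
        clients[base_url] = _make_client(base_url)
    return clients[base_url]

async def main():
    a = get_client("https://api.example/v1")
    b = get_client("https://api.example/v1")
    return a is b                          # same loop + base_url -> same client

same = asyncio.run(main())
```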

Resolution order

  1. Effort coercion — reasoning_effort=None on a dynamic model that lists "off" is coerced to "off" (e.g. kimi-k2.5, deepseek-v4-flash, deepseek-v4-pro; coercion merges reasoning_disabled_payload to disable thinking).
  2. Wire-model swap — for reasoning_mode="dynamic" models: effort in (None, "off") swaps to non_reasoning_id; any other effort swaps to reasoning_id.
  3. Validation — non-null effort must appear in model.reasoning_efforts (raises ValueError).
  4. Routing — OpenAI-compat: "responses" in model.api_type -> ResponsesEngine, else CompletionsEngine. Anthropic -> AnthropicEngine. Google -> GoogleEngine.

Model dataclass

Defined in engine/model.py. Reasoning-relevant fields:

Field                            Meaning
reasoning_mode                   "always" / "dynamic" / None
reasoning_efforts                valid effort strings; "always" models have no "off"; "dynamic" models start with "off"
reasoning_uses_budget            True = factory maps effort strings to integer token budgets (Anthropic 4.5, Gemini 2.5 flash)
reasoning_tokens_type            "raw" / "summary" / None (silent — engine emits a synthetic stub)
reasoning_disabled_payload       merged into the request when reasoning is explicitly disabled
reasoning_id / non_reasoning_id  paired variant names for model swapping
api_type                         ["responses"] / ["completions"] / both — routing hint for OpenAI-compat providers

Pricing fields and the full field list are in engine/model.py.


Usage

Tool calls

Construct Tool objects with a JSON Schema parameters dict and pass them to chat() / stream():

from sunwaee.modules.gen.engine.types import Tool

weather_tool = Tool(
    name="get_weather",
    description="Return current weather for a location.",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)

response = await engine.chat(messages, tools=[weather_tool])
if response.tool_calls:
    for tc in response.tool_calls:
        print(tc.name, tc.arguments)
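Closing the loop means executing the call locally and sending the result back as a Role.TOOL turn. The sketch below uses plain dicts as stand-ins for Message/ToolCall (real field names are in engine/types.py), and get_weather is a hypothetical local implementation:

```python
import json

def get_weather(location: str) -> dict:
    return {"location": location, "temp_c": 18}    # stand-in implementation

TOOLS = {"get_weather": get_weather}

def run_tool_call(tc: dict) -> dict:
    """Dispatch one tool call and wrap the result as a tool-role turn."""
    args = tc["arguments"]
    if isinstance(args, str):                      # providers may send JSON strings
        args = json.loads(args)
    result = TOOLS[tc["name"]](**args)
    return {"role": "tool", "tool_call_id": tc.get("id"), "content": json.dumps(result)}

turn = run_tool_call({"name": "get_weather",
                      "arguments": '{"location": "Paris"}', "id": "call_1"})
```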

File attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([
    Message(role=Role.USER, content="Summarise this.", attachments=[att])
])

Error handling

All provider errors subclass EngineError(RuntimeError). Import from engine/errors.py:

from sunwaee.modules.gen.engine.errors import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError:   # 429
    ...
except AuthError:        # 401 / 403
    ...
except TransientError:   # 5xx
    ...
except EngineError as e:
    print(e.status_code)

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None

Logging

Set SUNWAEE_LOG_LEVEL=debug (or info / warning / error) to enable logs. All engine logs are at DEBUG -- request start/completion, model resolution decisions. See utils/logger.py.


Testing

venv/bin/pytest                 # unit tests (no API keys needed)
venv/bin/pytest -m live         # live tests (real API calls)
venv/bin/pytest -m "not live"   # explicit unit-only

Unit test conventions: mock httpx.AsyncClient, never make real HTTP calls. Assert usage, cost, and performance are populated on the final streaming chunk.

Live test files:

File                      What it covers
test_scenarios.py         all providers x scenarios x chat() + stream()
test_tool_call_result.py  full tool call -> execute -> reply loop
test_attachments.py       image attachments, vision-capable providers
test_chain.py             three-provider conversation chain with shared history
test_caching.py           prompt-cache hit on turn 2
test_reasoning.py         reasoning_effort on/off per model category

How to add a model

Add a Model(...) entry to engine/models/<provider>.py and ensure it is included in that file's MODELS list (imported by engine/models/__init__.py). Field reference is in engine/model.py. Then run psql/scripts/sync_models.py to mirror the change to the database.

Key rules:

  • reasoning_mode="dynamic" models must list "off" first in reasoning_efforts.
  • reasoning_uses_budget=True only for Anthropic 4.5 series and Gemini 2.5 flash/flash-lite.
  • api_type=["responses"] for models that use /v1/responses; ["completions"] for /v1/chat/completions. Omit for Anthropic and Google.
  • Pricing tiers: base required; _200k when context > 200k tokens; _128k for xAI; _272k for OpenAI. Thresholds are strict >.
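The strict-> threshold rule can be sketched as below. Rates and tier names here are hypothetical per-million figures for illustration; real pricing lives on the Model entries:

```python
def pick_tier(prompt_tokens: int, tiers: dict[str, float],
              threshold: int = 200_000) -> float:
    """Select the per-million rate: the _200k rate applies only when the
    prompt is strictly larger than the threshold (strict >)."""
    if prompt_tokens > threshold and "_200k" in tiers:
        return tiers["_200k"]
    return tiers["base"]

rate_at = pick_tier(200_000, {"base": 3.0, "_200k": 6.0})    # exactly at threshold
rate_over = pick_tier(200_001, {"base": 3.0, "_200k": 6.0})  # one token over
```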

How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py -- MODELS list.
  2. engine/models/__init__.py -- import and add to _ALL.
  3. engine/factory.py -- add to the _OPENAI_COMPATIBLE dict ({"provider": "https://base-url/v1"}). The env var is derived automatically as PROVIDER_API_KEY.
  4. tests/gen/engine/live/_shared.py -- add ("provider", "model-name") to ENGINES.

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py.
  2. engine/providers/<provider>.py -- implement BaseEngine:
    • async def chat(messages, tools=None) -> Response
    • async def stream(messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None -- factory.py injects a pooled client.
    • Call resolve_tokens() before compute_cost() -- some providers exclude reasoning tokens from output_tokens.
    • Strip reasoning_content / reasoning_signature from all but the last assistant turn.
    • Promote system-only input to Role.USER if the provider rejects system-only requests.
    • On 4xx/5xx during streaming: read the full body before raising.
    • Buffer tool call JSON across SSE chunks; parse only on stop.
  3. engine/factory.py -- wire into get_engine().
  4. Tests: unit (providers/test_<provider>.py) + live entry in _shared.py.
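The tool-call buffering rule above can be sketched as follows: accumulate JSON fragments per tool-call index as they stream in, and parse only once the stream stops. Chunk shape here is a generic stand-in, not any one provider's SSE format:

```python
import json
from collections import defaultdict

def assemble_tool_calls(chunks: list[dict]) -> list[dict]:
    """Buffer partial tool-call JSON across streamed chunks, keyed by index;
    individual fragments are usually not valid JSON on their own."""
    buffers: dict[int, dict] = defaultdict(lambda: {"name": None, "args": []})
    for c in chunks:
        buf = buffers[c["index"]]
        if c.get("name"):
            buf["name"] = c["name"]
        if c.get("arguments"):
            buf["args"].append(c["arguments"])
    return [
        {"name": b["name"], "arguments": json.loads("".join(b["args"]) or "{}")}
        for b in buffers.values()
    ]

calls = assemble_tool_calls([
    {"index": 0, "name": "get_weather", "arguments": '{"loca'},
    {"index": 0, "arguments": 'tion": "Paris"}'},
])
```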

Provider-specific notes

  • resolve_tokens() before compute_cost() -- xAI and Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates from total_tokens.
  • Strip reasoning from all but the last assistant turn -- stale reasoning_signature breaks APIs.
  • OpenAI uses max_completion_tokens, not max_tokens.
  • Silent-reasoning models (grok-4, grok-4-1-fast, grok-4-fast, grok-3-mini) -- stream is silent during thinking; engines yield a synthetic Response(reasoning_content="Reasoning in progress...", synthetic=True) immediately.
  • Google: no tool call IDs -- function name used as correlation ID. thoughtSignature on functionCall parts must be echoed back on every subsequent assistant turn.
  • Google streaming -- ?alt=sse required on streamGenerateContent.
  • Anthropic reasoning: two paths -- newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort} + thinking: {type: "adaptive"}; older budget models use thinking: {type: "enabled", budget_tokens: N}. Selected via model.reasoning_uses_budget.
  • Anthropic top-level cache_control -- payload["cache_control"] = {"type": "ephemeral"} at request root enables auto-caching. Do not remove.
  • Foreign reasoning_signature detection -- Anthropic and Google drop signatures that start with [ (ResponsesEngine JSON list format). Echoing them causes base64 decode failures.
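The first bullet's back-calculation can be sketched as below, assuming the wire total is input + output + reasoning and that resolved reasoning tokens should be billed as output. Field names are stand-ins for the real Usage fields:

```python
def resolve_tokens(input_tokens: int, output_tokens: int, total_tokens: int,
                   reasoning_tokens: int = 0) -> tuple[int, int]:
    """If the provider excluded reasoning tokens from output_tokens,
    recover them from total_tokens so cost covers every generated token."""
    gap = total_tokens - input_tokens - output_tokens
    if gap > 0 and reasoning_tokens == 0:
        reasoning_tokens = gap
        output_tokens += gap          # fold reasoning back into billable output
    return output_tokens, reasoning_tokens

out, reasoning = resolve_tokens(input_tokens=100, output_tokens=40, total_tokens=200)
```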
