SUNWÆE gen — multi-provider LLM engine library.


All LLMs, one response format, one dependency (httpx). Supports switching models mid-conversation (e.g. draft with GPT, refine with Claude).

Handles streaming, tool calls, file attachments, prompt caching, reasoning on/off, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

# enable_reasoning=False (default) — reasoning disabled / non-reasoning variant used
engine = get_engine("anthropic", "claude-sonnet-4-6")

# enable_reasoning=True — activates thinking for all providers
engine_reasoning = get_engine("anthropic", "claude-sonnet-4-6", enable_reasoning=True)

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider   provider=    Env var
Anthropic  "anthropic"  ANTHROPIC_API_KEY
OpenAI     "openai"     OPENAI_API_KEY
Google     "google"     GOOGLE_API_KEY
DeepSeek   "deepseek"   DEEPSEEK_API_KEY
xAI        "xai"        XAI_API_KEY
Moonshot   "moonshot"   MOONSHOT_API_KEY

Directory structure

sunwaee/
├── core/
│   ├── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
│   └── tools.py                  # @tool decorator, ok(), err()
└── modules/gen/
    ├── __init__.py               # public re-exports (get_engine, run, stream_run, …)
    ├── agent.py                  # ReAct loop — run() + stream_run()
    ├── tools.py                  # TOOLS list
    └── engine/
        ├── __init__.py           # get_engine, Message, Response, Tool, …
        ├── base.py               # BaseEngine ABC
        ├── factory.py            # get_engine() — provider routing + connection pooling
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, ToolCall, Usage, Cost, Performance, …
        ├── models/               # model registry per provider
        │   ├── __init__.py       # get_model(), list_models()
        │   ├── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── openai.py         # OpenAIEngine (also used by DeepSeek, xAI, Moonshot)
            └── google.py         # GoogleEngine

tests/gen/
├── test_agent.py / test_stream_agent.py / test_tools.py
└── engine/
    ├── test_types.py / test_factory.py / test_model.py
    ├── providers/
    │   └── test_anthropic.py / test_openai.py / test_google.py
    └── live/
        ├── _shared.py            # shared config, data, helpers for all live tests
        ├── test_scenarios.py     # all providers × all scenarios × chat + stream
        ├── test_tool_call_result.py  # TOOL_CALL → execute → reply, all providers
        ├── test_attachments.py   # image attachments, vision-capable providers
        ├── test_chain.py         # three-provider conversation chain
        ├── test_caching.py       # prompt-cache hit on turn 2
        ├── test_reasoning.py     # reasoning ON / OFF per model category
        └── run/                  # JSON snapshots (gitignored)

Core types (engine/types.py)

class Role(Enum):       SYSTEM, USER, ASSISTANT, TOOL, CONTEXT
class StopReason(Enum): END_TURN, TOOL_USE, MAX_TOKENS

@dataclass class Message:
    role: Role
    content: str | None
    reasoning_content: str | None       # thinking for models that support it
    reasoning_signature: str | None     # opaque blob — echo back verbatim
    tool_call_id: str | None            # set on Role.TOOL messages
    tool_calls: list[ToolCall] | None
    attachments: list[FileAttachment] | None   # Role.USER only

@dataclass class Response:
    provider: str; model: str; streaming: bool; synthetic: bool
    content: str | None; reasoning_content: str | None; reasoning_signature: str | None
    tool_calls: list[ToolCall] | None; stop_reason: StopReason | None; error: Error | None
    usage: Usage | None; cost: Cost | None; performance: Performance | None

@dataclass class ToolCall:
    id: str; name: str; arguments: dict
    thought_signature: str | None    # Google only — echo back every subsequent turn
    error: str | None; duration: float; results: list[dict]

@dataclass class Usage:
    input_tokens: int; output_tokens: int; total_tokens: int
    cache_read_tokens: int; cache_write_tokens: int

@dataclass class Cost:
    input: float; output: float; cache_read: float; cache_write: float; total: float

@dataclass class Performance:
    latency: float            # seconds to first chunk
    reasoning_duration: float; content_duration: float; total_duration: float
    throughput: int           # output tokens / second

@dataclass class FileAttachment:
    data: bytes; filename: str; media_type: str = ""
    # text/* → <file name="…">…</file> block
    # image/jpeg|png|gif|webp → base64 inline
    # application/pdf|json + OOXML (docx/xlsx/pptx) → extracted text
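
The routing in the comments above can be sketched as a standalone function (illustrative, not library code; in this sketch unknown media types fall through to the extraction path, which may not match the library's exact behavior):

```python
import mimetypes

# Illustrative sketch of attachment routing keyed on media type,
# mirroring the comment block in the FileAttachment dataclass.
def attachment_route(filename: str, media_type: str = "") -> str:
    mt = media_type or mimetypes.guess_type(filename)[0] or ""
    if mt.startswith("text/"):
        return "file-block"          # wrapped in a <file name="..."> block
    if mt in {"image/jpeg", "image/png", "image/gif", "image/webp"}:
        return "base64-inline"
    # application/pdf, application/json and OOXML go through text extraction
    return "extracted-text"
```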

get_engine() — reasoning control

engine = get_engine(
    provider,
    model,
    api_key=None,          # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    enable_reasoning=False, # True activates thinking/reasoning for all providers
)

enable_reasoning resolves all provider complexity automatically:

Model reasoning_mode          enable_reasoning=True                  enable_reasoning=False
"dynamic"                     Sends provider thinking config         No thinking config sent
"always" + non_reasoning_id   Uses the model as-is                   Swaps to non_reasoning_id variant
"always", no swap available   Uses the model as-is (always reasons)  Uses the model as-is (cannot disable)
None + reasoning_id           Swaps to reasoning_id variant          Uses the model as-is

Default reasoning configs when enable_reasoning=True:

Provider / series                            Mechanism                                            Value
Anthropic (Opus 4.7/4.6, Sonnet 4.6)         output_config.effort + thinking: {type: "adaptive"}  effort="high"
Anthropic (Opus 4.5, Haiku 4.5, Sonnet 4.5)  thinking: {type: "enabled", budget_tokens: N}        max(1024, max_tokens - 1024)
Google Gemini 3 (has reasoning_efforts)      thinkingConfig.thinkingLevel                         "high"
Google Gemini 2.5 (no reasoning_efforts)     thinkingConfig.thinkingBudget                        -1 (dynamic)
OpenAI-compat (has reasoning_efforts)        reasoning_effort                                     "high"

When enable_reasoning=False for dynamic models: Anthropic effort models use effort="low"; Gemini 3 uses the lowest listed effort (reasoning_efforts[0]); Gemini 2.5 uses thinkingBudget=0.


Model dataclass (engine/model.py)

@dataclass class Model:
    name: str; provider: str; display_name: str; description: str | None

    # specs
    context_window: int; max_output_tokens: int | None

    # features
    supports_vision: bool
    supports_tools: bool
    supports_reasoning: bool           # True if model has any reasoning capability

    # reasoning config
    reasoning_mode: str | None         # "always" | "dynamic" | None
    reasoning_efforts: list[str] | None  # valid effort levels (e.g. ["low","medium","high"])
    reasoning_tokens_type: str | None  # "raw" | "summary" | None
    reasoning_disabled_payload: dict | None  # merged into request when reasoning is disabled
    reasoning_id: str | None           # for non-reasoning variants: name of reasoning counterpart
    non_reasoning_id: str | None       # for reasoning models: name of non-reasoning counterpart

    cache_min_tokens: int | None

    # pricing (per million tokens)
    input_price_per_mtok: float | None; output_price_per_mtok: float | None
    cache_read_price_per_mtok: float | None; cache_write_price_per_mtok: float | None
    input_price_per_mtok_128k: float | None   # xAI only
    output_price_per_mtok_128k: float | None
    input_price_per_mtok_200k: float | None   # most providers
    output_price_per_mtok_200k: float | None; ...
    input_price_per_mtok_272k: float | None   # OpenAI only
    output_price_per_mtok_272k: float | None; ...

    release_date: str | None; deprecated_at: str | None; sunset_at: str | None

reasoning_mode:

  • "always" — model always reasons; disable by swapping to non_reasoning_id (xAI models, DeepSeek Reasoner)
  • "dynamic" — reasoning can be toggled on/off via provider-specific config (Anthropic, Google, OpenAI, Moonshot)
  • None — no reasoning capability

reasoning_tokens_type:

  • "raw" — full reasoning content returned in response.reasoning_content (Anthropic, DeepSeek, grok-4.20, kimi-k2.5)
  • "summary" — summarised thought returned (Google, grok-4.20 on /v1/responses endpoint)
  • None — reasoning tokens tracked internally only; content not exposed (OpenAI, most xAI)

reasoning_efforts:

  • List of valid named effort levels for the model's reasoning API parameter.
  • Anthropic: ["low","medium","high","max"] or ["low","medium","high","xhigh","max"] for Opus 4.7.
  • Google Gemini 3: ["minimal","low","medium","high"] or ["low","medium","high"] depending on model.
  • OpenAI: ["none","low","medium","high","xhigh"] or similar per-model.
  • None for models that use integer budgets (Gemini 2.5) or have no effort control.

Usage

With tools

from sunwaee.core.tools import tool, ok, err
from sunwaee.modules.gen.engine.types import Tool

@tool("Return the current UTC time.")
def get_time() -> str:
    from datetime import datetime, timezone
    return ok({"time": datetime.now(timezone.utc).isoformat()})

response = await engine.chat(messages, tools=[get_time._tool])

File and image attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([Message(role=Role.USER, content="Summarise.", attachments=[att])])

Supported: text/*, application/json, image/jpeg|png|gif|webp, application/pdf, .docx, .xlsx, .pptx

ReAct agent loop

from sunwaee.modules.gen.agent import stream_run

new_messages = []
async for chunk in stream_run(messages, tools, engine, new_messages=new_messages):
    if chunk.content:
        print(chunk.content, end="", flush=True)
# new_messages has all assistant + tool turns appended during the run

Up to 10 iterations by default. Concurrent tool calls via asyncio.gather. Sync tools dispatched via run_in_executor.
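
The dispatch strategy described above can be sketched as follows (assumed names, not the agent's internals): async tools are awaited directly, sync tools go through run_in_executor, and asyncio.gather joins the concurrent calls in order.

```python
import asyncio
import inspect

async def dispatch(tool_fns, calls):
    """tool_fns: name -> callable; calls: list of (name, kwargs) pairs."""
    loop = asyncio.get_running_loop()

    async def run_one(name, kwargs):
        fn = tool_fns[name]
        if inspect.iscoroutinefunction(fn):
            return await fn(**kwargs)
        # Sync tools run in the default thread pool so they don't block the loop.
        return await loop.run_in_executor(None, lambda: fn(**kwargs))

    # gather preserves the order of the submitted calls.
    return await asyncio.gather(*(run_one(n, kw) for n, kw in calls))
```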

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None

Testing

pytest tests/gen/ -m "not live"                                        # unit (no keys needed)
pytest tests/gen/ -m live                                              # live (real API calls)
pytest tests/gen/ -m "not live" --cov=sunwaee --cov-report=term-missing

# run a single live test file
pytest -m live tests/gen/engine/live/test_caching.py
pytest -m live tests/gen/engine/live/test_reasoning.py

Unit test conventions:

  • Mock httpx.AsyncClient — never make real HTTP calls
  • Assert response.cost, response.usage, response.performance populated on final chunk
  • For streaming, use an async generator as mock transport

Live test files and what they cover:

File                      What it tests
test_scenarios.py         6 scenarios × 6 providers × chat + stream (72 tests)
test_tool_call_result.py  Full TOOL_CALL → execute → reply loop, all providers
test_attachments.py       PNG image attachment, vision-capable providers
test_chain.py             Three-provider conversation chain with shared history
test_caching.py           Prompt-cache hit on turn 2, static system prompt
test_reasoning.py         enable_reasoning ON / OFF per model category

Live scenarios:

Scenario          What it tests
ONLY_SYSTEM       System-only input edge case; lenient assertions
ONLY_USER         Single user message
SYSTEM_AND_USER   System prompt respected in response
TOOL_CALL         Model must issue at least one tool call
TOOL_CALL_RESULT  Full multi-turn with real tool IDs/signatures
FILE_ATTACHMENT   Text file attached; asserts content populated
CONTEXT_ROLE      Role.CONTEXT message handled without errors

All live tests default to enable_reasoning=False. test_reasoning.py is the only file that explicitly passes enable_reasoning=True.


How to add a model

File: sunwaee/modules/gen/engine/models/<provider>.py

Model(
    name="provider-model-name",
    display_name="Human Readable Name",
    provider="anthropic",
    context_window=200_000,
    max_output_tokens=64_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    cache_read_price_per_mtok=0.3,
    cache_write_price_per_mtok=3.75,
    input_price_per_mtok_200k=6.0,       # omit if no >200k tier
    output_price_per_mtok_200k=22.5,
    supports_vision=True,
    supports_tools=True,
    supports_reasoning=True,
    reasoning_mode="dynamic",             # "always" | "dynamic" | None
    reasoning_efforts=["low", "medium", "high", "max"],  # omit if not applicable
    reasoning_tokens_type="raw",          # "raw" | "summary" | None
    non_reasoning_id="model-non-reasoning",  # omit if no non-reasoning variant
    cache_min_tokens=1_024,              # omit (None) if caching is undocumented
    release_date="2025-01-01",
)

For a non-reasoning variant that pairs with a reasoning model:

Model(
    name="model-non-reasoning",
    ...same pricing...,
    supports_reasoning=False,
    reasoning_id="model",                 # points to the reasoning counterpart
)

Pricing tiers (engine/model.py): base required; _128k when input_tokens > 128_000 (xAI only); _200k when > 200_000; _272k when > 272_000 (OpenAI only). Thresholds are strict > — exactly at the boundary uses the lower tier.

cache_min_tokens — minimum tokens required at a cache breakpoint for prompt caching to activate. None = no caching. 0 = no minimum (caches everything). Known values:

Provider   Minimum  Models
Anthropic  4,096    Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5
Anthropic  2,048    Sonnet 4.6
Anthropic  1,024    Sonnet 4.5
OpenAI     1,024    All models (automatic prefix caching)
Google     1,024    All models (explicit context caching)
xAI        0        All models (automatic, no minimum)
DeepSeek   64       All models (automatic prefix caching)
Moonshot   0        All models (automatic, no minimum)

How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py — MODELS list
  2. engine/models/__init__.py — import + add to _ALL
  3. engine/factory.py — add to _OPENAI_COMPATIBLE: dict[str, str] (env var auto-derived as PROVIDER_API_KEY)
  4. tests/gen/engine/live/_shared.py — add ("provider", "cheapest-model") to ENGINES

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py
  2. engine/providers/<provider>.py — implement BaseEngine:
    • async def chat(self, messages, tools=None) -> Response
    • async def stream(self, messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None
    • Call resolve_tokens(usage) before compute_cost
    • Strip reasoning_content/reasoning_signature from all but the last assistant turn
    • Handle system-only input: promote to Role.USER
    • On 4xx/5xx in streaming: read full body before raising
    • Buffer tool call JSON across SSE chunks; parse only on stop
  3. engine/factory.py — wire into get_engine(), handle enable_reasoning for the new provider
  4. Tests: unit (providers/test_<provider>.py) + live entry in _shared.py

How to add a tool to the agent

from typing import Annotated
from sunwaee.core.tools import tool, ok, err

@tool("Search the web for current information.")
def web_search(
    query: Annotated[str, "The search query"],
    num_results: Annotated[int, "Number of results"] = 5,
) -> str:
    try:
        return ok(_do_search(query, num_results))
    except Exception as e:
        return err(str(e))

Register: add web_search._tool to TOOLS in sunwaee/modules/gen/tools.py.

Tests: tests/gen/test_<tool_name>.py — call directly, assert JSON output shape, test error path. Never call real external APIs.


@tool decorator

Introspects signature to build JSON Schema parameters automatically.

Supports: str, int, float, bool, list[T], Literal[...], Optional[T], Annotated[T, "description"]

  • Parameters with defaults → not required
  • Both sync and async supported
  • Must return JSON string: ok(data) / err(message) / json.dumps(...)
ok({"id": "123"})   # '{"ok": true, "data": {"id": "123"}}'
err("Not found")    # '{"ok": false, "error": "Not found"}'
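
These shapes correspond to helpers that could be implemented as follows (a minimal sketch matching the outputs shown above; the real versions live in sunwaee.core.tools):

```python
import json

# ok() wraps successful tool output; err() wraps a failure message.
# Both return JSON strings, as the @tool contract requires.
def ok(data) -> str:
    return json.dumps({"ok": True, "data": data})

def err(message: str) -> str:
    return json.dumps({"ok": False, "error": message})
```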

Provider-specific quirks

# Rule
1 resolve_tokens() before compute_cost() — xAI/Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates from total_tokens. Always called unconditionally — it's a no-op when counts already match.
2 Strip reasoning from all but last assistant turn — stale reasoning_signature breaks APIs and blocks mid-session provider switches.
3 OpenAI uses max_completion_tokens, not max_tokens.
4 OpenAI reasoning models: yield synthetic chunk immediately — stream is silent during thinking; Response(reasoning_content="Reasoning in progress…", synthetic=True). Never treat synthetic=True as real content.
5 Google: thoughtSignature on functionCall part → ToolCall.thought_signature; echo every subsequent turn.
6 Google: no tool call IDs — use function name as correlation ID.
7 Google streaming: ?alt=sse required on streamGenerateContent.
8 System-only input — promote system message to Role.USER (Anthropic + Google).
9 Anthropic reasoning: two paths. Newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort: X} + thinking: {type: "adaptive"}. Older models (Opus 4.5, Haiku 4.5, Sonnet 4.5) use thinking: {type: "enabled", budget_tokens: N} with 1024 ≤ budget < max_tokens. The factory selects the path based on whether the model has reasoning_efforts.
10 Connection pooling: one httpx.AsyncClient per (event_loop_id, base_url) in factory.py.
11 Role.CONTEXT mapping: all providers wrap content in <context> tags automatically — Anthropic → {"role":"user","content":"<context>…</context>"}; OpenAI → {"role":"system","content":"<context>…</context>"}; Google → {"role":"user","parts":[{"text":"<context>…</context>"}]}.
12 Google Gemini 3 uses thinkingLevel (string: "minimal"/"low"/"medium"/"high"); Gemini 2.5 uses thinkingBudget (int: -1 = dynamic, 0 = off, N = fixed). The engine selects based on whether the model has reasoning_efforts. Gemini 3.1 Pro and 2.5 Pro cannot disable thinking (reasoning_mode="always").
13 kimi-k2.5 (Moonshot) reasons by default — disabling thinking requires an explicit payload {"thinking": {"type": "disabled"}}. Set via Model.reasoning_disabled_payload; the OpenAI engine merges it when reasoning_effort is None.
14 xAI always-reasoning models (grok-4.20, grok-4-1-fast, grok-4-fast) route to a non-reasoning variant on enable_reasoning=False via non_reasoning_id. Models without a non_reasoning_id (grok-4, grok-3-mini, grok-code-fast-1) cannot have reasoning disabled.
15 grok-4.20 returns reasoning_content on chat/completions — reasoning_tokens_type="summary" refers to the /v1/responses endpoint only; on chat/completions the field carries full raw reasoning text.
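
Rule 1's back-calculation can be sketched as a standalone function (illustrative; the library's resolve_tokens() operates on a Usage object, not bare ints):

```python
# When a provider excludes reasoning tokens from output_tokens,
# recover them from total_tokens so compute_cost() prices every token.
# A no-op when the reported counts already add up.
def resolve_output_tokens(input_tokens: int, output_tokens: int,
                          total_tokens: int) -> int:
    reported = input_tokens + output_tokens
    if reported < total_tokens:
        # The missing tokens (e.g. hidden reasoning) belong to the output side.
        return total_tokens - input_tokens
    return output_tokens
```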
