# SUNWÆE gen — multi-provider LLM engine library
All LLMs, one response format, one dependency (httpx). Supports switching providers mid-conversation.
Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.
## Install

```shell
pip install sunwaee
pip install "sunwaee[files]"   # adds pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development
```
## Quick start

```python
import asyncio

from sunwaee.modules.gen.engine.factory import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

engine = get_engine("anthropic", "claude-sonnet-4-6")
# or with explicit reasoning effort:
engine = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    # non-streaming
    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    # streaming
    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())
```
## Providers

| Provider | `provider=` | Env var |
|---|---|---|
| Anthropic | `"anthropic"` | `ANTHROPIC_API_KEY` |
| OpenAI | `"openai"` | `OPENAI_API_KEY` |
| Google | `"google"` | `GOOGLE_API_KEY` |
| DeepSeek | `"deepseek"` | `DEEPSEEK_API_KEY` |
| xAI | `"xai"` | `XAI_API_KEY` |
| Moonshot | `"moonshot"` | `MOONSHOT_API_KEY` |

The API key falls back to the env var when `api_key=` is not passed.
## Directory structure

```
sunwaee/
├── utils/
│   └── logger.py             # get_logger(name) — scoped under "sunwaee.*"
└── modules/gen/
    └── engine/
        ├── base.py           # BaseEngine ABC — chat() + stream()
        ├── factory.py        # get_engine(), close_all_clients(), connection pool
        ├── model.py          # Model dataclass + compute_cost()
        ├── types.py          # Message, Response, Tool, ToolCall, Usage, Cost, Performance, FileAttachment
        ├── errors.py         # EngineError hierarchy
        ├── models/           # per-provider model registries
        │   ├── __init__.py   # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py    # AnthropicEngine
            ├── completions.py  # CompletionsEngine (/v1/chat/completions)
            ├── responses.py    # ResponsesEngine (/v1/responses)
            └── google.py       # GoogleEngine

tests/gen/
└── engine/
    ├── test_types.py / test_factory.py / test_model.py / test_errors.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/                 # real API calls, excluded from CI (-m live)
        ├── _shared.py        # engines, fixtures, system prompt shared across files
        ├── test_scenarios.py # all providers x scenarios x chat + stream
        ├── test_tool_call_result.py
        ├── test_attachments.py
        ├── test_chain.py     # three-provider conversation chain with shared history
        ├── test_caching.py
        └── test_reasoning.py
```
## Core types

All types are defined in `engine/types.py`. Key ones:

- `Message` — one turn in a conversation. `role` is a `Role` enum (`SYSTEM`, `USER`, `ASSISTANT`, `TOOL`, `CONTEXT`). `attachments` only applies to `Role.USER`. `reasoning_content` / `reasoning_signature` are provider-opaque — echo them back verbatim.
- `Response` — what `chat()` returns and what `stream()` yields per chunk. Text arrives in `content`; reasoning in `reasoning_content`. The final streaming chunk carries `stop_reason`, `usage`, `cost`, and `performance`. Chunks with `synthetic=True` are engine-generated stubs (e.g. a silent-reasoning placeholder) — never treat them as real model output.
- `Tool` — a function the model can call. `name`, `description`, and `parameters` (a JSON Schema object) are sent to the provider. The optional `fn` field is not used by the engine itself.
- `FileAttachment` — wraps bytes + filename. Supported types: `text/*`, `application/json`, images (jpeg/png/gif/webp), and documents (pdf/docx/xlsx/pptx, requires the `[files]` extra). Size caps are enforced at construction: 10 MB for images, 20 MB for documents. See `types.py` for the full list of accepted MIME types.
- `Usage` / `Cost` / `Performance` — token counts, dollar cost, and timing (latency, throughput, reasoning vs content split). Field names are in `types.py`.
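The construction-time size caps can be sketched as follows (a hedged illustration; the real validation lives in `types.py`, and these helper names are not the library's API):

```python
# Hedged sketch of the FileAttachment size caps described above: 10 MB for
# images, 20 MB for documents, checked at construction time. The real
# checks live in types.py; these names are illustrative only.
_IMAGE_CAP_BYTES = 10 * 1024 * 1024  # 10 MB for images
_DOC_CAP_BYTES = 20 * 1024 * 1024    # 20 MB for documents

def check_attachment_size(data: bytes, mime_type: str) -> None:
    cap = _IMAGE_CAP_BYTES if mime_type.startswith("image/") else _DOC_CAP_BYTES
    if len(data) > cap:
        raise ValueError(f"{mime_type} attachment exceeds {cap} bytes")
```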
## get_engine()

```python
from sunwaee.modules.gen.engine.factory import get_engine, close_all_clients

engine = get_engine(
    provider,               # "anthropic" | "openai" | "google" | "deepseek" | "xai" | "moonshot"
    model,                  # model name string
    api_key=None,           # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

# call once on graceful shutdown to drain all pooled connections
await close_all_clients()
```

`get_engine()` reuses a single `httpx.AsyncClient` per `(event_loop, base_url)` pair (a `WeakKeyDictionary`, so dead loops drop their clients automatically). See `factory.py` for timeout and pool limits.
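The pooling pattern can be sketched as follows (a stand-in class replaces `httpx.AsyncClient`; this is an illustration of the pattern, not the actual factory code):

```python
import asyncio
import weakref

# Sketch of the per-(event_loop, base_url) pooling pattern described
# above: one client per loop and base URL, held in a WeakKeyDictionary so
# a garbage-collected loop drops its clients automatically.
_POOLS: "weakref.WeakKeyDictionary" = weakref.WeakKeyDictionary()

class FakeClient:
    """Stand-in for httpx.AsyncClient in this sketch."""
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

def get_client(base_url: str) -> FakeClient:
    loop = asyncio.get_running_loop()
    per_loop = _POOLS.setdefault(loop, {})
    if base_url not in per_loop:
        per_loop[base_url] = FakeClient(base_url)
    return per_loop[base_url]

async def demo():
    a = get_client("https://api.anthropic.com")
    b = get_client("https://api.anthropic.com")   # same URL -> same client
    c = get_client("https://api.openai.com/v1")   # new URL -> new client
    return a is b, a is c

shared, cross = asyncio.run(demo())
```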
### Resolution order

- Effort coercion — `reasoning_effort=None` on a dynamic model that lists `"off"` is coerced to `"off"` (e.g. kimi-k2.5; coercion merges `reasoning_disabled_payload` to disable thinking). Models that use `"none"` as the disable wire value (gpt-5.x) do not coerce — `None` leaves the reasoning block absent, which lets the model use its default.
- Wire-model swap — for `reasoning_mode="dynamic"` models: `effort in (None, "off")` swaps to `non_reasoning_id`; any other effort swaps to `reasoning_id`. No swap occurs when the target variant is not defined.
- Validation — a non-null effort must appear in `model.reasoning_efforts` (raises `ValueError`).
- Routing — OpenAI-compat: `"responses" in model.api_type` -> `ResponsesEngine`, else `CompletionsEngine`. Anthropic -> `AnthropicEngine`. Google -> `GoogleEngine`.
## Model dataclass

Defined in `engine/model.py`. Reasoning-relevant fields:

| Field | Meaning |
|---|---|
| `reasoning_mode` | `"always"` / `"dynamic"` / `None` |
| `reasoning_efforts` | valid effort strings; `"always"` models have no `"off"`; `"dynamic"` models that disable via model swap start with `"off"`; OpenAI gpt-5.x use `"none"` as the wire disable value |
| `reasoning_uses_budget` | `True` = the factory maps effort strings to integer token budgets (Anthropic 4.5, Gemini 2.5 flash) |
| `reasoning_tokens_type` | `"raw"` / `"summary"` / `None` (silent -- the engine emits a synthetic stub) |
| `reasoning_disabled_payload` | merged into the request when reasoning is explicitly disabled |
| `reasoning_id` / `non_reasoning_id` | paired variant names for model swapping |
| `api_type` | `["responses"]` / `["completions"]` / both -- routing hint for OpenAI-compat providers |

Pricing fields and the full field list are in `engine/model.py`.
## Usage

### Tool calls

Construct `Tool` objects with a JSON Schema `parameters` dict and pass them to `chat()` / `stream()`:

```python
from sunwaee.modules.gen.engine.types import Tool

weather_tool = Tool(
    name="get_weather",
    description="Return current weather for a location.",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)

response = await engine.chat(messages, tools=[weather_tool])
if response.tool_calls:
    for tc in response.tool_calls:
        print(tc.name, tc.arguments)
```
### File attachments

```python
from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([
    Message(role=Role.USER, content="Summarise this.", attachments=[att])
])
```
### Error handling

All provider errors subclass `EngineError(RuntimeError)`. Import from `engine/errors.py`:

```python
from sunwaee.modules.gen.engine.errors import (
    EngineError, RateLimitError, AuthError, TransientError,
)

try:
    response = await engine.chat(messages)
except RateLimitError:   # 429
    ...
except AuthError:        # 401 / 403
    ...
except TransientError:   # 5xx
    ...
except EngineError as e:
    print(e.status_code)
```
### Listing models

```python
from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None
```
### Logging

Set `SUNWAEE_LOG_LEVEL=debug` (or `info` / `warning` / `error`) to enable logs. All engine logs are at DEBUG -- request start/completion and model-resolution decisions. See `utils/logger.py`.
## Testing

```shell
venv/bin/pytest                # unit tests (no API keys needed)
venv/bin/pytest -m live        # live tests (real API calls)
venv/bin/pytest -m "not live"  # explicit unit-only
```

Unit test conventions: mock `httpx.AsyncClient`; never make real HTTP calls. Assert that `usage`, `cost`, and `performance` are populated on the final streaming chunk.
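The streaming assertion can be sketched with a stand-in stream (the chunk shape below is illustrative, mirroring the `Response` fields described earlier; this is not the library's test code):

```python
import asyncio
from types import SimpleNamespace

# Hedged sketch of the streaming-test convention: fake chunks stand in for
# Response objects, and the assertions target the final chunk, which is
# the one that carries usage / cost / performance.
async def fake_stream():
    yield SimpleNamespace(content="Hel", usage=None, cost=None, performance=None)
    yield SimpleNamespace(content="lo", usage={"output_tokens": 2},
                          cost={"total": 0.0001}, performance={"latency_s": 0.4})

async def run_assertions():
    chunks = [c async for c in fake_stream()]
    final = chunks[-1]
    assert final.usage is not None
    assert final.cost is not None
    assert final.performance is not None
    return "".join(c.content for c in chunks)

text = asyncio.run(run_assertions())
```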
Live test files:

| File | What it covers |
|---|---|
| `test_scenarios.py` | all providers x scenarios x `chat()` + `stream()` |
| `test_tool_call_result.py` | full tool call -> execute -> reply loop |
| `test_attachments.py` | image attachments, vision-capable providers |
| `test_chain.py` | three-provider conversation chain with shared history |
| `test_caching.py` | prompt-cache hit on turn 2 |
| `test_reasoning.py` | `reasoning_effort` on/off per model category |
## How to add a model

Add a `Model(...)` entry to `engine/models/<provider>.py` and ensure it is included in that file's `MODELS` list (imported by `engine/models/__init__.py`). The field reference is in `engine/model.py`. Then run `psql/scripts/sync_models.py` to mirror the change to the database.

Key rules:

- `reasoning_mode="dynamic"` models that disable reasoning by swapping to a non-reasoning variant list `"off"` first in `reasoning_efforts` (e.g. kimi-k2.5). OpenAI gpt-5.x models that disable reasoning via `{"reasoning": {"effort": "none"}}` on the same model list `"none"` first instead — do NOT use `"off"` for these.
- `reasoning_uses_budget=True` only for the Anthropic 4.5 series and Gemini 2.5 flash/flash-lite.
- `api_type=["responses", "completions"]` for OpenAI models that support both endpoints; `["completions"]` for OpenAI-compat providers (xAI, DeepSeek, Moonshot). Omit for Anthropic and Google.
- Pricing tiers: base required; `_200k` when context > 200k tokens; `_128k` for xAI; `_272k` for OpenAI. Thresholds are strict `>`.
## How to add an OpenAI-compatible provider

1. `engine/models/<provider>.py` -- `MODELS` list.
2. `engine/models/__init__.py` -- import and add to `_ALL`.
3. `engine/factory.py` -- add to the `_OPENAI_COMPATIBLE` dict (`"provider": "https://base-url/v1"`). The env var is derived automatically as `PROVIDER_API_KEY`.
4. `tests/gen/engine/live/_shared.py` -- add `("provider", "model-name")` to `ENGINES`.
## How to add a provider with a custom API

1. `engine/models/<provider>.py` + register in `__init__.py`.
2. `engine/providers/<provider>.py` -- implement `BaseEngine`:
   - `async def chat(messages, tools=None) -> Response`
   - `async def stream(messages, tools=None) -> AsyncIterator[Response]`
   - Accept `client: httpx.AsyncClient | None = None` -- `factory.py` injects a pooled client.
   - Call `resolve_tokens()` before `compute_cost()` -- some providers exclude reasoning tokens from `output_tokens`.
   - Strip `reasoning_content` / `reasoning_signature` from all but the last assistant turn.
   - Promote system-only input to `Role.USER` if the provider rejects system-only requests.
   - On 4xx/5xx during streaming: read the full body before raising.
   - Buffer tool call JSON across SSE chunks; parse only on stop.
3. `engine/factory.py` -- wire into `get_engine()`.
4. Tests: unit (`providers/test_<provider>.py`) + a live entry in `_shared.py`.
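The SSE buffering rule can be sketched as follows (an illustration of the technique, not the library's parser):

```python
import json

# Hedged sketch of "buffer tool call JSON across SSE chunks; parse only
# on stop": argument deltas arrive as string fragments that are not valid
# JSON individually, so they are concatenated and parsed once at the end.
def parse_tool_arguments(deltas: list[str]) -> dict:
    buffer = ""
    for fragment in deltas:
        buffer += fragment       # never json.loads() a partial fragment
    return json.loads(buffer)    # parse once, on the stop event

args = parse_tool_arguments(['{"loca', 'tion": ', '"Paris"}'])
```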
## Provider-specific notes

- `resolve_tokens()` before `compute_cost()` -- xAI and Google exclude reasoning tokens from `output_tokens`; `resolve_tokens` back-calculates from `total_tokens`.
- Strip reasoning from all but the last assistant turn -- a stale `reasoning_signature` breaks APIs.
- OpenAI uses `max_completion_tokens`, not `max_tokens` (CompletionsEngine); `max_output_tokens` for ResponsesEngine.
- Silent-reasoning models (grok-4, grok-4-1-fast, grok-4-fast, grok-3-mini) -- the stream is silent during thinking; engines yield a synthetic `Response(reasoning_content="Reasoning in progress...", synthetic=True)` immediately.
- Google: no tool call IDs -- the function name is used as the correlation ID. `thoughtSignature` on `functionCall` parts must be echoed back on every subsequent assistant turn.
- Google streaming -- `?alt=sse` is required on `streamGenerateContent`.
- Anthropic reasoning: two paths -- newer models (Opus 4.7/4.6, Sonnet 4.6) use `output_config: {effort}` + `thinking: {type: "adaptive"}`; older budget models use `thinking: {type: "enabled", budget_tokens: N}`. Selected via `model.reasoning_uses_budget`.
- Anthropic top-level `cache_control` -- `payload["cache_control"] = {"type": "ephemeral"}` at the request root enables auto-caching. Do not remove.
- Foreign `reasoning_signature` detection -- Anthropic and Google drop signatures that start with `[` (the ResponsesEngine JSON list format). Echoing them causes base64 decode failures.
- OpenAI ResponsesEngine caching -- the Responses API does not do automatic prefix caching by server-side routing without a hint. `ResponsesEngine` computes a `prompt_cache_key` as SHA-256[:32] of the system prompt content and sends it with every request. This pins all requests sharing the same system prompt to the same cache server, enabling prefix-cache hits without `previous_response_id` or `store=True`.
- DeepSeek cache tokens -- DeepSeek exposes `prompt_cache_hit_tokens` at the top level of the `usage` object instead of `prompt_tokens_details.cached_tokens`. `CompletionsEngine` reads both fields, preferring the standard OpenAI field.
- OpenAI gpt-5.x reasoning effort -- `"none"` is the wire value to disable reasoning (not `"off"`). Sending no `reasoning` block defaults to the model's built-in default effort. `"off"` is rejected by these models' `reasoning_efforts` list and raises `ValueError` at the factory.
- OpenAI reasoning effort `xhigh` -- gpt-5.x and gpt-5.4.x support `"none" | "low" | "medium" | "high" | "xhigh"`. The effort is forwarded verbatim in `{"reasoning": {"effort": value}}` by ResponsesEngine.
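The `prompt_cache_key` derivation mentioned above can be sketched as follows (assuming plain SHA-256 hex truncation of the system prompt text; the exact serialization of the system content is an assumption):

```python
import hashlib

# Sketch of the prompt_cache_key described above: SHA-256 of the system
# prompt content, truncated to the first 32 hex characters, so all
# requests sharing a system prompt route to the same cache server.
def prompt_cache_key(system_prompt: str) -> str:
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:32]

key = prompt_cache_key("You are a helpful assistant.")
```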
## Project details

Current release: sunwaee 1.7.11 (source distribution 75.2 kB; `py3-none-any` wheel 45.9 kB).