# sunwaee gen — multi-provider LLM engine library
All LLMs, one response format, one dependency (httpx). Supports switching models mid-conversation (e.g. draft with GPT, refine with Anthropic).
Handles streaming, tool calls, file attachments, prompt caching, reasoning on/off, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.
## Install

```bash
pip install sunwaee
pip install "sunwaee[files]"   # pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development
```
## Quick start

```python
import asyncio

from sunwaee.modules.gen.engine import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

# enable_reasoning=False (default) — reasoning disabled / non-reasoning variant used
engine = get_engine("anthropic", "claude-sonnet-4-6")

# enable_reasoning=True — activates thinking for all providers
engine_reasoning = get_engine("anthropic", "claude-sonnet-4-6", enable_reasoning=True)

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())
```
## Providers

| Provider | `provider=` | Env var |
|---|---|---|
| Anthropic | `"anthropic"` | `ANTHROPIC_API_KEY` |
| OpenAI | `"openai"` | `OPENAI_API_KEY` |
| Google | `"google"` | `GOOGLE_API_KEY` |
| DeepSeek | `"deepseek"` | `DEEPSEEK_API_KEY` |
| xAI | `"xai"` | `XAI_API_KEY` |
| Moonshot | `"moonshot"` | `MOONSHOT_API_KEY` |
## Directory structure

```
sunwaee/
├── core/
│   ├── logger.py   # get_logger(name) — scoped under "sunwaee.*"
│   └── tools.py    # @tool decorator, ok(), err()
└── modules/gen/
    ├── __init__.py   # public re-exports (get_engine, run, stream_run, …)
    ├── agent.py      # ReAct loop — run() + stream_run()
    ├── tools.py      # TOOLS list
    └── engine/
        ├── __init__.py   # get_engine, Message, Response, Tool, …
        ├── base.py       # BaseEngine ABC
        ├── factory.py    # get_engine() — provider routing + connection pooling
        ├── model.py      # Model dataclass + compute_cost()
        ├── types.py      # Message, Response, ToolCall, Usage, Cost, Performance, …
        ├── models/       # model registry per provider
        │   ├── __init__.py   # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py   # AnthropicEngine
            ├── openai.py      # OpenAIEngine (also used by DeepSeek, xAI, Moonshot)
            └── google.py      # GoogleEngine

tests/gen/
├── test_agent.py / test_stream_agent.py / test_tools.py
└── engine/
    ├── test_types.py / test_factory.py / test_model.py
    ├── providers/
    │   └── test_anthropic.py / test_openai.py / test_google.py
    └── live/
        ├── _shared.py                # shared config, data, helpers for all live tests
        ├── test_scenarios.py         # all providers × all scenarios × chat + stream
        ├── test_tool_call_result.py  # TOOL_CALL → execute → reply, all providers
        ├── test_attachments.py       # image attachments, vision-capable providers
        ├── test_chain.py             # three-provider conversation chain
        ├── test_caching.py           # prompt-cache hit on turn 2
        ├── test_reasoning.py         # reasoning ON / OFF per model category
        └── run/                      # JSON snapshots (gitignored)
```
## Core types (engine/types.py)

```python
class Role(Enum): SYSTEM, USER, ASSISTANT, TOOL, CONTEXT
class StopReason(Enum): END_TURN, TOOL_USE, MAX_TOKENS

@dataclass
class Message:
    role: Role
    content: str | None
    reasoning_content: str | None    # thinking for models that support it
    reasoning_signature: str | None  # opaque blob — echo back verbatim
    tool_call_id: str | None         # set on Role.TOOL messages
    tool_calls: list[ToolCall] | None
    attachments: list[FileAttachment] | None  # Role.USER only

@dataclass
class Response:
    provider: str; model: str; streaming: bool; synthetic: bool
    content: str | None; reasoning_content: str | None; reasoning_signature: str | None
    tool_calls: list[ToolCall] | None; stop_reason: StopReason | None; error: Error | None
    usage: Usage | None; cost: Cost | None; performance: Performance | None

@dataclass
class ToolCall:
    id: str; name: str; arguments: dict
    thought_signature: str | None  # Google only — echo back every subsequent turn
    error: str | None; duration: float; results: list[dict]

@dataclass
class Usage:
    input_tokens: int; output_tokens: int; total_tokens: int
    cache_read_tokens: int; cache_write_tokens: int

@dataclass
class Cost:
    input: float; output: float; cache_read: float; cache_write: float; total: float

@dataclass
class Performance:
    latency: float  # seconds to first chunk
    reasoning_duration: float; content_duration: float; total_duration: float
    throughput: int  # output tokens / second

@dataclass
class FileAttachment:
    data: bytes; filename: str; media_type: str = ""
    # text/* → <file name="…">…</file> block
    # image/jpeg|png|gif|webp → base64 inline
    # application/pdf|json + OOXML (docx/xlsx/pptx) → extracted text
```
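The media-type dispatch described in the `FileAttachment` comments can be sketched as a standalone function. This is illustrative only — the name `attachment_strategy` and the return labels are hypothetical; the real handling lives inside the provider engines:

```python
def attachment_strategy(media_type: str, filename: str = "") -> str:
    """Illustrative dispatch mirroring the FileAttachment comments above."""
    ooxml = (".docx", ".xlsx", ".pptx")
    if media_type.startswith("text/"):
        return "wrap in <file> block"
    if media_type in {"image/jpeg", "image/png", "image/gif", "image/webp"}:
        return "base64 inline"
    if media_type in {"application/pdf", "application/json"} or filename.endswith(ooxml):
        return "extract text"
    return "unsupported"
```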
## get_engine() — reasoning control

```python
engine = get_engine(
    provider,
    model,
    api_key=None,            # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    enable_reasoning=False,  # True activates thinking/reasoning for all providers
)
```

### Connection pool

get_engine() reuses a single httpx.AsyncClient per (event_loop, base_url). Clients are configured with Timeout(connect=5s, read=300s, write=30s) and Limits(max_connections=50). On graceful shutdown, call:

```python
from sunwaee.modules.gen.engine import close_all_clients

await close_all_clients()
```
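The per-loop pooling can be sketched generically. This is a simplification, not the library's factory.py — `get_pooled_client` is a hypothetical name, and `factory` stands in for `httpx.AsyncClient(...)` so the sketch has no third-party dependency:

```python
import asyncio

_CLIENTS: dict[tuple[int, str], object] = {}

def get_pooled_client(base_url: str, factory) -> object:
    # One client per (running event loop, base_url). Keying on the loop id
    # avoids sharing a client across loops, which httpx does not support.
    key = (id(asyncio.get_running_loop()), base_url)
    if key not in _CLIENTS:
        _CLIENTS[key] = factory()
    return _CLIENTS[key]
```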
enable_reasoning resolves all provider complexity automatically:

| Model `reasoning_mode` | `enable_reasoning=True` | `enable_reasoning=False` |
|---|---|---|
| `"dynamic"` | Sends provider thinking config | No thinking config sent |
| `"always"` + `non_reasoning_id` | Uses the model as-is | Swaps to `non_reasoning_id` variant |
| `"always"` + no swap | Uses the model as-is (always reasons) | Uses the model as-is (cannot disable) |
| `None` (no reasoning) + `reasoning_id` | Swaps to `reasoning_id` variant | Uses the model as-is |
Default reasoning configs when `enable_reasoning=True`:

| Provider / series | Mechanism | Value |
|---|---|---|
| Anthropic (Opus 4.7/4.6, Sonnet 4.6) | `output_config.effort` + `thinking: {type: "adaptive"}` | `effort="high"` |
| Anthropic (Opus 4.5, Haiku 4.5, Sonnet 4.5) | `thinking: {type: "enabled", budget_tokens: N}` | `max(1024, max_tokens - 1024)` |
| Google Gemini 3 (has `reasoning_efforts`) | `thinkingConfig.thinkingLevel` | `"high"` |
| Google Gemini 2.5 (no `reasoning_efforts`) | `thinkingConfig.thinkingBudget` | `-1` (dynamic) |
| OpenAI-compat (has `reasoning_efforts`) | `reasoning_effort` | `"high"` |

When `enable_reasoning=False` for dynamic models: Anthropic effort models use `effort="low"`; Gemini 3 uses the lowest effort level (`reasoning_efforts[0]`); Gemini 2.5 uses `thinkingBudget=0`.
## Model dataclass (engine/model.py)

```python
@dataclass
class Model:
    name: str; provider: str; display_name: str; description: str | None
    # specs
    context_window: int; max_output_tokens: int | None
    # features
    supports_vision: bool
    supports_tools: bool
    supports_reasoning: bool  # True if model has any reasoning capability
    # reasoning config
    reasoning_mode: str | None               # "always" | "dynamic" | None
    reasoning_efforts: list[str] | None      # valid effort levels (e.g. ["low","medium","high"])
    reasoning_tokens_type: str | None        # "raw" | "summary" | None
    reasoning_disabled_payload: dict | None  # merged into request when reasoning is disabled
    reasoning_id: str | None                 # for non-reasoning variants: name of reasoning counterpart
    non_reasoning_id: str | None             # for reasoning models: name of non-reasoning counterpart
    cache_min_tokens: int | None
    # pricing (per million tokens)
    input_price_per_mtok: float | None; output_price_per_mtok: float | None
    cache_read_price_per_mtok: float | None; cache_write_price_per_mtok: float | None
    input_price_per_mtok_128k: float | None   # xAI only
    output_price_per_mtok_128k: float | None
    input_price_per_mtok_200k: float | None   # most providers
    output_price_per_mtok_200k: float | None; ...
    input_price_per_mtok_272k: float | None   # OpenAI only
    output_price_per_mtok_272k: float | None; ...
    release_date: str | None; deprecated_at: str | None; sunset_at: str | None
```
`reasoning_mode`:

- `"always"` — model always reasons; disable by swapping to `non_reasoning_id` (xAI models, DeepSeek Reasoner)
- `"dynamic"` — reasoning can be toggled on/off via provider-specific config (Anthropic, Google, OpenAI, Moonshot)
- `None` — no reasoning capability

`reasoning_tokens_type`:

- `"raw"` — full reasoning content returned in `response.reasoning_content` (Anthropic, DeepSeek, grok-4.20, kimi-k2.5)
- `"summary"` — summarised thought returned (Google, grok-4.20 on `/v1/responses` endpoint)
- `None` — reasoning tokens tracked internally only; content not exposed (OpenAI, most xAI)

`reasoning_efforts`:

- List of valid named effort levels for the model's reasoning API parameter.
- Anthropic: `["low","medium","high","max"]`, or `["low","medium","high","xhigh","max"]` for Opus 4.7.
- Google Gemini 3: `["minimal","low","medium","high"]` or `["low","medium","high"]` depending on model.
- OpenAI: `["none","low","medium","high","xhigh"]` or similar per-model.
- `None` for models that use integer budgets (Gemini 2.5) or have no effort control.
## Usage

### With tools

```python
from sunwaee.core.tools import tool, ok, err
from sunwaee.modules.gen.engine.types import Tool

@tool("Return the current UTC time.")
def get_time() -> str:
    from datetime import datetime, timezone
    return ok({"time": datetime.now(timezone.utc).isoformat()})

response = await engine.chat(messages, tools=[get_time._tool])
```
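`ok()` and `err()` wrap results in a JSON envelope. A minimal stand-in consistent with the `{"ok": true, ...}` / `{"ok": false, ...}` shapes documented later — the real helpers live in `sunwaee.core.tools`, so treat this as a sketch:

```python
import json

def ok(data) -> str:
    # success envelope: {"ok": true, "data": ...}
    return json.dumps({"ok": True, "data": data})

def err(message: str) -> str:
    # failure envelope: {"ok": false, "error": "..."}
    return json.dumps({"ok": False, "error": message})
```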
### File and image attachments

```python
from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([Message(role=Role.USER, content="Summarise.", attachments=[att])])
```

Supported: `text/*`, `application/json`, `image/jpeg|png|gif|webp`, `application/pdf`, `.docx`, `.xlsx`, `.pptx`
### ReAct agent loop

```python
from sunwaee.modules.gen.agent import stream_run

new_messages = []
async for chunk in stream_run(
    messages,
    tools,
    engine,
    new_messages=new_messages,
    tool_timeout=60.0,       # seconds per individual tool call (default: 60)
    max_concurrent_tools=8,  # max tools running simultaneously (default: 8)
):
    if chunk.content:
        print(chunk.content, end="", flush=True)

# new_messages has all assistant + tool turns appended during the run
```

Up to 10 iterations by default. Concurrent tool calls via asyncio.gather, bounded by max_concurrent_tools. Sync tools dispatched via run_in_executor. Unknown keyword arguments supplied by the model are silently filtered before calling the tool function.
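The dispatch behaviour just described — gather bounded by a semaphore, executor for sync tools, unknown-kwarg filtering, per-call timeout — can be sketched generically. This is a simplification with hypothetical names, not agent.py itself:

```python
import asyncio
import inspect

async def dispatch_tools(calls, registry, max_concurrent=8, timeout=60.0):
    # calls: list of (tool_name, kwargs) pairs; registry: name -> callable
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(name, kwargs):
        fn = registry[name]
        # silently filter kwargs the tool function does not accept
        accepted = inspect.signature(fn).parameters
        kwargs = {k: v for k, v in kwargs.items() if k in accepted}
        async with sem:
            if inspect.iscoroutinefunction(fn):
                return await asyncio.wait_for(fn(**kwargs), timeout)
            # sync tools run in the default thread-pool executor
            loop = asyncio.get_running_loop()
            return await asyncio.wait_for(
                loop.run_in_executor(None, lambda: fn(**kwargs)), timeout
            )

    return await asyncio.gather(*(run_one(n, k) for n, k in calls))
```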
## Error types

All provider errors subclass EngineError(RuntimeError), so existing `except RuntimeError` handlers continue to work. Import subclasses to handle specific cases:

```python
from sunwaee.modules.gen.engine import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError as e:  # 429 — back off and retry
    ...
except AuthError as e:       # 401 / 403 — invalid key
    ...
except TransientError as e:  # 5xx — server-side; may be retried
    ...
except EngineError as e:     # other 4xx
    print(e.status_code)
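A retry wrapper over the retryable subclasses might look like this. It is illustrative: `chat_with_retry` and `base` are not part of the library, and the exception classes here are local stand-ins for the real imports above:

```python
import asyncio
import random

class RateLimitError(RuntimeError): ...  # stand-in for the real import
class TransientError(RuntimeError): ...  # stand-in for the real import

async def chat_with_retry(chat, messages, retries=3, base=1.0):
    # Exponential backoff with jitter, only on retryable errors;
    # AuthError and other 4xx propagate immediately.
    for attempt in range(retries + 1):
        try:
            return await chat(messages)
        except (RateLimitError, TransientError):
            if attempt == retries:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.random() * base)
```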
## Listing models

```python
from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None
```
## Testing

```bash
pytest tests/gen/ -m "not live"  # unit (no keys needed)
pytest tests/gen/ -m live        # live (real API calls)
pytest tests/gen/ -m "not live" --cov=sunwaee --cov-report=term-missing

# run a single live test file
pytest -m live tests/gen/engine/live/test_caching.py
pytest -m live tests/gen/engine/live/test_reasoning.py
```
Unit test conventions:

- Mock `httpx.AsyncClient` — never make real HTTP calls
- Assert `response.cost`, `response.usage`, `response.performance` populated on final chunk
- For streaming, use an async generator as mock transport
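An async-generator mock transport can be sketched like this (illustrative; the chunk shape is simplified to plain strings rather than Response objects):

```python
import asyncio

async def fake_stream(chunks):
    # stands in for engine.stream() in a unit test: yields canned chunks
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a real transport would
        yield chunk
```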
Live test files and what they cover:

| File | What it tests |
|---|---|
| test_scenarios.py | 6 scenarios × 6 providers × chat + stream (72 tests) |
| test_tool_call_result.py | Full TOOL_CALL → execute → reply loop, all providers |
| test_attachments.py | PNG image attachment, vision-capable providers |
| test_chain.py | Three-provider conversation chain with shared history |
| test_caching.py | Prompt-cache hit on turn 2, static system prompt |
| test_reasoning.py | enable_reasoning ON / OFF per model category |
Live scenarios:

| Scenario | What it tests |
|---|---|
| ONLY_SYSTEM | System-only input edge case; lenient assertions |
| ONLY_USER | Single user message |
| SYSTEM_AND_USER | System prompt respected in response |
| TOOL_CALL | Model must issue at least one tool call |
| TOOL_CALL_RESULT | Full multi-turn with real tool IDs/signatures |
| FILE_ATTACHMENT | Text file attached; asserts content populated |
| CONTEXT_ROLE | Role.CONTEXT message handled without errors |
All live tests default to enable_reasoning=False. test_reasoning.py is the only file that explicitly passes enable_reasoning=True.
## How to add a model

File: sunwaee/modules/gen/engine/models/<provider>.py

```python
Model(
    name="provider-model-name",
    display_name="Human Readable Name",
    provider="anthropic",
    context_window=200_000,
    max_output_tokens=64_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    cache_read_price_per_mtok=0.3,
    cache_write_price_per_mtok=3.75,
    input_price_per_mtok_200k=6.0,  # omit if no >200k tier
    output_price_per_mtok_200k=22.5,
    supports_vision=True,
    supports_tools=True,
    supports_reasoning=True,
    reasoning_mode="dynamic",                            # "always" | "dynamic" | None
    reasoning_efforts=["low", "medium", "high", "max"],  # omit if not applicable
    reasoning_tokens_type="raw",                         # "raw" | "summary" | None
    non_reasoning_id="model-non-reasoning",              # omit if no non-reasoning variant
    cache_min_tokens=1_024,                              # omit (None) if caching is undocumented
    release_date="2025-01-01",
)
```

For a non-reasoning variant that pairs with a reasoning model:

```python
Model(
    name="model-non-reasoning",
    ...same pricing...,
    supports_reasoning=False,
    reasoning_id="model",  # points to the reasoning counterpart
)
```
Pricing tiers (engine/model.py): base required; _128k when input_tokens > 128_000 (xAI only); _200k when > 200_000; _272k when > 272_000 (OpenAI only). Thresholds are strict > — exactly at the boundary uses the lower tier.
cache_min_tokens — minimum tokens required at a cache breakpoint for prompt caching to activate. None = no caching. 0 = no minimum (caches everything). Known values:

| Provider | Minimum | Models |
|---|---|---|
| Anthropic | 4,096 | Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 |
| Anthropic | 2,048 | Sonnet 4.6 |
| Anthropic | 1,024 | Sonnet 4.5 |
| OpenAI | 1,024 | All models (automatic prefix caching) |
| Google | 1,024 | All models (explicit context caching) |
| xAI | 0 | All models (automatic, no minimum) |
| DeepSeek | 64 | All models (automatic prefix caching) |
| Moonshot | 0 | All models (automatic, no minimum) |
## How to add an OpenAI-compatible provider

1. `engine/models/<provider>.py` — `MODELS` list
2. `engine/models/__init__.py` — import + add to `_ALL`
3. `engine/factory.py` — add to `_OPENAI_COMPATIBLE: dict[str, str]` (env var auto-derived as `PROVIDER_API_KEY`)
4. `tests/gen/engine/live/_shared.py` — add `("provider", "cheapest-model")` to `ENGINES`
## How to add a provider with a custom API

1. `engine/models/<provider>.py` + register in `__init__.py`
2. `engine/providers/<provider>.py` — implement `BaseEngine`:
   - `async def chat(self, messages, tools=None) -> Response`
   - `async def stream(self, messages, tools=None) -> AsyncIterator[Response]`
   - Accept `client: httpx.AsyncClient | None = None`
   - Call `resolve_tokens(usage)` before `compute_cost`
   - Strip `reasoning_content`/`reasoning_signature` from all but the last assistant turn
   - Handle system-only input: promote to `Role.USER`
   - On 4xx/5xx in streaming: read full body before raising
   - Buffer tool call JSON across SSE chunks; parse only on stop
3. `engine/factory.py` — wire into `get_engine()`, handle `enable_reasoning` for the new provider
4. Tests: unit (`providers/test_<provider>.py`) + live entry in `_shared.py`
## How to add a tool to the agent

```python
from typing import Annotated

from sunwaee.core.tools import tool, ok, err

@tool("Search the web for current information.")
def web_search(
    query: Annotated[str, "The search query"],
    num_results: Annotated[int, "Number of results"] = 5,
) -> str:
    try:
        return ok(_do_search(query, num_results))
    except Exception as e:
        return err(str(e))
```

Register: add `web_search._tool` to `TOOLS` in sunwaee/modules/gen/tools.py.

Tests: tests/gen/test_<tool_name>.py — call directly, assert JSON output shape, test error path. Never call real external APIs.
### @tool decorator

Introspects the function signature to build JSON Schema parameters automatically.

Supports: `str`, `int`, `float`, `bool`, `list[T]`, `Literal[...]`, `Optional[T]`, `Annotated[T, "description"]`

- Parameters with defaults → not `required`
- Both sync and async supported
- Must return a JSON string: `ok(data)` / `err(message)` / `json.dumps(...)`

```python
ok({"id": "123"})  # '{"ok": true, "data": {"id": "123"}}'
err("Not found")   # '{"ok": false, "error": "Not found"}'
```
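The introspection can be approximated in a few lines. This is an illustrative sketch, not the decorator's actual implementation — it handles only scalar `Annotated` hints and ignores `list[T]`, `Literal`, and `Optional`:

```python
import inspect
import typing

def build_schema(fn) -> dict:
    # Derive a JSON Schema "parameters" object from Annotated type hints.
    hints = typing.get_type_hints(fn, include_extras=True)
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        hint, desc = hints[name], ""
        if typing.get_origin(hint) is typing.Annotated:
            hint, desc = typing.get_args(hint)[0], typing.get_args(hint)[1]
        props[name] = {"type": type_map.get(hint, "string"), "description": desc}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters with defaults are not required
    return {"type": "object", "properties": props, "required": required}
```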
## Provider-specific quirks
| # | Rule |
|---|---|
| 1 | resolve_tokens() before compute_cost() — xAI/Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates from total_tokens. Always called unconditionally — it's a no-op when counts already match. |
| 2 | Strip reasoning from all but last assistant turn — stale reasoning_signature breaks APIs and blocks mid-session provider switches. |
| 3 | OpenAI uses max_completion_tokens, not max_tokens. |
| 4 | OpenAI reasoning models: yield synthetic chunk immediately — stream is silent during thinking; Response(reasoning_content="Reasoning in progress…", synthetic=True). Never treat synthetic=True as real content. |
| 5 | Google: thoughtSignature on functionCall part → ToolCall.thought_signature; echo every subsequent turn. |
| 6 | Google: no tool call IDs — use function name as correlation ID. |
| 7 | Google streaming: ?alt=sse required on streamGenerateContent. |
| 8 | System-only input — promote system message to Role.USER (Anthropic + Google). |
| 9 | Anthropic reasoning: two paths. Newer models (Opus 4.7/4.6, Sonnet 4.6) use output_config: {effort: X} + thinking: {type: "adaptive"}. Older models (Opus 4.5, Haiku 4.5, Sonnet 4.5) use thinking: {type: "enabled", budget_tokens: N} with 1024 ≤ budget < max_tokens. The factory selects the path based on whether the model has reasoning_efforts. |
| 10 | Connection pooling: one httpx.AsyncClient per (event_loop_id, base_url) in factory.py. |
| 11 | Role.CONTEXT mapping: all providers wrap content in <context> tags automatically — Anthropic → {"role":"user","content":"<context>…</context>"}; OpenAI → {"role":"system","content":"<context>…</context>"}; Google → {"role":"user","parts":[{"text":"<context>…</context>"}]}. |
| 12 | Google Gemini 3 uses thinkingLevel (string: "minimal"/"low"/"medium"/"high"); Gemini 2.5 uses thinkingBudget (int: -1 = dynamic, 0 = off, N = fixed). The engine selects based on whether the model has reasoning_efforts. Gemini 3.1 Pro and 2.5 Pro cannot disable thinking (reasoning_mode="always"). |
| 13 | kimi-k2.5 (Moonshot) reasons by default — disabling thinking requires an explicit payload {"thinking": {"type": "disabled"}}. Set via Model.reasoning_disabled_payload; the OpenAI engine merges it when reasoning_effort is None. |
| 14 | xAI always-reasoning models (grok-4.20, grok-4-1-fast, grok-4-fast) route to a non-reasoning variant on enable_reasoning=False via non_reasoning_id. Models without a non_reasoning_id (grok-4, grok-3-mini, grok-code-fast-1) cannot have reasoning disabled. |
| 15 | grok-4.20 returns reasoning_content on chat/completions — reasoning_tokens_type="summary" refers to the /v1/responses endpoint only; on chat/completions the field carries full raw reasoning text. |
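Quirk 1's back-calculation can be sketched as follows (illustrative; the library's `resolve_tokens(usage)` takes a Usage object, whereas this stand-in takes bare counts):

```python
def resolve_tokens(input_tokens: int, output_tokens: int, total_tokens: int) -> int:
    # Some providers exclude reasoning tokens from output_tokens; when the sum
    # falls short of total_tokens, fold the difference back into output_tokens.
    # No-op when the counts already match, so it is safe to call unconditionally.
    missing = total_tokens - (input_tokens + output_tokens)
    return output_tokens + missing if missing > 0 else output_tokens
```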