SUNWÆE gen — multi-provider LLM engine library.


All LLMs, one response format, one dependency (httpx). Supports switching models mid-conversation (e.g. draft with GPT, refine with Claude).

Handles streaming, tool calls, file attachments, prompt caching, extended thinking, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.


Install

pip install sunwaee
pip install "sunwaee[files]"   # pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development

Quick start

import asyncio
from sunwaee.modules.gen.engine import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

engine = get_engine("anthropic", "claude-sonnet-4-6")  # reads ANTHROPIC_API_KEY

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())

Providers

Provider   provider=    Env var
Anthropic  "anthropic"  ANTHROPIC_API_KEY
OpenAI     "openai"     OPENAI_API_KEY
Google     "google"     GOOGLE_API_KEY
DeepSeek   "deepseek"   DEEPSEEK_API_KEY
xAI        "xai"        XAI_API_KEY
Moonshot   "moonshot"   MOONSHOT_API_KEY

Directory structure

sunwaee/
├── core/
│   ├── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
│   └── tools.py                  # @tool decorator, ok(), err()
└── modules/gen/
    ├── __init__.py               # public re-exports (get_engine, run, stream_run, …)
    ├── agent.py                  # ReAct loop — run() + stream_run()
    ├── tools.py                  # TOOLS list
    └── engine/
        ├── __init__.py           # get_engine, Message, Response, Tool, …
        ├── base.py               # BaseEngine ABC
        ├── factory.py            # get_engine() — provider routing + connection pooling
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, ToolCall, Usage, Cost, Performance, …
        ├── models/               # model registry per provider
        │   ├── __init__.py       # get_model(), list_models()
        │   ├── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── openai.py         # OpenAIEngine (also used by DeepSeek, xAI, Moonshot)
            └── google.py         # GoogleEngine

tests/gen/
├── test_agent.py / test_stream_agent.py / test_tools.py
└── engine/
    ├── test_types.py / test_factory.py / test_model.py
    ├── providers/
    │   └── test_anthropic.py / test_openai.py / test_google.py
    └── live/
        ├── test_providers.py     # real API calls, all providers × all scenarios
        └── run/                  # JSON snapshots (gitignored)

Core types (engine/types.py)

class Role(Enum):       SYSTEM, USER, ASSISTANT, TOOL, CONTEXT
class StopReason(Enum): END_TURN, TOOL_USE, MAX_TOKENS

@dataclass class Message:
    role: Role
    content: str | None
    reasoning_content: str | None       # thinking for models that support it
    reasoning_signature: str | None     # opaque blob — echo back verbatim
    tool_call_id: str | None            # set on Role.TOOL messages
    tool_calls: list[ToolCall] | None
    attachments: list[FileAttachment] | None   # Role.USER only

@dataclass class Error:
    message: str = ""; status_code: int = 0    # defined but never populated — errors are raised

@dataclass class Response:
    provider: str; model: str; streaming: bool; synthetic: bool
    content: str | None; reasoning_content: str | None; reasoning_signature: str | None
    tool_calls: list[ToolCall] | None; stop_reason: StopReason | None; error: Error | None
    usage: Usage | None; cost: Cost | None; performance: Performance | None

@dataclass class ToolCall:
    id: str; name: str; arguments: dict
    thought_signature: str | None    # Google only — echo back every subsequent turn
    error: str | None; duration: float; results: list[dict]

@dataclass class Usage:
    input_tokens: int; output_tokens: int; total_tokens: int
    cache_read_tokens: int; cache_write_tokens: int

@dataclass class Cost:
    input: float; output: float; cache_read: float; cache_write: float; total: float

@dataclass class Performance:
    latency: float            # seconds to first chunk
    reasoning_duration: float; content_duration: float; total_duration: float
    throughput: int           # output tokens / second

@dataclass class FileAttachment:
    data: bytes; filename: str; media_type: str = ""
    # text/* → <file name="…">…</file> block
    # image/jpeg|png|gif|webp → base64 inline
    # application/pdf|json + OOXML (docx/xlsx/pptx) → extracted text

Usage

With tools

from sunwaee.core.tools import tool, ok, err
from sunwaee.modules.gen.engine.types import Tool

@tool("Return the current UTC time.")
def get_time() -> str:
    from datetime import datetime, timezone
    return ok({"time": datetime.now(timezone.utc).isoformat()})

response = await engine.chat(messages, tools=[get_time._tool])

File and image attachments

from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([Message(role=Role.USER, content="Summarise.", attachments=[att])])

Supported: text/*, application/json, image/jpeg|png|gif|webp, application/pdf, .docx, .xlsx, .pptx

ReAct agent loop

from sunwaee.modules.gen.agent import stream_run

new_messages = []
async for chunk in stream_run(messages, tools, engine, new_messages=new_messages):
    if chunk.content:
        print(chunk.content, end="", flush=True)
# new_messages has all assistant + tool turns appended during the run

Up to 10 iterations by default. Concurrent tool calls via asyncio.gather. Sync tools dispatched via run_in_executor.

Listing models

from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None

Testing

pytest tests/gen/ -m "not live"                                        # unit (no keys needed)
pytest tests/gen/ -m live                                              # live (real API calls)
pytest tests/gen/ -m "not live" --cov=sunwaee --cov-report=term-missing

Unit test conventions:

  • Mock httpx.AsyncClient — never make real HTTP calls
  • Assert response.cost, response.usage, response.performance populated on final chunk
  • For streaming, use an async generator as mock transport

Live test scenarios (all providers × chat + stream):

Scenario          What it tests
ONLY_SYSTEM       System-only input edge case; lenient assertions
ONLY_USER         Single user message
SYSTEM_AND_USER   System prompt respected in response
TOOL_CALL         Model must issue at least one tool call
TOOL_CALL_RESULT  Full multi-turn with real tool IDs/signatures
FILE_ATTACHMENT   Text file attached; asserts content populated
CONTEXT_ROLE      Role.CONTEXT message handled without errors

Image attachments tested separately, parametrized over vision-capable engines only.


How to add a model

File: sunwaee/modules/gen/engine/models/<provider>.py

Model(
    name="provider-model-name",
    display_name="Human Readable Name",
    provider="anthropic",
    context_window=200_000,
    max_output_tokens=64_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    cache_read_price_per_mtok=0.3,
    cache_write_price_per_mtok=3.75,
    input_price_per_mtok_200k=6.0,     # omit if no >200k tier
    output_price_per_mtok_200k=22.5,
    supports_vision=True, supports_tools=True,
    supports_thinking=True, supports_reasoning_tokens=True,
    cache_min_tokens=1_024,            # omit (None) if caching is undocumented
    release_date="2025-01-01",
)

Pricing tiers (engine/model.py): base required; _128k when input_tokens > 128_000 (xAI only); _200k when > 200_000; _272k when > 272_000 (OpenAI only). Thresholds are strict > — exactly at the boundary uses the lower tier.

cache_min_tokens — minimum tokens required at a cache breakpoint for prompt caching to activate. None = no caching. 0 = no minimum (caches everything). Known values:

Provider   Minimum  Models
Anthropic  4,096    Opus 4.6, Opus 4.5, Haiku 4.5
Anthropic  2,048    Sonnet 4.6
Anthropic  1,024    Sonnet 4.5
OpenAI     1,024    All models (automatic prefix caching)
Google     1,024    All models (explicit context caching)
xAI        0        All models (automatic, no minimum)
DeepSeek   64       All models (automatic prefix caching)
Moonshot   0        All models (automatic, no minimum)

Tests: Add assertions in engine/test_model.py for non-standard pricing tiers.


How to add an OpenAI-compatible provider

  1. engine/models/<provider>.py — MODELS list
  2. engine/models/__init__.py — import + add to _ALL
  3. engine/factory.py — add to _OPENAI_COMPATIBLE: dict[str, str] (env var auto-derived as PROVIDER_API_KEY)
  4. tests/gen/engine/live/test_providers.py — add ("provider", "cheapest-model") to ENGINES

How to add a provider with a custom API

  1. engine/models/<provider>.py + register in __init__.py
  2. engine/providers/<provider>.py — implement BaseEngine:
    • async def chat(self, messages, tools=None) -> Response
    • async def stream(self, messages, tools=None) -> AsyncIterator[Response]
    • Accept client: httpx.AsyncClient | None = None
    • Call resolve_tokens(usage) before compute_cost
    • Strip reasoning_content/reasoning_signature from all but the last assistant turn
    • Handle system-only input: promote to Role.USER
    • On 4xx/5xx in streaming: read full body before raising
    • Buffer tool call JSON across SSE chunks; parse only on stop
  3. engine/factory.py — wire into get_engine()
  4. Tests: unit (providers/test_<provider>.py) + live entry
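The "buffer tool call JSON across SSE chunks" point in step 2 can be sketched as a small accumulator. This is a hypothetical illustration (`ToolCallBuffer` and its methods are not the sunwaee API): argument JSON arrives split across deltas and is only valid once the stream signals the call is complete.

```python
import json

# Hypothetical sketch of buffering tool-call argument JSON across SSE
# deltas; illustrative names, not the sunwaee API.
class ToolCallBuffer:
    def __init__(self):
        self._parts: dict[int, list[str]] = {}

    def add_delta(self, index: int, fragment: str) -> None:
        """Accumulate fragments; never parse mid-stream (invalid JSON)."""
        self._parts.setdefault(index, []).append(fragment)

    def finalize(self, index: int) -> dict:
        """Parse only on stop, when the JSON is guaranteed complete."""
        return json.loads("".join(self._parts.pop(index, [])))

buf = ToolCallBuffer()
for fragment in ['{"query": "we', 'ather in ', 'Paris"}']:  # simulated deltas
    buf.add_delta(0, fragment)
args = buf.finalize(0)
```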

How to add a tool to the agent

from typing import Annotated
from sunwaee.core.tools import tool, ok, err

@tool("Search the web for current information.")
def web_search(
    query: Annotated[str, "The search query"],
    num_results: Annotated[int, "Number of results"] = 5,
) -> str:
    try:
        return ok(_do_search(query, num_results))
    except Exception as e:
        return err(str(e))

Register: add web_search._tool to TOOLS in sunwaee/modules/gen/tools.py.

Tests: tests/gen/test_<tool_name>.py — call directly, assert JSON output shape, test error path. Never call real external APIs.


@tool decorator

Introspects signature to build JSON Schema parameters automatically.

Supports: str, int, float, bool, list[T], Literal[...], Optional[T], Annotated[T, "description"]

  • Parameters with defaults → not required
  • Both sync and async supported
  • Must return JSON string: ok(data) / err(message) / json.dumps(...)
ok({"id": "123"})   # '{"ok": true, "data": {"id": "123"}}'
err("Not found")    # '{"ok": false, "error": "Not found"}'

Provider-specific quirks

  1. resolve_tokens() before compute_cost() — xAI/Google exclude reasoning tokens from output_tokens; resolve_tokens back-calculates them from total_tokens
  2. Strip reasoning from all but the last assistant turn — a stale reasoning_signature breaks APIs
  3. OpenAI uses max_completion_tokens, not max_tokens
  4. OpenAI reasoning models: yield a synthetic chunk immediately — the stream is silent during thinking; Response(reasoning_content="Reasoning in progress…", synthetic=True)
  5. Google: thoughtSignature on the functionCall part maps to ToolCall.thought_signature; echo it back on every subsequent turn
  6. Google: no tool call IDs — use the function name as the correlation ID
  7. Google streaming: ?alt=sse is required on streamGenerateContent
  8. System-only input: promote the system message to Role.USER (Anthropic + Google)
  9. Anthropic thinking budget: 1024 ≤ thinking_budget < max_tokens; default max(1024, max_tokens - 1024)
  10. Connection pooling: one httpx.AsyncClient per (event_loop_id, base_url) in factory.py
  11. Role.CONTEXT mapping: all providers wrap content in <context> tags automatically — Anthropic → {"role":"user","content":"<context>…</context>"}; OpenAI → {"role":"system","content":"<context>…</context>"}; Google → {"role":"user","parts":[{"text":"<context>…</context>"}]}
  12. Anthropic cache tokens are suppressed when thinking is enabled: cache_read_tokens/cache_write_tokens are 0 in the API response even when caching is active; caching still happens transparently
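Quirk 1's back-calculation can be sketched as follows. This is a hypothetical stand-in (including the slimmed-down Usage); the real resolve_tokens() in the engine layer may differ.

```python
from dataclasses import dataclass

# Hypothetical sketch of quirk 1: when a provider excludes reasoning tokens
# from output_tokens, recover them from total_tokens so billing is correct.
@dataclass
class ReportedUsage:          # stand-in for the library's Usage
    input_tokens: int
    output_tokens: int
    total_tokens: int

def resolve_tokens(usage: ReportedUsage) -> ReportedUsage:
    implied_output = usage.total_tokens - usage.input_tokens
    if implied_output > usage.output_tokens:
        # Reasoning tokens were excluded from output_tokens: use the
        # implied figure so compute_cost() bills the full output.
        usage.output_tokens = implied_output
    return usage

# e.g. 300 total - 100 input implies 200 output, not the reported 40
u = resolve_tokens(ReportedUsage(input_tokens=100, output_tokens=40,
                                 total_tokens=300))
```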
