Provider-agnostic LLM dispatch layer: 3 injected seams (config / usage / dispatch) + a pure cost model. Relays usage to a sink rather than tracking it.
dispatch-relay
A provider-agnostic LLM layer with three injected seams. Resolve a model, dispatch a call across any provider, and relay usage to a sink your application owns — instead of the library tracking it for you. Pure-stdlib core, zero runtime dependencies.
pip install dispatch-relay
Who it's for: anyone running more than one LLM provider who wants one consistent dispatch + usage-attribution surface, with the host application in control of config resolution, usage recording, and the actual transport. The "relay, not track" name is the contract: usage is relayed to your sink (a database, a log, nothing) — the library never decides where it lands.
This is the dependency-light foundation increment: the three injected-interface seams + the pure cost model. (Caching and the higher-level façade arrive in a later increment and bring langchain-core etc. with them; this increment is pure-stdlib.)
Renamed from omega-llm. `import omega_llm` still works as a deprecated alias that re-exports `dispatch_relay` (with a `DeprecationWarning`); migrate to `import dispatch_relay`.
The 3 injected seams (dispatch_relay.interfaces)
Each is a @runtime_checkable typing.Protocol (structural typing — a host satisfies the contract WITHOUT importing this library) + a dependency-light default impl.
| Seam | Method(s) | Default impl | A host can back it with |
|---|---|---|---|
| `ConfigSource` | `resolve(key, role, default) → model_id` | `DefaultConfigSource` (`os.getenv(f"{KEY}_MODEL") or default`) | a config store (role → global → env → default) |
| `UsageSink` | `record(*, provider, role, caller, model, tier, input_tokens, output_tokens, cache_read=0, cache_creation=0, cost_usd=0.0, cost_usd_raw=0.0, billing="metered", **extra) → None` | `NoOpUsageSink` (no-op) | a usage store / time-series table |
| `DispatchBackend` | `supports(*, provider, role, tier) → bool` + `dispatch(*, provider, model, messages, tier, role, caller, **kwargs) → LLMResponse` | `DefaultDispatchBackend` (direct SDK via injected `llm_factory`; `supports` → `True`) | subscription lanes / custom transports |
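As an illustration of the "role → global → env → default" column, here is a minimal sketch of a host-side config source. The `ConfigSource` protocol is redeclared locally to mirror the signature in the table, so the snippet runs without the library installed; `LayeredConfigSource`, its constructor arguments, and the model ids are hypothetical names invented for this example, not part of dispatch-relay.

```python
import os
from typing import Protocol, runtime_checkable


@runtime_checkable
class ConfigSource(Protocol):
    """Local mirror of the seam's shape, as described in the table above."""

    def resolve(self, key: str, role: str, default: str) -> str: ...


class LayeredConfigSource:
    """Hypothetical host source: role override, then global, then env, then default."""

    def __init__(self, role_overrides=None, global_overrides=None):
        self._role = dict(role_overrides or {})      # {(role, key): model_id}
        self._global = dict(global_overrides or {})  # {key: model_id}

    def resolve(self, key: str, role: str, default: str) -> str:
        if (role, key) in self._role:
            return self._role[(role, key)]
        if key in self._global:
            return self._global[key]
        # Same env convention as the default impl: KEY upper-cased + "_MODEL".
        return os.getenv(f"{key.upper()}_MODEL") or default


source = LayeredConfigSource(
    role_overrides={("council", "gemini_flash"): "my-council-model"},
)

# Structural typing: the class never inherits from ConfigSource.
assert isinstance(source, ConfigSource)
print(source.resolve("gemini_flash", "council", "fallback-model"))
```

Because the check is structural, the host class only needs a `resolve` method with a compatible shape. Note that `isinstance` against a runtime-checkable protocol verifies that the method exists, not that its signature matches.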
cache_read and cache_creation are separate fields on UsageSink.record and on UsageRecord — summing them undercounts Anthropic. billing marks the lane: "metered" ($-tracked SDK) vs "subscription" ($0).
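A host-owned sink might look like the following sketch. It accepts the keyword-only `record` signature listed for `UsageSink` and stores `cache_read` and `cache_creation` as separate columns. `InMemoryUsageSink`, `metered_cost`, and the sample values are hypothetical and exist only for this example; a real host would more likely write to a database or a log.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryUsageSink:
    """Hypothetical host sink: keeps each relayed call as a plain dict."""

    rows: list = field(default_factory=list)

    def record(self, *, provider, role, caller, model, tier,
               input_tokens, output_tokens, cache_read=0, cache_creation=0,
               cost_usd=0.0, cost_usd_raw=0.0, billing="metered", **extra) -> None:
        self.rows.append({
            "provider": provider, "role": role, "caller": caller,
            "model": model, "tier": tier,
            "input_tokens": input_tokens, "output_tokens": output_tokens,
            # Two separate columns; they are never merged into one number.
            "cache_read": cache_read, "cache_creation": cache_creation,
            "cost_usd": cost_usd, "cost_usd_raw": cost_usd_raw,
            "billing": billing, **extra,
        })

    def metered_cost(self) -> float:
        """Sum cost over metered rows only; subscription rows carry $0."""
        return sum(r["cost_usd"] for r in self.rows if r["billing"] == "metered")


sink = InMemoryUsageSink()
sink.record(provider="anthropic", role="council", caller="demo",
            model="example-model", tier="sonnet",
            input_tokens=1200, output_tokens=300,
            cache_read=800, cache_creation=200, cost_usd=0.01)
sink.record(provider="anthropic", role="council", caller="demo",
            model="example-model", tier="sonnet",
            input_tokens=500, output_tokens=100, billing="subscription")
print(len(sink.rows), sink.metered_cost())
```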
Value types & core-owned facts (dispatch_relay.core)
@dataclass(frozen=True)
class UsageRecord: # input_tokens, output_tokens, cache_read=0, cache_creation=0, model=""
@dataclass(frozen=True)
class LLMResponse: # text, usage: UsageRecord | None, raw: Any
The provider-facts live in dispatch_relay.core (one place, never duplicated per backend):
- `DEFAULTS: dict[str, str]`: the abstract-key → model-id table. The core passes `default=DEFAULTS[key]` into `ConfigSource.resolve`.
- `extract_usage(provider, raw) → UsageRecord | None`: the single place that knows each provider's usage-from-raw shape. Anthropic dual-path: prefer `raw.response_metadata["usage"]` (the uncached remainder), fall back to `raw.usage_metadata` only if absent (using the wrong one double-counts). The model name comes from `raw.response_metadata["model_name"]` (both Anthropic and Gemini surface it there; a real LangChain `AIMessage` has no top-level `.model` attribute), falling back to `""`. Returns `None` when no usage metadata is present.
- `resolve_usage(response, provider, model) → UsageRecord | None`: the locked reconciliation rule. Resolve `response.usage if response.usage is not None else extract_usage(provider, response.raw)`, then stamp the authoritative `model` via `dataclasses.replace`. The dispatch call knows the configured `model`, so the dispatch-arg model always wins over whatever the raw echoed. Returns `None` unchanged when there's no usage (the subscription lane). `LLMResponse.usage` is a real escape hatch: a backend MAY pre-populate it; otherwise the core extracts.
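The reconciliation rule can be sketched as follows. This is an illustration of the rule as described above, not the library's source: the dataclasses are redeclared locally with assumed field types, and `extract_usage_stub` is a hypothetical stand-in that reads a made-up `{"usage": {...}}` dict rather than any real provider shape.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class UsageRecord:  # field names from the README; the int/str types are assumed
    input_tokens: int
    output_tokens: int
    cache_read: int = 0
    cache_creation: int = 0
    model: str = ""


@dataclass(frozen=True)
class LLMResponse:
    text: str
    usage: Optional[UsageRecord]
    raw: Any


def extract_usage_stub(provider: str, raw: Any) -> Optional[UsageRecord]:
    """Stand-in extractor: reads a hypothetical {'usage': {...}} dict."""
    usage = raw.get("usage") if isinstance(raw, dict) else None
    if not usage:
        return None
    return UsageRecord(
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        model=raw.get("model", ""),
    )


def resolve_usage_sketch(response: LLMResponse, provider: str,
                         model: str) -> Optional[UsageRecord]:
    # A pre-populated usage wins; otherwise fall back to extraction from raw.
    usage = (response.usage if response.usage is not None
             else extract_usage_stub(provider, response.raw))
    if usage is None:
        return None  # no usage at all, e.g. the subscription lane
    # Frozen dataclass: replace() returns a copy stamped with the configured model.
    return replace(usage, model=model)


pre = LLMResponse("hi", UsageRecord(10, 5, model="echoed-by-raw"), raw=None)
extracted = LLMResponse("hi", None, raw={"usage": {"input_tokens": 7, "output_tokens": 3}})
subscription = LLMResponse("hi", None, raw={"text": "hi"})

for resp in (pre, extracted, subscription):
    print(resolve_usage_sketch(resp, "anthropic", "configured-model"))
```

In all three cases the model echoed by the raw response is ignored: the result either carries the configured model or is `None`.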
Both shipped backends return LLMResponse(usage=None); the core extracts usage. The DefaultDispatchBackend derives text from raw.content: a str passes through; an Anthropic content list has its type=="text" blocks joined (non-text blocks skipped); anything else falls back to str(raw). That fallback is only the default backend's degenerate case — real subscription backends (raws are dicts, not strings) construct text explicitly and pass usage=None with billing="subscription".
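The three-way text derivation described above can be sketched as below. This is an assumption-laden illustration rather than the backend's actual code: `text_from_raw` is a hypothetical name, content blocks are assumed to be dicts with `"type"` and `"text"` keys, and the blocks are joined with an empty separator because the README does not say which separator the library uses.

```python
from types import SimpleNamespace
from typing import Any


def text_from_raw(raw: Any) -> str:
    """Derive display text from a raw response object (illustrative only)."""
    content = getattr(raw, "content", None)
    if isinstance(content, str):
        return content  # plain string content passes through unchanged
    if isinstance(content, list):
        # Keep only text blocks; tool calls and other block types are skipped.
        return "".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return str(raw)  # degenerate fallback for anything else


plain = SimpleNamespace(content="hello")
blocks = SimpleNamespace(content=[
    {"type": "text", "text": "Hello, "},
    {"type": "tool_use", "name": "lookup"},
    {"type": "text", "text": "world"},
])
print(text_from_raw(plain))
print(text_from_raw(blocks))
print(text_from_raw(42))
```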
The pure cost model (dispatch_relay.cost)
estimate_cost(*, prompt, tier="flash", provider="gemini", output_tokens_max=1024, cache_hit_ratio=0.0, role="agents") -> dict — a single source of cost truth. Pricing tables for Gemini / Anthropic / OpenAI, the Gemini Flex 50% rebate gate, Anthropic + OpenAI cache-ratio math. Zero deps.
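To make the cache-ratio math concrete, here is a minimal sketch of one way a cache hit ratio can blend two input rates. It is not `estimate_cost` itself: the function name, the returned keys, and all three rates are placeholders chosen for easy arithmetic, not real provider prices, and the Flex rebate gate is left out because its condition is not described here.

```python
def blended_cost_sketch(*, input_tokens: int, output_tokens: int,
                        cache_hit_ratio: float = 0.0,
                        input_rate: float = 1.0,         # placeholder USD per 1M tokens
                        cached_input_rate: float = 0.1,  # placeholder USD per 1M tokens
                        output_rate: float = 4.0) -> dict:  # placeholder USD per 1M tokens
    """Split input tokens into cached and uncached shares and price each."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be within [0, 1]")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    input_usd = (uncached * input_rate + cached * cached_input_rate) / 1_000_000
    output_usd = output_tokens * output_rate / 1_000_000
    return {
        "input_usd": input_usd,
        "output_usd": output_usd,
        "total_usd": input_usd + output_usd,
    }


cold = blended_cost_sketch(input_tokens=1_000_000, output_tokens=0)
warm = blended_cost_sketch(input_tokens=1_000_000, output_tokens=0, cache_hit_ratio=0.5)
print(cold["total_usd"], warm["total_usd"])
```

With these placeholder rates, half of the input being served from cache moves the input cost from 1.00 to roughly 0.55, since the cached half is billed at a tenth of the uncached rate.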
Usage
from dispatch_relay import estimate_cost, DefaultConfigSource, DEFAULTS
DefaultConfigSource().resolve("gemini_flash", "council", DEFAULTS["gemini_flash"])
# -> "gemini-2.5-flash" (env GEMINI_FLASH_MODEL wins if set)
estimate_cost(prompt=10_000, tier="sonnet", provider="anthropic", output_tokens_max=512)
Authors
Pierre Samson and Claude. MIT licensed.