Provider-agnostic LLM dispatch layer: 3 injected seams (config / usage / dispatch) + a pure cost model. Relays usage to a sink rather than tracking it.

dispatch-relay

A provider-agnostic LLM layer with three injected seams. Resolve a model, dispatch a call across any provider, and relay usage to a sink your application owns — instead of the library tracking it for you. Pure-stdlib core, zero runtime dependencies.

pip install dispatch-relay

Who it's for: anyone running more than one LLM provider who wants one consistent dispatch + usage-attribution surface, with the host application in control of config resolution, usage recording, and the actual transport. The "relay, not track" name is the contract: usage is relayed to your sink (a database, a log, nothing) — the library never decides where it lands.

This is the dependency-light foundation increment: the three injected-interface seams + the pure cost model. (Caching and the higher-level façade arrive in a later increment and bring langchain-core etc. with them; this increment is pure-stdlib.)

Renamed from omega-llm. import omega_llm still works as a deprecated alias that re-exports dispatch_relay (with a DeprecationWarning) — migrate to import dispatch_relay.

The 3 injected seams (dispatch_relay.interfaces)

Each seam is a @runtime_checkable typing.Protocol paired with a dependency-light default implementation. Protocols are structurally typed, so a host class satisfies the contract without importing this library.
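
A minimal sketch of what structural typing means here. The ConfigSource protocol below is a local mirror written for illustration (its shape is taken from the table in this section), and HostConfig and NotAConfig are hypothetical host classes. Note that isinstance against a @runtime_checkable protocol only checks that the method exists, not that its signature matches.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ConfigSource(Protocol):
    """Local mirror of the seam's shape, written out for illustration only."""

    def resolve(self, key: str, role: str, default: str) -> str: ...


class HostConfig:
    """Hypothetical host class: it neither inherits from nor imports the protocol's library."""

    def __init__(self, overrides: dict[str, str]) -> None:
        self._overrides = overrides

    def resolve(self, key: str, role: str, default: str) -> str:
        return self._overrides.get(key, default)


class NotAConfig:
    """Has no resolve() method, so it fails the structural check."""


host = HostConfig({"gemini_flash": "my-pinned-model"})
is_source = isinstance(host, ConfigSource)              # method present
is_not_source = isinstance(NotAConfig(), ConfigSource)  # method missing
resolved = host.resolve("gemini_flash", "council", "fallback-model")
```

The host class never names the protocol, which is what lets an application satisfy a seam from code that has no dependency on this package.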

Seam | Method(s) | Default impl | A host can back it with
ConfigSource | resolve(key, role, default) → model_id | DefaultConfigSource (os.getenv(f"{KEY}_MODEL") or default) | a config store (role → global → env → default)
UsageSink | record(*, provider, role, caller, model, tier, input_tokens, output_tokens, cache_read=0, cache_creation=0, cost_usd=0.0, cost_usd_raw=0.0, billing="metered", **extra) → None | NoOpUsageSink (no-op) | a usage store / time-series table
DispatchBackend | supports(*, provider, role, tier) → bool + dispatch(*, provider, model, messages, tier, role, caller, **kwargs) → LLMResponse | DefaultDispatchBackend (direct SDK via injected llm_factory; supports→True) | subscription lanes / custom transports
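
As an illustration of the "role → global → env → default" chain mentioned for ConfigSource, here is a hedged sketch of a host-side implementation. LayeredConfigSource, its two in-memory stores, and the "claude_sonnet" and "gemini_pro" keys are hypothetical. The env lookup assumes KEY is the upper-cased abstract key, which matches the GEMINI_FLASH_MODEL example in the Usage section.

```python
import os


class LayeredConfigSource:
    """Hypothetical host-side ConfigSource: role override, then global, then env, then default."""

    def __init__(self, role_overrides=None, global_overrides=None, environ=None):
        # {(role, key): model_id} and {key: model_id}; both stores are illustrative.
        self._role = dict(role_overrides or {})
        self._global = dict(global_overrides or {})
        self._environ = os.environ if environ is None else environ

    def resolve(self, key: str, role: str, default: str) -> str:
        if (role, key) in self._role:
            return self._role[(role, key)]
        if key in self._global:
            return self._global[key]
        # Assumed env naming: upper-cased key plus "_MODEL"; empty values fall through.
        return self._environ.get(f"{key.upper()}_MODEL") or default


source = LayeredConfigSource(
    role_overrides={("council", "gemini_flash"): "role-model"},
    global_overrides={"claude_sonnet": "global-model"},
    environ={"GEMINI_PRO_MODEL": "env-model"},  # injected so the example ignores the real env
)
```

Passing the environment in as a mapping keeps the lookup testable; a real host would more likely read from its own config store at the first two levels.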

cache_read and cache_creation are separate fields on UsageSink.record and on UsageRecord; summing them into a single figure undercounts Anthropic usage. billing marks the lane: "metered" (SDK calls whose dollar cost is tracked) vs "subscription" (recorded at $0).
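
A small sketch of a sink that honours both points: it stores the two cache counters as separate columns and keeps the billing lane on every row. ListUsageSink is a hypothetical in-memory sink; only the record() keyword signature is taken from the table above, and the model name and token counts are made-up sample values.

```python
class ListUsageSink:
    """Hypothetical sink that appends each relayed usage event to a list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def record(self, *, provider, role, caller, model, tier,
               input_tokens, output_tokens, cache_read=0, cache_creation=0,
               cost_usd=0.0, cost_usd_raw=0.0, billing="metered", **extra) -> None:
        self.rows.append({
            "provider": provider, "role": role, "caller": caller,
            "model": model, "tier": tier,
            "input_tokens": input_tokens, "output_tokens": output_tokens,
            # Kept as two columns, never merged into one "cache" number.
            "cache_read": cache_read, "cache_creation": cache_creation,
            "cost_usd": cost_usd, "cost_usd_raw": cost_usd_raw,
            "billing": billing, **extra,
        })


sink = ListUsageSink()
sink.record(provider="anthropic", role="council", caller="demo", model="example-model",
            tier="sonnet", input_tokens=1200, output_tokens=300,
            cache_read=800, cache_creation=100, cost_usd=0.01)
sink.record(provider="anthropic", role="council", caller="demo", model="example-model",
            tier="sonnet", input_tokens=500, output_tokens=50, billing="subscription")

# Only the metered lane contributes dollars; subscription rows stay at the 0.0 default.
metered_cost = sum(r["cost_usd"] for r in sink.rows if r["billing"] == "metered")
```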

Value types & core-owned facts (dispatch_relay.core)

from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord: ...  # fields: input_tokens, output_tokens, cache_read=0, cache_creation=0, model=""

@dataclass(frozen=True)
class LLMResponse: ...  # fields: text, usage: UsageRecord | None, raw: Any
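
Spelled out, the two value types might look like the sketch below. The field names and defaults come from the summary above; the int and str annotations are assumptions, since the source does not state them. The example also shows what frozen=True implies in practice: assignment raises, and dataclasses.replace builds a modified copy instead.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class UsageRecord:
    input_tokens: int        # annotation assumed, not confirmed by the source
    output_tokens: int       # annotation assumed
    cache_read: int = 0
    cache_creation: int = 0
    model: str = ""


@dataclass(frozen=True)
class LLMResponse:
    text: str
    usage: Optional[UsageRecord]
    raw: Any


usage = UsageRecord(input_tokens=1200, output_tokens=300, cache_read=800)
response = LLMResponse(text="hello", usage=usage, raw={"id": "example"})

try:
    usage.model = "changed"  # frozen dataclass: assignment raises
    mutated = True
except FrozenInstanceError:
    mutated = False

# A frozen record is "changed" by building a new one with selected fields replaced.
stamped = replace(usage, model="example-model")
```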

The provider-facts live in dispatch_relay.core (one place, never duplicated per backend):

  • DEFAULTS: dict[str, str] — the abstract-key → model-id table. The core passes default=DEFAULTS[key] into ConfigSource.resolve.
  • extract_usage(provider, raw) → UsageRecord | None — the single place that knows each provider's usage-from-raw shape. Anthropic dual-path: prefer raw.response_metadata["usage"] (the uncached remainder), fall back to raw.usage_metadata only if absent (using the wrong one double-counts). The model name comes from raw.response_metadata["model_name"] (both Anthropic and Gemini surface it there — a real LangChain AIMessage has no top-level .model attribute), falling back to "". Returns None when no usage metadata is present.
  • resolve_usage(response, provider, model) → UsageRecord | None — the locked reconciliation rule: resolve response.usage if response.usage is not None else extract_usage(provider, response.raw), then stamp the authoritative model — the dispatch call knows the configured model, so the dispatch-arg model always wins over whatever the raw echoed (via dataclasses.replace). Returns None unchanged when there's no usage (the subscription lane). LLMResponse.usage is a real escape hatch — a backend MAY pre-populate it; else the core extracts.
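
The reconciliation rule in the list above can be sketched as follows. This is a simplified stand-in, not the library's code: the *_sketch functions are hypothetical, the cache counter key names (cache_read_input_tokens, cache_creation_input_tokens) are assumptions, provider is accepted but not branched on, and SimpleNamespace objects stand in for real provider responses.

```python
from dataclasses import dataclass, replace
from types import SimpleNamespace
from typing import Any, Optional


@dataclass(frozen=True)
class UsageRecord:
    input_tokens: int
    output_tokens: int
    cache_read: int = 0
    cache_creation: int = 0
    model: str = ""


@dataclass(frozen=True)
class LLMResponse:
    text: str
    usage: Optional[UsageRecord]
    raw: Any


def extract_usage_sketch(provider: str, raw: Any) -> Optional[UsageRecord]:
    """Prefer response_metadata["usage"]; fall back to usage_metadata only if absent."""
    meta = getattr(raw, "response_metadata", None) or {}
    usage = meta.get("usage") or getattr(raw, "usage_metadata", None)
    if not usage:
        return None
    return UsageRecord(
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        cache_read=usage.get("cache_read_input_tokens", 0),          # assumed key name
        cache_creation=usage.get("cache_creation_input_tokens", 0),  # assumed key name
        model=meta.get("model_name", ""),
    )


def resolve_usage_sketch(response: LLMResponse, provider: str, model: str) -> Optional[UsageRecord]:
    usage = response.usage if response.usage is not None else extract_usage_sketch(provider, response.raw)
    if usage is None:
        return None  # no usage at all, e.g. the subscription lane
    return replace(usage, model=model)  # the dispatch-arg model always wins


raw = SimpleNamespace(
    response_metadata={
        "usage": {"input_tokens": 10, "output_tokens": 5, "cache_read_input_tokens": 4},
        "model_name": "echoed-model",
    },
    usage_metadata={"input_tokens": 999, "output_tokens": 999},  # ignored: the primary path exists
)
resolved = resolve_usage_sketch(LLMResponse("hi", None, raw), "anthropic", "configured-model")
prefilled = resolve_usage_sketch(LLMResponse("hi", UsageRecord(1, 2), raw), "anthropic", "configured-model")
nothing = resolve_usage_sketch(LLMResponse("hi", None, {"text": "hi"}), "subscription", "configured-model")
```

In all three cases the model on the returned record is the one passed to the call, never the one echoed by the raw response.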

Both shipped backends return LLMResponse(usage=None); the core extracts usage. The DefaultDispatchBackend derives text from raw.content: a str passes through; an Anthropic content list has its type=="text" blocks joined (non-text blocks skipped); anything else falls back to str(raw). That fallback is only the default backend's degenerate case — real subscription backends (raws are dicts, not strings) construct text explicitly and pass usage=None with billing="subscription".
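
The text-derivation rule described above, as a hedged sketch. derive_text is a hypothetical helper rather than the backend's actual function, and joining the text blocks with an empty separator is an assumption (the source only says they are joined).

```python
from types import SimpleNamespace
from typing import Any


def derive_text(raw: Any) -> str:
    """Turn a raw response into text: str passes through, text blocks are joined."""
    content = getattr(raw, "content", None)
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only type == "text" blocks; tool-use and other block types are skipped.
        return "".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return str(raw)  # degenerate fallback when there is no usable .content


plain = derive_text(SimpleNamespace(content="plain string"))
blocks = derive_text(SimpleNamespace(content=[
    {"type": "text", "text": "Hello, "},
    {"type": "tool_use", "id": "example"},
    {"type": "text", "text": "world"},
]))
fallback = derive_text(42)
```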

The pure cost model (dispatch_relay.cost)

estimate_cost(*, prompt, tier="flash", provider="gemini", output_tokens_max=1024, cache_hit_ratio=0.0, role="agents") -> dict — a single source of cost truth. Pricing tables for Gemini / Anthropic / OpenAI, the Gemini Flex 50% rebate gate, Anthropic + OpenAI cache-ratio math. Zero deps.
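
To make the cache-ratio math concrete, here is a rough sketch of one common way to blend cached and uncached input pricing: input tokens are split by the hit ratio, each share is priced at its own rate, and output tokens are added on top. sketch_cost_usd is a hypothetical function, not estimate_cost itself, and the prices are placeholders rather than any provider's real rates. The Gemini Flex rebate is not modelled.

```python
def sketch_cost_usd(input_tokens, output_tokens, *, input_price, output_price,
                    cached_input_price, cache_hit_ratio=0.0):
    """Blend cached and uncached input pricing; all prices are USD per million tokens."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    total = uncached * input_price + cached * cached_input_price + output_tokens * output_price
    return total / 1_000_000


# Placeholder prices, chosen only to make the arithmetic easy to follow.
PLACEHOLDER_PRICES = {"input_price": 3.0, "output_price": 15.0, "cached_input_price": 0.3}

cold = sketch_cost_usd(10_000, 512, **PLACEHOLDER_PRICES)                       # no cache hits
warm = sketch_cost_usd(10_000, 512, cache_hit_ratio=0.5, **PLACEHOLDER_PRICES)  # half the input cached
```

With these placeholder numbers, a 50% hit ratio lowers only the input share of the cost; the output share is unaffected by caching.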

Usage

from dispatch_relay import estimate_cost, DefaultConfigSource, DEFAULTS

DefaultConfigSource().resolve("gemini_flash", "council", DEFAULTS["gemini_flash"])
# -> "gemini-2.5-flash"  (env GEMINI_FLASH_MODEL wins if set)
estimate_cost(prompt=10_000, tier="sonnet", provider="anthropic", output_tokens_max=512)

Authors

Pierre Samson and Claude. MIT licensed.
