OmniGate — a litellm-style multi-provider LLM SDK: call OpenAI, Anthropic, Gemini & Azure in-process with routing, retry, fallback, circuit breaking, cost tracking and an opt-in cache — or point it at a hosted OmniGate gateway.
omnigate
A small, fully-typed, litellm-style multi-provider LLM SDK — sync and
async, streaming-aware, with typed errors. Depends only on httpx and
pydantic.
pip install omnigate
Two ways to use it:
- In-process: call OpenAI / Anthropic / Gemini / Azure directly, no server to run. You get routing, retry + backoff, fallbacks, circuit breaking, per-call cost tracking, an opt-in response cache, callbacks and a local spend cap.
- Hosted gateway client: point Client/AsyncClient at a running OmniGate server for centralised auth, budgets, rate limiting and metrics.
In-process quick start
Set a provider key the usual way (OPENAI_API_KEY, ANTHROPIC_API_KEY,
GEMINI_API_KEY / GOOGLE_API_KEY, or AZURE_OPENAI_API_KEY +
AZURE_OPENAI_ENDPOINT) — or pass api_key= explicitly.
import omnigate
r = omnigate.completion(model="gpt-4o-mini", messages="Say hi in French")
print(r.content, r.usage.total_tokens, r.cost_usd, r.model, r.provider)
messages is flexible: pass a bare string (treated as one user message), a
single dict/Message, or a list of dicts/Messages. The model name routes to
the provider by prefix (gpt-*/o1/o3/o4 → OpenAI, claude-* → Anthropic,
gemini-* → Gemini, azure/<deployment> → Azure OpenAI).
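The message normalization and prefix routing described above can be pictured with a small standalone sketch. The helper names normalize_messages and route_provider are hypothetical and not part of omnigate's API, and the library's actual matching rules may differ in detail.

```python
from typing import Union

# Hypothetical helpers for illustration only; not omnigate's implementation.
MessageInput = Union[str, dict, list]


def normalize_messages(messages: MessageInput) -> list:
    """Coerce a bare string, a single dict, or a list of dicts into a list of dicts."""
    if isinstance(messages, str):
        # A bare string is treated as one user message.
        return [{"role": "user", "content": messages}]
    if isinstance(messages, dict):
        return [messages]
    return list(messages)


def route_provider(model: str) -> str:
    """Pick a provider name from the model id prefix (assumed rules)."""
    if model.startswith("azure/"):
        return "azure"
    if model.startswith(("gpt-", "o1", "o3", "o4")):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "gemini"
    raise ValueError(f"unknown model: {model!r}")


print(normalize_messages("Say hi in French"))
print(route_provider("azure/my-gpt4o-deployment"))
```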
Async
import asyncio, omnigate

async def main():
    r = await omnigate.acompletion(
        model="claude-3-5-haiku-latest",
        messages=[{"role": "user", "content": "hi"}],
    )
    print(r.content)

asyncio.run(main())
Streaming
completion(stream=True) returns an iterator of StreamChunk; the async twin
returns an async iterator. Content chunks carry text; the final chunk carries
usage.
for chunk in omnigate.completion(model="gpt-4o-mini", messages="haiku", stream=True):
    print(chunk.text, end="", flush=True)

# async (must run inside a coroutine)
async for chunk in await omnigate.acompletion(model="gpt-4o-mini",
                                              messages="haiku", stream=True):
    print(chunk.text, end="")
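If you need the full text and the usage figures rather than incremental printing, one option is to accumulate chunks as they arrive. The sketch below runs without network access by using a stand-in FakeChunk class and a fake_stream generator, both hypothetical. It assumes the final chunk has empty text and that usage is a plain dict; the real StreamChunk may differ on both points.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class FakeChunk:
    """Stand-in for StreamChunk; only `text` and `usage` are modeled (assumed shape)."""
    text: str = ""
    usage: Optional[dict] = None


def fake_stream() -> Iterator[FakeChunk]:
    """Simulates a stream: content chunks first, then a usage-only final chunk."""
    yield FakeChunk(text="Bon")
    yield FakeChunk(text="jour")
    yield FakeChunk(usage={"total_tokens": 7})


def collect(stream: Iterable[FakeChunk]) -> Tuple[str, Optional[dict]]:
    """Join the text pieces and keep the last usage payload seen."""
    parts = []
    usage = None
    for chunk in stream:
        if chunk.text:
            parts.append(chunk.text)
        if chunk.usage is not None:
            usage = chunk.usage
    return "".join(parts), usage


text, usage = collect(fake_stream())
print(text, usage)
```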
Fallbacks
Try models in order until one succeeds. Each may resolve to a different
provider. Transient failures (429/5xx/timeout) trip the breaker; other client
errors (4xx) just move on to the next model. response.fallback_used tells you
whether a fallback answered.
r = omnigate.completion(
    model="gpt-4o-mini",
    messages="hi",
    fallbacks=["claude-3-5-haiku-latest", "gemini-1.5-flash"],
)
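Conceptually, the fallback chain is a loop over candidate models. The following is a minimal sketch of that idea, not omnigate's engine: TransientError, ClientError, complete_with_fallbacks and fake_call are hypothetical names, and breaker bookkeeping is left out.

```python
from typing import Callable, Optional, Sequence, Tuple


class TransientError(Exception):
    """Stand-in for 429/5xx/timeout failures."""


class ClientError(Exception):
    """Stand-in for other 4xx failures."""


def complete_with_fallbacks(
    call: Callable[[str], str],
    model: str,
    fallbacks: Sequence[str] = (),
) -> Tuple[str, bool]:
    """Try `model`, then each fallback in order; return (content, fallback_used)."""
    last_error: Optional[Exception] = None
    for index, candidate in enumerate([model, *fallbacks]):
        try:
            return call(candidate), index > 0
        except (TransientError, ClientError) as exc:
            # A real engine would also count transient failures on its breaker.
            last_error = exc
    raise last_error  # every candidate failed; surface the last error


def fake_call(model: str) -> str:
    """Pretend the primary model is down and everything else works."""
    if model.startswith("gpt-"):
        raise TransientError("503 from primary")
    return f"hi from {model}"


content, fallback_used = complete_with_fallbacks(
    fake_call, "gpt-4o-mini", ["claude-3-5-haiku-latest", "gemini-1.5-flash"]
)
print(content, fallback_used)
```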
Cost tracking
Every non-streamed response carries cost_usd computed from a built-in
per-model price table (omnigate.pricing). Cached hits are billed as 0.0.
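As a rough illustration of how a per-model price table turns token counts into a dollar figure, consider the sketch below. The PRICES table, its per-million-token layout, the model name and the numbers are all placeholders invented for this example; they are not the contents of omnigate.pricing or real provider prices.

```python
# Placeholder prices in USD per 1M tokens; NOT real provider pricing.
PRICES = {
    "example-model": {"input": 1.00, "output": 2.00},
}


def cost_usd(model: str, prompt_tokens: int, completion_tokens: int,
             cached: bool = False) -> float:
    """Compute the cost of one call; cached hits are billed as 0.0."""
    if cached:
        return 0.0
    price = PRICES[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


print(cost_usd("example-model", prompt_tokens=1000, completion_tokens=500))
print(cost_usd("example-model", prompt_tokens=1000, completion_tokens=500, cached=True))
```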
Response cache (opt-in)
A deterministic, in-memory TTL cache for repeated temperature=0 calls. Enable
per call with cache=True, or globally via configure(cache_enabled=True).
r1 = omnigate.completion(model="gpt-4o-mini", messages="2+2?", temperature=0, cache=True)
r2 = omnigate.completion(model="gpt-4o-mini", messages="2+2?", temperature=0, cache=True)
assert r2.cached and r2.cost_usd == 0.0 # served from cache, no second API call
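To make the idea of a deterministic TTL cache concrete, here is a small self-contained sketch. The TTLCache class and its key scheme (a SHA-256 hash of the JSON-serialized request) are assumptions chosen for illustration; omnigate's actual cache key and eviction policy are not documented here.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds` (illustrative only)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    @staticmethod
    def make_key(model: str, messages: Any, **params: Any) -> str:
        """Hash the request so that identical calls map to the same key."""
        payload = json.dumps(
            {"model": model, "messages": messages, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


# Demo with a fake clock so expiry can be shown without sleeping.
now = [0.0]
cache = TTLCache(ttl_seconds=300.0, clock=lambda: now[0])
key = TTLCache.make_key("gpt-4o-mini", "2+2?", temperature=0)
cache.set(key, "4")
print(cache.get(key))   # hit while the entry is fresh
now[0] = 301.0
print(cache.get(key))   # miss after the TTL has elapsed
```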
Callbacks
Register success/failure hooks to log usage, cost and latency to your own sink.
omnigate.register_callback(
    on_success=lambda e: print(e.provider, e.model, e.cost_usd, e.latency_ms),
    on_failure=lambda e: print("failed:", e.exception),
)
Local spend cap
Set a process-wide USD ceiling; once reached, further calls raise
BudgetExceededError.
omnigate.configure(max_spend_usd=5.00)
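The behaviour described above amounts to checking a running total before each call and adding to it afterwards. A minimal sketch under that assumption follows; SpendTracker and BudgetExceeded are hypothetical stand-ins, not omnigate classes.

```python
import threading
from typing import Optional


class BudgetExceeded(Exception):
    """Stand-in for BudgetExceededError."""


class SpendTracker:
    """Process-wide running total with an optional USD ceiling (illustrative only)."""

    def __init__(self, max_spend_usd: Optional[float] = None) -> None:
        self.max_spend_usd = max_spend_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def check(self) -> None:
        """Call before a request; raises once the ceiling has been reached."""
        with self._lock:
            if self.max_spend_usd is not None and self.spent_usd >= self.max_spend_usd:
                raise BudgetExceeded(
                    f"spent {self.spent_usd:.4f} of {self.max_spend_usd:.4f} USD"
                )

    def record(self, cost_usd: float) -> None:
        """Call after a request with that request's cost."""
        with self._lock:
            self.spent_usd += cost_usd


tracker = SpendTracker(max_spend_usd=0.01)
tracker.check()        # under the cap: no error
tracker.record(0.01)   # this call used up the whole budget
try:
    tracker.check()
except BudgetExceeded as exc:
    print("blocked:", exc)
```

Note that with this design the call that crosses the ceiling still completes; only later calls are refused.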
Configuration & keys
configure(...) sets process-global defaults and/or keys; per-call kwargs
(timeout=, num_retries=, cache=, api_key=, api_base=, api_version=)
override them. Everything also reads from the environment:
| Setting | Env var | Default |
|---|---|---|
| Request timeout (s) | OMNIGATE_TIMEOUT_SECONDS | 60 |
| Retry attempts | OMNIGATE_RETRY_MAX_ATTEMPTS | 3 |
| Retry base delay (s) | OMNIGATE_RETRY_BASE_DELAY_SECONDS | 0.25 |
| Retry max delay (s) | OMNIGATE_RETRY_MAX_DELAY_SECONDS | 8.0 |
| Retry jitter (s) | OMNIGATE_RETRY_JITTER_SECONDS | 0.25 |
| Circuit breaker on | OMNIGATE_CIRCUIT_BREAKER_ENABLED | true |
| Breaker fail threshold | OMNIGATE_CIRCUIT_BREAKER_FAIL_THRESHOLD | 5 |
| Breaker cooldown (s) | OMNIGATE_CIRCUIT_BREAKER_COOLDOWN_SECONDS | 30 |
| Cache on | OMNIGATE_CACHE_ENABLED | false |
| Cache TTL (s) | OMNIGATE_CACHE_TTL_SECONDS | 300 |
| Local spend cap (USD) | OMNIGATE_MAX_SPEND_USD | (off) |
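To get a feel for how the three retry delay settings in the table interact, the sketch below computes a delay schedule from the default values. It assumes capped exponential backoff (doubling per attempt) with additive uniform jitter; the README only says "retry + backoff", so the exact formula omnigate uses is an assumption here, and backoff_delay is a hypothetical helper.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base: float = 0.25,        # default of OMNIGATE_RETRY_BASE_DELAY_SECONDS
    max_delay: float = 8.0,    # default of OMNIGATE_RETRY_MAX_DELAY_SECONDS
    jitter: float = 0.25,      # default of OMNIGATE_RETRY_JITTER_SECONDS
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to sleep before retry number `attempt` (0-based), assumed formula."""
    capped = min(max_delay, base * (2 ** attempt))
    return capped + rng() * jitter


# Disable jitter to see the deterministic part of the schedule.
schedule = [backoff_delay(n, rng=lambda: 0.0) for n in range(7)]
print(schedule)
```

Under these assumptions the un-jittered delay doubles from 0.25 s until it reaches the 8 s cap at the sixth retry, and each actual delay adds up to 0.25 s of jitter on top.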
import omnigate
omnigate.configure(
    openai_api_key="sk-...",
    anthropic_api_key="...",
    azure_endpoint="https://my.openai.azure.com",
    cache_enabled=True,
    num_retries=3,  # note: in configure this is EngineConfig.retry_max_attempts
)

# Azure: deployment is taken from the model id
omnigate.completion(model="azure/my-gpt4o-deployment", messages="hi",
                    api_key="...", api_base="https://my.openai.azure.com")
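The two circuit breaker settings (fail threshold 5, cooldown 30 s) suggest a familiar pattern: after enough consecutive failures, calls are refused until the cooldown elapses. The sketch below shows one common way to build that. CircuitBreaker is a hypothetical class, and whether omnigate counts consecutive failures or keeps one breaker per provider is an assumption.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `fail_threshold` consecutive failures; allows a trial call after cooldown."""

    def __init__(self, fail_threshold: int = 5, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fail_threshold = fail_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a call may proceed right now."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let a trial call through.
        return self._clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.fail_threshold:
            self.opened_at = self._clock()


# Demo with a fake clock so the cooldown can be shown without sleeping.
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
print(breaker.allow())   # breaker is open, calls are refused
now[0] = 30.0
print(breaker.allow())   # cooldown elapsed, a trial call is allowed
```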
Errors (in-process)
All errors derive from GatewayError.
| Exception | When |
|---|---|
| AuthError | provider returned 401/403 (your provider key is bad) |
| RateLimitError | 429; has .retry_after (honored by retry) |
| BudgetExceededError | local spend cap reached |
| ProviderError | 5xx / network / timeout (retried, then surfaced) |
| APIError | config errors (unknown model, missing key) and other 4xx |
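The table above implies a mapping from HTTP status to exception class. The sketch below writes that mapping out for the in-process case as a hypothetical classify function returning class names as strings. Treating a missing status (None) as a network error or timeout is an assumption of this sketch.

```python
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (or None for network/timeout) to an exception name."""
    if status is None:
        return "ProviderError"      # no response at all: network error or timeout
    if status in (401, 403):
        return "AuthError"
    if status == 429:
        return "RateLimitError"
    if status >= 500:
        return "ProviderError"
    if 400 <= status < 500:
        return "APIError"           # any other 4xx
    raise ValueError(f"status {status} is not an error")


for code in (401, 429, 503, 404, None):
    print(code, "->", classify(code))
```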
Hosted gateway client
If you run an OmniGate server, point the client at it for centralised auth, budgets, rate limiting and metrics. The client talks the gateway's HTTP surface; it does not call providers itself.
from omnigate import Client
# Public client (no key) just for signup:
with Client(base_url="https://gw.example.com") as anon:
    acct = anon.signup(email="dev@acme.com", org_name="Acme", project_name="prod")
client = Client(api_key=acct.api_key, base_url="https://gw.example.com", user_id="u-42")
client.set_provider_key(provider="openai", api_key="sk-...") # stored encrypted by the gateway
resp = client.chat(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hi"}])
print(resp.content, resp.usage.total_tokens, resp.cost_usd)
client.close()
AsyncClient mirrors Client exactly (identical constructor and method names),
but every method is async def and chat_stream returns an async iterator. Use
async with / await client.aclose().
import asyncio
from omnigate import AsyncClient, BudgetExceededError, RateLimitError
async def main():
    async with AsyncClient(api_key="llmg_...", base_url="https://gw.example.com") as c:
        try:
            async for piece in c.chat_stream(model="claude-3-5-sonnet-latest", messages="hi"):
                print(piece, end="")
        except RateLimitError as e:
            print("slow down; retry after", e.retry_after)
        except BudgetExceededError as e:
            print("budget hit:", e.detail)

asyncio.run(main())
Pointing the OpenAI SDK at the gateway
The gateway exposes an OpenAI-compatible POST /v1/chat/completions, so you can
reuse the official OpenAI SDK and just change the base URL + key:
from openai import OpenAI
oai = OpenAI(
    base_url="https://gw.example.com/v1",
    api_key="llmg_...",
    default_headers={"x-api-key": "llmg_...", "x-user-id": "u-42"},
)
oai.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hi"}])
Models, metrics & key management (hosted)
for m in client.models(): # GET /v1/models
    print(m.id, m.owned_by, m.provider)
mx = client.metrics(range="7d") # GET /v1/metrics (1h | 24h | 7d | 30d)
print(mx.totals.requests, mx.totals.cost_usd, mx.totals.p95_latency_ms)
key = client.create_api_key(name="ci") # POST /v1/keys/api -> ApiKeyCreated (plaintext shown once)
client.me(); client.health()
Gateway-client errors map the same exception hierarchy; a provider-surfaced 401
is classified as ProviderError (not AuthError) so you can tell "my gateway
key is bad" from "my OpenAI key is bad".
License
MIT