OpenModex Python SDK
The official Python SDK for the OpenModex AI Gateway API. Access 100+ LLM models from OpenAI, Anthropic, Google, DeepSeek, Mistral, and Qwen through a single unified API with intelligent routing, automatic fallbacks, and built-in cost tracking.
Features
- Unified API -- One client for all major LLM providers
- Smart Routing -- Automatic model selection optimized for cost, latency, or quality
- Client-Side Fallbacks -- Automatic retry with backup models on failure
- Streaming -- First-class SSE streaming with sync and async iterators
- Async Support -- Full asyncio support via httpx
- Lightweight -- Zero dependencies beyond httpx (no Pydantic required)
- Type Safe -- Fully typed with dataclasses and type hints
- OpenAI Compatible -- Drop-in replacement by changing base_url
Requirements
- Python 3.8+
Installation
pip install openmodex
Quick Start
import os
from openmodex import OpenModex
client = OpenModex(api_key=os.environ["OPENMODEX_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is OpenModex?"},
    ],
)
print(response.choices[0].message.content)
Usage
Chat Completions
response = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.7,
    max_tokens=1000,
)
print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a short story."},
    ],
    stream=True,
)
with stream:
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
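A common follow-up to the loop above is collecting the streamed fragments into a single string. The sketch below shows that pattern with stand-in dataclasses so it runs without a network call. `Delta`, `Choice`, `Chunk`, and `collect_text` are hypothetical names shaped only after the attribute access in the example above (`chunk.choices[0].delta.content`); the SDK's real types may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical stand-ins that mirror only the attributes used above
# (chunk.choices[0].delta.content); the real SDK types may differ.
@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice]


def collect_text(chunks: Iterable[Chunk]) -> str:
    """Join the non-empty content fragments of a chunk stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


# Simulated stream: the middle chunk has no content and is skipped.
fake_stream = [
    Chunk([Choice(Delta("Once "))]),
    Chunk([Choice(Delta(None))]),
    Chunk([Choice(Delta("upon a time."))]),
]
story = collect_text(fake_stream)
print(story)
```

Assuming the real stream yields objects with the same attribute shape, `collect_text(stream)` could be called inside the `with stream:` block in place of the print loop.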
Async Usage
import asyncio
from openmodex import AsyncOpenModex

async def main():
    client = AsyncOpenModex(api_key="omx_sk_...")

    # Non-streaming
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Streaming
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    async with stream:
        async for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="")

    await client.close()

asyncio.run(main())
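One reason to reach for the async client is to issue several requests concurrently. The sketch below illustrates that with `asyncio.gather`, using a hypothetical `fake_completion` coroutine in place of a real `await client.chat.completions.create(...)` call so that it runs offline. It is an illustration of the pattern, not part of the SDK.

```python
import asyncio
from typing import List


async def fake_completion(prompt: str) -> str:
    """Hypothetical stand-in for an awaited chat completion call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"


async def ask_all(prompts: List[str]) -> List[str]:
    # gather() runs the coroutines concurrently and returns results
    # in the same order as the inputs.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))


answers = asyncio.run(ask_all(["Hello!", "What is OpenModex?"]))
print(answers)
```

With a real client, the body of `fake_completion` would presumably be replaced by the awaited call from the example above, sharing one client instance across all tasks.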
Smart Routing (OpenModex Extension)
Let the gateway pick the best model or optimize for cost/latency:
from openmodex import OpenModex, MODEL_AUTO, MODEL_CHEAPEST

# Use model aliases
response = client.chat.completions.create(
    model=MODEL_AUTO,  # "@auto" -- balanced selection
    # model=MODEL_CHEAPEST,  # "@cheapest" -- lowest cost
    messages=[{"role": "user", "content": "Hello!"}],
)

# Or configure routing strategy per-request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={"strategy": "cost_optimized", "allow_upgrade": True},
)
OpenModex Metadata
Responses carry OpenModex-specific metadata on response.openmodex. The example checks the attribute before reading it, since it may be unset:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.openmodex:
    print(f"Provider: {response.openmodex.provider}")
    print(f"Model used: {response.openmodex.model_used}")
    print(f"Cache hit: {response.openmodex.cache_hit}")
    print(f"Routing: {response.openmodex.routing_strategy}")
    print(f"Latency: {response.openmodex.latency_ms}ms")
    print(f"Request ID: {response.openmodex.request_id}")
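For structured logging, it can be handy to flatten these fields into a plain dict. The sketch below does that with `getattr`, using the field names from the example above. `metadata_record` is a hypothetical helper, and the response object is a `SimpleNamespace` with made-up values so the snippet runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Field names taken from the metadata example above.
METADATA_FIELDS = (
    "provider",
    "model_used",
    "cache_hit",
    "routing_strategy",
    "latency_ms",
    "request_id",
)


def metadata_record(response: Any) -> Dict[str, Any]:
    """Flatten response.openmodex into a dict; empty if metadata is unset."""
    meta = getattr(response, "openmodex", None)
    if not meta:
        return {}
    return {name: getattr(meta, name, None) for name in METADATA_FIELDS}


# Hypothetical response with made-up values, for illustration only.
fake_response = SimpleNamespace(
    openmodex=SimpleNamespace(
        provider="openai",
        model_used="gpt-4o",
        cache_hit=False,
        routing_strategy="cost_optimized",
        latency_ms=120,
        request_id="req_example",
    )
)
record = metadata_record(fake_response)
print(record)
```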
Cache Control
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    cache={"enabled": True, "ttl": 3600},
)
Client-Side Fallbacks
Automatically retry with backup models on failure:
client = OpenModex(
    api_key="omx_sk_...",
    fallback_models=["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"],
)

# If gpt-4o fails (5xx/timeout), automatically tries the next model
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
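To make the fallback behaviour concrete, here is a minimal sketch of an ordered fallback chain in plain Python. It is not the SDK's implementation: `TransientError`, `complete_with_fallbacks`, and `flaky_call` are hypothetical names, and the sketch assumes that only transient failures (5xx, timeout) trigger the next model, as the comment above describes.

```python
from typing import Callable, List, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (5xx, timeout)."""


def complete_with_fallbacks(
    call: Callable[[str], str],
    primary: str,
    fallbacks: Sequence[str],
) -> str:
    """Try the primary model, then each fallback in order."""
    # Skip the primary if it is repeated in the fallback list.
    candidates = [primary] + [m for m in fallbacks if m != primary]
    errors: List[str] = []
    for model in candidates:
        try:
            return call(model)
        except TransientError as exc:
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_call(model: str) -> str:
    # Simulated backend in which only claude-3-5-sonnet is healthy.
    if model != "claude-3-5-sonnet":
        raise TransientError("503")
    return f"answered by {model}"


result = complete_with_fallbacks(
    flaky_call, "gpt-4o", ["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"]
)
print(result)
```

Non-transient errors (for example an invalid API key) propagate immediately in this sketch, since retrying them on another model would not help.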
Embeddings
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog.",
)
print(f"Dimensions: {len(response.data[0].embedding)}")
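Embedding vectors are typically compared with cosine similarity. The sketch below implements it with the standard library only, on toy vectors rather than real API output; `cosine_similarity` is a helper written for this illustration, not an SDK function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have many more dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

In practice the two arguments would be `response.data[i].embedding` values from embedding calls like the one above.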
Models
# List all available models
models = client.models.list()
for m in models.data:
    print(f"{m.id} ({m.provider})")

# Get a specific model
model = client.models.retrieve("openai/gpt-4o")
print(f"{model.name}: {model.description}")

# Compare models side by side
comparison = client.models.compare(["openai/gpt-4o", "anthropic/claude-3-5-sonnet"])
print(f"Cheapest: {comparison.highlights.cheapest}")
print(f"Best quality: {comparison.highlights.best_quality}")
Legacy Completions
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Once upon a time",
    max_tokens=100,
)
print(response.choices[0].text)
Error Handling
from openmodex import OpenModex, APIError, AllFallbacksFailedError

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except APIError as e:
    print(f"API error: {e.message} (status: {e.status_code}, code: {e.code})")
    if e.is_rate_limited:
        print("Rate limited -- back off and retry")
    if e.is_auth_error:
        print("Check your API key")
except AllFallbacksFailedError:
    print("All fallback models failed")
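The "back off and retry" advice above can be sketched as an application-level helper with exponential backoff and jitter. `RateLimited` and `with_backoff` are hypothetical names used so the example runs offline; in real code the `except` clause would presumably catch `APIError` and check `e.is_rate_limited`. How this interacts with the client's own `max_retries` setting is not covered here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an error whose is_rate_limited flag is set."""


def with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles on each attempt; jitter spreads out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}


def sometimes_limited() -> str:
    # Simulated call that is rate limited on its first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"


# Record delays instead of sleeping so the demo finishes instantly.
delays = []
outcome = with_backoff(sometimes_limited, sleep=delays.append)
print(outcome, len(delays))
```

Injecting `sleep` keeps the helper testable; production code would leave the default `time.sleep` in place.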
Configuration
| Parameter | Description | Default |
|---|---|---|
| api_key | Your OpenModex API key | OPENMODEX_API_KEY env var |
| base_url | API base URL | https://api.openmodex.com/v1 |
| timeout | Request timeout (seconds) | 30.0 |
| max_retries | Max retry attempts on transient errors | 2 |
| default_headers | Headers sent with every request | {} |
| default_model | Default model when none specified | None |
| fallback_models | Ordered fallback model chain | [] |
| http_client | Custom httpx.Client / httpx.AsyncClient | Auto-created |
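The table says `api_key` defaults to the `OPENMODEX_API_KEY` environment variable. Applications that want to fail early with a clear message can resolve the key themselves before constructing the client. The sketch below shows one way to do that; `resolve_api_key` is a hypothetical helper, the key strings are placeholders, and the SDK's own handling of a missing key is not described in this document.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else OPENMODEX_API_KEY from env."""
    key = explicit or env.get("OPENMODEX_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key=... or set OPENMODEX_API_KEY")
    return key


# Placeholder values; an explicit argument takes precedence over the env var.
from_arg = resolve_api_key("omx_sk_explicit", env={"OPENMODEX_API_KEY": "omx_sk_env"})
from_env = resolve_api_key(env={"OPENMODEX_API_KEY": "omx_sk_env"})
print(from_arg, from_env)
```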
OpenAI SDK Compatibility
The OpenModex Gateway exposes an OpenAI-compatible endpoint. If you already use the OpenAI Python SDK, you can route requests through OpenModex by changing only the base_url and the API key:
from openai import OpenAI
# Just change the base URL and API key
client = OpenAI(
    base_url="https://api.openmodex.com/compat/openai/v1",
    api_key="omx_sk_live_...",
)

# Everything else works exactly the same
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
Examples
See the examples/ directory for runnable examples:
- chat_completion.py -- Basic chat completion with routing
- streaming.py -- Sync and async SSE streaming
- models.py -- List, retrieve, and compare models
Version
import openmodex
print(openmodex.__version__) # "0.1.0"
License