Batteries-included loadbalancing client for Azure OpenAI
Azure Switchboard
Batteries-included, coordination-free client loadbalancing for Azure OpenAI and OpenAI.
uv add azure-switchboard
Overview
azure-switchboard is a Python 3 asyncio library that provides an API-compatible client loadbalancer for Chat Completions. You instantiate a Switchboard with one or more Deployments, and requests are distributed across healthy deployments using the power of two random choices method. Deployments can point at Azure OpenAI (base_url=.../openai/v1/) or OpenAI (base_url=None).
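To make the selection method concrete, here is a minimal, self-contained sketch of the power of two random choices. It is not the library's implementation: the `loads` dictionary, the `two_random_choices` helper, and the `simulate` function are hypothetical names used only for illustration, and every request is assumed to cost one unit of load.

```python
import random
from collections import Counter


def two_random_choices(loads: dict[str, float], rng: random.Random) -> str:
    """Sample two distinct candidates and return the less loaded one."""
    names = list(loads)
    if len(names) == 1:
        return names[0]
    a, b = rng.sample(names, 2)
    return a if loads[a] <= loads[b] else b


def simulate(n_deployments: int = 10, n_requests: int = 1000, seed: int = 0) -> Counter:
    """Route n_requests across n_deployments and count picks per deployment."""
    rng = random.Random(seed)
    loads = {f"d{i}": 0.0 for i in range(n_deployments)}
    picks: Counter = Counter()
    for _ in range(n_requests):
        chosen = two_random_choices(loads, rng)
        loads[chosen] += 1.0  # assumption: every request costs one unit
        picks[chosen] += 1
    return picks


if __name__ == "__main__":
    picks = simulate()
    print("min picks:", min(picks.values()), "max picks:", max(picks.values()))
```

Each decision only needs the load of two sampled candidates, which is why no shared state between client instances is required.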
Features
- API Compatibility: Switchboard.create is a transparently-typed proxy for OpenAI.chat.completions.create.
- Coordination-Free: The default Two Random Choices algorithm does not require coordination between client instances to achieve excellent load distribution characteristics.
- Utilization-Aware: TPM/RPM utilization is tracked per model per deployment for use during selection.
- Batteries Included:
  - Session Affinity: Provide a session_id to route requests in the same session to the same deployment.
  - Automatic Failover: Retries are controlled by a tenacity AsyncRetrying policy (failover_policy).
  - Pluggable Selection: Custom selection algorithms can be provided by passing a callable to the selector parameter on the Switchboard constructor.
  - OpenTelemetry Integration: Built-in metrics for request routing and healthy deployment counts.
- Lightweight: Small codebase with minimal dependencies: openai, tenacity, wrapt, and opentelemetry-api.
Runnable Example
#!/usr/bin/env python3
#
# To run this, use:
# uv run --env-file .env tools/readme_example.py
#
# /// script
# requires-python = ">=3.10"
# dependencies = [
# "azure-switchboard",
# ]
# ///
import asyncio
import os

from azure_switchboard import Deployment, Model, Switchboard

azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

deployments = []
if azure_openai_endpoint and azure_openai_api_key:
    # create 3 deployments. reusing the endpoint
    # is fine for the purposes of this demo
    for name in ("east", "west", "south"):
        deployments.append(
            Deployment(
                name=name,
                base_url=f"{azure_openai_endpoint}/openai/v1/",
                api_key=azure_openai_api_key,
                models=[Model(name="gpt-4o-mini")],
            )
        )

if openai_api_key:
    deployments.append(
        Deployment(
            name="openai",
            api_key=openai_api_key,
            models=[Model(name="gpt-4o-mini")],
        )
    )

if not deployments:
    raise RuntimeError(
        "Set AZURE_OPENAI_ENDPOINT/AZURE_OPENAI_API_KEY or OPENAI_API_KEY to run this example."
    )


async def main():
    async with Switchboard(deployments=deployments) as sb:
        print("Basic functionality:")
        await basic_functionality(sb)

        print("Session affinity (should warn):")
        await session_affinity(sb)


async def basic_functionality(switchboard: Switchboard):
    # Make a completion request (non-streaming)
    response = await switchboard.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, world!"}],
    )
    print("completion:", response.choices[0].message.content)

    # Make a streaming completion request
    stream = await switchboard.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, world!"}],
        stream=True,
    )
    print("streaming: ", end="")
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()


async def session_affinity(switchboard: Switchboard):
    session_id = "anything"

    # First message will select a random healthy
    # deployment and associate it with the session_id
    r = await switchboard.create(
        session_id=session_id,
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Who won the World Series in 2020?"}],
    )
    d1 = switchboard.select_deployment(model="gpt-4o-mini", session_id=session_id)
    print("deployment 1:", d1)
    print("response 1:", r.choices[0].message.content)

    # Follow-up requests with the same session_id will route to the same deployment
    r2 = await switchboard.create(
        session_id=session_id,
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "Who won the World Series in 2020?"},
            {"role": "assistant", "content": r.choices[0].message.content},
            {"role": "user", "content": "Who did they beat?"},
        ],
    )
    print("response 2:", r2.choices[0].message.content)

    # Simulate a failure by marking down the deployment
    d1.models["gpt-4o-mini"].mark_down()

    # A new deployment will be selected for this session_id
    r3 = await switchboard.create(
        session_id=session_id,
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Who won the World Series in 2021?"}],
    )
    d2 = switchboard.select_deployment(model="gpt-4o-mini", session_id=session_id)
    print("deployment 2:", d2)
    print("response 3:", r3.choices[0].message.content)
    assert d2 != d1


if __name__ == "__main__":
    asyncio.run(main())
Benchmarks
just bench
uv run --env-file .env tools/bench.py -v -r 1000 -d 10 -e 500
Distributing 1000 requests across 10 deployments
Max inflight requests: 1000
Request 500/1000 completed
Utilization Distribution:
0.000 - 0.200 | 0
0.200 - 0.400 | 10 ..............................
0.400 - 0.600 | 0
0.600 - 0.800 | 0
0.800 - 1.000 | 0
Avg utilization: 0.339 (0.332 - 0.349)
Std deviation: 0.006
{
'bench_0': {'gpt-4o-mini': {'util': 0.361, 'tpm': '10556/30000', 'rpm': '100/300'}},
'bench_1': {'gpt-4o-mini': {'util': 0.339, 'tpm': '9819/30000', 'rpm': '100/300'}},
'bench_2': {'gpt-4o-mini': {'util': 0.333, 'tpm': '9405/30000', 'rpm': '97/300'}},
'bench_3': {'gpt-4o-mini': {'util': 0.349, 'tpm': '10188/30000', 'rpm': '100/300'}},
'bench_4': {'gpt-4o-mini': {'util': 0.346, 'tpm': '10210/30000', 'rpm': '99/300'}},
'bench_5': {'gpt-4o-mini': {'util': 0.341, 'tpm': '10024/30000', 'rpm': '99/300'}},
'bench_6': {'gpt-4o-mini': {'util': 0.343, 'tpm': '10194/30000', 'rpm': '100/300'}},
'bench_7': {'gpt-4o-mini': {'util': 0.352, 'tpm': '10362/30000', 'rpm': '102/300'}},
'bench_8': {'gpt-4o-mini': {'util': 0.35, 'tpm': '10362/30000', 'rpm': '102/300'}},
'bench_9': {'gpt-4o-mini': {'util': 0.365, 'tpm': '10840/30000', 'rpm': '101/300'}}
}
Utilization Distribution:
0.000 - 0.100 | 0
0.100 - 0.200 | 0
0.200 - 0.300 | 0
0.300 - 0.400 | 10 ..............................
0.400 - 0.500 | 0
0.500 - 0.600 | 0
0.600 - 0.700 | 0
0.700 - 0.800 | 0
0.800 - 0.900 | 0
0.900 - 1.000 | 0
Avg utilization: 0.348 (0.333 - 0.365)
Std deviation: 0.009
Distribution overhead: 926.14ms
Average response latency: 5593.77ms
Total latency: 17565.37ms
Requests per second: 1079.75
Overhead per request: 0.93ms
Distribution overhead scales ~linearly with the number of deployments.
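The last two summary figures can be reproduced from the distribution overhead alone. The arithmetic below is an inference from the printed numbers, not a description of how tools/bench.py computes them, and the variable names are illustrative.

```python
# Figures copied from the benchmark output above.
requests = 1000
distribution_overhead_ms = 926.14

# Overhead per request: total routing overhead spread across all requests.
overhead_per_request_ms = distribution_overhead_ms / requests

# Requests per second: how many routing decisions fit into one second of overhead.
requests_per_second = requests / (distribution_overhead_ms / 1000)

print(f"{overhead_per_request_ms:.2f} ms per request")   # 0.93 ms per request
print(f"{requests_per_second:.2f} requests per second")  # 1079.75 requests per second
```

Read this way, "Requests per second" measures routing throughput rather than end-to-end completion rate, since the average response latency is several seconds.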
Configuration Reference
switchboard.Model Parameters
| Parameter | Description | Default |
|---|---|---|
| name | Model name as sent to Chat Completions | Required |
| tpm | Tokens-per-minute budget used for utilization tracking and routing | 0 (unlimited) |
| rpm | Requests-per-minute budget used for utilization tracking and routing | 0 (unlimited) |
| default_cooldown | Cooldown duration (seconds) after a deployment/model failure mark-down | 10.0 |
switchboard.Deployment Parameters
| Parameter | Description | Default |
|---|---|---|
| name | Unique identifier for the deployment | Required |
| base_url | API base URL. Azure example: https://<resource>.openai.azure.com/openai/v1/. OpenAI: leave None. | None |
| api_key | API key for the deployment | None |
| timeout | Default request timeout (seconds) | 600.0 |
| models | Models available on this deployment | Built-in model name defaults |
switchboard.Switchboard Parameters
| Parameter | Description | Default |
|---|---|---|
| deployments | List of deployment configs | Required |
| selector | Deployment selection function (model, eligible_deployments) -> deployment | two_random_choices |
| failover_policy | Tenacity AsyncRetrying policy used around each create call | AsyncRetrying(stop=stop_after_attempt(2), retry=retry_if_not_exception_type(SwitchboardError), reraise=True) |
| ratelimit_window | How often usage counters reset (seconds). Set 0 to disable periodic reset. | 60.0 |
| max_sessions | LRU capacity for session affinity map | 1024 |
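The selector row describes a callable of the shape (model, eligible_deployments) -> deployment. The sketch below shows a function with that shape that always picks the least utilized candidate. `StubDeployment` and its `utilization` mapping are hypothetical stand-ins so the example runs on its own; the attributes on real deployment objects may differ.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class StubDeployment:
    """Stand-in for a deployment; the real object's attributes may differ."""

    name: str
    utilization: dict[str, float]  # hypothetical: model name -> utilization in [0, 1]


def least_utilized(model: str, eligible: Sequence[StubDeployment]) -> StubDeployment:
    """Pick the eligible deployment with the lowest utilization for `model`."""
    if not eligible:
        raise ValueError("no eligible deployments")
    return min(eligible, key=lambda d: d.utilization.get(model, 0.0))


if __name__ == "__main__":
    candidates = [
        StubDeployment("east", {"gpt-4o-mini": 0.6}),
        StubDeployment("west", {"gpt-4o-mini": 0.2}),
    ]
    print(least_utilized("gpt-4o-mini", candidates).name)  # west
```

A deterministic least-loaded rule like this can cause independent clients to pile onto the same deployment at once, which is one reason a randomized default such as two random choices is attractive when clients do not coordinate.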
Development
This project uses uv for package management, and just for task automation. See the justfile for available commands.
git clone https://github.com/arini-ai/azure-switchboard
cd azure-switchboard
just install
Running tests
just test
Release
This library uses CalVer for versioning. On push to master, if tests pass, a package is automatically built, released, and uploaded to PyPI.
Locally, the package can be built with uv:
uv build
OpenTelemetry Integration
azure-switchboard uses OpenTelemetry metrics via the meter azure_switchboard.switchboard.
Metrics emitted on the request path include:
- healthy_deployments_count (gauge)
- requests (counter, with deployment + model attributes)
To run with local OTEL instrumentation:
just otel-run
Contributing
- Fork/clone repo
- Make changes
- Run tests with just test
- Lint with just lint
- Commit and make a PR
License
MIT