Official Python client for the Arbr AI control-plane gateway — one function to route, observe, and govern every LLM call.
arbr-client (Python)
Official Python client for the Arbr AI control plane — one function to route, observe, and govern every LLM call your app makes.
Your app calls the gateway instead of provider SDKs. The gateway holds the provider keys,
honors the model you pin (or picks one when you say "auto"), applies human-approved routing
rules and cost policies, and logs every call with full cost attribution — visible in the dashboard.
- Zero dependencies: Python ≥ 3.11, stdlib only. Sync and async (`achat`/`astream`).
- One function for the 90% case: `chat()`.
- Robust by default: per-attempt timeouts, retries with exponential backoff and jitter on network errors, 429, and 5xx, plus typed errors.
- Optional LangChain integration: a real `BaseChatModel` via `arbr-client[langchain]`.
Install
```shell
pip install arbr-client                 # core (zero deps)
pip install "arbr-client[langchain]"    # + the LangChain BaseChatModel adapter
# (pre-release: pip install /path/to/arbr_client-0.1.0-py3-none-any.whl)
```
60-second quickstart
```python
from arbr_client import create_client

arbr = create_client(
    "http://localhost:4100",   # or set ARBR_GATEWAY_URL
    application="my-app",      # attribution: shows up in the dashboard
)
res = arbr.chat("Summarise this support ticket: ...", model="auto", max_tokens=300)
print(res.text)
print(res.model, res.routing_decision)  # e.g. "gpt-4o-mini", "ai"
```
Async (FastAPI, LangGraph, etc.):
```python
# inside an async function (FastAPI route, LangGraph node, ...)
res = await arbr.achat("Summarise this ticket: ...", model="auto")
```
That's a complete integration. No provider keys in your app, and every call is logged, costed, and governable from the dashboard.
How model choice works
| You send | What happens |
|---|---|
| `model="gpt-4o"` (provider connected) | Honored as-is; all routing policies are skipped. `routing_decision == "explicit"` |
| `model="auto"` or omitted | The gateway decides: cache → operator rules → automated routing (cost guardrail or AI policy) → default model |
| A model whose provider isn't connected | Falls back to the router (same as `"auto"`) |
`res.model_requested` shows what you asked for and `res.model` what served it. `res.routing_decision` says why (`explicit` / `rule` / `auto` / `ai` / `cache` / `fallback` / `passthrough`), and `res.classified_by` says how the task type was determined (`provided` / `keyword` / `ai`).
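When you debug routing, it can help to log these fields on every call and flag responses where the served model differs from the one requested. The sketch below relies only on the attribute names listed above. `describe_routing` is a hypothetical helper, and `SimpleNamespace` stands in for a real `ChatResponse`.

```python
from types import SimpleNamespace


def describe_routing(res) -> str:
    """Summarise why a response was served by the model it was (hypothetical helper)."""
    # Treating a missing model_requested as "auto" is an assumption.
    requested = res.model_requested or "auto"
    note = "" if requested in ("auto", res.model) else " (substituted)"
    return (f"requested={requested} served={res.model}{note} "
            f"decision={res.routing_decision} classified_by={res.classified_by}")


# Stand-in object with the same attribute names as ChatResponse.
res = SimpleNamespace(model_requested="my-unconnected-model", model="gpt-4o-mini",
                      routing_decision="fallback", classified_by="keyword")
print(describe_routing(res))
```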
API
create_client(base_url=None, *, application=None, workflow=None, department=None, user_id=None, api_key=None, timeout_s=60, retries=2) → Client
`base_url` falls back to `$ARBR_GATEWAY_URL`, and `api_key` falls back to `$ARBR_API_KEY`. A gateway API key (`ab_…`, from dashboard → Settings → API keys) is sent as an `Authorization: Bearer` header and binds attribution server-side. The key is required once the gateway has the "Require API keys" setting on. The metadata kwargs (`application`, `workflow`, `department`, `user_id`) are defaults merged into every call, and per-call kwargs override them.
Client.chat(messages, *, model=None, provider=None, task_type=None, temperature=None, max_tokens=None, ...) → ChatResponse
messages accepts a bare string, {"role", "content"} dicts, or LangChain message objects.
ChatResponse is a frozen dataclass: text, usage (input_tokens/output_tokens/total_tokens),
model, model_requested, provider, routing_decision, classified_by, cache_hit,
request_id, plus .raw (the unmodified gateway payload).
Client.achat(...) / Client.astream(...) / Client.astatus()
Async counterparts (the blocking call runs in a worker thread via asyncio.to_thread).
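Because the async methods wrap the blocking call in `asyncio.to_thread`, several calls can run concurrently under `asyncio.gather`. The sketch below uses a hypothetical `blocking_chat` stand-in (a `time.sleep`) instead of a real gateway call, so it runs anywhere. With a real client you would gather `arbr.achat(...)` calls instead.

```python
import asyncio
import time


def blocking_chat(prompt: str) -> str:
    """Stand-in for a blocking HTTP call such as Client.chat (hypothetical)."""
    time.sleep(0.2)
    return f"answer to {prompt!r}"


async def achat(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_chat, prompt)


async def main() -> list[str]:
    prompts = ["ticket 1", "ticket 2", "ticket 3"]
    return await asyncio.gather(*(achat(p) for p in prompts))


start = time.perf_counter()
results = asyncio.run(main())
print(results, f"{time.perf_counter() - start:.2f}s")
```

The three sleeps overlap, so the total wall time should be close to one call's latency rather than three. `asyncio.to_thread` uses the loop's default `ThreadPoolExecutor`, which caps concurrency at `min(32, os.cpu_count() + 4)` threads unless you install your own executor.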
Streaming
There are two ways to stream:
Real SSE (token-by-token): use the OpenAI-compatible endpoint, `POST /v1/chat/completions` with `stream=True`. It works with the OpenAI Python SDK, any chat UI that speaks the OpenAI API, or a raw httpx/requests call:
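```python
from openai import OpenAI

# The SDK appends /chat/completions to base_url, so include the /v1 prefix.
client = OpenAI(api_key="ab_…", base_url="http://localhost:4100/v1")  # your gateway API key
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # some chunks (e.g. a trailing usage chunk) may carry no choices
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```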
Client.stream(messages, ...) → Iterator[str] — makes one buffered chat() call and yields
the text in small chunks. Useful when you want full routing metadata (res.model,
res.routing_decision, etc.) alongside a streaming-style emit:
```python
for chunk in arbr.stream("Explain quantum entanglement simply"):
    print(chunk, end="", flush=True)
```
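A buffered "stream" like this amounts to slicing text that is already complete. The sketch below shows the idea. `chunk_text` and its 16-character chunk size are hypothetical, not what `Client.stream` actually uses.

```python
from collections.abc import Iterator


def chunk_text(text: str, size: int = 16) -> Iterator[str]:
    """Yield consecutive slices of an already-complete response."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(text), size):
        yield text[start:start + size]


full = "Entanglement links two particles so measuring one tells you about the other."
for piece in chunk_text(full):
    print(piece, end="", flush=True)
print()
```

With this approach, all of the model latency is paid before the first chunk appears. Chunking only changes how the finished text is emitted.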
Use the OpenAI-compat endpoint when you need real token-by-token delivery or are integrating with
chat UIs. Use stream() when you want the routing metadata the OpenAI endpoint doesn't expose.
Client.status() → dict
Health check against `GET /api/status`. The returned dict includes `demoMode`, `liveProviders`, `defaultProvider`, `defaultModel`, `routingMode`, and `breachedCaps`.
When the gateway has admin auth enabled (`ARBR_ADMIN_KEY` set server-side), this endpoint requires a credential. The gateway accepts your `api_key` here, so set it and `status()` keeps working.
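A readiness probe can be built on top of this dict. The sketch assumes `liveProviders` and `breachedCaps` are lists and `demoMode` is a boolean. Those types, and the `is_ready` helper itself, are assumptions, so check them against your gateway's actual response.

```python
from typing import Any


def is_ready(status: dict[str, Any]) -> tuple[bool, list[str]]:
    """Return (ready, reasons) from a /api/status payload (hypothetical helper)."""
    reasons = []
    if status.get("demoMode"):
        reasons.append("gateway is in demo mode (no provider keys configured)")
    if not status.get("liveProviders"):
        reasons.append("no live providers")
    # Policy choice: treat breached caps as not-ready; relax this if your caps only alert.
    if status.get("breachedCaps"):
        reasons.append(f"budget caps breached: {status['breachedCaps']}")
    return (not reasons, reasons)


# Made-up payload for illustration; in an app this would come from arbr.status().
print(is_ready({"demoMode": False, "liveProviders": ["openai"], "breachedCaps": []}))
```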
Error handling
All failures raise `GatewayError` with `.status`, `.code`, `.retryable`, and `.request_id`:

| `code` | Meaning | Retried automatically? |
|---|---|---|
| `invalid_input` | Bad arguments (caught before any network call) | no |
| `bad_request` | Gateway rejected the request (HTTP 400) | no |
| `demo_mode` | Gateway has no provider keys configured (HTTP 503) | no |
| `provider_error` | All providers failed for this call (HTTP 502) | yes (5xx) |
| `http_error` | Other non-2xx | 429/5xx only |
| `invalid_api_key` | Missing, unknown, or revoked gateway API key (HTTP 401) | no |
| `budget_exceeded` | A budget cap with action Block is breached for your scope (HTTP 429) | no; retrying won't help until the window rolls past |
| `rate_limited` | Your API key is over its requests/minute limit (HTTP 429) | yes |
| `network` | Connection failed | yes |
| `timeout` | Per-attempt timeout elapsed | yes |
LangChain integration
Two options, by how deep your LangChain usage goes:
1. Full BaseChatModel (recommended for LangChain/LangGraph apps) — requires the extra:
```python
from arbr_client import create_client
from arbr_client.langchain import ArbrChatModel

client = create_client("http://localhost:4100", application="my-app")
llm = ArbrChatModel(client=client, model_name="auto", max_tokens=1024)

chain = my_prompt | llm     # my_prompt: your own prompt template; full Runnable compatibility
await chain.ainvoke({...})  # inside an async function; pipes, async, batching, callbacks
```
2. Zero-dep duck-typed adapter — when you don't want a langchain-core dependency:
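```python
from arbr_client import as_langchain_model, create_client

client = create_client("http://localhost:4100", application="my-app")
llm = as_langchain_model(client, workflow="answer-drafting")
msg = llm.invoke(messages)  # messages: your chat history; .invoke()/.ainvoke(); AIMessage-shaped result
```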
Out of scope with either adapter: tool calling / `with_structured_output`, embeddings, and token-level streaming. Keep those on direct provider SDKs.
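A duck-typed adapter can work without langchain-core when the calling code only uses `.invoke()`/`.ainvoke()` and reads `.content` from the result. The sketch below shows that shape, with a plain `chat_fn(messages, **defaults) -> str` callable in place of the real client. `DuckChatModel` and `AIMessageLike` are hypothetical names, not what `as_langchain_model` returns.

```python
import asyncio
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AIMessageLike:
    """Just enough of AIMessage's shape for code that reads .content and .type."""
    content: str
    type: str = "ai"


class DuckChatModel:
    def __init__(self, chat_fn: Callable[..., str], **defaults: Any) -> None:
        self._chat_fn = chat_fn
        self._defaults = defaults  # e.g. workflow="answer-drafting", sent with every call

    def invoke(self, messages: Any) -> AIMessageLike:
        return AIMessageLike(content=self._chat_fn(messages, **self._defaults))

    async def ainvoke(self, messages: Any) -> AIMessageLike:
        # Keep the event loop free by running the blocking call in a worker thread.
        return await asyncio.to_thread(self.invoke, messages)


llm = DuckChatModel(lambda msgs, **kw: f"[{kw.get('workflow')}] {msgs}", workflow="answer-drafting")
print(llm.invoke("hello").content)
print(asyncio.run(llm.ainvoke("hello")).content)
```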
Gradual rollout pattern
Gate the swap at your app's LLM factory so nothing else changes:
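```python
import os
from arbr_client.langchain import ArbrChatModel

def get_llm():
    # _arbr_client(), settings, build_direct_provider_model(): your app's own helpers
    if os.environ.get("ARBR_GATEWAY_URL"):
        return ArbrChatModel(client=_arbr_client(), model_name=settings.llm_model)
    return build_direct_provider_model()  # unchanged path
```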
To revert, unset `ARBR_GATEWAY_URL` and restart the process so it picks up the change.
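To unit-test the switch, you can inject the environment and the two builders instead of reading globals. The sketch below shows that variation. `make_llm_factory` and the lambda builders are hypothetical stand-ins for your own construction code.

```python
import os
from collections.abc import Callable, Mapping
from typing import Any


def make_llm_factory(build_gateway: Callable[[str], Any],
                     build_direct: Callable[[], Any],
                     env: Mapping[str, str] = os.environ) -> Callable[[], Any]:
    """Return a get_llm() that picks the gateway only when ARBR_GATEWAY_URL is non-empty."""
    def get_llm() -> Any:
        url = env.get("ARBR_GATEWAY_URL")
        if url:
            return build_gateway(url)
        return build_direct()  # unchanged path
    return get_llm


get_llm = make_llm_factory(lambda url: ("gateway", url), lambda: ("direct", None),
                           env={"ARBR_GATEWAY_URL": "http://localhost:4100"})
print(get_llm())
```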
License
MIT
File details
Details for the file arbr_client-0.1.0-py3-none-any.whl.
File metadata
- Download URL: arbr_client-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `52a0fb96dec54a27121bc598d454d976958861518dce415810fc0274973714ef` |
| MD5 | `13d8aeba1523493a06d59287a3f59a38` |
| BLAKE2b-256 | `e923d3b76c1630c6b0b8d0650f85f97472aacb773caf431d95ea6415ef66f360` |