
Official Python client for the Arbr AI control-plane gateway — one function to route, observe, and govern every LLM call.


arbr-client (Python)

Official Python client for the Arbr AI control plane — one function to route, observe, and govern every LLM call your app makes.

Your app calls the gateway instead of provider SDKs. The gateway holds the provider keys, honors the model you pin (or picks one when you say "auto"), applies human-approved routing rules and cost policies, and logs every call with full cost attribution — visible in the dashboard.

  • Zero dependencies — Python ≥ 3.11, stdlib only. Sync and async (achat/astream).
  • One function for the 90% case: chat().
  • Robust by default — per-attempt timeouts, retries with exponential backoff + jitter on network errors / 429 / 5xx, typed errors.
  • Optional LangChain integration — a real BaseChatModel via arbr-client[langchain].

Install

pip install arbr-client                # core (zero deps)
pip install "arbr-client[langchain]"   # + the LangChain BaseChatModel adapter
# (pre-release: pip install /path/to/arbr_client-0.1.0-py3-none-any.whl)

60-second quickstart

from arbr_client import create_client

arbr = create_client(
    "http://localhost:4100",      # or set ARBR_GATEWAY_URL
    application="my-app",         # attribution — shows up in the dashboard
)

res = arbr.chat("Summarise this support ticket: ...", model="auto", max_tokens=300)
print(res.text)
print(res.model, res.routing_decision)   # e.g. "gpt-4o-mini", "ai"

Async (FastAPI, LangGraph, etc.):

res = await arbr.achat("Summarise this ticket: ...", model="auto")

That's a complete integration. No provider keys in your app, and every call is logged, costed, and governable from the dashboard.

How model choice works

You send | What happens
model="gpt-4o" (provider connected) | Honored as-is; all routing policies skipped. routing_decision == "explicit"
model="auto" or omitted | The gateway decides: cache → operator rules → automated routing (cost guardrail or AI policy) → default model
a model whose provider isn't connected | Falls back to the router (same as "auto")

res.model_requested shows what you asked for, res.model what served it, res.routing_decision why (explicit / rule / auto / ai / cache / fallback / passthrough), and res.classified_by how the task type was determined (provided / keyword / ai).
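The decision order in the table above can be sketched as a small function. This is an illustration only, not the gateway's implementation: resolve_model, its parameters, and the step names it returns are hypothetical, and the cache step is simplified to a lookup that returns a model name.

```python
from typing import Callable, Optional


def resolve_model(
    requested: Optional[str],
    connected_models: set[str],
    cache_lookup: Callable[[], Optional[str]],
    rule_lookup: Callable[[], Optional[str]],
    automated_pick: Callable[[], Optional[str]],
    default_model: str,
) -> tuple[str, str]:
    """Return (model, step) following the documented order (hypothetical helper)."""
    # A pinned model whose provider is connected is honored as-is.
    if requested not in (None, "auto") and requested in connected_models:
        return requested, "explicit"
    # "auto", omitted, or an unconnected model: walk the router steps in order.
    steps = (
        ("cache", cache_lookup),
        ("rule", rule_lookup),
        ("automated", automated_pick),
    )
    for step, lookup in steps:
        model = lookup()
        if model is not None:
            return model, step
    # Nothing matched, so the default model serves the call.
    return default_model, "default"


def nothing() -> Optional[str]:
    return None


pinned = resolve_model("gpt-4o", {"gpt-4o"}, nothing, nothing, nothing, "gpt-4o-mini")
routed = resolve_model("auto", {"gpt-4o"}, nothing, lambda: "gpt-4o-mini", nothing, "gpt-4o")
```

Here pinned takes the explicit path, while routed is served by the first router step that returns a model. How these steps map onto the routing_decision values listed above is not specified in this document, so the sketch deliberately uses its own step names.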

API

create_client(base_url=None, *, application=None, workflow=None, department=None, user_id=None, api_key=None, timeout_s=60, retries=2) → Client

base_url falls back to $ARBR_GATEWAY_URL; api_key to $ARBR_API_KEY. A gateway API key (ab_…, dashboard → Settings → API keys) is sent as Authorization: Bearer and binds attribution server-side — required once the gateway has Require API keys on. The metadata kwargs are defaults merged into every call (per-call kwargs override them).
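The fallback and merge semantics described above might look like the following. This is a sketch of the documented behavior, not the client's source: resolve_base_url and merge_metadata are hypothetical helper names, and treating None as "unset" is an assumption.

```python
import os
from typing import Optional


def resolve_base_url(base_url: Optional[str] = None) -> Optional[str]:
    # An explicit argument wins; otherwise fall back to the environment.
    return base_url or os.environ.get("ARBR_GATEWAY_URL")


def merge_metadata(defaults: dict, per_call: dict) -> dict:
    # Drop unset (None) values, then let per-call values override the defaults.
    merged = {key: value for key, value in defaults.items() if value is not None}
    merged.update({key: value for key, value in per_call.items() if value is not None})
    return merged


client_defaults = {"application": "my-app", "workflow": None, "department": "support"}
call_kwargs = {"workflow": "triage", "department": "billing"}
effective = merge_metadata(client_defaults, call_kwargs)
```

In this example the call keeps application="my-app" from the client defaults, while workflow and department take the per-call values.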

Client.chat(messages, *, model=None, provider=None, task_type=None, temperature=None, max_tokens=None, ...) → ChatResponse

messages accepts a bare string, {"role", "content"} dicts, or LangChain message objects. ChatResponse is a frozen dataclass: text, usage (input_tokens/output_tokens/total_tokens), model, model_requested, provider, routing_decision, classified_by, cache_hit, request_id, plus .raw (the unmodified gateway payload).
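To make the three accepted input shapes concrete, here is one way such normalization could work. It is a sketch under assumptions: normalize_messages is a hypothetical helper, a bare string is assumed to become a single user message, and the role mapping for LangChain-style objects (read via their .type and .content attributes) is a guess at reasonable behavior rather than the client's documented logic.

```python
from typing import Any

# Assumed mapping from LangChain message .type values to chat roles.
_ROLE_BY_TYPE = {"human": "user", "ai": "assistant", "system": "system"}


def normalize_messages(messages: Any) -> list[dict]:
    """Coerce the three accepted input shapes into role/content dicts."""
    if isinstance(messages, str):
        # Assumption: a bare string is treated as a single user message.
        return [{"role": "user", "content": messages}]
    normalized = []
    for message in messages:
        if isinstance(message, dict):
            normalized.append({"role": message["role"], "content": message["content"]})
        else:
            # Duck-typed LangChain-style object exposing .type and .content.
            role = _ROLE_BY_TYPE.get(message.type, message.type)
            normalized.append({"role": role, "content": message.content})
    return normalized
```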

Client.achat(...) / Client.astream(...) / Client.astatus()

Async counterparts (the blocking call runs in a worker thread via asyncio.to_thread).
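The worker-thread pattern mentioned above can be shown with the standard library alone. In this sketch blocking_chat is a stand-in for the real blocking call, and the achat defined here is a local illustration, not the client's method.

```python
import asyncio
import time


def blocking_chat(prompt: str) -> str:
    # Stand-in for a blocking HTTP round trip to the gateway.
    time.sleep(0.05)
    return f"echo: {prompt}"


async def achat(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_chat, prompt)


async def main() -> list[str]:
    # Both calls run concurrently, each in its own worker thread.
    return await asyncio.gather(achat("a"), achat("b"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

One consequence of this design is that concurrency is bounded by the event loop's default thread pool rather than by non-blocking sockets, which is usually fine for request volumes typical of an application backend.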

Streaming

The gateway supports two streaming modes:

Real SSE (token-by-token) — use the OpenAI-compatible endpoint at POST /v1/chat/completions with stream=True. Works with the OpenAI Python SDK, any chat UI, or a raw httpx/requests call:

from openai import OpenAI

client = OpenAI(api_key="ab_…", base_url="http://localhost:4100/v1")  # the SDK appends /chat/completions to base_url
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Client.stream(messages, ...) → Iterator[str] — makes one buffered chat() call and yields the text in small chunks. Useful when you want full routing metadata (res.model, res.routing_decision, etc.) alongside a streaming-style emit:

for chunk in arbr.stream("Explain quantum entanglement simply"):
    print(chunk, end="", flush=True)

Use the OpenAI-compat endpoint when you need real token-by-token delivery or are integrating with chat UIs. Use stream() when you want the routing metadata the OpenAI endpoint doesn't expose.
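The buffered behavior of stream() amounts to slicing an already complete response. A minimal sketch of that idea follows; chunk_text and its default chunk size are hypothetical, since the client's actual chunking is not described here.

```python
from typing import Iterator


def chunk_text(text: str, size: int = 16) -> Iterator[str]:
    """Yield an already-complete response in fixed-size pieces."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(text), size):
        yield text[start:start + size]


# The full text exists before the first chunk is yielded, so this gives
# a streaming-style emit without reducing time to first token.
pieces = list(chunk_text("Entangled particles share one quantum state.", size=10))
```

Because the whole response is available up front, joining the pieces always reproduces the original text exactly.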

Client.status() → dict

Healthcheck against GET /api/status: demoMode, liveProviders, defaultProvider, defaultModel, routingMode, breachedCaps. When the gateway has admin auth enabled (ARBR_ADMIN_KEY set server-side), this endpoint requires a credential; your gateway api_key is accepted, so set it and status() keeps working.

Error handling

All failures raise GatewayError with .status, .code, .retryable, .request_id:

code | Meaning | Retried automatically?
invalid_input | Bad arguments (caught before any network call) | no
bad_request | Gateway rejected the request (HTTP 400) | no
demo_mode | Gateway has no provider keys configured (HTTP 503) | no
provider_error | All providers failed for this call (HTTP 502) | yes (5xx)
http_error | Other non-2xx | 429/5xx only
invalid_api_key | Missing/unknown/revoked gateway API key (HTTP 401) | no
budget_exceeded | A budget cap with action Block is breached for your scope (HTTP 429) | no (retrying won't help until the window rolls past)
rate_limited | Your API key is over its requests/minute limit (HTTP 429) | yes
network | Connection failed | yes
timeout | Per-attempt timeout elapsed | yes
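The "retried automatically" column boils down to a loop keyed on .retryable. The sketch below illustrates that idea with exponential backoff and jitter. It is not the client's implementation: the GatewayError class is a local stand-in carrying the four documented attributes, while call_with_retries, base_delay_s, and the full-jitter strategy are assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class GatewayError(Exception):
    """Local stand-in with the attributes listed above."""

    def __init__(self, code: str, *, status: Optional[int] = None,
                 retryable: bool = False, request_id: Optional[str] = None):
        super().__init__(code)
        self.code = code
        self.status = status
        self.retryable = retryable
        self.request_id = request_id


def call_with_retries(fn: Callable[[], T], retries: int = 2, base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying only errors flagged retryable, up to `retries` extra attempts."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except GatewayError as err:
            # Non-retryable codes (e.g. budget_exceeded) surface immediately.
            if not err.retryable or attempt == retries:
                raise
            # Exponential backoff with full jitter: 0 .. base * 2^attempt seconds.
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    raise RuntimeError("retries must be >= 0")
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays. Note that two HTTP 429 cases behave differently here: rate_limited would be flagged retryable, budget_exceeded would not.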

LangChain integration

Two options, by how deep your LangChain usage goes:

1. Full BaseChatModel (recommended for LangChain/LangGraph apps) — requires the extra:

from arbr_client import create_client
from arbr_client.langchain import ArbrChatModel

client = create_client("http://localhost:4100", application="my-app")
llm = ArbrChatModel(client=client, model_name="auto", max_tokens=1024)

chain = my_prompt | llm           # full Runnable compatibility:
await chain.ainvoke({...})        # pipes, async, batching, callbacks

2. Zero-dep duck-typed adapter — when you don't want a langchain-core dependency:

from arbr_client import as_langchain_model
llm = as_langchain_model(client, workflow="answer-drafting")
msg = llm.invoke(messages)        # .invoke()/.ainvoke(); AIMessage-shaped result
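"Duck-typed" here means the adapter only has to look like a chat model to the caller, without importing langchain-core. A minimal sketch of that shape is below. Everything in it is hypothetical (DuckTypedChatModel, FakeAIMessage, the chat_fn parameter) and it does not reproduce what as_langchain_model actually returns.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FakeAIMessage:
    """AIMessage-shaped result: just the attributes callers usually read."""
    content: str
    type: str = "ai"


class DuckTypedChatModel:
    """Exposes .invoke()/.ainvoke() without depending on langchain-core."""

    def __init__(self, chat_fn: Callable[[Any], str]):
        # chat_fn stands in for a call such as client.chat(messages).text
        self._chat_fn = chat_fn

    def invoke(self, messages: Any) -> FakeAIMessage:
        return FakeAIMessage(content=self._chat_fn(messages))

    async def ainvoke(self, messages: Any) -> FakeAIMessage:
        # Reuse the blocking path in a worker thread.
        return await asyncio.to_thread(self.invoke, messages)


model = DuckTypedChatModel(lambda messages: f"reply to: {messages}")
reply = model.invoke("hello")
```

The trade-off is that an object like this works wherever code only calls .invoke() and reads .content, but it will not satisfy isinstance checks or Runnable composition, which is where the full BaseChatModel option applies.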

Out of gateway scope either way: tool calling / with_structured_output, embeddings, and token-level streaming — keep those on direct provider SDKs.

Gradual rollout pattern

Gate the swap at your app's LLM factory so nothing else changes:

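import os

from arbr_client.langchain import ArbrChatModel

def get_llm():
    # _arbr_client(), settings, and build_direct_provider_model() are your app's own helpers.
    if os.environ.get("ARBR_GATEWAY_URL"):
        return ArbrChatModel(client=_arbr_client(), model_name=settings.llm_model)
    return build_direct_provider_model()   # unchanged path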

Unset ARBR_GATEWAY_URL to revert instantly.

License

MIT

