Official Python SDK for Ferro Labs AI Gateway — route LLM requests across 29 providers with a single OpenAI-compatible API
Route LLM requests across 29 providers and 2,500+ models through a single OpenAI-compatible API.
Migrating from openai takes a two-line change: the import and the client constructor. Built on Ferro Labs AI Gateway.
from ferrolabsai import FerroClient
client = FerroClient(api_key="sk-ferro-...")
# Route to OpenAI
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
# Route to Anthropic — same client, same call
response = client.chat.completions.create(
model="claude-3-5-sonnet-20241022",
messages=[{"role": "user", "content": "Hello"}],
)
print(response.content)
print(f"Handled by: {response.provider} in {response.latency_ms}ms")
Why ferrolabsai
- One API for 29 providers. OpenAI, Anthropic, Google, Groq, Together, Mistral, Cohere, Bedrock, Vertex, Azure, and more — all via a single client.
- Drop-in OpenAI replacement. The surface matches the OpenAI SDK. Change two lines and keep all your existing code.
- Smart routing built in. Fallback chains, weighted load balancing, and per-request overrides via route_tag.
- Cost and provider visibility. Every response includes provider, cost_usd, latency_ms, and trace_id, with no extra calls.
- Self-hostable. Point base_url at any Ferro Labs AI Gateway instance and go.
- Typed and async-first. Dataclass response models, full AsyncFerroClient, streaming in both modes.
Contents
- Installation
- Quickstart
- Migrate from OpenAI
- Framework integrations
- Usage
- Observability
- Configuration
- Error handling
- Admin API (OSS gateway)
- Development
- License
Installation
pip install ferrolabsai
Requires Python 3.9+. The only runtime dependency is httpx.
Quickstart
You'll need a running Ferro Labs AI Gateway instance and an API key issued by it.
from ferrolabsai import FerroClient
client = FerroClient(
api_key="sk-ferro-your-key",
base_url="http://localhost:8080", # your gateway address
)
Environment variables
export FERRO_API_KEY="sk-ferro-your-key"
export FERRO_BASE_URL="http://localhost:8080"
client = FerroClient() # reads FERRO_API_KEY / FERRO_BASE_URL automatically
FERRO_API_KEY takes precedence, but OPENAI_API_KEY is also accepted as a fallback to make migration painless.
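The precedence rule above can be sketched in a few lines. This is an illustration only, not the SDK's own lookup code: `resolve_api_key` is a hypothetical helper, and treating an empty string as unset is an assumption of the sketch.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return FERRO_API_KEY if set, else OPENAI_API_KEY, else None.

    Hypothetical helper: empty strings are treated as unset.
    """
    return env.get("FERRO_API_KEY") or env.get("OPENAI_API_KEY") or None


# Both set: the Ferro key wins.
print(resolve_api_key({"FERRO_API_KEY": "sk-ferro-a", "OPENAI_API_KEY": "sk-openai-b"}))
# Only the OpenAI key set: it is used as the fallback.
print(resolve_api_key({"OPENAI_API_KEY": "sk-openai-b"}))
```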
Migrate from OpenAI
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-openai-...")
# After — all your existing code works unchanged
from ferrolabsai import FerroClient
client = FerroClient(api_key="sk-ferro-...")
Every client.chat.completions.create(...) call, every streaming loop, every tool call — identical API surface. Ferro routes to the right provider based on the model name.
Framework integrations
Ferro's gateway exposes an OpenAI-compatible HTTP API at /v1/*, so anything that speaks OpenAI works. Point the base URL at your gateway and keep your existing framework.
LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
api_key="sk-ferro-your-key",
base_url="http://localhost:8080/v1",
model="gpt-4o",
)
response = llm.invoke("Hello from LangChain via Ferro")
LlamaIndex
from llama_index.llms.openai import OpenAI
llm = OpenAI(
api_key="sk-ferro-your-key",
api_base="http://localhost:8080/v1",
model="gpt-4o",
)
Vercel AI SDK (Next.js)
import { createOpenAI } from '@ai-sdk/openai';
const ferro = createOpenAI({
apiKey: process.env.FERRO_API_KEY,
baseURL: 'http://localhost:8080/v1',
});
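Because the gateway speaks the OpenAI wire format under /v1/*, a client with no SDK at all should also work. The sketch below builds such a request with only the standard library. It assumes the usual OpenAI conventions (the /v1/chat/completions path and a Bearer token in the Authorization header); `build_chat_request` is a hypothetical helper, and nothing is sent until you uncomment the last lines with a gateway running.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8080", "sk-ferro-your-key", "gpt-4o", "Hello")
print(req.get_method(), req.full_url)

# Sending requires a running gateway:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```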
Usage
Chat completions
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain LLM routing in one paragraph."},
],
temperature=0.7,
max_tokens=256,
)
print(response.content) # shortcut for choices[0].message.content
print(f"Cost: ${response.usage.cost_usd:.6f}")
print(f"Provider: {response.provider}") # which backend handled it
Streaming
for chunk in client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Write a haiku about Go performance."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
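If you need the full text rather than incremental printing, the chunks can be joined as they arrive. The sketch below assumes only the chunk shape used in the loop above (`choices[0].delta.content`, which may be None); `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real stream so the example runs without a gateway.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a chat completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks without choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip None and empty deltas
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]
print(collect_stream(stream))
```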
Async
import asyncio
from ferrolabsai import AsyncFerroClient
async def main():
    async with AsyncFerroClient(api_key="sk-ferro-...") as client:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.content)

asyncio.run(main())
Async streaming:
async def stream_example():
    async with AsyncFerroClient(api_key="sk-ferro-...") as client:
        async for chunk in await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Count to 5"}],
            stream=True,
        ):
            print(chunk.choices[0].delta.content or "", end="", flush=True)
Embeddings
response = client.embeddings.create(
model="text-embedding-3-small",
input=["Ferro routes LLM requests", "across 29 providers"],
)
vectors = [d.embedding for d in response.data]
print(f"Embedding dimensions: {len(vectors[0])}")
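A common next step with embedding vectors is comparing them. The sketch below computes cosine similarity with the standard library only; `cosine_similarity` is a hypothetical helper, not part of the SDK, and the short vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real embeddings are much longer.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

With real data you would pass two entries of the `vectors` list from the example above.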
Image generation
response = client.images.generate(
model="dall-e-3",
prompt="A futuristic AI gateway routing data streams across glowing servers",
size="1024x1024",
quality="hd",
)
print(response.data[0].url)
Model catalog
# Browse all 2,500+ models
models = client.models.list()
# Filter by provider
anthropic_models = client.models.list(provider="anthropic")
# Filter by capability
vision_models = client.models.list(capability="vision")
# Pricing for a specific model
info = client.models.retrieve("gpt-4o")
print(f"Context window: {info.context_window:,} tokens")
print(f"Input: ${info.input_cost_per_token * 1_000_000:.2f}/M tokens")
print(f"Output: ${info.output_cost_per_token * 1_000_000:.2f}/M tokens")
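The per-token prices make it straightforward to estimate a request's cost before sending it. The sketch below shows the arithmetic; `estimate_cost_usd` is a hypothetical helper, and the prices are invented for illustration rather than taken from the catalog.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Input and output tokens are priced separately, then summed."""
    return prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token


# Invented prices: $2.50 per million input tokens, $10.00 per million output tokens.
input_price = 2.50 / 1_000_000
output_price = 10.00 / 1_000_000

cost = estimate_cost_usd(1_200, 300, input_price, output_price)
print(f"Estimated cost: ${cost:.6f}")
```

In practice the two prices would come from `info.input_cost_per_token` and `info.output_cost_per_token` as returned by `client.models.retrieve(...)`.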
Ferro extras: templates & route tags
The SDK passes two Ferro-specific fields on chat.completions.create(...):
template_id + template_variables — render a server-side prompt template at request time. Templates are defined in your gateway config and use Go text/template syntax ({{.variable_name}}):
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "I can't log in"}],
template_id="support-agent",
template_variables={
"product": "Acme SaaS",
"plan": "Pro",
"date": "2026-04-09",
},
)
route_tag — override the routing strategy for a single request. Maps to a conditional rule in your gateway config:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
route_tag="low-cost", # e.g. forces fallback to cheaper providers
)
Both fields are silently ignored by any OpenAI-compatible backend that doesn't understand them, so it's safe to keep them in shared code paths.
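To make the template_variables mechanism concrete, here is a rough Python imitation of the simplest case of Go text/template, plain `{{.name}}` field substitution. It is not the gateway's renderer and supports none of Go's pipelines, conditionals, or functions. `render_simple` and the template text are hypothetical, and raising on a missing variable is a choice made for this sketch.

```python
import re
from typing import Mapping

# Matches {{.name}} with optional whitespace inside the braces.
_FIELD = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_simple(template: str, variables: Mapping[str, str]) -> str:
    """Substitute {{.name}} placeholders; raise if a variable is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"template variable not provided: {name}")
        return str(variables[name])

    return _FIELD.sub(substitute, template)


# Hypothetical template text; real templates live in the gateway config.
template = "You support {{.product}} customers on the {{.plan}} plan. Today is {{.date}}."
rendered = render_simple(
    template,
    {"product": "Acme SaaS", "plan": "Pro", "date": "2026-04-09"},
)
print(rendered)
```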
Observability
Every ChatCompletion includes fields that tell you what the gateway actually did — no extra API calls, no log scraping:
| Field | Type | Description |
|---|---|---|
| response.provider | str | Which upstream provider served the request (e.g. "openai", "anthropic") |
| response.trace_id | str | Correlates this request with gateway logs |
| response.latency_ms | int | End-to-end gateway latency |
| response.usage.cost_usd | float | Computed cost in USD |
| response.usage.cache_hit | bool | Whether the response came from the gateway's semantic cache |
| response.usage.prompt_tokens / completion_tokens / total_tokens | int | Standard OpenAI token counts |
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
print(f"trace={response.trace_id} provider={response.provider} "
f"latency={response.latency_ms}ms cost=${response.usage.cost_usd:.6f}")
To dig deeper into a specific request, use client.admin.logs.list(trace_id=...) — see Admin API.
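Since these fields arrive on every response, a per-provider summary can be built in-process. The sketch below aggregates call count, total cost, and average latency. `CallRecord` and `summarize` are hypothetical names, and the records are invented values standing in for what you would copy from `response.provider`, `response.latency_ms`, and `response.usage.cost_usd`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CallRecord:
    """Hypothetical container for the fields copied off one response."""
    provider: str
    latency_ms: int
    cost_usd: float


def summarize(records: Iterable[CallRecord]) -> Dict[str, dict]:
    """Group records by provider and compute calls, total cost, mean latency."""
    totals = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0, "latency_ms": 0})
    for record in records:
        bucket = totals[record.provider]
        bucket["calls"] += 1
        bucket["cost_usd"] += record.cost_usd
        bucket["latency_ms"] += record.latency_ms
    return {
        provider: {
            "calls": bucket["calls"],
            "cost_usd": bucket["cost_usd"],
            "avg_latency_ms": bucket["latency_ms"] / bucket["calls"],
        }
        for provider, bucket in totals.items()
    }


# Invented sample data.
records = [
    CallRecord("openai", 100, 0.001),
    CallRecord("openai", 300, 0.002),
    CallRecord("anthropic", 250, 0.004),
]
summary = summarize(records)
for provider, stats in summary.items():
    print(provider, stats)
```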
Configuration
FerroClient and AsyncFerroClient accept the same keyword arguments:
client = FerroClient(
api_key="sk-ferro-...", # or FERRO_API_KEY env var
base_url="http://localhost:8080", # or FERRO_BASE_URL env var
timeout=120.0, # seconds (default: 120.0)
max_retries=2, # retries on connection errors (default: 2)
default_headers={"x-env": "prod"}, # merged into every request
http_client=my_httpx_client, # bring your own httpx.Client
)
Retries are triggered only by httpx.ConnectError and httpx.TimeoutException — HTTP errors (4xx/5xx) propagate immediately as typed exceptions so you can handle them yourself.
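The retry policy described above (retry transport failures, let everything else propagate) looks roughly like the following. This is a generic sketch, not the SDK's implementation: `call_with_retries` is a hypothetical helper, the built-in ConnectionError and TimeoutError stand in for the httpx exceptions so the example runs without dependencies, and the exponential delay is an assumption because the README does not document a backoff schedule.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 2,
    base_delay: float = 0.0,
) -> T:
    """Call fn, retrying only on the listed exception types.

    Any other exception propagates on the first failure. After max_retries
    retries, the last retryable exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # assumed backoff schedule
            attempt += 1


attempts = []


def flaky() -> str:
    """Fail twice with a transport error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("gateway unreachable")
    return "ok"


result = call_with_retries(flaky, retry_on=(ConnectionError, TimeoutError))
print(result, "after", len(attempts), "attempts")
```

With max_retries=2 the function is called at most three times: the initial attempt plus two retries.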
Bring-your-own httpx client lets you configure proxies, custom TLS, connection pool limits, or instrumentation middleware and reuse that across the SDK:
import httpx
pooled = httpx.Client(limits=httpx.Limits(max_connections=50))
client = FerroClient(api_key="sk-ferro-...", http_client=pooled)
Close the client explicitly when you're done (or use a with block):
with FerroClient(api_key="sk-ferro-...") as client:
    ...

# or
client = FerroClient(api_key="sk-ferro-...")
try:
    ...
finally:
    client.close()
Error handling
from ferrolabsai import (
FerroClient,
FerroAuthError,
FerroRateLimitError,
FerroNotFoundError,
FerroServerError,
FerroConnectionError,
)
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
except FerroAuthError:
    print("Invalid API key: check FERRO_API_KEY")
except FerroRateLimitError:
    print("Rate limit hit: back off and retry")
except FerroNotFoundError:
    print("Model or endpoint not found")
except FerroServerError as e:
    print(f"Gateway error {e.status_code}: upstream provider may be down")
except FerroConnectionError:
    print("Cannot reach gateway: is it running?")
All HTTP-level exceptions inherit from FerroAPIError and expose .status_code, .code, .message, and .request_id. FerroConnectionError and FerroStreamError inherit from FerroError directly.
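The practical consequence of this hierarchy is that one `isinstance` check separates HTTP-level failures, which carry status metadata, from transport failures, which do not. The sketch below models that with local stand-in classes so it runs without the SDK installed. The class bodies and constructor signature are assumptions and are not copied from ferrolabsai; only the inheritance relationships and attribute names come from the description above.

```python
class FerroError(Exception):
    """Stand-in for the SDK's base exception."""


class FerroAPIError(FerroError):
    """Stand-in for HTTP-level errors; constructor signature is assumed."""

    def __init__(self, message, status_code, code=None, request_id=None):
        super().__init__(message)
        self.message = message
        self.status_code = status_code
        self.code = code
        self.request_id = request_id


class FerroRateLimitError(FerroAPIError):
    """Stand-in for a rate-limit error."""


class FerroConnectionError(FerroError):
    """Stand-in for a transport failure; carries no HTTP metadata."""


def describe(exc: FerroError) -> str:
    """Format an error for logging, using HTTP fields only when present."""
    if isinstance(exc, FerroAPIError):
        return f"HTTP {exc.status_code} [{exc.code}] {exc.message} (request {exc.request_id})"
    return f"transport failure: {exc}"


rate_limited = FerroRateLimitError(
    "slow down", status_code=429, code="rate_limited", request_id="req_123"
)
print(describe(rate_limited))
print(describe(FerroConnectionError("connection refused")))
```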
Admin API (OSS gateway)
These APIs are available on any self-hosted Ferro Labs AI Gateway instance. Requires an admin-scoped API key.
The admin namespace mirrors the OSS gateway's /admin/* HTTP surface defined in internal/admin/handlers.go.
API keys
# Create
new_key = client.admin.keys.create(
name="backend-service",
scopes=["admin"],
)
print(new_key.key) # full key value — shown ONCE, store it securely
# List
keys = client.admin.keys.list()
# Per-key usage counts (sorted by usage by default)
usage = client.admin.keys.usage(limit=20)
# Revoke — keeps the record for audit, invalidates the key immediately
client.admin.keys.revoke("key_id")
# Rotate — atomically invalidates old, returns new
rotated = client.admin.keys.rotate("key_id")
# Permanently delete the record
client.admin.keys.delete("key_id")
Gateway routing config
The OSS gateway has a single active routing config. Use history() to inspect prior versions and rollback(version) to revert. Updates are zero-downtime hot reloads.
# Read the current config
cfg = client.admin.config.get()
print(cfg.strategy) # e.g. {"mode": "fallback"}
print(cfg.targets) # list of {virtual_key, weight, ...}
# Replace it (PUT) — hot reload, no restart
client.admin.config.update({
"strategy": {"mode": "fallback"},
"targets": [
{"virtual_key": "openai", "weight": 1},
{"virtual_key": "anthropic", "weight": 1},
{"virtual_key": "groq", "weight": 1},
],
"plugins": [
{"name": "cache", "enabled": True},
{"name": "logger", "enabled": True},
],
})
# Inspect history and roll back
history = client.admin.config.history()
client.admin.config.rollback(history[-2].version)
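For readers unfamiliar with the "fallback" strategy named in the config, the general idea is to try targets in order until one succeeds. The sketch below illustrates that idea only. It runs client-side with a fake sender, whereas the real logic lives in the gateway, whose exact behavior (including how it uses weight) is not described in this README. `call_with_fallback` and `fake_send` are hypothetical.

```python
from typing import Callable, List, Tuple


def call_with_fallback(targets: List[dict], send: Callable[[str], str]) -> Tuple[str, str]:
    """Try each target in order; return (virtual_key, reply) for the first success."""
    errors = []
    for target in targets:
        key = target["virtual_key"]
        try:
            return key, send(key)
        except Exception as exc:  # a real router would likely be more selective
            errors.append(f"{key}: {exc}")
    raise RuntimeError("all targets failed: " + "; ".join(errors))


def fake_send(virtual_key: str) -> str:
    """Pretend the first provider is down and the others are healthy."""
    if virtual_key == "openai":
        raise ConnectionError("upstream unavailable")
    return f"reply from {virtual_key}"


targets = [
    {"virtual_key": "openai", "weight": 1},
    {"virtual_key": "anthropic", "weight": 1},
    {"virtual_key": "groq", "weight": 1},
]
served_by, reply = call_with_fallback(targets, fake_send)
print(served_by, "->", reply)
```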
Request logs
The gateway logs every request (when the logger plugin is enabled). Query, aggregate, and prune via client.admin.logs.
# Recent failures
errors = client.admin.logs.list(limit=20, stage="on_error")
for entry in errors["data"]:
    print(entry["trace_id"], entry["model"], entry["provider"])
# Aggregate stats
stats = client.admin.logs.stats()
# Prune old entries
client.admin.logs.delete(before="2026-01-01T00:00:00Z")
Providers, plugins, dashboard
providers = client.admin.providers.list() # registered LLM providers
plugins = client.admin.plugins.list() # installed gateway plugins
dashboard = client.admin.dashboard() # high-level counts
health = client.admin.health() # gateway health check
Development
git clone https://github.com/ferro-labs/ferrolabs-python-sdk
cd ferrolabs-python-sdk
make install # editable install with dev dependencies
make test # pytest (all HTTP is mocked — no gateway needed)
make lint # ruff + mypy
make format # ruff format
make build # build sdist + wheel into dist/
make clean # remove artifacts
All 30 tests run in under a second against pytest-httpx fixtures, so no network or running gateway is required.
See CHANGELOG.md for release history.
License
Apache 2.0 — see LICENSE.