Official Python SDK for FlexInference - a deadline-aware, OpenAI-compatible inference router.
FlexInference (Python)
The official Python SDK for FlexInference - a deadline-aware, OpenAI-compatible inference router. Send the OpenAI requests you already send, bring your own OpenAI key, and add one field - start_within - to trade latency for cost.
pip install flexinference
Quickstart
from flexinference import FlexInference
client = FlexInference(api_key="flex_live_...")
res = client.responses.create({
    "model": "gpt-5.5",
    "input": "Write a haiku about cheap GPUs.",
    "start_within": "00h-00m-30s",
})
print(res["output_text"])
start_within takes "priority", "standard", or a duration in the form "HHh-MMm-SSs" (from 5s to 10m). With a duration, the request races OpenAI's flex tier and falls back to standard if it can't start in time. See the docs.
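Since the duration form is a fixed-width string, it can be convenient to build it from a number of seconds. The helper below is a minimal sketch and not part of the SDK: format_start_within is a hypothetical name, and treating the 5s-10m bounds as inclusive is an assumption.

```python
def format_start_within(seconds: int) -> str:
    """Format a whole number of seconds as an "HHh-MMm-SSs" duration string.

    Hypothetical helper, not part of the flexinference package. The 5s-10m
    range comes from the README; treating both ends as inclusive is an
    assumption.
    """
    if not 5 <= seconds <= 600:
        raise ValueError("start_within duration must be between 5s and 10m")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}h-{minutes:02d}m-{secs:02d}s"


print(format_start_within(30))  # 00h-00m-30s
print(format_start_within(90))  # 00h-01m-30s
```

The result can be passed directly as the "start_within" value in a request dict.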
Streaming
stream = client.responses.create(
    {"model": "gpt-5-nano", "input": "Count to ten.", "start_within": "00h-00m-20s"},
    stream=True,
)
for event in stream:
    if event.get("type") == "response.output_text.delta":
        print(event["delta"], end="")
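If you want the full text once the stream ends rather than printing each delta, the same filter can feed an accumulator. This is a sketch under assumptions: collect_output_text is a hypothetical helper, and the event dicts below are stand-ins shaped like the ones the streaming loop reads, not captured SDK output.

```python
from typing import Any, Iterable, Mapping


def collect_output_text(events: Iterable[Mapping[str, Any]]) -> str:
    """Join the text deltas from a stream of event dicts into one string."""
    parts = []
    for event in events:
        # Same event type the streaming loop filters on.
        if event.get("type") == "response.output_text.delta":
            parts.append(event["delta"])
    return "".join(parts)


# Stand-in events for illustration; a real stream would come from
# client.responses.create(..., stream=True).
fake_stream = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "1, 2, "},
    {"type": "response.output_text.delta", "delta": "3"},
    {"type": "response.completed"},
]
print(collect_output_text(fake_stream))  # 1, 2, 3
```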
Chat Completions
res = client.chat.completions.create({
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello!"}],
    "start_within": "standard",
})
print(res["choices"][0]["message"]["content"])
Closing the client
The client holds a pooled httpx.Client, so close it when you're done to release connections. Use it as a context manager:
with FlexInference(api_key="flex_live_...") as client:
    res = client.responses.create({"model": "gpt-5.5", "input": "Hi."})
    print(res["output_text"])
# connections are released on exit
Or close it yourself:
client = FlexInference(api_key="flex_live_...")
try:
    ...
finally:
    client.close()
Errors
Non-2xx responses raise FlexInferenceError, carrying the OpenAI-shaped status, type, code, and param:
from flexinference import FlexInferenceError

try:
    client.responses.create({"model": "gpt-5.5", "input": "hi", "start_within": "priority"})
except FlexInferenceError as err:
    if err.code == "no_byok_key":
        print("Add your OpenAI key in the dashboard.")
    else:
        raise
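The status field can also drive a simple retry decision. The sketch below is not SDK behavior: should_retry is a hypothetical helper, reading err.status as an attribute is an assumption based on the field list above, and treating 429 and 5xx as transient follows common HTTP practice rather than anything FlexInference documents here. The stand-in objects use illustrative values only.

```python
from types import SimpleNamespace

# Assumption: 429 and 5xx statuses are treated as transient, per common HTTP
# practice. The README does not say which errors are safe to retry.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def should_retry(err) -> bool:
    """Return True if the error's HTTP status looks transient."""
    return getattr(err, "status", None) in RETRYABLE_STATUSES


# Stand-in objects with the same field names as FlexInferenceError;
# the status and type values are illustrative, not taken from the service.
rate_limited = SimpleNamespace(status=429, type="rate_limit_error", code=None, param=None)
missing_key = SimpleNamespace(status=400, type="invalid_request_error", code="no_byok_key", param=None)

print(should_retry(rate_limited))  # True
print(should_retry(missing_key))   # False
```

Inside an except FlexInferenceError block, a check like this would decide between retrying and re-raising.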
Configuration
| Argument | Default | Description |
|---|---|---|
| api_key | (required) | Your flex_live_ key. |
| base_url | https://api.flexinference.com/v1 | Override the router endpoint. |
| client | httpx.Client with a 600s read timeout | Provide your own httpx.Client. |
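One way to keep the key out of source code is to assemble these arguments from environment variables. This is a sketch under assumptions: FLEXINFERENCE_API_KEY and FLEXINFERENCE_BASE_URL are hypothetical variable names chosen for illustration, and the SDK is not known to read them itself.

```python
import os


def client_kwargs_from_env(environ=None) -> dict:
    """Build keyword arguments for FlexInference(...) from environment variables.

    FLEXINFERENCE_API_KEY and FLEXINFERENCE_BASE_URL are hypothetical names
    used only in this sketch.
    """
    env = os.environ if environ is None else environ
    kwargs = {"api_key": env["FLEXINFERENCE_API_KEY"]}  # required; KeyError if unset
    base_url = env.get("FLEXINFERENCE_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url  # otherwise the default endpoint applies
    return kwargs


# A plain dict stands in for os.environ here.
print(client_kwargs_from_env({"FLEXINFERENCE_API_KEY": "flex_live_..."}))
```

The result would then be unpacked into the constructor, for example FlexInference(**client_kwargs_from_env()).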
License
MIT