Official Python SDK for FlexInference - a deadline-aware, OpenAI-compatible inference router.

FlexInference (Python)

The official Python SDK for FlexInference - a deadline-aware, OpenAI-compatible inference router. Send the OpenAI requests you already send, bring your own OpenAI key, and set one required field - start_within - to trade latency for cost.

pip install flexinference

Quickstart

from flexinference import FlexInference, output_text

client = FlexInference(api_key="flex_live_...")

res = client.responses.create({
    "model": "gpt-5.5",
    "input": "Write a haiku about cheap GPUs.",
    "start_within": "00h-00m-30s",
})

print(output_text(res))

Responses come back as the raw OpenAI JSON (we never reshape the body), so there is no output_text field on the wire; OpenAI's own SDKs compute that field client-side. output_text(res) pulls the assistant's text out of either a response or a chat completion for you.
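
To make the body shapes concrete, here is a minimal sketch of that kind of extraction. It is not the SDK's implementation of output_text; extract_text is a hypothetical name, and the sketch assumes the publicly documented OpenAI shapes (output message items holding output_text parts for Responses, choices[0].message.content for Chat Completions).

```python
from typing import Any


def extract_text(body: dict[str, Any]) -> str:
    """Best-effort text extraction from a raw Responses or Chat Completions body.

    Hypothetical helper for illustration; not the SDK's output_text.
    """
    # Responses API shape: output -> message items -> output_text content parts.
    if isinstance(body.get("output"), list):
        parts = []
        for item in body["output"]:
            if item.get("type") != "message":
                continue  # skip reasoning, tool calls, and other non-message items
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part.get("text", ""))
        return "".join(parts)

    # Chat Completions shape: choices[0].message.content.
    choices = body.get("choices") or []
    if choices:
        return choices[0].get("message", {}).get("content") or ""

    # Neither shape matched: return an empty string rather than raising.
    return ""
```

In application code, prefer output_text from the SDK; the sketch only shows why the helper has to look in two different places.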

start_within is required on every request. It takes "default", "priority", "auto", or a duration of the form "HHh-MMm-SSs" between 5 seconds and 10 minutes. A duration races OpenAI's flex tier on a flex-capable model and falls back to the standard tier if the request can't start in time; "default", "priority", and "auto" map to the OpenAI service tiers of the same name and proxy any model. See the docs.
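
If your deadline is held as a number of seconds, a small helper can build the duration string. This is a sketch only: to_start_within is a hypothetical name, not part of the SDK, and it assumes the 5s and 10m bounds are inclusive.

```python
def to_start_within(seconds: int) -> str:
    """Format a deadline in seconds as an "HHh-MMm-SSs" duration string.

    Hypothetical helper; assumes the 5s-10m range is inclusive at both ends.
    """
    if not 5 <= seconds <= 600:
        raise ValueError("start_within duration must be between 5s and 10m")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Zero-pad each component to two digits, matching "00h-00m-30s".
    return f"{hours:02d}h-{minutes:02d}m-{secs:02d}s"


print(to_start_within(30))   # 00h-00m-30s
print(to_start_within(90))   # 00h-01m-30s
print(to_start_within(600))  # 00h-10m-00s
```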

Streaming

stream = client.responses.create(
    {"model": "gpt-5-nano", "input": "Count to ten.", "start_within": "00h-00m-20s"},
    stream=True,
)
for event in stream:
    if event.get("type") == "response.output_text.delta":
        print(event["delta"], end="")
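
If you want the full text rather than printing deltas as they arrive, the same filter can accumulate them. The sketch below uses a hypothetical collect_text helper and a hand-written list of events in place of a live stream, so it runs without a network call; with the SDK you would pass the stream object from the example above instead.

```python
from typing import Any, Iterable


def collect_text(events: Iterable[dict[str, Any]]) -> str:
    """Join the text deltas from a sequence of streaming events."""
    chunks = []
    for event in events:
        # Only output_text deltas carry assistant text; ignore other event types.
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)


# Stand-in events for illustration; a real stream yields many more event types.
fake_events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "1, "},
    {"type": "response.output_text.delta", "delta": "2"},
    {"type": "response.completed"},
]
print(collect_text(fake_events))  # 1, 2
```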

Chat Completions

res = client.chat.completions.create({
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello!"}],
    "start_within": "default",
})
print(res["choices"][0]["message"]["content"])

Closing the client

The client holds a pooled httpx.Client, so close it when you're done to release connections. Use it as a context manager:

with FlexInference(api_key="flex_live_...") as client:
    res = client.responses.create({"model": "gpt-5.5", "input": "Hi.", "start_within": "default"})
    print(output_text(res))
# connections are released on exit

Or close it yourself:

client = FlexInference(api_key="flex_live_...")
try:
    ...
finally:
    client.close()

Request validation

Before a request leaves your machine, the SDK validates the parts it owns. start_within is required and must be "default", "priority", "auto", or a duration "HHh-MMm-SSs" between 5s and 10m; model and input/messages must be present. A missing or invalid value raises a ValueError locally, before the request is sent, rather than costing a round trip that ends in a provider 400:

client.responses.create({"model": "gpt-5.5", "input": "hi"})
# ValueError: Invalid request body:
#   Missing required parameter: `start_within`. Set it to "default", "priority", "auto", or a duration "HHh-MMm-SSs".

Validation is request-only. Unknown fields pass straight through to the provider (so new OpenAI parameters keep working), and responses are never validated or reshaped.
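
The rules above can be sketched as a standalone check. This is an illustration under stated assumptions, not the SDK's validator: check_request, TIERS, and DURATION are hypothetical names, the error messages are invented, the 5s-10m bounds are assumed inclusive, and the sketch does not check that minutes and seconds stay below 60.

```python
import re
from typing import Any

TIERS = {"default", "priority", "auto"}
DURATION = re.compile(r"(\d{2})h-(\d{2})m-(\d{2})s")


def check_request(body: dict[str, Any], content_key: str) -> None:
    """Raise ValueError if a required field is missing or start_within is malformed.

    content_key is "input" for Responses or "messages" for Chat Completions.
    Unknown fields are deliberately ignored so they pass through untouched.
    """
    for key in ("model", content_key, "start_within"):
        if key not in body:
            raise ValueError(f"Missing required parameter: `{key}`.")

    value = body["start_within"]
    if isinstance(value, str) and value in TIERS:
        return  # a named service tier needs no further checks

    match = DURATION.fullmatch(value) if isinstance(value, str) else None
    if match is None:
        raise ValueError("Invalid `start_within`: expected a tier or HHh-MMm-SSs.")

    hours, minutes, seconds = (int(group) for group in match.groups())
    total = hours * 3600 + minutes * 60 + seconds
    if not 5 <= total <= 600:
        raise ValueError("Invalid `start_within`: duration must be 5s to 10m.")
```

A body with an extra, unrecognized key passes this check unchanged, which mirrors the pass-through behavior described above.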

Errors

Non-2xx responses raise FlexInferenceError, carrying the OpenAI-shaped status, type, code, and param:

from flexinference import FlexInferenceError

try:
    client.responses.create({"model": "gpt-5.5", "input": "hi", "start_within": "priority"})
except FlexInferenceError as err:
    if err.code == "no_byok_key":
        print("Add your OpenAI key in the dashboard.")
    else:
        raise

Configuration

Argument   Default                                Description
api_key    (required)                             Your flex_live_ key.
base_url   https://api.flexinference.com/v1       Override the router endpoint.
client     httpx.Client (600s read, 10s connect)  Provide your own httpx.Client.

License

MIT

Download files

Source Distribution

flexinference-1.0.0.tar.gz (46.0 kB)

Uploaded Source

Built Distribution

flexinference-1.0.0-py3-none-any.whl (18.9 kB)

Uploaded Python 3

File details

Details for the file flexinference-1.0.0.tar.gz.

File metadata

  • Size: 46.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.15

File hashes

Hashes for flexinference-1.0.0.tar.gz

Algorithm    Hash digest
SHA256       9c0409214a39ce7ec910c30abd4cccdda6d8ef1e6bcf21be9943f0c895fc1d32
MD5          a84ad449dce7bfe2d70155497af5ddd1
BLAKE2b-256  02c9a21ee1509f204d507591bc9a4f13a0de919e3d9de741d017b079eb8f1761

File details

Details for the file flexinference-1.0.0-py3-none-any.whl.

File hashes

Hashes for flexinference-1.0.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       271611dc27934fb1024ea71e3c9a832f7f29e6fe50c07019b63836b11b5f6350
MD5          1a7e95e144522ea049f3137c5cdf9e2e
BLAKE2b-256  10745a0d116154130325f77f7147984afb6d1cf7243fb150ee4f3e5382720c82
