A FastAPI implementation of the Open Inference Protocol
aiSSEMBLE Open Inference Protocol™ FastAPI
The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. This Python package can be used to create FastAPI routes that are compatible with the Open Inference Protocol.
Installation
Add aissemble-open-inference-protocol-fastapi to an application
pip install aissemble-open-inference-protocol-fastapi
Usage
To create a FastAPI app with aissemble-open-inference-protocol-fastapi, create a file main.py containing:
from aissemble_open_inference_protocol_fastapi.aissemble_oip_fastapi import AissembleOIPFastAPI
fastapi_server = AissembleOIPFastAPI().server
The server now exposes a complete set of Open Inference Protocol compatible routes. Ensure the FastAPI CLI tools are installed (pip install "fastapi[standard]"), then run:
fastapi dev main.py
View the routes by going to http://127.0.0.1:8000/docs.
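For orientation, the REST binding of the Open Inference Protocol defines a fixed set of paths under `/v2`. The sketch below builds those URLs, assuming the generated routes follow the spec's paths unchanged; the `/docs` page remains the authoritative list for a running server. `oip_urls` and its parameters are hypothetical names used for illustration and are not part of this package.

```python
from typing import Dict, Optional


def oip_urls(
    base: str, model_name: str, model_version: Optional[str] = None
) -> Dict[str, str]:
    """Build the REST URLs defined by the Open Inference Protocol spec.

    Hypothetical helper: assumes the server exposes the spec's paths as-is.
    """
    base = base.rstrip("/")
    model = f"{base}/v2/models/{model_name}"
    if model_version:
        # The version segment is optional in the spec.
        model += f"/versions/{model_version}"
    return {
        "server_live": f"{base}/v2/health/live",    # GET
        "server_ready": f"{base}/v2/health/ready",  # GET
        "server_metadata": f"{base}/v2",            # GET
        "model_metadata": model,                    # GET
        "model_ready": f"{model}/ready",            # GET
        "infer": f"{model}/infer",                  # POST
    }


urls = oip_urls("http://127.0.0.1:8000", "my-model")
print(urls["infer"])
```

The base URL matches the address used by `fastapi dev` above; `my-model` is a placeholder model name.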
Implementing a Handler
By default, the endpoints call a handler that returns 501 Not Implemented. To provide your own handler, create a class that extends the abstract base class DataplaneHandler (defined in dataplane.py) and pass your class into the AissembleOIPFastAPI constructor.
Note: All incoming InferenceRequest and outgoing InferenceResponse objects will be automatically validated against their declared tensor shapes and datatypes. Any discrepancy will raise an error and abort the call.
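To make the idea of shape and datatype validation concrete, here is a minimal, standalone sketch of that kind of check. It is not the package's actual validator: `validate_tensor`, `flatten`, and the datatype table are hypothetical, cover only a subset of the protocol's datatypes, and do not handle variable-size dimensions (`-1`).

```python
from math import prod
from typing import Any, List, Sequence

# Python types accepted for a subset of Open Inference Protocol datatypes.
_ALLOWED_TYPES = {
    "BOOL": (bool,),
    "INT32": (int,),
    "INT64": (int,),
    "FP32": (int, float),
    "FP64": (int, float),
    "BYTES": (str, bytes),
}


def flatten(data: Any) -> List[Any]:
    """Flatten arbitrarily nested lists/tuples into one flat list."""
    if isinstance(data, (list, tuple)):
        flat: List[Any] = []
        for item in data:
            flat.extend(flatten(item))
        return flat
    return [data]


def validate_tensor(name: str, datatype: str, shape: Sequence[int], data: Any) -> None:
    """Raise ValueError if data does not match the declared shape and datatype."""
    allowed = _ALLOWED_TYPES.get(datatype)
    if allowed is None:
        raise ValueError(f"tensor '{name}': unsupported datatype {datatype}")

    values = flatten(data)
    expected = prod(shape)  # number of elements implied by the shape
    if len(values) != expected:
        raise ValueError(
            f"tensor '{name}': shape {list(shape)} implies {expected} "
            f"elements, got {len(values)}"
        )

    for value in values:
        # bool is a subclass of int, so reject it explicitly for numeric tensors.
        stray_bool = isinstance(value, bool) and datatype != "BOOL"
        if stray_bool or not isinstance(value, allowed):
            raise ValueError(
                f"tensor '{name}': value {value!r} is not valid for {datatype}"
            )


# A 2x2 FP32 tensor with four elements passes silently.
validate_tensor("input", "FP32", [2, 2], [[1.0, 2.0], [3.0, 4.0]])
```

A request whose data holds three elements for a declared shape of `[2, 2]` would raise `ValueError` in this sketch, which corresponds to the "raise an error and abort the call" behavior described above.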
Example of Usage with a Handler
Create your custom handler class with:
from typing import Optional

from aissemble_open_inference_protocol_shared.handlers.dataplane import (
    DataplaneHandler,
)
from aissemble_open_inference_protocol_shared.types.dataplane import (
    InferenceRequest,
    InferenceResponse,
    ModelMetadataResponse,
    MetadataTensor,
    ModelReadyResponse,
    Datatype,
)


class MyHandler(DataplaneHandler):
    def __init__(self):
        super().__init__()

    def infer(
        self,
        payload: InferenceRequest,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> InferenceResponse:
        return InferenceResponse(
            model_name=model_name, model_version=model_version, id="id", outputs=[]
        )

    def model_metadata(
        self,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> ModelMetadataResponse:
        # Return a stub ModelMetadataResponse
        return ModelMetadataResponse(
            name=model_name,
            versions=[model_version] if model_version else None,
            platform="python",
            inputs=[MetadataTensor(name="input", datatype=Datatype.FP32, shape=[1])],
            outputs=[
                MetadataTensor(name="output", datatype=Datatype.FP32, shape=[1])
            ],
        )

    def model_ready(
        self,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> ModelReadyResponse:
        # Testing: always ready
        return ModelReadyResponse(name=model_name, ready=True)
Then use aissemble-open-inference-protocol-fastapi to create a FastAPI app, passing MyHandler to the constructor:
from aissemble_open_inference_protocol_fastapi.aissemble_oip_fastapi import AissembleOIPFastAPI
fastapi_server = AissembleOIPFastAPI(MyHandler).server
Now, when the FastAPI server is running, inference requests are routed to MyHandler.infer().
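To exercise the handler from a client, a request body in the Open Inference Protocol's JSON format can be posted to the infer route. The sketch below uses only the standard library. `build_infer_request` and `post_infer` are hypothetical helpers, the URL assumes the spec's `/v2/models/{name}/infer` path, and sending the JWT as a bearer token is an assumption, since this document does not say how the token is supplied.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_infer_request(
    name: str,
    datatype: str,
    shape: List[int],
    data: List[Any],
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a single-input inference request body in the OIP JSON format."""
    body: Dict[str, Any] = {
        "inputs": [
            {"name": name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }
    if request_id is not None:
        body["id"] = request_id
    return body


def post_infer(
    url: str, body: Dict[str, Any], token: Optional[str] = None, timeout: float = 10.0
) -> Dict[str, Any]:
    """POST the request body and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the server accepts the JWT as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_infer_request("input", "FP32", [1], [0.5], request_id="req-1")
print(json.dumps(body, indent=2))
```

With the server running, `post_infer("http://127.0.0.1:8000/v2/models/my-model/infer", body)` would send the request; with the stub MyHandler above, the response would carry an empty outputs list.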
Configurations
Several configuration options affect the server. They can be set through a Krausening properties file named oip.properties or through environment variables.
| Configuration Name | Environment Variable | Default Value | Description |
|---|---|---|---|
| fastapi_host | FASTAPI_HOST | 127.0.0.1 | The host the fastapi server will run on |
| fastapi_port | FASTAPI_PORT | 8082 | The port the fastapi server will run on |
| fastapi_reload | FASTAPI_RELOAD | True | Whether Uvicorn should reload on changes |
| auth_enabled | AUTH_ENABLED | true | Whether authentication is enabled for the server. Strongly recommend enabling for higher environments |
| auth_secret | AUTH_SECRET | None | The secret key used to decode jwt token |
| auth_algorithm | AUTH_ALGORITHM | HS256 | The algorithm used to decode jwt tokens |
| pdp_url | OIP_PDP_URL | http://localhost:8080/pdp | The URL of the Policy Decision Point (PDP) used for authorization checks |
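As an illustration of how the environment variables in the table map onto typed settings, the sketch below resolves each value from a mapping and falls back to the documented default. It is not how the package loads its configuration (that goes through Krausening); `ServerSettings`, `settings_from_env`, and the boolean parsing rules are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    """Defaults copied from the configuration table above."""

    fastapi_host: str = "127.0.0.1"
    fastapi_port: int = 8082
    fastapi_reload: bool = True
    auth_enabled: bool = True
    auth_secret: Optional[str] = None
    auth_algorithm: str = "HS256"
    pdp_url: str = "http://localhost:8080/pdp"


def _as_bool(value: str) -> bool:
    # Assumption: common truthy spellings; the package's own parsing may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def settings_from_env(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Resolve settings from environment variables, using defaults when unset."""
    d = ServerSettings()
    return ServerSettings(
        fastapi_host=env.get("FASTAPI_HOST", d.fastapi_host),
        fastapi_port=int(env.get("FASTAPI_PORT", d.fastapi_port)),
        fastapi_reload=(
            _as_bool(env["FASTAPI_RELOAD"]) if "FASTAPI_RELOAD" in env else d.fastapi_reload
        ),
        auth_enabled=(
            _as_bool(env["AUTH_ENABLED"]) if "AUTH_ENABLED" in env else d.auth_enabled
        ),
        auth_secret=env.get("AUTH_SECRET", d.auth_secret),
        auth_algorithm=env.get("AUTH_ALGORITHM", d.auth_algorithm),
        pdp_url=env.get("OIP_PDP_URL", d.pdp_url),
    )


# Override two values, as a local development shell might.
local = settings_from_env({"FASTAPI_PORT": "9000", "AUTH_ENABLED": "false"})
print(local.fastapi_port, local.auth_enabled)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test; calling `settings_from_env()` with no argument reads the real environment.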
Examples
For working examples, refer to the Examples documentation.
File details
Details for the file aissemble_open_inference_protocol_fastapi-1.0.1.tar.gz.
File metadata
- Download URL: aissemble_open_inference_protocol_fastapi-1.0.1.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.4 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01c61dc299ee9feb26d964ad3bbbb901e320a2e1109bbafc60b2b0d7dcc0fab4 |
| MD5 | 8fc558cacfdde318e224eaeaa238d754 |
| BLAKE2b-256 | 0d4152e8b111f3fd50676d58e9eb0790fc1a85a9ec3f9517576e16ae4df8d4ec |
File details
Details for the file aissemble_open_inference_protocol_fastapi-1.0.1-py3-none-any.whl.
File metadata
- Download URL: aissemble_open_inference_protocol_fastapi-1.0.1-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.4 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 184b60cb2591a52693777d37edb754a817fe2ca34443bef42915391617773e30 |
| MD5 | 7e32bf1dce5928506a0ca4bf91ef7760 |
| BLAKE2b-256 | e6b8ca82b0466d4d273873b242c1f20f200fe3cb9499550ae6e93fc03737abc6 |