A FastAPI implementation of the Open Inference Protocol

aiSSEMBLE Open Inference Protocol™ FastAPI

The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. This Python package creates FastAPI routes that are compatible with the Open Inference Protocol.

Installation

Add aissemble-open-inference-protocol-fastapi to an application:

pip install aissemble-open-inference-protocol-fastapi

Usage

Use aissemble-open-inference-protocol-fastapi to create a FastAPI app by creating a file main.py with the following contents:

from aissemble_open_inference_protocol_fastapi.aissemble_oip_fastapi import AissembleOIPFastAPI

fastapi_server = AissembleOIPFastAPI().server

The server now exposes a complete set of Open Inference Protocol compatible routes. Ensure you have the FastAPI CLI tools installed (pip install "fastapi[standard]"), then run with:

fastapi dev main.py

View the routes by going to http://127.0.0.1:8000/docs.
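As a rough guide to what the docs page should list, the REST binding of the Open Inference Protocol defines the paths built below. The helper oip_paths is a hypothetical sketch, not part of this package, and it assumes the server mounts the routes at the root with no extra prefix, so treat the /docs page as authoritative.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: the default address used by `fastapi dev`.
BASE_URL = "http://127.0.0.1:8000"


def oip_paths(model_name: str, model_version: Optional[str] = None) -> dict:
    """Return the REST paths the Open Inference Protocol defines for one model."""
    model = f"/v2/models/{quote(model_name, safe='')}"
    if model_version is not None:
        # The version segment is optional in the protocol.
        model += f"/versions/{quote(model_version, safe='')}"
    return {
        "server_live": "/v2/health/live",    # GET
        "server_ready": "/v2/health/ready",  # GET
        "server_metadata": "/v2",            # GET
        "model_ready": f"{model}/ready",     # GET
        "model_metadata": model,             # GET
        "infer": f"{model}/infer",           # POST
    }


for name, path in oip_paths("my-model").items():
    print(f"{name}: {BASE_URL}{path}")
```

With the default handler, the model routes are expected to answer 501 Not Implemented until a handler is supplied.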

Implementing a Handler

By default, the endpoints call a handler that returns 501 Not Implemented. To provide your own behavior, create a class that extends the abstract base class DataplaneHandler (defined in dataplane.py), then pass your class into the AissembleOIPFastAPI constructor.

Note: All incoming InferenceRequest and outgoing InferenceResponse objects are automatically validated against their declared tensor shapes and datatypes. Any discrepancy raises an error and aborts the call.
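To make the idea of shape and datatype validation concrete, here is a minimal sketch of such a check in plain Python. It is not the package's implementation: check_tensor and flatten are hypothetical names, and the rules (element count must equal the product of the shape, values must fit the declared datatype family) are an assumption about what a check of this kind typically covers.

```python
from math import prod
from typing import Any, List, Sequence


def flatten(data: Any) -> List[Any]:
    """Flatten arbitrarily nested lists or tuples into one flat list."""
    if isinstance(data, (list, tuple)):
        flat: List[Any] = []
        for item in data:
            flat.extend(flatten(item))
        return flat
    return [data]


def check_tensor(shape: Sequence[int], datatype: str, data: Any) -> None:
    """Raise ValueError if data does not match the declared shape or datatype."""
    values = flatten(data)
    expected = prod(shape)
    if len(values) != expected:
        raise ValueError(
            f"shape {list(shape)} needs {expected} elements, got {len(values)}"
        )
    if datatype in ("FP16", "FP32", "FP64"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    elif datatype.startswith(("INT", "UINT")):
        ok = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
    elif datatype == "BOOL":
        ok = all(isinstance(v, bool) for v in values)
    else:
        ok = True  # other datatypes (for example BYTES) are not checked in this sketch
    if not ok:
        raise ValueError(f"data does not match datatype {datatype}")


check_tensor([2, 2], "FP32", [[1.0, 2.0], [3.0, 4.0]])  # passes silently
```

A request declaring shape [3] but carrying two values would fail this check, which is the kind of discrepancy the note describes.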

Example of Usage with a Handler

Create your custom handler class with:

from typing import Optional

from aissemble_open_inference_protocol_shared.handlers.dataplane import (
    DataplaneHandler,
)
from aissemble_open_inference_protocol_shared.types.dataplane import (
    InferenceRequest,
    InferenceResponse,
    ModelMetadataResponse,
    MetadataTensor,
    ModelReadyResponse,
    Datatype,
)


class MyHandler(DataplaneHandler):
    def __init__(self):
        super().__init__()

    def infer(
            self,
            payload: InferenceRequest,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> InferenceResponse:
        # Stub response: echoes the model name and version and returns no outputs
        return InferenceResponse(
            model_name=model_name, model_version=model_version, id="id", outputs=[]
        )

    def model_metadata(
            self,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> ModelMetadataResponse:
        # Return a stub ModelMetadataResponse
        return ModelMetadataResponse(
            name=model_name,
            versions=[model_version] if model_version else None,
            platform="python",
            inputs=[MetadataTensor(name="input", datatype=Datatype.FP32, shape=[1])],
            outputs=[
                MetadataTensor(name="output", datatype=Datatype.FP32, shape=[1])
            ],
        )

    def model_ready(
            self,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> ModelReadyResponse:
        # Testing: always ready
        return ModelReadyResponse(name=model_name, ready=True)

Use aissemble-open-inference-protocol-fastapi to create a FastAPI app and pass it MyHandler:

from aissemble_open_inference_protocol_fastapi.aissemble_oip_fastapi import AissembleOIPFastAPI

fastapi_server = AissembleOIPFastAPI(MyHandler).server

Now, when the FastAPI server is running, inference requests are routed to MyHandler.infer().
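To exercise the handler, a client sends a JSON body in the shape the Open Inference Protocol defines for inference requests. The sketch below only builds that body; build_infer_request is a hypothetical helper, and the tensor name "input" with datatype FP32 and shape [1] simply mirrors the stub metadata in MyHandler.

```python
import json
from typing import List, Sequence


def build_infer_request(
    name: str,
    shape: Sequence[int],
    datatype: str,
    data: List[float],
    request_id: str = "",
) -> dict:
    """Build an Open Inference Protocol inference request with one input tensor."""
    body: dict = {
        "inputs": [
            {
                "name": name,
                "shape": list(shape),
                "datatype": datatype,
                "data": data,
            }
        ]
    }
    if request_id:
        # The request id is optional in the protocol.
        body["id"] = request_id
    return body


body = build_infer_request("input", [1], "FP32", [0.5], request_id="example-1")
print(json.dumps(body, indent=2))
```

The body would be sent with POST to the model's infer route (/v2/models/<model name>/infer in the protocol). If authentication is enabled, the request presumably also needs a valid JWT; how the token is supplied is not covered here.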

Configurations

Several configuration options affect the server. They can be set in the Krausening properties file oip.properties or through environment variables.

| Configuration Name | Environment Variable | Default Value | Description |
|---|---|---|---|
| fastapi_host | FASTAPI_HOST | 127.0.0.1 | The host the FastAPI server will run on |
| fastapi_port | FASTAPI_PORT | 8082 | The port the FastAPI server will run on |
| fastapi_reload | FASTAPI_RELOAD | True | Whether Uvicorn should reload on changes |
| auth_enabled | AUTH_ENABLED | true | Whether authentication is enabled for the server. Enabling it is strongly recommended for higher environments |
| auth_secret | AUTH_SECRET | None | The secret key used to decode JWT tokens |
| auth_algorithm | AUTH_ALGORITHM | HS256 | The algorithm used to decode JWT tokens |
| pdp_url | OIP_PDP_URL | http://localhost:8080/pdp | The URL of the Policy Decision Point (PDP) used for authorization checks |
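The sketch below illustrates how environment variables could override the documented defaults. It is an illustration only: effective_settings is a hypothetical helper, it does not model the oip.properties file, and it leaves values as strings because how the package parses them is not stated here.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the configuration table above.
DEFAULTS = {
    "FASTAPI_HOST": "127.0.0.1",
    "FASTAPI_PORT": "8082",
    "FASTAPI_RELOAD": "True",
    "AUTH_ENABLED": "true",
    "AUTH_ALGORITHM": "HS256",
    "OIP_PDP_URL": "http://localhost:8080/pdp",
}


def effective_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Merge environment overrides over the documented defaults."""
    env = os.environ if env is None else env
    merged = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # AUTH_SECRET has no default, so it stays None unless it is set.
    merged["AUTH_SECRET"] = env.get("AUTH_SECRET")
    return merged


# Example: override the port and disable auth for local development only.
local = effective_settings({"FASTAPI_PORT": "9000", "AUTH_ENABLED": "false"})
print(local["FASTAPI_PORT"], local["AUTH_ENABLED"])
```

Passing a plain dictionary instead of os.environ keeps the example easy to test; calling effective_settings() with no argument reads the real environment.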

Examples

For working examples, refer to the Examples documentation.

Hashes for aissemble_open_inference_protocol_fastapi-1.0.1.tar.gz
Algorithm Hash digest
SHA256 01c61dc299ee9feb26d964ad3bbbb901e320a2e1109bbafc60b2b0d7dcc0fab4
MD5 8fc558cacfdde318e224eaeaa238d754
BLAKE2b-256 0d4152e8b111f3fd50676d58e9eb0790fc1a85a9ec3f9517576e16ae4df8d4ec

Hashes for aissemble_open_inference_protocol_fastapi-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 184b60cb2591a52693777d37edb754a817fe2ca34443bef42915391617773e30
MD5 7e32bf1dce5928506a0ca4bf91ef7760
BLAKE2b-256 e6b8ca82b0466d4d273873b242c1f20f200fe3cb9499550ae6e93fc03737abc6
