A FastAPI implementation of the Open Inference Protocol

aiSSEMBLE™ Open Inference Protocol FastAPI

The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. This Python package creates FastAPI routes that are compatible with the Open Inference Protocol.

Installation

Add aissemble-open-inference-protocol-fastapi to an application with pip:

pip install aissemble-open-inference-protocol-fastapi

Usage

Use aissemble-open-inference-protocol-fastapi to create a FastAPI app by adding a file named main.py with the following contents:

from aissemble_open_inference_protocol_fastapi.aissemble_oip_fastapi import AissembleOIPFastAPI

fastapi_server = AissembleOIPFastAPI().server

The server now has a complete set of Open Inference Protocol compatible routes. Ensure you have the FastAPI CLI tools installed (pip install "fastapi[standard]"), then run:

fastapi dev main.py

View the routes by going to http://127.0.0.1:8000/docs.
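Once the server is running, clients talk to it over the Open Inference Protocol's REST paths. As a rough sketch, the snippet below builds (but does not send) an inference request in the shape the OIP specification describes: an inputs list whose entries carry name, shape, datatype, and data. It assumes the routes follow the specification's /v2/models/{model_name}/infer path. The model name my-model, the tensor name input, and the helper build_infer_request are hypothetical. Authentication is enabled by default (see Configurations), so a real call may also need a token.

```python
import json
from urllib import request


def build_infer_request(base_url, model_name, values):
    """Build (but do not send) an OIP inference request for a 1-D FP32 tensor."""
    body = {
        "id": "example-1",
        "inputs": [
            {
                "name": "input",  # hypothetical tensor name
                "shape": [len(values)],
                "datatype": "FP32",
                "data": list(values),
            }
        ],
    }
    # Path taken from the OIP specification; assumed to match the generated routes.
    url = f"{base_url}/v2/models/{model_name}/infer"
    return request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request("http://127.0.0.1:8000", "my-model", [1.0, 2.0, 3.0])
print(req.full_url)

# Sending is left commented out because it needs a running server:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```

The interactive docs page lists the exact paths the server exposes, so it is worth checking the URL there before relying on this sketch.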

Implementing a Handler

By default, the endpoints call a handler that returns 501 Not Implemented. To provide your own behavior, create a class that extends the abstract base class ModelHandler (imported in the example below), then pass an instance of your class to the AissembleOIPFastAPI constructor.

Note: All incoming InferenceRequest and outgoing InferenceResponse objects will be automatically validated against their declared tensor shapes and datatypes. Any discrepancy will raise an error and abort the call.
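To make the note above concrete, here is a minimal sketch of the kind of consistency check it describes. This is not the library's validation code. The function name check_tensor, the use of ValueError, and the small subset of datatypes handled are assumptions made for illustration. The check confirms that a flat data list has as many elements as the declared shape implies and that each element matches the declared datatype.

```python
from math import prod

# Hypothetical mapping from a few OIP datatype names to acceptable Python types.
PYTHON_TYPES = {
    "BOOL": (bool,),
    "INT32": (int,),
    "INT64": (int,),
    "FP32": (int, float),
    "FP64": (int, float),
    "BYTES": (str, bytes),
}


def check_tensor(name, datatype, shape, data):
    """Raise ValueError if flat `data` disagrees with `shape` or `datatype`."""
    if datatype not in PYTHON_TYPES:
        raise ValueError(f"{name}: unsupported datatype {datatype!r}")
    expected = prod(shape)  # number of elements implied by the shape
    if len(data) != expected:
        raise ValueError(
            f"{name}: shape {shape} implies {expected} elements, got {len(data)}"
        )
    allowed = PYTHON_TYPES[datatype]
    for value in data:
        # bool is a subclass of int in Python, so reject it for non-BOOL tensors.
        if isinstance(value, bool) and datatype != "BOOL":
            raise ValueError(f"{name}: {value!r} is not a valid {datatype}")
        if not isinstance(value, allowed):
            raise ValueError(f"{name}: {value!r} is not a valid {datatype}")


# A 2x2 FP32 tensor with four values passes silently.
check_tensor("input", "FP32", [2, 2], [1.0, 2.0, 3.0, 4.0])
```

A request that declares shape [2, 2] but sends only three values would fail this check, which is the sort of discrepancy that aborts the call.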

Example of Usage with a Handler

Create your custom handler class with:

from typing import Optional

from aissemble_open_inference_protocol_shared.handlers.model_handler import (
    ModelHandler,
)
from aissemble_open_inference_protocol_shared.types.dataplane import (
    InferenceRequest,
    InferenceResponse,
    ModelMetadataResponse,
    MetadataTensor,
    Datatype,
)


class MyHandler(ModelHandler):
    def __init__(self):
        super().__init__()

    def infer(
            self,
            payload: InferenceRequest,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> InferenceResponse:
        return InferenceResponse(
            model_name=model_name, model_version=model_version, id="id", outputs=[]
        )

    def model_metadata(
            self,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> ModelMetadataResponse:
        # Return a stub ModelMetadataResponse
        return ModelMetadataResponse(
            name=model_name,
            versions=[model_version] if model_version else None,
            platform="python",
            inputs=[MetadataTensor(name="input", datatype=Datatype.FP32, shape=[1])],
            outputs=[
                MetadataTensor(name="output", datatype=Datatype.FP32, shape=[1])
            ],
        )

    def model_load(self, model_name: str) -> bool:
        # Do some model loading
        return True

Then use aissemble-open-inference-protocol-fastapi to create a FastAPI app and pass it an instance of MyHandler:

from aissemble_open_inference_protocol_fastapi.aissemble_oip_fastapi import AissembleOIPFastAPI

# MyHandler is the class defined above; define it in this file or import it here.
fastapi_server = AissembleOIPFastAPI(MyHandler()).server

Now, when the FastAPI server is running, inference requests are routed to MyHandler.infer().

Configurations

Several configuration options affect the server. They can be set through a Krausening properties file named oip.properties or through environment variables.

| Configuration Name | Environment Variable | Default Value | Description |
|---|---|---|---|
| fastapi_host | FASTAPI_HOST | 127.0.0.1 | The host the fastapi server will run on |
| fastapi_port | FASTAPI_PORT | 8082 | The port the fastapi server will run on |
| fastapi_reload | FASTAPI_RELOAD | True | Whether Uvicorn should reload on changes |
| auth_enabled | AUTH_ENABLED | true | Whether authentication is enabled for the server. Strongly recommend enabling for higher environments |
| auth_secret | AUTH_SECRET | None | The secret key used to decode jwt token |
| auth_algorithm | AUTH_ALGORITHM | HS256 | The algorithm used to decode jwt tokens |
| pdp_url | OIP_PDP_URL | http://localhost:8080/pdp | The URL of the Policy Decision Point (PDP) used for authorization checks |
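The idea of an environment variable overriding a default can be sketched in a few lines of plain Python. This only illustrates the override behavior and is not how the library loads its configuration (that goes through Krausening). The function name resolve_settings is hypothetical, and the defaults are copied from the table above as strings.

```python
import os

# Defaults copied from the table above, keyed by environment variable name.
DEFAULTS = {
    "FASTAPI_HOST": "127.0.0.1",
    "FASTAPI_PORT": "8082",
    "FASTAPI_RELOAD": "True",
    "AUTH_ENABLED": "true",
    "AUTH_SECRET": None,
    "AUTH_ALGORITHM": "HS256",
    "OIP_PDP_URL": "http://localhost:8080/pdp",
}


def resolve_settings(environ=None):
    """Return each setting's value, preferring the environment over the default."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


# Example: override the port and turn authentication off for local development.
settings = resolve_settings({"FASTAPI_PORT": "9000", "AUTH_ENABLED": "false"})
print(settings["FASTAPI_PORT"], settings["AUTH_ENABLED"], settings["FASTAPI_HOST"])
```

In practice you would export the variables in the shell or deployment environment before starting the server, and leave AUTH_ENABLED on outside of local development.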

Examples

For working examples, refer to the Examples documentation.
