
# aiSSEMBLE™ Open Inference Protocol gRPC


The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. This Python application can be leveraged to create a gRPC server that is compatible with the Open Inference Protocol. It handles standing up and tearing down the server, so you can focus on your inference logic.

## Installation

Add `aissemble-open-inference-protocol-grpc` to an application:

```shell
pip install aissemble-open-inference-protocol-grpc
```

## Usage

### Creating the Server

Use `aissemble-open-inference-protocol-grpc` to create a gRPC server by creating a file `main.py` with:

```python
import asyncio
from aissemble_open_inference_protocol_grpc.aissemble_oip_grpc import AissembleOIPgRPC

grpc = AissembleOIPgRPC()

if __name__ == '__main__':
    asyncio.run(grpc.start_server())
```

The gRPC server will come up after a few seconds and will be OIP-compliant. The proto specifications can be found in the `grpc_inference_service.proto` file.

### Implementing the Endpoints Handler

By default, most of the gRPC endpoints return a `Method Not Implemented` error. You can implement these endpoints by creating a custom handler extending `DataplaneHandler`.

> [!NOTE]
> All incoming `InferenceRequest` and outgoing `InferenceResponse` objects will be automatically validated against their declared tensor shapes and datatypes. Any discrepancy will raise an error and abort the call.
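As a hypothetical sketch of this kind of check (the library's actual validation code is not shown here), shape validation amounts to verifying that a tensor's flat data length matches the product of its declared shape:

```python
import math


def validate_tensor_shape(name: str, shape: list[int], data: list) -> None:
    # Illustrative only: a tensor's flat data length must equal the
    # product of its declared shape, or the request/response is invalid.
    expected = math.prod(shape)
    if len(data) != expected:
        raise ValueError(
            f"tensor '{name}': declared shape {shape} implies {expected} "
            f"elements, got {len(data)}"
        )


validate_tensor_shape("input", [2, 3], [0.0] * 6)  # passes silently
```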

Example:

```python
from typing import Optional

from aissemble_open_inference_protocol_shared.handlers.model_handler import (
    ModelHandler,
)
from aissemble_open_inference_protocol_shared.types.dataplane import (
    Datatype,
    InferenceRequest,
    InferenceResponse,
    ModelMetadataResponse,
    MetadataTensor,
)


class MyHandler(ModelHandler):
    def __init__(self):
        super().__init__()

    def infer(
            self,
            payload: InferenceRequest,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> InferenceResponse:
        return InferenceResponse(
            model_name=model_name, model_version=model_version, id="id", outputs=[]
        )

    def model_metadata(
            self,
            model_name: str,
            model_version: Optional[str] = None,
    ) -> ModelMetadataResponse:
        return ModelMetadataResponse(
            name=model_name,
            versions=[model_version] if model_version else None,
            platform="python",
            inputs=[MetadataTensor(name="input", datatype=Datatype.FP32, shape=[1])],
            outputs=[
                MetadataTensor(name="output", datatype=Datatype.FP32, shape=[1])
            ],
        )

    def model_load(self, model_name: str) -> bool:
        # Do some model loading
        return True
```

Use `aissemble-open-inference-protocol-grpc` to create a gRPC server and pass it an instance of `MyHandler`:

```python
from aissemble_open_inference_protocol_grpc.aissemble_oip_grpc import AissembleOIPgRPC

grpc = AissembleOIPgRPC(MyHandler())
```

Now when starting the server, the inference requests will route to the handler.

## Configuration

Several configuration options affect the server. These can be set via a Krausening properties file (`oip.properties`) or environment variables.
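For example, an `oip.properties` file might contain the following (the values shown are illustrative, not recommendations):

```properties
# oip.properties — illustrative values
grpc_host=0.0.0.0
grpc_port=8081
grpc_workers=3
auth_enabled=true
auth_secret=change-me
auth_algorithm=HS256
pdp_url=http://localhost:8080/pdp
```

The same settings can instead be supplied through the environment variables listed below (e.g., `export GRPC_PORT=8081`).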

| Configuration Name | Environment Variable | Default Value | Description |
| --- | --- | --- | --- |
| grpc_host | GRPC_HOST | 0.0.0.0 | The host the gRPC server will start on |
| grpc_port | GRPC_PORT | 8081 | The port the gRPC server will start on |
| grpc_workers | GRPC_WORKERS | 3 | Number of workers used by the server to execute non-AsyncIO RPC handlers |
| auth_enabled | AUTH_ENABLED | true | Whether authentication is enabled for the server; strongly recommended for higher environments |
| auth_secret | AUTH_SECRET | None | The secret key used to decode JWT tokens |
| auth_algorithm | AUTH_ALGORITHM | HS256 | The algorithm used to decode JWT tokens |
| pdp_url | OIP_PDP_URL | http://localhost:8080/pdp | The URL of the Policy Decision Point (PDP) used for authorization checks |
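When `auth_enabled` is true, clients must present a JWT that the server can decode with the configured `auth_secret` and `auth_algorithm`. As a stdlib-only illustration of what minting an HS256 token involves (in practice a library such as PyJWT would typically be used, and the exact claims the server expects are not specified here):

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_hs256_jwt(claims: dict, secret: str) -> str:
    # Sign header.payload with HMAC-SHA256 using the shared secret
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


# Hypothetical claims; the required claim set depends on your deployment
token = make_hs256_jwt({"sub": "client", "exp": int(time.time()) + 3600}, "change-me")
print(token)
```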

## Examples

For working examples, refer to the Examples documentation.
