# aiSSEMBLE™ Open Inference Protocol gRPC

A gRPC implementation of the Open Inference Protocol.
The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. This Python package can be used to create a gRPC server that is compatible with the Open Inference Protocol. It handles standing up and tearing down the server, so you only need to worry about the inference logic.
## Installation

Add `aissemble-open-inference-protocol-grpc` to an application:

```shell
pip install aissemble-open-inference-protocol-grpc
```
## Usage

### Creating the Server

Use `aissemble-open-inference-protocol-grpc` to create a gRPC server by creating a file `main.py` with:

```python
import asyncio

from aissemble_open_inference_protocol_grpc.aissemble_oip_grpc import AissembleOIPgRPC

grpc = AissembleOIPgRPC()

if __name__ == "__main__":
    asyncio.run(grpc.start_server())
```
The gRPC server will come up after a few seconds and will be OIP-compliant. The proto specifications can be found in the `grpc_inference_service.proto` file.
### Implementing the Endpoints Handler

By default, most of the gRPC endpoints will return a Method Not Implemented error. You can implement these functions by creating a custom handler extending `DataplaneHandler`.

> [!NOTE]
> All incoming `InferenceRequest` and outgoing `InferenceResponse` objects will be automatically validated against their declared tensor shapes and datatypes. Any discrepancy will raise an error and abort the call.
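To make that validation concrete, here is a plain-Python sketch of what a shape/datatype check amounts to. This is illustrative only: the names `DECLARED` and `validate_tensor` are made up for the example, and the library's actual validation internals and error types may differ.

```python
import math

# Hypothetical declared metadata for one tensor: name, datatype, and shape.
DECLARED = {"name": "input", "datatype": "FP32", "shape": [1, 3]}


def validate_tensor(declared: dict, data: list, datatype: str, shape: list) -> None:
    """Reject a tensor whose datatype or shape disagrees with its declaration."""
    if datatype != declared["datatype"]:
        raise ValueError(f"datatype {datatype} != declared {declared['datatype']}")
    if shape != declared["shape"]:
        raise ValueError(f"shape {shape} != declared {declared['shape']}")
    # The flattened data length must equal the product of the shape dimensions.
    if len(data) != math.prod(shape):
        raise ValueError(f"{len(data)} elements do not fill shape {shape}")


validate_tensor(DECLARED, [0.1, 0.2, 0.3], "FP32", [1, 3])  # passes silently
try:
    validate_tensor(DECLARED, [0.1, 0.2], "FP32", [1, 3])   # too few elements
except ValueError as error:
    print(f"rejected: {error}")
```

In the real server, a failure like the one above aborts the gRPC call rather than printing.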
Example:

```python
from typing import Optional

from aissemble_open_inference_protocol_shared.handlers.model_handler import (
    ModelHandler,
)
from aissemble_open_inference_protocol_shared.types.dataplane import (
    Datatype,
    InferenceRequest,
    InferenceResponse,
    ModelMetadataResponse,
    MetadataTensor,
)


class MyHandler(ModelHandler):
    def __init__(self):
        super().__init__()

    def infer(
        self,
        payload: InferenceRequest,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> InferenceResponse:
        return InferenceResponse(
            model_name=model_name, model_version=model_version, id="id", outputs=[]
        )

    def model_metadata(
        self,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> ModelMetadataResponse:
        return ModelMetadataResponse(
            name=model_name,
            versions=[model_version] if model_version else None,
            platform="python",
            inputs=[MetadataTensor(name="input", datatype=Datatype.FP32, shape=[1])],
            outputs=[
                MetadataTensor(name="output", datatype=Datatype.FP32, shape=[1])
            ],
        )

    def model_load(self, model_name: str) -> bool:
        # Do some model loading
        return True
```
Use `aissemble-open-inference-protocol-grpc` to create a gRPC server and pass it `MyHandler`:

```python
from aissemble_open_inference_protocol_grpc.aissemble_oip_grpc import AissembleOIPgRPC

grpc = AissembleOIPgRPC(MyHandler())
```
Now when starting the server, the inference requests will route to the handler.
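Conceptually, this routing is ordinary method overriding: endpoints you implement are invoked, and anything left to the base class surfaces as a Method Not Implemented error. A minimal sketch of the pattern, using stand-in classes rather than the real `DataplaneHandler`:

```python
class BaseHandler:
    # Every endpoint defaults to "not implemented"; in the real server the
    # gRPC layer translates this into a Method Not Implemented status.
    def infer(self, payload, model_name, model_version=None):
        raise NotImplementedError("infer")

    def model_metadata(self, model_name, model_version=None):
        raise NotImplementedError("model_metadata")


class ExampleHandler(BaseHandler):
    # Only infer is overridden; model_metadata still falls through to the base.
    def infer(self, payload, model_name, model_version=None):
        return {"model_name": model_name, "outputs": []}


handler = ExampleHandler()
print(handler.infer(None, "my-model"))      # routed to the override
try:
    handler.model_metadata("my-model")      # not overridden
except NotImplementedError as error:
    print(f"unimplemented endpoint: {error}")
```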
## Configuration

There are several configurations available that affect the server. These can be set via a Krausening properties file, `oip.properties`, or via environment variables.
| Configuration Name | Environment Variable | Default Value | Description |
|---|---|---|---|
| `grpc_host` | `GRPC_HOST` | `0.0.0.0` | The host the gRPC server will start on |
| `grpc_port` | `GRPC_PORT` | `8081` | The port the gRPC server will start on |
| `grpc_workers` | `GRPC_WORKERS` | `3` | Number of workers used by the server to execute non-AsyncIO RPC handlers |
| `auth_enabled` | `AUTH_ENABLED` | `true` | Whether authentication is enabled for the server. Strongly recommended for higher environments |
| `auth_secret` | `AUTH_SECRET` | `None` | The secret key used to decode JWT tokens |
| `auth_algorithm` | `AUTH_ALGORITHM` | `HS256` | The algorithm used to decode JWT tokens |
| `pdp_url` | `OIP_PDP_URL` | `http://localhost:8080/pdp` | The URL of the Policy Decision Point (PDP) used for authorization checks |
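As a concrete example, an `oip.properties` file overriding a few of the defaults above might look like the following (the values shown are illustrative, not recommendations):

```properties
grpc_host=0.0.0.0
grpc_port=9090
grpc_workers=5
auth_enabled=true
auth_algorithm=HS256
pdp_url=http://localhost:8080/pdp
```

The same settings can instead be supplied through the corresponding environment variables from the table (e.g. `GRPC_PORT=9090`).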
## Examples
For working examples, refer to the Examples documentation.
## File details

### aissemble_open_inference_protocol_grpc-1.1.0.tar.gz

- Download URL: aissemble_open_inference_protocol_grpc-1.1.0.tar.gz
- Upload date:
- Size: 19.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.4 Linux/6.11.0-1018-azure

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1f72fe001b3ef830671b6e259aa59e08fd109364b5df4f9d8b7104366f0dd2ad` |
| MD5 | `1c422c52b11537aad9a112847237efef` |
| BLAKE2b-256 | `68abda93175a38421858a075113db4b19f269661705f39fa0b1c68ce802fed55` |
### aissemble_open_inference_protocol_grpc-1.1.0-py3-none-any.whl

- Download URL: aissemble_open_inference_protocol_grpc-1.1.0-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.4 Linux/6.11.0-1018-azure

| Algorithm | Hash digest |
|---|---|
| SHA256 | `76ef578a9b847cf6997f36887fbed6cc9dd07cf7eac2676d1bde4bbb878e027a` |
| MD5 | `455b54f967fbec2b7e2d1520632eee7b` |
| BLAKE2b-256 | `540eb21369897340ad5427829241e8e17b4bc0359d149590a1120923150995c5` |