A KServe implementation of the Open Inference Protocol
aiSSEMBLE™ Open Inference Protocol KServe
The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. This Python application can be leveraged to deploy KServe model servers that are compatible with the Open Inference Protocol.
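For reference, OIP inference requests and responses follow the v2 dataplane JSON structure. The sketch below builds a minimal request and response body in plain Python; the tensor names, shapes, and model name are illustrative, not part of this package:

```python
import json

# Minimal OIP v2 inference request for a hypothetical model "my_model".
# Each input tensor declares a name, shape, datatype, and flat data array.
request = {
    "id": "request-0",
    "inputs": [
        {
            "name": "input",
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        }
    ],
}

# A server implementing the protocol responds with matching output tensors.
response = {
    "model_name": "my_model",
    "id": "request-0",
    "outputs": [
        {"name": "output", "shape": [1], "datatype": "FP32", "data": [0.6]},
    ],
}

print(json.dumps(request, indent=2))
```

A client would POST a body like `request` to `/v2/models/my_model/infer` on the serving runtime.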
Installation
Add aissemble-open-inference-protocol-kserve to an application
```shell
pip install aissemble-open-inference-protocol-kserve
```
Usage
Prerequisite
Before standing up KServe with the aiSSEMBLE Open Inference Protocol, make sure all of the infrastructure and environment required by KServe is set up by following the official KServe documentation. Once the KServe environment is ready, you can implement a custom handler for KServe using the aiSSEMBLE Open Inference Protocol.
Implementing a Handler
To create a custom handler that integrates with KServe, create a class that extends `ModelHandler` (as in the example below), then implement the methods your model needs, such as `model_load` and `infer`.
Example of Usage with a Handler
Create your custom handler class with:
```python
from typing import Optional

from aissemble_open_inference_protocol_shared.handlers.model_handler import (
    ModelHandler,
)
from aissemble_open_inference_protocol_shared.types.dataplane import (
    InferenceRequest,
    InferenceResponse,
    ModelMetadataResponse,
    MetadataTensor,
    Datatype,
)


class MyHandler(ModelHandler):
    def __init__(self):
        super().__init__()

    def infer(
        self,
        payload: InferenceRequest,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> InferenceResponse:
        return InferenceResponse(
            model_name=model_name, model_version=model_version, id="id", outputs=[]
        )

    def model_metadata(
        self,
        model_name: str,
        model_version: Optional[str] = None,
    ) -> ModelMetadataResponse:
        # Return a stub ModelMetadataResponse
        return ModelMetadataResponse(
            name=model_name,
            versions=[model_version] if model_version else None,
            platform="python",
            inputs=[MetadataTensor(name="input", datatype=Datatype.FP32, shape=[1])],
            outputs=[
                MetadataTensor(name="output", datatype=Datatype.FP32, shape=[1])
            ],
        )

    def model_load(self, model_name: str) -> bool:
        # Do some model loading
        return True
```
You can now use this handler to create the `AissembleOIPKServe` class, which is loaded into the KServe inference server.
Example KServe inference server
```python
from aissemble_open_inference_protocol_kserve.aissemble_oip_kserve import (
    AissembleOIPKServe,
)

if __name__ == "__main__":
    model_name = "my_model"
    oip_kserve = AissembleOIPKServe(name=model_name, model_handler=MyHandler())
    # load() should be called before starting the server,
    # which will call the handler's model_load() to ensure the model is loaded
    oip_kserve.load()
    oip_kserve.start_server()
```
You are now ready to containerize the app and deploy it via the KServe Kubernetes resources.
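Once the image is built, it can be referenced from an InferenceService resource. A minimal sketch is shown below; the image name, service name, and argument values are placeholders, not part of this package:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model  # placeholder service name
spec:
  predictor:
    containers:
      - name: kserve-container
        image: my-registry/my-oip-model:latest  # placeholder image
        args:
          - --http_port=8080
          - --enable_grpc=True
```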
Configurations
There are several configurations available that affect the server. These can be set via container arguments (passed through the InferenceService YAML `args` field), environment variables, or the Krausening properties file `oip.properties`.
| Configuration Name | Container Argument | Environment Variable | Default Value | Description |
|---|---|---|---|---|
| `kserve_http_port` | `--http_port` | `KSERVE_HTTP_PORT` | `8080` | The HTTP port the model server listens on |
| `kserve_grpc_port` | `--grpc_port` | `KSERVE_GRPC_PORT` | `8081` | The gRPC port the model server listens on |
| `kserve_workers` | `--workers` | `KSERVE_WORKERS` | `1` | The number of uvicorn workers for multi-processing |
| `kserve_max_threads` | `--max_threads` | `KSERVE_MAX_THREADS` | `4` | The maximum number of gRPC processing threads |
| `kserve_max_asyncio_workers` | `--max_asyncio_workers` | `KSERVE_MAX_ASYNCIO_WORKERS` | `None` | The maximum number of asyncio workers to spawn |
| `kserve_enable_grpc` | `--enable_grpc` | `KSERVE_ENABLE_GRPC` | `True` | Enable gRPC for the model server |
| `kserve_enable_docs_url` | `--enable_docs_url` | `KSERVE_ENABLE_DOCS_URL` | `False` | Enable the `/docs` URL to display the Swagger UI |
| `kserve_enable_latency_logging` | `--enable_latency_logging` | `KSERVE_ENABLE_LATENCY_LOGGING` | `True` | Log a line per request with preprocess/predict/postprocess latency metrics |
| `kserve_access_log_format` | `--access_log_format` | `KSERVE_ACCESS_LOG_FORMAT` | `None` | The ASGI access log format; overrides only the `uvicorn.access` format configuration with a richer set of fields |
Configuration Precedence
Configuration values are resolved in the following order of precedence (highest to lowest):
1. Container arguments (e.g., `--http_port=9000` passed via the InferenceService YAML `args` field)
2. Environment variables (e.g., `KSERVE_HTTP_PORT=9000`)
3. Krausening properties (e.g., `kserve_http_port=9000` in `oip.properties`)
4. Default values (as shown in the table above)
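The precedence above can be sketched as a simple resolution function. This is a minimal illustration, not the library's actual implementation; `cli_args` and `properties` stand in for the parsed container arguments and `oip.properties` entries:

```python
import os


def resolve_config(name: str, cli_args: dict, properties: dict, default=None):
    """Resolve a value: container args > env vars > properties > default."""
    arg_key = "--" + name.removeprefix("kserve_")  # kserve_http_port -> --http_port
    if arg_key in cli_args:
        return cli_args[arg_key]
    env_key = name.upper()  # kserve_http_port -> KSERVE_HTTP_PORT
    if env_key in os.environ:
        return os.environ[env_key]
    if name in properties:
        return properties[name]
    return default


# A container argument wins over a properties entry and the default:
value = resolve_config(
    "kserve_http_port",
    cli_args={"--http_port": "9000"},
    properties={"kserve_http_port": "7000"},
    default="8080",
)
print(value)  # 9000
```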
Additional configuration options may be available via container arguments or environment variables. See the KServe documentation for more details.
Examples
For working examples, refer to the Examples documentation.