OpenVINO runtime for MLServer
Overview
This package provides an MLServer runtime for OpenVINO models. It offers the following features:
- Automatic conversion: if the server detects a model file in ONNX format, it converts the model to the OpenVINO format (xml, bin) with a dynamic batch size.
- OpenVINO dynamic batch size
- gRPC ready
- V2 Inference Protocol
- Model metrics
Why MLServer?
I chose MLServer for serving OpenVINO models because the framework provides the V2 Inference Protocol (https://kserve.github.io/website/modelserving/inference_api/), gRPC, and metrics out of the box.
Install
pip install mlserver mlserver-openvino
Content Types
If no content type is present on the request or in the metadata, the OpenVINO runtime will try to decode the payload as a NumPy array. To avoid this, either send a different content type explicitly, or define the correct one as part of your model’s metadata.
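As a rough illustration of sending a content type explicitly, the sketch below builds a V2 inference request body whose input carries a `content_type` parameter. The input name `input-0`, the shape, and the zero-filled data are placeholders, and `np` is MLServer's NumPy content type; substitute whatever your model actually expects.

```python
import json


def build_v2_payload(name, shape, data, datatype="FP32", content_type="np"):
    """Build a V2 inference request body with an explicit content type."""
    # The flattened data must contain exactly prod(shape) values.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"expected {expected} values, got {len(data)}")
    return {
        "inputs": [
            {
                "name": name,
                "shape": list(shape),
                "datatype": datatype,
                # Explicit content type, so the runtime does not have to guess.
                "parameters": {"content_type": content_type},
                "data": list(data),
            }
        ]
    }


# Placeholder input: a batch of one 28x28x1 image filled with zeros.
payload = build_v2_payload("input-0", [1, 28, 28, 1], [0.0] * 784)
body = json.dumps(payload)
```

The shape check is only a convenience for catching mismatched payloads before they reach the server.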
Models repository
Add your models to the models folder. Accepted files: ["model.xml", "model.onnx"]
/example
/models/your-model-name/
/tests
setup.py
README.md
Training and serving example: https://mlserver.readthedocs.io/en/latest/examples/sklearn/README.html
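Before starting the server, it can help to check that each model folder contains one of the accepted files. The helper below is a hypothetical pre-flight check, not part of this package, and it does not claim to mirror how the runtime itself discovers models. It assumes the layout shown above, with one sub-folder per model.

```python
from pathlib import Path

# Accepted file names, as listed in this README.
ACCEPTED_FILES = ("model.xml", "model.onnx")


def find_models(models_dir):
    """Map each model folder name to the accepted model files it contains."""
    found = {}
    for folder in sorted(Path(models_dir).iterdir()):
        if not folder.is_dir():
            continue
        files = [name for name in ACCEPTED_FILES if (folder / name).is_file()]
        if files:
            found[folder.name] = files
    return found
```

Calling `find_models("models")` returns a dictionary keyed by folder name; folders without an accepted file are left out.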
Metrics
To download metrics (Prometheus format), use the links below:
GET http://<your-endpoint>/metrics
GET http://0.0.0.0:8080/metrics
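The endpoint returns plain text in the Prometheus exposition format. A minimal sketch for reading it from Python with only the standard library follows. The parser is deliberately simplified: it assumes no trailing timestamps and no spaces inside label values, and the default URL is the local address shown above.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


def fetch_metrics(url="http://0.0.0.0:8080/metrics", timeout=5.0):
    """Download the metrics page and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))
```

In practice a Prometheus server would scrape this endpoint directly; the helper is only meant for quick manual inspection.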
Start docker server
# Build docker image
mlserver build . -t test
# Start the server and pass MLSERVER_MODELS_DIR
docker run -it --rm -e MLSERVER_MODELS_DIR=/opt/mlserver/models/ -p 8080:8080 -p 8081:8081 test
Example queries:
For example scripts, see the files below:
/example/grpc-example.py
/example/rest-example.py
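The example scripts are not reproduced here. As a hedged sketch of what a REST query can look like, the code below posts a V2 request to the standard V2 inference path `/v2/models/<model-name>/infer`. The base URL (port 8080, as published in the docker command above), the model name, and the payload contents are placeholders.

```python
import json
from urllib.request import Request, urlopen


def build_infer_request(base_url, model_name, payload):
    """Create a POST request for the V2 REST inference endpoint."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(base_url, model_name, payload, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    request = build_infer_request(base_url, model_name, payload)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running server, `infer("http://localhost:8080", "your-model-name", payload)` returns the response body as a dictionary; the predictions are listed under its `outputs` key.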
KServe usage
- First, create the KServe runtime (a one-time step) from the file kserve/cluster-runtime.yaml
- Then create an InferenceService from this template:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "my-openvino-model"
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      runtime: kserve-mlserver-openvino
      #storageUri: "gs://kfserving-examples/models/xgboost/iris"
      storageUri: https://github.com/myrepo/models/mymodel.joblib?raw=true
Example model-settings.json
{
  "name": "mnist-onnx-openvino",
  "implementation": "mlserver_openvino.OpenvinoRuntime",
  "parameters": {
    "uri": "./model.onnx",
    "version": "v0.1.0",
    "extra": {
      "transform": [
        {
          "name": "Prepare Metadata",
          "pipeline_file_path": "./pipeline.cloudpickle",
          "input_index": 0
        }
      ]
    }
  },
  "inputs": [
    {
      "name": "input-0",
      "datatype": "FP32",
      "shape": [28, 28, 1]
    }
  ],
  "outputs": [
    {
      "name": "output",
      "datatype": "FP32",
      "shape": [10]
    }
  ]
}
Transformers
If you add a transformer pipeline in the extra properties, serialize (dump) it with the same Python version that MLServer runs on. A pipeline dumped under a different Python version may fail to load.
Tests
make test
Hashes for mlserver_openvino-0.4.10-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b386e51db0ae13517693d6ee686e8277e8e468cc2db531817749040f552819a5
MD5 | 1860effb37ce0f4109dfcf9da73a63ff
BLAKE2b-256 | c642eb11b5fff61ffcb5c4c922caed6314d0d2f1d4c4063d086671bd733fbae4