Tritonserver MLflow Deployment
Project description
MLflow Tritonserver
An MLflow plugin for deploying your models from MLflow to Triton Inference Server. Scripts are included for publishing models that follow the Triton model repository structure to your MLflow Model Registry.
Supported flavors
MLflow Tritonserver currently supports the following flavors; you may substitute the flavor specification in the examples below according to the model to be deployed.
- onnx
- triton
Requirements
- MLflow
- Triton Python HTTP client
- Triton Inference Server
Installation
The plugin can be installed with the following command:
pip install mlflow_tritonserver
Quick Start
In this documentation, we will use the files in the examples directory to showcase how the plugin interacts with Triton Inference Server. The onnx_float32_int32_int32 model in examples is a simple model that takes two float32 inputs, INPUT0 and INPUT1, with shape [-1, 16], and produces two int32 outputs, OUTPUT0 and OUTPUT1, where OUTPUT0 is the element-wise summation of INPUT0 and INPUT1 and OUTPUT1 is the element-wise subtraction of INPUT0 and INPUT1.
Start Triton Inference Server in EXPLICIT mode
The MLflow Triton plugin must work with a running Triton server; see the Triton Inference Server documentation for how to start the server. Note that the server should be run in EXPLICIT mode (--model-control-mode=explicit) to exploit the deployment feature of the plugin.
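For example, a common way to start the server with Docker might look like the following sketch (the image tag, port mappings, and local model repository path are placeholders to adjust for your setup):
# Start Triton in EXPLICIT model control mode so the plugin can load/unload models
docker run --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models --model-control-mode=explicit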
Once the server has started, the following environment variables must be set so that the plugin can interact with the server properly:
- TRITON_URL: The address of the Triton HTTP endpoint
- TRITON_MODEL_REPO: The path to the Triton model repository. It can be an S3 URI, but keep in mind that the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are needed.
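For instance, assuming the server above runs locally with a local model repository, the variables could be set as follows (values are illustrative only):
# Triton HTTP endpoint (include the http:// scheme if your setup expects it)
export TRITON_URL=localhost:8000
# Local model repository path; an s3:// URI can be used instead
export TRITON_MODEL_REPO=/path/to/model_repository
# Only needed when TRITON_MODEL_REPO is an S3 URI:
# export AWS_ACCESS_KEY_ID=<access-key-id>
# export AWS_SECRET_ACCESS_KEY=<secret-access-key>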
Publish models to MLflow
ONNX flavor
The MLflow ONNX built-in functionalities can be used to publish onnx flavor models to MLflow directly, and the MLflow Tritonserver plugin will prepare the model in the format expected by Triton. You may also log config.pbtxt as an additional artifact, which Triton will use to serve the model. Otherwise, the server should be run with the auto-complete feature enabled (--strict-model-config=false) to generate the model configuration.
import mlflow.onnx
import onnx
model = onnx.load("examples/onnx_float32_int32_int32/1/model.onnx")
mlflow.onnx.log_model(model, "triton", registered_model_name="onnx_float32_int32_int32")
Triton flavor
For other model frameworks that Triton supports but that are not yet recognized by the MLflow Tritonserver plugin, the mlflow_tritonserver_cli CLI can be used to publish triton flavor models to MLflow. A triton flavor model is a directory containing the model files following the Triton model layout.
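For reference, the example model's directory follows the standard Triton layout, with a numeric version subdirectory and an optional config.pbtxt (shown here as a sketch):
onnx_float32_int32_int32/
├── config.pbtxt        (optional model configuration)
└── 1/
    └── model.onnx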
Below is an example usage:
mlflow_tritonserver_cli --model_name onnx_float32_int32_int32 --model_directory <path-to-the-examples-directory>/onnx_float32_int32_int32 --flavor triton
Deploy models tracked in MLflow to Triton
Once a model is published and tracked in MLflow, it can be deployed to Triton via MLflow's deployments command. The following command will download the model to Triton's model repository and request Triton to load the model.
mlflow deployments create -t triton --flavor triton --name onnx_float32_int32_int32 -m models:/onnx_float32_int32_int32/1
Perform inference
After the model is deployed, the following command shows the CLI usage for sending an inference request to a deployment.
mlflow deployments predict -t triton --name onnx_float32_int32_int32 --input-path <path-to-the-examples-directory>/input.json --output-path output.json
The inference result will be written to output.json and you may compare it with the results in expected_output.json.
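For a quick check, the two files can be compared directly; note that formatting or field ordering may differ even when the values match:
diff output.json <path-to-the-examples-directory>/expected_output.json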
MLflow Deployments
"MLflow Deployments" is a set of MLflow APIs for deploying MLflow models to custom serving tools. The MLflow Triton plugin implements the following deployment functions to support the interaction with Triton server in MLflow.
Create Deployment
The MLflow deployments create API deploys a model to the Triton target, which will download the model to Triton's model repository and request Triton to load the model.
To create an MLflow deployment using the CLI:
mlflow deployments create -t triton --flavor triton --name model_name -m models:/model_name/1
To create an MLflow deployment using the Python API:
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.create_deployment("model_name", "models:/model_name/1", flavor="triton")
Delete Deployment
The MLflow deployments delete API removes an existing deployment from the Triton target, which will remove the model from Triton's model repository and request Triton to unload the model.
To delete an MLflow deployment using the CLI:
mlflow deployments delete -t triton --name model_name
To delete an MLflow deployment using the Python API:
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.delete_deployment("model_name")
Update Deployment
The MLflow deployments update API updates an existing deployment with another model (version) tracked in MLflow, which will overwrite the model in Triton's model repository and request Triton to reload the model.
To update an MLflow deployment using the CLI:
mlflow deployments update -t triton --flavor triton --name model_name -m models:/model_name/2
To update an MLflow deployment using the Python API:
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.update_deployment("model_name", "models:/model_name/2", flavor="triton")
List Deployments
The MLflow deployments list API lists all existing deployments in the Triton target.
To list all MLflow deployments using the CLI:
mlflow deployments list -t triton
To list all MLflow deployments using the Python API:
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.list_deployments()
Get Deployment
The MLflow deployments get API returns information regarding a specific deployment in the Triton target.
To get a specific MLflow deployment using the CLI:
mlflow deployments get -t triton --name model_name
To get a specific MLflow deployment using the Python API:
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.get_deployment("model_name")
Run Inference on Deployments
The MLflow deployments predict API runs inference by preparing and sending the request to Triton, and returns the Triton response.
To run inference using the CLI:
mlflow deployments predict -t triton --name model_name --input-path input_file --output-path output_file
To run inference using the Python API:
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.predict("model_name", inputs)
File details
Details for the file mlflow-tritonserver-1.1.0.tar.gz.
File metadata
- Download URL: mlflow-tritonserver-1.1.0.tar.gz
- Upload date:
- Size: 24.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0879eadc067c9d48cb45fa03f339db320f86b8419c07af4a2bd4ccf19dca3a39
MD5 | c6d992680763efd52a7d720755484cb8
BLAKE2b-256 | 41148c34590f7e09f5fb17cce1ca19b641d58c9345a80be34484214e7381af0f
File details
Details for the file mlflow_tritonserver-1.1.0-py3-none-any.whl.
File metadata
- Download URL: mlflow_tritonserver-1.1.0-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | b0d46159741e5c9cd2f717a6afe9c4c88e81a1d570c86652612f02f7b2ce87c3
MD5 | 39489b6e63b1e74d5b264ffb544f99d1
BLAKE2b-256 | 14e46211dd275639f26b9afc2398acd077e6778aef430d9f04ceea18770a8cb4