An mlflow flavor for working with H2O-3 MOJO and POJO models
A tiny library containing an MLFlow flavor for working with H2O-3 MOJO and POJO models.
Logging Models to MLFlow Registry
A model trained with the H2O-3 runtime can be exported to the MLFlow registry with the log_model function:
import mlflow
import h2o_mlflow_flavor

mlflow.set_tracking_uri("http://127.0.0.1:8080")

h2o_model = ...  # training phase

with mlflow.start_run(run_name="myrun") as run:
    h2o_mlflow_flavor.log_model(h2o_model=h2o_model,
                                artifact_path="folder",
                                model_type="MOJO",
                                extra_prediction_args=["--predictCalibrated"])
Compared to the log_model functions of the flavors that ship with MLFlow, this function takes two extra arguments:
model_type - It indicates whether the model should be exported as MOJO or POJO. The default value is MOJO.
extra_prediction_args - A list of extra arguments for the Java scoring process. Possible values:
--setConvertInvalidNum - The scoring process will convert invalid numbers to NA.
--predictContributions - The scoring process will also return Shapley values along with the predictions. The model must support Shapley values; otherwise the scoring process will throw an error.
--predictCalibrated - The scoring process will also return calibrated prediction values.
The save_model function, which persists an H2O binary model as a MOJO or POJO, has the same signature as the log_model function.
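Since a mistyped flag would presumably only surface when the scoring process runs, it may be convenient to check the two extra arguments before logging. The following is a minimal sketch, not part of h2o_mlflow_flavor: the helper name check_log_model_args is hypothetical, and it assumes that the values are accepted exactly as spelled in the list above.

```python
# Values copied from the argument descriptions above.
SUPPORTED_MODEL_TYPES = {"MOJO", "POJO"}
SUPPORTED_PREDICTION_ARGS = {
    "--setConvertInvalidNum",
    "--predictContributions",
    "--predictCalibrated",
}


def check_log_model_args(model_type="MOJO", extra_prediction_args=None):
    """Hypothetical helper, not part of h2o_mlflow_flavor.

    Raises ValueError for values that are not listed above and returns
    the checked arguments as a dict of keyword arguments.
    """
    args = list(extra_prediction_args or [])
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    unknown = [arg for arg in args if arg not in SUPPORTED_PREDICTION_ARGS]
    if unknown:
        raise ValueError(f"Unsupported extra_prediction_args: {unknown}")
    return {"model_type": model_type, "extra_prediction_args": args}


checked = check_log_model_args("MOJO", ["--predictCalibrated"])
print(checked)
```

The returned dictionary could then be unpacked into the call, for example h2o_mlflow_flavor.log_model(h2o_model=h2o_model, artifact_path="folder", **checked).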
Extracting Information about Model
The flavor offers several functions to extract information about the model.
get_metrics(h2o_model, metric_type=None) - Extracts metrics from the trained H2O binary model. It returns a dictionary and takes the following parameters:
h2o_model - An H2O binary model.
metric_type - The type of metrics. Possible values are “training”, “validation”, and “cross_validation”. If the parameter is not specified, metrics of all types are returned.
get_params(h2o_model) - Extracts the training parameters of the H2O binary model. It returns a dictionary and expects one parameter:
h2o_model - An H2O binary model.
get_input_example(h2o_model, number_of_records=5, relevant_columns_only=True) - Creates an example Pandas dataset from the training dataset of the H2O binary model. It takes the following parameters:
h2o_model - An H2O binary model.
number_of_records - The number of records to extract from the training dataset. Defaults to 5.
relevant_columns_only - A flag indicating whether the output dataset should contain only columns required by the model. Defaults to True.
The functions can be utilized as follows:
import mlflow
import h2o_mlflow_flavor

mlflow.set_tracking_uri("http://127.0.0.1:8080")

h2o_model = ...  # training phase

with mlflow.start_run(run_name="myrun") as run:
    # Record the training parameters and metrics of the model in the run
    mlflow.log_params(h2o_mlflow_flavor.get_params(h2o_model))
    mlflow.log_metrics(h2o_mlflow_flavor.get_metrics(h2o_model))
    # Attach a few training records as an input example
    input_example = h2o_mlflow_flavor.get_input_example(h2o_model)
    h2o_mlflow_flavor.log_model(h2o_model=h2o_model,
                                input_example=input_example,
                                artifact_path="folder",
                                model_type="MOJO",
                                extra_prediction_args=["--predictCalibrated"])
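mlflow.log_metrics expects a flat dictionary that maps metric names to floats. The exact content of the dictionary returned by get_metrics is not described above, so the sketch below only illustrates a defensive filter that could be applied before logging. The helper keep_loggable_metrics is hypothetical, the sample values are made up, and dropping non-finite values is a choice of this sketch rather than an MLFlow requirement.

```python
import math
from numbers import Real


def keep_loggable_metrics(metrics):
    """Hypothetical helper: keep only finite numeric values, as floats."""
    loggable = {}
    for name, value in metrics.items():
        # bool is a numeric type in Python, but is rarely meant as a metric
        if isinstance(value, bool) or not isinstance(value, Real):
            continue
        if math.isfinite(value):
            loggable[name] = float(value)
    return loggable


# Made-up values standing in for the output of get_metrics(h2o_model)
example_metrics = {
    "auc": 0.97,
    "logloss": 0.12,
    "nobs": 150,
    "description": None,
    "mse": float("nan"),
}
filtered = keep_loggable_metrics(example_metrics)
print(filtered)
```

With such a helper, the logging call would read mlflow.log_metrics(keep_loggable_metrics(h2o_mlflow_flavor.get_metrics(h2o_model))).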
Model Scoring
After a model is obtained from the model registry, it does not require the H2O runtime to score. The only thing the model requires is h2o-genmodel.jar, which was persisted with the model when it was saved. The model can be loaded with the function load_model(model_uri, dst_path=None). It returns an object that makes predictions on a Pandas dataframe and takes the following parameters:
model_uri - A unique identifier of the model within the MLFlow registry.
dst_path - (Optional) A local filesystem path for downloading the persisted form of the model.
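Because the predictions are computed by a Java scoring process, a Java runtime has to be available on the machine that scores. How the flavor locates Java is not described here, so the following sketch simply assumes that a java executable is looked up on PATH. The helper find_java is hypothetical and not part of h2o_mlflow_flavor.

```python
import shutil


def find_java():
    """Return the full path of the `java` executable on PATH, or None."""
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    # Assumption: without a Java runtime the scoring process cannot start
    print("No java executable found on PATH")
else:
    print(f"Java found at {java_path}")
```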
The object for scoring can also be obtained via the pyfunc flavor as follows:
import mlflow
import pandas as pd

mlflow.set_tracking_uri("http://127.0.0.1:8080")

# URI of a model logged in an earlier run
logged_model = 'runs:/9a42265cf0ef484c905b02afb8fe6246/iris'
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Score a Pandas dataframe
data = pd.read_csv("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
loaded_model.predict(data)
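The model URI used above follows the MLFlow runs:/<run_id>/<artifact_path> scheme, where the last part is the artifact_path given to log_model. As a small sketch, the URI can be assembled from these two pieces. The helper runs_uri is hypothetical; inside a with mlflow.start_run() as run: block, the run ID is available as run.info.run_id.

```python
def runs_uri(run_id, artifact_path):
    """Hypothetical helper: build a 'runs:/<run_id>/<artifact_path>' URI."""
    if not run_id:
        raise ValueError("run_id must not be empty")
    # Strip slashes so the parts are joined by exactly one separator
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Run ID and artifact path taken from the example above
logged_model = runs_uri("9a42265cf0ef484c905b02afb8fe6246", "iris")
print(logged_model)  # runs:/9a42265cf0ef484c905b02afb8fe6246/iris
```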
Hashes for h2o_mlflow_flavor-0.1.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2b06523b02d98ef914d2dc76914430a6655e949376ad150fe764b73cf5158777
MD5 | 070433e6ca894c25c2f8a0b241aa8cf7
BLAKE2b-256 | 82158e4b5d385c40f9a811dde97f37b7fd12bef614c03828c7db9b5d8ca3fcee