An MLflow flavor for working with H2O-3 MOJO and POJO models


A tiny library containing an MLflow flavor for working with H2O-3 MOJO and POJO models.

Logging Models to the MLflow Registry

A model trained with the H2O-3 runtime can be exported to the MLflow registry with the log_model function:

import mlflow
import h2o_mlflow_flavor

mlflow.set_tracking_uri("http://127.0.0.1:8080")

h2o_model = ... training phase ...

with mlflow.start_run(run_name="myrun") as run:
    h2o_mlflow_flavor.log_model(h2o_model=h2o_model,
                                artifact_path="folder",
                                model_type="MOJO",
                                extra_prediction_args=["--predictCalibrated"])

Compared to the log_model functions of the other flavors that are part of MLflow, this function has two extra arguments:

  • model_type - It indicates whether the model should be exported as MOJO or POJO. The default value is MOJO.

  • extra_prediction_args - A list of extra arguments for the Java scoring process. Possible values:

    • --setConvertInvalidNum - The scoring process will convert invalid numbers to NA.

    • --predictContributions - The scoring process will also return Shapley values along with the predictions. The model must support Shapley values, otherwise the scoring process will throw an error.

    • --predictCalibrated - The scoring process will also return calibrated prediction values.

The save_model function, which persists an H2O binary model to MOJO or POJO, has the same signature as the log_model function.
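
The function can be used to write the model to a local directory instead of the registry. A minimal sketch, assuming the destination directory is passed as a path parameter (the parameter name and the folder "my_mojo_model" are illustrative, not taken from the library's documentation):

import h2o_mlflow_flavor

# Persist the trained H2O binary model as a MOJO to a local folder.
# The "path" parameter name and the folder name are assumptions for this sketch.
h2o_mlflow_flavor.save_model(h2o_model=h2o_model,
                             path="my_mojo_model",
                             model_type="MOJO",
                             extra_prediction_args=["--predictCalibrated"])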

Extracting Information about the Model

The flavor offers several functions to extract information about the model.

  • get_metrics(h2o_model, metric_type=None) - Extracts metrics from the trained H2O binary model. It returns a dictionary and takes the following parameters:

    • h2o_model - An H2O binary model.

    • metric_type - The type of metrics. Possible values are “training”, “validation”, “cross_validation”. If the parameter is not specified, metrics for all types are returned.

  • get_params(h2o_model) - Extracts training parameters from the H2O binary model. It returns a dictionary and expects one parameter:

    • h2o_model - An H2O binary model.

  • get_input_example(h2o_model, number_of_records=5, relevant_columns_only=True) - Creates an example Pandas dataset from the training dataset of the H2O binary model. It takes the following parameters:

    • h2o_model - An H2O binary model.

    • number_of_records - The number of records that will be extracted from the training dataset. Defaults to 5.

    • relevant_columns_only - A flag indicating whether the output dataset should contain only columns required by the model. Defaults to True.

The functions can be utilized as follows:

import mlflow
import h2o_mlflow_flavor

mlflow.set_tracking_uri("http://127.0.0.1:8080")

h2o_model = ... training phase ...

with mlflow.start_run(run_name="myrun") as run:
    mlflow.log_params(h2o_mlflow_flavor.get_params(h2o_model))
    mlflow.log_metrics(h2o_mlflow_flavor.get_metrics(h2o_model))
    input_example = h2o_mlflow_flavor.get_input_example(h2o_model)
    h2o_mlflow_flavor.log_model(h2o_model=h2o_model,
                                input_example=input_example,
                                artifact_path="folder",
                                model_type="MOJO",
                                extra_prediction_args=["--predictCalibrated"])
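
If only one type of metrics is needed, the metric_type parameter can be used to filter the result. A minimal sketch, assuming the model was trained with a validation frame:

# Log only the validation metrics of the trained model.
validation_metrics = h2o_mlflow_flavor.get_metrics(h2o_model, metric_type="validation")
mlflow.log_metrics(validation_metrics)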

Model Scoring

Once a model has been obtained from the model registry, it doesn't require the h2o runtime to score. The only thing the model requires is the h2o-genmodel.jar that was persisted with the model during the saving procedure. The model can be loaded with the function load_model(model_uri, dst_path=None). It returns an object that makes predictions on a Pandas dataframe and takes the following parameters (a usage sketch follows the list):

  • model_uri - A unique identifier of the model within the MLflow registry.

  • dst_path - (Optional) A local filesystem path for downloading the persisted form of the model.
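
A minimal sketch of loading and scoring with the flavor directly; the run ID and artifact path in the URI are placeholders, and the predict call assumes the returned scoring object exposes the same interface as the pyfunc example below:

import h2o_mlflow_flavor
import pandas as pd

# Load the exported MOJO/POJO model; "<run_id>" and "folder" are placeholders.
model = h2o_mlflow_flavor.load_model("runs:/<run_id>/folder")

data = pd.read_csv("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
predictions = model.predict(data)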

The scoring object can also be obtained via the pyfunc flavor as follows:

import mlflow
mlflow.set_tracking_uri("http://127.0.0.1:8080")

logged_model = 'runs:/9a42265cf0ef484c905b02afb8fe6246/iris'
loaded_model = mlflow.pyfunc.load_model(logged_model)

import pandas as pd
data = pd.read_csv("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
loaded_model.predict(data)
