A tool for generating end-to-end pipelines on GCP.

ML Pipeline Generator

ML Pipeline Generator is a tool for generating end-to-end pipelines composed of GCP components so that users can easily migrate their local ML models onto GCP and start realizing the benefits of the Cloud quickly.

The following ML frameworks will be supported:

  1. TensorFlow (TF)
  2. Scikit-learn (SKL)
  3. XGBoost (XGB)

The following backends are currently supported for model training:

  1. Google Cloud AI Platform
  2. AI Platform Pipelines (managed Kubeflow Pipelines)

Installation

pip install ml-pipeline-gen

Setup

GCP credentials

gcloud auth login
gcloud auth application-default login
gcloud config set project [PROJECT_ID]
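
As an optional check, confirm that application-default credentials resolve to the expected project. The snippet below is a minimal sketch and assumes the google-auth package (installed alongside most Google Cloud client libraries) is available in your environment.

# check_credentials.py: optional sanity check; assumes google-auth is installed.
import google.auth

# google.auth.default() loads the application-default credentials created by
# "gcloud auth application-default login" above, along with the active project.
credentials, project_id = google.auth.default()
print("Authenticated. Active project:", project_id)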

Enabling required APIs

The tool requires the following Google Cloud APIs to be enabled:

  1. Compute Engine
  2. AI Platform Training and Prediction
  3. Cloud Storage

Enable the above APIs in the Cloud Console, or run the command below to enable them for your project.

gcloud services enable ml.googleapis.com \
compute.googleapis.com \
storage-component.googleapis.com
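
You can confirm which APIs are active on the project by listing the enabled services:

gcloud services list --enabled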

Python environment

python3 -m venv venv
source ./venv/bin/activate
pip install ml-pipeline-gen
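
To verify the installation, import the package from inside the virtual environment (the top-level module name ml_pipeline_gen is assumed here to match the distribution name):

python -c "import ml_pipeline_gen"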

Kubeflow

Create a Kubeflow deployment using Cloud Marketplace, and follow the Kubeflow on GCP documentation to give the Kubeflow instance access to GCP services.

A future release will automate provisioning of KFP clusters and incorporate K8s Workload Identity for auth.

Cloud AI Platform Demo

This demo uses the scikit-learn model in examples/sklearn/model/sklearn_model.py to create a training module that runs on Cloud AI Platform (CAIP). First, make a copy of the scikit-learn example.

cp -r examples/sklearn sklearn-demo
cd sklearn-demo

Create a config.yaml by using the config.yaml.example template. See the Input args section for details on the config parameters. Once the config file is filled out, run the demo.

python demo.py

Running this demo uses the config file to generate a trainer/ module that is compatible with CAIP.
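
The generated trainer follows the standard CAIP packaging layout: an entry point parses the training args (see the Input args section below) and calls into the model code. The sketch below is only illustrative of that pattern, not the generated code; the file name, function names, and default values are assumptions.

# task.py: illustrative sketch of a CAIP-style trainer entry point.
# Not the generated code; names and defaults are placeholders.
import argparse


def parse_args():
    """Parses a subset of the default input args listed below."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_path", required=True,
                        help="Dir or bucket containing the training data.")
    parser.add_argument("--eval_path", required=True,
                        help="Dir or bucket containing the eval data.")
    parser.add_argument("--model_dir", required=True,
                        help="Dir or bucket in which to save model files.")
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--max_steps", type=int, default=1000)
    parser.add_argument("--learning_rate", type=float, default=0.01)
    return parser.parse_args()


def main():
    args = parse_args()
    # A real trainer would load data from args.train_path, train the model
    # (e.g. the scikit-learn model in model/sklearn_model.py), evaluate it
    # against args.eval_path, and export it to args.model_dir.
    print("Training with args:", vars(args))


if __name__ == "__main__":
    main()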

KFP Demo

This demo uses the TensorFlow model in examples/kfp/model/tf_model.py to create a Kubeflow Pipeline (hosted on AI Platform Pipelines). First, make a copy of the kfp example.

cp -r examples/kfp kfp-demo
cd kfp-demo

Create a config.yaml by using the config.yaml.example template. See the Input args section for details on the config parameters. Once the config file is filled out, run the demo.

python demo.py

Running this demo uses the config file to generate a trainer/ module that is compatible with CAIP. It also generates orchestration/pipeline.py, which compiles a Kubeflow Pipeline.
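
Conceptually, orchestration/pipeline.py defines the pipeline with the Kubeflow Pipelines SDK and compiles it into an archive that AI Platform Pipelines can run. The sketch below shows that general pattern with the kfp v1 SDK; it is not the generated file, and the pipeline name, step, and image are placeholders.

# compile_pipeline.py: illustrative sketch of compiling a pipeline with the kfp SDK.
# Not the generated orchestration/pipeline.py; the pipeline body is a placeholder.
import kfp
from kfp import dsl


@dsl.pipeline(name="demo-pipeline",
              description="Placeholder pipeline that trains and deploys a model.")
def demo_pipeline():
    # A single placeholder step; a real pipeline would run the generated
    # trainer/ module and then deploy the trained model.
    train_step = dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/trainer:latest",  # hypothetical image
    )


if __name__ == "__main__":
    # Produces an archive that can be uploaded to an AI Platform Pipelines cluster.
    kfp.compiler.Compiler().compile(demo_pipeline, "pipeline.tar.gz")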

Tests

The tests use unittest, Python's built-in unit testing framework. Running python -m unittest performs test discovery and runs every test found in this project. Tests can be run at a more granular level by passing a directory to python -m unittest discover, as shown below. Read more in the unittest documentation.

python -m unittest
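
For example, to limit discovery to a single directory (a hypothetical tests/ directory here):

python -m unittest discover -s tests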

Input args

The following input args are included by default. Override them by adding them as input args in the config file.

Arg                      Description
train_path               Directory or bucket containing the training data.
eval_path                Directory or bucket containing the eval data.
model_dir                Directory or bucket in which to save model files.
batch_size               Number of rows of data fed into the model on each iteration.
max_steps                Maximum number of iterations to train the model for.
learning_rate            Multiplier that controls how much the network weights are adjusted with respect to the loss gradient.
export_format            File format expected by the exported model at inference time.
save_checkpoints_steps   Number of steps to run before saving a model checkpoint.
keep_checkpoint_max      Number of model checkpoints to keep.
log_step_count_steps     Number of steps to run before logging training performance.
eval_steps               Number of steps used to evaluate the model.
early_stopping_steps     Number of steps with no decrease in loss before stopping early.

Contribute

To modify the behavior of the library, clone the repository and install ml-pipeline-gen in editable mode:

pip install -e .
