ML Pipeline Generator
ML Pipeline Generator is a tool for generating end-to-end pipelines composed of GCP components so that users can easily migrate their local ML models onto GCP and start realizing the benefits of the Cloud quickly.
The following ML frameworks will be supported:
- TensorFlow (TF)
- Scikit-learn (SKL)
- XGBoost (XGB)
The following backends are currently supported for model training:
- Google Cloud AI Platform
- AI Platform Pipelines (managed Kubeflow Pipelines)
Installation
pip install ml-pipeline-gen
Setup
GCP credentials
gcloud auth login
gcloud auth application-default login
gcloud config set project [PROJECT_ID]
Enabling required APIs
The tool requires the following Google Cloud APIs to be enabled:
- AI Platform Training & Prediction API (ml.googleapis.com)
- Compute Engine API (compute.googleapis.com)
- Cloud Storage API (storage-component.googleapis.com)
Enable the above APIs from the Cloud Console, or run the below command to enable them for your project.
gcloud services enable ml.googleapis.com \
compute.googleapis.com \
storage-component.googleapis.com
Python environment
python3 -m venv venv
source ./venv/bin/activate
pip install ml-pipeline-gen
Kubeflow
Create a Kubeflow deployment using Cloud Marketplace. Follow the Kubeflow on GCP documentation to give the Kubeflow instance access to GCP services.
A future release will automate provisioning of KFP clusters and incorporate K8s Workload Identity for auth.
Cloud AI Platform Demo
This demo uses the scikit-learn model in examples/sklearn/model/sklearn_model.py to create a training module to run on Cloud AI Platform (CAIP). First, make a copy of the scikit-learn example.
cp -r examples/sklearn sklearn-demo
cd sklearn-demo
Create a config.yaml using the config.yaml.example template. See the Input args section for details on the config parameters. Once the config file is filled out, run the demo.
python demo.py
Running this demo uses the config file to generate a trainer/ module that is compatible with CAIP.
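Before running demo.py it can help to sanity-check the filled-in config values. The snippet below is a minimal illustrative sketch: the key names mirror the default input args documented in the Input args section, but the check_input_args helper (and the example bucket paths) are hypothetical, not part of ml-pipeline-gen.

```python
# Hypothetical sanity check for config values; not part of ml-pipeline-gen.
REQUIRED_ARGS = {"train_path", "eval_path", "model_dir"}

def check_input_args(args):
    """Raise ValueError if any required input arg is missing or empty."""
    missing = {k for k in REQUIRED_ARGS if not args.get(k)}
    if missing:
        raise ValueError(f"missing config args: {sorted(missing)}")
    return True

# Example values mirroring the default input args table.
args = {
    "train_path": "gs://my-bucket/data/train.csv",
    "eval_path": "gs://my-bucket/data/eval.csv",
    "model_dir": "gs://my-bucket/models",
    "batch_size": 64,
    "max_steps": 1000,
}
check_input_args(args)
```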
KFP Demo
This demo uses the TensorFlow model in examples/kfp/model/tf_model.py to create a Kubeflow Pipeline (hosted on AI Platform Pipelines). First, make a copy of the kfp example.
cp -r examples/kfp kfp-demo
cd kfp-demo
Create a config.yaml using the config.yaml.example template. See the Input args section for details on the config parameters. Once the config file is filled out, run the demo.
python demo.py
Running this demo uses the config file to generate a trainer/ module that is compatible with CAIP. It also generates orchestration/pipeline.py, which compiles a Kubeflow Pipeline.
Tests
The tests use unittest, Python's built-in unit testing framework. Running python -m unittest performs test discovery to find all tests within this project. Tests can be run at a more granular level by passing a directory to test discovery. Read more about unittest in the Python documentation.
python -m unittest
Input args
The following input args are included by default. Override them by adding them as inputs in the config file.
Arg | Description
---|---
train_path | Dir or bucket containing training data.
eval_path | Dir or bucket containing eval data.
model_dir | Dir or bucket to save model files.
batch_size | Number of rows of data fed into the model in each iteration.
max_steps | Maximum number of iterations to train the model for.
learning_rate | Multiplier that controls how much the weights of the network are adjusted with respect to the loss gradient.
export_format | File format expected by the exported model at inference time.
save_checkpoints_steps | Number of steps to run before saving a model checkpoint.
keep_checkpoint_max | Number of model checkpoints to keep.
log_step_count_steps | Number of steps to run before logging training performance.
eval_steps | Number of steps used to evaluate the model.
early_stopping_steps | Number of steps with no decrease in loss before stopping early.
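The training-control args can be made concrete with a toy gradient-descent loop. This sketch is purely illustrative (it is not the generated trainer) and shows how learning_rate, max_steps, and early_stopping_steps interact:

```python
def toy_train(learning_rate=0.1, max_steps=100, early_stopping_steps=5):
    """Minimize loss(w) = w^2 with plain gradient descent."""
    w = 5.0
    best_loss = float("inf")
    steps_without_improvement = 0
    for _ in range(max_steps):          # max_steps caps total iterations
        loss = w * w
        grad = 2.0 * w
        w -= learning_rate * grad       # learning_rate scales each update
        if loss < best_loss:
            best_loss = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
            if steps_without_improvement >= early_stopping_steps:
                break                   # early_stopping_steps triggers the stop
    return w

final_w = toy_train()                   # converges close to the minimum at w = 0
```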