Use Kubeflow Pipeline to run distributed training jobs
Kubeflow Pipeline distributed training support
kfp-dist-train contains utilities that work with Kubeflow Pipeline so that distributed training code can be written directly with the Kubeflow Pipeline SDK.
Get Started
- Set up a Kubeflow environment (for example with https://github.com/alauda/kubeflow-chart).
- Upload the example kfp-dist-train.ipynb into a Notebook instance, or set up local pipeline submission.
- Execute the example to submit a workflow. You can configure the number of workers in the Kubeflow web UI.
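To make the "number of workers" setting more concrete, here is a minimal sketch of the per-worker cluster description that TensorFlow multi-worker training reads from the TF_CONFIG environment variable. This is an assumption for illustration only: this README does not say how kfp-dist-train wires workers together, and the build_tf_config helper and the worker-{i}:2222 host naming are hypothetical, not part of the package.

```python
import json


def build_tf_config(num_workers: int, worker_index: int,
                    host_template: str = "worker-{i}:2222") -> str:
    """Return a TF_CONFIG JSON string for one worker of a multi-worker job.

    host_template is a hypothetical naming scheme chosen for this sketch;
    it is not defined by kfp-dist-train.
    """
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    # Every worker sees the same cluster list and differs only in its task index.
    workers = [host_template.format(i=i) for i in range(num_workers)]
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": worker_index},
    })


# Example: the configuration that worker 1 of 3 would receive.
config = json.loads(build_tf_config(num_workers=3, worker_index=1))
print(config["cluster"]["worker"])
print(config["task"])
```

Changing the worker count only changes the length of the cluster list, which is why a single pipeline parameter can be enough to scale such a job.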
Roadmap
- support a kfpdist.component(dist=True) decorator as a wrapper of dsl.component
- support parameter server strategy
- support PyTorch
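The decorator item on the roadmap is not implemented yet, so the following is only a sketch of one possible shape for it. It uses a hypothetical base_component function as a stand-in for dsl.component so that the example runs without Kubeflow installed. The is_distributed and component_options attributes are invented for illustration and are not part of any released API.

```python
def base_component(func=None, **options):
    """Hypothetical stand-in for dsl.component: records options on the function."""
    def wrap(f):
        f.component_options = dict(options)
        return f
    # Support both @base_component and @base_component(...).
    return wrap if func is None else wrap(func)


def component(func=None, *, dist=False, **options):
    """Hypothetical kfpdist.component: forwards to the base decorator and tags dist."""
    def wrap(f):
        wrapped = base_component(**options)(f)
        # Mark the component so a later step could expand it into several workers.
        wrapped.is_distributed = dist
        return wrapped
    return wrap if func is None else wrap(func)


@component(dist=True, base_image="python:3.11")
def train(epochs: int) -> str:
    return f"trained for {epochs} epochs"


print(train.is_distributed)
print(train.component_options)
```

Keeping dist as a keyword-only argument and passing everything else through unchanged means existing component definitions would need only a changed import and one extra flag.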