
Use Kubeflow Pipelines to run distributed training jobs

Project description

Kubeflow Pipelines distributed training support

kfp-dist-train provides utilities that work with Kubeflow Pipelines, so you can write distributed training code directly with the Kubeflow Pipelines SDK.

Get Started

  1. Set up a Kubeflow environment (for example, using https://github.com/alauda/kubeflow-chart).
  2. Upload the example notebook kfp-dist-train.ipynb to a Notebook instance, or set up pipeline submission from your local machine.
  3. Run the example to submit a workflow. You can configure the number of workers in the Kubeflow web UI.
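The worker count chosen in the UI ultimately has to become a cluster description that every worker process can read. This page does not say which framework or mechanism kfpdist uses; the following is only a minimal sketch assuming TensorFlow multi-worker training, which reads the cluster layout from the `TF_CONFIG` environment variable. The function `build_tf_config` and the address pattern `<service>-<i>:<port>` are hypothetical, not part of kfpdist or Kubeflow.

```python
import json


def build_tf_config(num_workers: int, worker_index: int,
                    service: str = "train-worker", port: int = 2222) -> str:
    """Return a TF_CONFIG JSON string for one worker of a multi-worker job.

    Hypothetical helper: the "<service>-<i>:<port>" address scheme is an
    assumption for illustration, not a naming scheme defined by kfpdist.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    # Every worker receives the same cluster list, in the same order.
    workers = [f"{service}-{i}:{port}" for i in range(num_workers)]
    config = {
        "cluster": {"worker": workers},
        # Only the task entry differs per worker; with no explicit chief,
        # TensorFlow treats worker 0 as the chief.
        "task": {"type": "worker", "index": worker_index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Print the config that each of three workers would receive.
    for i in range(3):
        print(build_tf_config(3, i))
```

In a TensorFlow job, each worker would set `os.environ["TF_CONFIG"]` to its own string before creating `tf.distribute.MultiWorkerMirroredStrategy()`.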

Roadmap

  • Support a kfpdist.component(dist=True) decorator as a wrapper around dsl.component
  • Support the parameter server strategy
  • Support PyTorch
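The first roadmap item describes a decorator that wraps `dsl.component`. Because that decorator is not yet available, the following is only a minimal sketch of the wrapping pattern, and every name in it is hypothetical: `component`, the `dist` attribute it records, and `base_component`, a local stand-in for `dsl.component` so the sketch runs without kfp installed.

```python
from typing import Callable, Optional


def base_component(func: Callable) -> Callable:
    """Stand-in for kfp's dsl.component (assumption: lets the sketch run without kfp)."""
    func.is_component = True
    return func


def component(func: Optional[Callable] = None, *, dist: bool = False):
    """Hypothetical kfpdist.component: delegate to base_component, recording the dist flag."""
    def decorate(f: Callable) -> Callable:
        # Record the flag so a (hypothetical) pipeline builder could
        # fan this step out across the configured number of workers.
        f.dist = dist
        return base_component(f)

    # Support both bare @component and @component(dist=True).
    if func is not None:
        return decorate(func)
    return decorate


@component(dist=True)
def train(epochs: int) -> str:
    return f"trained for {epochs} epochs"


@component
def evaluate() -> str:
    return "evaluated"
```

Delegating to the base decorator, rather than reimplementing it, keeps every component a normal pipeline component while the extra flag carries the distribution intent.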



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

kfpdist-0.1.7-py3-none-any.whl (3.9 kB)

Uploaded Python 3
