
Use Kubeflow Pipeline to run distributed training jobs

Kubeflow Pipeline distributed training support

kfp-dist-train provides utilities for use with Kubeflow Pipelines, so that distributed training code can be written directly with the Kubeflow Pipelines SDK.

Get Started

  1. Set up a Kubeflow environment (for example, with https://github.com/alauda/kubeflow-chart).
  2. Upload the example kfp-dist-train.ipynb to a Notebook instance, or set up pipeline submission from your local machine.
  3. Run the example to submit a workflow. You can configure the number of workers in the Kubeflow web UI.
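
The project description does not say how the configured number of workers reaches each training process. As a rough illustration only, the sketch below assumes TensorFlow multi-worker training, where each worker reads a TF_CONFIG environment variable describing the cluster and its own index. The host naming scheme, the port, and both function names are hypothetical and are not part of kfpdist.

```python
import json


def worker_hosts(job_name, num_workers, port=2222):
    """Build one host:port entry per worker (hypothetical naming scheme)."""
    return [f"{job_name}-worker-{i}:{port}" for i in range(num_workers)]


def build_tf_config(hosts, index):
    """Return the TF_CONFIG JSON string for the worker at position `index`."""
    if not 0 <= index < len(hosts):
        raise ValueError(f"index {index} is outside 0..{len(hosts) - 1}")
    return json.dumps({
        "cluster": {"worker": list(hosts)},
        "task": {"type": "worker", "index": index},
    })


if __name__ == "__main__":
    hosts = worker_hosts("demo", 3)
    for i in range(len(hosts)):
        # Each worker process would export this string as TF_CONFIG.
        print(build_tf_config(hosts, i))
```

Every worker sees the same cluster list and differs only in its task index, so changing the worker count only changes the length of the host list.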

Roadmap

  • support a kfpdist.component(dist=True) decorator as a wrapper around dsl.component
  • support the parameter server strategy
  • support PyTorch
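
The first roadmap item describes a decorator that is not yet available, so the following is only a sketch of the wrapping pattern it implies, not the kfpdist API. To stay runnable without Kubeflow installed, base_component is a hypothetical stand-in for a decorator such as dsl.component, and the distributed attribute is an invented marker.

```python
def base_component(func=None, **options):
    """Hypothetical stand-in for a decorator such as dsl.component."""
    def wrap(f):
        f.component_options = dict(options)  # record options for inspection
        return f
    return wrap if func is None else wrap(func)


def component(func=None, *, dist=False, **options):
    """Wrap base_component and add a `dist` flag (illustrative only)."""
    def wrap(f):
        wrapped = base_component(**options)(f)
        wrapped.distributed = dist  # invented marker attribute
        return wrapped
    # Support both @component and @component(dist=True, ...).
    return wrap if func is None else wrap(func)


@component(dist=True, base_image="python:3.10")
def train(epochs: int) -> str:
    return f"trained for {epochs} epochs"


@component
def evaluate() -> str:
    return "evaluated"
```

Keeping dist as a keyword-only argument lets all other options pass through unchanged to the wrapped decorator.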

Built Distribution

kfpdist-0.1.8-py3-none-any.whl (3.9 kB view details)

Uploaded Python 3

File details

Details for the file kfpdist-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: kfpdist-0.1.8-py3-none-any.whl
  • Size: 3.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for kfpdist-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 14ec778d5cbaf6dded96b7d15df7ef0ca8f76c6503271b4e4a2d18fe038e2f64
MD5 a88be020ed23dac02aaeaa6117e9bbfd
BLAKE2b-256 9939589db3838dcb6edf44cffa4d460d212833c04db9329f91a7365cab0a7df1
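
A downloaded wheel can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module; the function names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib

# SHA256 digest listed above for kfpdist-0.1.8-py3-none-any.whl.
EXPECTED_SHA256 = "14ec778d5cbaf6dded96b7d15df7ef0ca8f76c6503271b4e4a2d18fe038e2f64"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, verify("kfpdist-0.1.8-py3-none-any.whl") should return True only when the file in the current directory is byte-for-byte identical to the published wheel.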
