Use Kubeflow Pipeline to run distributed training jobs
Kubeflow Pipeline distributed training support
kfp-dist-train contains utilities for use with Kubeflow Pipeline that let you write distributed training code directly with the Kubeflow Pipeline SDK.
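To make "distributed training" concrete: each worker in a multi-worker job needs to know the full list of workers and its own position in that list. The sketch below builds that description using TensorFlow's `TF_CONFIG` JSON convention. Whether kfp-dist-train coordinates workers this way is an assumption; this README does not say. The helper name `build_tf_config` and the host names are hypothetical.

```python
import json


def build_tf_config(worker_hosts, task_index):
    """Return a TF_CONFIG-style JSON string for one worker.

    Hypothetical helper, not part of kfp-dist-train. It follows the
    TensorFlow TF_CONFIG layout: a "cluster" section listing all
    workers and a "task" section identifying this worker.
    """
    if not 0 <= task_index < len(worker_hosts):
        raise ValueError("task_index must index into worker_hosts")
    return json.dumps({
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    })


# Made-up host names for a 3-worker job; a real job would use the
# addresses assigned by the cluster.
hosts = [f"worker-{i}.example.svc:2222" for i in range(3)]
configs = [build_tf_config(hosts, i) for i in range(len(hosts))]
```

Each worker would export its own string as the `TF_CONFIG` environment variable before starting training. Changing the number of workers only changes the length of `hosts`.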
Get Started
- Set up a Kubeflow environment (for example, with https://github.com/alauda/kubeflow-chart).
- Upload the example kfp-dist-train.ipynb into a Notebook instance, or set up local pipeline submission.
- Execute the example to submit a workflow. You can configure the number of workers in the Kubeflow web UI. The job should look like the screenshot below:

  [Screenshot: the submitted job]
Roadmap
- support a kfpdist.component(dist=True) decorator as a wrapper around dsl.component
- support parameter server strategy
- support PyTorch
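The first roadmap item describes a decorator that wraps `dsl.component` and takes a `dist` flag. Since this is a planned feature, the following is only a sketch of the general decorator-factory pattern, not kfpdist's implementation. `base_component` is a stand-in for `dsl.component` so the example runs without the Kubeflow Pipeline SDK, and the attributes it sets are hypothetical.

```python
def base_component(func):
    """Stand-in for dsl.component (assumption): only tags the function."""
    func.is_component = True
    return func


def component(dist=False):
    """Hypothetical decorator factory mirroring component(dist=True).

    It applies the base component decorator and records whether the
    step should run as a distributed job.
    """
    def decorator(func):
        wrapped = base_component(func)
        # A real implementation would attach worker configuration here.
        wrapped.dist = dist
        return wrapped
    return decorator


@component(dist=True)
def train(epochs: int) -> str:
    # Placeholder training step used only to show the decorator in use.
    return f"trained for {epochs} epochs"
```

The point of the factory shape is that existing component code stays unchanged and only the decorator line opts a step into distributed execution.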
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file kfpdist-0.1.8-py3-none-any.whl.
File metadata
- Download URL: kfpdist-0.1.8-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14ec778d5cbaf6dded96b7d15df7ef0ca8f76c6503271b4e4a2d18fe038e2f64 |
| MD5 | a88be020ed23dac02aaeaa6117e9bbfd |
| BLAKE2b-256 | 9939589db3838dcb6edf44cffa4d460d212833c04db9329f91a7365cab0a7df1 |