TensorFlow Cluster on Ray
Project description
The raytf framework provides a simple interface for distributed training on Ray. It is intended to cover TensorFlow, PyTorch, and MXNet. TensorFlow is supported now, and the others will be added later.
Quick Start
Tested only with Python 3.6.
Install the latest ray version: pip install ray
Install the latest raytf: pip install raytf
Git clone this project: git clone https://github.com/zuston/raytf.git
Enter the example folder and run the Python script, as in the following commands:
cd raytf
cd example
python mnist.py
How to Use
from raytf.raytf_driver import Driver

# When running on a single local machine, start Ray first:
# import ray
# ray.init()

# Resources requested for each role in the cluster.
tf_cluster = Driver.build(
    resources={
        'ps': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 2},
        'worker': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 6},
        'chief': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 1}
    },
    event_log='/tmp/opal/4',
    resources_allocation_timeout=10
)

# `process` is the user-supplied training function; it is not defined in this snippet.
tf_cluster.start(model_process=process, args=None)
This training code attaches to an existing permanent Ray cluster. To debug, call ray.init() to start a local Ray cluster instead.
When you specify event_log in the builder, a sidecar TensorBoard is started on one worker.
Gang scheduling is supported. In addition, raytf lets you configure how long to wait for resources, as shown by resources_allocation_timeout in the code above. The timeout is given in seconds.
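Because gang scheduling waits until the resources for all roles can be obtained together, it can help to total up the request before submitting it. The following is a minimal sketch in plain Python, not part of raytf: total_request is a hypothetical helper, and it assumes that cores, memory, and gpu are per-instance amounts, which this document does not state explicitly.

```python
from typing import Dict

# Same shape as the `resources` argument passed to Driver.build above.
RESOURCES = {
    'ps': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 2},
    'worker': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 6},
    'chief': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 1},
}


def total_request(resources: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the requested amounts over all roles (hypothetical helper, not a raytf API)."""
    totals = {'cores': 0, 'memory': 0, 'gpu': 0, 'instances': 0}
    for spec in resources.values():
        count = spec['instances']
        totals['instances'] += count
        for key in ('cores', 'memory', 'gpu'):
            # Assumption: each value is per instance, so multiply by the instance count.
            totals[key] += spec[key] * count
    return totals


if __name__ == '__main__':
    print(total_request(RESOURCES))
```

Under that per-instance assumption, the example configuration asks for 9 instances with 18 cores and 18 GPUs in total. If the Ray cluster cannot offer that much at once, the request will presumably wait until the configured timeout expires.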
How to build and deploy
Requirement: python -m pip install twine
python setup.py bdist_wheel --universal
python -m pip install xxxxxx.whl
twine upload dist/*
Tips
To solve Python module import problems on a permanent Ray cluster, this project requires Ray 1.5 or later. See this RFC (https://github.com/ray-project/ray/issues/14019).
This project has only been tested with TensorFlow Estimator training.
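The Ray version requirement can be checked before a job is submitted. The sketch below is one way to do it in plain Python. meets_minimum is a hypothetical helper, not part of raytf or Ray, and it compares only the leading major.minor numbers of the version string.

```python
import re
from typing import Tuple


def meets_minimum(version: str, minimum: Tuple[int, int] = (1, 5)) -> bool:
    """Return True if `version` is at least `minimum` (major, minor).

    Hypothetical helper. Only the leading "major.minor" digits are compared;
    anything after them, such as a patch number or "rc1" suffix, is ignored.
    """
    match = re.match(r'(\d+)\.(\d+)', version)
    if match is None:
        raise ValueError('unrecognized version string: {!r}'.format(version))
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == '__main__':
    # In a real environment, pass ray.__version__ instead of these literals.
    for candidate in ('1.4.1', '1.5.0', '1.10.0'):
        print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings avoids treating "1.10" as older than "1.5".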
File details
Details for the file raytf-0.0.1-py2.py3-none-any.whl.
File metadata
- Download URL: raytf-0.0.1-py2.py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9511d8b1ae1cb4bc346c52a0b83d15d33f1e92137421c2b4ffb11d427f10db0e
MD5 | 104f4c997148ad34f23c07b1cf48237e
BLAKE2b-256 | d9aaffe96a337eabe51e13fece6b8f130c13485d19b608e20df553215a3122ed