TensorFlow Cluster on Ray

The raytf framework provides a simple interface for distributed training on Ray, targeting TensorFlow, PyTorch, and MXNet. TensorFlow is supported now; the others will be added later.

Quick Start

Only tested with Python 3.6.

  1. Install the latest Ray version: pip install ray

  2. Install the latest raytf: pip install raytf

  3. Clone this project: git clone https://github.com/zuston/raytf.git

  4. Enter the example folder and run the Python script with the following commands.

cd raytf
cd example
python mnist.py

How to Use

from raytf.raytf_driver import Driver

# When running on a single local machine, start Ray first:
# import ray
# ray.init()

# `process` is your own training function; it is not defined in this snippet.
tf_cluster = Driver.build(
    resources={
        'ps': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 2},
        'worker': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 6},
        'chief': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 1}
    },
    event_log='/tmp/opal/4',          # starts a sidecar TensorBoard on one worker
    resources_allocation_timeout=10   # seconds to wait for resources
)
tf_cluster.start(model_process=process, args=None)

This training code attaches to an existing permanent (long-running) Ray cluster. To debug, call ray.init() to start a Ray cluster locally instead.
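
The role names in the resources dictionary (ps, worker, chief) match the job names that TensorFlow reads from the TF_CONFIG environment variable in Estimator-style distributed training. The sketch below shows how such a layout could map to a per-task TF_CONFIG value. It is illustrative only: the source does not show how raytf sets this up internally, and the helper name build_tf_config, the host addresses, and the ports are all hypothetical.

```python
import json


def build_tf_config(addresses, task_type, task_index):
    """Return a TF_CONFIG JSON string for one task.

    addresses maps a role name to a list of "host:port" strings.
    This is a hypothetical helper, not part of raytf.
    """
    if task_type not in addresses:
        raise ValueError(f"unknown task type: {task_type}")
    if not 0 <= task_index < len(addresses[task_type]):
        raise ValueError(f"task index out of range: {task_index}")
    return json.dumps({
        "cluster": addresses,
        "task": {"type": task_type, "index": task_index},
    })


# Instance counts taken from the example above; addresses are made up.
instances = {"ps": 2, "worker": 6, "chief": 1}
addresses = {}
port = 2222
for role, count in instances.items():
    addresses[role] = []
    for _ in range(count):
        addresses[role].append(f"127.0.0.1:{port}")
        port += 1

# Each task would receive its own value, differing only in the "task" entry.
tf_config = build_tf_config(addresses, "worker", 0)
```

Every task sees the same "cluster" section, so only the task type and index need to differ per instance.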

When you specify event_log in the builder, a sidecar TensorBoard is started on one worker.

Gang scheduling is supported. raytf also provides a timeout for waiting on resources, shown in the code above as resources_allocation_timeout. Its unit is seconds.
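
Gang scheduling means training starts only once every requested instance can be placed; otherwise the request fails when the timeout expires. As a rough illustration, the sketch below totals what the example spec asks for and runs an all-or-nothing check against a pool of available resources. The function names and the `available` figures are assumptions for illustration, not part of raytf, and the source does not state the unit of `memory`.

```python
def total_demand(resources):
    """Sum cores, memory, and gpu over all roles, weighted by instance count."""
    totals = {"cores": 0, "memory": 0, "gpu": 0}
    for spec in resources.values():
        for key in totals:
            totals[key] += spec[key] * spec["instances"]
    return totals


def gang_fits(resources, available):
    """All-or-nothing check on aggregate capacity.

    This is a necessary condition only: a real scheduler must also place
    each instance on a single node, which this sketch does not model.
    """
    demand = total_demand(resources)
    return all(demand[key] <= available.get(key, 0) for key in demand)


resources = {
    "ps": {"cores": 2, "memory": 2, "gpu": 2, "instances": 2},
    "worker": {"cores": 2, "memory": 2, "gpu": 2, "instances": 6},
    "chief": {"cores": 2, "memory": 2, "gpu": 2, "instances": 1},
}

# Nine instances at 2 cores, 2 memory, and 2 GPUs each.
demand = total_demand(resources)
```

With the example spec, the cluster must supply 18 cores, 18 units of memory, and 18 GPUs at the same time before any task starts.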

How to build and deploy

Requirement: python -m pip install twine

  1. python setup.py bdist_wheel --universal

  2. python -m pip install xxxxxx.whl

  3. twine upload dist/*
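
Step 2 leaves the wheel file name as a placeholder because it depends on the package version. One way to locate the file produced by step 1 is sketched below. It assumes the wheel lands in the dist directory, which is the default output location for bdist_wheel; the helper name latest_wheel is hypothetical.

```python
from pathlib import Path


def latest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl file in dist_dir, or None."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return None
    wheels = sorted(directory.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


wheel = latest_wheel()
if wheel is not None:
    # Print the install command for step 2 with the real file name.
    print(f"python -m pip install {wheel}")
```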

Tips

  1. To solve the problem of importing Python modules on a permanent Ray cluster, this project requires Ray 1.5 or later. See this RFC (https://github.com/ray-project/ray/issues/14019).

  2. This project has only been tested with TensorFlow Estimator training.
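
The Ray 1.5+ requirement from the first tip can be checked before building the cluster. A minimal sketch, assuming the version string begins with dotted integers, as ray.__version__ does for release builds; the helper name meets_minimum is hypothetical.

```python
import re


def meets_minimum(version, minimum=(1, 5)):
    """Return True if the major.minor part of version is at least minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Usage (requires Ray to be installed):
#   import ray
#   if not meets_minimum(ray.__version__):
#       raise RuntimeError("raytf needs Ray 1.5 or later")
```

Comparing integer tuples rather than raw strings avoids ranking "1.10" below "1.5".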


