An EXPERIMENTAL Ray decorator for Metaflow

Metaflow-Ray

Introduction

metaflow-ray is an extension for Metaflow that enables seamless integration with Ray, allowing users to easily leverage Ray's distributed computing capabilities within their Metaflow flows. With metaflow-ray, you can spin up ephemeral Ray clusters on AWS Batch or Kubernetes directly from your Metaflow steps using the @metaflow_ray decorator, and run Ray applications built on Ray Core, Ray Train, Ray Tune, and Ray Data directly within your Metaflow flow.

Features

  • Effortless Ray Integration: This extension provides a simple and intuitive way to incorporate Ray into your Metaflow workflows using the @metaflow_ray decorator.
  • Elastic Ephemeral Ray Clusters: Let Metaflow orchestrate the creation of ephemeral Ray clusters on top of either:
    • AWS Batch multi-node parallel jobs
    • Kubernetes JobSets
  • Seamless Ray Initialization: The @metaflow_ray decorator handles the initialization of the Ray cluster for you, so you can focus on writing your Ray code without worrying about cluster setup.
  • Wide Range of Applications: Run a wide variety of Ray applications, including distributed training, hyperparameter tuning, and distributed data processing.

Installation

You can install metaflow-ray via pip alongside your existing Metaflow installation:

pip install metaflow-ray

Getting Started

  1. Import the @metaflow_ray decorator (alongside the usual Metaflow imports) to enable the integration:
from metaflow import FlowSpec, step, batch, pypi, metaflow_ray
  2. Decorate your step with @metaflow_ray and initialize Ray within the step. Note that in a real flow file the steps live inside a FlowSpec subclass; NUM_NODES and RESOURCES below are placeholder values you would set yourself:
NUM_NODES = 2  # desired number of nodes in the ephemeral Ray cluster
RESOURCES = dict(cpu=8, memory=16000)  # example resource request for @batch

class RayFlow(FlowSpec):
    @step
    def start(self):
        self.next(self.train, num_parallel=NUM_NODES)

    @metaflow_ray
    @pypi(packages={"ray": "2.39.0"})
    @batch(**RESOURCES)  # You can even use @kubernetes
    @step
    def train(self):
        import ray
        ray.init()
        # Your step's training code here

        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

Some things to consider:

  1. The num_parallel argument must always be specified in the step preceding the transition to a step decorated with @metaflow_ray. In the example above, the start step passes num_parallel when transitioning to the train step because train is decorated with @metaflow_ray; this is what allows the train step to execute in parallel as intended.
  • As a consequence, there must always be a corresponding join step, as shown in the snippet above.
  2. For remote execution environments (i.e., when @metaflow_ray is used in conjunction with @batch or @kubernetes), num_parallel should be greater than 1, i.e., at least 2. When the @metaflow_ray decorator is used in a standalone manner, however, num_parallel cannot be greater than 1 on Windows and macOS, because locally spun-up Ray clusters only support multiple nodes when the underlying OS is Linux-based.
  • Ideally, ray should already be available in the remote execution environment. If it isn't, you can always use the @pypi decorator to introduce ray as a dependency.
  3. If the @metaflow_ray decorator is used in a local context, i.e., without @batch or @kubernetes, a local Ray cluster is spun up, provided the ray library (installable via pip install ray) is available in the underlying Python environment. Running the flow again (locally) can then fail with:
ConnectionError: Ray is trying to start at 127.0.0.1:6379, but is already running at 127.0.0.1:6379.
Please specify a different port using the `--port` flag of `ray start` command.

One can simply run ray stop in another terminal to terminate the Ray cluster that was spun up locally.
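The error above means a Ray head from a previous run is still listening on 127.0.0.1:6379, Ray's default port. A stdlib-only way to check for this before re-running the flow locally (the helper name is ours, not part of metaflow-ray):

```python
import socket

def local_ray_head_running(host="127.0.0.1", port=6379):
    """Return True if something is already listening on Ray's default port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the TCP connection succeeds,
        # i.e. when a process (such as a Ray head) is listening there
        return s.connect_ex((host, port)) == 0

if local_ray_head_running():
    print("A local Ray cluster is still up; run `ray stop` before re-running the flow.")
```

If the check reports a live cluster, running ray stop in another terminal tears it down.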

Examples

Check out the examples directory for sample Metaflow flows that demonstrate how to use the metaflow-ray extension with various Ray applications.

  • Counter: Run a basic Counter with Ray that increments in Python, then do it inside a Metaflow task!
  • Process Dataframe: Process a large dataframe in chunks with Ray and Python, then do it inside a Metaflow task!
  • Custom Docker Images: Specify custom Docker images on Kubernetes / Batch with Ray on Metaflow.
  • Train XGBoost: Use Ray Train to build XGBoost models on multiple nodes, including CPU and GPU examples.
  • Tune PyTorch: Use Ray Tune to build PyTorch models on multiple nodes, including CPU and GPU examples.
  • PyTorch Lightning: Get started with running a PyTorch Lightning job on the Ray cluster formed in a @metaflow_ray step.
  • GPT-J Fine Tuning: Fine-tune the 6B-parameter GPT-J model on a Ray cluster.
  • vLLM Inference: Run inference on Llama models with vLLM and Ray via Metaflow.
  • End-to-end Batch Workflow: Train models, evaluate them, and serve them. See how to use Metaflow workflows and various Ray abstractions together in a complete workflow.

License

metaflow-ray is distributed under the Apache License.
