An Apache Airflow provider for Ray
Project description
Apache Airflow Provider for Ray
A provider you can install into your Airflow environment to access custom Ray XCom backends, Ray Hooks, and Ray Operators.
🧪 Experimental Version
This provider is an experimental alpha containing necessary components to orchestrate and schedule Ray tasks using Airflow. It is actively maintained and being developed to bring production-ready workflows to Ray using Airflow. Thie release contains everything needed to begin building these workflows using the Airlfow taskflow API.
Release: 0.1.1.alpha0
Requirements
Visit the Ray Project page for more info on Ray.
⚠️ The server version and client version (build) of Ray MUST be the same.
- Python Version == 3.7
- Airflow Version >= 2.0.0
- Ray Version == 1.3.0
- Filelock >= 3.0.0
Modules
- Ray XCom Backend: Custom XCom backend Serving to move data between tasks using the Ray API with its internal Plasma store, thereby allowing for in-memory distributed processing.
- Ray Hook: Extension of
Http
hooks that uses the Ray client to provide connections to the Ray Server. - Ray Decorator: Task decorator
to be used with the task flow API, combining wrapping the existing airflow
@task
decorate withray.remote
functionality, thereby executing each task on the ray cluster.
Configuration and Usage
-
Add the provider package wheel file to the root directory of your Airflow project.
-
In your Airflow
Dockerfile
, you will need to add an environment variable to specify your custom backend, along with the provider wheel install. Add the following:FROM quay.io/astronomer/ap-airflow:2.0.0-3-buster-onbuild USER root RUN pip install airflow_provider_ray-0.1.0a0-py3-none-any.whl RUN pip uninstall astronomer-airflow-version-check -y USER astro ENV AIRFLOW__CORE__XCOM_BACKEND=ray_provider.xcom.ray_backend.RayBackend
Check ap-airflow version, if unsure, change to
ap-airflow:latest-onbuild
-
We are using a Ray
1.3.0
and python version3.7
. To get a bleeding edge version of Ray, you can to follow this format to build the wheel url in yourrequirements.txt
file:https://s3-us-west-2.amazonaws.com/ray-wheels/master/{COMMIT_HASH}/ray-{RAY_VERSION}-{PYTHON_VERSION}-{PYTHON_VERSION}m-{OS_VERSION}_intel.whl
For example, for linux based systems and linux containers @ commit
0f9d1bb223bb1ba5edbdd557f2f2f3551a51061f
it would be:https://s3-us-west-2.amazonaws.com/ray-wheels/master/0f9d1bb223bb1ba5edbdd557f2f2f3551a51061f/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
For MacOS system, the wheel version naming goes as
macosx_10_13
, so that would be:https://s3-us-west-2.amazonaws.com/ray-wheels/master/0f9d1bb223bb1ba5edbdd557f2f2f3551a51061f/ray-2.0.0.dev0-cp37-cp37m-macosx_10_13_intel.whl
-
Start your Airflow environment and open the UI.
-
In the Airflow UI, add an
Airflow Pool
with the following:Pool (name): ray_worker_pool Slots: 25
-
In the Airflow UI, add an
Airflow Connection
with the following:Conn Id: ray_cluster_connection Conn Type: HTTP Host: Cluster IP Address, with basic Auth params if needed Port: 10001
-
In your Airflow DAG python file, you must include the following in your
default_args
dictionary:from ray_provider.xcom.ray_backend import RayBackend . . . default_args = { 'on_success_callback': RayBackend.on_success_callback, 'on_failure_callback': RayBackend.on_failure_callback, . . . } @dag( default_args=default_args, . . ) def ray_example_dag(): # do stuff
-
Using the taskflow API, your airflow task should now use the
@ray_task
decorator for any ray task, like:from ray_provider.operators.ray_decorators import ray_task . . def ray_example_dag(): @ray_task(ray_conn_id='ray_cluster_connection') def sum_cols(df: pd.DataFrame) -> pd.DataFrame: return pd.DataFrame(df.sum()).T
Project Contributors and Maintainers
This project is built in collaboration between Astronomer and Anyscale, with active contributions from:
This project is formatted via black
:
pip install black
black .
Connections
TBD - [Info on building a connection to Ray]
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Hashes for airflow-provider-ray-0.1.1a0.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | cbcbe11772a1a1f64c6c8ce8a67fbba4513b157d2be19b41047b74bcf62afb4a |
|
MD5 | 8c09441dd61c788e3890e55be9bf099e |
|
BLAKE2b-256 | 1c9c060933823ea76904e4c8b9bb97f191b142fdebf67cc02c4c24a7806d94e8 |