
A sample Apache Airflow provider package built by Astronomer.

Project description


Apache Airflow Provider for SkyPilot

A provider that lets you use multiple clouds in Apache Airflow through SkyPilot.


Installation

The SkyPilot provider for Apache Airflow was developed and tested in an environment with the following dependencies installed:

Installation of the SkyPilot provider can start from the Airflow environment configured with Docker, as described in "Running Airflow in Docker". Based on that Docker configuration, add a pip install command to the Dockerfile and build your own Docker image:

RUN pip install --user airflow-provider-skypilot

Then, make sure that SkyPilot is properly installed and initialized in the same environment. The initialization includes cloud account setup and access verification. Please refer to SkyPilot Installation for more information.
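
As a quick sanity check, SkyPilot's sky check command reports which clouds are accessible with the configured credentials. The snippet below is only a sketch; run it wherever the provider and your cloud credentials are available:

# Report which clouds SkyPilot can access with the configured credentials
sky check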

Configuration

A SkyPilot provider process runs on an Airflow worker, but it stores its metadata on the Airflow master node. This scheme allows a set of consecutive Sky tasks to run across multiple workers by sharing that metadata.

The following settings in docker-compose.yaml define this data sharing, covering cloud credentials, metadata, and the workspace.

x-airflow-common:
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    # mount cloud credentials
    - ${HOME}/.aws:/opt/airflow/sky_home_dir/.aws
    - ${HOME}/.azure:/opt/airflow/sky_home_dir/.azure
    - ${HOME}/.config/gcloud:/opt/airflow/sky_home_dir/.config/gcloud
    - ${HOME}/.scp:/opt/airflow/sky_home_dir/.scp
    # mount sky metadata
    - ${HOME}/.sky:/opt/airflow/sky_home_dir/.sky
    - ${HOME}/.ssh:/opt/airflow/sky_home_dir/.ssh
    # mount sky working dir
    - ${HOME}/sky_workdir:/opt/airflow/sky_home_dir/sky_workdir

This example mounts the cloud credentials for AWS, Azure, GCP, and SCP, which were created during SkyPilot cloud account setup. For the SkyPilot metadata, check that .sky/ and .ssh/ exist in your ${HOME} directory and mount them as well. Additionally, you can mount your own directory, such as sky_workdir/, for user resources including user code and YAML task definition files for SkyPilot execution.
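
For example, a quick host-side check before bringing the containers up might look like the following (a sketch; adjust the credential directories to the clouds you actually use):

# confirm the directories to be mounted exist on the host
ls -d ${HOME}/.sky ${HOME}/.ssh ${HOME}/.aws ${HOME}/.config/gcloud
# workspace for task YAMLs and user code
mkdir -p ${HOME}/sky_workdir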

Note that all Sky directories are mounted under sky_home_dir/. They will be symlinked to ${HOME}/ on the workers, where the SkyPilot provider processes actually run.

Usage

The SkyPilot provider includes the following operators:

  • SkyLaunchOperator
  • SkyExecOperator
  • SkyDownOperator
  • SkySSHOperator
  • SkyRsyncUpOperator
  • SkyRsyncDownOperator

SkyLaunchOperator creates a cloud cluster and executes a Sky task, as shown below:

sky_launch_task = SkyLaunchOperator(
    task_id="sky_launch_task",
    sky_task_yaml="~/sky_workdir/my_task.yaml",
    cloud="cheapest", # aws|azure|gcp|scp|ibm ...
    gpus="A100:1",
    minimum_cpus=16,
    minimum_memory=32,
    auto_down=False,
    sky_home_dir='/opt/airflow/sky_home_dir', #set by default
    dag=dag
)
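
The sky_task_yaml parameter points to a standard SkyPilot task definition. A minimal sketch of such a file is shown below (train.py and requirements.txt are placeholders; see the SkyPilot documentation for the full YAML schema):

# ~/sky_workdir/my_task.yaml -- a minimal SkyPilot task definition (sketch)
resources:
  accelerators: A100:1   # requested accelerator

setup: |
  pip install -r requirements.txt

run: |
  python train.py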

Once SkyLaunchOperator creates a Sky cluster with auto_down=False, the created cluster can be used by the other Sky operators. Please refer to the example DAG for multiple Sky operators running on a single Sky cluster.
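
As a rough illustration of that pattern, the sketch below chains a launch, an exec, and a down task in one DAG. The import path and the SkyExecOperator/SkyDownOperator arguments are assumptions made for illustration only; consult the provider's bundled example DAGs for the actual signatures.

# A hedged sketch only: the import path and the SkyExecOperator/SkyDownOperator
# arguments below are assumptions, not the provider's confirmed API.
from datetime import datetime

from airflow import DAG
# Assumed import path for illustration; check the installed package for the real one.
from airflow_provider_skypilot.operators.skypilot import (
    SkyLaunchOperator,
    SkyExecOperator,
    SkyDownOperator,
)

with DAG(
    dag_id="sky_single_cluster_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    launch = SkyLaunchOperator(
        task_id="sky_launch_task",
        sky_task_yaml="~/sky_workdir/my_task.yaml",
        cloud="cheapest",
        auto_down=False,  # keep the cluster up for the downstream tasks
    )
    run_again = SkyExecOperator(
        task_id="sky_exec_task",
        sky_task_yaml="~/sky_workdir/my_task.yaml",  # parameter name is an assumption
    )
    tear_down = SkyDownOperator(
        task_id="sky_down_task",
    )
    launch >> run_again >> tear_down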

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distribution

airflow_provider_skypilot-0.1.3-py3-none-any.whl (23.5 kB)

Uploaded: Python 3

File details

Details for the file airflow_provider_skypilot-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for airflow_provider_skypilot-0.1.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        dd116fc12c7c31ca4e5358db0b51bbd53eb6cec7a633d4a3d3150184ed0eceec
MD5           b2764ace06294206ac5cf50daede2ed2
BLAKE2b-256   f11f5efdb71537ab63a8fd62f7a8dab386fa008d2374d62094afbf533cbd649a

