
Custom CDK constructs for Apache Airflow

Project description

airflow-cdk

This project makes it simple to deploy Apache Airflow to AWS ECS Fargate using the AWS CDK in Python.

Usage

There are two main ways that this package can be used.

Standalone Package

If you're already familiar with the AWS CDK, add this project as a dependency, i.e. pip install airflow-cdk and/or add it to requirements.txt, and use the FargateAirflow construct like so.

from aws_cdk import core
from airflow_cdk import FargateAirflow


app = core.App()

FargateAirflow(
    app,
    "airflow-cdk",
    postgres_password="replacethiswithasecretpassword")

app.synth()

Then run:

cdk deploy

That's it.

Cloning

You can also clone this repository and alter the FargateAirflow construct to your heart's content.

That also gives you the added benefit of using the tasks.py tasks with invoke to do things like create new DAGs easily, e.g. inv new-dag (see the sketch below).
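For illustration only, here's a minimal sketch of what such an invoke task could look like. The dags/ directory, the DAG template, and the task body here are assumptions for the sake of the example, not the repo's actual tasks.py.

from pathlib import Path

from invoke import task

# bare-bones DAG template used by the example task below
DAG_TEMPLATE = '''\
from datetime import datetime

from airflow import DAG

dag = DAG(
    "{dag_id}",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
)
'''


@task
def new_dag(c, name):
    """Scaffold a new DAG file under dags/ (run as: inv new-dag --name my_dag)."""
    dag_file = Path("dags") / f"{name}.py"
    dag_file.write_text(DAG_TEMPLATE.format(dag_id=name))
    print(f"created {dag_file}")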

You would then also be able to use the existing docker-compose setup for local development, with some minor modifications for your environment.

The easiest way to get started is a one-line change to the app.py example above and another small change to the docker-compose.yml file.

from aws_cdk import core, aws_ecs
from airflow_cdk import FargateAirflow


app = core.App()

FargateAirflow(
    app,
    "airflow-cdk",
    postgres_password="replacethiswithasecretpassword",
    # this is the only change to make when cloning
    base_image=aws_ecs.ContainerImage.from_asset("."))

app.synth()

Then, in the docker-compose.yml file, simply delete, comment out, or change the image: knowsuchagency/airflow-cdk line in the x-airflow block, for example as sketched below.
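Roughly, and assuming your clone keeps the same x-airflow structure as the repository's docker-compose.yml (the &airflow anchor here is an assumption), the change might look like this:

x-airflow: &airflow
  # image: knowsuchagency/airflow-cdk   # comment out or remove the published image...
  build: .                              # ...and build the local Dockerfile instead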

Now cdk deploy will deploy the same container that docker-compose build creates to ECS for your web, worker, and scheduler services.

Components

The following AWS resources will be deployed within the same ECS cluster and VPC by default:

  • an Airflow webserver task
    • with an internet-facing application load balancer
  • an Airflow scheduler task
  • an Airflow worker task
    • (note) it will auto-scale based on CPU and memory usage, from 1 instance up to 16 at a time by default
  • a RabbitMQ broker
    • with an application load balancer that lets you log in to the RabbitMQ management console using the default user/password guest/guest
  • an RDS instance
  • an S3 bucket for logs

Why is this awesome?

Apart from the fact that we're able to describe our infrastructure using the same language and codebase we use to author our DAGs?

Since we're using CloudFormation under the hood, whenever we change a part of our code or infrastructure, only what differs from our last deployment will be deployed.

Meaning, if all we do is alter the code we want to run on our deployment, we simply re-build and publish our Docker container (which is done for us if we use aws_ecs.ContainerImage.from_asset(".")) prior to cdk deploy!

Existing Airflow users will know how tricky it can be to manage deployments when you want to distinguish between pushing changes to your codebase (i.e. DAGs) and actual infrastructure deployments. Now there's basically no distinction.

We just have to be careful not to deploy while some long-running worker task we don't want to interrupt is in flight, since Fargate will replace those worker instances with new ones running our updated code.

Notes

  • Before running cdk destroy, you will want to empty the S3 bucket that's created; otherwise the command may fail at that stage and the bucket can be left in a state that makes it difficult to delete later on (see the sketch below)
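As a rough, illustrative sketch (not part of this project), assuming you have boto3 installed and know the generated bucket's name, emptying the bucket could look like this:

import boto3


def empty_bucket(bucket_name: str) -> None:
    """Remove every object (and object version) so `cdk destroy` can delete the bucket."""
    bucket = boto3.resource("s3").Bucket(bucket_name)
    bucket.object_versions.delete()  # clears versions if versioning is enabled
    bucket.objects.all().delete()    # clears any remaining objects


empty_bucket("replace-with-your-log-bucket-name")

For a non-versioned bucket, running aws s3 rm s3://<your-bucket-name> --recursive from the AWS CLI accomplishes the same thing.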

TODOs

  • create a custom component to deploy Airflow to an EC2 cluster
  • improve documentation
  • (possibly) subsume the Airflow stable Helm chart as a cdk8s chart

Contributions Welcome!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

airflow_cdk-0.6.0.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

airflow_cdk-0.6.0-py3-none-any.whl (12.9 kB)

Uploaded Python 3

File details

Details for the file airflow_cdk-0.6.0.tar.gz.

File metadata

  • Download URL: airflow_cdk-0.6.0.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for airflow_cdk-0.6.0.tar.gz

  • SHA256: b07b80f7f2adf93a23b356f8a8c3857ded8ece767df6a6416f7eba1db1995327
  • MD5: 7e33a56b1d3fb07944155bbfab360bf9
  • BLAKE2b-256: 0a7800908e34dde551f53119da36efa7a5e6afdb082be2a14cc1ea3329aaebc1

See more details on using hashes here.

File details

Details for the file airflow_cdk-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: airflow_cdk-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for airflow_cdk-0.6.0-py3-none-any.whl

  • SHA256: e96946ddbb340cce491367dd343e27e41486f235b73a4ed86d659cf71d58986f
  • MD5: c26c05cb032b2d0278d37289ce11e28c
  • BLAKE2b-256: fede5072d1c6212e5ebf0ba10ad00c0f0b67a86d7817c02aa2a78ecf452ba39c

See more details on using hashes here.
