VertFlow

Run Docker containers on Airflow using green energy

Video Demo

📖 About

VertFlow is an Airflow operator for running Cloud Run Jobs on Google Cloud Platform in green data centres.
Cloud Run is a serverless container runtime: you bring your own Docker image and emit carbon only while the job is running. This is easier, cheaper and greener than managing a Kubernetes cluster that runs 24/7.

Not all data centres are created equal.
Data centres run on electricity generated from various sources, including fossil fuels that produce harmful carbon emissions. Some data centres are greener than others, using electricity from renewable sources such as wind and hydro.
When you deploy a container on Airflow using the VertFlow operator, it will run your container in the greenest GCP data centre possible.

ℹ️ Use in tandem with Cloud Composer 2 to save even more money and CO2.

🔧 How to install

  1. pip install VertFlow on your Airflow instance.
  2. Ensure your Airflow scheduler has outbound access to the public internet.
  3. Get a free API Key for the CO2 Signal API.

ℹ️ If you're using Cloud Composer, follow these instructions to install VertFlow from PyPI.

ℹ️ If you're using Cloud Composer with Private IP, follow these instructions to set up internet access.

🖱 How to use

Use the VertFlowOperator to instantiate a task in your DAG. Provide:

  • The address of the Docker image to run.
  • A runtime specification, e.g. timeout and memory limits.
  • A set of allowed regions to run the job in, based on latency, data governance and other considerations. VertFlow picks the greenest one.
from datetime import datetime

from VertFlow.operator import VertFlowOperator
from airflow import DAG

with DAG(
    dag_id="hourly_dag_in_green_region",
    schedule_interval="@hourly",
    start_date=datetime(2022, 1, 1),  # Airflow needs a start_date to schedule runs
) as dag:
    # Runs an hourly hello-world job in whichever allowed region
    # currently has the greenest electricity grid.
    task = VertFlowOperator(
        image_address="us-docker.pkg.dev/cloudrun/container/job:latest",
        project_id="embroidered-elephant-739",
        name="hello-world",
        allowed_regions=["europe-west1", "europe-west4"],
        command="echo",
        arguments=["Hello World"],
        service_account_email_address="my-service-account@embroidered-elephant-739.iam.gserviceaccount.com",
        co2_signal_api_key="5bbWXo9PQv3outh45E4fsLHwgsXvf1Z",
        # ...
    )
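The API key above is hard-coded for brevity. To keep it out of source control, one option (a standard Airflow pattern, not a VertFlow requirement; the variable name here is just an example) is to store the key as an Airflow Variable and read it when the DAG is parsed:

from airflow.models import Variable

# Fetch the key from an Airflow Variable, e.g. one set beforehand with:
#   airflow variables set co2_signal_api_key <your-key>
co2_signal_api_key = Variable.get("co2_signal_api_key")

Then pass co2_signal_api_key=co2_signal_api_key to the VertFlowOperator.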

🤷 Limitations

  • Cloud Run Jobs is not yet Generally Available. Production use is not advised. It also has a series of limitations, e.g. tasks can run for no longer than 1 hour.
  • The container running the Cloud Run Job cannot (yet) access resources on a VPC.
  • VertFlow (currently) assumes no emissions from transmitting data between regions. These may in fact be non-trivial if storage and compute are far from each other. Charges may also be incurred in this scenario.

🔌🗺 Shout out to CO2 Signal

VertFlow works thanks to real-time global carbon intensity data, gifted to the world for non-commercial use by CO2 Signal.
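For a flavour of how such a lookup works, here is a minimal sketch, not VertFlow's actual internals: it asks the CO2 Signal API for the current carbon intensity of the grid behind each candidate region and picks the lowest. The region-to-country mapping is a hypothetical example.

import requests

CO2_SIGNAL_URL = "https://api.co2signal.com/v1/latest"

# Hypothetical mapping from GCP region to the country code CO2 Signal expects.
REGION_COUNTRIES = {"europe-west1": "BE", "europe-west4": "NL"}


def carbon_intensity(region: str, api_key: str) -> float:
    """Current grid carbon intensity (gCO2eq/kWh) for a region's country."""
    response = requests.get(
        CO2_SIGNAL_URL,
        params={"countryCode": REGION_COUNTRIES[region]},
        headers={"auth-token": api_key},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["data"]["carbonIntensity"]


def greenest_region(regions: list, api_key: str) -> str:
    """Pick the allowed region whose grid currently emits the least CO2."""
    return min(regions, key=lambda region: carbon_intensity(region, api_key))

For example, greenest_region(["europe-west1", "europe-west4"], api_key) returns whichever of the two regions sits on the cleaner grid right now.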

🤝 How to contribute

Found a bug or fancy resolving one of the limitations? We welcome Pull Requests!
