Machine Learning Orchestration

Dbnd Airflow Operator

This plugin was written to provide an explicit way of declaratively passing messages between two airflow operators.

This plugin was inspired by AIP-31. Essentially, it connects dbnd's implementation of tasks and pipelines to airflow operators.

This implementation uses XCom communication and XCom templates to transfer these messages. The plugin is fully functional; however, it will support all edge cases only once AIP-31 is implemented.
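To make the mechanism concrete, here is a minimal, self-contained sketch of the idea behind XCom templates. It does not use airflow or dbnd, and it is not the plugin's code: the in-memory `xcom_store`, the `run_task` helper, and the simplified `{{ xcom:<task_id> }}` placeholder syntax are all hypothetical stand-ins. Real airflow renders Jinja templates such as `{{ ti.xcom_pull(task_ids='...') }}` at execution time.

```python
import re

# Hypothetical in-memory stand-in for airflow's XCom storage.
xcom_store = {}

# Simplified placeholder syntax, assumed for illustration only.
PLACEHOLDER = re.compile(r"\{\{\s*xcom:(\w+)\s*\}\}")


def render(value):
    """Replace each placeholder in a string with the stored result of that task."""
    if not isinstance(value, str):
        return value
    return PLACEHOLDER.sub(lambda m: str(xcom_store[m.group(1)]), value)


def run_task(task_id, func, **kwargs):
    """Render templated arguments, run the callable, and store its return value."""
    rendered = {name: render(value) for name, value in kwargs.items()}
    xcom_store[task_id] = func(**rendered)
    # Return a reference to the result (a template string), not the result itself.
    return "{{ xcom:%s }}" % task_id


def my_task(p_int=3):
    return "success_%s" % p_int


def consumer(p_str):
    return p_str + "_extra_postfix"


# The value handed from one task to the next is a template string that is
# resolved against the store only when the downstream task runs.
t1 = run_task("my_task", my_task, p_int=2)
t2 = run_task("consumer", consumer, p_str=t1)
final_value = xcom_store["consumer"]
```

Unlike this sketch, which runs each callable immediately, airflow only builds the graph at definition time and resolves the templates when the scheduler executes each task.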

Fully tested on airflow 1.10.X.

Code Example

Here is an example of how we achieve our goal:

import logging
from typing import Tuple
from datetime import timedelta, datetime
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.operators.python_operator import PythonOperator
from dbnd import task

# Define arguments that we will pass to our DAG
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": days_ago(2),
    "retries": 1,
    "retry_delay": timedelta(seconds=10),
}


@task
def my_task(p_int=3, p_str="check", p_int_with_default=0) -> str:
    logging.info("I am running")
    return "success"


@task
def my_multiple_outputs(p_str="some_string") -> Tuple[int, str]:
    return (1, p_str + "_extra_postfix")


def some_python_function(input_path, output_path):
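    logging.info("I am running")
    # Use a context manager so the input file is closed after reading
    with open(input_path, "r") as input_file:
        input_value = input_file.read()
    with open(output_path, "w") as output_file:
        output_file.write(input_value)
        output_file.write("\n\n")
        # Append a timestamp so each run's output is distinguishable
        output_file.write(datetime.now().strftime("%Y-%m-%dT%H:%M:%S"))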
    return "success"

# Define DAG context
with DAG(dag_id="dbnd_operators", default_args=default_args) as dag_operators:
    # t1, t2 and t3 are examples of tasks created by instantiating operators
    # All tasks and operators created under this DAG context will be collected as a part of this DAG
    t1 = my_task(2)
    t2, t3 = my_multiple_outputs(t1)
    python_op = PythonOperator(
        task_id="some_python_function",
        python_callable=some_python_function,
        op_kwargs={"input_path": t3, "output_path": "/tmp/output.txt"},
    )
    # t3.op is the operator used to execute my_multiple_outputs.
    # This call makes the some_python_function operator depend on t3's operator.
    python_op.set_upstream(t3.op)

As you can see, messages are passed explicitly between all three tasks:

  • t1, the result of the first task, is passed to the next task, my_multiple_outputs
  • t2 and t3 represent the results of my_multiple_outputs
  • some_python_function is wrapped with an operator
  • The new python operator is explicitly defined as dependent on t3's execution (that is, downstream of it)

Note: If you run a function marked with the @task decorator outside a DAG context, and without using the dbnd library to run it, it executes as an ordinary Python function.

Passing arguments between tasks this way not only improves the developer experience but also supports pipeline execution for many use cases. It does not break existing DAGs.

Using dbnd_config

Let's look at the example again, but change the default_args defined at the very top:

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": days_ago(2),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "dbnd_config": {
        "my_task.p_int_with_default": 4,
    },
}

We added a new key-value pair, dbnd_config, to the arguments.

dbnd_config is expected to be a dictionary of configuration settings that are passed to your tasks. For example, the dbnd_config in this code section overrides the int parameter p_int_with_default of my_task, changing it from its default value of 0 to 4.
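The override semantics can be modeled in a few lines of plain Python. This is only an illustration, not dbnd's implementation: the `apply_config` helper is hypothetical, it assumes keys of the form `<task name>.<parameter name>` as in the example, and it assumes that values passed explicitly by the caller take precedence over the configuration.

```python
import inspect


def apply_config(func, config, *args, **kwargs):
    """Call func, letting "<func name>.<param>" entries in config replace defaults.

    Hypothetical helper for illustration; works for functions whose
    parameters can all be passed by keyword.
    """
    signature = inspect.signature(func)
    # Map the caller's explicit positional and keyword arguments to parameter names.
    explicit = signature.bind_partial(*args, **kwargs).arguments
    prefix = func.__name__ + "."
    overrides = {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix) and key[len(prefix):] in signature.parameters
    }
    # Explicit arguments win over config; config wins over declared defaults.
    return func(**{**overrides, **explicit})


def my_task(p_int=3, p_str="check", p_int_with_default=0):
    return (p_int, p_str, p_int_with_default)


dbnd_config = {"my_task.p_int_with_default": 4}

# p_int is passed explicitly, p_str keeps its default,
# and p_int_with_default is taken from the config.
result = apply_config(my_task, dbnd_config, 2)
```

Here `result` holds p_int from the call, p_str from the function's default, and p_int_with_default from the configuration dictionary.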

To see further possibilities for changing configuration settings, see our documentation.



