ODD integration with Airflow

Open Data Discovery Airflow 2 Integrator

An Airflow plugin that tracks DAGs, tasks, and task runs and sends them to the platform whenever a DAG is run. Tracking is implemented with Airflow Listeners.

Requirements

  • Python >= 3.9
  • Airflow >= 2.5.1
  • An HTTP Connection named 'odd'. The connection must have a host property set to your platform's host (fill in the port property if required) and a password field containing the platform collector's token. This connection MUST exist before your scheduler starts; we recommend using AWS Parameter Store, Azure Key Vault, or a similar secrets backend.
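One way to make the 'odd' connection available before the scheduler starts is Airflow's AIRFLOW_CONN_<CONN_ID> environment variable, which accepts a JSON-serialized connection in Airflow 2.3 and later. The sketch below only builds that JSON value; the host, port, and token are placeholders, and the helper name build_odd_connection is hypothetical and not part of this package. A secrets backend, as recommended above, remains the preferred option for real deployments.

```python
import json
from typing import Optional


def build_odd_connection(host: str, token: str, port: Optional[int] = None) -> str:
    """Return a JSON-serialized Airflow HTTP connection for the ODD platform.

    Hypothetical helper for illustration; the field names follow Airflow's
    JSON connection format (conn_type, host, port, password).
    """
    conn = {"conn_type": "http", "host": host, "password": token}
    if port is not None:
        conn["port"] = port
    return json.dumps(conn)


if __name__ == "__main__":
    # Placeholder values: replace with your platform host and collector token.
    value = build_odd_connection("odd-platform.example.com", "collector-token", 8080)
    # The connection id 'odd' maps to the variable name AIRFLOW_CONN_ODD.
    print(f"export AIRFLOW_CONN_ODD='{value}'")
```

The printed export line would need to be present in the scheduler's environment (and in the workers', depending on your executor) before Airflow starts.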

Installation

The package must be installed in the same environment as Airflow:

poetry add odd-airflow2-integration
# or
pip install odd-airflow2-integration

Lineage

To build proper lineage for tasks, we need a way to deliver information about the inputs and outputs of each task. We decided to follow the established Airflow concept for lineage and use the inlets and outlets attributes.

The inlets and outlets attributes list the ODDRNs of the datasets that are considered the inputs and outputs of the task.

Example of defining inlets and outlets using TaskFlow:

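from airflow.decorators import task

@task(
    task_id="task_2",
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)
def transform(data_dict: dict):
    pass

# Call inside a DAG definition; an empty dict stands in for real input data.
task_2 = transform(data_dict={})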

Example using Operators:

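from airflow.operators.python import PythonOperator

# transform is a plain Python callable defined elsewhere in the DAG file.
task_2 = PythonOperator(
    task_id="task_2",
    python_callable=transform,
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)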

Note also that neither inlets nor outlets can be templated through the template_fields of Operators that support templating. More information on this topic is available in the comment section of the following issue.
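Since templating is not available for these attributes, their values have to be plain Python values by the time the operator is constructed. A minimal sketch of building them with a small helper is shown here. It assumes the ODDRN pattern seen in the examples above; the helper name task_oddrn is hypothetical and not part of this package, and the pattern is copied from those examples rather than taken from the ODDRN specification.

```python
def task_oddrn(host: str, dag_id: str, task_id: str) -> str:
    """Build an ODDRN string following the pattern used in the examples above.

    Hypothetical helper for illustration only.
    """
    return f"//airflow/{host}/dags/{dag_id}/tasks/{task_id}"


# Computed once when the DAG file is parsed, then passed as inlets/outlets.
INLETS = [task_oddrn("internal_host", "test_dag", "task_1")]
OUTLETS = [task_oddrn("internal_host", "test_dag", "task_3")]
```

The resulting lists can be passed directly as the inlets and outlets arguments of @task or of an Operator.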
