ODD integration with Airflow

Open Data Discovery Airflow 2 Integrator

An Airflow plugin that tracks DAGs, tasks, and task runs and sends them to the ODD platform once a DAG is run, using Airflow Listeners.

Requirements

  • Python >= 3.9
  • Airflow >= 2.5.1
  • An HTTP Connection named 'odd'. The connection must have a host property set to your platform's host (fill in the port property if required) and a password field containing the platform collector's token. This connection MUST exist before your scheduler starts; we recommend using AWS Parameter Store, Azure Key Vault, or a similar backend.
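
As an illustration of the fields the 'odd' connection needs, the sketch below serializes them in the JSON form that Airflow accepts for connections defined through `AIRFLOW_CONN_<CONN_ID>` environment variables. This is only one possible way to provide the connection, not the setup this project prescribes; the helper name, host, port, and token are hypothetical placeholders.

```python
import json
from typing import Optional


def build_odd_connection_json(host: str, token: str, port: Optional[int] = None) -> str:
    """Serialize the fields of the 'odd' HTTP connection as a JSON string."""
    conn = {
        "conn_type": "http",  # the plugin expects an HTTP connection
        "host": host,         # your platform's host
        "password": token,    # the platform collector's token
    }
    if port is not None:
        conn["port"] = port   # only needed if the platform uses a non-default port
    return json.dumps(conn)


# Hypothetical values, replace them with your own platform host and collector token.
odd_conn = build_odd_connection_json("odd-platform.example.com", "collector-token", port=8080)

# The variable name AIRFLOW_CONN_ODD maps to the connection id 'odd'.
print(f"AIRFLOW_CONN_ODD='{odd_conn}'")
```

The same fields apply when the connection is stored in a secrets backend; only the storage location differs.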

Installation

The package must be installed in the same environment as Airflow:

poetry add odd-airflow2-integration
# or
pip install odd-airflow2-integration

Lineage

To build proper lineage for tasks, the plugin needs to know the inputs and outputs of each task. We therefore follow the established Airflow lineage concepts and use the inlets and outlets attributes.

The inlets and outlets attributes list the ODDRNs of the datasets that are considered the inputs and outputs of the task.

Example of defining inlets and outlets using TaskFlow:

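from airflow.decorators import task

@task(
    task_id="task_2",
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)
def transform(data_dict: dict):
    pass

# transform requires data_dict, so pass it when instantiating the task
task_2 = transform(data_dict={})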

Example using Operators:

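from airflow.operators.python import PythonOperator

# transform is a plain Python callable here (not decorated with @task)
task_2 = PythonOperator(
    task_id="task_2",
    python_callable=transform,
    op_kwargs={"data_dict": {}},  # argument required by transform
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)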

It is also worth mentioning that neither inlets nor outlets can be templated using the template_fields of Operators that have this option. More information about this topic is presented in the comment section for the following issue.
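
Because inlets and outlets cannot be templated, one option is to build the ODDRN strings in plain Python when the DAG file is parsed. The sketch below assumes the `//airflow/<host>/dags/<dag_id>/tasks/<task_id>` pattern seen in the examples above; the helper `airflow_task_oddrn` is hypothetical and is not part of this package.

```python
def airflow_task_oddrn(host: str, dag_id: str, task_id: str) -> str:
    """Format a task ODDRN following the pattern used in the examples above."""
    return f"//airflow/{host}/dags/{dag_id}/tasks/{task_id}"


# These values are resolved at DAG parse time, so no Jinja templating is involved.
HOST = "internal_host"
DAG_ID = "test_dag"

inlets = [airflow_task_oddrn(HOST, DAG_ID, "task_1")]
outlets = [airflow_task_oddrn(HOST, DAG_ID, "task_3")]

print(inlets)   # ['//airflow/internal_host/dags/test_dag/tasks/task_1']
print(outlets)  # ['//airflow/internal_host/dags/test_dag/tasks/task_3']
```

The resulting lists can be passed directly as the inlets and outlets arguments of a task or Operator.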
