ODD integration with Airflow
Open Data Discovery Airflow 2 Integrator
An Airflow plugin that tracks DAGs, tasks, and task runs and sends them to the ODD Platform whenever a DAG is run, using Airflow Listeners.
Requirements
- Python >= 3.9
- Airflow >= 2.5.1
- An HTTP Connection named 'odd'. The connection must have a host property set to your platform's host (fill in the port property if required) and a password field containing the platform collector's token. This connection MUST exist before your scheduler starts; we recommend using AWS Parameter Store, Azure Key Vault, or a similar secrets backend.
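One way to have the connection in place before the scheduler starts is Airflow's environment-variable convention, where a variable named AIRFLOW_CONN_ODD defines the connection with the id 'odd'. The sketch below only builds the JSON value for that variable. The function name, host, port, and token are hypothetical placeholders, and whether the host should include a scheme is an assumption to check against your platform setup.

```python
import json
from typing import Optional


def odd_connection_json(host: str, token: str, port: Optional[int] = None) -> str:
    """Serialize an HTTP connection in the JSON form Airflow accepts for AIRFLOW_CONN_* variables."""
    conn = {
        "conn_type": "http",
        "host": host,        # ODD Platform host
        "password": token,   # platform collector token
    }
    if port is not None:
        conn["port"] = port  # only set when the platform needs a non-default port
    return json.dumps(conn)


# Placeholder values for illustration only; replace with your own host and token.
value = odd_connection_json("odd-platform.example.com", "my-collector-token", 8080)
print(f"AIRFLOW_CONN_ODD='{value}'")
```

The printed line can be exported in the scheduler's environment. A secrets backend, as recommended above, serves the same purpose without placing the token in plain environment configuration.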
Installation
The package must be installed in the same environment as Airflow:
poetry add odd-airflow2-integration
# or
pip install odd-airflow2-integration
Lineage
To build proper lineage for tasks, the plugin needs to know the inputs and
outputs of each task. It therefore follows Airflow's existing lineage concept
and uses the inlets and outlets attributes.
The inlets and outlets attributes list the ODDRNs of the datasets that are
considered the inputs and outputs of the task.
Example of defining inlets and outlets using TaskFlow:
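from airflow.decorators import task

@task(
    task_id="task_2",
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)
def transform(data_dict: dict):
    pass

# transform requires data_dict; an empty dict stands in for real upstream data
task_2 = transform(data_dict={})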
Example using Operators:
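from airflow.operators.python import PythonOperator

# transform is a plain Python callable defined elsewhere in the DAG file
task_2 = PythonOperator(
    task_id="task_2",
    python_callable=transform,
    inlets=["//airflow/internal_host/dags/test_dag/tasks/task_1"],
    outlets=["//airflow/internal_host/dags/test_dag/tasks/task_3"],
)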
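The ODDRN strings in the examples above share one layout, so they can be built with a small helper instead of being typed by hand. This is a minimal sketch: the function name is hypothetical and the layout is inferred from the example strings only, not taken from an official ODDRN specification.

```python
def airflow_task_oddrn(host: str, dag_id: str, task_id: str) -> str:
    """Build a task ODDRN shaped like the example strings (layout inferred, not official)."""
    return f"//airflow/{host}/dags/{dag_id}/tasks/{task_id}"


# Values taken from the examples above.
inlets = [airflow_task_oddrn("internal_host", "test_dag", "task_1")]
outlets = [airflow_task_oddrn("internal_host", "test_dag", "task_3")]
print(inlets)
print(outlets)
```

The resulting lists can be passed as the inlets and outlets arguments of a task or Operator.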
Note that neither inlets nor outlets can be templated through the
template_fields of Operators that support templating.
More information on this topic is available in the comment section of
the following issue.
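Because inlets and outlets are not rendered as templates, a Jinja expression placed in them would reach the platform as literal text. A small guard, run when the DAG file is parsed, can catch that mistake early. This is only a sketch: the function name is hypothetical and the check simply looks for Jinja delimiters in each string.

```python
from typing import Iterable, List


def reject_templated_oddrns(oddrns: Iterable[str]) -> List[str]:
    """Return the ODDRNs unchanged, raising if any of them contains Jinja delimiters."""
    checked = list(oddrns)
    for oddrn in checked:
        if "{{" in oddrn or "{%" in oddrn:
            raise ValueError(f"inlets/outlets are not templated, got: {oddrn!r}")
    return checked


# A plain string passes through and can be used as an inlets or outlets value.
inlets = reject_templated_oddrns(
    ["//airflow/internal_host/dags/test_dag/tasks/task_1"]
)
print(inlets)
```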
File details
Details for the file odd_airflow2_integration-0.0.8.tar.gz.
File metadata
- Download URL: odd_airflow2_integration-0.0.8.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.1 CPython/3.9.16 Linux/6.2.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02496969d28632510b249a2651d445e939ea58de81c079d4a346303e71f75457 |
| MD5 | c992c897977c7a66d39be6b8f4a6f756 |
| BLAKE2b-256 | 3d8067703f305690ab31662106d0701feac17c1d327b4eb964218dc451edfd32 |
File details
Details for the file odd_airflow2_integration-0.0.8-py3-none-any.whl.
File metadata
- Download URL: odd_airflow2_integration-0.0.8-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.1 CPython/3.9.16 Linux/6.2.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8b10ab7b046634879dce060c8a2092c51058a8644c9ba12f861f50ee0ff05c3 |
| MD5 | 139388990f6486fc0e2f529fe233b6d4 |
| BLAKE2b-256 | 79cd91ab1e7b9c5019b3da8aa3a5d6574911adb53304bcc161637624e5269cc9 |