DataHub Airflow plugin to capture executions and send them to DataHub
DataHub Airflow Plugin
Capabilities
DataHub supports integration of:
- Airflow Pipeline (DAG) metadata
- DAG and Task run information
- Lineage information when present
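For orientation, here is a minimal sketch of how these entities are typically identified in DataHub. It assumes DataHub's standard URN layouts: a DAG becomes a dataFlow keyed by orchestrator, DAG id, and cluster (the datahub.cluster setting, prod by default), each task becomes a dataJob under that flow, inlet/outlet datasets are keyed by platform, name, and environment, and DAG owners and tags become corpuser and tag URNs. The helper functions and the example DAG, task, and table names are hypothetical illustrations, not the plugin's code; the DataHub Python library ships its own URN builders (for example in datahub.emitter.mce_builder), which real code should prefer.

```python
"""Sketch: how Airflow DAGs, tasks, owners, tags, and lineage datasets map to DataHub URNs.

Assumption: these hypothetical helpers mirror DataHub's standard URN layouts;
they are illustrative stand-ins, not the plugin's own implementation.
"""


def data_flow_urn(dag_id: str, cluster: str = "prod", orchestrator: str = "airflow") -> str:
    # A DAG is represented as a dataFlow: (orchestrator, flow id, cluster).
    return f"urn:li:dataFlow:({orchestrator},{dag_id},{cluster})"


def data_job_urn(dag_id: str, task_id: str, cluster: str = "prod") -> str:
    # A task is a dataJob nested under its DAG's dataFlow URN.
    return f"urn:li:dataJob:({data_flow_urn(dag_id, cluster)},{task_id})"


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Inlets/outlets are datasets: (platform URN, dataset name, environment).
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def corpuser_urn(owner: str) -> str:
    # With capture_ownership_info enabled, a DAG owner maps to a corpuser.
    return f"urn:li:corpuser:{owner}"


def tag_urn(tag: str) -> str:
    # With capture_tags_info enabled, each DAG tag maps to a DataHub tag.
    return f"urn:li:tag:{tag}"


if __name__ == "__main__":
    # Hypothetical DAG, task, owner, tag, and table names used only for illustration.
    print(data_flow_urn("daily_sales"))
    print(data_job_urn("daily_sales", "load_orders"))
    print(dataset_urn("snowflake", "analytics.public.orders"))
    print(corpuser_urn("airflow"))
    print(tag_urn("finance"))
```

Because the cluster is part of every flow and job URN, changing datahub.cluster produces distinct pipeline entities in DataHub rather than overwriting existing ones.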
Installation
- Install the required dependency in your Airflow environment:
pip install acryl-datahub-airflow-plugin
Note: We recommend using the lineage plugin if you are on Airflow version >= 2.0.2, or on MWAA with Airflow version >= 2.0.2.
- Disable lazy plugin loading in your airflow.cfg (that is, set lazy_load_plugins = False under the [core] section):
core.lazy_load_plugins : False
- You must configure an Airflow hook for DataHub. We support both a DataHub REST hook and a Kafka-based hook, but you only need one:
# For REST-based:
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'
# For Kafka-based (standard Kafka sink config can be passed via extras):
airflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'
- Add your datahub_conn_id and/or cluster to your airflow.cfg file if they do not match the default values. See the configuration options below.
Configuration options:
Name | Default value | Description
---|---|---
datahub.datahub_conn_id | datahub_rest_default | The name of the DataHub connection you configured in the previous step.
datahub.cluster | prod | Name of the Airflow cluster.
datahub.capture_ownership_info | true | If true, the owners field of the DAG will be captured as a DataHub corpuser.
datahub.capture_tags_info | true | If true, the tags field of the DAG will be captured as DataHub tags.
datahub.graceful_exceptions | true | If true, most runtime errors in the lineage backend will be suppressed and will not cause the overall task to fail. Note that configuration issues will still throw exceptions.
- Configure inlets and outlets for your Airflow operators. For reference, look at the sample DAG in lineage_backend_demo.py, or see lineage_backend_taskflow_demo.py if you're using the TaskFlow API.
- [Optional] Learn more about Airflow lineage, including shorthand notation and some automation.
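The airflow.cfg changes from the steps above can be checked with Python's standard configparser. This is a minimal sketch, assuming the dotted option names map to INI sections and keys in the usual Airflow way (core.lazy_load_plugins is lazy_load_plugins under [core]; datahub.cluster is cluster under [datahub]). The sample file contents and the read_datahub_settings helper are hypothetical; the fallback values are the defaults from the configuration options table.

```python
"""Sketch: read the DataHub-related settings from an airflow.cfg file.

Assumption: dotted option names map to INI sections/keys
(e.g. datahub.cluster -> [datahub] cluster). The helper is hypothetical.
"""
import configparser

# Boolean options from the configuration table; all default to true.
BOOL_KEYS = ("capture_ownership_info", "capture_tags_info", "graceful_exceptions")

# Example airflow.cfg fragment: lazy plugin loading disabled, custom cluster name.
SAMPLE_CFG = """
[core]
lazy_load_plugins = False

[datahub]
datahub_conn_id = datahub_rest_default
cluster = staging
"""


def read_datahub_settings(cfg_text: str) -> dict:
    """Return the effective [datahub] settings, falling back to the table defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # The installation steps ask for lazy plugin loading to be disabled
    # (Airflow's own default for this option is True).
    if parser.getboolean("core", "lazy_load_plugins", fallback=True):
        raise ValueError("Set lazy_load_plugins = False under [core]")
    return {
        "datahub_conn_id": parser.get("datahub", "datahub_conn_id", fallback="datahub_rest_default"),
        "cluster": parser.get("datahub", "cluster", fallback="prod"),
        # getboolean accepts true/false, yes/no, on/off, and 1/0.
        **{key: parser.getboolean("datahub", key, fallback=True) for key in BOOL_KEYS},
    }


if __name__ == "__main__":
    print(read_datahub_settings(SAMPLE_CFG))
```

Airflow also accepts its settings as environment variables of the form AIRFLOW__{SECTION}__{KEY}, for example AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False, which can be easier to manage in containerized deployments than editing airflow.cfg.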
How to validate installation
- In the Airflow UI, open the Admin -> Plugins menu and check that the DataHub plugin is listed.
- Run an Airflow DAG; the task logs should contain DataHub-related log messages such as:
Emitting Datahub ...
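If you have downloaded a task log as a text file, a minimal sketch like the following can search it for the Emitting Datahub prefix quoted above. The helper names and the log path you pass in are hypothetical, and the exact wording of the plugin's log messages may vary between versions, so treat a miss as a prompt to inspect the log by hand rather than as proof of failure.

```python
"""Sketch: scan an Airflow task log for DataHub emission messages."""
from pathlib import Path
from typing import List

MARKER = "Emitting Datahub"  # prefix of the log line quoted in this README


def find_datahub_messages(log_text: str) -> List[str]:
    """Return the stripped log lines that contain the DataHub emission marker."""
    return [line.strip() for line in log_text.splitlines() if MARKER in line]


def check_log_file(path: str) -> bool:
    """Print matching lines from a local log file; return True if any were found."""
    # Hypothetical path: point this at a task log saved from the Airflow UI.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    matches = find_datahub_messages(text)
    for line in matches:
        print(line)
    return bool(matches)
```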
Additional references
Related DataHub videos:
- Airflow Lineage
- Airflow Run History in DataHub
Download files
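- Source Distribution: acryl-datahub-airflow-plugin-0.8.44.1rc4.tar.gz
- Built Distribution: acryl_datahub_airflow_plugin-0.8.44.1rc4-py3-none-any.whl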
File details
Details for the file acryl-datahub-airflow-plugin-0.8.44.1rc4.tar.gz.
File metadata
- Download URL: acryl-datahub-airflow-plugin-0.8.44.1rc4.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | cf4ff0a8ff50d04315e16984ee3bb785b4b56efa7f79fbba5a4dd35f1f0b71d4
MD5 | 53a8c673b459317df87fa05f461c4e08
BLAKE2b-256 | 08f62f033bc3f7b07090044f75e926c41d8ffd75c47957ebed7f318f74a9d9b0
File details
Details for the file acryl_datahub_airflow_plugin-0.8.44.1rc4-py3-none-any.whl.
File metadata
- Download URL: acryl_datahub_airflow_plugin-0.8.44.1rc4-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1b9995774a01cffda2278220544a164623c43084c63dda7ee2343ebe66509166
MD5 | 971888eb8cd609c22bd17b688f029244
BLAKE2b-256 | c69815c812f0a805968f18c8cec6333d3754f43db455f06da3dc7fed2f61991b
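To check a downloaded file against the SHA256 digests listed above, you can compute its hash with Python's hashlib. This is a minimal sketch; the local file path is an assumption (wherever you saved the download), and the verify helper looks up the expected digest by file name unless you pass one explicitly.

```python
"""Sketch: verify a downloaded distribution against its published SHA256 digest."""
import hashlib
from pathlib import Path
from typing import Optional

# SHA256 digests as listed in the file details for release 0.8.44.1rc4.
EXPECTED_SHA256 = {
    "acryl-datahub-airflow-plugin-0.8.44.1rc4.tar.gz":
        "cf4ff0a8ff50d04315e16984ee3bb785b4b56efa7f79fbba5a4dd35f1f0b71d4",
    "acryl_datahub_airflow_plugin-0.8.44.1rc4-py3-none-any.whl":
        "1b9995774a01cffda2278220544a164623c43084c63dda7ee2343ebe66509166",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: Optional[str] = None) -> bool:
    """Compare the file's digest with the published one (looked up by file name)."""
    if expected is None:
        expected = EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected.lower()
```

pip can also enforce this itself through its hash-checking mode, where each requirement in a requirements file carries a --hash=sha256:... entry.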