
Datahub Airflow plugin to capture executions and send to Datahub

Project description

Datahub Airflow Plugin

Capabilities

DataHub supports integrating the following from Airflow:

  • Airflow Pipeline (DAG) metadata
  • DAG and Task run information
  • Lineage information when present

Installation

  1. Install the required dependency in your Airflow environment:

    pip install acryl-datahub-airflow-plugin

::: note

We recommend you use the lineage plugin if you are on Airflow version >= 2.0.2 or on MWAA with an Airflow version >= 2.0.2.

:::

  2. Disable lazy plugin loading in your airflow.cfg (the lazy_load_plugins option in the [core] section):

    core.lazy_load_plugins : False

  3. You must configure an Airflow hook for Datahub. We support both a Datahub REST hook and a Kafka-based hook, but you only need one.

    # For REST-based:
    airflow connections add  --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'
    # For Kafka-based (standard Kafka sink config can be passed via extras):
    airflow connections add  --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'

  4. Add your datahub_conn_id and/or cluster to your airflow.cfg file if they differ from the default values. See the configuration options below.

    Configuration options:

    | Name | Default value | Description |
    |------|---------------|-------------|
    | datahub.datahub_conn_id | datahub_rest_default | The name of the Datahub connection you set in step 3. |
    | datahub.cluster | prod | Name of the Airflow cluster. |
    | datahub.capture_ownership_info | true | If true, the owners field of the DAG will be captured as a DataHub corpuser. |
    | datahub.capture_tags_info | true | If true, the tags field of the DAG will be captured as DataHub tags. |
    | datahub.graceful_exceptions | true | If set to true, most runtime errors in the lineage backend will be suppressed and will not cause the overall task to fail. Note that configuration issues will still throw exceptions. |

  5. Configure inlets and outlets for your Airflow operators. For reference, look at the sample DAG in lineage_backend_demo.py, or reference lineage_backend_taskflow_demo.py if you're using the TaskFlow API.

  6. [optional] Learn more about Airflow lineage, including shorthand notation and some automation.
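
To double-check the airflow.cfg settings described in the steps above, a small standalone script can help. The following is only a sketch, not part of the plugin: the function names and the sample file contents are hypothetical, the defaults are copied from the configuration options listed above, and it assumes the options live in a [datahub] section. It reads only the file text and ignores any environment-variable overrides Airflow may apply.

```python
import configparser

# Defaults copied from the configuration options listed above.
DATAHUB_DEFAULTS = {
    "datahub_conn_id": "datahub_rest_default",
    "cluster": "prod",
    "capture_ownership_info": "true",
    "capture_tags_info": "true",
    "graceful_exceptions": "true",
}

# Hypothetical airflow.cfg fragment, used only for illustration.
SAMPLE_CFG = """
[core]
lazy_load_plugins = False

[datahub]
datahub_conn_id = datahub_kafka_default
cluster = staging
"""


def _parse(cfg_text: str) -> configparser.ConfigParser:
    # Interpolation is disabled so that '%' characters in a real
    # airflow.cfg do not raise errors in this simple reader.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser


def read_datahub_settings(cfg_text: str) -> dict:
    """Return the [datahub] settings, falling back to the documented defaults."""
    parser = _parse(cfg_text)
    settings = dict(DATAHUB_DEFAULTS)
    if parser.has_section("datahub"):
        settings.update(parser["datahub"])
    return settings


def lazy_loading_disabled(cfg_text: str) -> bool:
    """Return True only if [core] lazy_load_plugins is explicitly set to False."""
    parser = _parse(cfg_text)
    # Airflow treats lazy_load_plugins as True when it is not set.
    return not parser.getboolean("core", "lazy_load_plugins", fallback=True)


if __name__ == "__main__":
    print(read_datahub_settings(SAMPLE_CFG))
    print("lazy loading disabled:", lazy_loading_disabled(SAMPLE_CFG))
```

To check a real installation, pass the text of your own airflow.cfg instead of SAMPLE_CFG.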

How to validate installation

  1. In the Airflow UI, open the Admin -> Plugins menu and check that the Datahub plugin is listed.
  2. Run an Airflow DAG. The task logs should contain Datahub-related messages such as:

    Emitting Datahub ...
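
If you prefer to check a downloaded task log from a script instead of reading it by eye, something like the following sketch could work. It is not part of the plugin. The function name and the sample log text are hypothetical, and the only thing taken from this page is the "Emitting Datahub" marker; the exact wording and format of real log lines may differ, so the match is case-insensitive.

```python
from typing import List

# Marker text taken from the example log message above.
MARKER = "emitting datahub"

# Hypothetical task log contents, used only for illustration.
SAMPLE_LOG = """\
INFO - Running task run_data_task
INFO - Emitting Datahub ...
INFO - Task finished
"""


def find_datahub_log_lines(log_text: str) -> List[str]:
    """Return the log lines that mention the Datahub emission marker."""
    return [line for line in log_text.splitlines() if MARKER in line.lower()]


if __name__ == "__main__":
    hits = find_datahub_log_lines(SAMPLE_LOG)
    if hits:
        print(f"Found {len(hits)} Datahub log line(s):")
        for line in hits:
            print("  " + line)
    else:
        print("No Datahub log lines found; check the plugin installation.")
```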

Additional references

Related Datahub videos: Airflow Lineage; Airflow Run History in DataHub


Download files


Source Distribution

acryl-datahub-airflow-plugin-0.8.43.tar.gz (7.8 kB)

Built Distribution


acryl_datahub_airflow_plugin-0.8.43-py3-none-any.whl (7.2 kB, Python 3)

File details

Details for the file acryl-datahub-airflow-plugin-0.8.43.tar.gz.

File hashes

Hashes for acryl-datahub-airflow-plugin-0.8.43.tar.gz

Algorithm     Hash digest
SHA256        3bc5c93b7cba9852af0a32f1e6c005ab69d65daf024e01d7417daa35aba83f7c
MD5           1611687b75053c8f3509a405c636ea6f
BLAKE2b-256   b43de16082f9ad7863a01d114205e5f01dd1064be43c12991e5fe62aee2e498a

File details

Details for the file acryl_datahub_airflow_plugin-0.8.43-py3-none-any.whl.

File hashes

Hashes for acryl_datahub_airflow_plugin-0.8.43-py3-none-any.whl

Algorithm     Hash digest
SHA256        5b429ec81c452c5f9d4c0113f3a82e27d93de28a97da864c6fa281568df5b80d
MD5           db653140a1e05b410878775cccb36c2d
BLAKE2b-256   715f5e0ff3f5ebe5cf9fcf02ade74f91eac805a7ec6da7d871052f14293fa460
