DataHub Airflow plugin that captures Airflow executions and sends them to DataHub

DataHub Airflow Plugin

See the DataHub Airflow docs for details.

Version Compatibility

The plugin supports Apache Airflow 3.0+. Airflow 2.x is not supported. If you need to integrate with Airflow 2, pin acryl-datahub-airflow-plugin <= 1.6.0, the last release with Airflow 2 support.

Airflow Version   Status               Notes
2.x               ❌ Unsupported        Use version <= 1.6.0
3.0+              ✅ Fully Supported
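
The compatibility rule above can be expressed as a small helper that picks a pip requirement string from an Airflow version. This is only a sketch: the function name plugin_requirement is hypothetical and not part of the plugin, and it assumes the version string starts with a numeric major version.

```python
def plugin_requirement(airflow_version: str) -> str:
    """Return a pip requirement for the plugin, given an Airflow version string.

    Hypothetical helper for illustration; it encodes only the rule stated in
    the compatibility table (Airflow 2.x -> plugin <= 1.6.0, Airflow 3.x -> current).
    """
    major = int(airflow_version.split(".")[0])
    if major == 2:
        # Last plugin release line that still supports Airflow 2.
        return "acryl-datahub-airflow-plugin<=1.6.0"
    if major == 3:
        # Current releases depend on apache-airflow>=3.0.0,<4.0.0.
        return "acryl-datahub-airflow-plugin"
    raise ValueError(f"No documented plugin support for Airflow {airflow_version}")
```

For example, plugin_requirement("2.10.5") yields the pinned requirement, while plugin_requirement("3.0.1") yields the unpinned package name.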

Installation

pip install acryl-datahub-airflow-plugin

This installs:

  • acryl-datahub[sql-parser,datahub-rest] — DataHub SDK with SQL parsing and REST emitter
  • pydantic>=2.4.0
  • apache-airflow>=3.0.0,<4.0.0
  • apache-airflow-providers-openlineage>=2.1.0

Optional extras

pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'   # Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-file]'    # File emitter (testing)

Configuration

The plugin can be configured via airflow.cfg under the [datahub] section. Below are the key configuration options:

Extractor Patching (OpenLineage Enhancements)

When enable_extractors=True (default), the DataHub plugin enhances the OpenLineage extractors to produce more accurate lineage, including column-level lineage when the SQL parser patch is enabled. You can fine-tune these enhancements:

[datahub]
# Enable/disable all OpenLineage extractors
enable_extractors = True  # Default: True

# Enable multi-statement SQL parsing (resolves temp tables, merges lineage)
enable_multi_statement_sql_parsing = False  # Default: False

# Patch SQLParser to use DataHub's advanced SQL parser (enables column-level lineage)
patch_sql_parser = True  # Default: True

# Use DataHub's enhancements for specific operators
extract_athena_operator = True              # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True            # Default: True

Multi-Statement SQL Parsing:

When enable_multi_statement_sql_parsing=True, if a task executes multiple SQL statements (e.g., CREATE TEMP TABLE ...; INSERT ... FROM temp_table;), DataHub parses all statements together and resolves temporary table dependencies within that task. By default (False), only the first statement is parsed.
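
To make the difference between the two modes concrete, here is a toy sketch. It is not DataHub's SQL parser: it uses naive regular expressions and a naive semicolon split, and every name in it (toy_lineage, SCRIPT, the table names) is made up for illustration. It only mimics the behavior described above: first statement only versus all statements with temp tables resolved.

```python
import re

# Toy regexes; a real SQL parser handles quoting, CTEs, dialects, and more.
TEMP_RE = re.compile(r"CREATE\s+TEMP(?:ORARY)?\s+TABLE\s+([\w.]+)", re.IGNORECASE)
INSERT_RE = re.compile(r"INSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def toy_lineage(sql: str, multi_statement: bool) -> tuple[set[str], set[str]]:
    """Return (inputs, outputs) for a semicolon-separated SQL script."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    if not multi_statement:
        statements = statements[:1]  # only the first statement is considered

    temp_sources: dict[str, set[str]] = {}
    inputs: set[str] = set()
    outputs: set[str] = set()
    for stmt in statements:
        # Replace any temp table read here with the tables it was built from.
        sources: set[str] = set()
        for name in SOURCE_RE.findall(stmt):
            sources |= temp_sources.get(name, {name})

        temp = TEMP_RE.search(stmt)
        if temp and multi_statement:
            temp_sources[temp.group(1)] = sources  # remember it, do not emit it
            continue
        target = temp or INSERT_RE.search(stmt)
        if target:
            inputs |= sources
            outputs.add(target.group(1))
    return inputs, outputs


SCRIPT = """
CREATE TEMP TABLE staging AS SELECT * FROM raw.orders;
INSERT INTO analytics.daily SELECT * FROM staging;
"""
```

With this toy model, toy_lineage(SCRIPT, multi_statement=True) links raw.orders directly to analytics.daily, while toy_lineage(SCRIPT, multi_statement=False) only sees the first statement and reports raw.orders feeding the temporary table staging.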

How patches work:

The DataHub plugin monkey-patches OpenLineage extractors at runtime:

  • patch_sql_parser=True patches SQLParser.generate_openlineage_metadata_from_sql() to use DataHub's parser, enabling more accurate lineage and column-level lineage.
  • extract_athena_operator / extract_bigquery_insert_job_operator / extract_teradata_operator patch the corresponding operator's get_openlineage_facets_on_complete() method with DataHub's enhanced implementation.
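
The general mechanism behind these patches, replacing a method on a class at runtime behind a configuration flag, can be sketched as follows. This is not the plugin's code: StandInSQLParser, enhanced_generate, and apply_patches are hypothetical stand-ins, and the method signature is simplified compared with the real OpenLineage one.

```python
class StandInSQLParser:
    """Hypothetical stand-in for an OpenLineage SQL parser class."""

    def generate_openlineage_metadata_from_sql(self, sql: str) -> dict:
        # Default behavior before any patch is applied.
        return {"parser": "default", "sql": sql}


def enhanced_generate(self, sql: str) -> dict:
    """Replacement implementation, standing in for an enhanced parser."""
    return {"parser": "enhanced", "sql": sql}


def apply_patches(patch_sql_parser: bool) -> None:
    """Swap the method on the class only when the flag is enabled."""
    if patch_sql_parser:
        # Assigning to the class attribute affects every instance,
        # including ones created before the patch was applied.
        StandInSQLParser.generate_openlineage_metadata_from_sql = enhanced_generate
```

Calling apply_patches(False) leaves the default method in place, and calling apply_patches(True) makes every StandInSQLParser instance use the replacement. This mirrors how a flag such as patch_sql_parser can switch an enhancement on or off without changing the calling code.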

Example: disable DataHub's SQL parser

[datahub]
enable_extractors = True
patch_sql_parser = False
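
Airflow also lets any airflow.cfg option be set through an environment variable named AIRFLOW__{SECTION}__{KEY}. Assuming the plugin reads its options through Airflow's standard configuration system, the override above could be expressed as environment variables. The helper name airflow_env_var below is hypothetical.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow uses for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Same settings as the [datahub] example above, keyed by environment variable.
overrides = {"enable_extractors": "True", "patch_sql_parser": "False"}
env = {airflow_env_var("datahub", key): value for key, value in overrides.items()}

for name, value in env.items():
    print(f"export {name}={value}")
```

This form is convenient in containerized deployments where editing airflow.cfg is impractical.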

Other Configuration Options

For a complete list of configuration options, see the DataHub Airflow documentation.

Developing

See the developing docs.

Download files

Source Distribution

acryl_datahub_airflow_plugin-1.6.0.1rc1.tar.gz (52.6 kB)

Built Distribution

acryl_datahub_airflow_plugin-1.6.0.1rc1-py3-none-any.whl

File details

Details for the file acryl_datahub_airflow_plugin-1.6.0.1rc1.tar.gz.

File hashes

Hashes for acryl_datahub_airflow_plugin-1.6.0.1rc1.tar.gz
Algorithm     Hash digest
SHA256        b725f2523dc4b5c19ea78ea6d09619de29f08ca74de7d2eabd23f81df0d6b0a0
MD5           b359f3eb15e380e70bd542f16832359f
BLAKE2b-256   aa6aed275d0c6895bdb8b8d6077d889945497157c9ef0768d11b5084080cccd2

File details

Details for the file acryl_datahub_airflow_plugin-1.6.0.1rc1-py3-none-any.whl.

File hashes

Hashes for acryl_datahub_airflow_plugin-1.6.0.1rc1-py3-none-any.whl
Algorithm     Hash digest
SHA256        817b0130c66a6dd6d09c17d3250f2f3d50edd523307006515466ccf31d8c260e
MD5           43cd71277d4a0746a4f6962830360017
BLAKE2b-256   d3f3c77a9a59476bee459c9860089d2ce4c9f291ba0afd4133c02d78738c81c5
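
A downloaded file can be checked against a published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do it; the function names sha256_of and matches_published_hash are hypothetical, and the file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, pass the path of the downloaded .tar.gz or .whl file together with the matching SHA256 value from the tables above. A False result means the file does not match the published release.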
