
DataHub Airflow plugin to capture executions and send them to DataHub

DataHub Airflow Plugin

See the DataHub Airflow docs for details.

Version Compatibility

The plugin supports Apache Airflow 3.0 and later. Airflow 2.x is not supported. To integrate with Airflow 2, pin acryl-datahub-airflow-plugin <= 1.6.0, the last release with Airflow 2 support.

Airflow Version   Status               Notes
2.x               ❌ Unsupported       Use version <= 1.6.0
3.0+              ✅ Fully Supported
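
As a rough illustration of the compatibility rule above, the following sketch maps an Airflow version string to a pip requirement. The helper name plugin_requirement is hypothetical and not part of the plugin; the pins come only from the table and the dependency range listed in this README.

```python
def plugin_requirement(airflow_version: str) -> str:
    """Return a pip requirement string for the given Airflow version.

    Hypothetical helper: it only encodes the compatibility table above.
    """
    major = int(airflow_version.split(".")[0])
    if major == 2:
        # Airflow 2.x: stay on the last plugin releases that supported it.
        return "acryl-datahub-airflow-plugin<=1.6.0"
    if major == 3:
        # Airflow 3.x: the current plugin release is supported.
        return "acryl-datahub-airflow-plugin"
    # Anything else is outside what this README documents.
    raise ValueError(f"No documented plugin support for Airflow {airflow_version}")


if __name__ == "__main__":
    for ver in ("2.10.4", "3.0.1"):
        print(ver, "->", plugin_requirement(ver))
```

The output string can be passed to pip install or written into a requirements file.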

Installation

pip install acryl-datahub-airflow-plugin

This installs:

  • acryl-datahub[sql-parser,datahub-rest] — DataHub SDK with SQL parsing and REST emitter
  • pydantic>=2.4.0
  • apache-airflow>=3.0.0,<4.0.0
  • apache-airflow-providers-openlineage>=2.1.0
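
To confirm what actually landed in an environment after installation, a small check with the standard library's importlib.metadata may help. This is a minimal sketch: the package list is copied from the bullets above, and the helper installed_version is a name introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names taken from the dependency list above.
PACKAGES = [
    "acryl-datahub-airflow-plugin",
    "acryl-datahub",
    "pydantic",
    "apache-airflow",
    "apache-airflow-providers-openlineage",
]


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name}: {installed_version(name) or 'not installed'}")
```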

Optional extras

pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'   # Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-file]'    # File emitter (testing)

Configuration

The plugin can be configured via airflow.cfg under the [datahub] section. Below are the key configuration options:

Extractor Patching (OpenLineage Enhancements)

When enable_extractors=True (default), the DataHub plugin enhances OpenLineage extractors to provide better lineage. You can fine-tune these enhancements:

[datahub]
# Enable/disable all OpenLineage extractors
enable_extractors = True  # Default: True

# Enable multi-statement SQL parsing (resolves temp tables, merges lineage)
enable_multi_statement_sql_parsing = False  # Default: False

# Patch SQLParser to use DataHub's advanced SQL parser (enables column-level lineage)
patch_sql_parser = True  # Default: True

# Use DataHub's enhancements for specific operators
extract_athena_operator = True              # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True            # Default: True
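
To sanity-check a [datahub] section before deploying it, the options can be read back with Python's standard configparser. This is only a sketch: Airflow uses its own configuration loader, so the parsing here (including the handling of inline # comments) is an assumption about how the file should read, not a description of Airflow's behavior. The function name load_datahub_flags is hypothetical.

```python
import configparser

# A shortened copy of the section shown above, inline comments included.
SAMPLE = """
[datahub]
enable_extractors = True  # Default: True
enable_multi_statement_sql_parsing = False  # Default: False
patch_sql_parser = True  # Default: True
"""


def load_datahub_flags(text: str) -> dict:
    """Parse the [datahub] section and return its options as booleans."""
    # inline_comment_prefixes makes the stdlib parser drop "  # ..." suffixes.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    section = parser["datahub"]
    return {key: section.getboolean(key) for key in section}


if __name__ == "__main__":
    for key, value in load_datahub_flags(SAMPLE).items():
        print(f"{key} = {value}")
```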

Multi-Statement SQL Parsing:

When enable_multi_statement_sql_parsing=True, if a task executes multiple SQL statements (e.g., CREATE TEMP TABLE ...; INSERT ... FROM temp_table;), DataHub parses all statements together and resolves temporary table dependencies within that task. By default (False), only the first statement is parsed.
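
The difference between the two modes can be illustrated with a toy model. The sketch below is not DataHub's parser: it assumes each statement has already been reduced to input and output table sets, and the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class StatementLineage:
    """Tables read and written by one SQL statement (toy representation)."""
    inputs: set
    outputs: set
    creates_temp: bool = False  # True if the outputs are temporary tables


def first_statement_only(statements):
    """Mimic the default mode: lineage comes from the first statement alone."""
    first = statements[0]
    return set(first.inputs), set(first.outputs)


def merge_lineage(statements):
    """Mimic multi-statement mode: merge all statements, hide temp tables."""
    temp_tables, inputs, outputs = set(), set(), set()
    for stmt in statements:
        if stmt.creates_temp:
            temp_tables |= stmt.outputs
        inputs |= stmt.inputs
        outputs |= stmt.outputs
    # Temp tables are internal to the task, so drop them from both sides.
    return inputs - temp_tables, outputs - temp_tables


# CREATE TEMP TABLE tmp_orders AS SELECT ... FROM orders;
# INSERT INTO order_summary SELECT ... FROM tmp_orders;
STATEMENTS = [
    StatementLineage({"orders"}, {"tmp_orders"}, creates_temp=True),
    StatementLineage({"tmp_orders"}, {"order_summary"}),
]

if __name__ == "__main__":
    print("first only:", first_statement_only(STATEMENTS))
    print("merged:    ", merge_lineage(STATEMENTS))
```

In this toy case the default mode reports orders feeding tmp_orders, while the merged view reports orders feeding order_summary, which is the lineage a reader usually cares about.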

How patches work:

The DataHub plugin monkey-patches OpenLineage extractors at runtime:

  • patch_sql_parser=True patches SQLParser.generate_openlineage_metadata_from_sql() to use DataHub's parser, enabling more accurate lineage and column-level lineage.
  • extract_athena_operator / extract_bigquery_insert_job_operator / extract_teradata_operator patch the corresponding operator's get_openlineage_facets_on_complete() method with DataHub's enhanced implementation.
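
For readers unfamiliar with monkey-patching, the general technique looks roughly like the sketch below. It uses a stand-in class rather than the real OpenLineage SQLParser, and every name in it (StandInParser, generate_metadata, apply_patch) is hypothetical; the plugin's actual patching code may differ.

```python
class StandInParser:
    """Hypothetical stand-in for a class whose method gets patched."""

    def generate_metadata(self, sql: str) -> dict:
        return {"parser": "original", "sql": sql}


def enhanced_generate_metadata(self, sql: str) -> dict:
    """Replacement method with the same signature as the original."""
    return {"parser": "enhanced", "sql": sql, "column_lineage": True}


def apply_patch(enabled: bool) -> None:
    """Swap the method on the class at runtime when the flag is set."""
    if enabled:
        # Assigning on the class affects all existing and future instances.
        StandInParser.generate_metadata = enhanced_generate_metadata


if __name__ == "__main__":
    parser = StandInParser()
    print(parser.generate_metadata("SELECT 1"))  # uses the original method
    apply_patch(enabled=True)
    print(parser.generate_metadata("SELECT 1"))  # now uses the replacement
```

Because the replacement is assigned on the class, a flag such as patch_sql_parser only needs to be consulted once, when the patch is applied.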

Example: disable DataHub's SQL parser

[datahub]
enable_extractors = True
patch_sql_parser = False
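
Airflow options can generally also be supplied as environment variables named AIRFLOW__{SECTION}__{KEY}. Assuming the plugin reads its [datahub] options through Airflow's standard configuration (an assumption, not something this README states), the equivalent variable name can be derived as sketched here. The helper airflow_env_var is a name made up for this example.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option."""
    # Airflow's convention: AIRFLOW__<SECTION>__<KEY>, upper case,
    # with double underscores as separators.
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


if __name__ == "__main__":
    name = airflow_env_var("datahub", "patch_sql_parser")
    # Prints a shell line that would mirror the airflow.cfg example above.
    print(f"export {name}=False")
```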

Other Configuration Options

For a complete list of configuration options, see the DataHub Airflow documentation.

Developing

See the developing docs.


