
DataHub Airflow plugin to capture executions and send them to DataHub

Project description

DataHub Airflow Plugin

See the DataHub Airflow docs for details.

Version Compatibility

The plugin supports Apache Airflow 3.0+. Airflow 2.x is not supported; if you need to integrate with Airflow 2, pin acryl-datahub-airflow-plugin<=1.6.0, the last release with Airflow 2 support.

Airflow Version    Status               Notes
2.x                ❌ Unsupported       Use plugin version <= 1.6.0
3.0+               ✅ Fully supported
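As a small illustration of the compatibility table, the hypothetical helper below (plugin_requirement is not part of the plugin) maps an Airflow version string to the pip requirement the table suggests. Inside an Airflow environment you could pass it airflow.__version__.

```python
def plugin_requirement(airflow_version: str) -> str:
    """Hypothetical helper: pick a pip requirement from the compatibility table."""
    major = int(airflow_version.split(".")[0])
    if major == 2:
        # Airflow 2.x: 1.6.0 is the last plugin release with Airflow 2 support.
        return "acryl-datahub-airflow-plugin<=1.6.0"
    if major == 3:
        # Airflow 3.x: current releases apply (the plugin requires apache-airflow>=3.0.0,<4.0.0).
        return "acryl-datahub-airflow-plugin"
    raise ValueError(f"Airflow {airflow_version} is not covered by the compatibility table")
```

For example, plugin_requirement("2.10.5") yields the pinned requirement, while plugin_requirement("3.0.2") yields the unpinned package name.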

Installation

pip install acryl-datahub-airflow-plugin

This installs:

  • acryl-datahub[sql-parser,datahub-rest] — DataHub SDK with SQL parsing and REST emitter
  • pydantic>=2.4.0
  • apache-airflow>=3.0.0,<4.0.0
  • apache-airflow-providers-openlineage>=2.1.0
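To confirm that these packages landed in the same Python environment as Airflow, a minimal sketch using the standard library's importlib.metadata can report what is installed. The distribution names come from the list above; the helper name installed_versions is hypothetical, and extras such as sql-parser or datahub-rest are not separate distributions, so they are not checked here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names taken from the installation list.
EXPECTED = [
    "acryl-datahub-airflow-plugin",
    "acryl-datahub",
    "pydantic",
    "apache-airflow",
    "apache-airflow-providers-openlineage",
]


def installed_versions(dists: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(EXPECTED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```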

Optional extras

pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'   # Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-file]'    # File emitter (testing)

Configuration

The plugin can be configured via airflow.cfg under the [datahub] section. Below are the key configuration options:

Extractor Patching (OpenLineage Enhancements)

When enable_extractors=True (default), the DataHub plugin enhances OpenLineage extractors to provide better lineage. You can fine-tune these enhancements:

[datahub]
# Enable/disable all OpenLineage extractors
enable_extractors = True  # Default: True

# Enable multi-statement SQL parsing (resolves temp tables, merges lineage)
enable_multi_statement_sql_parsing = False  # Default: False

# Patch SQLParser to use DataHub's advanced SQL parser (enables column-level lineage)
patch_sql_parser = True  # Default: True

# Use DataHub's enhancements for specific operators
extract_athena_operator = True              # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True            # Default: True
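The snippet above is ordinary INI syntax. As an offline sanity check, the sketch below parses it with Python's standard configparser and fills in the documented defaults for any missing key. This is only an illustration under stated assumptions: Airflow reads airflow.cfg through its own configuration layer, and read_datahub_flags and DEFAULTS are hypothetical names, not the plugin's loader.

```python
import configparser

# The [datahub] snippet above, including its inline "# Default: ..." comments.
SNIPPET = """
[datahub]
enable_extractors = True  # Default: True
enable_multi_statement_sql_parsing = False  # Default: False
patch_sql_parser = True  # Default: True
extract_athena_operator = True              # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True            # Default: True
"""

# Documented defaults for the options listed above.
DEFAULTS = {
    "enable_extractors": True,
    "enable_multi_statement_sql_parsing": False,
    "patch_sql_parser": True,
    "extract_athena_operator": True,
    "extract_bigquery_insert_job_operator": True,
    "extract_teradata_operator": True,
}


def read_datahub_flags(text: str) -> dict:
    """Parse a [datahub] section, falling back to the documented defaults."""
    # inline_comment_prefixes makes configparser drop the trailing "# Default: ..." text;
    # without it, a value would be the whole string "True  # Default: True".
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return {
        key: parser.getboolean("datahub", key, fallback=default)
        for key, default in DEFAULTS.items()
    }
```

For example, a section containing only enable_extractors = True and patch_sql_parser = False yields patch_sql_parser as False, while the three operator flags keep their default of True.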

Multi-Statement SQL Parsing:

When enable_multi_statement_sql_parsing=True, if a task executes multiple SQL statements (e.g., CREATE TEMP TABLE ...; INSERT ... FROM temp_table;), DataHub parses all statements together and resolves temporary table dependencies within that task. By default (False), only the first statement is parsed.
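To make the difference concrete, here is a toy sketch of temp-table resolution. It is not DataHub's SQL parser: the two regular expressions only recognize CREATE TEMP TABLE ... FROM and INSERT INTO ... FROM, splitting on ";" is naive, and task_lineage plus the table names in the usage note are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Toy patterns covering only two statement shapes; NOT DataHub's parser.
CREATE_TEMP = re.compile(r"CREATE\s+TEMP(?:ORARY)?\s+TABLE\s+(\w+)\b.*?\bFROM\s+([\w.]+)", re.I | re.S)
INSERT_SELECT = re.compile(r"INSERT\s+INTO\s+([\w.]+)\b.*?\bFROM\s+([\w.]+)", re.I | re.S)


def task_lineage(sql: str, multi_statement: bool) -> Tuple[Set[str], Set[str]]:
    """Return (input tables, output tables) for one task's SQL script."""
    # Naive split: breaks on semicolons inside string literals.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    if not multi_statement:
        # Mirrors the documented default: only the first statement is parsed.
        statements = statements[:1]

    inputs: Set[str] = set()
    outputs: Set[str] = set()
    temp_sources: Dict[str, Set[str]] = {}  # temp table -> upstream tables it was built from
    for stmt in statements:
        temp = CREATE_TEMP.search(stmt)
        match = temp or INSERT_SELECT.search(stmt)
        if match is None:
            continue  # statement shape not handled by this toy example
        target, source = match.group(1), match.group(2)
        # If the source is a temp table created earlier in this task, use its upstreams.
        upstream = temp_sources.get(source, {source})
        if temp and multi_statement:
            temp_sources[target] = upstream  # keep temp tables out of the final lineage
        else:
            inputs |= upstream
            outputs.add(target)
    return inputs, outputs
```

With the script "CREATE TEMP TABLE tmp AS SELECT * FROM src.orders; INSERT INTO analytics.daily SELECT * FROM tmp;", multi_statement=False reports src.orders feeding tmp, while multi_statement=True reports src.orders feeding analytics.daily, with the temp table resolved away.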

How patches work:

The DataHub plugin monkey-patches OpenLineage extractors at runtime:

  • patch_sql_parser=True patches SQLParser.generate_openlineage_metadata_from_sql() to use DataHub's parser, enabling more accurate lineage and column-level lineage.
  • extract_athena_operator / extract_bigquery_insert_job_operator / extract_teradata_operator patch the corresponding operator's get_openlineage_facets_on_complete() method with DataHub's enhanced implementation.
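The mechanism itself is standard Python class-level monkey-patching. The sketch below is a generic illustration, not DataHub's code: ExampleOperator, patch_facets_method and the "source" key are hypothetical, and it assumes a wrapper that calls the original method, whereas the real plugin may replace the method outright.

```python
import functools


class ExampleOperator:
    """Hypothetical stand-in for an Airflow operator with an OpenLineage hook."""

    def get_openlineage_facets_on_complete(self, task_instance):
        return {"source": "operator", "inputs": []}


def patch_facets_method(cls):
    """Hypothetical helper: wrap cls.get_openlineage_facets_on_complete at runtime."""
    original = cls.get_openlineage_facets_on_complete

    @functools.wraps(original)
    def patched(self, task_instance):
        result = original(self, task_instance)  # keep the operator's own output
        enhanced = dict(result)
        enhanced["source"] = "enhanced"  # placeholder for extra lineage
        return enhanced

    cls.get_openlineage_facets_on_complete = patched
    return original  # returned so the patch can be undone
```

Because the attribute is replaced on the class rather than on an instance, every instance picks up the new method, including instances created before the patch ran; assigning the returned original back to the class restores the previous behavior.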

Example: disable DataHub's SQL parser

[datahub]
enable_extractors = True
patch_sql_parser = False

Other Configuration Options

For a complete list of configuration options, see the DataHub Airflow documentation.

Developing

See the developing docs.


Download files


Source Distribution

acryl_datahub_airflow_plugin-1.6.0.1rc2.tar.gz (52.6 kB)

Built Distribution

acryl_datahub_airflow_plugin-1.6.0.1rc2-py3-none-any.whl


File hashes

Hashes for acryl_datahub_airflow_plugin-1.6.0.1rc2.tar.gz
Algorithm Hash digest
SHA256 b043c906bc38ca2875dc7c3e4128ecd45d5c6a4205bbf8ae3f7b717c3541aec1
MD5 360491dea4bfd4f63607812c668b43d2
BLAKE2b-256 4bbec3c614765fd032850a66b531bbcb6bae94e280c9142a5650c2134e182cc4


Hashes for acryl_datahub_airflow_plugin-1.6.0.1rc2-py3-none-any.whl
Algorithm Hash digest
SHA256 ac5b592c7836dc9a1c99d70a819798ceef5ed72ba642ce32403115f7af00c1c2
MD5 af9ef1e6719e39acc045f702a42055c4
BLAKE2b-256 ed9df542ed3b2f84d39a14317b4fd6f7e962e555eb8b33f159566f559e1b7cca

