DataHub Airflow plugin that captures executions and sends them to DataHub

DataHub Airflow Plugin

See the DataHub Airflow docs for details.

Version Compatibility

The plugin supports Apache Airflow 3.0+. Airflow 2.x is not supported; to integrate with Airflow 2, pin acryl-datahub-airflow-plugin<=1.6.0, the last release with Airflow 2 support.

Airflow Version   Status               Notes
2.x               ❌ Unsupported       Use version <= 1.6.0
3.0+              ✅ Fully Supported
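
The compatibility table can be expressed as a small check. The following is a minimal sketch, not part of the plugin: the helper name recommended_plugin_pin is hypothetical, and it only encodes the Airflow 2.x and 3.x rows above, raising an error for anything else.

```python
def recommended_plugin_pin(airflow_version: str) -> str:
    """Return a pip requirement string for the given Airflow version.

    Hypothetical helper for illustration; it encodes only the
    compatibility table in this README.
    """
    major = int(airflow_version.split(".")[0])
    if major == 3:
        # Airflow 3.0+ is supported by current plugin releases.
        return "acryl-datahub-airflow-plugin"
    if major == 2:
        # 1.6.0 is the last release with Airflow 2 support.
        return "acryl-datahub-airflow-plugin<=1.6.0"
    raise ValueError(f"Airflow {airflow_version} is not covered by the table")


pin_for_airflow_3 = recommended_plugin_pin("3.0.2")
pin_for_airflow_2 = recommended_plugin_pin("2.10.5")
```

In a real environment, the installed version string could be obtained with importlib.metadata.version("apache-airflow") and passed to the helper.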

Installation

pip install acryl-datahub-airflow-plugin

This installs:

  • acryl-datahub[sql-parser,datahub-rest] — DataHub SDK with SQL parsing and REST emitter
  • pydantic>=2.4.0
  • apache-airflow>=3.0.0,<4.0.0
  • apache-airflow-providers-openlineage>=2.1.0

Optional extras

pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'   # Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-file]'    # File emitter (testing)

Configuration

The plugin can be configured via airflow.cfg under the [datahub] section. Below are the key configuration options:

Extractor Patching (OpenLineage Enhancements)

When enable_extractors=True (the default), the DataHub plugin patches OpenLineage extractors to produce more accurate lineage, including column-level lineage. You can fine-tune these enhancements:

[datahub]
# Enable/disable all OpenLineage extractors
enable_extractors = True  # Default: True

# Enable multi-statement SQL parsing (resolves temp tables, merges lineage)
enable_multi_statement_sql_parsing = False  # Default: False

# Patch SQLParser to use DataHub's advanced SQL parser (enables column-level lineage)
patch_sql_parser = True  # Default: True

# Use DataHub's enhancements for specific operators
extract_athena_operator = True              # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True            # Default: True

Multi-Statement SQL Parsing:

When enable_multi_statement_sql_parsing=True, if a task executes multiple SQL statements (e.g., CREATE TEMP TABLE ...; INSERT ... FROM temp_table;), DataHub parses all statements together and resolves temporary table dependencies within that task. By default (False), only the first statement is parsed.
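
To illustrate what resolving temporary table dependencies means, here is a conceptual sketch. It is not DataHub's parser: the Statement type and resolve_lineage function are hypothetical, and each SQL statement is assumed to have already been reduced to its input tables, its output table, and a flag marking temporary tables.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """Simplified, hypothetical stand-in for one parsed SQL statement."""

    inputs: frozenset[str]
    output: str
    is_temp: bool  # True if the statement creates a temporary table


def resolve_lineage(statements: list[Statement]) -> dict[str, set[str]]:
    """Map each permanent output table to its non-temporary upstream tables."""
    temp_sources: dict[str, set[str]] = {}
    lineage: dict[str, set[str]] = {}
    for stmt in statements:
        upstreams: set[str] = set()
        for table in stmt.inputs:
            # Replace a temp table with the tables it was built from.
            upstreams |= temp_sources.get(table, {table})
        if stmt.is_temp:
            temp_sources[stmt.output] = upstreams
        else:
            lineage.setdefault(stmt.output, set()).update(upstreams)
    return lineage


# CREATE TEMP TABLE tmp AS SELECT ... FROM src;
# INSERT INTO tgt SELECT ... FROM tmp;
statements = [
    Statement(frozenset({"src"}), "tmp", is_temp=True),
    Statement(frozenset({"tmp"}), "tgt", is_temp=False),
]
resolved = resolve_lineage(statements)
```

Here resolved links tgt directly to src, with the temporary table tmp folded away. With the default setting, only the first statement would be parsed, so the INSERT into tgt would not contribute lineage.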

How patches work:

The DataHub plugin monkey-patches OpenLineage extractors at runtime:

  • patch_sql_parser=True patches SQLParser.generate_openlineage_metadata_from_sql() to use DataHub's parser, enabling more accurate lineage and column-level lineage.
  • extract_athena_operator / extract_bigquery_insert_job_operator / extract_teradata_operator patch the corresponding operator's get_openlineage_facets_on_complete() method with DataHub's enhanced implementation.
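
The general monkey-patching technique can be sketched as follows. This is an illustration only, not the plugin's code: SQLParserStandIn, enhanced_generate_metadata, and apply_patches are hypothetical names standing in for the real OpenLineage class and DataHub's replacement implementation.

```python
class SQLParserStandIn:
    """Hypothetical stand-in for a library class (not the real SQLParser)."""

    def generate_metadata(self, sql: str) -> str:
        return f"default parser: {sql}"


def enhanced_generate_metadata(self, sql: str) -> str:
    """Hypothetical replacement implementation."""
    return f"enhanced parser: {sql}"


def apply_patches(patch_sql_parser: bool) -> None:
    """Swap the method on the class at runtime when the flag is enabled."""
    if patch_sql_parser:
        # Assigning to the class attribute affects existing and future instances.
        SQLParserStandIn.generate_metadata = enhanced_generate_metadata


parser = SQLParserStandIn()
before = parser.generate_metadata("SELECT 1")
apply_patches(patch_sql_parser=True)
after = parser.generate_metadata("SELECT 1")
```

Because the method is replaced on the class rather than on one instance, callers elsewhere pick up the new behavior without code changes, which is why a configuration flag is enough to switch it on or off.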

Example: disable DataHub's SQL parser

[datahub]
enable_extractors = True
patch_sql_parser = False
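
Airflow also accepts configuration options as environment variables named AIRFLOW__{SECTION}__{KEY}. The sketch below builds that name for a [datahub] option; it assumes the plugin reads its settings through Airflow's standard configuration mechanism, and the helper airflow_env_var is hypothetical.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow checks for a config option."""
    # Airflow's convention: AIRFLOW__<SECTION>__<KEY>, upper-cased,
    # with double underscores as separators.
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("datahub", "patch_sql_parser")
```

Under that assumption, setting the resulting variable to False in the scheduler and worker environment should have the same effect as the airflow.cfg example above.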

Other Configuration Options

For a complete list of configuration options, see the DataHub Airflow documentation.

Developing

See the developing docs.
