Datahub Airflow plugin to capture executions and send to Datahub
Datahub Airflow Plugin
See the DataHub Airflow docs for details.
Version Compatibility
The plugin supports Apache Airflow 3.0+. Airflow 2.x is not supported — pin
acryl-datahub-airflow-plugin <= 1.6.0 (the last release with Airflow 2 support)
if you need to integrate with Airflow 2.
| Airflow Version | Status | Notes |
|---|---|---|
| 2.x | ❌ Unsupported | Use version <= 1.6.0 |
| 3.0+ | ✅ Fully Supported | |
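The compatibility table can be expressed as a small helper for scripts that build a requirements file. This is a minimal sketch: the function name `plugin_requirement` is hypothetical, and it only encodes the two rows above plus the `apache-airflow>=3.0.0,<4.0.0` bound listed under Installation.

```python
def plugin_requirement(airflow_version: str) -> str:
    """Return a pip requirement string for the given Airflow version.

    Hypothetical helper; it mirrors the compatibility table and nothing more.
    """
    major = int(airflow_version.split(".")[0])
    if major == 3:
        # Airflow 3.x: current plugin releases are supported.
        return "acryl-datahub-airflow-plugin"
    if major == 2:
        # Airflow 2.x: stay on the last release line with Airflow 2 support.
        return "acryl-datahub-airflow-plugin<=1.6.0"
    raise ValueError(f"No documented plugin release for Airflow {airflow_version}")
```

For example, `plugin_requirement("2.10.5")` yields the pinned requirement, which can be passed to `pip install` in quotes.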
Installation
pip install acryl-datahub-airflow-plugin
This installs:
- acryl-datahub[sql-parser,datahub-rest]: DataHub SDK with SQL parsing and REST emitter
- pydantic>=2.4.0
- apache-airflow>=3.0.0,<4.0.0
- apache-airflow-providers-openlineage>=2.1.0
Optional extras
pip install 'acryl-datahub-airflow-plugin[datahub-kafka]' # Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-file]' # File emitter (testing)
Configuration
The plugin can be configured via airflow.cfg under the [datahub] section. Below are the key configuration options:
Extractor Patching (OpenLineage Enhancements)
When enable_extractors=True (default), the DataHub plugin enhances OpenLineage extractors to provide better lineage. You can fine-tune these enhancements:
[datahub]
# Enable/disable all OpenLineage extractors
enable_extractors = True # Default: True
# Enable multi-statement SQL parsing (resolves temp tables, merges lineage)
enable_multi_statement_sql_parsing = False # Default: False
# Patch SQLParser to use DataHub's advanced SQL parser (enables column-level lineage)
patch_sql_parser = True # Default: True
# Use DataHub's enhancements for specific operators
extract_athena_operator = True # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True # Default: True
Multi-Statement SQL Parsing:
When enable_multi_statement_sql_parsing=True, if a task executes multiple SQL statements (e.g., CREATE TEMP TABLE ...; INSERT ... FROM temp_table;), DataHub parses all statements together and resolves temporary table dependencies within that task. By default (False), only the first statement is parsed.
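The idea of resolving temporary tables across statements can be illustrated with a toy model. This is not DataHub's parser: `StatementLineage` and `merge_lineage` are hypothetical names, and each statement's inputs and output are supplied by hand rather than parsed from SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatementLineage:
    """Lineage of one SQL statement (hypothetical, hand-written for the sketch)."""
    inputs: frozenset
    output: str
    is_temp_output: bool = False


def merge_lineage(statements):
    """Collapse per-statement lineage into task-level lineage.

    Temporary tables are replaced by the upstream tables they were built
    from, so they never appear in the returned mapping.
    """
    temp_sources = {}  # temp table name -> set of real upstream tables
    result = {}        # real output table -> set of real upstream tables
    for stmt in statements:
        resolved = set()
        for table in stmt.inputs:
            # Substitute a temp table with its recorded upstreams.
            resolved |= temp_sources.get(table, {table})
        if stmt.is_temp_output:
            temp_sources[stmt.output] = resolved
        else:
            result.setdefault(stmt.output, set()).update(resolved)
    return result


statements = [
    # CREATE TEMP TABLE tmp_orders AS SELECT ... FROM raw.orders;
    StatementLineage(frozenset({"raw.orders"}), "tmp_orders", is_temp_output=True),
    # INSERT INTO analytics.order_summary SELECT ... FROM tmp_orders JOIN raw.customers;
    StatementLineage(frozenset({"tmp_orders", "raw.customers"}), "analytics.order_summary"),
]
task_lineage = merge_lineage(statements)
```

Here `task_lineage` maps `analytics.order_summary` to `raw.orders` and `raw.customers`; the temporary table is resolved away. Parsing only the first statement would instead yield just the temp table's lineage.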
How patches work:
The DataHub plugin monkey-patches OpenLineage extractors at runtime:
- patch_sql_parser=True patches SQLParser.generate_openlineage_metadata_from_sql() to use DataHub's parser, enabling more accurate lineage and column-level lineage.
- extract_athena_operator / extract_bigquery_insert_job_operator / extract_teradata_operator patch the corresponding operator's get_openlineage_facets_on_complete() method with DataHub's enhanced implementation.
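For readers unfamiliar with monkey-patching, the general technique looks roughly like the sketch below. It is a generic illustration, not the plugin's code: `SQLParserStub`, `enhanced_generate_metadata`, and `apply_patches` are hypothetical stand-ins, and the real OpenLineage classes are not imported.

```python
class SQLParserStub:
    """Hypothetical stand-in for a third-party class whose method gets replaced."""

    def generate_metadata(self, sql: str) -> dict:
        return {"parser": "original", "sql": sql}


def enhanced_generate_metadata(self, sql: str) -> dict:
    """Replacement method; takes `self` because it is bound once assigned to the class."""
    return {"parser": "enhanced", "sql": sql}


def apply_patches(patch_sql_parser: bool = True) -> None:
    """Swap the method on the class at runtime when the flag is enabled."""
    if patch_sql_parser:
        # Assigning to the class attribute affects existing and future instances.
        SQLParserStub.generate_metadata = enhanced_generate_metadata
```

Calling `apply_patches(patch_sql_parser=False)` leaves the original method in place, which is the behavior a disabled flag would be expected to have.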
Example: disable DataHub's SQL parser
[datahub]
enable_extractors = True
patch_sql_parser = False
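Airflow also lets any airflow.cfg option be set through an environment variable named AIRFLOW__{SECTION}__{KEY}. Assuming the plugin reads its settings through Airflow's standard configuration (an assumption, not stated above), the same example could be expressed as environment variables. The helper name `airflow_env_var` is hypothetical.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow uses for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Environment equivalent of the [datahub] example above (values as strings).
datahub_env = {
    airflow_env_var("datahub", "enable_extractors"): "True",
    airflow_env_var("datahub", "patch_sql_parser"): "False",
}
```

This produces `AIRFLOW__DATAHUB__ENABLE_EXTRACTORS` and `AIRFLOW__DATAHUB__PATCH_SQL_PARSER`, which can be exported in the scheduler and worker environments.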
Other Configuration Options
For a complete list of configuration options, see the DataHub Airflow documentation.
Developing
See the developing docs.
File details
Details for the file acryl_datahub_airflow_plugin-1.6.0.1rc1.tar.gz.
File metadata
- Download URL: acryl_datahub_airflow_plugin-1.6.0.1rc1.tar.gz
- Upload date:
- Size: 52.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b725f2523dc4b5c19ea78ea6d09619de29f08ca74de7d2eabd23f81df0d6b0a0 |
| MD5 | b359f3eb15e380e70bd542f16832359f |
| BLAKE2b-256 | aa6aed275d0c6895bdb8b8d6077d889945497157c9ef0768d11b5084080cccd2 |
File details
Details for the file acryl_datahub_airflow_plugin-1.6.0.1rc1-py3-none-any.whl.
File metadata
- Download URL: acryl_datahub_airflow_plugin-1.6.0.1rc1-py3-none-any.whl
- Upload date:
- Size: 70.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 817b0130c66a6dd6d09c17d3250f2f3d50edd523307006515466ccf31d8c260e |
| MD5 | 43cd71277d4a0746a4f6962830360017 |
| BLAKE2b-256 | d3f3c77a9a59476bee459c9860089d2ce4c9f291ba0afd4133c02d78738c81c5 |