DataHub Prefect plugin — automatically capture flow lineage and run metadata from Prefect workflows into your DataHub catalog
DataHub Prefect Plugin
Automatically capture lineage and run metadata from Prefect in DataHub: flow structure, task inputs and outputs, and run history, with minimal setup.
What you can do
- Emit flow and task metadata to DataHub, including a record of each pipeline run
- Capture dataset lineage: declare the input and output datasets of each task and see the resulting lineage in DataHub
- Configure via Prefect blocks: store your DataHub connection settings once as a reusable block
- Works with any DataHub deployment, self-hosted or DataHub Cloud
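Lineage in DataHub is recorded between dataset entities, each identified by a URN that combines the data platform, the dataset name, and the environment. The sketch below shows the general shape of such a URN, assuming DataHub's standard `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` format. The helper `dataset_urn` is hypothetical and is not part of prefect-datahub; the plugin's own conversion (for example, how it applies a platform instance) may differ.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN (hypothetical helper, illustration only)."""
    # The data platform is itself an entity with its own URN
    platform_urn = f"urn:li:dataPlatform:{platform}"
    # A dataset URN is the tuple (platform URN, dataset name, environment)
    return f"urn:li:dataset:({platform_urn},{name},{env})"


print(dataset_urn("snowflake", "mydb.schema.source_table"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.source_table,PROD)
```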
Installation
```shell
pip install prefect-datahub
```
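To confirm the package is installed in the same Python environment your flows run in, you can query its distribution metadata with the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name, not something the plugin provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "prefect-datahub") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "prefect-datahub is not installed in this environment")
```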
Quickstart
1. Save your DataHub connection as a Prefect block
```python
from prefect_datahub.datahub_emitter import DatahubEmitter

# Run once: saves the connection settings as a block named "my-datahub"
DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="PROD",
).save("my-datahub")
```
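Rather than hard-coding connection details (especially a token) in the script that saves the block, you can read them from the environment. This is a minimal sketch: the environment variable names `DATAHUB_GMS_URL`, `DATAHUB_ENV`, `DATAHUB_PLATFORM_INSTANCE`, and `DATAHUB_TOKEN` and the helper `emitter_settings_from_env` are assumptions for illustration; the plugin does not read these variables itself. The keys of the returned dict mirror the block fields listed under Configuration options.

```python
import os
from typing import Mapping, Optional


def emitter_settings_from_env(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect DatahubEmitter keyword arguments from (hypothetical) environment variables."""
    env = os.environ if environ is None else environ
    settings = {
        # Defaults match the values shown in the configuration table
        "datahub_rest_url": env.get("DATAHUB_GMS_URL", "http://localhost:8080"),
        "env": env.get("DATAHUB_ENV", "PROD"),
    }
    # Only pass optional fields that are actually set
    optional = (("platform_instance", "DATAHUB_PLATFORM_INSTANCE"), ("token", "DATAHUB_TOKEN"))
    for field_name, var_name in optional:
        value = env.get(var_name)
        if value:
            settings[field_name] = value
    return settings
```

With this helper in scope, the save step could become `DatahubEmitter(**emitter_settings_from_env()).save("my-datahub")`.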
2. Use it in your flows
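```python
from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load the block saved in step 1
emitter = DatahubEmitter.load("my-datahub")

@task
def extract():
    # Placeholder extract step; replace with your own source read
    return "raw data"

@task
def transform(data):
    # Declare this task's input and output datasets
    emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.source_table")],
        outputs=[Dataset("snowflake", "mydb.schema.output_table")],
    )
    return data

@flow
def my_pipeline():
    data = extract()
    transform(data)
    # Required: emits all collected metadata, so call it at the end of the flow
    emitter.emit_flow()

if __name__ == "__main__":
    my_pipeline()
```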
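Because `emit_flow()` emits all metadata at the end, lineage declared with `add_task` is effectively held until that call. The toy model below illustrates this collect-then-flush pattern. `LineageBuffer` and its methods are hypothetical: it sends nothing to DataHub and is not the plugin's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LineageBuffer:
    """Toy stand-in for collect-then-emit lineage (hypothetical, illustration only)."""

    # task name -> (input dataset names, output dataset names)
    pending: Dict[str, Tuple[List[str], List[str]]] = field(default_factory=dict)
    emitted: List[Tuple[str, List[str], List[str]]] = field(default_factory=list)

    def add_task(self, task_name: str, inputs: List[str], outputs: List[str]) -> None:
        # Nothing leaves the process here; lineage is only recorded in memory
        self.pending[task_name] = (list(inputs), list(outputs))

    def emit_flow(self) -> int:
        # Flush everything collected so far in one go, then clear the buffer
        for task_name, (inputs, outputs) in self.pending.items():
            self.emitted.append((task_name, inputs, outputs))
        count = len(self.pending)
        self.pending.clear()
        return count


buffer = LineageBuffer()
buffer.add_task("transform", ["mydb.schema.source_table"], ["mydb.schema.output_table"])
print(len(buffer.emitted))  # 0: nothing is flushed until emit_flow() runs
print(buffer.emit_flow())   # 1
```

In this model, a flow that raises before reaching `emit_flow()` flushes nothing, which is why the call belongs at the end of the flow.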
Configuration options
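| Option | Default | Description |
|---|---|---|
| `datahub_rest_url` | `http://localhost:8080` | URL of the DataHub GMS (metadata service) REST endpoint |
| `env` | `PROD` | Environment applied to emitted assets (for example `PROD` or `DEV`) |
| `platform_instance` | `None` | Platform instance applied to emitted assets |
| `token` | `None` | Access token; required only if authentication is enabled on GMS |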
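The options above can be mirrored in a small typed settings object that catches obvious mistakes before the block is saved. This is a minimal sketch: `EmitterSettings` and its checks are assumptions, not part of prefect-datahub, and the defaults are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class EmitterSettings:
    """Hypothetical mirror of the DatahubEmitter block options (illustration only)."""

    datahub_rest_url: str = "http://localhost:8080"
    env: str = "PROD"
    platform_instance: Optional[str] = None
    token: Optional[str] = None

    def __post_init__(self) -> None:
        # The GMS URL should be an absolute http(s) URL such as http://localhost:8080
        parsed = urlparse(self.datahub_rest_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(
                f"datahub_rest_url must be an http(s) URL, got {self.datahub_rest_url!r}"
            )
        # An empty environment name is almost certainly a mistake
        if not self.env:
            raise ValueError("env must be a non-empty string")
```

Assuming the block accepts the field names listed in the table, `DatahubEmitter(**dataclasses.asdict(settings))` would then pass the validated values through.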
Download files
- Source distribution: prefect_datahub-1.6.0.10rc4.tar.gz
- Built distribution: prefect_datahub-1.6.0.10rc4-py3-none-any.whl
File details
Details for the file prefect_datahub-1.6.0.10rc4.tar.gz.
File metadata
- Download URL: prefect_datahub-1.6.0.10rc4.tar.gz
- Upload date:
- Size: 13.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1dac72b38f9dbcb704a74baf12cd56f2f40bb80dde2b8804c8623bdc2e8b947e |
| MD5 | ee39066ff190c3ed4a39561d778fe102 |
| BLAKE2b-256 | a9c3fab240f1f3bcea0cc29b80fd937509f5d2442daa8537459b24064ec99a92 |
File details
Details for the file prefect_datahub-1.6.0.10rc4-py3-none-any.whl.
File metadata
- Download URL: prefect_datahub-1.6.0.10rc4-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8645d2e0b0ef857a5e90a08dd3f79461b7a9c944e0a12a0a3e491ab261d62f4e |
| MD5 | 84634a6f99dd3049cb8b7efefefd6f99 |
| BLAKE2b-256 | 64db4f7ac7de91e4200ba808de9ba5cc1239c2e5e3fc40b48ef9c2138f47f40b |