DataHub Prefect plugin — automatically capture flow lineage and run metadata from Prefect workflows into your DataHub catalog

DataHub Prefect Plugin

The plugin automatically sends lineage and run metadata from Prefect to DataHub. It captures flow structure, task inputs and outputs, and run history with minimal setup.

What you can do

  • Emit flow and task metadata to DataHub as pipeline runs
  • Capture dataset lineage — declare inputs and outputs per task and see them in DataHub
  • Configure via Prefect blocks — store your DataHub connection settings as a reusable block
  • Works with any DataHub deployment — self-hosted or DataHub Cloud

Installation

pip install prefect-datahub
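
To confirm that the install succeeded, the installed version can be read from package metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is made up for this example and is not part of the plugin.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "prefect-datahub") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "prefect-datahub is not installed in this environment")
```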

Quickstart

1. Save your DataHub connection as a Prefect block

from prefect_datahub.datahub_emitter import DatahubEmitter

DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="PROD",
).save("my-datahub")
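
When GMS auth is enabled, the block also needs a token, and it is usually better to read it from the environment than to hard-code it. The sketch below shows one way to do that. The environment variable names (`DATAHUB_GMS_URL`, `DATAHUB_ENV`, `DATAHUB_GMS_TOKEN`) and the helpers `emitter_settings` and `save_block` are assumptions made for this example, not something the plugin defines. `overwrite=True` is the standard Prefect block option for replacing an existing block of the same name.

```python
import os
from typing import Any, Dict, Mapping, Optional


def emitter_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for DatahubEmitter from environment variables.

    The variable names used here are assumptions chosen for this example.
    """
    environ = os.environ if environ is None else environ
    settings: Dict[str, Any] = {
        "datahub_rest_url": environ.get("DATAHUB_GMS_URL", "http://localhost:8080"),
        "env": environ.get("DATAHUB_ENV", "PROD"),
    }
    token = environ.get("DATAHUB_GMS_TOKEN")
    if token:
        # Only pass a token when one is set; the option otherwise stays at None.
        settings["token"] = token
    return settings


def save_block(name: str = "my-datahub") -> None:
    """Save the block; needs prefect-datahub installed and a reachable Prefect API."""
    # Imported lazily so emitter_settings() works without the plugin installed.
    from prefect_datahub.datahub_emitter import DatahubEmitter

    DatahubEmitter(**emitter_settings()).save(name, overwrite=True)
```

Calling `save_block()` once, for example from a deployment script, stores the block under the name that `DatahubEmitter.load("my-datahub")` expects.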

2. Use it in your flows

from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load the connection block saved in step 1.
emitter = DatahubEmitter.load("my-datahub")

@task
def extract():
    # Placeholder source step; replace with your own extraction logic.
    return "This is data"

@task
def transform(data, emitter):
    # Declare the datasets this task reads and writes so DataHub can draw lineage.
    emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.source_table")],
        outputs=[Dataset("snowflake", "mydb.schema.output_table")],
    )
    return data

@flow
def my_pipeline():
    data = extract()
    transform(data, emitter)
    emitter.emit_flow()  # required: emits all collected metadata at the end

Configuration options

| Option | Default | Description |
| --- | --- | --- |
| datahub_rest_url | http://localhost:8080 | DataHub GMS URL |
| env | PROD | Environment tag for assets |
| platform_instance | None | Platform instance for assets |
| token | None | Auth token (if GMS auth is enabled) |
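
To see how `env` and `platform_instance` affect the assets that show up in DataHub, it helps to look at the shape of a DataHub dataset URN. The sketch below composes one by hand from the same values passed to `Dataset(...)` in the quickstart. The `dataset_urn` helper is hypothetical and only illustrates the URN format; it is not the plugin's own code, and the plugin may build its identifiers differently.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Compose a DataHub-style dataset URN (illustrative helper only)."""
    # When a platform instance is set, it is prefixed to the dataset name.
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


print(dataset_urn("snowflake", "mydb.schema.source_table"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.source_table,PROD)
```

Changing `env` to `DEV`, or setting a platform instance, produces a different URN, so the same table name registered under different settings appears as a separate asset.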
