DataHub Prefect plugin — automatically capture flow lineage and run metadata from Prefect workflows into your DataHub catalog

DataHub Prefect Plugin

Automatic lineage and run metadata from Prefect into DataHub — captures flow structure, task inputs/outputs, and run history with minimal setup.

What you can do

  • Emit flow and task metadata to DataHub as pipeline runs
  • Capture dataset lineage — declare inputs and outputs per task and see them in DataHub
  • Configure via Prefect blocks — store your DataHub connection settings as a reusable block
  • Works with any DataHub deployment — self-hosted or DataHub Cloud

Installation

pip install prefect-datahub

Quickstart

1. Save your DataHub connection as a Prefect block

from prefect_datahub.datahub_emitter import DatahubEmitter

DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="PROD",
).save("my-datahub")
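
If you would rather not hard-code the connection settings, one option is to read them from environment variables before saving the block. The sketch below is only an illustration: the variable names (DATAHUB_REST_URL, DATAHUB_ENV, DATAHUB_PLATFORM_INSTANCE, DATAHUB_TOKEN) and the helpers emitter_settings and save_block are assumptions made for this example, not something the plugin defines. The keyword names and defaults follow the options listed under "Configuration options".

```python
import os
from typing import Dict, Mapping, Optional


def emitter_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect DatahubEmitter keyword arguments from hypothetical environment variables."""
    return {
        # Defaults mirror the documented option defaults.
        "datahub_rest_url": environ.get("DATAHUB_REST_URL", "http://localhost:8080"),
        "env": environ.get("DATAHUB_ENV", "PROD"),
        # These two stay None when the variable is not set.
        "platform_instance": environ.get("DATAHUB_PLATFORM_INSTANCE"),
        "token": environ.get("DATAHUB_TOKEN"),
    }


def save_block(name: str = "my-datahub") -> None:
    """Build the emitter from the collected settings and save it as a Prefect block."""
    # Imported lazily so emitter_settings() can be used without the plugin installed.
    from prefect_datahub.datahub_emitter import DatahubEmitter

    DatahubEmitter(**emitter_settings()).save(name)
```

Calling save_block() once (for example from a deployment script) would then register the block under the name used in step 2. Keeping the token in an environment variable also avoids committing it to source control.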

2. Use it in your flows

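from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load the connection block saved in step 1
emitter = DatahubEmitter.load("my-datahub")

@task
def extract():
    # Placeholder source step (not part of the plugin); replace with your own extraction logic
    return "example data"

@task
def transform(data, emitter):
    # Declare the datasets this task reads and writes so DataHub can show lineage
    emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.source_table")],
        outputs=[Dataset("snowflake", "mydb.schema.output_table")],
    )
    return data

@flow
def my_pipeline():
    data = extract()
    transform(data, emitter)
    emitter.emit_flow()   # required: emits all metadata at the end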
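
The quickstart notes that the emit_flow() call is required because metadata is emitted at the end. One way to picture that is a record-then-flush pattern: add_task only notes the lineage, and emit_flow sends everything in one pass. The toy class below is a hypothetical stand-in written for this explanation. It is not the plugin's implementation, and its method signatures (which take explicit task and flow names) differ from the real DatahubEmitter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BufferingEmitter:
    """Toy stand-in (hypothetical) that buffers task lineage until the flow ends."""

    pending: Dict[str, Tuple[List[str], List[str]]] = field(default_factory=dict)
    sent: List[str] = field(default_factory=list)

    def add_task(self, task_name: str, inputs: List[str], outputs: List[str]) -> None:
        # Only record the lineage locally; nothing is sent yet.
        self.pending[task_name] = (inputs, outputs)

    def emit_flow(self, flow_name: str) -> int:
        # Flush every recorded task in one pass, then clear the buffer.
        for task_name, (inputs, outputs) in self.pending.items():
            self.sent.append(f"{flow_name}.{task_name}: {inputs} -> {outputs}")
        count = len(self.pending)
        self.pending.clear()
        return count


toy = BufferingEmitter()
toy.add_task("transform", ["mydb.schema.source_table"], ["mydb.schema.output_table"])
sent_before_flush = list(toy.sent)  # still empty: add_task alone sends nothing
emitted = toy.emit_flow("my_pipeline")
print(sent_before_flush, emitted)
```

Under this model, leaving out the final call means the recorded lineage never leaves the process, which matches the "required" note in the quickstart.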

Configuration options

Option             Default                 Description
datahub_rest_url   http://localhost:8080   DataHub GMS URL
env                PROD                    Environment tag for assets
platform_instance  None                    Platform instance for assets
token              None                    Auth token (if GMS auth is enabled)
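
To see what env and platform_instance mean for the assets you declare, it can help to look at how a DataHub dataset URN is laid out. The sketch below follows DataHub's general dataset URN convention (platform, name, environment, with a platform instance prefixed to the name). The helper dataset_urn is hypothetical and written for this example; that the plugin builds its URNs in exactly this way is an assumption, not something this page states.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a DataHub-style dataset URN string from its parts."""
    # When a platform instance is set, it is conventionally prefixed to the dataset name.
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


# Same arguments as Dataset("snowflake", "mydb.schema.source_table") in the quickstart.
print(dataset_urn("snowflake", "mydb.schema.source_table"))
# prints urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.source_table,PROD)

# "warehouse_a" is a made-up platform instance name.
print(dataset_urn("snowflake", "mydb.schema.output_table", env="DEV", platform_instance="warehouse_a"))
```

Read this way, changing env or platform_instance on the block changes which DataHub assets the lineage attaches to, so both should match the values used by whatever ingested those datasets.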
