DataHub Prefect plugin — automatically capture flow lineage and run metadata from Prefect workflows into your DataHub catalog

DataHub Prefect Plugin

Automatically sends lineage and run metadata from Prefect to DataHub: flow structure, task inputs and outputs, and run history, with minimal setup.

What you can do

  • Emit flow and task metadata to DataHub as your pipelines run
  • Capture dataset lineage: declare inputs and outputs per task and see them in DataHub
  • Configure via Prefect blocks: store your DataHub connection settings as a reusable block
  • Use it with any DataHub deployment, self-hosted or DataHub Cloud

Installation

pip install prefect-datahub

Quickstart

1. Save your DataHub connection as a Prefect block

from prefect_datahub.datahub_emitter import DatahubEmitter

DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="PROD",
).save("my-datahub")

2. Use it in your flows

from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load the connection block saved in step 1
emitter = DatahubEmitter.load("my-datahub")

@task
def extract():
    # Placeholder source; replace with your own extraction logic
    return "raw data"

@task
def transform(data, emitter):
    # Declare the datasets this task reads and writes so DataHub can draw lineage
    emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.source_table")],
        outputs=[Dataset("snowflake", "mydb.schema.output_table")],
    )
    return data

@flow
def my_pipeline():
    data = extract()
    transform(data, emitter)
    emitter.emit_flow()  # required: emits all collected metadata at the end
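
The `Dataset("snowflake", "mydb.schema.source_table")` entries above identify a table by platform and name. As a rough illustration of how such a pair, combined with the block's `env` and optional `platform_instance`, would correspond to a dataset URN in DataHub, here is a standard-library sketch. It assumes the plugin follows DataHub's usual dataset URN layout; `dataset_urn` is a hypothetical helper written for this example and is not part of `prefect_datahub`.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a DataHub-style dataset URN (hypothetical helper, not the plugin's API)."""
    # Assumption: when a platform instance is set, it is prefixed to the dataset name.
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


# The two datasets declared in the transform task above
source = dataset_urn("snowflake", "mydb.schema.source_table")
output = dataset_urn("snowflake", "mydb.schema.output_table")
print(source)
print(output)
```

Searching DataHub for a URN of this shape is one way to check that the lineage declared in `add_task` arrived where you expected.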

Configuration options

Option             Default                 Description
datahub_rest_url   http://localhost:8080   DataHub GMS URL
env                PROD                    Environment tag for assets
platform_instance  None                    Platform instance for assets
token              None                    Auth token (if GMS auth is enabled)
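
One way to keep the token out of source code is to collect these options from environment variables before creating the block. The sketch below uses only the standard library; the `DATAHUB_*` variable names and the `emitter_settings` helper are assumptions made for this example, not something the plugin defines.

```python
import os
from typing import Any, Dict, Mapping


def emitter_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect DatahubEmitter options from environment variables.

    The DATAHUB_* variable names are assumptions made for this sketch.
    """
    settings: Dict[str, Any] = {
        "datahub_rest_url": environ.get("DATAHUB_REST_URL", "http://localhost:8080"),
        "env": environ.get("DATAHUB_ENV", "PROD"),
    }
    # Only pass optional settings when they are set, so the block keeps
    # its documented default (None) otherwise.
    for option, variable in (
        ("platform_instance", "DATAHUB_PLATFORM_INSTANCE"),
        ("token", "DATAHUB_TOKEN"),
    ):
        value = environ.get(variable)
        if value:
            settings[option] = value
    return settings


# Example with an explicit mapping instead of the real environment
settings = emitter_settings(
    {
        "DATAHUB_REST_URL": "https://datahub.example.com:8080",
        "DATAHUB_TOKEN": "example-token",
    }
)
print(sorted(settings))
```

The resulting dictionary uses the option names from the table, so it should be usable as `DatahubEmitter(**emitter_settings()).save("my-datahub")` in place of the hard-coded values in step 1.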
