DataHub Prefect plugin — automatically capture flow lineage and run metadata from Prefect workflows into your DataHub catalog

DataHub Prefect Plugin

Automatic lineage and run metadata from Prefect into DataHub — captures flow structure, task inputs/outputs, and run history with minimal setup.

What you can do

  • Emit flow and task metadata to DataHub when your pipelines run
  • Capture dataset lineage: declare inputs and outputs per task and see them in DataHub
  • Configure via Prefect blocks: store your DataHub connection settings as a reusable block
  • Use it with any DataHub deployment, self-hosted or DataHub Cloud

Installation

pip install prefect-datahub

Quickstart

1. Save your DataHub connection as a Prefect block

from prefect_datahub.datahub_emitter import DatahubEmitter

DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="PROD",
).save("my-datahub")

2. Use it in your flows

from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load the connection block saved in step 1
emitter = DatahubEmitter.load("my-datahub")

@task
def extract():
    # Placeholder source; replace with your own extraction logic
    return "raw data"

@task
def transform(data, emitter):
    # Declare this task's input and output datasets for lineage
    emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.source_table")],
        outputs=[Dataset("snowflake", "mydb.schema.output_table")],
    )
    return data

@flow
def my_pipeline():
    data = extract()
    transform(data, emitter)
    emitter.emit_flow()   # required: emits all collected metadata at the end
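
To find the declared datasets in DataHub, it helps to know what identifier they end up with. The sketch below builds a dataset URN in DataHub's standard layout from the same platform and name arguments passed to Dataset above. It assumes the plugin follows that layout; dataset_urn is a hypothetical helper written for illustration, not part of prefect-datahub.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN in DataHub's standard layout (illustrative helper)."""
    # When a platform instance is set, DataHub prefixes it to the dataset name.
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(dataset_urn("snowflake", "mydb.schema.source_table"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.source_table,PROD)
```

Searching DataHub for the printed URN is a quick way to confirm that a task's inputs and outputs arrived after a flow run.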

Configuration options

Option             Default                Description
datahub_rest_url   http://localhost:8080  DataHub GMS URL
env                PROD                   Environment tag for assets
platform_instance  None                   Platform instance for assets
token              None                   Auth token (if GMS auth is enabled)
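
One way to keep the token out of source code is to read these options from environment variables before saving the block. This is a minimal sketch under stated assumptions: the variable names (DATAHUB_GMS_URL, DATAHUB_ENV, DATAHUB_PLATFORM_INSTANCE, DATAHUB_GMS_TOKEN) and both helper functions are hypothetical. Only the four option names and their defaults come from the table above.

```python
import os
from typing import Mapping, Optional


def emitter_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect DatahubEmitter options from environment variables.

    The variable names used here are hypothetical, not defined by the plugin.
    """
    environ = os.environ if environ is None else environ
    settings = {
        "datahub_rest_url": environ.get("DATAHUB_GMS_URL", "http://localhost:8080"),
        "env": environ.get("DATAHUB_ENV", "PROD"),
    }
    # Optional settings are passed only when set, so the block keeps its None defaults.
    optional = (
        ("platform_instance", "DATAHUB_PLATFORM_INSTANCE"),
        ("token", "DATAHUB_GMS_TOKEN"),
    )
    for option, variable in optional:
        value = environ.get(variable)
        if value:
            settings[option] = value
    return settings


def save_block(name: str = "my-datahub") -> None:
    """Save the collected settings as a Prefect block (needs prefect-datahub installed)."""
    # Imported lazily so emitter_settings() can be used without the plugin.
    from prefect_datahub.datahub_emitter import DatahubEmitter

    DatahubEmitter(**emitter_settings()).save(name, overwrite=True)
```

Calling save_block() once per environment, for example from a deployment script, replaces the hard-coded values from step 1 of the quickstart.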
