Prefect block that captures flow and task executions and sends them to DataHub

Reason this release was yanked:

request from sergio gomez

Project description

prefect-datahub

Emit flow and task metadata to the DataHub REST API with prefect-datahub

Introduction

The prefect-datahub collection allows you to easily integrate DataHub's metadata ingestion capabilities into your Prefect workflows. With this collection, you can emit metadata about your flows, tasks, and workspace to DataHub's metadata service.

Features

  • Seamless integration with Prefect workflows
  • Support for ingesting flow, task, and workspace metadata through the DataHub GMS REST API
  • Easy configuration using Prefect blocks

Prerequisites

  • Python 3.10+
  • Prefect 2.0.0 or later, below 3.0.0
  • A running instance of DataHub
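
The first two prerequisites can be checked from inside the target environment. The sketch below is a minimal illustration; the helper names are hypothetical and not part of prefect-datahub, and the version bounds are taken from the list above.

```python
import sys
from importlib import metadata


def python_supported(version_info=sys.version_info) -> bool:
    """True for Python 3.10 or later."""
    return tuple(version_info[:2]) >= (3, 10)


def prefect_supported(version: str) -> bool:
    """True for Prefect 2.x, i.e. at least 2.0.0 and below 3.0.0."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) == 2


def installed_prefect_version():
    """Return the installed Prefect version string, or None if Prefect is missing."""
    try:
        return metadata.version("prefect")
    except metadata.PackageNotFoundError:
        return None
```

Calling prefect_supported(installed_prefect_version() or "") gives a single yes/no answer for the Prefect requirement.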

Installation

Install prefect-datahub using pip:

pip install prefect-datahub

We recommend using a Python virtual environment manager such as pipenv, conda, or virtualenv.

Getting Started

1. Set up DataHub

Before using prefect-datahub, you need to deploy an instance of DataHub. Follow the instructions on the DataHub Quickstart page to set up DataHub.

After successful deployment, the DataHub GMS service should be running on http://localhost:8080 if deployed locally.
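
Before saving a block, it can help to confirm that GMS answers at that address. The following is a sketch using only the standard library. It assumes GMS serves a /config endpoint at the root of the REST URL; adjust the path if your deployment differs. Both function names are illustrative.

```python
import urllib.error
import urllib.request


def gms_config_url(base_url: str) -> str:
    """Build the URL of the GMS /config endpoint (the path is an assumption)."""
    return base_url.rstrip("/") + "/config"


def gms_is_reachable(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(gms_config_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

A False result usually means DataHub is not running yet or is listening on a different host or port.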

2. Configure DataHub Emitter

Save your DataHub configuration as a Prefect block:

from prefect_datahub.datahub_emitter import DatahubEmitter

datahub_emitter = DatahubEmitter(
    datahub_rest_url="http://localhost:8080",
    env="DEV",
    platform_instance="local_prefect",
    token=None,  # if the GMS endpoint requires authentication, generate a token in DataHub and pass it here
)
datahub_emitter.save("datahub-emitter-test")

Configuration options:

Config             Type  Default                Description
datahub_rest_url   str   http://localhost:8080  DataHub GMS REST URL
env                str   PROD                   Environment for assets (see FabricType)
platform_instance  str   None                   Platform instance for assets (see Platform Instances)
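
To keep the URL and token out of source code, the block arguments can be read from the environment. This is a minimal sketch: the environment variable names are hypothetical choices for illustration, and the defaults follow the configuration options listed above.

```python
import os


def emitter_kwargs_from_env(environ=os.environ) -> dict:
    """Collect DatahubEmitter arguments from environment variables.

    The variable names below are hypothetical; pick whatever fits your setup.
    """
    return {
        "datahub_rest_url": environ.get("DATAHUB_GMS_URL", "http://localhost:8080"),
        "env": environ.get("DATAHUB_ENV", "PROD"),
        "platform_instance": environ.get("DATAHUB_PLATFORM_INSTANCE"),  # None if unset
        "token": environ.get("DATAHUB_GMS_TOKEN"),  # None if GMS needs no auth
    }


# Usage, with prefect-datahub installed:
# from prefect_datahub.datahub_emitter import DatahubEmitter
# DatahubEmitter(**emitter_kwargs_from_env()).save("datahub-emitter-test")
```

The function accepts any mapping, so it can be exercised with a plain dict instead of the real environment.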

3. Use DataHub Emitter in Your Workflows

Here's an example of how to use the DataHub Emitter in a Prefect workflow:

from prefect import flow, task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

datahub_emitter_block = DatahubEmitter.load("datahub-emitter-test")

@task(name="Extract", description="Extract the data")
def extract():
    return "This is data"

@task(name="Transform", description="Transform the data")
def transform(data, datahub_emitter):
    transformed_data = data.split(" ")
    datahub_emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.tableX")],
        outputs=[Dataset("snowflake", "mydb.schema.tableY")],
    )
    return transformed_data

@flow(name="ETL", description="Extract transform load flow")
def etl():
    datahub_emitter = datahub_emitter_block
    data = extract()
    transformed_data = transform(data, datahub_emitter)
    datahub_emitter.emit_flow()

if __name__ == "__main__":
    etl()

Note: To emit task metadata, you must call emit_flow() at the end of your flow. Otherwise, no metadata will be emitted.
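
For orientation, the Dataset entities in the example identify DataHub datasets by platform and name. The sketch below builds the URN such a reference is commonly written as, assuming DataHub's usual dataset URN layout. dataset_urn is an illustrative helper, not a prefect-datahub function, and the way env and platform_instance are combined with the name is an assumption here.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD", platform_instance=None) -> str:
    """Format a dataset URN; a platform instance, if given, is prefixed to the name."""
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


# The input dataset from the example above, in the default PROD environment.
print(dataset_urn("snowflake", "mydb.schema.tableX"))
```

Searching DataHub for the resulting URN is one way to confirm that lineage for a task arrived as expected.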

Advanced Usage

For more advanced usage and configuration options, please refer to the prefect-datahub documentation.

Contributing

We welcome contributions to prefect-datahub! Please refer to our Contributing Guidelines for more information on how to get started.

Support

If you encounter any issues or have questions, you can:

License

prefect-datahub is released under the Apache 2.0 license. See the LICENSE file for details.

Download files

Source Distribution

prefect_datahub-1.7.0.1rc1.tar.gz (15.1 kB view details)

Uploaded Source

Built Distribution

prefect_datahub-1.7.0.1rc1-py3-none-any.whl (12.7 kB view details)

Uploaded Python 3

File details

Details for the file prefect_datahub-1.7.0.1rc1.tar.gz.

File metadata

  • Download URL: prefect_datahub-1.7.0.1rc1.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for prefect_datahub-1.7.0.1rc1.tar.gz
Algorithm Hash digest
SHA256 f629a18fabd417ae2470d28f0c73f043f04b4e7fdadb38e69aa28f34a8584538
MD5 dce4a30cfc679d5478051d6533ed3cb0
BLAKE2b-256 02b81d01ee7ac61ed8dc6a4462364c219bc3bbd00a117646d247079bb0b1ff26

File details

Details for the file prefect_datahub-1.7.0.1rc1-py3-none-any.whl.

File hashes

Hashes for prefect_datahub-1.7.0.1rc1-py3-none-any.whl
Algorithm Hash digest
SHA256 e40f8f9d59129590db7455cef49d3be6da2776dbc57a1bbc957da5fe7332ec34
MD5 0905836209c8f2f6246213c3d78caeb5
BLAKE2b-256 279ce8b51eac08fd698230e3012591a8b15eea36bb9e2b337b3fbb4240c4e893
