Data lineage API for Python scripts

aissemble-foundation-data-lineage-python

This module serves as a generic wrapper for data lineage types, objects, and functions. While it is designed to be leveraged through the aiSSEMBLE™ ecosystem, it is not dependent on aiSSEMBLE for execution. It is intentionally designed for portability.

This README is intended to provide technical insight into the implementation of this package. For consumption guidance, please refer to the aiSSEMBLE GitHub Pages.

Core Functionality

This module presently provides three main capabilities:

  • Generic types to represent Data Lineage metadata
  • Convenience methods for emitting Data Lineage through various mediums
  • Conversion functions to transform the generic types to and from popular Data Lineage formats.

Through these capabilities, this module fulfills the need for an easy-to-use, implementation-agnostic Data Lineage interface.
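As an illustration of the first and third capabilities, the sketch below defines a generic run-event type and converts it to an OpenLineage-style dictionary. The names LineageDataset, LineageRunEvent, and to_openlineage_dict, as well as the producer URI, are hypothetical and are not this module's actual API. Only the output keys (eventType, eventTime, producer, run, job, inputs, outputs) follow the OpenLineage run-event format, and only a subset of that format is shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List
import uuid


@dataclass
class LineageDataset:
    """Hypothetical generic dataset type (not the module's real class)."""
    namespace: str
    name: str


@dataclass
class LineageRunEvent:
    """Hypothetical generic run event (not the module's real class)."""
    job_namespace: str
    job_name: str
    event_type: str  # e.g. "START", "COMPLETE", "FAIL"
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    event_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    inputs: List[LineageDataset] = field(default_factory=list)
    outputs: List[LineageDataset] = field(default_factory=list)


def to_openlineage_dict(event: LineageRunEvent, producer: str) -> Dict[str, Any]:
    """Convert the generic event into an OpenLineage-style dictionary."""

    def dataset(d: LineageDataset) -> Dict[str, str]:
        return {"namespace": d.namespace, "name": d.name}

    return {
        "eventType": event.event_type,
        "eventTime": event.event_time.isoformat(),
        "producer": producer,
        "run": {"runId": event.run_id},
        "job": {"namespace": event.job_namespace, "name": event.job_name},
        "inputs": [dataset(d) for d in event.inputs],
        "outputs": [dataset(d) for d in event.outputs],
    }
```

Keeping the generic type separate from the conversion function is what would allow a second target format to be added without touching the code that builds events.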

Developer Guidance

  • This module should display little to no dependence on any other aiSSEMBLE module. It is intentionally generic.
  • Any changes to method or function signatures must be reflected in any relevant template code within foundation-mda.
  • Any change or addition in functionality must be accompanied by associated automated tests.
    • Because this module serves as an interface to third-party libraries and services, all input parameters must be exhaustively validated.
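The validation guidance above could look something like the following. This is a minimal sketch only: require_non_empty_str is a hypothetical helper, not a function provided by this module, and the specific checks are assumptions about what "exhaustively validated" might mean for a string parameter.

```python
def require_non_empty_str(name: str, value: object) -> str:
    """Return the stripped value, or raise if it is not a usable string.

    Hypothetical helper for illustration; not part of this module's API.
    """
    if not isinstance(value, str):
        raise TypeError(f"{name} must be a str, got {type(value).__name__}")
    stripped = value.strip()
    if not stripped:
        raise ValueError(f"{name} must not be empty or whitespace")
    return stripped
```

Raising early, before a value reaches a third-party client, keeps the resulting error message in terms of this interface rather than the underlying library.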

Use Custom Microprofile config properties

By default, foundation-data-lineage uses the microprofile-config.properties below to configure the messaging service:

kafka.bootstrap.servers=kafka-cluster:9093
mp.messaging.outgoing.lineage-event-out.cloud-events=false
mp.messaging.outgoing.lineage-event-out.connector=smallrye-kafka
mp.messaging.outgoing.lineage-event-out.topic=lineage-event-out
mp.messaging.outgoing.lineage-event-out.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.lineage-event-out.value.serializer=org.apache.kafka.common.serialization.StringSerializer

You can also supply your own microprofile-config.properties file. To do so, pass the path of that file to the emitter:

  1. Set the path of the microprofile-config.properties file on your file system:

       # specify the new microprofile-config properties path
       self.emitter.set_messaging_properties_path(string_path_to_the_property_file)

  2. Set the new emitter topic by adding the property below to the data-lineage.properties file of the appropriate Docker image (e.g. spark-worker):

       data.lineage.emission.topic=replace-with-your-topic

  3. (Optional) If you are using the v1 kafka-cluster Helm chart, add the new topic to the kafka-cluster values.yaml file; e.g., add replace-with-your-topic:1:1 to the KAFKA_CREATE_TOPICS environment variable in -deploy/src/main/resources/apps/kafka-cluster/values.yaml.
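A custom properties file for the first step could be generated along these lines. This is a sketch under stated assumptions: build_messaging_properties, render_properties, and write_properties are hypothetical helpers, the keys are copied from the default file shown earlier, and reusing the custom topic for the outgoing channel's topic value is an assumption of this example rather than documented behavior.

```python
from pathlib import Path
from typing import Dict

# Channel name taken from the default properties shown earlier.
CHANNEL = "lineage-event-out"
STRING_SERIALIZER = "org.apache.kafka.common.serialization.StringSerializer"


def build_messaging_properties(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Mirror the default keys, overriding the broker address and topic."""
    prefix = f"mp.messaging.outgoing.{CHANNEL}"
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        f"{prefix}.cloud-events": "false",
        f"{prefix}.connector": "smallrye-kafka",
        f"{prefix}.topic": topic,
        f"{prefix}.key.serializer": STRING_SERIALIZER,
        f"{prefix}.value.serializer": STRING_SERIALIZER,
    }


def render_properties(properties: Dict[str, str]) -> str:
    """Render key=value lines in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in properties.items())


def write_properties(path: Path, properties: Dict[str, str]) -> Path:
    """Write the rendered properties to disk and return the path."""
    path.write_text(render_properties(properties), encoding="utf-8")
    return path
```

The returned path, converted with str(), is what would then be passed to set_messaging_properties_path as in step 1.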

NOTE:

There is a known issue with the confluent-kafka dependency, which is brought in by the openlineage-client library, that prevents it from installing on some arm64 environments. librdkafka does not provide a wheel for every operating system on the arm64 architecture, so it must be built from source before confluent-kafka can be installed. If you run into this problem, build and install librdkafka from source, then install confluent-kafka.
