The VDK Lineage plugin collects lineage information (input -> job -> output) and sends it to a pre-configured destination.

VDK Lineage

The VDK Lineage plugin collects lineage data (input data -> job -> output data) and sends it to a pre-configured destination. The lineage data is sent using the OpenLineage standard.

The plugin is currently at proof-of-concept (POC) level.

Currently, lineage data is collected as follows:

  • For each data job run (execution), both a start and an end event are sent, including the status of the job (failed or succeeded).
  • For each executed query, the input and output tables are collected.
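As a rough illustration of the two kinds of events listed above, the sketch below builds OpenLineage-style run events as plain dictionaries. The field names (eventType, eventTime, run, job, inputs, outputs, producer) come from the OpenLineage spec; the helper name, namespace, producer URI, table names, and the choice of COMPLETE for the query event are assumptions made for illustration and are not taken from the plugin's code.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI, used only for this sketch.
PRODUCER = "https://example.com/vdk-lineage-sketch"


def build_run_event(event_type, run_id, job_name,
                    namespace="example_namespace", inputs=(), outputs=()):
    """Build an OpenLineage-style run event as a plain dict (illustrative helper)."""

    def dataset(name):
        # A dataset is identified by a namespace and a name.
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE" or "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(t) for t in inputs],
        "outputs": [dataset(t) for t in outputs],
        "producer": PRODUCER,
    }


# Job run: a start event and an end event that share the same run ID.
job_run_id = str(uuid.uuid4())
start_event = build_run_event("START", job_run_id, "some-job")
end_event = build_run_event("COMPLETE", job_run_id, "some-job")

# Executed query: a separate event carrying input and output tables
# (the table names are made up).
query_event = build_run_event(
    "COMPLETE", str(uuid.uuid4()), "some-job",
    inputs=["staging.orders"], outputs=["reports.daily_orders"],
)
```

A failed job run would use "FAIL" instead of "COMPLETE" for the end event.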

TODOs:

  • Collect the status of each SQL query (failed, succeeded)
  • Create a parent/child relationship between the SQL event and the job run event to track them better (a single job can have multiple queries)
  • Non-SQL lineage (ingest, load data, etc.)
  • Extend support to all queries
  • Provide more information using facets, such as op ID and job version
  • Figure out how to visualize parent/child relationships in Marquez
  • Explore openlineage.sqlparser as an alternative to the sqllineage library
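For the parent/child item above, one possible shape is the parent run facet defined by the OpenLineage spec, which attaches the parent's run ID and job identity to the child run. The sketch below only shows what such a facet could look like; it is not something the plugin does today. The helper name and all values are hypothetical, and the _producer and _schemaURL fields that the spec requires on facets are omitted for brevity.

```python
def with_parent(run, parent_run_id, parent_job_namespace, parent_job_name):
    """Return a copy of an OpenLineage `run` object with a parent run facet attached."""
    facets = dict(run.get("facets", {}))
    facets["parent"] = {
        "run": {"runId": parent_run_id},
        "job": {"namespace": parent_job_namespace, "name": parent_job_name},
    }
    # Build a new dict so the caller's `run` object is left untouched.
    return {**run, "facets": facets}


# Hypothetical IDs: one job run and one SQL query executed inside it.
job_run_id = "0f0e0d0c-0b0a-4909-8807-060504030201"
sql_run = {"runId": "a1b2c3d4-e5f6-4789-8abc-def012345678"}

child_run = with_parent(sql_run, job_run_id, "example_namespace", "some-job")
```

With a facet like this on every SQL event, a consumer could group all queries of a single job run by the parent run ID.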

Usage

pip install vdk-lineage

Once installed, the plugin starts collecting lineage from jobs and SQL queries.

To send the data using OpenLineage, set VDK_OPENLINEAGE_URL. For example:

export VDK_OPENLINEAGE_URL=http://localhost:5002
vdk marquez-server --start
vdk run some-job
# check UI for lineage
# stopping the server will delete any lineage data.
vdk marquez-server --stop
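To make the role of VDK_OPENLINEAGE_URL more concrete, the sketch below prepares the kind of HTTP request an OpenLineage client typically sends: a JSON POST to the /api/v1/lineage path under the configured base URL. That this matches what the plugin does internally is an assumption. The function name is hypothetical, and the request is only built, not sent, so the example runs without a server.

```python
import json
import os
import urllib.request


def build_lineage_request(event, base_url=None):
    """Prepare (but do not send) a POST request carrying one lineage event."""
    base_url = base_url or os.environ.get("VDK_OPENLINEAGE_URL")
    if not base_url:
        raise ValueError("VDK_OPENLINEAGE_URL is not set")
    return urllib.request.Request(
        # Assumed endpoint path, following the usual OpenLineage HTTP convention.
        url=base_url.rstrip("/") + "/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_lineage_request({"eventType": "START"},
                                base_url="http://localhost:5002")
# urllib.request.urlopen(request) would send the event to a running server.
```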

Build and testing

To build and test the plugin, go to the plugin directory and run the ../build-plugin.sh script.
