Lineage Keeper

A lightweight lineage tool based on Spark and Delta Lake

Installation

pip install lineage-keeper

Basic use

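from lineage_keeper import load_listener, LineageViewer

# Hook df.write.saveAsTable so each call records lineage
load_listener(spark)

# Read the sources with table syntax so they can be tracked
df1 = spark.read.table("db.table_1")
df2 = spark.read.table("db.table_2")

df_join = df1.join(df2, "key")

# Writing with saveAsTable triggers the listener
df_join.write.saveAsTable("db.join_tables")

# Show the lineage graph
LineageViewer(spark).viewer()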

Limitations

  • Data must be read using table syntax:
    • spark.read.table("db.table")
    • spark.sql("SELECT * FROM db.table")
  • For load_listener to record lineage information automatically, data must be written with df.write.saveAsTable("db.table"). Otherwise, call LineageListener(spark).listener(df, "db.table") explicitly.

Functionalities

By default, Lineage Keeper uses "default._service_table_lineage_keeper" as its service table.

If desired, a different service table can be used.

Listener function

After initializing the listener, pass it a DataFrame and a target table name to add the lineage to the service table.

LineageListener(spark).listener(df, "target_db.target_table")
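
To make the idea concrete, here is a minimal pure-Python sketch of what recording lineage could look like, assuming the service table stores one row per source-to-target edge. The row layout and the names ServiceTable and record_lineage are hypothetical and used only for illustration. Lineage Keeper's actual schema and internals may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceTable:
    """Hypothetical in-memory stand-in for the lineage service table."""
    rows: list = field(default_factory=list)  # each row: (source_table, target_table)


def record_lineage(service_table, source_tables, target_table):
    """Append one (source, target) edge per source table, skipping duplicates."""
    for source in source_tables:
        edge = (source, target_table)
        if edge not in service_table.rows:
            service_table.rows.append(edge)
    return service_table.rows


service = ServiceTable()
record_lineage(service, ["db.table_1", "db.table_2"], "db.join_tables")
record_lineage(service, ["db.table_1"], "db.join_tables")  # duplicate edge is ignored
print(service.rows)
```

Storing edges rather than whole graphs keeps each write independent: every call only needs to know its own sources and target.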

Load listener

Changes df.write.saveAsTable so that it uses the listener whenever it is called.

load_listener(spark)
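
The general technique of wrapping a write method can be sketched in plain Python. This is not Lineage Keeper's implementation: FakeWriter, load_fake_listener, and the recorded list are hypothetical stand-ins, and the sketch only assumes that the listener should run after the original save succeeds.

```python
import functools


class FakeWriter:
    """Hypothetical stand-in for a DataFrame writer; not a Spark class."""

    def __init__(self, name):
        self.name = name

    def saveAsTable(self, table_name):
        return f"{self.name} saved to {table_name}"


recorded = []  # stands in for the service table


def load_fake_listener(writer_cls, on_write):
    """Wrap writer_cls.saveAsTable so on_write runs after every save."""
    original = writer_cls.saveAsTable

    @functools.wraps(original)
    def patched(self, table_name, *args, **kwargs):
        result = original(self, table_name, *args, **kwargs)
        on_write(self.name, table_name)  # record lineage after a successful write
        return result

    writer_cls.saveAsTable = patched


load_fake_listener(FakeWriter, lambda source, target: recorded.append((source, target)))
result = FakeWriter("df_join").saveAsTable("db.join_tables")
print(result)
print(recorded)
```

Because the wrapper returns the original result and forwards extra arguments, existing calls keep working unchanged while lineage is captured as a side effect.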

Lineage graph viewer

Generates a static HTML page with the lineage graph.

LineageViewer(spark).viewer()

Lineage graph writer

Saves a static HTML file with the lineage graph to disk.

LineageViewer(spark).save_graph(path)
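
As a rough illustration of producing a static HTML file from lineage data, the sketch below renders a list of edges as a simple HTML list and writes it to disk. It is an assumption-laden simplification: the real viewer draws a graph, and the function names render_lineage_html and save_lineage_html, as well as the edge format, are hypothetical.

```python
import html
import tempfile
from pathlib import Path


def render_lineage_html(edges):
    """Return a static HTML page listing each source -> target edge."""
    items = "\n".join(
        f"    <li>{html.escape(src)} &rarr; {html.escape(dst)}</li>"
        for src, dst in edges
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<body>\n  <ul>\n"
        f"{items}\n"
        "  </ul>\n</body>\n</html>\n"
    )


def save_lineage_html(edges, path):
    """Write the rendered page to disk and return the path written."""
    target = Path(path)
    target.write_text(render_lineage_html(edges), encoding="utf-8")
    return target


edges = [("db.table_1", "db.join_tables"), ("db.table_2", "db.join_tables")]
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_lineage_html(edges, Path(tmp_dir) / "lineage.html")
    saved_text = saved.read_text(encoding="utf-8")
```

Table names are escaped before being embedded so that unusual identifiers cannot break the generated markup.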

Google Colab Sample

Sample using Lineage Keeper
