A lightweight lineage tool based on Spark and Delta Lake

Lineage Keeper

Architecture

Installation

pip install lineage-keeper

Basic use

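from pyspark.sql import SparkSession
from lineage_keeper import load_listener, LineageViewer

# Reuse the active session (notebooks usually define `spark` already)
spark = SparkSession.builder.getOrCreate()

# Register the listener before writing any table
load_listener(spark)

# Read the sources with table syntax
df1 = spark.read.table("db.table_1")
df2 = spark.read.table("db.table_2")

df_join = df1.join(df2, "key")

# Write with saveAsTable so the lineage is recorded
df_join.write.saveAsTable("db.join_tables")

# Display the lineage graph
LineageViewer(spark).viewer()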

Functionalities

By default, Lineage Keeper uses "default._service_table_lineage_keeper" as its service table.

A different service table can be used if needed.

Listener function

Manually record lineage information in the service table.

LineageListener takes the Spark session; its listener method takes the source DataFrame and the target table name.

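# Assumes LineageListener is exported at the package top level, like load_listener
from lineage_keeper import LineageListener

ll = LineageListener(spark)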
ll.listener(df, "target_db.target_table")

Load listener

Modifies df.write.saveAsTable so that lineage information is recorded in the service table automatically.

load_listener(spark)
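
To illustrate the general idea of wrapping a write method so that it also records lineage, here is a minimal sketch in plain Python. It is not Lineage Keeper's implementation: FakeWriter, load_fake_listener, and lineage_log are hypothetical stand-ins, and no Spark session or service table is involved.

```python
import functools


class FakeWriter:
    """Hypothetical stand-in for a DataFrame writer; not a Spark class."""

    def __init__(self, sources):
        self.sources = sources  # tables the data was read from
        self.saved = []         # tables written so far

    def saveAsTable(self, name):
        self.saved.append(name)


def load_fake_listener(writer_cls, lineage_log):
    """Wrap writer_cls.saveAsTable so each call also appends lineage rows."""
    original = writer_cls.saveAsTable

    @functools.wraps(original)
    def save_with_lineage(self, name, *args, **kwargs):
        result = original(self, name, *args, **kwargs)
        # After the write succeeds, record one (source, target) row per source
        for source in self.sources:
            lineage_log.append((source, name))
        return result

    writer_cls.saveAsTable = save_with_lineage


lineage_log = []
load_fake_listener(FakeWriter, lineage_log)

writer = FakeWriter(sources=["db.table_1", "db.table_2"])
writer.saveAsTable("db.join_tables")
print(lineage_log)
# [('db.table_1', 'db.join_tables'), ('db.table_2', 'db.join_tables')]
```

Because the wrapper replaces the method on the class, callers keep using saveAsTable unchanged, which matches how load_listener is used above.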

Lineage graph viewer

Generate a static HTML page with the lineage graph.

LineageViewer(spark).viewer()

Lineage graph writer

Save a static HTML page with the lineage graph to disk.

LineageViewer(spark).save_graph(path)
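
A lineage graph can be thought of as a set of directed source-to-target edges between tables. The sketch below shows how such edges can answer "which tables feed this one?". The edge list and the table "db.report" are hypothetical examples, and the (source, target) tuple format is an assumption for illustration, not the schema of the service table.

```python
from collections import defaultdict, deque


def upstream_tables(edges, target):
    """Return every table that feeds `target`, directly or indirectly."""
    parents = defaultdict(set)
    for source, dest in edges:
        parents[dest].add(source)

    seen = set()
    queue = deque([target])
    while queue:
        table = queue.popleft()
        for parent in parents[table]:
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical edges: (source table, target table)
edges = [
    ("db.table_1", "db.join_tables"),
    ("db.table_2", "db.join_tables"),
    ("db.join_tables", "db.report"),
]
print(sorted(upstream_tables(edges, "db.report")))
# ['db.join_tables', 'db.table_1', 'db.table_2']
```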

Limitations

  • Data must be read using table syntax:
    • spark.read.table("db.table")
    • spark.sql("SELECT * FROM db.table")
  • load_listener only records lineage for writes made with df.write.saveAsTable("db.table"). For any other write, call LineageListener(spark).listener(df, "db.table") manually.
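
The second limitation can be handled with a small wrapper that records lineage explicitly whenever the write does not go through saveAsTable. This is a sketch only: write_with_lineage and its parameters are hypothetical names, and the automatic branch assumes load_listener(spark) has already been called. insertInto is used as an example of a write that is not saveAsTable.

```python
def write_with_lineage(df, target_table, record_lineage, append_to_existing=False):
    """Write df to target_table and make sure its lineage gets recorded.

    record_lineage: callable(df, table_name),
                    for example LineageListener(spark).listener
    """
    if append_to_existing:
        # insertInto is not saveAsTable, so record the lineage by hand
        df.write.insertInto(target_table)
        record_lineage(df, target_table)
        return "manual"
    # saveAsTable is covered by load_listener (assumed to be loaded already)
    df.write.saveAsTable(target_table)
    return "automatic"


# Usage with a real session (not executed here):
# write_with_lineage(df, "db.table", LineageListener(spark).listener,
#                    append_to_existing=True)
```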

Demo Notebook

Sample using Lineage Keeper

Graph_Sample
