Lineage Keeper
A lightweight lineage tool based on Spark and Delta Lake
Installation
pip install lineage-keeper
Basic use
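from lineage_keeper import load_listener, LineageViewer
# `spark` is assumed to be an existing SparkSession.
# Make df.write.saveAsTable record lineage when it is called.
load_listener(spark)
df1 = spark.read.table("db.table_1")
df2 = spark.read.table("db.table_2")
df_join = df1.join(df2, "key")
df_join.write.saveAsTable("db.join_tables")
# Generate a static HTML view of the lineage graph.
LineageViewer(spark).viewer()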
Limitations
- Data must be read using table syntax:
spark.read.table("db.table")
spark.sql("SELECT * FROM db.table")
- For load_listener to record lineage information automatically, data must be written with df.write.saveAsTable("db.table"). Otherwise, call LineageListener(spark).listener(df, "db.table") explicitly.
Functionalities
By default, Lineage Keeper uses "default._service_table_lineage_keeper" as its service table.
If desired, a different service table can be used.
Listener function
After initializing the listener, pass it a DataFrame and a target table name to add the lineage to the service table
LineageListener(spark).listener(df, "target_db.target_table")
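To make the idea of a service table concrete, the sketch below models it as a plain Python list of source-to-target edges. The names `LineageEdge` and `record_lineage`, and the two-column layout, are assumptions made for illustration only; the actual schema of the service table is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LineageEdge:
    """One hypothetical service-table row: data flows from source to target."""
    source: str
    target: str


def record_lineage(service_table: List[LineageEdge],
                   sources: Iterable[str],
                   target: str) -> None:
    """Append one edge per source table, skipping edges already recorded."""
    for source in sources:
        edge = LineageEdge(source=source, target=target)
        if edge not in service_table:
            service_table.append(edge)


# In-memory stand-in for the service table (an assumption, not the real storage).
service_table: List[LineageEdge] = []
record_lineage(service_table, ["db.table_1", "db.table_2"], "db.join_tables")
# Recording the same write a second time does not duplicate edges.
record_lineage(service_table, ["db.table_1"], "db.join_tables")
```

Storing one row per source keeps the table easy to query in either direction: filter on the target column to find upstream tables, or on the source column to find downstream ones.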
Load listener
Changes df.write.saveAsTable so that the listener is used whenever it is called
load_listener(spark)
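Changing a method so that a hook runs on every call is a general Python technique, sketched below with a stand-in class. `FakeWriter` and `patch_save_as_table` are hypothetical names, and this is not Lineage Keeper's actual implementation, only an illustration of how such wrapping could work.

```python
import functools


class FakeWriter:
    """Hypothetical stand-in for a DataFrame writer; it only records calls."""

    def __init__(self):
        self.saved = []

    def saveAsTable(self, name):
        self.saved.append(name)


def patch_save_as_table(writer_cls, on_save):
    """Wrap writer_cls.saveAsTable so on_save(name) runs before each write."""
    original = writer_cls.saveAsTable

    @functools.wraps(original)
    def wrapper(self, name, *args, **kwargs):
        on_save(name)  # e.g. register lineage for the target table
        return original(self, name, *args, **kwargs)

    writer_cls.saveAsTable = wrapper


targets_seen = []
patch_save_as_table(FakeWriter, targets_seen.append)

writer = FakeWriter()
writer.saveAsTable("db.join_tables")
```

Because the wrapper delegates to the original method, existing code that calls saveAsTable keeps working unchanged; only the extra hook is added.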
Lineage graph viewer
Generates a static HTML page with the lineage graph
LineageViewer(spark).viewer()
Lineage graph writer
Saves a static HTML file with the lineage graph to disk
LineageViewer(spark).save_graph(path)
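As a rough illustration of what writing lineage to a static HTML file involves, the sketch below renders a list of edges with the standard library only. The functions `edges_to_html` and `save_html` are hypothetical and far simpler than a real graph viewer: they list the edges as text rather than drawing a graph.

```python
import html
import os
import tempfile


def edges_to_html(edges):
    """Render (source, target) pairs as a minimal static HTML page."""
    items = "\n".join(
        f"<li>{html.escape(src)} &rarr; {html.escape(dst)}</li>"
        for src, dst in edges
    )
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        "<h1>Lineage</h1>\n<ul>\n" + items + "\n</ul>\n</body></html>\n"
    )


def save_html(edges, path):
    """Write the rendered page to path as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(edges_to_html(edges))


# Example edges reuse the table names from the basic-use section.
edges = [("db.table_1", "db.join_tables"), ("db.table_2", "db.join_tables")]
out_path = os.path.join(tempfile.mkdtemp(), "lineage.html")
save_html(edges, out_path)
```

A static file like this can be opened in any browser or attached to documentation without a running Spark session.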
Google Colab Sample