Open Source Data Lineage Tool For AWS and GCP
Data Lineage for Databases and Data Lakes
Data Lineage is an open source application to query and visualize data lineage in databases, data warehouses and data lakes in AWS and GCP.
Features
- Generates lineage from SQL query history.
- Supports ANSI SQL queries.
- Integrates with Jupyter Notebook.
- Visualizes data lineage using Plotly.
- Lets you select a source or target table.
- Lets you pan, zoom, and select within the graph.
Check out an example data lineage notebook.
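To make the first feature concrete, here is a minimal sketch of how lineage edges could be derived from a SQL query history. It is not the data-lineage package's API or its parser: the function names, the table names, and the regex-based matching are all assumptions made for illustration, and the regexes only handle simple `INSERT INTO ... SELECT` statements.

```python
import re
from collections import defaultdict

# Hypothetical query history; the table names are made up for illustration.
QUERY_HISTORY = [
    "INSERT INTO staging.orders SELECT * FROM raw.orders",
    "INSERT INTO mart.revenue SELECT o.id, c.region "
    "FROM staging.orders o JOIN raw.customers c ON o.customer_id = c.id",
    "SELECT count(*) FROM mart.revenue",  # read-only: produces no lineage edge
]

# Naive patterns (an assumption, not a real SQL parser).
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(query):
    """Return (source, target) pairs for one INSERT INTO ... SELECT statement."""
    target = TARGET_RE.search(query)
    if target is None:
        return []  # queries that write nothing contribute no edges
    sources = SOURCE_RE.findall(query)
    return [(source, target.group(1)) for source in sources]


def build_lineage(queries):
    """Map each target table to the set of tables it is derived from."""
    lineage = defaultdict(set)
    for query in queries:
        for source, target in extract_edges(query):
            lineage[target].add(source)
    return dict(lineage)


lineage = build_lineage(QUERY_HISTORY)
```

A real implementation would need a proper SQL parser to cope with subqueries, CTEs, and quoted identifiers; the regex approach is only meant to show the shape of the source-to-target mapping.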
Use Cases
Data Lineage enables the following use cases:
- Business Rules Verification
- Change Impact Analysis
- Data Quality Verification
Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.
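As a rough illustration of change impact analysis, the sketch below walks a lineage graph breadth-first to list every table derived from a table that is about to change. The graph contents and the `impacted_tables` helper are hypothetical and are not part of the data-lineage package.

```python
from collections import deque

# Hypothetical lineage: each key is a table, each value lists the tables built from it.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "raw.customers": ["mart.revenue"],
    "staging.orders": ["mart.revenue", "mart.order_counts"],
    "mart.revenue": ["report.weekly_revenue"],
}


def impacted_tables(changed, downstream):
    """Breadth-first walk returning every table derived from `changed`, in visit order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        table = queue.popleft()
        for child in downstream.get(table, []):
            if child not in seen:  # skip tables already reached via another path
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = impacted_tables("raw.orders", DOWNSTREAM)
```

Under these assumptions, a schema change to `raw.orders` would flag the staging table, both mart tables, and the weekly report for review, while a change to a leaf table flags nothing.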
Quick Start
# Install packages
pip install data-lineage
pip install jupyter
jupyter notebook
# Check out the example notebook: http://tokern.io/docs/data-lineage/example/
Supported Technologies
- Postgres
Coming Soon
- MySQL
- AWS Redshift
- SparkSQL
- Presto
Developer Setup
# Install dependencies
pipenv install --dev
# Set up pre-commit and pre-push hooks
pipenv run pre-commit install -t pre-commit
pipenv run pre-commit install -t pre-push