Open Source Data Lineage Tool For AWS and GCP

Data Lineage for Databases and Data Lakes

data-lineage is an open source application to query and visualize data lineage in databases, data warehouses and data lakes in AWS and GCP.

data-lineage aims to be fast, simple to set up, and to support analysis of the lineage. To achieve these goals, data-lineage has the following features:

  1. Generate data lineage from query history. Most databases maintain query history for a few days, so the cost of setting up infrastructure to capture and store metadata is minimal.
  2. Use the networkx graph library to create a DAG of the lineage. Networkx graphs provide programmatic access to the lineage, which makes it possible to analyze it in code.
  3. Integrate with Jupyter Notebooks. Jupyter Notebooks provide an interactive environment to generate, manipulate and analyze data lineage graphs.
  4. Use Plotly to visualize the graph with rich annotations. Plotly supports tool tips, color coding and weights based on different attributes of the graph.
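
To illustrate the first feature, here is a minimal sketch of how lineage edges could be derived from query history. It is not data-lineage's own parser or API: the regular expressions, the `extract_edges` and `build_lineage` helpers, and the sample queries are all hypothetical, and the naive pattern matching only handles simple `INSERT INTO ... SELECT` statements.

```python
import re
from collections import defaultdict

# Hypothetical query history. A real one would come from the database's query log.
QUERY_HISTORY = [
    "INSERT INTO page_lookup SELECT * FROM page JOIN redirect ON page.id = redirect.page_id",
    "INSERT INTO filtered_pagecounts SELECT * FROM pagecounts "
    "JOIN page_lookup ON pagecounts.title = page_lookup.title",
    "INSERT INTO normalized_pagecounts SELECT * FROM filtered_pagecounts",
]

# Naive patterns (an assumption for illustration, not a full SQL parser).
TARGET_RE = re.compile(r"INSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(query):
    """Return (source_table, target_table) pairs for one query."""
    target = TARGET_RE.search(query)
    if target is None:
        # Queries that do not write to a table contribute no lineage.
        return []
    sources = sorted(set(SOURCE_RE.findall(query)))
    return [(source, target.group(1)) for source in sources]


def build_lineage(queries):
    """Map each source table to the set of tables derived from it."""
    graph = defaultdict(set)
    for query in queries:
        for source, target in extract_edges(query):
            graph[source].add(target)
    return graph


lineage = build_lineage(QUERY_HISTORY)
for source in sorted(lineage):
    print(source, "->", sorted(lineage[source]))
```

A production tool needs a real SQL parser to handle subqueries, CTEs and aliases; the sketch only shows why query history alone is enough to recover table-level lineage.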

Check out an example data lineage notebook.

Use Cases

Data Lineage enables the following use cases:

  • Business Rules Verification
  • Change Impact Analysis
  • Data Quality Verification
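
As a rough illustration of change impact analysis, the sketch below uses networkx directly on a hand-written lineage graph. The table names and the `impacted_tables` and `upstream_sources` helpers are hypothetical and are not part of data-lineage's API; only standard networkx calls are used, and networkx must be installed.

```python
import networkx as nx

# Hypothetical lineage edges (source table -> derived table).
edges = [
    ("page", "page_lookup"),
    ("redirect", "page_lookup"),
    ("pagecounts", "filtered_pagecounts"),
    ("page_lookup", "filtered_pagecounts"),
    ("filtered_pagecounts", "normalized_pagecounts"),
]

dag = nx.DiGraph()
dag.add_edges_from(edges)

# Lineage should not contain cycles; fail early if it does.
assert nx.is_directed_acyclic_graph(dag)


def impacted_tables(graph, table):
    """Tables affected by a change to `table`, in dependency order."""
    downstream = nx.descendants(graph, table)
    return [t for t in nx.topological_sort(graph) if t in downstream]


def upstream_sources(graph, table):
    """Tables that `table` depends on, directly or indirectly."""
    return sorted(nx.ancestors(graph, table))


print(impacted_tables(dag, "page"))
print(upstream_sources(dag, "filtered_pagecounts"))
```

Listing impacted tables in topological order gives a safe order in which to rebuild or re-verify them after a schema or logic change.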

Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.

Quick Start

# Install packages
pip install data-lineage
pip install jupyter

jupyter notebook

# Check out the example notebook: http://tokern.io/docs/data-lineage/example/

Supported Technologies

  • Postgres
  • AWS Redshift
  • Snowflake

Coming Soon

  • MySQL
  • SparkSQL
  • Presto

Documentation

For advanced usage, please refer to the data-lineage documentation.

Survey

Please take this survey if you are a user or considering using data-lineage. Responses will help us prioritize features better.

Developer Setup

# Install dependencies
pipenv install --dev

# Set up pre-commit and pre-push hooks
pipenv run pre-commit install -t pre-commit
pipenv run pre-commit install -t pre-push
