Open Source Data Lineage Tool For AWS and GCP

Data Lineage for Databases and Data Lakes

data-lineage is an open source application to query and visualize data lineage in databases, data warehouses and data lakes in AWS and GCP.

data-lineage aims to be fast and simple to set up, and to make the lineage easy to analyze. To achieve these goals, data-lineage has the following features:

  1. Generate data lineage from query history. Most databases retain query history for a few days, so the cost of setting up infrastructure to capture and store metadata is minimal.
  2. Use the networkx graph library to build a DAG (directed acyclic graph) of the lineage. Graphs in networkx provide programmatic access to the lineage, which opens up rich opportunities to analyze it.
  3. Integrate with Jupyter Notebooks. Jupyter Notebooks provide an excellent IDE to generate, manipulate and analyze data lineage graphs.
  4. Use Plotly to visualize the graph with rich annotations. Plotly supports tooltips, color coding, and weights based on different attributes of the graph.
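As a rough illustration of features 1 and 2, the sketch below turns a few query-history statements into lineage edges and orders the tables as a DAG. It is a minimal, dependency-free sketch built on several assumptions:

- The `QUERY_HISTORY` rows are hypothetical.
- The `lineage_edges` and `build_dag` helpers are hypothetical.
- The regular expressions are a naive stand-in for a real SQL parser. They only handle simple `INSERT INTO ... SELECT` and `CREATE TABLE ... AS SELECT` statements.
- The standard-library `graphlib` module (Python 3.9+) takes the place of networkx.

It is not how data-lineage itself parses queries.

```python
import re
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical query-history rows; real rows would come from the database's query log.
QUERY_HISTORY = [
    "INSERT INTO staging.orders SELECT * FROM raw.orders",
    "INSERT INTO analytics.daily_sales SELECT o.day, SUM(o.amount) "
    "FROM staging.orders o GROUP BY o.day",
    "CREATE TABLE analytics.customer_ltv AS SELECT c.id, SUM(o.amount) "
    "FROM raw.customers c JOIN staging.orders o ON c.id = o.customer_id GROUP BY c.id",
    "SELECT COUNT(*) FROM analytics.daily_sales",  # read-only: adds no edge
]

# Naive patterns standing in for a real SQL parser (simple statements only).
TARGET_RE = re.compile(r"\b(?:INSERT\s+INTO|CREATE\s+TABLE)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(queries):
    """Yield (source_table, target_table) pairs for statements that write a table."""
    for sql in queries:
        match = TARGET_RE.search(sql)
        if match is None:
            continue  # plain SELECTs do not create lineage
        target = match.group(1)
        for source in SOURCE_RE.findall(sql):
            if source != target:
                yield source, target


def build_dag(edges):
    """Map every table to the set of tables it is derived from (its predecessors)."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
        parents.setdefault(source, set())  # make sure root tables appear as nodes
    return dict(parents)


dag = build_dag(lineage_edges(QUERY_HISTORY))
# static_order() lists upstream tables before downstream ones and raises
# graphlib.CycleError if the lineage contains a cycle.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

With networkx installed, `networkx.DiGraph(list(lineage_edges(QUERY_HISTORY)))` builds an equivalent directed graph, and `networkx.topological_sort` yields a comparable ordering.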

Check out an example data lineage notebook.

Use Cases

Data Lineage enables the following use cases:

  • Business Rules Verification
  • Change Impact Analysis
  • Data Quality Verification
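Change impact analysis, for example, reduces to finding every table downstream of the one being changed. The following minimal sketch does that with a breadth-first search over a hypothetical edge list. `EDGES` and `downstream` are illustrative names, not part of data-lineage. On a networkx graph, `networkx.descendants(graph, node)` returns the same set.

```python
from collections import deque

# Hypothetical lineage edges (source table -> derived table).
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "analytics.daily_sales"),
    ("staging.orders", "analytics.customer_ltv"),
    ("raw.customers", "analytics.customer_ltv"),
    ("analytics.daily_sales", "dashboards.revenue"),
]


def downstream(edges, changed):
    """Return every table reachable from `changed`, found by breadth-first search."""
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in affected:  # visit each table only once
                affected.add(child)
                queue.append(child)
    return affected


# Tables that may be affected by a change to staging.orders:
print(sorted(downstream(EDGES, "staging.orders")))
# ['analytics.customer_ltv', 'analytics.daily_sales', 'dashboards.revenue']
```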

Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.

Quick Start

```shell
# Install packages
pip install data-lineage
pip install jupyter

jupyter notebook

# Check out the example notebook: http://tokern.io/docs/data-lineage/example/
```

Supported Technologies

  • Postgres
  • AWS Redshift
  • Snowflake

Coming Soon

  • MySQL
  • SparkSQL
  • Presto

Documentation

For advanced usage, please refer to the data-lineage documentation.

Survey

Please take this survey if you use data-lineage or are considering it. Responses will help us prioritize features better.

Developer Setup

```shell
# Install dependencies
pipenv install --dev

# Set up pre-commit and pre-push hooks
pipenv run pre-commit install -t pre-commit
pipenv run pre-commit install -t pre-push
```

Download files

Download the file for your platform.

Source Distribution

data-lineage-0.6.0.tar.gz (16.6 kB)

Built Distribution

data_lineage-0.6.0-py3-none-any.whl (14.9 kB)

File details

Details for the file data-lineage-0.6.0.tar.gz.

File metadata

  • Download URL: data-lineage-0.6.0.tar.gz
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.3

File hashes

Hashes for data-lineage-0.6.0.tar.gz

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 563ee1e8c269d6499bb04cc5412f982d65b21d91b85c03469199b8ceb4def78b |
| MD5         | a41aece2d136ce3bfd738606fe9542c0 |
| BLAKE2b-256 | 56ab2f4cdc7d63cd472a04b2658f035a8bddc0bc4a59033d48c535f47e9bb082 |
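To check that a downloaded file matches the published digest, compute its SHA256 locally and compare the two. Below is a minimal sketch using Python's standard `hashlib`. The `sha256_of` and `verify` helper names are illustrative, and the file is assumed to have been downloaded already. pip can also enforce this in its hash-checking mode, by listing `data-lineage==0.6.0 --hash=sha256:<digest>` in a requirements file.

```python
import hashlib

# SHA256 digest published above for data-lineage-0.6.0.tar.gz.
EXPECTED_SHA256 = "563ee1e8c269d6499bb04cc5412f982d65b21d91b85c03469199b8ceb4def78b"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's digest matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Assuming the source distribution was saved to the current directory, `verify("data-lineage-0.6.0.tar.gz")` returns `True` only if the local file's digest matches the published one.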

File details

Details for the file data_lineage-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: data_lineage-0.6.0-py3-none-any.whl
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.3

File hashes

Hashes for data_lineage-0.6.0-py3-none-any.whl

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | dc574202ad64165d864d5b42ea056058723c54a7e1945afc2fe9318a9629f975 |
| MD5         | f8251fc45809996ad18be17f3534473a |
| BLAKE2b-256 | bd55ff7fb91e7c6fb81652cce7748f7aa315892ad7153de5852541794d572282 |
