Skip to main content

Personal SQL lineage generator.

Project description

Python Poetry tests coverage GitHub last commit

code style: prettier code style: black Imports: isort pre-commit.ci status Sourcery


SQL Lineage 🔀

Personal SQL lineage generator.

Built on the following awesome libraries:

So... what is this? 🤔

[!NOTE]

This project is still in development and currently only parses the CTEs of an SQL query since that's all I need for now.

I write a lot of SQL, and I often need to understand "lineage" in lots of different ways.

Things like dbt and SQLMesh are great for object lineage (tables, columns, etc.), but I often need to understand "lineage" like the CTE lineage in a query, among other things.

This project is just a personal tool to help me generate lineage diagrams (in Mermaid syntax) for SQL queries to fit that requirement.

Upcoming improvements will (hopefully) include:

  • Semantic edges (e.g. distinguish between JOIN, WHERE, UNION, etc.)
  • Support for parameterised queries (e.g. with Jinja blocks, like dbt)
  • Parsing only the SELECT part of a SQL file that includes DDL/DML commands
  • Lineage for multiple files in a single diagram
  • Column lineage to Mermaid

Installation ⬇️

Grab a copy from PyPI like usual (note the bills- prefix):

pip install bills-sql-lineage

Usage 📖

[!WARNING]

This is likely to change significantly as the project evolves.

Pass the path to a SQL file to the lineage command to generate the lineage as a Mermaid diagram:

lineage path/to/file.sql

This will write a Mermaid diagram to path/to/file.mermaid. You can control the target path with the --target argument:

lineage path/to/file.sql --target path/to/output.mermaid

By default, the SQL dialect will be inferred by SQLGlot, but you can specify a dialect with the --dialect argument:

lineage path/to/file.sql --dialect snowflake

Example 📝

Given the following SQL query:

with

aaa as (select 1 as aa),
bbb as (select 2 as bb),
ccc as (select 3 as cc, aa from aaa),
ddd as (select 4 as dd, aa from aaa where aa not in (select bb from bbb))

select *
from ccc
    inner join ddd using (aa)

...the following Mermaid diagram will be generated:

graph TD
    aaa
    bbb
    ccc
    ddd
    final
    aaa --> ccc
    aaa --> ddd
    bbb --> ddd
    ccc --> final
    ddd --> final

Note that the final node is an alias for the final SELECT statement since the final SELECT statement is not a CTE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bills_sql_lineage-0.0.3.tar.gz (4.2 kB view hashes)

Uploaded Source

Built Distribution

bills_sql_lineage-0.0.3-py3-none-any.whl (5.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page