Skip to main content

Personal SQL lineage generator.

Project description

Python Poetry tests coverage GitHub last commit

code style: prettier code style: black Imports: isort pre-commit.ci status Sourcery


SQL Lineage 🔀

Personal SQL lineage generator.

Built on the following awesome libraries:

So... what is this? 🤔

[!NOTE]

This project is still in development and currently only parses the CTEs of an SQL query since that's all I need for now.

I write a lot of SQL, and I often need to understand "lineage" in lots of different ways.

Things like dbt and SQLMesh are great for object lineage (tables, columns, etc.), but I often need to understand "lineage" like the CTE lineage in a query, among other things.

This project is just a personal tool to help me generate lineage diagrams (in Mermaid syntax) for SQL queries to fit that requirement.

Upcoming improvements will (hopefully) include:

  • Semantic edges (e.g. distinguish between JOIN, WHERE, UNION, etc.)
  • Support for parameterised queries (e.g. with Jinja blocks, like dbt)
  • Parsing only the SELECT part of a SQL file that includes DDL/DML commands
  • Lineage for multiple files in a single diagram
  • Column lineage to Mermaid

Installation ⬇️

Grab a copy from PyPI like usual (note the bills- prefix):

pip install bills-sql-lineage

Usage 📖

[!WARNING]

This is likely to change significantly as the project evolves.

Pass the path to a SQL file to the lineage command to generate the lineage as a Mermaid diagram:

lineage path/to/file.sql

This will write a Mermaid diagram to path/to/file.mermaid. You can control the target path with the --target argument:

lineage path/to/file.sql --target path/to/output.mermaid

By default, the SQL dialect will be inferred by SQLGlot, but you can specify a dialect with the --dialect argument:

lineage path/to/file.sql --dialect snowflake

Example 📝

Given the following SQL query:

with

aaa as (select 1 as aa),
bbb as (select 2 as bb),
ccc as (select 3 as cc, aa from aaa),
ddd as (select 4 as dd, aa from aaa where aa not in (select bb from bbb))

select *
from ccc
    inner join ddd using (aa)

...the following Mermaid diagram will be generated:

graph TD
    aaa
    bbb
    ccc
    ddd
    final
    aaa --> ccc
    aaa --> ddd
    bbb --> ddd
    ccc --> final
    ddd --> final

Note that the final node is an alias for the final SELECT statement since the final SELECT statement is not a CTE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bills_sql_lineage-0.0.3.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

bills_sql_lineage-0.0.3-py3-none-any.whl (5.2 kB view details)

Uploaded Python 3

File details

Details for the file bills_sql_lineage-0.0.3.tar.gz.

File metadata

  • Download URL: bills_sql_lineage-0.0.3.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for bills_sql_lineage-0.0.3.tar.gz
Algorithm Hash digest
SHA256 2aa7844e6c089b88b4e37243d2c2c6d881356d8b92195c49812e3de1fefb2cb6
MD5 8add6859a0adc5e5549fb9347a5bd463
BLAKE2b-256 afbc6868ed150a190fbcab5f82db323e18a565f42f03aca03b396fc7795786b8

See more details on using hashes here.

File details

Details for the file bills_sql_lineage-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for bills_sql_lineage-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 83b4d14aba263ab6e5f35cfe64e8ee01f33cd5e73289e41eb60987ae0d1f8f7f
MD5 fea25254e65b4e7ba8bcef748038d788
BLAKE2b-256 84276cedc75f091d5036113fb7fdeaadd9492319dccfec81774fa85f22a88256

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page