
Create lineage graphs from SQL queries


sql2lineage


The sql2lineage package makes it easy to understand the data lineage of your SQL ETL files.

Features

  • Parse SQL strings to create data lineage
  • Build a graph to represent the data lineage
  • Search and print neighbourhoods

Example

Creating Lineage

Given an example SQL file, example.sql:

WITH orders_with_tax AS (
    SELECT
        order_id,
        customer_id,
        order_total * 1.2 AS total_with_tax
    FROM raw.orders
),
filtered_orders AS (
    SELECT
        order_id,
        customer_id,
        total_with_tax
    FROM orders_with_tax
)

CREATE TABLE big_orders AS
SELECT * FROM filtered_orders;

We can parse the content to create lineage.

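from sql2lineage.graph import LineageGraph
from sql2lineage.parser import SQLLineageParser

# Read the SQL to analyse.
with open("example.sql") as f:
    sql = f.read()

# Parse the SQL and extract its lineage.
parser = SQLLineageParser()
r = parser.extract_lineage(sql)

# Build the lineage graph from the parsed expressions and print its edges.
graph = LineageGraph()
graph.from_parsed(r.expressions)
graph.pretty_print()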

Output

filtered_orders --> big_orders [type: TABLE]
raw.orders --> orders_with_tax [type: TABLE]
orders_with_tax --> filtered_orders [type: TABLE]
filtered_orders.customer_id --> big_orders.customer_id [type: COLUMN, action: COPY]
filtered_orders.total_with_tax --> big_orders.total_with_tax [type: COLUMN, action: COPY]
filtered_orders.order_id --> big_orders.order_id [type: COLUMN, action: COPY]
orders_with_tax.customer_id --> filtered_orders.customer_id [type: COLUMN, action: COPY]
orders_with_tax.total_with_tax --> filtered_orders.total_with_tax [type: COLUMN, action: COPY]
orders_with_tax.order_id --> filtered_orders.order_id [type: COLUMN, action: COPY]
raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]
raw.orders.customer_id --> orders_with_tax.customer_id [type: COLUMN, action: COPY]
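
The printed edges follow a regular `source --> target [type: ..., action: ...]` shape, so they can be loaded into plain Python structures for further processing. The following is a minimal sketch that assumes exactly the format shown above. `parse_edge` and `EDGE_PATTERN` are hypothetical names used for illustration and are not part of sql2lineage.

```python
import re

# Matches lines such as:
#   raw.orders --> orders_with_tax [type: TABLE]
#   raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
# Assumption: node names contain no whitespace, as in the example output.
EDGE_PATTERN = re.compile(
    r"^(?P<source>\S+) --> (?P<target>\S+) "
    r"\[type: (?P<type>\w+)(?:, action: (?P<action>\w+))?\]$"
)


def parse_edge(line):
    """Return a dict for one printed edge, or None if the line does not match."""
    match = EDGE_PATTERN.match(line.strip())
    if match is None:
        return None
    # Table-level edges carry no action, so drop keys that did not match.
    return {key: value for key, value in match.groupdict().items() if value is not None}


lines = [
    "filtered_orders --> big_orders [type: TABLE]",
    "raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]",
]
edges = [parse_edge(line) for line in lines]
column_edges = [edge for edge in edges if edge["type"] == "COLUMN"]
```

Filtering on the `type` key, as in the last line, separates table-level edges from column-level ones.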

Searching Neighbours

Using the previously created graph, we can find all the neighbours of node orders_with_tax.order_id:

paths = graph.get_node_neighbours("orders_with_tax.order_id")
graph.print_neighbourhood(paths)

Output

Neighbourhood:
  ↳ {'source': 'raw.orders.order_id', 'target': 'orders_with_tax.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'orders_with_tax.order_id', 'target': 'filtered_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'filtered_orders.order_id', 'target': 'big_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
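
The neighbourhood shown above contains edges both upstream and downstream of the chosen node. One way to picture such a traversal is sketched below in plain Python. This is an illustration under stated assumptions, not a description of how sql2lineage implements `get_node_neighbours`; the `neighbourhood` function and the `EDGES` list are hypothetical.

```python
from collections import deque

# A few column-level edges from the example output, as (source, target) pairs.
EDGES = [
    ("raw.orders.order_id", "orders_with_tax.order_id"),
    ("orders_with_tax.order_id", "filtered_orders.order_id"),
    ("filtered_orders.order_id", "big_orders.order_id"),
    ("raw.orders.customer_id", "orders_with_tax.customer_id"),
]


def neighbourhood(edges, start):
    """Collect edges reachable from start, walking upstream first, then downstream."""
    found = []
    for forward in (False, True):
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for source, target in edges:
                # When walking upstream, follow each edge from target to source.
                here, there = (source, target) if forward else (target, source)
                if here != node:
                    continue
                if (source, target) not in found:
                    found.append((source, target))
                if there not in seen:
                    seen.add(there)
                    queue.append(there)
    return found


result = neighbourhood(EDGES, "orders_with_tax.order_id")
```

Here `result` holds the three `order_id` edges, while the unrelated `customer_id` edge is left out because no path connects it to the start node.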

Bugs/Requests

Please use the GitHub Issue Tracker to submit bugs or requests.

License

Copyright Sean Conkie, 2025.

Distributed under the terms of the MIT license, sql2lineage is free and open source software.

