Skip to main content

Create lineage graphs from SQL queries

Project description

sql2lineage

PyPI - Version PyPI - Python Version PyPI - Types Lint Code Base Test Code Base codecov

The sql2lineage package makes it easy to understand the data lineage of your SQL ETL files.

Features

  • Parse SQL strings to create data lineage
  • Build a graph to represent the data lineage
  • Search and print neighbourhoods

Example

Creating Lineage

With an example SQL file:

WITH orders_with_tax AS (
    SELECT
        order_id,
        customer_id,
        order_total * 1.2 AS total_with_tax
    FROM raw.orders
),
filtered_orders AS (
    SELECT
        order_id,
        customer_id,
        total_with_tax
    FROM orders_with_tax
)

CREATE TABLE big_orders AS
SELECT * FROM filtered_orders;

We can parse the content to create lineage.

from sql2lineage.graph import LineageGraph
from sql2lineage.parser import SQLLineageParser

with open("example.sql") as f:
    sql = f.read()

parser = SQLLineageParser()
r = parser.extract_lineage(sql)


graph = LineageGraph()
graph.from_parsed(r.expressions)
graph.pretty_print()

Output

filtered_orders --> big_orders [type: TABLE]
raw.orders --> orders_with_tax [type: TABLE]
orders_with_tax --> filtered_orders [type: TABLE]
filtered_orders.customer_id --> big_orders.customer_id [type: COLUMN, action: COPY]
filtered_orders.total_with_tax --> big_orders.total_with_tax [type: COLUMN, action: COPY]
filtered_orders.order_id --> big_orders.order_id [type: COLUMN, action: COPY]
orders_with_tax.customer_id --> filtered_orders.customer_id [type: COLUMN, action: COPY]
orders_with_tax.total_with_tax --> filtered_orders.total_with_tax [type: COLUMN, action: COPY]
orders_with_tax.order_id --> filtered_orders.order_id [type: COLUMN, action: COPY]
raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]
raw.orders.customer_id --> orders_with_tax.customer_id [type: COLUMN, action: COPY]

Searching Neighbours

Using the previously created graph, we can find all the neighbours of node orders_with_tax.order_id:

paths = graph.get_node_neighbours("orders_with_tax.order_id")
graph.print_neighbourhood(paths)

Output

Neighbourhood:
  ↳ {'source': 'raw.orders.order_id', 'target': 'orders_with_tax.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'orders_with_tax.order_id', 'target': 'filtered_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'filtered_orders.order_id', 'target': 'big_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}

Bugs/Requests

Please use the GitHub Issue Tracker to submit bugs or requests.

License

Copyright Sean Conkie, 2025.

Distributed under the terms of the MIT license, sql2lineage is free and open source software.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql2lineage-0.2.0.tar.gz (16.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sql2lineage-0.2.0-py3-none-any.whl (13.2 kB view details)

Uploaded Python 3

File details

Details for the file sql2lineage-0.2.0.tar.gz.

File metadata

  • Download URL: sql2lineage-0.2.0.tar.gz
  • Upload date:
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sql2lineage-0.2.0.tar.gz
Algorithm Hash digest
SHA256 2c3ae1f9acb689c6c420a4286f8efd86a262443b85ee09516f8c77b4d34711a6
MD5 9740ec4c0d46238a40ff923d29153fcb
BLAKE2b-256 0624b93c9990ee44a8bf816d353469f97f96e164975118cca76ed57d7e6d5121

See more details on using hashes here.

Provenance

The following attestation bundles were made for sql2lineage-0.2.0.tar.gz:

Publisher: publish.yml on sean-conkie/sql2lineage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sql2lineage-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: sql2lineage-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sql2lineage-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2ac34fe0fa76c3f258ad301aad1861409098184a34f39f23f8bb5ddfd342809d
MD5 cf67499af1d8293b4b46a551d1ec0f48
BLAKE2b-256 f4084867a9ff28d1b80ad8557e370503a7060a5e9d0203d096c3a343dc43ef0c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sql2lineage-0.2.0-py3-none-any.whl:

Publisher: publish.yml on sean-conkie/sql2lineage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page