
Create lineage graphs from SQL queries

Project description

sql2lineage


The sql2lineage package makes it easy to understand the data lineage of your SQL ETL files.

Features

  • Parse SQL strings to extract data lineage
  • Build a graph that represents the lineage at table and column level
  • Search for and print a node's neighbourhood

Example

Creating Lineage

Given an example SQL file, example.sql:

WITH orders_with_tax AS (
    SELECT
        order_id,
        customer_id,
        order_total * 1.2 AS total_with_tax
    FROM raw.orders
),
filtered_orders AS (
    SELECT
        order_id,
        customer_id,
        total_with_tax
    FROM orders_with_tax
)

CREATE TABLE big_orders AS
SELECT * FROM filtered_orders;

We can parse the file's contents to extract the lineage, then load it into a graph and print it:

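from sql2lineage.graph import LineageGraph
from sql2lineage.parser import SQLLineageParser

# Read the SQL file to analyse
with open("example.sql") as f:
    sql = f.read()

# Parse the SQL and extract its table- and column-level lineage
parser = SQLLineageParser()
result = parser.extract_lineage(sql)

# Load the parsed expressions into a graph and print every edge
graph = LineageGraph()
graph.from_parsed(result.expressions)
graph.pretty_print()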

Output

filtered_orders --> big_orders [type: TABLE]
raw.orders --> orders_with_tax [type: TABLE]
orders_with_tax --> filtered_orders [type: TABLE]
filtered_orders.customer_id --> big_orders.customer_id [type: COLUMN, action: COPY]
filtered_orders.total_with_tax --> big_orders.total_with_tax [type: COLUMN, action: COPY]
filtered_orders.order_id --> big_orders.order_id [type: COLUMN, action: COPY]
orders_with_tax.customer_id --> filtered_orders.customer_id [type: COLUMN, action: COPY]
orders_with_tax.total_with_tax --> filtered_orders.total_with_tax [type: COLUMN, action: COPY]
orders_with_tax.order_id --> filtered_orders.order_id [type: COLUMN, action: COPY]
raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]
raw.orders.customer_id --> orders_with_tax.customer_id [type: COLUMN, action: COPY]
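The column-level lines above also imply the table-level ones: stripping the column name from each endpoint collapses the nine COLUMN edges into the three TABLE edges. The sketch below checks this with the standard library only. The tuple layout and the helpers table_of and table_edges are hypothetical illustrations, not part of the sql2lineage API, and table_of assumes column names never contain a dot.

```python
# Hypothetical sketch, NOT the sql2lineage API: the COLUMN edges from the
# output above, written as (source, target, action) tuples.
COLUMN_EDGES = [
    ("filtered_orders.customer_id", "big_orders.customer_id", "COPY"),
    ("filtered_orders.total_with_tax", "big_orders.total_with_tax", "COPY"),
    ("filtered_orders.order_id", "big_orders.order_id", "COPY"),
    ("orders_with_tax.customer_id", "filtered_orders.customer_id", "COPY"),
    ("orders_with_tax.total_with_tax", "filtered_orders.total_with_tax", "COPY"),
    ("orders_with_tax.order_id", "filtered_orders.order_id", "COPY"),
    ("raw.orders.order_id", "orders_with_tax.order_id", "COPY"),
    ("raw.orders.order_total", "orders_with_tax.order_total", "TRANSFORM"),
    ("raw.orders.customer_id", "orders_with_tax.customer_id", "COPY"),
]


def table_of(column: str) -> str:
    """Drop the last dotted part, e.g. 'raw.orders.order_id' -> 'raw.orders'.

    Assumes the column name itself contains no dot.
    """
    return column.rsplit(".", 1)[0]


def table_edges(column_edges):
    """Collapse column-level edges into the set of distinct table-level edges."""
    return {(table_of(src), table_of(dst)) for src, dst, _ in column_edges}


# Print in the same style as the pretty_print output (sorted, not original order)
for src, dst in sorted(table_edges(COLUMN_EDGES)):
    print(f"{src} --> {dst} [type: TABLE]")
```

Running it prints the same three TABLE edges shown above, in sorted order rather than the order pretty_print uses.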

Searching Neighbours

Using the graph built above, we can find every edge on a lineage path that passes through the node orders_with_tax.order_id, both upstream and downstream:

paths = graph.get_node_neighbours("orders_with_tax.order_id")
graph.print_neighbourhood(paths)

Output

Neighbourhood:
  ↳ {'source': 'raw.orders.order_id', 'target': 'orders_with_tax.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'orders_with_tax.order_id', 'target': 'filtered_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'filtered_orders.order_id', 'target': 'big_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
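The neighbourhood above contains the edge feeding into orders_with_tax.order_id plus every edge downstream of it. One way to compute such a neighbourhood is a breadth-first walk over incoming edges (upstream) and outgoing edges (downstream). The sketch below is a stdlib-only illustration of that idea, not how sql2lineage implements get_node_neighbours; the names EDGES, walk and neighbourhood are hypothetical, and every edge is assumed to be a COLUMN edge.

```python
from collections import defaultdict, deque

# Hypothetical sketch, NOT the sql2lineage implementation: a subset of the
# COLUMN edges from the example, as (source, target, action) tuples.
EDGES = [
    ("raw.orders.order_id", "orders_with_tax.order_id", "COPY"),
    ("orders_with_tax.order_id", "filtered_orders.order_id", "COPY"),
    ("filtered_orders.order_id", "big_orders.order_id", "COPY"),
    ("raw.orders.customer_id", "orders_with_tax.customer_id", "COPY"),
    ("orders_with_tax.customer_id", "filtered_orders.customer_id", "COPY"),
    ("filtered_orders.customer_id", "big_orders.customer_id", "COPY"),
]


def walk(start, step):
    """Breadth-first walk from start; step(node) yields (edge, next_node) pairs."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for edge, nxt in step(node):
            found.append(edge)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


def neighbourhood(node, edges):
    """Return every edge on an upstream or downstream path through node."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for src, dst, action in edges:
        # Every edge here is assumed to be column-level
        edge = {"source": src, "target": dst, "type": "COLUMN", "action": action}
        outgoing[src].append((edge, dst))
        incoming[dst].append((edge, src))
    upstream = walk(node, lambda n: incoming[n])
    downstream = walk(node, lambda n: outgoing[n])
    # Reverse upstream so that, for a simple chain, edges read from the
    # furthest source towards node, followed by the downstream edges.
    return upstream[::-1] + downstream


print("Neighbourhood:")
for edge in neighbourhood("orders_with_tax.order_id", EDGES):
    print(f"  ↳ {edge}")
```

For this input the sketch prints the same three order_id edges as the library output above; the customer_id edges are ignored because no path through orders_with_tax.order_id reaches them.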

Bugs/Requests

Please use the GitHub Issue Tracker to submit bugs or requests.

License

Copyright Sean Conkie, 2025.

Distributed under the terms of the MIT license, sql2lineage is free and open source software.


Download files


Source Distribution

sql2lineage-0.2.1.tar.gz (16.9 kB)

Uploaded Source

Built Distribution


sql2lineage-0.2.1-py3-none-any.whl (13.3 kB)

Uploaded Python 3

File details

Details for the file sql2lineage-0.2.1.tar.gz.

File metadata

  • Download URL: sql2lineage-0.2.1.tar.gz
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sql2lineage-0.2.1.tar.gz
  • SHA256: 6c9d2e1315623a266fc8d24881b4dd98edf0ef5ed30d632c4ff26e42890a8397
  • MD5: 38ab5942809e6bc0562052637f6a0494
  • BLAKE2b-256: b40826e3952796b6d4e1f65d0d8e4bcf8ff98117670c09ed901f04df12e3965e


Provenance

The following attestation bundles were made for sql2lineage-0.2.1.tar.gz:

Publisher: publish.yml on sean-conkie/sql2lineage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sql2lineage-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: sql2lineage-0.2.1-py3-none-any.whl
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sql2lineage-0.2.1-py3-none-any.whl
  • SHA256: 1addf464a6098a8ef9853e22bac925cf8ec07ab9e35e327268911d3ca2376da5
  • MD5: 078909f58ac407ddd693f08afc991b34
  • BLAKE2b-256: 49e168d0bf525ca913b2c0f5ca5ec757eaf1decd160267acf99980b0cef2a1c2


Provenance

The following attestation bundles were made for sql2lineage-0.2.1-py3-none-any.whl:

Publisher: publish.yml on sean-conkie/sql2lineage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
