Create lineage graphs from SQL queries

sql2lineage

The sql2lineage package makes it easy to understand the data lineage of your SQL ETL files.

Features

  • Parse SQL strings to create data lineage
  • Build a graph to represent the data lineage
  • Search and print neighbourhoods

Example

Creating Lineage

Given an example SQL file, example.sql:

WITH orders_with_tax AS (
    SELECT
        order_id,
        customer_id,
        order_total * 1.2 AS total_with_tax
    FROM raw.orders
),
filtered_orders AS (
    SELECT
        order_id,
        customer_id,
        total_with_tax
    FROM orders_with_tax
)

CREATE TABLE big_orders AS
SELECT * FROM filtered_orders;

We can parse the content to create lineage.

from sql2lineage.graph import LineageGraph
from sql2lineage.parser import SQLLineageParser

# Read the SQL to analyse
with open("example.sql") as f:
    sql = f.read()

# Parse the SQL and extract its lineage
parser = SQLLineageParser()
r = parser.extract_lineage(sql)

# Build the graph from the parsed expressions and print every edge
graph = LineageGraph()
graph.from_parsed(r.expressions)
graph.pretty_print()

Output

filtered_orders --> big_orders [type: TABLE]
raw.orders --> orders_with_tax [type: TABLE]
orders_with_tax --> filtered_orders [type: TABLE]
filtered_orders.customer_id --> big_orders.customer_id [type: COLUMN, action: COPY]
filtered_orders.total_with_tax --> big_orders.total_with_tax [type: COLUMN, action: COPY]
filtered_orders.order_id --> big_orders.order_id [type: COLUMN, action: COPY]
orders_with_tax.customer_id --> filtered_orders.customer_id [type: COLUMN, action: COPY]
orders_with_tax.total_with_tax --> filtered_orders.total_with_tax [type: COLUMN, action: COPY]
orders_with_tax.order_id --> filtered_orders.order_id [type: COLUMN, action: COPY]
raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]
raw.orders.customer_id --> orders_with_tax.customer_id [type: COLUMN, action: COPY]
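
Each printed line has the same shape: a source, a target, a type, and (for column edges) an action. If you want to post-process that text, for example to separate table-level from column-level edges, the following is a minimal standard-library sketch. It is not part of sql2lineage; the `Edge` tuple, `EDGE_PATTERN`, and `parse_edges` are hypothetical names introduced for illustration, and the pattern assumes the line format shown above.

```python
import re
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """One lineage edge, as read from a pretty-printed line (hypothetical helper)."""
    source: str
    target: str
    type: str
    action: Optional[str]  # None for table-level edges, which print no action


# Assumed line format, taken from the output above:
#   raw.orders --> orders_with_tax [type: TABLE]
#   raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
EDGE_PATTERN = re.compile(
    r"^(?P<source>\S+) --> (?P<target>\S+) "
    r"\[type: (?P<type>\w+)(?:, action: (?P<action>\w+))?\]$"
)


def parse_edges(text: str) -> list[Edge]:
    """Parse every line that matches the edge format; skip anything else."""
    edges = []
    for line in text.splitlines():
        match = EDGE_PATTERN.match(line.strip())
        if match:
            edges.append(Edge(**match.groupdict()))
    return edges


sample = """\
raw.orders --> orders_with_tax [type: TABLE]
raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]
raw.orders.customer_id --> orders_with_tax.customer_id [type: COLUMN, action: COPY]
"""

edges = parse_edges(sample)
table_edges = [e for e in edges if e.type == "TABLE"]
transforms = [e for e in edges if e.action == "TRANSFORM"]
```

Here `table_edges` holds the single table-level edge and `transforms` holds the one column whose value is computed rather than copied.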

Searching Neighbours

Using the previously created graph, we can find all the neighbours of node orders_with_tax.order_id:

paths = graph.get_node_neighbours("orders_with_tax.order_id")
graph.print_neighbourhood(paths)

Output

Neighbourhood:
  ↳ {'source': 'raw.orders.order_id', 'target': 'orders_with_tax.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'orders_with_tax.order_id', 'target': 'filtered_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
  ↳ {'source': 'filtered_orders.order_id', 'target': 'big_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
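
The neighbourhood above contains edges both upstream and downstream of the chosen node. As a rough illustration of that idea, and not a description of how sql2lineage implements it, the sketch below walks a plain edge list backwards to a node's ancestors and forwards to its descendants. The `neighbourhood` function and the `(source, target)` tuple representation are assumptions made for this example.

```python
from collections import defaultdict

# Column-level edges copied from the lineage output above (order_id plus one
# unrelated customer_id edge, to show that it is left out of the result).
EDGES = [
    ("raw.orders.order_id", "orders_with_tax.order_id"),
    ("orders_with_tax.order_id", "filtered_orders.order_id"),
    ("filtered_orders.order_id", "big_orders.order_id"),
    ("raw.orders.customer_id", "orders_with_tax.customer_id"),
]


def neighbourhood(edges, node):
    """Return every edge on a path leading into or out of `node`."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for source, target in edges:
        outgoing[source].append((source, target))
        incoming[target].append((source, target))

    def walk(start, adjacency, next_node):
        # Iterative depth-first traversal that records each edge it crosses.
        found, stack, seen = [], [start], {start}
        while stack:
            current = stack.pop()
            for edge in adjacency.get(current, []):
                found.append(edge)
                nxt = next_node(edge)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return found

    upstream = walk(node, incoming, lambda edge: edge[0])
    downstream = walk(node, outgoing, lambda edge: edge[1])
    # Reverse the upstream edges so the result reads from origin to destination.
    return upstream[::-1] + downstream


result = neighbourhood(EDGES, "orders_with_tax.order_id")
```

For this edge list, `result` contains the three `order_id` edges and omits the `customer_id` edge, which is not connected to the starting node.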

Bugs/Requests

Please use the GitHub Issue Tracker to submit bugs or requests.

License

Copyright Sean Conkie, 2025.

Distributed under the terms of the MIT license, sql2lineage is free and open source software.
