Create lineage graphs from SQL queries
Project description
sql2lineage
The sql2lineage package makes it easy to understand the data lineage of your SQL ETL files.
Features
- Parse SQL strings to create data lineage
- Build a graph to represent the data lineage
- Search and print neighbourhoods
Example
Creating Lineage
With an example SQL file:
WITH orders_with_tax AS (
SELECT
order_id,
customer_id,
order_total * 1.2 AS total_with_tax
FROM raw.orders
),
filtered_orders AS (
SELECT
order_id,
customer_id,
total_with_tax
FROM orders_with_tax
)
CREATE TABLE big_orders AS
SELECT * FROM filtered_orders;
We can parse the content to create lineage.
from sql2lineage.graph import LineageGraph
from sql2lineage.parser import SQLLineageParser
with open("example.sql") as f:
sql = f.read()
parser = SQLLineageParser()
r = parser.extract_lineage(sql)
graph = LineageGraph()
graph.from_parsed(r.expressions)
graph.pretty_print()
Output
filtered_orders --> big_orders [type: TABLE]
raw.orders --> orders_with_tax [type: TABLE]
orders_with_tax --> filtered_orders [type: TABLE]
filtered_orders.customer_id --> big_orders.customer_id [type: COLUMN, action: COPY]
filtered_orders.total_with_tax --> big_orders.total_with_tax [type: COLUMN, action: COPY]
filtered_orders.order_id --> big_orders.order_id [type: COLUMN, action: COPY]
orders_with_tax.customer_id --> filtered_orders.customer_id [type: COLUMN, action: COPY]
orders_with_tax.total_with_tax --> filtered_orders.total_with_tax [type: COLUMN, action: COPY]
orders_with_tax.order_id --> filtered_orders.order_id [type: COLUMN, action: COPY]
raw.orders.order_id --> orders_with_tax.order_id [type: COLUMN, action: COPY]
raw.orders.order_total --> orders_with_tax.order_total [type: COLUMN, action: TRANSFORM]
raw.orders.customer_id --> orders_with_tax.customer_id [type: COLUMN, action: COPY]
Searching Neighbours
Using the previously created graph, we can find all the neighbours of node orders_with_tax.order_id:
paths = graph.get_node_neighbours("orders_with_tax.order_id")
graph.print_neighbourhood(paths)
Output
Neighbourhood:
↳ {'source': 'raw.orders.order_id', 'target': 'orders_with_tax.order_id', 'type': 'COLUMN', 'action': 'COPY'}
↳ {'source': 'orders_with_tax.order_id', 'target': 'filtered_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
↳ {'source': 'filtered_orders.order_id', 'target': 'big_orders.order_id', 'type': 'COLUMN', 'action': 'COPY'}
Bugs/Requests
Please use the GitHub Issue Tracker to submit bugs or requests.
License
Copyright Sean Conkie, 2025.
Distributed under the terms of the MIT license, sql2lineage is free and open source software.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sql2lineage-0.3.0.tar.gz.
File metadata
- Download URL: sql2lineage-0.3.0.tar.gz
- Upload date:
- Size: 18.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1107da2418d910f93e197babfb1d46fbfe32119ea862dc5d89dcdd73e68f1d2c
|
|
| MD5 |
04292affbe70beea73f1a14abcbaa478
|
|
| BLAKE2b-256 |
5cbe5d022ce1efb1ea5808bac004ea3bbf2708149de0c6517a910a9789c8e3d8
|
Provenance
The following attestation bundles were made for sql2lineage-0.3.0.tar.gz:
Publisher:
publish.yml on sean-conkie/sql2lineage
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sql2lineage-0.3.0.tar.gz -
Subject digest:
1107da2418d910f93e197babfb1d46fbfe32119ea862dc5d89dcdd73e68f1d2c - Sigstore transparency entry: 200321754
- Sigstore integration time:
-
Permalink:
sean-conkie/sql2lineage@d2414312bad0b5952e51e231e7ece7a08e76e50a -
Branch / Tag:
refs/tags/0.3.0 - Owner: https://github.com/sean-conkie
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d2414312bad0b5952e51e231e7ece7a08e76e50a -
Trigger Event:
release
-
Statement type:
File details
Details for the file sql2lineage-0.3.0-py3-none-any.whl.
File metadata
- Download URL: sql2lineage-0.3.0-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2d9ac884938e884836ff92f2e19a94897752d45aef152111404e8b94b8e7fb15
|
|
| MD5 |
83b004e8b149e8fc69057b1ca1f4606e
|
|
| BLAKE2b-256 |
b8a23223ea8a79befee5024f26c387baf03f01995d0a465cd661af7399abd76a
|
Provenance
The following attestation bundles were made for sql2lineage-0.3.0-py3-none-any.whl:
Publisher:
publish.yml on sean-conkie/sql2lineage
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sql2lineage-0.3.0-py3-none-any.whl -
Subject digest:
2d9ac884938e884836ff92f2e19a94897752d45aef152111404e8b94b8e7fb15 - Sigstore transparency entry: 200321759
- Sigstore integration time:
-
Permalink:
sean-conkie/sql2lineage@d2414312bad0b5952e51e231e7ece7a08e76e50a -
Branch / Tag:
refs/tags/0.3.0 - Owner: https://github.com/sean-conkie
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d2414312bad0b5952e51e231e7ece7a08e76e50a -
Trigger Event:
release
-
Statement type: