polars-lineage
Extract column-level lineage from Polars LazyFrame transformations.
This proof of concept (PoC) uses a metadata-first workflow exposed through a `LazyFrame.lineage` namespace: attach source metadata with `lineage.add_source(...)`, then call `lineage.extract()`.
Install and Setup
```bash
pip install polars-lineage
```
For local development:
```bash
uv sync --dev
```
Python API (Metadata First)
Importing `polars_lineage` registers the `LazyFrame.lineage` namespace.

`lineage.add_source(...)` accepts:
- `name`: logical source name
- `uri`: source URI
- `destination_table` (optional): explicit destination fully qualified name (FQN)

If `destination_table` is not provided, a deterministic destination FQN is derived.
Example:
```python
import polars as pl
import polars_lineage  # registers the LazyFrame.lineage namespace

# Attach source metadata to each input LazyFrame.
df_orders = (
    pl.DataFrame({"id": [1, 2], "amount": [10, 20]})
    .lazy()
    .lineage.add_source(name="orders", uri="postgres://warehouse/svc.db.raw.orders")
)
df_accounts = (
    pl.DataFrame({"id": [1, 2], "segment": ["A", "B"]})
    .lazy()
    .lineage.add_source(name="accounts", uri="https://crm/accounts")
)

# Build the transformation once; source metadata propagates through the join.
plan = df_orders.join(df_accounts, on="id", how="left").with_columns(
    pl.col("amount").alias("amount_copy")
)

# OpenMetadata-style lineage payload.
lineage = plan.lineage.extract()
print(lineage)

# The same lineage rendered as Markdown.
markdown = plan.lineage.to_markdown()
print(markdown)
```
URI parsing notes:
- If the URI path ends with a table FQN (`service.database.schema.table`), that FQN is used.
- Otherwise, the source FQN is derived from the URI parts:
  - service: URI scheme
  - database: URI hostname (or `external` if the URI has no hostname)
  - schema: `public`
  - table: final URI path segment
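The URI rules above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `derive_source_fqn` is a hypothetical helper, and it assumes that a path "ends with a table FQN" when its last segment has exactly four non-empty dot-separated parts. polars-lineage itself may treat edge cases (file extensions, empty paths) differently.

```python
from urllib.parse import urlparse


def derive_source_fqn(uri: str) -> str:
    """Hypothetical sketch of the documented URI -> source FQN rules."""
    parsed = urlparse(uri)
    # Last non-empty path segment, e.g. "svc.db.raw.orders" or "accounts".
    segments = [part for part in parsed.path.split("/") if part]
    last = segments[-1] if segments else ""

    # Rule 1 (assumed reading): four non-empty dotted parts form a
    # service.database.schema.table FQN, which is used as-is.
    parts = last.split(".")
    if len(parts) == 4 and all(parts):
        return last

    # Rule 2: build the FQN from the URI components.
    service = parsed.scheme
    database = parsed.hostname or "external"  # no hostname -> "external"
    schema = "public"
    table = last
    return f"{service}.{database}.{schema}.{table}"


print(derive_source_fqn("postgres://warehouse/svc.db.raw.orders"))  # svc.db.raw.orders
print(derive_source_fqn("https://crm/accounts"))  # https.crm.public.accounts
```

With the README's example URIs, the Postgres source resolves through the first rule, while the CRM source falls back to building `scheme.host.public.table`.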
Namespace Notes
- `lineage.add_source(...)` returns the same `pl.LazyFrame` instance.
- Metadata is propagated through common lazy operations (including joins).
- `lineage.extract()` returns deterministic OpenMetadata-style payloads.
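One way to make lineage output deterministic is to sort everything before serializing, because Python's iteration order for sets of strings can change between interpreter runs (hash randomization). The sketch below is illustrative only: `to_deterministic_json` is a hypothetical helper, and the `columnsLineage`/`fromColumns`/`toColumn` field names are borrowed from OpenMetadata's column-lineage schema rather than copied from polars-lineage's actual payload.

```python
import json


def to_deterministic_json(edges: dict[str, set[str]]) -> str:
    """Serialize target -> source-column edges in a stable order."""
    payload = {
        "columnsLineage": [
            # Sort sources within each edge, and edges by target column.
            {"fromColumns": sorted(sources), "toColumn": target}
            for target, sources in sorted(edges.items())
        ]
    }
    # sort_keys fixes dictionary key order in the emitted JSON text.
    return json.dumps(payload, sort_keys=True, indent=2)


# Hypothetical edges: output column -> fully qualified source columns.
edges = {
    "amount_copy": {"svc.db.raw.orders.amount"},
    "segment": {"https.crm.public.accounts.segment"},
}
print(to_deterministic_json(edges))
```

Because both the edge list and each `fromColumns` list are sorted, two runs over the same plan yield byte-identical JSON, which keeps diffs and snapshot tests stable.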
Current Capabilities
- Projection lineage (`select`, `with_columns`)
- Literals and aliases
- Basic expression dependency extraction (arithmetic, casts, conditional-like patterns)
- Transitive dependency resolution
- Join-aware attribution with explicit left/right mapping aliases
- Group-by aggregation expression and key coverage
- Deterministic OpenMetadata payload export
- Deterministic custom JSON export via the typed `LineageDocument` model
- Deterministic Markdown lineage rendering
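Transitive dependency resolution can be pictured as walking a graph of direct column dependencies until only source columns remain. The sketch below assumes a hypothetical `direct_deps` mapping (output column to the columns it reads); it does not reflect how polars-lineage represents the query plan internally. Columns with no entry are treated as source columns, and an empty dependency set marks a literal such as `pl.lit(...)`.

```python
def resolve_sources(column: str, direct_deps: dict[str, set[str]]) -> set[str]:
    """Return the source columns that `column` ultimately depends on."""
    sources: set[str] = set()
    stack = [column]
    seen: set[str] = set()  # avoids revisiting shared parents
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in direct_deps:
            # No recorded parents: treat it as a source column.
            sources.add(current)
        else:
            # Derived column (or literal, if the set is empty): follow parents.
            stack.extend(direct_deps[current])
    return sources


# Hypothetical direct dependencies loosely based on the README example.
direct_deps = {
    "amount_copy": {"amount"},  # pl.col("amount").alias("amount_copy")
    "amount": {"orders.amount"},
    "segment": {"accounts.segment"},
    "flag": set(),  # e.g. pl.lit(True).alias("flag")
}

print(resolve_sources("amount_copy", direct_deps))  # {'orders.amount'}
print(resolve_sources("flag", direct_deps))  # set()
```

An iterative stack with a `seen` set keeps the walk linear in the number of columns and avoids recursion limits on long transformation chains.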
Current Constraints
- Multiple joins in one parsed plan are rejected.
- Ambiguous overlapping column names that do not come from a join are rejected with clear errors.
- For static type checking, the dynamically registered `LazyFrame.lineage` namespace may require stubs.
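The ambiguity constraint amounts to checking that every column name maps to exactly one upstream source before lineage is attributed. In the README example both inputs carry `id`; inside the join it is the key, but outside a join that overlap would be ambiguous. The sketch below uses a hypothetical `check_unambiguous` helper and a plain dictionary of source columns; polars-lineage's actual detection logic and error messages may differ.

```python
def check_unambiguous(columns_by_source: dict[str, list[str]]) -> None:
    """Raise ValueError if any column name is offered by more than one source."""
    owners: dict[str, list[str]] = {}
    # Iterate sources in sorted order so error messages are deterministic.
    for source, columns in sorted(columns_by_source.items()):
        for column in columns:
            owners.setdefault(column, []).append(source)

    clashes = {col: srcs for col, srcs in owners.items() if len(srcs) > 1}
    if clashes:
        details = "; ".join(
            f"{col!r} from {', '.join(srcs)}" for col, srcs in sorted(clashes.items())
        )
        raise ValueError(f"ambiguous columns: {details}")


check_unambiguous({"orders": ["amount"], "accounts": ["segment"]})  # passes

try:
    check_unambiguous({"orders": ["id", "amount"], "accounts": ["id", "segment"]})
except ValueError as exc:
    print(exc)  # ambiguous columns: 'id' from accounts, orders
```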
Development
```bash
uv run pytest
uv run ruff check .
uv run mypy
uv build
```
Download files
File details
Details for the file polars_lineage-0.1.4.tar.gz.
File metadata
- Download URL: polars_lineage-0.1.4.tar.gz
- Upload date:
- Size: 11.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0b15dc88d403bc3e6940f5a260c226351440b8b8f746bdfed1e2b7eade93da4 |
| MD5 | 9e85ec694223cc614a81d1d82414c2e2 |
| BLAKE2b-256 | bcf1275b0420262fcedf4425b6e89dfb618c9f31fbf9598303a55a8532b877dc |
Provenance
The following attestation bundles were made for polars_lineage-0.1.4.tar.gz:
- Publisher: release.yml on davzucky/polars-lineage
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polars_lineage-0.1.4.tar.gz
  - Subject digest: b0b15dc88d403bc3e6940f5a260c226351440b8b8f746bdfed1e2b7eade93da4
  - Sigstore transparency entry: 1022838492
  - Sigstore integration time:
- Permalink: davzucky/polars-lineage@70e97793ba1fcb1725f54e577fd4b049ac74ab6d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/davzucky
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@70e97793ba1fcb1725f54e577fd4b049ac74ab6d
- Trigger Event: workflow_dispatch
File details
Details for the file polars_lineage-0.1.4-py3-none-any.whl.
File metadata
- Download URL: polars_lineage-0.1.4-py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a5d2732532b509197a5d7b55c12a29d8958656d0809a16d07cc3ad1c49ced48 |
| MD5 | 2c95c916672c3cb5e6ea1699b08f248b |
| BLAKE2b-256 | af02373b7b0c06f898a65ae0f343f68107980e5184c8711793dbd79451de68c0 |
Provenance
The following attestation bundles were made for polars_lineage-0.1.4-py3-none-any.whl:
- Publisher: release.yml on davzucky/polars-lineage
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polars_lineage-0.1.4-py3-none-any.whl
  - Subject digest: 1a5d2732532b509197a5d7b55c12a29d8958656d0809a16d07cc3ad1c49ced48
  - Sigstore transparency entry: 1022838555
  - Sigstore integration time:
- Permalink: davzucky/polars-lineage@70e97793ba1fcb1725f54e577fd4b049ac74ab6d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/davzucky
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@70e97793ba1fcb1725f54e577fd4b049ac74ab6d
- Trigger Event: workflow_dispatch