polars-lineage

Extract column-level lineage from Polars LazyFrame transformations.

This proof of concept (PoC) uses a metadata-first workflow exposed through the LazyFrame.lineage namespace: attach source metadata with lineage.add_source(...), build the query as usual, then call lineage.extract() on the resulting LazyFrame.

Install and Setup

pip install polars-lineage

For local development:

uv sync --dev

Python API (Metadata First)

Importing polars_lineage registers LazyFrame.lineage.

lineage.add_source(...) accepts:

  • name: logical source name
  • uri: source URI
  • destination_table (optional): explicit destination fully qualified name (FQN)

If destination_table is omitted, the destination FQN is derived deterministically.

Example:

import polars as pl
import polars_lineage  # registers LazyFrame.lineage namespace

df_orders = (
    pl.DataFrame({"id": [1, 2], "amount": [10, 20]})
    .lazy()
    .lineage.add_source(name="orders", uri="postgres://warehouse/svc.db.raw.orders")
)

df_accounts = (
    pl.DataFrame({"id": [1, 2], "segment": ["A", "B"]})
    .lazy()
    .lineage.add_source(name="accounts", uri="https://crm/accounts")
)

# Build the query once: join the two sources, then derive a new column.
query = (
    df_orders.join(df_accounts, on="id", how="left")
    .with_columns(pl.col("amount").alias("amount_copy"))
)

# Extract column-level lineage as a structured payload.
lineage = query.lineage.extract()
print(lineage)

# Render the same lineage as Markdown.
markdown = query.lineage.to_markdown()
print(markdown)

URI parsing notes:

  • If the URI path ends with a table FQN (service.database.schema.table), that FQN is used.
  • Otherwise, the source FQN is derived from the URI parts:
    • service: URI scheme
    • database: URI hostname (or external if there is none)
    • schema: always public
    • table: final URI path segment
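
The rules above can be sketched with the standard library alone. This is an illustration, not the package's implementation: the function name derive_source_fqn is hypothetical, and two details are assumptions, namely that "ends with a table FQN" means the final path segment has exactly four non-empty dot-separated parts, and that external is the fallback when the URI has no hostname.

```python
from urllib.parse import urlparse


def derive_source_fqn(uri: str) -> str:
    """Hypothetical helper mirroring the URI parsing notes above."""
    parsed = urlparse(uri)
    # Final path segment, ignoring any trailing slash.
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]

    # Assumption: a table FQN is exactly service.database.schema.table.
    parts = last_segment.split(".")
    if len(parts) == 4 and all(parts):
        return last_segment

    service = parsed.scheme
    # Assumption: "external" is used when the URI has no hostname.
    database = parsed.hostname or "external"
    schema = "public"
    return ".".join([service, database, schema, last_segment])


print(derive_source_fqn("postgres://warehouse/svc.db.raw.orders"))
print(derive_source_fqn("https://crm/accounts"))
```

Under these assumptions, the two URIs from the example above resolve to svc.db.raw.orders and https.crm.public.accounts respectively.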

Namespace Notes

  • lineage.add_source(...) returns the same pl.LazyFrame instance.
  • Metadata is propagated through common lazy operations (including joins).
  • lineage.extract() returns deterministic OpenMetadata-style payloads.
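
To make "deterministic" concrete, here is a minimal sketch of one way to get byte-identical output for the same lineage regardless of the order in which edges were discovered. Everything in it is hypothetical: ColumnEdge, its field names, and to_deterministic_json are illustrative only and do not reflect the package's LineageDocument model or the OpenMetadata schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True, order=True)
class ColumnEdge:
    """Hypothetical lineage edge: one source column feeding one output column."""

    to_column: str
    from_column: str


def to_deterministic_json(edges: list[ColumnEdge]) -> str:
    # Deduplicate and sort edges so discovery order does not affect the output.
    ordered = sorted(set(edges))
    payload = {"edges": [asdict(edge) for edge in ordered]}
    # Sorted keys and fixed separators keep the serialized text stable.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


a = ColumnEdge(to_column="out.id", from_column="orders.id")
b = ColumnEdge(to_column="out.amount_copy", from_column="orders.amount")

# The same edges in a different order serialize to the same string.
print(to_deterministic_json([a, b]) == to_deterministic_json([b, a, a]))
```

Stable output of this kind is what makes lineage payloads easy to diff in version control and to compare in tests.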

Current Capabilities

  • Projection lineage (select, with_columns)
  • Literals and aliases
  • Basic expression dependency extraction (arithmetic, casts, conditional-like patterns)
  • Transitive dependency resolution
  • Join-aware attribution with explicit left/right mapping aliases
  • Group-by lineage covering both aggregation expressions and grouping keys
  • Deterministic OpenMetadata payload export
  • Deterministic custom JSON export via typed LineageDocument model
  • Deterministic Markdown lineage rendering
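
As an illustration of what transitive dependency resolution means, the sketch below walks a map of direct dependencies back to the underlying source columns. It is a simplified model rather than the package's algorithm: the function resolve_sources and the direct_deps structure are hypothetical, and it assumes every derived column has a unique name (a real plan walker would have to handle a column being overwritten under the same name).

```python
def resolve_sources(column: str, direct_deps: dict[str, list[str]]) -> set[str]:
    """Return the source columns that `column` ultimately depends on.

    `direct_deps` maps each derived column to the columns its expression reads.
    Columns absent from the map are treated as source columns; a derived column
    with an empty list (for example a literal) has no sources.
    """
    sources: set[str] = set()
    seen: set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        if current not in direct_deps:
            sources.add(current)  # not derived from anything: a source column
        else:
            stack.extend(direct_deps[current])
    return sources


# Hypothetical column names, for illustration only.
direct_deps = {
    "amount_usd": ["amount", "fx_rate"],
    "amount_with_tax": ["amount_usd", "tax_rate"],
    "flag": [],  # literal column with no inputs
}

print(sorted(resolve_sources("amount_with_tax", direct_deps)))
```

Here amount_with_tax resolves through amount_usd to the three source columns amount, fx_rate, and tax_rate, while flag resolves to no sources at all.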

Current Constraints

  • Multiple joins in one parsed plan are rejected.
  • Overlapping column names that are ambiguous outside a join are rejected with a clear error.
  • Because LazyFrame.lineage is registered dynamically at import time, static type checkers may need stubs to recognize it.

Development

uv run pytest
uv run ruff check .
uv run mypy
uv build
