Extract column-level lineage from Polars LazyFrame transformations.

polars-lineage

This proof of concept (PoC) uses a metadata-first workflow exposed through a LazyFrame.lineage namespace: attach source metadata with lineage.add_source(...), apply your transformations, then call lineage.extract().

Install and Setup

pip install polars-lineage

For local development:

uv sync --dev

Python API (Metadata First)

Importing polars_lineage registers LazyFrame.lineage.

lineage.add_source(...) accepts:

  • name: logical source name
  • uri: source URI
  • destination_table (optional): explicit destination fully qualified name (FQN)

If destination_table is omitted, a deterministic destination FQN is derived.

Example:

import polars as pl
import polars_lineage  # registers LazyFrame.lineage namespace

# Attach source metadata; this URI path ends with a table FQN.
df_orders = (
    pl.DataFrame({"id": [1, 2], "amount": [10, 20]})
    .lazy()
    .lineage.add_source(name="orders", uri="postgres://warehouse/svc.db.raw.orders")
)

# Attach source metadata; here the source FQN is derived from the URI parts.
df_accounts = (
    pl.DataFrame({"id": [1, 2], "segment": ["A", "B"]})
    .lazy()
    .lineage.add_source(name="accounts", uri="https://crm/accounts")
)

# Extract column-level lineage from the joined and transformed plan.
lineage = (
    df_orders.join(df_accounts, on="id", how="left")
    .with_columns(pl.col("amount").alias("amount_copy"))
    .lineage.extract()
)

print(lineage)

# Render the lineage of the same plan as Markdown.
markdown = (
    df_orders.join(df_accounts, on="id", how="left")
    .with_columns(pl.col("amount").alias("amount_copy"))
    .lineage.to_markdown()
)

print(markdown)

URI parsing notes:

  • If the URI path ends with a table FQN (service.database.schema.table), that FQN is used.
  • Otherwise, the source FQN is derived from the URI parts:
    • service: URI scheme
    • database: URI hostname (or external)
    • schema: public
    • table: final URI path segment
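
The rules above can be sketched with the standard library alone. This is a minimal illustration of the documented behavior, not the package's actual implementation: the function name derive_source_fqn is hypothetical, and the sketch assumes that "ends with a table FQN" means the final path segment has exactly four dot-separated parts, and that external is used when the URI has no hostname.

```python
from urllib.parse import urlparse


def derive_source_fqn(uri: str) -> str:
    """Hypothetical helper mirroring the URI parsing notes above."""
    parsed = urlparse(uri)
    # The final path segment is either a full table FQN or a bare table name.
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if not last_segment:
        raise ValueError(f"URI has no path segment to use as a table: {uri!r}")

    parts = last_segment.split(".")
    if len(parts) == 4 and all(parts):
        # Assumption: four non-empty parts means service.database.schema.table.
        return last_segment

    service = parsed.scheme
    database = parsed.hostname or "external"  # assumed fallback for a missing hostname
    return ".".join([service, database, "public", last_segment])


# Outputs of this sketch for the two URIs used in the example above.
print(derive_source_fqn("postgres://warehouse/svc.db.raw.orders"))  # svc.db.raw.orders
print(derive_source_fqn("https://crm/accounts"))  # https.crm.public.accounts
```

The printed values are what this sketch produces; the package itself may normalize or validate URIs differently.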

Namespace Notes

  • lineage.add_source(...) returns the same pl.LazyFrame instance.
  • Metadata is propagated through common lazy operations (including joins).
  • lineage.extract() returns deterministic OpenMetadata-style payloads.

Current Capabilities

  • Projection lineage (select, with_columns)
  • Literals and aliases
  • Basic expression dependency extraction (arithmetic, casts, conditional-like patterns)
  • Transitive dependency resolution
  • Join-aware attribution with explicit left/right mapping aliases
  • Group-by coverage for aggregation expressions and keys
  • Deterministic OpenMetadata payload export
  • Deterministic custom JSON export via typed LineageDocument model
  • Deterministic Markdown lineage rendering
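
To make "transitive dependency resolution" and "deterministic export" concrete, here is a small standalone sketch. It does not use polars-lineage or its LineageDocument model; the names resolve_sources and direct_deps are hypothetical, and the dictionary format is an assumption chosen only for illustration. Columns missing from the mapping are treated as source columns, and a column mapped to an empty set is treated as a literal.

```python
from __future__ import annotations

import json


def resolve_sources(column: str, direct_deps: dict[str, set[str]]) -> set[str]:
    """Return the source columns that `column` ultimately depends on."""
    resolved: set[str] = set()
    seen: set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        if current not in direct_deps:
            # No recorded expression: treat it as a source column.
            resolved.add(current)
        else:
            # Derived column: keep walking its direct dependencies.
            stack.extend(direct_deps[current])
    return resolved


# Hypothetical direct dependencies, one entry per derived column.
direct_deps = {
    "net": {"gross", "discount"},
    "gross": {"amount", "tax"},
    "flag": set(),  # a literal has no upstream columns
}

# Sorting values and keys makes the serialized output stable across runs.
payload = {col: sorted(resolve_sources(col, direct_deps)) for col in direct_deps}
print(json.dumps(payload, sort_keys=True))
# {"flag": [], "gross": ["amount", "tax"], "net": ["amount", "discount", "tax"]}
```

Sets have no guaranteed iteration order, so sorting before serialization is what keeps the output byte-for-byte reproducible in this sketch.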

Current Constraints

  • Multiple joins in one parsed plan are rejected.
  • Columns that overlap ambiguously outside of a join are rejected with a clear error.
  • Because LazyFrame.lineage is registered dynamically, static type checkers may need stubs to recognize it.

Development

uv run pytest
uv run ruff check .
uv run mypy
uv build

Download files

Source Distribution

polars_lineage-0.1.6.tar.gz (11.9 kB)

Uploaded Source

Built Distribution

polars_lineage-0.1.6-py3-none-any.whl (18.9 kB)

Uploaded Python 3

File details

Details for the file polars_lineage-0.1.6.tar.gz.

File metadata

  • Download URL: polars_lineage-0.1.6.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polars_lineage-0.1.6.tar.gz
Algorithm    Hash digest
SHA256       b5e99f7cd5d348bd6265fe29508783320a8b0648c25d7f5e5345e59df60ad513
MD5          3d3c0ac983ada0e89beea06f08e618e0
BLAKE2b-256  7f27574557c44f4077b191e9484df36a3d0b55820cab52d6f1e8158d2bcdb0fe

Provenance

The following attestation bundles were made for polars_lineage-0.1.6.tar.gz:

Publisher: release.yml on davzucky/polars-lineage

File details

Details for the file polars_lineage-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: polars_lineage-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polars_lineage-0.1.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       b111cf7226bba731a19ea818f4ff88acad523fec98598928351139bbe08c5ecc
MD5          b30f336ceb13849117044532e5ad81bb
BLAKE2b-256  eca0591d408249bda71950878d55ef7cba893caf9ec194ae036f0285d6189afa

Provenance

The following attestation bundles were made for polars_lineage-0.1.6-py3-none-any.whl:

Publisher: release.yml on davzucky/polars-lineage
