polars-lineage

Extract column-level lineage from Polars LazyFrame transformations.

This proof of concept uses a metadata-first workflow: attach metadata to each source LazyFrame with add_metadata(...), then call .extract_lineage() on the wrapped result.

Install and Setup

pip install polars-lineage

For local development:

uv sync --dev

Python API (Metadata First)

Importing polars_lineage registers LazyFrame.add_metadata(...).

add_metadata(...) accepts:

  • name: logical source name
  • uri: source URI
  • destination_table (optional): explicit destination FQN (fully qualified name)

If destination_table is omitted, the library derives a deterministic destination FQN.

Example:

import polars as pl
import polars_lineage  # registers LazyFrame.add_metadata

df_orders = (
    pl.DataFrame({"id": [1, 2], "amount": [10, 20]})
    .lazy()
    .add_metadata(name="orders", uri="postgres://warehouse/svc.db.raw.orders")
)

df_accounts = (
    pl.DataFrame({"id": [1, 2], "segment": ["A", "B"]})
    .lazy()
    .add_metadata(name="accounts", uri="https://crm/accounts")
)

lineage = (
    df_orders.join(df_accounts, on="id", how="left")
    .with_columns(pl.col("amount").alias("amount_copy"))
    .extract_lineage()
)

print(lineage)

URI parsing notes:

  • If the URI path ends with a table FQN (service.database.schema.table), that FQN is used as the source FQN.
  • Otherwise, the source FQN is derived from the URI parts:
    • service: URI scheme
    • database: URI hostname, falling back to external
    • schema: always public
    • table: final URI path segment
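
The parsing rules above can be sketched in plain Python. This is an illustration only, not the library's actual implementation: the function name source_fqn_from_uri is hypothetical, and the test for "ends with a table FQN" (exactly four non-empty dot-separated parts in the final path segment) is an assumption.

```python
from urllib.parse import urlparse


def source_fqn_from_uri(uri: str) -> str:
    """Derive a service.database.schema.table FQN from a source URI (sketch)."""
    parsed = urlparse(uri)
    # Final path segment, ignoring any trailing slash.
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]

    # Assumption: a segment with four non-empty dot-separated parts is a table FQN.
    parts = last_segment.split(".")
    if len(parts) == 4 and all(parts):
        return last_segment

    # Fallback: build the FQN from the URI parts.
    service = parsed.scheme
    database = parsed.hostname or "external"
    return ".".join([service, database, "public", last_segment])


print(source_fqn_from_uri("postgres://warehouse/svc.db.raw.orders"))
print(source_fqn_from_uri("https://crm/accounts"))
```

With the two URIs from the example above, the first call takes the FQN branch and the second takes the fallback branch.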

Wrapper Notes

  • LineageLazyFrame preserves metadata through chained LazyFrame operations.
  • Joining two wrapped frames merges their source metadata, labeled left and right.
  • .extract_lineage() returns deterministic OpenMetadata-style payloads.
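
"Deterministic" presumably means that the same plan always yields byte-identical output. A common way to get that property is to sort records and keys before serializing, as in the sketch below. The field names from_column and to_column and the helper to_deterministic_json are hypothetical; they do not describe the payload that .extract_lineage() returns.

```python
import json


def to_deterministic_json(edges: list) -> str:
    """Serialize lineage edges so equal inputs give identical strings (sketch)."""
    # Sort the edges so their original order does not affect the output.
    ordered = sorted(edges, key=lambda e: (e["to_column"], e["from_column"]))
    # sort_keys fixes key order; explicit separators fix whitespace.
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


# Hypothetical edges, listed in two different orders.
edges_a = [
    {"from_column": "amount", "to_column": "amount_copy"},
    {"from_column": "id", "to_column": "id"},
]
edges_b = list(reversed(edges_a))

print(to_deterministic_json(edges_a) == to_deterministic_json(edges_b))
```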

Current Capabilities

  • Projection lineage (select, with_columns)
  • Literals and aliases
  • Basic expression dependency extraction (arithmetic, casts, conditional-like patterns)
  • Transitive dependency resolution
  • Join-aware attribution with explicit left/right mapping aliases
  • Group-by aggregation expression and key coverage
  • Deterministic OpenMetadata payload export
  • Deterministic custom JSON export via typed LineageDocument model
  • Deterministic Markdown lineage rendering
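
To illustrate what transitive dependency resolution means, the sketch below walks a mapping of derived columns to their direct inputs until it reaches columns with no recorded inputs. It is a conceptual example, not the library's internals; resolve_source_columns and the sample column names are hypothetical.

```python
from __future__ import annotations


def resolve_source_columns(column: str, direct_deps: dict[str, set[str]]) -> set[str]:
    """Return the source columns that `column` ultimately depends on (sketch)."""
    sources: set[str] = set()
    visited: set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        if current not in direct_deps:
            # No recorded inputs: treat it as a source column.
            sources.add(current)
        else:
            # Derived column: keep walking its direct inputs.
            # An empty set (for example, a literal) contributes no sources.
            stack.extend(direct_deps[current])
    return sources


direct_deps = {
    "amount_copy": {"amount"},
    "amount_doubled": {"amount_copy"},
    "total": {"amount_doubled", "fee"},
    "one": set(),  # literal column with no inputs
}

print(sorted(resolve_source_columns("total", direct_deps)))
```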

Current Constraints

  • A parsed plan that contains more than one join is rejected.
  • Overlapping column names that are ambiguous outside a join are rejected with a clear error.
  • Because LazyFrame.add_metadata(...) is added dynamically at import time, static type checkers may need stubs to recognize it.
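
If writing stubs is not an option, one possible workaround for the type-checking constraint is a small typing.Protocol plus a cast. This is a sketch under assumptions: the names SupportsAddMetadata and with_metadata_api are hypothetical, the parameter types are guessed as str, and the return type is left as Any because the wrapper's exact type is not stated here.

```python
from typing import Any, Optional, Protocol, cast


class SupportsAddMetadata(Protocol):
    """Describes the dynamically registered method for type checkers (assumed signature)."""

    def add_metadata(
        self,
        name: str,
        uri: str,
        destination_table: Optional[str] = None,
    ) -> Any: ...


def with_metadata_api(lazy_frame: object) -> SupportsAddMetadata:
    """Return the same object, typed so that add_metadata(...) is visible."""
    # cast() has no runtime effect; it only informs the type checker.
    return cast(SupportsAddMetadata, lazy_frame)
```

A call site would then wrap the frame before attaching metadata, for example with_metadata_api(frame).add_metadata(name="orders", uri="https://crm/orders"), where frame is a LazyFrame created after polars_lineage has been imported.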

Development

uv run pytest
uv run ruff check .
uv run mypy
uv build
