flowview

Visual data pipeline debugger for Polars. Stop print-debugging your pipelines.

Install

pip install flowview

or with uv:

uv add flowview

Quick Start

Add @fv.trace to any function that transforms a Polars DataFrame. flowview records each DataFrame method call made inside the function and renders the pipeline as a visual flow in your terminal.

import polars as pl
import flowview as fv

@fv.trace
def process(df: pl.DataFrame) -> pl.DataFrame:
    return (
        df.filter(pl.col("status") == "active")
          .with_columns((pl.col("price") * pl.col("quantity")).alias("revenue"))
          .group_by("category")
          .agg(pl.col("revenue").sum().alias("total_revenue"))
          .sort("total_revenue", descending=True)
    )

df = pl.DataFrame({
    "status": ["active", "inactive", "active"],
    "category": ["Books", "Books", "Electronics"],
    "price": [14.99, 599.99, 299.99],
    "quantity": [5, 1, 2],
})

result = process(df)

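.pipe() chains work too. Here clean, filter_active, and add_revenue stand for your own functions that take a DataFrame and return a DataFrame: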

@fv.trace
def process(df: pl.DataFrame) -> pl.DataFrame:
    return df.pipe(clean).pipe(filter_active).pipe(add_revenue)

What You See

Each step in your pipeline is displayed as a box showing:

  • Row count with diff from the previous step (e.g., 700 rows x 4 cols (-300 rows))
  • Schema changes — columns added or removed (e.g., +cols: revenue -cols: status)
  • Sample data — first N rows at each transformation
  • Execution time per step

Steps are connected with arrows to show the flow. A summary footer shows the total step count and wall-clock time.
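
The row and schema diffs described above can be derived from the shape and column names before and after each step. The following is a minimal sketch of that bookkeeping, not flowview's actual code: the function name summarize_step and its arguments are hypothetical, and only the output format follows the examples in the list above.

```python
def summarize_step(prev_shape, prev_cols, shape, cols):
    """Build the two header lines of a step box from before/after metadata.

    Hypothetical helper for illustration; shapes are (rows, cols) tuples and
    the column arguments are lists of column names.
    """
    rows, n_cols = shape
    delta = rows - prev_shape[0]
    header = f"{rows} rows x {n_cols} cols"
    if delta:
        # The +d format spec always prints the sign, e.g. -300 or +25.
        header += f" ({delta:+d} rows)"

    added = [c for c in cols if c not in prev_cols]
    removed = [c for c in prev_cols if c not in cols]
    parts = []
    if added:
        parts.append("+cols: " + ", ".join(added))
    if removed:
        parts.append("-cols: " + ", ".join(removed))
    return header, " ".join(parts)


base_cols = ["status", "category", "price", "quantity"]
filtered = summarize_step((1000, 4), base_cols, (700, 4), base_cols)
reshaped = summarize_step((700, 2), ["status", "price"], (700, 2), ["price", "revenue"])
```

Comparing column names as lists (rather than sets) keeps the added and removed columns in their original order, which makes the diff easier to read.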

Supported Operations

flowview traces any DataFrame method that returns a new DataFrame. These methods get human-readable step names:

| Method | Step name example |
|---|---|
| filter(expr) | filter((col("status")) == ("active")) |
| with_columns(exprs) | with_columns(revenue, tax) |
| select(cols) | select(status, price) |
| drop(cols) | drop(status, category) |
| rename(mapping) | rename(price->unit_price) |
| sort(cols) | sort(price, quantity) |
| head(n) / tail(n) | head(10) / tail(5) |
| unique(subset) | unique(id) |
| join(other, ...) | join(on=id, how=left) |
| group_by(cols).agg(exprs) | group_by(category).agg(total_revenue) |
| pipe(fn) | uses the function name, e.g. clean_data |

Other methods (e.g., explode, melt, unpivot) are traced with a fallback name like explode('tags').
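
To make the naming scheme concrete, here is a rough sketch of how a step name could be built from a method name and its arguments. It is an illustration under assumptions, not flowview's implementation: step_name is a hypothetical function, and it covers only a few rows of the table plus the repr-based fallback.

```python
def step_name(method, args=(), kwargs=None):
    """Return a human-readable label for one traced call (hypothetical helper)."""
    kwargs = kwargs or {}

    # rename({"price": "unit_price"}) -> rename(price->unit_price)
    if method == "rename" and args and isinstance(args[0], dict):
        pairs = ", ".join(f"{old}->{new}" for old, new in args[0].items())
        return f"rename({pairs})"

    # head(10) / tail(5); Polars defaults n to 5 when it is omitted.
    if method in ("head", "tail"):
        n = args[0] if args else kwargs.get("n", 5)
        return f"{method}({n})"

    # pipe(fn) is labelled with the function's own name.
    if method == "pipe" and args and callable(args[0]):
        return getattr(args[0], "__name__", "pipe")

    # Fallback: method name plus the repr of every argument.
    rendered = [repr(a) for a in args]
    rendered += [f"{key}={value!r}" for key, value in kwargs.items()]
    return f"{method}({', '.join(rendered)})"


def clean_data(df):
    return df


names = [
    step_name("rename", ({"price": "unit_price"},)),
    step_name("head", (10,)),
    step_name("pipe", (clean_data,)),
    step_name("explode", ("tags",)),
]
```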

Options

@fv.trace(sample_rows=3, show_sample=True, show_schema=True)
def process(df):
    ...

| Option | Type | Default | Description |
|---|---|---|---|
| sample_rows | int | 5 | Number of sample rows to capture at each step |
| show_sample | bool | True | Display sample data tables in the output |
| show_schema | bool | False | Display the full schema at each step |
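
The decorator is used both bare (@fv.trace) and with keyword options (@fv.trace(sample_rows=3)). A common way to support both forms is sketched below. This is an assumption about the general pattern, not a copy of flowview's source: trace_with_options and the options attribute are hypothetical names, and the wrapper only stores the options instead of tracing anything.

```python
import functools


def trace_with_options(fn=None, *, sample_rows=5, show_sample=True, show_schema=False):
    """Decorator usable as @trace_with_options or @trace_with_options(...)."""
    options = {
        "sample_rows": sample_rows,
        "show_sample": show_sample,
        "show_schema": show_schema,
    }

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real tracer would wrap the DataFrame argument here.
            return func(*args, **kwargs)

        wrapper.options = options  # exposed only so the sketch can be inspected
        return wrapper

    # Bare use: Python passes the decorated function as fn.
    if fn is not None:
        return decorate(fn)
    # Called with options: return the actual decorator.
    return decorate


@trace_with_options
def increment(x):
    return x + 1


@trace_with_options(sample_rows=3, show_schema=True)
def double(x):
    return x * 2
```

Making the options keyword-only (the bare * in the signature) prevents a call such as trace_with_options(3) from being mistaken for the bare form.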

How It Works

The @fv.trace decorator wraps the first DataFrame argument in a lightweight proxy before calling your function. The proxy intercepts every method call, captures a snapshot of the result (row count, schema, sample rows, timing), and delegates to the real Polars DataFrame underneath. When your function returns, the proxy is unwrapped and you get back a regular pl.DataFrame.

There is no monkey-patching and no global state. Each decorated call is fully isolated.
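
The proxy approach described above can be sketched without Polars. In the example below, Table is a hypothetical stand-in for an eager DataFrame, and TracedTable and trace_sketch are illustrative names, not flowview's classes. The sketch always wraps the first positional argument and records only the method name, row count, and timing.

```python
import functools
import time


class Table:
    """Hypothetical stand-in for an eager DataFrame: a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        return Table(r for r in self.rows if predicate(r))

    def head(self, n=5):
        return Table(self.rows[:n])


class TracedTable:
    """Proxy that delegates to a Table and records each call returning a Table."""

    def __init__(self, target, steps):
        self._target = target
        self._steps = steps

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            start = time.perf_counter()
            result = attr(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if isinstance(result, Table):
                self._steps.append(
                    {"step": name, "rows": len(result.rows), "seconds": elapsed}
                )
                # Re-wrap so the next method in the chain is traced too.
                return TracedTable(result, self._steps)
            return result

        return call


def trace_sketch(func):
    @functools.wraps(func)
    def wrapper(table, *args, **kwargs):
        steps = []  # fresh list per call, so calls do not share state
        out = func(TracedTable(table, steps), *args, **kwargs)
        wrapper.last_steps = steps  # kept only so the example can inspect it
        # Unwrap so the caller gets a plain Table back.
        return out._target if isinstance(out, TracedTable) else out

    return wrapper


@trace_sketch
def process(table):
    return table.filter(lambda r: r["status"] == "active").head(1)


result = process(
    Table([{"status": "active"}, {"status": "inactive"}, {"status": "active"}])
)
```

Because the original Table class is never modified, nothing outside the decorated call is affected, which matches the "no monkey-patching" property described above.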

Limitations

  • LazyFrame is not supported — df.lazy() exits the proxy. Only eager DataFrames are traced.
  • GroupBy shortcuts like .count(), .sum(), .first() on a GroupBy object are not traced — use .agg() instead.
  • Pipe internals are not individually traced — df.pipe(fn) produces a single step named after fn, not one step per operation inside fn.
  • IDE autocomplete may not show DataFrame methods inside the decorated function body.
  • type(df) returns TracedDataFrame inside the decorated function. isinstance(df, pl.DataFrame) works correctly.
  • Only the first DataFrame argument is wrapped when a function takes multiple DataFrames.
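
The difference between type(df) and isinstance(df, pl.DataFrame) noted above is typical of proxy objects. One way a proxy can produce this behavior in Python is to expose the wrapped object's class through a __class__ property, as sketched below. Whether flowview does it this way is an assumption (subclassing would have the same visible effect); Frame and TracedFrame are hypothetical stand-ins.

```python
class Frame:
    """Hypothetical stand-in for pl.DataFrame."""


class TracedFrame:
    """Proxy whose reported class is the class of the wrapped object."""

    def __init__(self, target):
        self._target = target

    @property
    def __class__(self):
        # isinstance() falls back to obj.__class__ when type(obj) does not
        # match, so this makes isinstance(proxy, Frame) return True.
        return type(self._target)


proxy = TracedFrame(Frame())
actual_type_name = type(proxy).__name__
passes_isinstance = isinstance(proxy, Frame)
```

In practice this means code inside a decorated function should prefer isinstance checks over comparisons such as type(df) is pl.DataFrame.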

License

MIT
