Skip to main content

Dagster integration library for Polars

Project description

dagster-polars

Polars integration library for Dagster.

Features

  • PolarsIOManager is a base class for IO managers that work with Polars DataFrames. Shouldn't be used directly unless you want to implement your own IOManager.
    • returns the correct type (polars.DataFrame or polars.LazyFrame) based on the type annotation
    • logs various metadata about the DataFrame - size, schema, sample, stats, ...
    • the "columns" input metadata value can be used to select a subset of columns
    • inherits all the features of the UPathIOManager - works with local and remote filesystems (like S3), supports loading multiple partitions (use dict[str, pl.DataFrame] type annotation), ...
    • Implemented serialization formats:
      • PolarsParquetIOManager - for reading and writing files in Apache Parquet format. Supports reading partitioned Parquet datasets (for example, often produced by Spark).

Quickstart

Installation

pip install dagster-polars

Usage

from dagster import asset, Definitions
from dagster_polars import PolarsParquetIOManager
import polars as pl


@asset(io_manager_key="polars_parquet_io_manager")
def upstream() -> pl.DataFrame:
    df: pl.DataFrame = ...
    return df


@asset(io_manager_key="polars_parquet_io_manager")
def downstream(upstream: pl.LazyFrame) -> pl.DataFrame:
    df = ...  # some lazy operations with `upstream`
    return df.collect()


definitions = Definitions(
    assets=[upstream, downstream],
    resources={
        "polars_parquet_io_manager": PolarsParquetIOManager(base_dir="/remote/or/local/path")
    }
)

Development

Installation

poetry install
poetry run pre-commit install

Testing

poetry run pytest

TODO

  • Data validation like in dagster-pandas
  • Maybe use DagsterTypeLoader ?

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dagster_polars-0.0.1.tar.gz (8.9 kB view hashes)

Uploaded Source

Built Distribution

dagster_polars-0.0.1-py3-none-any.whl (10.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page