Dagster integration library for Polars
dagster-polars
Polars integration library for Dagster.
Features
- All IOManagers log various metadata about the DataFrame: size, schema, sample, stats, and more.
- For all IOManagers, the "columns" input metadata key can be used to select a subset of columns to load.
- BasePolarsUPathIOManager is a base class for IO managers that work with Polars DataFrames. It should not be used directly unless you want to implement your own IOManager.
  - Returns the correct type (polars.DataFrame or polars.LazyFrame) based on the type annotation.
  - Inherits all the features of the UPathIOManager: works with local and remote filesystems (like S3), supports loading multiple partitions (use the dict[str, pl.DataFrame] type annotation), and more.
  - Implemented serialization formats:
    - PolarsParquetIOManager: for reading and writing files in Apache Parquet format. Supports reading partitioned Parquet datasets (for example, those often produced by Spark).
- BigQueryPolarsIOManager: for reading and writing data from/to BigQuery. Supports writing partitioned tables (the "partition_expr" input metadata key must be specified).
Quickstart
Installation
pip install dagster-polars
To use the BigQueryPolarsIOManager, you need to install the gcp extra:
pip install 'dagster-polars[gcp]'
Usage
import polars as pl
from dagster import asset, Definitions
from dagster_polars import PolarsParquetIOManager


@asset(io_manager_key="polars_parquet_io_manager")
def upstream() -> pl.DataFrame:
    # Illustrative data; replace with your own logic.
    df = pl.DataFrame({"a": [1, 2, 3]})
    return df


@asset(io_manager_key="polars_parquet_io_manager")
def downstream(upstream: pl.LazyFrame) -> pl.DataFrame:
    # `upstream` arrives as a LazyFrame because of the type annotation.
    df = upstream.with_columns((pl.col("a") * 2).alias("b"))  # some lazy operations
    return df.collect()


definitions = Definitions(
    assets=[upstream, downstream],
    resources={
        "polars_parquet_io_manager": PolarsParquetIOManager(base_dir="/remote/or/local/path")
    },
)
Development
Installation
poetry install
poetry run pre-commit install
Testing
poetry run pytest
TODO
- Add PolarsDeltaIOManager
- Data validation like in dagster-pandas
- Maybe use DagsterTypeLoader?
Download files
Source distribution
dagster_polars-0.0.2.tar.gz (11.9 kB)
Built distribution
dagster_polars-0.0.2-py3-none-any.whl
Hashes for dagster_polars-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 481a43bad3fc3ee72211399b10a8e2b45d858652a6252ab5a764ca715b1be5c1
MD5 | 07cd90f09ef338b36fe0223beb847b30
BLAKE2b-256 | c64e71018af066fa8ffce68a27f8f04f8752619143b2bb795d436df687a67e7f