Dagster integration library for Polars
Project description
dagster-polars
Polars integration library for Dagster.
Features
PolarsIOManager
is a base class for IO managers that work with Polars DataFrames. Shouldn't be used directly unless you want to implement your ownIOManager
.- returns the correct type (
polars.DataFrame
orpolars.LazyFrame
) based on the type annotation - logs various metadata about the DataFrame - size, schema, sample, stats, ...
- the "columns" input metadata value can be used to select a subset of columns
- inherits all the features of the
UPathIOManager
- works with local and remote filesystems (like S3), supports loading multiple partitions (usedict[str, pl.DataFrame]
type annotation), ... - Implemented serialization formats:
PolarsParquetIOManager
- for reading and writing files in Apache Parquet format. Supports reading partitioned Parquet datasets (for example, often produced by Spark).
- returns the correct type (
Quickstart
Installation
pip install dagster-polars
Usage
from dagster import asset, Definitions
from dagster_polars import PolarsParquetIOManager
import polars as pl
@asset(io_manager_key="polars_parquet_io_manager")
def upstream() -> pl.DataFrame:
df: pl.DataFrame = ...
return df
@asset(io_manager_key="polars_parquet_io_manager")
def downstream(upstream: pl.LazyFrame) -> pl.DataFrame:
df = ... # some lazy operations with `upstream`
return df.collect()
definitions = Definitions(
assets=[upstream, downstream],
resources={
"polars_parquet_io_manager": PolarsParquetIOManager(base_dir="/remote/or/local/path")
}
)
Development
Installation
poetry install
poetry run pre-commit install
Testing
poetry run pytest
TODO
- Data validation like in dagster-pandas
- Maybe use
DagsterTypeLoader
?
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
dagster_polars-0.0.1.tar.gz
(8.9 kB
view hashes)
Built Distribution
Close
Hashes for dagster_polars-0.0.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 291372e1b8f6506be97c41ec813b94e16409f04b93fc26efbdbd6f1f6e210578 |
|
MD5 | 47152ea9e8d19c4572b5414d16a37dd6 |
|
BLAKE2b-256 | 718c4a8dac31f51565507a0ff4627769cabb7c04a4a138a964149b9ecb20adf8 |