Polars IO
Polars IO utility library
Helpers that make it easier to read and write Hive-partitioned parquet datasets with Polars.
It is primarily a library for working with datasets, but it also includes a command-line interface for inspecting parquet files and datasets.
Dataset
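Example use of polario.dataset.HiveDataset:
from polario.dataset import HiveDataset
import polars as pl

# Two rows that fall into two different partitions of column "p1"
df = pl.from_dicts(
    [
        {"p1": 1, "v": 1},
        {"p1": 2, "v": 1},
    ]
)
# Dataset rooted at /tmp/, partitioned by "p1"
ds = HiveDataset("file:///tmp/", partition_columns=["p1"])
ds.write(df)
# Read the dataset back one partition at a time
for partition_df in ds.read_partitions():
    print(partition_df)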
To model data storage, we use three layers: dataset, partition, fragment.
Each dataset is a lexically ordered set of partitions.
Each partition is a lexically ordered set of fragments.
Each fragment is a file on disk with rows in any order.
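The three layers can be illustrated with a small standard-library sketch that walks a Hive-style directory tree. It does not use polario and is not its implementation. The helper names (list_partitions, list_fragments, build_demo_layout, describe), the fragment file names, and the single-level `p1=<value>` directory layout are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def list_partitions(dataset_root: Path) -> list[Path]:
    """Return partition directories (assumed to be named key=value) in lexical order."""
    return sorted(
        (p for p in dataset_root.iterdir() if p.is_dir() and "=" in p.name),
        key=lambda p: p.name,
    )


def list_fragments(partition_dir: Path) -> list[Path]:
    """Return the fragment files of one partition in lexical order."""
    return sorted(partition_dir.glob("*.parquet"), key=lambda p: p.name)


def build_demo_layout(root: Path) -> None:
    """Create a hypothetical dataset layout; empty files stand in for parquet fragments."""
    layout = {
        "p1=2": ["0001.parquet"],
        "p1=1": ["0002.parquet", "0001.parquet"],
    }
    for partition, fragments in layout.items():
        partition_dir = root / partition
        partition_dir.mkdir(parents=True)
        for name in fragments:
            (partition_dir / name).touch()


def describe(root: Path) -> list[tuple[str, list[str]]]:
    """Summarise a dataset as (partition name, fragment names) pairs, both ordered."""
    return [
        (partition.name, [fragment.name for fragment in list_fragments(partition)])
        for partition in list_partitions(root)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = Path(tmp)
        build_demo_layout(demo_root)
        for partition_name, fragment_names in describe(demo_root):
            print(partition_name, fragment_names)
```

The sketch orders only partitions and fragments. Rows inside a fragment are left alone, matching the statement that a fragment holds rows in any order. Datasets with several partition columns would nest directories, which this sketch does not handle.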