
Polars IO


Polars IO utility library

Helpers that make it easier to read and write Hive-partitioned Parquet datasets with Polars.
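In a Hive-partitioned layout, partition values are encoded in directory names of the form column=value. The stdlib-only sketch below pulls those pairs back out of a file path to make the convention concrete. parse_hive_partition and the fragment file name are hypothetical illustrations, not part of polario, and values are returned as plain strings.

```python
from pathlib import PurePosixPath


def parse_hive_partition(path: str) -> dict[str, str]:
    """Extract column=value pairs from the directory components of a Hive-style path.

    Hypothetical helper for illustration; values stay as strings and any
    type conversion is left to the caller.
    """
    values: dict[str, str] = {}
    for part in PurePosixPath(path).parts:
        # Hive partition directories look like "<column>=<value>".
        if "=" in part:
            key, _, value = part.partition("=")
            values[key] = value
    return values


# Hypothetical path: two partition levels and one fragment file.
print(parse_hive_partition("/tmp/p1=1/p2=a/fragment-0.parquet"))
# {'p1': '1', 'p2': 'a'}
```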

It is primarily a library for working with datasets, but it also includes a command-line interface for inspecting Parquet files and datasets.
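The polario CLI's own commands are not described here. As a rough, stdlib-only illustration of the kind of low-level check an inspection tool can perform, the sketch below reads a Parquet file's footer length. It relies only on the Parquet file layout (a 4-byte "PAR1" magic at both ends, with a little-endian 4-byte footer length just before the trailing magic); parquet_footer_length is a hypothetical helper and is not part of polario.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(path: str) -> int:
    """Return the size in bytes of a Parquet file's serialized footer metadata.

    Hypothetical helper: checks the leading and trailing "PAR1" magic and
    decodes the 4-byte little-endian length stored just before the trailer.
    """
    with open(path, "rb") as f:
        if f.read(4) != PARQUET_MAGIC:
            raise ValueError(f"{path}: missing leading PAR1 magic")
        f.seek(-8, os.SEEK_END)  # 4-byte footer length + 4-byte trailing magic
        tail = f.read(8)
    length, magic = struct.unpack("<I4s", tail)
    if magic != PARQUET_MAGIC:
        raise ValueError(f"{path}: missing trailing PAR1 magic")
    return length


# Usage (hypothetical file name):
# print(parquet_footer_length("data.parquet"))
```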

Dataset

Example usage of polario.dataset.HiveDataset:

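```python
import polars as pl
from polario.dataset import HiveDataset

# Two rows with different values of the partition column "p1"
df = pl.from_dicts(
    [
        {"p1": 1, "v": 1},
        {"p1": 2, "v": 1},
    ]
)

# Dataset rooted at /tmp/, partitioned on the "p1" column
ds = HiveDataset("file:///tmp/", partition_columns=["p1"])

# Write the frame; rows are stored according to their "p1" value
ds.write(df)

# Iterate over the stored partitions, one DataFrame per partition
for partition_df in ds.read_partitions():
    print(partition_df)
```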
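Conceptually, writing to a Hive-partitioned dataset means grouping rows by the values of the partition columns and storing each group under a column=value directory. The pure-Python sketch below shows that grouping for the same two rows. group_by_partition is a hypothetical helper, not polario's implementation, and it assumes (as is common in Hive layouts) that partition columns live only in the path; polario may handle this differently.

```python
from collections import defaultdict


def group_by_partition(rows, partition_columns):
    """Group row dicts by partition values, keyed by a Hive-style relative path.

    Hypothetical helper. Assumption: partition columns are dropped from each
    row because their values are encoded in the directory name.
    """
    groups = defaultdict(list)
    for row in rows:
        # e.g. {"p1": 1, "v": 1} with ["p1"] -> "p1=1"
        path = "/".join(f"{col}={row[col]}" for col in partition_columns)
        groups[path].append({k: v for k, v in row.items() if k not in partition_columns})
    return dict(groups)


rows = [{"p1": 1, "v": 1}, {"p1": 2, "v": 1}]
print(group_by_partition(rows, ["p1"]))
# {'p1=1': [{'v': 1}], 'p1=2': [{'v': 1}]}
```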

To model data storage, we use three layers: dataset, partition, and fragment.

  • Each dataset is a lexically ordered set of partitions.
  • Each partition is a lexically ordered set of fragments.
  • Each fragment is a file on disk whose rows may be in any order.
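The three layers can be pictured as nested, sorted collections. This is a minimal sketch assuming that "lexical order" means plain string ordering of partition paths and fragment file names; DatasetModel and PartitionModel are hypothetical classes, not polario's own types. One consequence of string ordering is that p1=10 sorts before p1=2.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionModel:
    # Hive-style relative path such as "p1=1"; fragments are file names under it.
    path: str
    fragments: list[str] = field(default_factory=list)

    def ordered_fragments(self) -> list[str]:
        # Fragments are visited in lexical (string) order of their names.
        return sorted(self.fragments)


@dataclass
class DatasetModel:
    partitions: list[PartitionModel] = field(default_factory=list)

    def ordered_partitions(self) -> list[PartitionModel]:
        # Partitions are visited in lexical (string) order of their paths.
        return sorted(self.partitions, key=lambda p: p.path)


model = DatasetModel([
    PartitionModel("p1=2", ["b.parquet", "a.parquet"]),
    PartitionModel("p1=10", ["a.parquet"]),
    PartitionModel("p1=1", ["a.parquet"]),
])
print([p.path for p in model.ordered_partitions()])
# ['p1=1', 'p1=10', 'p1=2']
```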


Download files


Source Distribution

polario-0.3.1.tar.gz (10.7 kB)

Built Distribution

polario-0.3.1-py3-none-any.whl (11.6 kB)

File details

Details for the file polario-0.3.1.tar.gz.

File metadata

  • Download URL: polario-0.3.1.tar.gz
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for polario-0.3.1.tar.gz:

  • SHA256: c0fd2b319a2cf6afe0002f591bb8cf978dcaf7aeba9eaa1dfa092b818e917d2b
  • MD5: 60a41276b7f0379916afcec309aad197
  • BLAKE2b-256: 3c4ed4528725b88d9b5599fd2e4342e30312c355605b73822cacba4ce367b9e2


File details

Details for the file polario-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: polario-0.3.1-py3-none-any.whl
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for polario-0.3.1-py3-none-any.whl:

  • SHA256: 81211e7fc6ff625e31534e4ae4078fee2bd15b6f44dfff2f00f772223b71c866
  • MD5: 290090232097d0a0d6c901bbfbcd13e0
  • BLAKE2b-256: dc07439369f7315f2c070fc3bd47bb41f265b5e9848e0f697bdfb00f4c40ef4d
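The published SHA256 digests can be used to check a downloaded file before installing it. A minimal stdlib sketch follows; sha256_of_file is a hypothetical helper and the EXPECTED mapping simply restates the digests listed above. Alternatively, pip's hash-checking mode (a requirements line such as polario==0.3.1 --hash=sha256:<digest>, installed with --require-hashes) performs the same check automatically.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for polario 0.3.1 (copied from the listing above).
EXPECTED = {
    "polario-0.3.1.tar.gz": "c0fd2b319a2cf6afe0002f591bb8cf978dcaf7aeba9eaa1dfa092b818e917d2b",
    "polario-0.3.1-py3-none-any.whl": "81211e7fc6ff625e31534e4ae4078fee2bd15b6f44dfff2f00f772223b71c866",
}

# Usage, assuming the wheel was downloaded into the current directory:
# name = "polario-0.3.1-py3-none-any.whl"
# assert sha256_of_file(name) == EXPECTED[name]
```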

