A light library to preprocess data with polars

grizz

Overview

grizz is a light library to ingest and transform data in polars DataFrames. grizz uses an object-oriented strategy, where ingestors and transformers are building blocks that can be combined. grizz can be extended with custom DataFrame ingestors and transformers. The following example shows how to cast some columns to a different data type.

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

Documentation

  • latest (stable): documentation from the latest stable release.
  • main (unstable): documentation associated with the main branch of the repository. This documentation may contain many work-in-progress, outdated, or missing parts.

Installation

We highly recommend installing grizz in a virtual environment. grizz can be installed with pip using the following command:

pip install grizz

To make the package as slim as possible, only the minimal packages required to use grizz are installed. To include all the dependencies, you can use the following command:

pip install "grizz[all]"

Please check the get started page to see how to install only specific dependencies, or for alternative ways to install the library. The table below lists each grizz version and its required dependencies.

grizz   coola          iden           objectory    polars       python
main    >=0.8.5,<1.0   >=0.1.0,<1.0   >=0.2,<1.0   >=1.0,<2.0   >=3.9,<3.14
0.1.1   >=0.8.5,<1.0   >=0.1.0,<1.0   >=0.2,<1.0   >=1.0,<2.0   >=3.9,<3.14
0.1.0   >=0.8.4,<1.0   >=0.1.0,<1.0   >=0.2,<1.0   >=1.0,<2.0   >=3.9,<3.14
0.0.5   >=0.7,<1.0     >=0.0.4,<1.0   >=0.1,<1.0   >=1.0,<2.0   >=3.9,<3.13
0.0.4   >=0.7,<1.0     >=0.0.4,<1.0   >=0.1,<1.0   >=1.0,<2.0   >=3.9,<3.13
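
As a rough way to compare an environment against the table above, the following sketch reads the installed versions with the standard library and checks them against the ranges of the grizz 0.1.1 row. The helper names (`parse`, `in_range`, `check`) are made up for this example, and the version parsing is deliberately simplified: it only looks at the leading numeric parts, so pre-release suffixes are ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Ranges copied from the grizz 0.1.1 row of the table above: (lower, upper).
REQUIRED = {
    "coola": ((0, 8, 5), (1, 0, 0)),
    "iden": ((0, 1, 0), (1, 0, 0)),
    "objectory": ((0, 2, 0), (1, 0, 0)),
    "polars": ((1, 0, 0), (2, 0, 0)),
}


def parse(text):
    """Turn '1.12.0' into (1, 12, 0), padding short versions with zeros."""
    parts = []
    for piece in text.split(".")[:3]:
        if not piece.isdigit():
            break  # stop at the first non-numeric part, e.g. '0rc1'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_range(text, lower, upper):
    """Return True if lower <= version < upper."""
    return lower <= parse(text) < upper


def check(name, lower, upper):
    """Describe whether the installed version of a package is in range."""
    try:
        found = version(name)
    except PackageNotFoundError:
        return f"{name}: not installed"
    status = "ok" if in_range(found, lower, upper) else "outside the supported range"
    return f"{name} {found}: {status}"


for name, (lower, upper) in REQUIRED.items():
    print(check(name, lower, upper))
```

For real dependency resolution, rely on pip itself; this sketch is only meant for a quick manual check.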

Optional dependencies

grizz   clickhouse-connect*   pyarrow*        tqdm*
main    >=0.7,<1.0            >=10.0,<19.0    >=4.65,<5.0
0.1.1   >=0.7,<1.0            >=10.0,<19.0    >=4.65,<5.0
0.1.0   >=0.7,<1.0            >=10.0,<18.0    >=4.65,<5.0
0.0.5   >=0.7,<1.0            >=10.0,<18.0    >=4.65,<5.0
0.0.4   >=0.7,<1.0            >=10.0,<17.0    >=4.65,<5.0

* indicates an optional dependency

Contributing

Please check the instructions in CONTRIBUTING.md.

Suggestions and Communication

Everyone is welcome to contribute to the community. If you have any questions or suggestions, you can open a GitHub issue. We will reply as soon as possible. Thank you very much.

API stability

Warning: while grizz is in the development stage, no API is guaranteed to be stable from one release to the next. In fact, the API will very likely change multiple times before a stable 1.0.0 release. In practice, this means that upgrading grizz to a new version may break code written against an older version.

License

grizz is licensed under the BSD 3-Clause "New" or "Revised" license, available in the LICENSE file.
