A light library to preprocess data with polars
grizz
Overview
grizz is a light library to ingest and transform data in polars DataFrames.
grizz uses an object-oriented strategy, where ingestors and transformers are building blocks that can be combined together.
grizz can be extended to add custom DataFrame ingestors and transformers.
The following example shows how to cast some columns to a different data type with the Cast transformer.
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
Documentation
- latest (stable): documentation from the latest stable release.
- main (unstable): documentation associated with the main branch of the repo. This documentation may contain many work-in-progress, outdated, or missing parts.
Installation
We highly recommend installing grizz in a virtual environment.
grizz can be installed from pip using the following command:
pip install grizz
To make the package as slim as possible, only the minimal packages required to use grizz are installed.
To include all the dependencies, you can use the following command:
pip install grizz[all]
Please check the get started page to learn how to install only specific dependencies, or for other ways to install the library.
The following table shows the grizz versions and their corresponding dependencies.
grizz | coola | iden | objectory | polars | python |
---|---|---|---|---|---|
main | >=0.7,<1.0 | >=0.0.4,<1.0 | >=0.1,<1.0 | >=1.0,<2.0 | >=3.9,<3.13 |
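To check an environment against these constraints, a small stdlib-only sketch can compare plain release versions such as 1.2.3 with a specifier such as >=0.7,<1.0. The satisfies helper is hypothetical and ignores pre-release, post-release, and local version segments; in practice, the packaging library's SpecifierSet implements the full PEP 440 rules and is the better choice.

```python
from __future__ import annotations

import operator
import sys

# Comparison operators supported by this sketch, mapped to Python functions.
# Dict order matters: two-character operators are tried before ">" and "<".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse(version: str) -> tuple[int, ...]:
    """Parse a plain release version such as '1.2.3' into (1, 2, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if version meets every comma-separated clause of spec."""
    current = _parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        symbol = next((s for s in _OPS if clause.startswith(s)), None)
        if symbol is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        bound = _parse(clause[len(symbol):])
        # Pad with zeros so that '0.7' and '0.7.0' compare as equal.
        width = max(len(current), len(bound))
        lhs = current + (0,) * (width - len(current))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[symbol](lhs, rhs):
            return False
    return True


# Example: check the running interpreter against the python column above.
python_version = ".".join(str(part) for part in sys.version_info[:3])
print(python_version, satisfies(python_version, ">=3.9,<3.13"))
```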
Optional dependencies
grizz | clickhouse-connect * | pyarrow * | tqdm * |
---|---|---|---|
main | >=0.7,<1.0 | >=10.0,<17.0 | >=4.65,<5.0 |
* indicates an optional dependency
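To see which of these optional dependencies are available in the current environment, the following sketch uses importlib.util.find_spec, which locates a top-level module without importing it. The helper name and the tuple of import names are assumptions for illustration, not part of grizz, and this is not necessarily how grizz checks for them internally. Note that the clickhouse-connect distribution is imported as clickhouse_connect.

```python
from __future__ import annotations

import importlib.util

# Import names (not PyPI distribution names) of the optional dependencies.
OPTIONAL_MODULES = ("clickhouse_connect", "pyarrow", "tqdm")


def available_optional_dependencies(
    modules: tuple[str, ...] = OPTIONAL_MODULES,
) -> dict[str, bool]:
    """Map each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in available_optional_dependencies().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```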
Contributing
Please check the instructions in CONTRIBUTING.md.
Suggestions and Communication
Everyone is welcome to contribute to the community. If you have any questions or suggestions, you can submit GitHub issues. We will reply as soon as possible. Thank you very much.
API stability
:warning: While grizz is in the development stage, no API is guaranteed to be stable from one release to the next.
In fact, it is very likely that the API will change multiple times before a stable 1.0.0 release.
In practice, this means that upgrading grizz to a new version may break any code that was using the old version of grizz.
License
grizz is licensed under the BSD 3-Clause "New" or "Revised" license, available in the LICENSE file.
Download files
Source Distribution: grizz-0.0.3.tar.gz
Built Distribution: grizz-0.0.3-py3-none-any.whl
File details
Details for the file grizz-0.0.3.tar.gz.
File metadata
- Download URL: grizz-0.0.3.tar.gz
- Upload date:
- Size: 25.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.4 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | baa3db5bc8b6a096bb18b596ecfbdba806f2a49193c89d03d174bc0881ef57ae |
MD5 | 918b27b826aef0dbeb0cf64d4cbbbabf |
BLAKE2b-256 | c65e9a1fb66eced8b54eae9649b64d9bb4fd47966f0f1ba7b86772c1838b05e3 |
File details
Details for the file grizz-0.0.3-py3-none-any.whl.
File metadata
- Download URL: grizz-0.0.3-py3-none-any.whl
- Upload date:
- Size: 40.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.4 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9202b03d319c5ec5baee65858aa118005ba1c2f5029d4235d7d056bd3e022e01 |
MD5 | 254be64523247033a4c958fb3c143b52 |
BLAKE2b-256 | 39eb3ca9b4b90bd4799da6ed6612cf61f766d79b2a7b0252c340946472f1cf61 |