A collection of various scikit-learn compatible transformers for all kinds of preprocessing and feature engineering

sk-transformers

A collection of various scikit-learn transformers for all kinds of preprocessing and feature engineering steps 🛠

Introduction

Every tabular dataset is different. Every column needs to be treated differently. Scikit-learn has a nice collection of dataset transformers. But the possibilities of data transformation are infinite - one collection is simply not enough. This project provides a broad collection of data transformers. The idea is simple. It is like a well-equipped toolbox 🧰: You always find the tool you need and sometimes you get inspired by seeing a tool you did not know before. Please feel free to contribute your tools and ideas.

Installation

If you are using Poetry, you can install the package with the following command:

poetry add sk_transformers

If you are using pip, you can install the package with the following command:

pip install sk_transformers

Installing dependencies

With Poetry:

poetry install

With pip:

pip install -r requirements.txt

The transformers

Data preprocessing often involves similar steps, whether it's manipulating strings, numbers, or other kinds of data. Scikit-learn's pipeline implementation makes it easy to structure and sequence such preprocessing steps. To take advantage of this, the transformers contain multiple methods that can be easily pipelined to simplify preprocessing. The list of transformers is open and will be extended continually. Feel free to contribute! 🛠
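As a rough sketch of how such pipelined preprocessing steps fit together, the following uses only scikit-learn and pandas. The classes StringStripTransformer and ColumnScaleTransformer, as well as the column names, are hypothetical and made up for this illustration; they are not part of sk-transformers.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class StringStripTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical example: strip whitespace and lowercase string columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn from the data.
        return self

    def transform(self, X):
        X = X.copy()
        for column in self.columns:
            X[column] = X[column].str.strip().str.lower()
        return X


class ColumnScaleTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical example: multiply numeric columns by a constant factor."""

    def __init__(self, columns, factor=1.0):
        self.columns = columns
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        for column in self.columns:
            X[column] = X[column] * self.factor
        return X


# Each step receives the output of the previous one.
pipeline = Pipeline(
    [
        ("strip", StringStripTransformer(columns=["city"])),
        ("scale", ColumnScaleTransformer(columns=["distance_km"], factor=1000)),
    ]
)

X = pd.DataFrame({"city": ["  Berlin ", "PARIS"], "distance_km": [1.5, 2.0]})
result = pipeline.fit_transform(X)
```

Because both classes follow the fit/transform convention, they can be combined freely with other scikit-learn compatible steps in the same pipeline.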

Usage

Let's assume you want to use one of NumPy's mathematical functions to sum the values of column foo and column bar. You could use the MathExpressionTransformer:

from sk_transformers import MathExpressionTransformer
import pandas as pd
X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = MathExpressionTransformer([("foo", "np.sum", "bar", {"axis": 0})])
transformer.fit_transform(X).to_numpy()

array([[1, 4, 5],
       [2, 5, 7],
       [3, 6, 9]])

This example passes only one tuple to the transformer. However, as with most other transformers, the idea is to simplify preprocessing by making it possible to operate on multiple columns at the same time. In this case, the MathExpressionTransformer has created an extra column with the name foo_sum_bar.
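For orientation, the output shown above corresponds to the following plain pandas/NumPy computation. This is only a sketch of an equivalent operation, not the library's internal implementation; the column name foo_sum_bar is taken from the example, everything else is an assumption.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Stack both columns and sum element-wise along axis 0,
# mirroring the {"axis": 0} keyword argument from the example.
summed = np.sum([X["foo"].to_numpy(), X["bar"].to_numpy()], axis=0)

# Append the result as a new column without modifying X.
result = X.assign(foo_sum_bar=summed)
output = result.to_numpy()
```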

Contributing

We're all in the same boat. Preprocessing and feature engineering in data science are highly individual: every feature is different and must be handled and processed differently. Yet we all face the same problems: sometimes date columns have to be changed, sometimes strings have to be formatted, sometimes durations have to be calculated, and so on. There is a huge number of preprocessing possibilities, but we all use the same tools.

Scikit-learn's pipelines help to apply preprocessing as formalized, reusable functions. So why not share these so-called transformers with others? The goal of this open source project is to collect useful preprocessing pipeline steps. Let us all collect what we have used for preprocessing and share it with others. This way we can all benefit from each other's work and save a lot of time. So if you have a preprocessing step that you use regularly, please feel free to contribute it to this project. The idea is that this is not only a toolbox but also an inspiration for what is possible. Maybe you will find a preprocessing step you have not thought about before.

Please check out the guide on how to contribute to this project.

Further information

For further information, please refer to the documentation.

Release history

Version	Release date
0.11.0	Apr 25, 2023
0.10.3	Mar 10, 2023
0.10.2	Mar 9, 2023
0.10.0	Jan 26, 2023
0.9.1	Jan 23, 2023
0.9.0	Jan 20, 2023
0.8.0	Jan 18, 2023
0.7.4	Jan 16, 2023
0.7.3	Jan 12, 2023
0.7.2	Jan 10, 2023
0.7.1	Jan 6, 2023
0.7.0	Jan 6, 2023
0.6.3	Jan 4, 2023
0.6.2	Dec 22, 2022
0.6.1	Dec 22, 2022
0.6.0	Dec 22, 2022
0.5.8	Dec 16, 2022
0.5.7	Dec 15, 2022
0.5.6	Dec 15, 2022
0.5.5	Dec 15, 2022
0.5.4	Dec 15, 2022
0.5.3	Dec 13, 2022
0.5.2	Dec 13, 2022 (this version)
0.5.1	Dec 13, 2022
0.5.0	Dec 13, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sk_transformers-0.5.2.tar.gz (16.1 kB)

Uploaded Dec 13, 2022 Source

Built Distribution

sk_transformers-0.5.2-py3-none-any.whl (17.4 kB)

Uploaded Dec 13, 2022 Python 3

Hashes for sk_transformers-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`38a7fd5af2e3dce462d8ff8e94ec1409635b774fbd18a6aec38250c7829eeb8d`
MD5	`d741ad532a086df1f78ef66e6c833b8f`
BLAKE2b-256	`dcf19008444e987cb76ac8ce0dd758306d97bbac9ae80328638048ec8c619c7a`

Hashes for sk_transformers-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`82a40f30343e3078a1cb259d10c63ac87b030c7421116052c3864474acd509d5`
MD5	`844d912c5a642884a05e3a4d9041f2b2`
BLAKE2b-256	`6c9b1d2364245004f72c0b8c7c2deb9e7beb06ad46dcd112123922662fc18d4b`
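
A downloaded file can be checked against the digests listed above. The following is a minimal sketch using Python's standard hashlib module; the helper name sha256_of_file and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for sk_transformers-0.5.2.tar.gz
EXPECTED = "38a7fd5af2e3dce462d8ff8e94ec1409635b774fbd18a6aec38250c7829eeb8d"

# Assumed local path; adjust it to wherever the archive was downloaded.
archive = Path("sk_transformers-0.5.2.tar.gz")
if archive.exists():
    print("match" if sha256_of_file(archive) == EXPECTED else "MISMATCH")
```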