

Scikit-transformers: Scikit-learn + Custom transformers

About

scikit-transformers is a package that provides custom scikit-learn transformers such as LogColumnTransformer, BoolColumnTransformer, and others.

It was created to provide a simple way to use custom transformers in scikit-learn pipelines, so that they can be combined with a scikit-learn model and tuned with GridSearchCV like any other hyperparameter.

The starting point was a simple LogColumnTransformer: a wrapper around the numpy log function that takes a skew threshold and applies the log transformation only to columns whose skew exceeds that threshold.
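To make the skew-threshold idea concrete, here is a minimal sketch written with plain pandas and numpy. It is not the scikit-transformers implementation: the function name `log_skewed_columns` and its `threshold` parameter are hypothetical, and it uses `np.log1p` rather than a plain log so that zeros stay finite, which may differ from what the library does.

```python
import numpy as np
import pandas as pd


def log_skewed_columns(df: pd.DataFrame, threshold: float = 1.0) -> pd.DataFrame:
    """Return a copy of df with log1p applied to columns whose skew exceeds threshold.

    Hypothetical helper for illustration only, not part of scikit-transformers.
    """
    out = df.copy()
    skew = df.skew(numeric_only=True)
    # Only columns whose skew is strictly above the threshold are transformed.
    for col in skew.index[skew > threshold]:
        out[col] = np.log1p(df[col])
    return out


df = pd.DataFrame(
    {
        "flat": range(1, 11),                            # symmetric, skew of 0
        "skewed": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1000],     # one large outlier
    }
)

print(df.skew().round(2).to_dict())
print(log_skewed_columns(df, threshold=1.0))
```

With a threshold of 1.0 only the `skewed` column is transformed, while `flat` is returned unchanged.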

With scikit-transformers, this LogColumnTransformer can be used as a step in a GridSearchCV, with the skew threshold as a hyperparameter, to find which columns are worth log-transforming.
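The sketch below shows how a skew threshold can be tuned with GridSearchCV. Because the exact parameter name used by LogColumnTransformer is not stated here, the example defines its own stand-in estimator: `SkewLogSketch` and its `threshold` parameter are hypothetical, and the data is synthetic. It is an illustration under those assumptions, not the library's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class SkewLogSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: log1p on columns whose skew exceeds a threshold."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        skew = pd.DataFrame(X).skew()
        # The columns to transform are chosen at fit time, from training data only.
        self.columns_ = list(skew.index[skew > self.threshold])
        return self

    def transform(self, X):
        out = pd.DataFrame(X).copy()
        for col in self.columns_:
            out[col] = np.log1p(out[col])
        return out


# Synthetic data: the target is linear in log1p of the skewed column.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "flat": rng.uniform(0, 10, size=200),
        "skewed": rng.lognormal(mean=0.0, sigma=1.5, size=200),
    }
)
y = 2.0 * X["flat"] + 3.0 * np.log1p(X["skewed"]) + rng.normal(0, 0.1, size=200)

pipe = Pipeline([("log", SkewLogSketch()), ("model", LinearRegression())])

# A threshold of 1000.0 effectively disables the log step, so the search
# compares "log the skewed column" against "log nothing".
search = GridSearchCV(pipe, param_grid={"log__threshold": [1.0, 1000.0]}, cv=5)
search.fit(X, y)

print(search.best_params_)
```

The `log__threshold` key follows the usual scikit-learn convention of `<step name>__<parameter name>`, so the same pattern should apply to any pipeline step that exposes its threshold as a constructor argument.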

LogColumnTransformer is one of the many transformers implemented in scikit-transformers.

Installation

Using the regular pip and venv tools:

python3 -m venv .venv
source .venv/bin/activate
pip install scikit-transformers
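A quick way to check that the install worked is to ask Python for the installed version. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name is assumed to be the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed in the active
# environment, otherwise None.
print(installed_version("scikit-transformers"))
```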

Usage

For very basic usage:

import pandas as pd

from sktransf.transformer import LogColumnTransformer

# Small example frame with two numeric columns
df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10),
    }
)

# Fit the transformer and get the transformed frame in one call
logger = LogColumnTransformer()
df_transf = logger.fit_transform(df)

Using common transformers:

import pandas as pd

from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector

df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10),
    }
)

# Each transformer can be fitted and applied on its own
df_bool = BoolColumnTransformer().fit_transform(df)
df_unique = DropUniqueColumnSelector().fit_transform(df)
df_logged = LogColumnTransformer().fit_transform(df)

Using a pipeline with a scikit-learn model:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector

# Chain the custom transformers in front of a regular scikit-learn model
pipe = Pipeline([
    ('bool', BoolColumnTransformer()),
    ('unique', DropUniqueColumnSelector()),
    ('log', LogColumnTransformer()),
    ('model', LinearRegression()),
])

X = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10),
    }
)

y = list(range(10))

# Fit the whole pipeline, then predict on the same data
pipe.fit(X, y)

y_pred = pipe.predict(X)

Documentation

For more specific information, please refer to the notebooks.

Complete documentation is available on the GitHub page.

Changelog, Releases and Roadmap

Please refer to the changelog page for more information.

Contributing

Pull requests are welcome.

For major changes, please open an issue first to discuss what you would like to change.

For more information, please refer to the contributing page.

License

GPLv3

