scikit-transformers is a very useful package that provides custom transformers such as LogColumnTransformer, BoolColumnTransformer and other fancy transformers.
Scikit-transformers : Scikit-learn + Custom transformers
About
scikit-transformers is a very useful package that provides custom transformers such as LogColumnTransformer, BoolColumnTransformer and other fancy transformers.
It was created to provide a simple way to use custom transformers in scikit-learn pipelines, so that they can be combined with a scikit-learn model and used with GridSearchCV for testing and tuning hyperparameters.
The starting point was a simple LogColumnTransformer, a thin wrapper around the numpy log function that takes a skew threshold and applies the log transformation only to columns whose skew exceeds that threshold.
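To make the idea concrete, here is a minimal sketch of a skew-thresholded log transform written with plain pandas and NumPy. It is not the package's implementation: the function name `log_skewed_columns`, the `threshold` parameter and the use of `np.log1p` (chosen so that zeros are handled) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def log_skewed_columns(df: pd.DataFrame, threshold: float = 1.0) -> pd.DataFrame:
    """Return a copy of df with a log applied to columns whose skew > threshold.

    Hypothetical helper for illustration, not part of scikit-transformers.
    """
    out = df.copy()
    skew = out.skew(numeric_only=True)  # sample skewness of each numeric column
    for col in skew[skew > threshold].index:
        out[col] = np.log1p(out[col])  # log(1 + x), defined at x = 0
    return out


df = pd.DataFrame(
    {
        "uniform": np.arange(10, dtype=float),  # symmetric, skew is 0
        "skewed": np.exp(np.arange(10, dtype=float)),  # strongly right-skewed
    }
)
result = log_skewed_columns(df, threshold=1.0)
```

Only the `skewed` column is transformed here; `uniform` stays untouched because its skew is below the threshold.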
With scikit-transformers, this LogColumnTransformer can be used as a step in a GridSearchCV, with the skew threshold as a hyperparameter, to find which columns are worth logging and which are not.
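The sketch below shows how such a search could look. Because this page does not state the actual parameter name of LogColumnTransformer, the example uses a hypothetical stand-in class, `SkewLogSketch`, with an assumed `threshold` parameter; the data is synthetic and the grid values are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class SkewLogSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a skew-thresholded log transformer."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Decide on the training data which columns exceed the skew threshold.
        skew = pd.DataFrame(X).skew(numeric_only=True)
        self.columns_ = list(skew[skew > self.threshold].index)
        return self

    def transform(self, X):
        out = pd.DataFrame(X).copy()
        for col in self.columns_:
            out[col] = np.log1p(out[col])
        return out


# Synthetic data: "a" is roughly symmetric, "b" is right-skewed.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "a": rng.uniform(0, 10, size=200),
        "b": rng.lognormal(sigma=1.0, size=200),
    }
)
# The target depends on log(1 + b), so logging "b" should help a linear model.
y = X["a"] + np.log1p(X["b"]) + rng.normal(scale=0.1, size=200)

pipe = Pipeline([("log", SkewLogSketch()), ("model", LinearRegression())])
grid = GridSearchCV(pipe, param_grid={"log__threshold": [0.5, 1.0, 100.0]}, cv=5)
grid.fit(X, y)
best_threshold = grid.best_params_["log__threshold"]
```

A threshold of 100 effectively disables the transformation, so comparing it with lower values shows whether logging the skewed column improves the cross-validated score.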
LogColumnTransformer is one of the many transformers implemented in scikit-transformers.
Installation
Using the regular pip and venv tools:
python3 -m venv .venv
source .venv/bin/activate
pip install scikit-transformers
Usage
For very basic usage:
import pandas as pd
from sktransf.transformer import LogColumnTransformer

df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10),
    }
)

logger = LogColumnTransformer()
logger.fit_transform(df)  # fit on df and return the transformed data
df_transf = logger.transform(df)  # reuse the already fitted transformer
Using common transformers:
import pandas as pd
from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector

df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10),
    }
)

# Each transformer is fitted and applied independently to the same frame.
df_bool = BoolColumnTransformer().fit_transform(df)
df_unique = DropUniqueColumnSelector().fit_transform(df)
df_logged = LogColumnTransformer().fit_transform(df)
Using a pipeline with a scikit-learn model:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector

# Preprocessing steps run in order, followed by the final estimator.
pipe = Pipeline([
    ("bool", BoolColumnTransformer()),
    ("unique", DropUniqueColumnSelector()),
    ("log", LogColumnTransformer()),
    ("model", LinearRegression()),
])

X = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10),
    }
)
y = range(10)

pipe.fit(X, y)
y_pred = pipe.predict(X)
Documentation
For more specific information, please refer to the notebooks:
- Transformers :
- Selectors :
- Pipelines :
Complete documentation is available on the GitHub page.
Changelog, Releases and Roadmap
Please refer to the changelog page for more information.
Contributing
Pull requests are welcome.
For major changes, please open an issue first to discuss what you would like to change.
For more information, please refer to the contributing page.
License
Download files
File details
Details for the file scikit_transformers-0.3.2.tar.gz.
File metadata
- Download URL: scikit_transformers-0.3.2.tar.gz
- Upload date:
- Size: 22.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.0 CPython/3.11.3 Linux/6.5.0-17-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 076d10a0be99c0858457172725ef0075c849e8091187c7c0cd12f2c280a8569e
MD5 | 9a94491c634bf915b4144baf3d03205c
BLAKE2b-256 | 9f63c1bbb6b9c63a4988c727fce07b87eacf4e29b20939fb1f5324d136ce57ee
File details
Details for the file scikit_transformers-0.3.2-py3-none-any.whl.
File metadata
- Download URL: scikit_transformers-0.3.2-py3-none-any.whl
- Upload date:
- Size: 26.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.0 CPython/3.11.3 Linux/6.5.0-17-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1805228b8d30f8e468ac4dce1c0ea8990585f8d8ad7a7db3d5724f67cc751d1a
MD5 | 6af6deb3400a38b75ec20b25e272b321
BLAKE2b-256 | ac995b02f2bb36441679b01c268ac2445efae2e021b4ae5cb3e20fbfe018c06e