Feature engineering package that follows sklearn functionality
Feature Engine
Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn-like functionality, with fit() and transform() methods that first learn the transformation parameters from the data and then transform the data. Feature-engine's transformers currently include functionality for:
- Missing data imputation
- Categorical variable encoding
- Outlier removal
- Discretisation
- Numerical variable transformation
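The fit() and transform() pattern described above can be sketched with a toy transformer. This is only an illustration of the pattern in plain Python: the class `ToyMeanImputer`, its `imputer_dict_` attribute, and the list-of-dicts data layout are hypothetical and are not part of Feature-engine.

```python
from statistics import mean


class ToyMeanImputer:
    """Hypothetical example of the fit/transform pattern (not a Feature-engine class)."""

    def __init__(self, variables):
        self.variables = variables
        self.imputer_dict_ = {}

    def fit(self, rows):
        # Learn one parameter per variable: the mean of its observed values.
        for var in self.variables:
            observed = [row[var] for row in rows if row[var] is not None]
            self.imputer_dict_[var] = mean(observed)
        return self

    def transform(self, rows):
        # Build new rows, replacing missing values with the learned means.
        return [
            {
                key: self.imputer_dict_[key]
                if key in self.imputer_dict_ and value is None
                else value
                for key, value in row.items()
            }
            for row in rows
        ]


train = [
    {"age": 20, "fare": 7.5},
    {"age": None, "fare": 8.0},
    {"age": 40, "fare": None},
]
imputer = ToyMeanImputer(variables=["age", "fare"]).fit(train)
imputed = imputer.transform(train)
```

Because the parameters are learned in fit() and stored on the object, the same fitted transformer can later be applied to new data without recomputing them.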
Important Links
- Documentation: http://feature-engine.readthedocs.io
- Home page: https://www.trainindata.com/feature-engine
Imputing Methods
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddNaNBinaryImputer
- CategoricalVariableImputer
- FrequentCategoryImputer
- ArbitraryNumberImputer
Encoding Methods
- CountFrequencyCategoricalEncoder
- OrdinalCategoricalEncoder
- MeanCategoricalEncoder
- WoERatioCategoricalEncoder
- OneHotCategoricalEncoder
- RareLabelCategoricalEncoder
Outlier Handling Methods
- Winsorizer
- ArbitraryOutlierCapper
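As a rough illustration of the general idea behind capping outliers, the sketch below clips values to user-supplied bounds. The function `cap_values` and the example bounds are hypothetical; this is not the implementation of either class listed above, which may determine or apply the limits differently.

```python
def cap_values(values, lower, upper):
    """Clip each value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


fares = [3.0, 7.5, 8.0, 512.0]
# Values below 5.0 are raised to 5.0; values above 100.0 are lowered to 100.0.
capped = cap_values(fares, lower=5.0, upper=100.0)
```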
Discretisation Methods
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- DecisionTreeDiscretiser
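For readers unfamiliar with discretisation, the following is a minimal sketch of equal-width binning in plain Python. The helper names `equal_width_edges` and `assign_bin` are hypothetical, and the handling of boundary values is an assumption made for this example rather than a description of the library's behaviour.

```python
def equal_width_edges(values, bins):
    """Learn bin edges that split the observed range into equal-width intervals."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / bins
    return [lowest + i * width for i in range(bins + 1)]


def assign_bin(value, edges):
    """Return the index of the interval [edges[i], edges[i + 1]) containing value.

    Values at or above the last edge fall into the last bin; values below
    the first edge fall into the first bin.
    """
    last = len(edges) - 2
    for i in range(last + 1):
        if value < edges[i + 1]:
            return i
    return last


ages = [0, 10, 20, 30, 40]
edges = equal_width_edges(ages, bins=4)       # learned from the data ("fit")
binned = [assign_bin(a, edges) for a in ages]  # applied to the data ("transform")
```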
Variable Transformation Methods
- LogTransformer
- ReciprocalTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
Installing
pip install feature_engine
or
git clone https://github.com/solegalli/feature_engine.git
Usage
from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
# `data` is assumed to be a pandas DataFrame containing the 'Cabin' and 'Age' columns
rare_encoder = RareLabelCategoricalEncoder(tol=0.05, n_categories=5)
rare_encoder.fit(data, variables=['Cabin', 'Age'])
data_encoded = rare_encoder.transform(data)
See more usage examples in the Jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io
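To make the idea behind rare label encoding more concrete, here is a conceptual sketch in plain Python. It assumes that `tol` is the minimum relative frequency a category needs to be kept, and that `n_categories` is the minimum number of distinct categories a variable must have before any grouping is attempted. The function names, the `"Rare"` replacement label, and the use of `>=` for the threshold are assumptions for illustration, not the library's code.

```python
from collections import Counter


def find_frequent_labels(values, tol=0.05, n_categories=5):
    """Return the set of labels considered frequent enough to keep."""
    counts = Counter(values)
    # With too few distinct categories, treat every label as frequent.
    if len(counts) < n_categories:
        return set(counts)
    total = len(values)
    return {label for label, n in counts.items() if n / total >= tol}


def group_rare(values, frequent, replacement="Rare"):
    """Replace every label not in `frequent` with a single replacement label."""
    return [v if v in frequent else replacement for v in values]


cabins = ["A"] * 50 + ["B"] * 45 + ["C"] * 3 + ["D"] * 2
frequent = find_frequent_labels(cabins, tol=0.05, n_categories=3)
grouped = group_rare(cabins, frequent)
```

In this example, "C" and "D" each account for less than 5% of the observations, so both are mapped to the same replacement label.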
License
BSD 3-Clause
Authors
- Soledad Galli - Initial work - Feature Engineering Online Course.
References
Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.
To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering Online Course.
For a summary of the methods, check this presentation and this article.
To stay informed of the latest releases, sign up at trainindata.
Hashes for feature_engine-0.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 24bd68dd78f8fe47a752e6978d3605495946f47028a154740e3c21d25f9c80ef
MD5 | 7164874d70b36bf9b88d58488558eda9
BLAKE2b-256 | b30f7f7f60195879fc487aeaecba343f02c6f4426bc239b378b73655d40c1d06