Feature engineering package with Scikit-learn's fit transform functionality
Feature Engine
Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.
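The fit/transform pattern described above can be sketched with a toy transformer. This is a minimal illustration in plain Python, not Feature-engine's code: the class name `MedianImputerSketch` and the attribute `median_` are hypothetical, chosen only to show that parameters are learned in `fit()` and reused, unchanged, in `transform()`.

```python
from statistics import median


class MedianImputerSketch:
    """Hypothetical stand-in for a fit/transform transformer (not Feature-engine code)."""

    def fit(self, values):
        # Learn the parameter (the median) from the non-missing training values.
        observed = [v for v in values if v is not None]
        self.median_ = median(observed)
        return self

    def transform(self, values):
        # Apply the learned parameter; nothing is re-estimated at this step.
        return [self.median_ if v is None else v for v in values]


train = [1.0, 2.0, None, 4.0, 100.0]
test = [None, 3.0]

imputer = MedianImputerSketch().fit(train)   # parameters come from train only
test_imputed = imputer.transform(test)       # and are reused on new data
print(test_imputed)
```

Because the median is stored at fit time, the same value fills missing entries in any later dataset, which is what keeps information from the test set out of the learned parameters.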
Feature-engine features in the following resources:
Blogs about Feature-engine:
- Feature-engine: A new open-source Python package for feature engineering
- Practical Code Implementations of Feature Engineering for Machine Learning with Python
Documentation
In Spanish:
More resources will be added as they appear online!
Feature-engine's transformers currently include functionality for:
- Missing Data Imputation
- Categorical Variable Encoding
- Outlier Capping or Removal
- Discretisation
- Numerical Variable Transformation
- Scikit-learn Wrappers
- Variable Combination
- Variable Selection
Imputing Methods
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddNaNBinaryImputer
- CategoricalVariableImputer
- FrequentCategoryImputer
- ArbitraryNumberImputer
Encoding Methods
- CountFrequencyCategoricalEncoder
- OrdinalCategoricalEncoder
- MeanCategoricalEncoder
- WoERatioCategoricalEncoder
- OneHotCategoricalEncoder
- RareLabelCategoricalEncoder
Outlier Handling methods
- Winsorizer
- ArbitraryOutlierCapper
- OutlierTrimmer
Discretisation methods
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- DecisionTreeDiscretiser
- UserInputDiscretiser
Variable Transformation methods
- LogTransformer
- ReciprocalTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
Scikit-learn Wrapper:
- SklearnTransformerWrapper
Variable Combinations:
- MathematicalCombinator
Feature Selection:
- DropFeatures
Installing
From PyPI using pip:
pip install feature_engine
From Anaconda:
conda install -c conda-forge feature_engine
Or simply clone it:
git clone https://github.com/solegalli/feature_engine.git
Usage
>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> import pandas as pd
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64
See more usage examples in the Jupyter Notebooks in the example folder of this repository, or in the documentation.
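To make the result of the example above easier to follow, here is a plain-Python sketch of the grouping arithmetic. It rests on an assumed reading of the two parameters, not on the library's source: `tol` is taken as the minimum relative frequency a label needs to be kept, and `n_categories` as the number of distinct labels a variable must exceed before any grouping happens. Check the documentation for the exact semantics.

```python
from collections import Counter

values = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
tol = 0.10          # assumed: minimum relative frequency for a label to be kept
n_categories = 3    # assumed: variables with this many labels or fewer are left as-is

counts = Counter(values)
total = len(values)

if len(counts) > n_categories:
    # Keep labels whose share of the rows reaches the tolerance.
    frequent = {label for label, n in counts.items() if n / total >= tol}
else:
    # Too few distinct labels: treat all of them as frequent.
    frequent = set(counts)

# Replace every infrequent label with the placeholder "Rare".
encoded = [v if v in frequent else "Rare" for v in values]
encoded_counts = Counter(encoded)
print(encoded_counts)
```

Here C (2 of 23 rows, about 8.7%) and D (1 of 23, about 4.3%) both fall below the 10% tolerance, so their 3 rows are grouped under "Rare", matching the counts shown in the usage example.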
Contributing
Details on how to contribute can be found on the Contributing Page.
In short:
Local Setup Steps
- Fork the repo
- Clone your fork onto your local computer:
git clone https://github.com/<YOURUSERNAME>/feature_engine.git
- cd into the repo
cd feature_engine
- Create and activate a virtual environment with any tool of choice
- Install as a developer:
pip install -e .
- Install the dependencies as explained in the Contributing Page
- Create a feature branch with a meaningful name for your feature:
git checkout -b myfeaturebranch
- Develop your feature, tests and documentation
- Make sure the tests pass
- Make a PR
Thank you!!
Opening Pull Requests
PRs are welcome! Please make sure the CI tests pass on your branch.
Tests
We prefer tox. In your environment:
- Run
pip install tox
- cd into the root directory of the repo:
cd feature_engine
- Run
tox
If the tests pass, the code is functional.
You can also run the tests in your environment (without tox). For guidelines on how to do so, check the Contributing Page.
Documentation
Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.
To build the documentation, make sure you have the dependencies installed. From the root directory:
pip install -r docs/requirements.txt
Now you can build the docs:
sphinx-build -b html docs build
License
BSD 3-Clause
References
Many of the engineering and encoding functionalities are inspired by this series of articles from the 2009 KDD Competition.