Feature engineering and selection package with Scikit-learn's fit transform functionality
Feature-engine
Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's functionality with fit() and transform() methods to learn the transforming parameters from the data and then transform it.
Feature-engine features in the following resources
Blogs about Feature-engine
- Feature-engine: A new open-source Python package for feature engineering
- Practical Code Implementations of Feature Engineering for Machine Learning with Python
Documentation
Pst! How did you find us?
We want to share Feature-engine with more people. It'd help us loads if you tell us how you discovered us.
Then we'd know what we are doing right and which channels to use to share the love.
Please share your story by answering 1 quick question at this link. 😃
Feature-engine's transformers currently include functionality for:
- Missing Data Imputation
- Categorical Encoding
- Discretisation
- Outlier Capping or Removal
- Variable Transformation
- Variable Creation
- Variable Selection
- Datetime Features
- Time Series
- Preprocessing
- Scaling
- Scikit-learn Wrappers
Imputation Methods
- MeanMedianImputer
- ArbitraryNumberImputer
- RandomSampleImputer
- EndTailImputer
- CategoricalImputer
- AddMissingIndicator
- DropMissingData
Encoding Methods
- OneHotEncoder
- OrdinalEncoder
- CountFrequencyEncoder
- MeanEncoder
- WoEEncoder
- RareLabelEncoder
- DecisionTreeEncoder
- StringSimilarityEncoder
Discretisation methods
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- GeometricWidthDiscretiser
- DecisionTreeDiscretiser
- ArbitraryDiscretiser
Outlier Handling methods
- Winsorizer
- ArbitraryOutlierCapper
- OutlierTrimmer
Variable Transformation methods
- LogTransformer
- LogCpTransformer
- ReciprocalTransformer
- ArcsinTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
Variable Scaling methods
- MeanNormalizationScaler
Variable Creation:
- MathFeatures
- RelativeFeatures
- CyclicalFeatures
- DecisionTreeFeatures
Feature Selection:
- DropFeatures
- DropConstantFeatures
- DropDuplicateFeatures
- DropCorrelatedFeatures
- SmartCorrelationSelection
- SelectByShuffling
- SelectBySingleFeaturePerformance
- SelectByTargetMeanPerformance
- RecursiveFeatureElimination
- RecursiveFeatureAddition
- DropHighPSIFeatures
- SelectByInformationValue
- ProbeFeatureSelection
- MRMR
Datetime
- DatetimeFeatures
- DatetimeSubtraction
Time Series
- LagFeatures
- WindowFeatures
- ExpandingWindowFeatures
Pipelines
- Pipeline
- make_pipeline
Preprocessing
- MatchCategories
- MatchVariables
Wrappers:
- SklearnTransformerWrapper
Installation
From PyPI using pip:
pip install feature_engine
From Anaconda:
conda install -c conda-forge feature_engine
Or simply clone it:
git clone https://github.com/feature-engine/feature_engine.git
Example Usage
>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64
Find more examples in our Jupyter Notebook Gallery or in the documentation.
Contribute
Details about how to contribute can be found on the Contribute page.
Briefly:
- Fork the repo
- Clone your fork to your local computer:
git clone https://github.com/<YOURUSERNAME>/feature_engine.git
- Navigate into the repo folder:
cd feature_engine
- Install Feature-engine as a developer:
pip install -e .
- Optional: Create and activate a virtual environment with any tool of choice
- Install Feature-engine dependencies:
pip install -r requirements.txt
and
pip install -r test_requirements.txt
- Create a feature branch with a meaningful name for your feature:
git checkout -b myfeaturebranch
- Develop your feature, tests and documentation
- Make sure the tests pass
- Make a PR
Thank you!!
Documentation
Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.
To build the documentation, first make sure the dependencies are installed. From the root directory:
pip install -r docs/requirements.txt
Now you can build the docs using:
sphinx-build -b html docs build
License
The content of this repository is licensed under a BSD 3-Clause license.
Sponsor us
Sponsor us and further support our mission to democratize machine learning and programming tools through open-source software.
File details
Details for the file feature_engine-1.8.2.tar.gz.
File metadata
- Download URL: feature_engine-1.8.2.tar.gz
- Upload date:
- Size: 232.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d51d3197b9245ec1c286f6562111788c1be927ba4f700d2064ea94f623acab01
MD5 | 2ed2dd8455ea09615c5e29b9fab68d79
BLAKE2b-256 | 6efbf00c5c3153d97faea382ee17ea26fc4cc6351e31bc9c0438da70c685faae
File details
Details for the file feature_engine-1.8.2-py2.py3-none-any.whl.
File metadata
- Download URL: feature_engine-1.8.2-py2.py3-none-any.whl
- Upload date:
- Size: 375.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2315b0625beec8a52801d048e937591ef36225ad5ef32e5475615a235a491dd0
MD5 | 0607aa0aea9b5eea3323eef0d2200ccb
BLAKE2b-256 | 266af947404b55d8008035895ce33d6b1326cfb2412478f31e4548c184fefab2