
Feature engineering package with Scikit-learn's fit transform functionality

Project description

Feature Engine


Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.
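
The fit()/transform() convention described above can be illustrated with a minimal pure-Python sketch. `MedianImputerSketch` is a hypothetical stand-in written for this illustration, not one of Feature-engine's classes: fit() learns a parameter (here, the median) from the training data, and transform() applies that learned parameter to any data set.

```python
from statistics import median


class MedianImputerSketch:
    """Hypothetical class illustrating the fit()/transform() convention."""

    def fit(self, values):
        # Learn the parameter (the median) from the observed, non-missing values.
        observed = [v for v in values if v is not None]
        self.median_ = median(observed)
        return self

    def transform(self, values):
        # Apply the learned parameter: replace missing entries with the median.
        return [self.median_ if v is None else v for v in values]


train = [1.0, 2.0, None, 4.0, 9.0]
test = [None, 3.0]

imputer = MedianImputerSketch().fit(train)   # parameters come from train only
train_filled = imputer.transform(train)
test_filled = imputer.transform(test)        # test reuses the train median
```

Learning the parameter on the training data and reusing it on new data is what keeps information from the test set out of the preprocessing step.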

Feature-engine features in the following resources:

Blogs about Feature-engine:

Documentation

Feature-engine's transformers currently include functionality for:

  • Missing Data Imputation
  • Categorical Variable Encoding
  • Outlier Removal
  • Discretisation
  • Numerical Variable Transformation

Imputing Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddNaNBinaryImputer
  • CategoricalVariableImputer
  • FrequentCategoryImputer
  • ArbitraryNumberImputer

Encoding Methods

  • CountFrequencyCategoricalEncoder
  • OrdinalCategoricalEncoder
  • MeanCategoricalEncoder
  • WoERatioCategoricalEncoder
  • OneHotCategoricalEncoder
  • RareLabelCategoricalEncoder

Outlier Handling Methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Discretisation Methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
  • UserInputDiscretiser

Variable Transformation Methods

  • LogTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Scikit-learn Wrapper:

  • SklearnTransformerWrapper

Installing

pip install feature_engine

or

git clone https://github.com/solegalli/feature_engine.git

Usage

>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> import pandas as pd

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
Out[1]:
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
Out[2]:
A       10
B       10
Rare     3
Name: var_A, dtype: int64

See more usage examples in the Jupyter Notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io
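
The grouping performed in the usage example can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `group_rare_labels` is a hypothetical helper, and it assumes that `tol` is the minimum relative frequency a label needs to be kept and that `n_categories` is the minimum number of distinct labels a variable must have before any grouping is applied.

```python
from collections import Counter


def group_rare_labels(values, tol=0.10, n_categories=3, replace_with="Rare"):
    """Hypothetical helper: replace infrequent labels with a single label."""
    counts = Counter(values)
    # Assumption: variables with fewer than n_categories distinct labels
    # are returned unchanged.
    if len(counts) < n_categories:
        return list(values)
    total = len(values)
    # A label is kept when its relative frequency is at least tol.
    frequent = {label for label, c in counts.items() if c / total >= tol}
    return [v if v in frequent else replace_with for v in values]


values = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
encoded = group_rare_labels(values)
encoded_counts = Counter(encoded)
```

With 23 observations, C (2/23) and D (1/23) both fall below the 0.10 threshold, so their three rows are merged into one label, which matches the counts shown in the usage example.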

Contributing

Local Setup Steps

  • Clone the repo and cd into it
  • Run pip install tox
  • Run tox; if the tests pass, your local setup is complete

Opening Pull Requests

PRs are welcome! Please make sure the CI tests pass on your branch.

License

BSD 3-Clause

Authors

References

Many of the engineering and encoding functionalities are inspired by this series of articles from the 2009 KDD Competition.

To learn more about the rationale, functionality, pros and cons of each imputer, encoder, and transformer, refer to the online course Feature Engineering for Machine Learning.

For a summary of the methods, see this presentation and this article.

To be notified of the latest releases, sign up at trainindata.
