Feature engineering package that follows sklearn functionality

Feature Engine

Feature Engine is a Python library containing several transformers that engineer features for use in machine learning models. The transformers follow scikit-learn's fit/transform convention: they first learn the imputation or encoding parameters from the training set, and then use those parameters to transform the dataset. The transformers currently cover:

  • Missing value imputation
  • Categorical variable encoding
  • Outlier removal
  • Discretisation
  • Numerical variable transformation
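
The learn-then-transform pattern described above can be sketched in plain Python. This is a minimal illustration, not Feature Engine's implementation: the class name `SimpleMeanImputer`, its attribute names, and the list-of-dicts data layout are assumptions made for the example (the library itself operates on pandas DataFrames).

```python
from statistics import mean


class SimpleMeanImputer:
    """Hypothetical imputer illustrating the fit/transform pattern."""

    def __init__(self, variables):
        self.variables = variables
        self.imputer_dict_ = {}

    def fit(self, rows):
        # Learn one mean per variable from the training rows,
        # ignoring missing values (represented here as None).
        for var in self.variables:
            observed = [row[var] for row in rows if row[var] is not None]
            self.imputer_dict_[var] = mean(observed)
        return self

    def transform(self, rows):
        # Fill missing values with the means learned in fit().
        # New dicts are returned; the input rows are left untouched.
        return [
            {
                key: (self.imputer_dict_[key]
                      if key in self.imputer_dict_ and value is None
                      else value)
                for key, value in row.items()
            }
            for row in rows
        ]


train = [{"age": 20.0}, {"age": None}, {"age": 40.0}]
test = [{"age": None}, {"age": 33.0}]

imputer = SimpleMeanImputer(variables=["age"]).fit(train)
train_filled = imputer.transform(train)
test_filled = imputer.transform(test)
```

The point of the split is that the test set is filled with the mean learned from the training set, so no information from the test data leaks into the learned parameters.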

Important Links

Documentation: http://feature-engine.readthedocs.io

Imputing Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddNaNBinaryImputer
  • CategoricalVariableImputer
  • FrequentCategoryImputer
  • ArbitraryNumberImputer

Encoding Methods

  • CountFrequencyCategoricalEncoder
  • OrdinalCategoricalEncoder
  • MeanCategoricalEncoder
  • WoERatioCategoricalEncoder
  • OneHotCategoricalEncoder
  • RareLabelCategoricalEncoder
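
As one illustration of the encoders listed above, count or frequency encoding replaces each category with how often it occurs in the training set. The sketch below uses only the standard library; the function names are hypothetical and it does not reproduce the interface of `CountFrequencyCategoricalEncoder`.

```python
from collections import Counter


def fit_frequency_map(values, as_fraction=True):
    """Learn a category -> count (or fraction) mapping from training values."""
    counts = Counter(values)
    if not as_fraction:
        return dict(counts)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def apply_map(values, mapping):
    """Replace each category by its learned number; unseen categories become None."""
    return [mapping.get(value) for value in values]


train_cabin = ["A", "B", "A", "C", "A", "B", "A", "A"]
freq_map = fit_frequency_map(train_cabin)

# "Z" never appeared in the training data, so it has no learned frequency.
encoded = apply_map(["A", "C", "Z"], freq_map)
```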

Outlier Handling Methods

  • Windsorizer
  • ArbitraryOutlierCapper
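
Outlier capping can be sketched as follows. The example caps values at the training mean plus or minus three standard deviations, which is one common convention; whether the transformers above use these exact limits is not stated here, so treat the rule and the function names as assumptions.

```python
from statistics import mean, stdev


def fit_caps(values, fold=3.0):
    """Learn lower and upper caps as mean -/+ fold * sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)
    return centre - fold * spread, centre + fold * spread


def cap(values, lower, upper):
    """Clip each value into the [lower, upper] interval."""
    return [min(max(value, lower), upper) for value in values]


train_fares = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 11.0, 12.0]
lower, upper = fit_caps(train_fares)

# Values inside the limits pass through; extreme values are pulled to the caps.
capped = cap([11.0, 500.0, -40.0], lower, upper)
```

Capping keeps every row in the dataset, in contrast to dropping the rows that contain extreme values.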

Discretisation Methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
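
Equal-width discretisation, the simplest of the three, can be sketched with the standard library alone. The function names and the choice to return integer bin indices are assumptions for illustration, not the behaviour of `EqualWidthDiscretiser`.

```python
from bisect import bisect_right


def fit_equal_width_edges(values, bins):
    """Learn interior boundaries that split [min, max] into equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [low + width * i for i in range(1, bins)]


def discretise(values, edges):
    """Map each value to a bin index in 0..len(edges).

    A value equal to a boundary goes to the higher bin, and values outside
    the training range fall into the first or last bin.
    """
    return [bisect_right(edges, value) for value in values]


train_age = [0, 10, 20, 30, 40, 50, 60, 70, 80]
edges = fit_equal_width_edges(train_age, bins=4)
binned = discretise([5, 20, 45, 80, 95], edges)
```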

Variable Transformation Methods

  • LogTransformer
  • ReciprocalTransformer
  • ExponentialTransformer
  • BoxCoxTransformer

Installing

pip install feature_engine

or

git clone https://github.com/solegalli/feature_engine.git

Usage

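from feature_engine.categorical_encoders import RareLabelCategoricalEncoder

# `data` is a pandas DataFrame that contains the 'Cabin' and 'Age' columns
rare_encoder = RareLabelCategoricalEncoder(tol=0.05, n_categories=5)

# learn which labels are rare in each variable, then replace them
rare_encoder.fit(data, variables=['Cabin', 'Age'])
data_encoded = rare_encoder.transform(data)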
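
To make the two parameters concrete, here is a minimal standard-library sketch of rare-label grouping. It assumes that `tol` is the minimum relative frequency a category needs to keep its own label, and that `n_categories` is the minimum number of distinct categories a variable must have before any grouping is applied. Those semantics, the function names, and the 'Rare' label are assumptions for illustration, not a description of the library's internals.

```python
from collections import Counter


def fit_frequent_labels(values, tol=0.05, n_categories=5):
    """Learn the set of categories that are frequent enough to keep."""
    counts = Counter(values)
    if len(counts) < n_categories:
        # Too few distinct categories: keep all of them unchanged.
        return set(counts)
    total = len(values)
    return {category for category, n in counts.items() if n / total >= tol}


def group_rare(values, frequent, rare_label="Rare"):
    """Replace every category outside the learned frequent set."""
    return [value if value in frequent else rare_label for value in values]


train = ["A"] * 50 + ["B"] * 30 + ["C"] * 16 + ["D"] * 2 + ["E"] + ["F"]
frequent = fit_frequent_labels(train, tol=0.05, n_categories=5)

# "Z" was never seen during fit, so it is grouped with the rare labels too.
grouped = group_rare(["A", "D", "F", "C", "Z"], frequent)
```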

See more usage examples in the Jupyter notebooks described in the Examples section.

Examples

The examples folder contains Jupyter notebooks with directions on how to use this package and its transformers.

License

BSD 3-Clause

Authors

References

Most of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.

To learn more about the rationale, functionality, and pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering Online Course.

For a summary of the methods, check this presentation.
