Feature engineering package that follows sklearn functionality

Feature Engine

Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn-like functionality, with fit() and transform() methods that first learn the transformation parameters from the data and then transform the data. Feature-engine's transformers currently include functionality for:

  • Missing data imputation
  • Categorical variable encoding
  • Outlier removal
  • Discretisation
  • Numerical variable transformation
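The fit() and transform() pattern described above can be sketched without any dependencies. The class below is a hypothetical illustration (MedianImputerSketch is not part of Feature-engine, and rows are plain dictionaries rather than a DataFrame): fit() learns a parameter from the training data and transform() applies it to any data set.

```python
from statistics import median


class MedianImputerSketch:
    """Hypothetical illustration of the fit()/transform() pattern.

    This is not Feature-engine code; it only mirrors the idea.
    """

    def __init__(self, variables):
        self.variables = variables  # names of the variables to impute

    def fit(self, rows):
        # Learn one median per variable from the training rows,
        # ignoring missing values (represented here as None).
        self.medians_ = {
            var: median(r[var] for r in rows if r[var] is not None)
            for var in self.variables
        }
        return self

    def transform(self, rows):
        # Fill missing values with the learned medians.
        # New dictionaries are returned; the input rows are not modified.
        return [
            {
                key: (self.medians_[key]
                      if key in self.medians_ and value is None
                      else value)
                for key, value in r.items()
            }
            for r in rows
        ]


train = [{"age": 20}, {"age": None}, {"age": 40}, {"age": 30}]
test = [{"age": None}, {"age": 55}]

imputer = MedianImputerSketch(variables=["age"]).fit(train)
test_imputed = imputer.transform(test)
```

Learning the median on the training rows only, and reusing it on the test rows, is what keeps information from the test set out of the transformation.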

Imputing Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddNaNBinaryImputer
  • CategoricalVariableImputer
  • FrequentCategoryImputer
  • ArbitraryNumberImputer
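As a rough guide to what several of these imputation techniques do, here is a minimal plain-Python sketch. It illustrates the general ideas only and is not the library's implementation; the data, the variable names and the choice of mean + 3 * std for the tail value are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, stdev

# Toy data; None marks a missing value.
ages = [22.0, None, 35.0, 41.0, None, 28.0]
cabins = ["C", None, "B", "C", None, "C"]

observed_ages = [a for a in ages if a is not None]

# Missing indicator: a binary variable flagging where the value was missing.
age_na = [int(a is None) for a in ages]

# End-of-tail imputation: fill with a value at the far end of the distribution.
# mean + 3 * std is one common choice and is an assumption of this sketch.
tail_value = mean(observed_ages) + 3 * stdev(observed_ages)
ages_end_tail = [tail_value if a is None else a for a in ages]

# Arbitrary-number imputation: fill with a value chosen by the user.
ages_arbitrary = [-1.0 if a is None else a for a in ages]

# Frequent-category imputation: fill with the most common observed category.
most_frequent = Counter(c for c in cabins if c is not None).most_common(1)[0][0]
cabins_frequent = [most_frequent if c is None else c for c in cabins]
```

In practice the fill values would be learned from the training set in fit() and then applied to new data in transform().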

Encoding Methods

  • CountFrequencyCategoricalEncoder
  • OrdinalCategoricalEncoder
  • MeanCategoricalEncoder
  • WoERatioCategoricalEncoder
  • OneHotCategoricalEncoder
  • RareLabelCategoricalEncoder
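The following plain-Python sketch shows the general idea behind three of these encodings: frequency encoding, mean (target) encoding and rare-label grouping. It is an illustration under assumptions, not the library's code: the toy data, the "Rare" label and the 0.2 threshold are invented for the example.

```python
from collections import Counter, defaultdict

# Toy categorical variable and binary target.
colours = ["red", "blue", "red", "green", "red",
           "blue", "pink", "red", "blue", "red"]
target = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1]

n = len(colours)
counts = Counter(colours)

# Frequency encoding: replace each category with its share of observations.
frequency_map = {cat: count / n for cat, count in counts.items()}
colours_frequency = [frequency_map[c] for c in colours]

# Mean encoding: replace each category with the mean of the target within it.
target_sums = defaultdict(float)
for colour, t in zip(colours, target):
    target_sums[colour] += t
mean_map = {cat: target_sums[cat] / counts[cat] for cat in counts}
colours_mean = [mean_map[c] for c in colours]

# Rare-label grouping: categories below a frequency threshold share one label.
tol = 0.2  # assumed threshold for this sketch
colours_rare = [c if frequency_map[c] >= tol else "Rare" for c in colours]
```

The mappings (frequency_map, mean_map and the set of frequent categories) are the parameters that a fit() step would learn from the training data.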

Outlier Handling Methods

  • Winsorizer
  • ArbitraryOutlierCapper
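Both approaches cap extreme values rather than dropping rows. A minimal sketch follows, assuming the inter-quartile range (IQR) proximity rule for the learned caps; other rules, such as mean plus or minus a multiple of the standard deviation, are equally possible. The helper names fit_iqr_caps and cap are hypothetical, and statistics.quantiles requires Python 3.8 or later.

```python
from statistics import quantiles

fares = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]


def fit_iqr_caps(values, fold=1.5):
    # Learn lower and upper caps from the data using the IQR proximity rule.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - fold * iqr, q3 + fold * iqr


def cap(values, lower, upper):
    # Clip every value to the interval [lower, upper].
    return [min(max(v, lower), upper) for v in values]


# Winsorization-style capping: the limits are learned from the data.
lower, upper = fit_iqr_caps(fares)
fares_winsorized = cap(fares, lower, upper)

# Arbitrary capping: the limits are supplied by the user.
fares_arbitrary = cap(fares, lower=0, upper=50)
```

Only the value 95 falls outside the learned limits in this toy data, so it is the only value that changes.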

Discretisation Methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
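The difference between equal-width and equal-frequency discretisation can be seen in a short plain-Python sketch. This is an illustration of the two techniques, not the library's implementation; the data, the number of bins and the helper name discretise are assumptions. Decision-tree discretisation is not covered here.

```python
from bisect import bisect_left
from statistics import quantiles

ages = [1, 4, 8, 15, 22, 23, 25, 31, 47, 80]
bins = 4

# Equal width: split the range [min, max] into intervals of the same size.
low, high = min(ages), max(ages)
width = (high - low) / bins
width_edges = [low + i * width for i in range(1, bins)]  # interior edges only

# Equal frequency: use quantiles as edges so that each interval holds
# roughly the same number of observations.
frequency_edges = quantiles(ages, n=bins)


def discretise(values, edges):
    # Bin index = number of interior edges strictly below the value,
    # which gives right-closed intervals.
    return [bisect_left(edges, v) for v in values]


ages_equal_width = discretise(ages, width_edges)
ages_equal_frequency = discretise(ages, frequency_edges)
```

With skewed data such as this, equal-width bins leave most observations in the lowest intervals, whereas equal-frequency bins spread them evenly.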

Variable Transformation Methods

  • LogTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer
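For reference, the mathematical transformations behind these names can be written in a few lines of plain Python. This is a sketch of the standard formulas, not the library's code. The lambda values are passed in by hand here, whereas in practice they are typically estimated from the data. The function names box_cox and yeo_johnson are hypothetical.

```python
import math

incomes = [1.0, 10.0, 100.0, 1000.0]

# Logarithmic, reciprocal and power transformations (positive values only
# for the logarithm, non-zero values only for the reciprocal).
incomes_log = [math.log(x) for x in incomes]
incomes_reciprocal = [1 / x for x in incomes]
incomes_power = [x ** 0.5 for x in incomes]  # exponent 0.5 chosen for illustration


def box_cox(x, lmbda):
    # Box-Cox is defined for strictly positive values.
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lmbda == 0 else (x ** lmbda - 1) / lmbda


def yeo_johnson(x, lmbda):
    # Yeo-Johnson extends Box-Cox to zero and negative values.
    if x >= 0:
        return math.log1p(x) if lmbda == 0 else ((x + 1) ** lmbda - 1) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -(((1 - x) ** (2 - lmbda)) - 1) / (2 - lmbda)


incomes_box_cox = [box_cox(x, 0.5) for x in incomes]
balances_yeo_johnson = [yeo_johnson(x, 0.5) for x in [-3.0, 0.0, 3.0]]
```

Box-Cox with a lambda of 0 reduces to the logarithm, and Yeo-Johnson with a lambda of 1 leaves the values unchanged.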

Installing

pip install feature_engine

or

git clone https://github.com/solegalli/feature_engine.git

Usage

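from feature_engine.categorical_encoders import RareLabelCategoricalEncoder

# data is assumed to be a pandas DataFrame containing the columns 'Cabin' and 'Age'.
# tol is the frequency threshold below which a category is treated as rare.
rare_encoder = RareLabelCategoricalEncoder(tol=0.05, n_categories=5)
# Learn the frequent categories of each variable. Depending on the installed
# version, variables may need to be passed to the constructor instead of fit().
rare_encoder.fit(data, variables=['Cabin', 'Age'])
# Replace the infrequent categories in the selected variables.
data_encoded = rare_encoder.transform(data)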

See more usage examples in the Jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io

License

BSD 3-Clause

Authors

References

Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.

To learn more about the rationale, functionality, and pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering Online Course.

For a summary of the methods, check this presentation and this article.

To stay informed about the latest releases, sign up at trainindata.
