Skip to main content

Feature engineering package that follows sklearn functionality

Project description

Feature Engine

Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn like functionality with fit() and transform() methods to first learn the transforming paramenters from data and then transform the data. Current Feature-engine's transformers include functionality for:

  • Missing data imputation
  • Categorical variable encoding
  • Outlier removal
  • Discretisation
  • Numerical Variable Transformation

Important Links

Imputing Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddNaNBinaryImputer
  • CategoricalVariableImputer
  • FrequentCategoryImputer
  • ArbitraryNumberImputer

Encoding Methods

  • CountFrequencyCategoricalEncoder
  • OrdinalCategoricalEncoder
  • MeanCategoricalEncoder
  • WoERatioCategoricalEncoder
  • OneHotCategoricalEncoder
  • RareLabelCategoricalEncoder

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser

Variable Transformation methods

  • LogTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Installing

pip install feature_engine

or

git clone https://github.com/solegalli/feature_engine.git

Usage

from feature_engine.categorical_encoders import RareLabelEncoder

rare_encoder = RareLabelEncoder(tol = 0.05, n_categories=5)
rare_encoder.fit(data, variables = ['Cabin', 'Age'])
data_encoded = rare_encoder.transform(data)

See more usage examples in the jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io

License

BSD 3-Clause

Authors

References

Many of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.

To learn more about the rationale, functionality, pros and cos of each imputer, encoder and transformer, refer to the Feature Engineering Online Course

For a summary of the methods check this presentation and this article

To stay alert of latest releases, sign up at trainindata

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for feature-engine, version 0.3.1
Filename, size File type Python version Upload date Hashes
Filename, size feature_engine-0.3.1-py3-none-any.whl (23.0 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size feature_engine-0.3.1.tar.gz (18.0 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page