Feature engineering package with Scikit-learn's fit transform functionality

Project description

Feature Engine

  • License: https://github.com/feature-engine/feature_engine/blob/master/LICENSE.md
  • Conda: https://anaconda.org/conda-forge/feature_engine
  • CircleCI: https://app.circleci.com/pipelines/github/feature-engine/feature_engine
  • Documentation: https://feature-engine.readthedocs.io/en/latest/index.html
  • Chat: https://gitter.im/feature_engine/community
  • Sponsorship: https://www.trainindata.com/

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Its transformers follow the Scikit-learn API: fit() learns the transformation parameters from the data, and transform() applies them to the data.
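
The fit() and transform() pattern can be illustrated with a minimal, library-independent sketch. The class MedianImputerSketch below is hypothetical and is not part of Feature-engine. It only assumes that "learning the parameters" means storing a statistic (here, the median) during fit() and reusing it in transform().

```python
from statistics import median


class MedianImputerSketch:
    """Hypothetical transformer illustrating the fit()/transform() pattern.

    Not part of Feature-engine; it works on plain lists of numbers in which
    None marks a missing value.
    """

    def fit(self, values):
        # Learn the parameter (the median) from the observed values only.
        observed = [v for v in values if v is not None]
        self.median_ = median(observed)
        return self  # returning self allows chaining, as in Scikit-learn

    def transform(self, values):
        # Apply the learned parameter; the input list is left unmodified.
        return [self.median_ if v is None else v for v in values]


train = [1.0, 2.0, None, 4.0]
test = [None, 10.0]

imputer = MedianImputerSketch().fit(train)
train_imputed = imputer.transform(train)
test_imputed = imputer.transform(test)
```

The median learned from train is reused when transforming test, which is the reason for keeping fit() and transform() as separate steps.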

Feature-engine features in the following resources

Blogs about Feature-engine

En Español

Documentation

Feature-engine's transformers currently include functionality for:

  • Missing Data Imputation
  • Categorical Encoding
  • Discretisation
  • Outlier Capping or Removal
  • Variable Transformation
  • Variable Creation
  • Variable Selection
  • Datetime Features
  • Time Series
  • Preprocessing
  • Scikit-learn Wrappers

Imputation Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddMissingIndicator
  • CategoricalImputer
  • ArbitraryNumberImputer
  • DropMissingData

Encoding Methods

  • OneHotEncoder
  • OrdinalEncoder
  • CountFrequencyEncoder
  • MeanEncoder
  • WoEEncoder
  • PRatioEncoder
  • RareLabelEncoder
  • DecisionTreeEncoder

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
  • ArbitraryDiscretiser

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Variable Transformation methods

  • LogTransformer
  • LogCpTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Variable Creation:

  • MathFeatures
  • RelativeFeatures
  • CyclicalFeatures

Feature Selection:

  • DropFeatures
  • DropConstantFeatures
  • DropDuplicateFeatures
  • DropCorrelatedFeatures
  • SmartCorrelationSelection
  • ShuffleFeaturesSelector
  • SelectBySingleFeaturePerformance
  • SelectByTargetMeanPerformance
  • RecursiveFeatureElimination
  • RecursiveFeatureAddition
  • DropHighPSIFeatures

Datetime

  • DatetimeFeatures

Time Series

  • LagFeatures
  • WindowFeatures
  • ExpandingWindowFeatures

Preprocessing

  • MatchVariables

Wrappers:

  • SklearnTransformerWrapper

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git
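
After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata. This is a minimal sketch using only the standard library (Python 3.8 or later). The helper name installed_version is an assumption made for illustration. A plain git clone does not register the package until it is installed, for example with pip install -e . from the repo folder.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if not installed.

    Hypothetical helper for illustration; not part of Feature-engine.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("feature_engine")
print("feature_engine:", found if found is not None else "not installed")
```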

Example Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> # Toy dataset: 'A' and 'B' are frequent, 'C' and 'D' are infrequent
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> # Group categories present in fewer than 10% of the rows under the label 'Rare'
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64

Find more examples in our Jupyter Notebook Gallery or in the documentation.
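
To make the grouping in the example above concrete, here is a minimal pure-Python sketch of the idea. It is not Feature-engine's implementation. The function group_rare_labels is hypothetical, it assumes that a label is kept when its relative frequency is at least tol, and it ignores n_categories and the encoder's other options.

```python
from collections import Counter


def group_rare_labels(labels, tol, replace_with="Rare"):
    """Replace labels whose relative frequency is below `tol`.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    total = len(labels)
    # Labels at or above the threshold are treated as frequent (assumption).
    frequent = {label for label, n in counts.items() if n / total >= tol}
    return [label if label in frequent else replace_with for label in labels]


labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
encoded = group_rare_labels(labels, tol=0.10)
print(Counter(encoded))
```

With 23 rows, 'C' (2/23) and 'D' (1/23) both fall below 0.10, so the sketch merges them into a single 'Rare' group of 3, matching the counts in the example.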

Contribute

Details on how to contribute can be found on the Contribute page.

Briefly:

  • Fork the repo
  • Clone your fork to your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • Navigate into the repo folder: cd feature_engine
  • Optional: Create and activate a virtual environment with any tool of choice
  • Install Feature-engine as a developer: pip install -e .
  • Install Feature-engine dependencies: pip install -r requirements.txt and pip install -r test_requirements.txt
  • Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation, first install its dependencies from the root directory: pip install -r docs/requirements.txt

Now you can build the docs using: sphinx-build -b html docs build

License

BSD 3-Clause

Sponsor us

Sponsor us and further support our mission to democratize machine learning and programming tools through open-source software.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feature_engine-1.3.0.tar.gz (150.6 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

feature_engine-1.3.0-py2.py3-none-any.whl (260.6 kB)

Uploaded Python 2, Python 3

File details

Details for the file feature_engine-1.3.0.tar.gz.

File metadata

  • Download URL: feature_engine-1.3.0.tar.gz
  • Upload date:
  • Size: 150.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.0

File hashes

Hashes for feature_engine-1.3.0.tar.gz
Algorithm Hash digest
SHA256 992fb3c615cd2e86dd4555da0a9edc2a6d7fa2e72b14b6e2be2c65c84921a76d
MD5 5562ad30ec517ceb61e5042c2302bc9e
BLAKE2b-256 28413cc065491ac2a652f22736acaab348817357ac7bd0c2e88c51a7c27686f1
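
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_expected are assumptions made for illustration, and the file is assumed to have been downloaded already.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, with feature_engine-1.3.0.tar.gz in the current directory, matches_expected("feature_engine-1.3.0.tar.gz", "992fb3c615cd2e86dd4555da0a9edc2a6d7fa2e72b14b6e2be2c65c84921a76d") should return True if the download is intact.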

File details

Details for the file feature_engine-1.3.0-py2.py3-none-any.whl.

File hashes

Hashes for feature_engine-1.3.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b906cf3f96b9b3a8e3dd98b6a01329f48f64c359dfb0f9bed132cc504c0007b8
MD5 b147cfa109cfb9127e34708edd3972b6
BLAKE2b-256 6f1c686d6aafe44f8bed3cb140b5d130ea0a2557acbab24b01564254fffd638e
