Skip to main content

Feature engineering package with Scikit-learn's fit transform functionality

Project description

Feature Engine

Python 3.6 Python 3.7 Python 3.8 License CircleCI Documentation Status

Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.

Feature-engine features in the following resources:

Blogs about Feature-engine:

Documentation

En Español:

More resources will be added as they appear online!

Current Feature-engine's transformers include functionality for:

  • Missing Data Imputation
  • Categorical Variable Encoding
  • Outlier Capping or Removal
  • Discretisation
  • Numerical Variable Transformation
  • Scikit-learn Wrappers
  • Variable Combination
  • Variable Selection

Imputing Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddNaNBinaryImputer
  • CategoricalImputer
  • ArbitraryNumberImputer

Encoding Methods

  • OneHotEncoder
  • OrdinalEncoder
  • CountFrequencyEncoder
  • MeanEncoder
  • WoEEncoder
  • PRatioEncoder
  • RareLabelEncoder
  • DecisionTreeEncoder

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
  • UserInputDiscreriser

Variable Transformation methods

  • LogTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Scikit-learn Wrapper:

  • SklearnTransformerWrapper

Variable Combinations:

  • MathematicalCombination

Feature Selection:

  • DropFeatures
  • DropConstantFeatures
  • DropDuplicateFeatures
  • DropCorrelatedFeatures
  • SmartCorrelationSelection
  • ShuffleFeaturesSelector
  • SelectBySingleFeaturePerformance
  • SelectByTargetMeanPerformance
  • RecursiveFeatureElimination
  • RecursiveFeatureAddition

Installing

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/solegalli/feature_engine.git

Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
Out[1]:
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
Out[2]:
A       10
B       10
Rare     3
Name: var_A, dtype: int64

See more usage examples in the Jupyter Notebooks in the example folder of this repository, or in the documentation.

Contributing

Details about how to contribute can be found in the Contributing Page

In short:

Local Setup Steps

  • Fork the repo
  • Clone your fork into your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • cd into the repo cd feature_engine
  • Install as a developer: pip install -e .
  • Create and activate a virtual environment with any tool of choice
  • Install the dependencies as explained in the Contributing Page
  • Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Opening Pull Requests

PR's are welcome! Please make sure the CI tests pass on your branch.

Tests

We prefer tox. In your environment:

  • Run pip install tox
  • cd into the root directory of the repo: cd feature_engine
  • Run tox

If the tests pass, the code is functional.

You can also run the tests in your environment (without tox). For guidelines on how to do so, check the Contributing Page.

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation make sure you have the dependencies installed. From the root directory: pip install -r docs/requirements.txt.

Now you can build the docs: sphinx-build -b html docs build

License

BSD 3-Clause

References

Many of the engineering and encoding functionalities are inspired by this series of articles from the 2009 KDD Competition.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feature_engine-1.0.1.tar.gz (71.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

feature_engine-1.0.1-py2.py3-none-any.whl (145.1 kB view details)

Uploaded Python 2Python 3

File details

Details for the file feature_engine-1.0.1.tar.gz.

File metadata

  • Download URL: feature_engine-1.0.1.tar.gz
  • Upload date:
  • Size: 71.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.2

File hashes

Hashes for feature_engine-1.0.1.tar.gz
Algorithm Hash digest
SHA256 d7d7263e84248b3462c5fbbe8423ad92a6e03474bdcb2e99eafb163e0deae147
MD5 ed84dd652009f99cd3382df2843a8080
BLAKE2b-256 f1d642ea90f47ca12ba34ebe5859f79902f979ad63439b9fadd0a6f1c97afa53

See more details on using hashes here.

File details

Details for the file feature_engine-1.0.1-py2.py3-none-any.whl.

File metadata

  • Download URL: feature_engine-1.0.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 145.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.2

File hashes

Hashes for feature_engine-1.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 37d81d0b8c81aa14d55cdfb8297d2fe550e580a6f1ecc8a11256c751f2ff214b
MD5 2cfd7c5ec5705e9c7cce4057efd67a6a
BLAKE2b-256 6f94c607f34727b1474e12f2282187d01288c1b4370eefe0bad70a5deb6e658e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page