Skip to main content

Feature engineering package with Scikit-learn's fit transform functionality

Project description

Feature Engine

Python 3.6 Python 3.7 Python 3.8 License CircleCI Documentation Status

Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.

Feature-engine features in the following resources:

Blogs about Feature-engine:

Documentation

En Español:

More resources will be added as they appear online!

Current Feature-engine's transformers include functionality for:

  • Missing Data Imputation
  • Categorical Variable Encoding
  • Outlier Capping or Removal
  • Discretisation
  • Numerical Variable Transformation
  • Scikit-learn Wrappers
  • Variables Combination
  • Variable Selection

Imputing Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddNaNBinaryImputer
  • CategoricalVariableImputer
  • FrequentCategoryImputer
  • ArbitraryNumberImputer

Encoding Methods

  • CountFrequencyCategoricalEncoder
  • OrdinalCategoricalEncoder
  • MeanCategoricalEncoder
  • WoERatioCategoricalEncoder
  • OneHotCategoricalEncoder
  • RareLabelCategoricalEncoder

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
  • UserInputDiscreriser

Variable Transformation methods

  • LogTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Scikit-learn Wrapper:

  • SklearnTransformerWrapper

Variable Combinations:

  • MathematicalCombinator

Feature Selection:

  • DropFeatures

Installing

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/solegalli/feature_engine.git

Usage

>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> import pandas as pd

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
Out[1]:
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
Out[2]:
A       10
B       10
Rare     3
Name: var_A, dtype: int64

See more usage examples in the Jupyter Notebooks in the example folder of this repository, or in the documentation.

Contributing

Details about how to contribute can be found in the Contributing Page

In short:

Local Setup Steps

  • Fork the repo
  • Clone your fork into your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • cd into the repo cd feature_engine
  • Install as a developer: pip install -e .
  • Create and activate a virtual environment with any tool of choice
  • Install the dependencies as explained in the Contributing Page
  • Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Opening Pull Requests

PR's are welcome! Please make sure the CI tests pass on your branch.

Tests

We prefer tox. In your environment:

  • Run pip install tox
  • cd into the root directory of the repo: cd feature_engine
  • Run tox

If the tests pass, the code is functional.

You can also run the tests in your environment (without tox). For guidelines on how to do so, check the Contributing Page.

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation make sure you have the dependencies installed. From the root directory: pip install -r docs/requirements.txt.

Now you can build the docs: sphinx-build -b html docs build

License

BSD 3-Clause

References

Many of the engineering and encoding functionalities are inspired by this series of articles from the 2009 KDD Competition.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feature_engine-0.6.1.tar.gz (29.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

feature_engine-0.6.1-py2.py3-none-any.whl (34.9 kB view details)

Uploaded Python 2Python 3

File details

Details for the file feature_engine-0.6.1.tar.gz.

File metadata

  • Download URL: feature_engine-0.6.1.tar.gz
  • Upload date:
  • Size: 29.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.2

File hashes

Hashes for feature_engine-0.6.1.tar.gz
Algorithm Hash digest
SHA256 165a002e3a6e9141b6b406946ee1b4f46a37e42176870731419b41e07249e1bb
MD5 32ade87c8da221015cf4ddaaf2168838
BLAKE2b-256 e565904b53499e7c5859fd84a52012d2c7a91dd850076b72a0cac39380fb0be9

See more details on using hashes here.

File details

Details for the file feature_engine-0.6.1-py2.py3-none-any.whl.

File metadata

  • Download URL: feature_engine-0.6.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 34.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.2

File hashes

Hashes for feature_engine-0.6.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 52e54860c0ee7e256a8db35d2a9c03bdd6f8dfcdde377cd4f72a3545daad3b18
MD5 ba529c17f2beb0bdbef8b4c4a7fc62f0
BLAKE2b-256 14ed5680bf401855b788f79cadc1298c210c5860eb5d54c4008cfa234b752ef1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page