Feature Engine
Feature engineering package with Scikit-learn's fit/transform functionality
Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's conventions, with fit() and transform() methods that first learn the transformation parameters from the data and then transform it.
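The fit/transform contract can be illustrated with a minimal, self-contained imputer. This is a hypothetical sketch of the convention, not Feature-engine's own implementation (Feature-engine's transformers operate on pandas DataFrames):

```python
from statistics import median

class MedianImputer:
    """Toy transformer following the Scikit-learn convention:
    fit() learns a parameter from the data (here, the median of
    the non-missing values), transform() applies it."""

    def fit(self, values):
        # learn the median, ignoring missing entries
        self.median_ = median(v for v in values if v is not None)
        return self

    def transform(self, values):
        # replace missing entries with the learned median
        return [self.median_ if v is None else v for v in values]

imputer = MedianImputer().fit([1, 2, None, 4])
print(imputer.transform([None, 10]))  # [2, 10]
```

Because the parameter is learned once during fit() and reused in transform(), the same statistic computed on the training data is applied consistently to any new data.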
Feature-engine is featured in the following resources:
Blogs about Feature-engine:
- Feature-engine: A new open source Python package for feature engineering
- Open Source Python libraries for Feature Engineering: Comparisons and Walkthroughs
Documentation
- Documentation: http://feature-engine.readthedocs.io
- Home page: https://www.trainindata.com/feature-engine
Feature-engine's current transformers include functionality for:
- Missing data imputation
- Categorical variable encoding
- Outlier removal
- Discretisation
- Numerical variable transformation
Imputing Methods
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddNaNBinaryImputer
- CategoricalVariableImputer
- FrequentCategoryImputer
- ArbitraryNumberImputer
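The idea behind end-of-tail imputation (as in EndTailImputer) can be sketched with plain Python: missing values are replaced with a value from the far end of the variable's distribution, for example the mean plus a multiple of the standard deviation. This is a hypothetical illustration, not Feature-engine's implementation:

```python
from statistics import mean, stdev

def end_tail_fill(values, factor=3):
    """Sketch of end-of-tail imputation: replace missing values with
    mean + factor * std, pushing them into the distribution's tail."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) + factor * stdev(observed)
    return [fill if v is None else v for v in values]

print(end_tail_fill([1, 2, 3, None]))  # [1, 2, 3, 5.0]
```

Placing imputed values in the tail flags them as "different" to the model, which can be useful when data is not missing at random.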
Encoding Methods
- CountFrequencyCategoricalEncoder
- OrdinalCategoricalEncoder
- MeanCategoricalEncoder
- WoERatioCategoricalEncoder
- OneHotCategoricalEncoder
- RareLabelCategoricalEncoder
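Count encoding (the idea behind CountFrequencyCategoricalEncoder) replaces each category with how often it appears in the training data. A hypothetical sketch of the technique, not Feature-engine's implementation:

```python
from collections import Counter

def count_encode(categories):
    """Sketch of count encoding: replace each category with the
    number of times it appears in the training data."""
    counts = Counter(categories)
    return [counts[c] for c in categories]

print(count_encode(['A', 'B', 'A', 'C', 'A']))  # [3, 1, 3, 1, 3]
```

In practice the counts are learned during fit() on the training set and then reused to encode unseen data.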
Outlier Handling Methods
- Winsorizer
- ArbitraryOutlierCapper
- OutlierTrimmer
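Capping (winsorization), the idea behind the Winsorizer and ArbitraryOutlierCapper, clips extreme values to chosen bounds instead of removing the rows. In a minimal sketch the bounds are passed in directly; Feature-engine's Winsorizer instead learns them from the data during fit() (e.g. from percentiles or IQR-based rules):

```python
def winsorize(values, lower, upper):
    """Sketch of capping (winsorization): values beyond the given
    bounds are clipped to the bounds rather than removed."""
    return [min(max(v, lower), upper) for v in values]

print(winsorize([-10, 2, 5, 99], lower=0, upper=10))  # [0, 2, 5, 10]
```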
Discretisation Methods
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- DecisionTreeDiscretiser
- UserInputDiscretiser
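Equal-width discretisation (as in EqualWidthDiscretiser) splits a variable's range into intervals of equal size and maps each value to its interval index. A hypothetical sketch of the technique, not Feature-engine's implementation:

```python
def equal_width_bins(values, n_bins):
    """Sketch of equal-width discretisation: split the value range
    into n_bins intervals of equal size and return each value's bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # the maximum value falls exactly on the upper edge, so clamp
    # it into the last bin
    return [min(int((v - lo) // width), n_bins - 1) for v in values]

print(equal_width_bins([0, 2.5, 5, 7.5, 10], n_bins=4))  # [0, 1, 2, 3, 3]
```

Equal-frequency discretisation works analogously but places the interval boundaries at quantiles, so each bin holds roughly the same number of observations.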
Variable Transformation Methods
- LogTransformer
- ReciprocalTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
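The Box-Cox transform underlying the BoxCoxTransformer is defined as (x**λ − 1) / λ for λ ≠ 0 and log(x) for λ = 0, and is only valid for strictly positive values. A minimal single-value sketch of the formula (Feature-engine applies it column-wise and estimates λ from the data):

```python
import math

def box_cox(x, lam):
    """Sketch of the Box-Cox transform for one positive value:
    (x**lam - 1) / lam when lam != 0, and log(x) when lam == 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

print(box_cox(10, 0))  # natural log of 10, about 2.3026
print(box_cox(10, 1))  # 9.0
```

The Yeo-Johnson transform extends this idea to zero and negative values, which is why YeoJohnsonTransformer exists alongside BoxCoxTransformer.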
Scikit-learn Wrapper:
- SklearnTransformerWrapper
Installing
pip install feature_engine
or
git clone https://github.com/solegalli/feature_engine.git
Usage
>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> import pandas as pd
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
Out[1]:
A 10
B 10
C 2
D 1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
Out[2]:
A 10
B 10
Rare 3
Name: var_A, dtype: int64
See more usage examples in the jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io
Contributing
Local Setup Steps
- Clone the repo and cd into it
- Run pip install tox
- Run tox
- If the tests pass, your local setup is complete
Opening Pull Requests
PRs are welcome! Please make sure the CI tests pass on your branch.
License
BSD 3-Clause
Authors
- Soledad Galli - Initial work - Feature Engineering for Machine Learning, Online Course.
References
Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.
To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering for Machine Learning, Online Course.
For a summary of the methods, check this presentation and this article.
To be notified of the latest releases, sign up at trainindata.
Hashes for feature_engine-0.5.2-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 49e7eaf9d12893d3d1bf843b2c38271b21ccdeed06b71ade508d3160aa93f135
MD5 | a2117f178470ede2ec5b9a0765cbde1d
BLAKE2b-256 | 793367a2d0c0e91f786b33b6cd29f5c217343ccda63f1304e1f6cac069e05f40