Feature engineering and selection package with Scikit-learn's fit transform functionality

Project description

Feature-engine

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Its transformers follow Scikit-learn's API: fit() learns the transformation parameters from the data, and transform() applies them to the data.

Psst! How did you find us?

We want to share Feature-engine with more people. It'd help us loads if you tell us how you discovered us.

Then we'd know what we are doing right and which channels to use to share the love.

Please share your story by answering one quick question at this link. 😃

Feature-engine's transformers currently include functionality for:

  • Missing Data Imputation
  • Categorical Encoding
  • Discretisation
  • Outlier Capping or Removal
  • Variable Transformation
  • Variable Creation
  • Variable Selection
  • Datetime Features
  • Text Features
  • Time Series
  • Preprocessing
  • Scaling
  • Scikit-learn Wrappers

Imputation Methods

  • MeanMedianImputer
  • ArbitraryNumberImputer
  • RandomSampleImputer
  • EndTailImputer
  • CategoricalImputer
  • AddMissingIndicator
  • DropMissingData

Encoding Methods

  • OneHotEncoder
  • OrdinalEncoder
  • CountFrequencyEncoder
  • MeanEncoder
  • WoEEncoder
  • RareLabelEncoder
  • DecisionTreeEncoder
  • StringSimilarityEncoder

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • GeometricWidthDiscretiser
  • DecisionTreeDiscretiser
  • ArbitraryDiscretiser

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Variable Transformation methods

  • LogTransformer
  • LogCpTransformer
  • ReciprocalTransformer
  • ArcsinTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer
  • ArcSinhTransformer

Variable Scaling methods

  • MeanNormalizationScaler

Variable Creation:

  • MathFeatures
  • RelativeFeatures
  • CyclicalFeatures
  • DecisionTreeFeatures
  • GeoDistanceFeatures

Feature Selection:

  • DropFeatures
  • DropConstantFeatures
  • DropDuplicateFeatures
  • DropCorrelatedFeatures
  • SmartCorrelationSelection
  • ShuffleFeaturesSelector
  • SelectBySingleFeaturePerformance
  • SelectByTargetMeanPerformance
  • RecursiveFeatureElimination
  • RecursiveFeatureAddition
  • DropHighPSIFeatures
  • SelectByInformationValue
  • ProbeFeatureSelection
  • MRMR

Datetime

  • DatetimeFeatures
  • DatetimeSubtraction
  • DatetimeOrdinal

Text Features

  • TextFeatures

Time Series

  • LagFeatures
  • WindowFeatures
  • ExpandingWindowFeatures

Pipelines

  • Pipeline
  • make_pipeline

Preprocessing

  • MatchCategories
  • MatchVariables

Wrappers:

  • SklearnTransformerWrapper

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64

Find more examples in our Jupyter Notebook Gallery or in the documentation.

Contribute

Details about how to contribute can be found on the Contribute page.

Briefly:

  • Fork the repo
  • Clone your fork to your local computer:
git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • Navigate into the repo folder:
cd feature_engine
  • Install Feature-engine as a developer:
pip install -e .
  • Optional: Create and activate a virtual environment with any tool of choice
  • Install Feature-engine developer dependencies:
pip install -e ".[tests]"
  • Create a feature branch with a meaningful name for your feature:
git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation, first make sure the dependencies are installed. From the root directory, run:

pip install -r docs/requirements.txt

Now you can build the docs using:

sphinx-build -b html docs build

License

The content of this repository is licensed under a BSD 3-Clause license.

Sponsor

Feature-engine is made possible with the support of Train in Data.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feature_engine-1.9.4.tar.gz (150.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

feature_engine-1.9.4-py3-none-any.whl (243.5 kB view details)

Uploaded Python 3

File details

Details for the file feature_engine-1.9.4.tar.gz.

File metadata

  • Download URL: feature_engine-1.9.4.tar.gz
  • Upload date:
  • Size: 150.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for feature_engine-1.9.4.tar.gz
Algorithm Hash digest
SHA256 4ee817b9a557f9670636deb417689645e54fc2391c2f14fbe0ef4f185fd1ab17
MD5 99b3833a43d14cdca41d1731292507b9
BLAKE2b-256 2b9d802c7c4a3f1b8a988e7370b377f426eba7e5ed51a42019973a8063ea5532

See more details on using hashes here.

File details

Details for the file feature_engine-1.9.4-py3-none-any.whl.

File metadata

  • Download URL: feature_engine-1.9.4-py3-none-any.whl
  • Upload date:
  • Size: 243.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for feature_engine-1.9.4-py3-none-any.whl
Algorithm Hash digest
SHA256 a1181dcb25fc906cc74dd4f2947c49a71012a727d30a4505f2418419c83ace9e
MD5 cbe733ffa9b033f1be16980d24c6b6df
BLAKE2b-256 5c92698f823c1e0a4affe75d8dc6685639b82c65241bdda1f04f6a69fd162119

See more details on using hashes here.
