Feature Selection and Elimination

Credito Emiliano - Feature Selection, Transformation and Elimination (CE - FeSTE)

This repo contains the 'FeSTE' Python package, which helps manage features from pre-filtering through pre-processing to feature elimination.

Installation

To install it:

  1. Optional: create a new Python virtual environment (in a bash terminal, run "py -m venv your_env_name" and then "source your_env_name/Scripts/activate"; on Linux or macOS the activation script is "your_env_name/bin/activate")
  2. Install the package:
    • User Mode: pip install cefeste

Structure

The Python package is stored in src and contains three sub-modules:

  • selection: contains the preliminary feature selection functions
  • transform: contains the feature pre-processing functions
  • elimination: contains the feature elimination functions

Filters

Selection

The main class of this module is FeatureSelection. It applies several filters, which fall into the following groups:

  • Univariate filters:
    • Constant features
    • Number of distinct values too low
    • Number of missing values too high
    • Too concentrated in the most frequent value
    • Unstable between sets
  • Multivariate filters:
    • Spearman Correlation for numerical features
    • Cramer's V for categorical features
    • R2 for mixed features
    • VIF
  • Explanatory filters:
    • Feature AUROC for classification
    • Feature Correlation with target for regression
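
To make a few of the filters above concrete, here is a minimal sketch written in plain pandas. It does not use the cefeste API: the function names, thresholds and reason strings are all hypothetical, and it covers only the constant, missing-value and concentration checks plus a Spearman correlation check. The Spearman step relies on the fact that Spearman's rho is the Pearson correlation computed on ranks.

```python
import numpy as np
import pandas as pd


def univariate_rejects(df, max_missing=0.9, max_concentration=0.95, min_unique=2):
    """Return {column: reason} for columns failing a simple univariate check.

    Hypothetical helper for illustration; not part of cefeste.
    """
    rejects = {}
    for col in df.columns:
        s = df[col]
        non_null = s.dropna()
        if s.isna().mean() > max_missing:
            rejects[col] = "too many missing values"
        elif non_null.nunique() < min_unique:
            rejects[col] = "constant or too few distinct values"
        elif non_null.value_counts(normalize=True).iloc[0] > max_concentration:
            rejects[col] = "too concentrated in the most frequent value"
    return rejects


def spearman_rejects(df, threshold=0.95):
    """Greedily drop the later column of each numeric pair whose
    absolute Spearman correlation exceeds the threshold."""
    num = df.select_dtypes(include="number")
    # Spearman's rho equals Pearson's correlation computed on ranks.
    rho = num.rank().corr().abs()
    cols = list(rho.columns)
    dropped = []
    for i, a in enumerate(cols):
        if a in dropped:
            continue
        for b in cols[i + 1:]:
            if b not in dropped and rho.loc[a, b] > threshold:
                dropped.append(b)
    return dropped


# Toy data: one constant column, one mostly-missing column,
# and two columns that are monotonically related.
x = np.arange(10.0)
df = pd.DataFrame({
    "const": [1] * 10,
    "mostly_nan": [np.nan] * 9 + [1.0],
    "x": x,
    "x_cubed": x ** 3,
    "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
})

rejects = univariate_rejects(df, max_missing=0.5)
correlated = spearman_rejects(df.drop(columns=list(rejects)))
print(rejects)
print(correlated)
```

Running the univariate checks first keeps degenerate columns out of the correlation matrix, where a constant column would only produce undefined correlations.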

Transformation

This is a more technical module, containing 3 classes useful for building the production pipeline:

  • ColumnExtractor: to extract columns from a pd.DataFrame
  • ColumnRenamer: to rename columns and to transform a np.ndarray to a pd.DataFrame
  • Categorizer: to transform the dtype of pd.DataFrame columns from 'object' to 'category'
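
As an illustration of the kind of step such a class performs, the sketch below converts 'object' columns to 'category' using a fit/transform pair. The class ObjectToCategory is a hypothetical stand-in written for this example, not the package's Categorizer, whose actual signature may differ. Learning the categories at fit time is an assumption made here so that new data is encoded with the same category set as the training data.

```python
import pandas as pd


class ObjectToCategory:
    """Hypothetical stand-in (not cefeste's Categorizer): learns the
    categories of every object-dtype column and applies them later."""

    def fit(self, X, y=None):
        obj_cols = [c for c in X.columns if X[c].dtype == object]
        # Remember the categories seen during fitting, per column.
        self.categories_ = {c: pd.Index(X[c].dropna().unique()) for c in obj_cols}
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's frame untouched
        for col, cats in self.categories_.items():
            # Values not seen during fit become missing values.
            X[col] = pd.Categorical(X[col], categories=cats)
        return X


train = pd.DataFrame({
    "city": pd.Series(["Rome", "Milan", "Rome"], dtype=object),
    "age": [30, 41, 25],
})
new = pd.DataFrame({
    "city": pd.Series(["Milan", "Turin"], dtype=object),
    "age": [52, 38],
})

out = ObjectToCategory().fit(train).transform(new)
print(out.dtypes)
```

In this sketch "Turin" was never seen during fitting, so it is mapped to a missing value rather than silently creating a new category.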

Elimination

The main class of this module is FeatureElimination, which selects the most useful features to keep in the model while optimizing the hyperparameters at the same time.

It is a recursive method that, at each iteration, can:

  • Optimize the hyperparameters using a user-defined model, grid, grid-search method, and evaluation measure
  • Calculate the SHAP importance of each feature
  • Identify the least important feature(s) and delete them for the next iteration
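
The loop described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the FeatureElimination implementation: the model is a closed-form ridge regression, the "grid search" is a scan over a few alpha values scored on a validation split, and the importance is the mean absolute value of w_j * (x_j - mean(x_j)), which is the SHAP value of a linear model when features are treated as independent. All function names and parameters are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def recursive_elimination(X_tr, y_tr, X_va, y_va, names,
                          alphas=(0.01, 1.0, 100.0), min_features=1, step=1):
    """Hypothetical sketch: tune, rank features, drop the weakest, repeat."""
    keep = list(range(X_tr.shape[1]))
    history = []
    while True:
        # 1. Hyperparameter search: pick the alpha with the lowest validation MSE.
        best = None
        for alpha in alphas:
            w, b = fit_ridge(X_tr[:, keep], y_tr, alpha)
            mse = float(np.mean((X_va[:, keep] @ w + b - y_va) ** 2))
            if best is None or mse < best[0]:
                best = (mse, alpha, w)
        mse, alpha, w = best

        # 2. Importance: mean |w_j * (x_j - mean_j)| over the training rows.
        Xk = X_tr[:, keep]
        importance = np.abs(w * (Xk - Xk.mean(axis=0))).mean(axis=0)
        history.append({"features": [names[i] for i in keep],
                        "alpha": alpha, "mse": mse})
        if len(keep) <= min_features:
            break

        # 3. Drop the least important feature(s) before the next iteration.
        n_drop = min(step, len(keep) - min_features)
        drop_pos = set(np.argsort(importance)[:n_drop].tolist())
        keep = [idx for pos, idx in enumerate(keep) if pos not in drop_pos]
    return history


# Synthetic data: only x0 and x1 drive the target, x2 and x3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=300)
names = ["x0", "x1", "x2", "x3"]

history = recursive_elimination(X[:200], y[:200], X[200:], y[200:], names)
for step_info in history:
    print(step_info["features"], step_info["alpha"], round(step_info["mse"], 4))
```

Keeping the per-iteration history makes it possible to choose the final feature set afterwards, for example the smallest set whose validation error stays close to the best one.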

