
Feature Selection and Elimination

Project description

Credito Emiliano - Feature Selection, Transformation and Elimination (CE - FeSTE)

This repository contains the 'FeSTE' Python package, which supports feature management from pre-filtering through pre-processing to feature elimination.

Installation

To install it:

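  1. Optional: create a new Python virtual environment (from a bash terminal, run "py -m venv your_env_name" and then "source your_env_name/Scripts/activate"; these commands target Windows, while on Linux/macOS the usual equivalents are "python3 -m venv your_env_name" and "source your_env_name/bin/activate")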
  2. Install the package:
    • User Mode: pip install cefeste

Structure

The Python package is stored in src and contains three sub-modules:

  • selection: contains the preliminary feature selection functions
  • transform: contains the feature pre-processing functions
  • elimination: contains the feature elimination functions

Filters

Selection

The main class of this module is FeatureSelection. It applies several filters, which can be grouped as follows:

  • Univariate filters:
    • Constant features
    • Too few distinct values
    • Too many missing values
    • Too concentrated in the most frequent value
    • Unstable across sets
  • Multivariate filters:
    • Spearman correlation between numerical features
    • Cramér's V between categorical features
    • R² for mixed (numerical vs. categorical) feature pairs
    • VIF (variance inflation factor)
  • Explanatory filters:
    • Feature AUROC (for classification)
    • Feature correlation with the target (for regression)
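
To make the filter groups above more concrete, here is a minimal pandas/NumPy sketch of a few of them, assuming illustrative thresholds. The function names (univariate_drops, correlated_pairs, feature_auroc) and their defaults are hypothetical and are not part of cefeste's API; the stability, Cramér's V, R² and VIF filters are not covered.

```python
import numpy as np
import pandas as pd


def univariate_drops(df, min_distinct=3, max_missing=0.5, max_top_share=0.95):
    """Return {column: reason} for columns failing simple univariate checks.

    Thresholds are illustrative defaults (hypothetical, not cefeste's).
    """
    reasons = {}
    for col in df.columns:
        s = df[col]
        n_distinct = s.nunique(dropna=True)
        missing_share = s.isna().mean()
        if n_distinct <= 1:
            reasons[col] = "constant"
        elif n_distinct < min_distinct:
            reasons[col] = "too few distinct values"
        elif missing_share > max_missing:
            reasons[col] = "too many missing values"
        # share of the non-missing rows taken by the most frequent value
        elif s.value_counts(normalize=True).iloc[0] > max_top_share:
            reasons[col] = "too concentrated"
    return reasons


def correlated_pairs(df, threshold=0.9):
    """List (col_a, col_b, |rho|) for numeric pairs whose |Spearman rho| exceeds threshold."""
    corr = df.select_dtypes("number").corr(method="spearman").abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            rho = float(corr.iloc[i, j])
            if rho > threshold:
                pairs.append((cols[i], cols[j], rho))
    return pairs


def feature_auroc(x, y):
    """AUROC of one numeric feature used directly as a score for a binary target.

    Computed from average ranks (Mann-Whitney U); assumes no missing values.
    """
    ranks = pd.Series(x).rank(method="average").to_numpy()
    pos = np.asarray(y).astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))
```

A feature whose AUROC is well below 0.5 is also informative (with the sign reversed), so an explanatory filter of this kind would typically compare max(auc, 1 - auc) against its threshold.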

Transformation

This is a more technical module containing three classes that are useful for building the production pipeline:

  • ColumnExtractor: to extract columns from a pd.DataFrame
  • ColumnRenamer: to rename columns and to transform a np.ndarray to a pd.DataFrame
  • Categorizer: to transform the dtype of pd.DataFrame columns from 'object' to 'category'
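
The three roles above can be sketched as small transformers following scikit-learn's fit/transform convention. This is a hypothetical re-implementation for illustration only: the class names carry a Simple prefix precisely because they are not cefeste's ColumnExtractor, ColumnRenamer and Categorizer, whose exact signatures may differ. Full scikit-learn compatibility would likely also require subclassing BaseEstimator and TransformerMixin.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype


class SimpleColumnExtractor:
    """Keep only the given columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, X, y=None):
        return self  # stateless

    def transform(self, X):
        return X[self.columns]


class SimpleColumnRenamer:
    """Give column names to a 2-D array, or rename the columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, X, y=None):
        return self  # stateless

    def transform(self, X):
        if isinstance(X, pd.DataFrame):
            return X.set_axis(self.columns, axis=1)
        return pd.DataFrame(np.asarray(X), columns=self.columns)


class SimpleCategorizer:
    """Cast the object (or string) columns seen during fit to the 'category' dtype."""

    def fit(self, X, y=None):
        # remember which columns to convert, so scoring data gets the same treatment
        self.columns_ = [c for c in X.columns if is_string_dtype(X[c].dtype)]
        return self

    def transform(self, X):
        X = X.copy()  # do not modify the caller's DataFrame
        for col in self.columns_:
            X[col] = X[col].astype("category")
        return X
```

Storing the column list in fit (rather than re-detecting it in transform) is one way to keep the training and production data consistent; the renamer is mainly useful after steps that return a bare np.ndarray and drop the column names.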

Elimination

The main class of this module is FeatureElimination, which selects the most useful features to keep in the model while optimizing its hyperparameters.

It is a recursive method that, at each iteration, can:

  • Perform hyperparameter optimization using a user-defined model, grid, grid-search method and evaluation metric
  • Calculate each feature's SHAP importance value
  • Identify the least important feature(s) and delete them for the next iteration
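
The loop described above can be sketched in plain NumPy under loudly stated assumptions: the model is a closed-form ridge regression, the "grid search" is a simple loop over a hypothetical alpha grid scored on a holdout set, and SHAP importance is replaced by its closed form for a linear model (mean |coef_j * (x_j - mean(x_j))|, which is what SHAP gives for a linear model when features are treated as independent). All function names are hypothetical; this is not cefeste's FeatureElimination.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def val_mse(X, y, coef, intercept):
    return float(np.mean((X @ coef + intercept - y) ** 2))


def recursive_elimination(X_tr, y_tr, X_val, y_val, names,
                          alphas=(0.01, 0.1, 1.0, 10.0), n_drop=1, min_features=1):
    """Tune, rank and drop the least important feature(s) until min_features remain.

    Returns one record per iteration with the kept features, best alpha and validation MSE.
    """
    keep = list(range(X_tr.shape[1]))  # column indices still in the model
    history = []
    while True:
        # 1) Hyperparameter search: pick the alpha with the lowest validation MSE.
        results = []
        for alpha in alphas:
            coef, intercept = fit_ridge(X_tr[:, keep], y_tr, alpha)
            results.append((val_mse(X_val[:, keep], y_val, coef, intercept), alpha, coef))
        best_mse, best_alpha, coef = min(results, key=lambda r: r[0])
        history.append({"features": [names[i] for i in keep],
                        "alpha": best_alpha, "val_mse": best_mse})
        if len(keep) <= min_features:
            return history
        # 2) Importance: mean |coef_j * (x_j - mean(x_j))| on the training set
        #    (the mean |SHAP value| of a linear model with independent features).
        Xk = X_tr[:, keep]
        importance = np.abs(coef) * np.abs(Xk - Xk.mean(axis=0)).mean(axis=0)
        # 3) Drop the n_drop least important features (never below min_features).
        n = max(1, min(n_drop, len(keep) - min_features))
        dropped = set(np.argsort(importance)[:n].tolist())
        keep = [col for pos, col in enumerate(keep) if pos not in dropped]
```

The returned history makes the trade-off visible: choosing which iteration to keep (for example, the smallest feature set whose validation error stays close to the best one) is left to the caller in this sketch.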



Download files


Source Distribution

cefeste-1.2.3.tar.gz (27.8 kB)

Uploaded Source

Built Distribution

cefeste-1.2.3-py3-none-any.whl (30.7 kB)

Uploaded Python 3

File details

Details for the file cefeste-1.2.3.tar.gz.

File metadata

  • Download URL: cefeste-1.2.3.tar.gz
  • Upload date:
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.14

File hashes

Hashes for cefeste-1.2.3.tar.gz
Algorithm Hash digest
SHA256 42fd1e1d214dc8420e199502be42a76eaf0fe65e35c992dcfd5b7e30f47d3672
MD5 234c780d398504e4b999cb986319fe6a
BLAKE2b-256 542fc574ba2ae3956915b43778ada1d612f8d7552589e8117e7b86c191f80c26


File details

Details for the file cefeste-1.2.3-py3-none-any.whl.

File metadata

  • Download URL: cefeste-1.2.3-py3-none-any.whl
  • Upload date:
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.14

File hashes

Hashes for cefeste-1.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 d5c3077f7cadb2e8ec5dd264db2973fe5871e865854d2176dd53e28123dac761
MD5 a5919b7de96d06dea49aacab87d089bd
BLAKE2b-256 70ca3de370cb6ea929f6ae4efba58ab78229db6a2c0e9d475f60f9e211a7c1a6

