Feature Selection and Elimination

Credito Emiliano - Feature Selection, Transformation and Elimination (CE - FeSTE)

This repo contains the 'FeSTE' Python package, which supports feature management from pre-filtering through pre-processing to feature elimination.

Installation

To install it:

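  1. Optional: create and activate a new Python virtual environment. From a bash terminal, run "py -m venv your_env_name" and then "source your_env_name/Scripts/activate". These commands match the Windows layout; on Linux or macOS the activation script is your_env_name/bin/activate.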
  2. Install the package:
    • User Mode: pip install cefeste
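
After installing, one way to confirm that the package is visible to the active environment is to query its distribution metadata. This is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration, and "cefeste" is the distribution name taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("cefeste")
print(found if found else "cefeste is not installed in this environment")
```

If this prints the fallback message inside a freshly created virtual environment, the environment was probably not activated before running pip.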

Structure

The Python package is stored in src and contains three sub-modules:

  • selection: contains the feature preliminary selection functions
  • transform: contains the feature pre-processing functions
  • elimination: contains the feature elimination functions

Filters

Selection

The main class of this module is FeatureSelection. It applies several filters, which fall into the following groups:

  • Univariate filters:
    • Constant features
    • Too few distinct values
    • Too many missing values
    • Too concentrated in the most frequent value
    • Unstable between sets
  • Multivariate filters:
    • Spearman Correlation for numerical features
    • Cramer's V for categorical features
    • R2 for mixed features
    • Variance Inflation Factor (VIF)
  • Explanatory filters:
    • Feature AUROC for classification
    • Feature Correlation with target for regression
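
The three filter groups above can be illustrated with plain pandas. The sketch below is not the FeatureSelection implementation or its API: the function names, the thresholds, and the greedy "drop the later column" rule are assumptions chosen for illustration. It covers most of the univariate checks, the Spearman filter for numerical features, and a single-feature AUROC for a binary target.

```python
import pandas as pd


def univariate_filter(df, min_unique=2, max_missing=0.9, max_top_share=0.95):
    """Map each column that fails a univariate check to the reason it failed."""
    dropped = {}
    for col in df.columns:
        s = df[col]
        n_unique = s.nunique(dropna=True)
        if n_unique <= 1:
            dropped[col] = "constant"
        elif n_unique < min_unique:
            dropped[col] = "too few distinct values"
        elif s.isna().mean() > max_missing:
            dropped[col] = "too many missing values"
        elif s.value_counts(normalize=True).iloc[0] > max_top_share:
            # Share of the most frequent value among non-missing entries.
            dropped[col] = "too concentrated"
    return dropped


def spearman_filter(df, threshold=0.95):
    """Greedily list numeric columns that are highly rank-correlated with an earlier one."""
    corr = df.select_dtypes(include="number").corr(method="spearman").abs()
    cols = list(corr.columns)
    redundant = []
    for i, first in enumerate(cols):
        if first in redundant:
            continue
        for second in cols[i + 1:]:
            if second not in redundant and corr.loc[first, second] > threshold:
                redundant.append(second)
    return redundant


def single_feature_auroc(x, y):
    """AUROC of one numeric feature against a 0/1 target, via the rank-sum formula.

    Assumes both classes are present among the non-missing values.
    """
    mask = x.notna()
    x, y = x[mask], y[mask]
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    ranks = x.rank()  # average ranks, so ties are handled
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],                 # monotone in "a"
    "c": [3, 1, 4, 1, 5, 9],
    "k": [7, 7, 7, 7, 7, 7],                   # constant
    "m": [None, None, None, None, 1.0, 2.0],   # mostly missing
})
target = pd.Series([0, 0, 0, 1, 1, 1])

failed = univariate_filter(df, max_missing=0.5)
kept = df.drop(columns=list(failed))
redundant = spearman_filter(kept, threshold=0.95)
auroc_a = single_feature_auroc(df["a"], target)
```

Here the univariate step flags "k" and "m", and the Spearman step flags "b" as redundant with "a". The stability check between sets, Cramer's V, R2, and VIF are left out to keep the sketch short.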

Transformation

This is a more technical module. It contains three classes that are useful for building the production pipeline:

  • ColumnExtractor: to extract columns from a pd.DataFrame
  • ColumnRenamer: to rename columns and to transform a np.ndarray to a pd.DataFrame
  • Categorizer: to transform the dtype of pd.DataFrame columns from 'object' to 'category'
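
To make the three roles concrete, here is a minimal sketch of transformers with the same responsibilities, written with plain pandas and a fit/transform interface. The class names SelectColumns, ArrayToFrame, and ObjectToCategory are hypothetical stand-ins; they are not the package's classes, and the real constructors and behavior may differ.

```python
import numpy as np
import pandas as pd


class SelectColumns:
    """Keep only the named columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]


class ArrayToFrame:
    """Attach column names to a 2-D array, or rename the columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if isinstance(X, pd.DataFrame):
            return X.set_axis(self.columns, axis=1)
        return pd.DataFrame(X, columns=self.columns)


class ObjectToCategory:
    """Convert the columns that had object dtype at fit time to category dtype."""

    def fit(self, X, y=None):
        self.columns_ = [c for c in X.columns if X[c].dtype == object]
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's frame untouched
        for c in self.columns_:
            X[c] = X[c].astype("category")
        return X


raw = np.array([[31.0, 1200.0], [45.0, 900.0]])
frame = ArrayToFrame(["age", "income"]).transform(raw)
frame["city"] = pd.Series(["Rome", "Milan"], dtype=object)

typed = ObjectToCategory().fit(frame).transform(frame)
subset = SelectColumns(["age", "city"]).fit(typed).transform(typed)
```

Steps like these matter in production because intermediate pipeline stages often return bare arrays, which lose column names and categorical dtypes that a downstream model may rely on.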

Elimination

The main class of this module is FeatureElimination, which selects the most useful features to keep in the model while optimizing the hyperparameters at the same time.

It is a recursive method that, at each iteration, can:

  • Optimize the hyperparameters using a user-defined model, grid, grid-search method, and evaluation measure
  • Compute the SHAP importance value of each feature
  • Identify the least important feature(s) and remove them for the next iteration
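
The loop described above can be sketched in a library-agnostic way. This is not the FeatureElimination implementation: the function eliminate_features and its parameters are hypothetical, and the two callables stand in for the hyperparameter search and the SHAP importance computation, which a real run would delegate to a grid search and a SHAP explainer.

```python
def eliminate_features(features, tune, importances, step=1, min_features=1):
    """Repeatedly tune a model, rank the features, and drop the weakest ones.

    tune(features) -> (model, score)             # placeholder for the hyperparameter search
    importances(model, features) -> {name: val}  # placeholder for SHAP importances
    Returns one record per iteration with the feature set used and its score.
    """
    remaining = list(features)
    history = []
    while True:
        model, score = tune(remaining)
        ranking = importances(model, remaining)
        history.append({"features": list(remaining), "score": score})
        if len(remaining) <= min_features:
            break
        n_drop = min(step, len(remaining) - min_features)
        weakest = sorted(remaining, key=lambda name: ranking[name])[:n_drop]
        remaining = [name for name in remaining if name not in weakest]
    return history


# Toy stand-ins: a fixed weight per feature plays the role of both the
# model quality and the importance values.
weights = {"income": 0.6, "age": 0.3, "noise": 0.0, "row_id": 0.01}


def toy_tune(features):
    return None, sum(weights[name] for name in features)


def toy_importances(model, features):
    return {name: weights[name] for name in features}


history = eliminate_features(list(weights), toy_tune, toy_importances)
for record in history:
    print(len(record["features"]), record["features"])
```

Keeping the full history, rather than only the final set, lets the user pick the iteration with the best trade-off between score and number of features.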
