Skip to main content

A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models.

Project description

shap-hypetune

A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models.

shap-hypetune diagram

Overview

Hyperparameters tuning and features selection are two common steps in every machine learning pipeline. Most of the time they are computed separately and independently. This may result in suboptimal performances and in a more time expensive process.

shap-hypetune aims to combine hyperparameters tuning and features selection in a single pipeline optimizing the optimal number of features while searching for the optimal parameters configuration. Hyperparameters Tuning or Features Selection can also be carried out as standalone operations.

shap-hypetune main features:

  • designed for gradient boosting models, as LGBModel or XGBModel;
  • developed to be integrable with the scikit-learn ecosystem;
  • effective in both classification or regression tasks;
  • customizable training process, supporting early-stopping and all the other fitting options available in the standard algorithms api;
  • ranking feature selection algorithms: Recursive Feature Elimination (RFE); Recursive Feature Addition (RFA); or Boruta;
  • classical boosting based feature importances or SHAP feature importances (the later can be computed also on the eval_set);
  • apply grid-search, random-search, or bayesian-search (from hyperopt);
  • parallelized computations with joblib.

Installation

pip install --upgrade shap-hypetune

lightgbm, xgboost are not needed requirements. The module depends only on NumPy, shap, scikit-learn and hyperopt. Python 3.6 or above is supported.

Media

Usage

from shaphypetune import BoostSearch, BoostRFE, BoostRFA, BoostBoruta

Hyperparameters Tuning

BoostSearch(
    estimator,                              # LGBModel or XGBModel
    param_grid=None,                        # parameters to be optimized
    greater_is_better=False,                # minimize or maximize the monitored score
    n_iter=None,                            # number of sampled parameter configurations
    sampling_seed=None,                     # the seed used for parameter sampling
    verbose=1,                              # verbosity mode
    n_jobs=None                             # number of jobs to run in parallel
)

Feature Selection (RFE)

BoostRFE(  
    estimator,                              # LGBModel or XGBModel
    min_features_to_select=None,            # the minimum number of features to be selected  
    step=1,                                 # number of features to remove at each iteration  
    param_grid=None,                        # parameters to be optimized  
    greater_is_better=False,                # minimize or maximize the monitored score  
    importance_type='feature_importances',  # which importance measure to use: default or shap  
    train_importance=True,                  # where to compute the shap feature importance  
    n_iter=None,                            # number of sampled parameter configurations  
    sampling_seed=None,                     # the seed used for parameter sampling  
    verbose=1,                              # verbosity mode  
    n_jobs=None                             # number of jobs to run in parallel  
)  

Feature Selection (BORUTA)

BoostBoruta(
    estimator,                              # LGBModel or XGBModel
    perc=100,                               # threshold used to compare shadow and real features
    alpha=0.05,                             # p-value levels for feature rejection
    max_iter=100,                           # maximum Boruta iterations to perform
    early_stopping_boruta_rounds=None,      # maximum iterations without confirming a feature
    param_grid=None,                        # parameters to be optimized
    greater_is_better=False,                # minimize or maximize the monitored score
    importance_type='feature_importances',  # which importance measure to use: default or shap
    train_importance=True,                  # where to compute the shap feature importance
    n_iter=None,                            # number of sampled parameter configurations
    sampling_seed=None,                     # the seed used for parameter sampling
    verbose=1,                              # verbosity mode
    n_jobs=None                             # number of jobs to run in parallel
)

Feature Selection (RFA)

BoostRFA(
    estimator,                              # LGBModel or XGBModel
    min_features_to_select=None,            # the minimum number of features to be selected
    step=1,                                 # number of features to remove at each iteration
    param_grid=None,                        # parameters to be optimized
    greater_is_better=False,                # minimize or maximize the monitored score
    importance_type='feature_importances',  # which importance measure to use: default or shap
    train_importance=True,                  # where to compute the shap feature importance
    n_iter=None,                            # number of sampled parameter configurations
    sampling_seed=None,                     # the seed used for parameter sampling
    verbose=1,                              # verbosity mode
    n_jobs=None                             # number of jobs to run in parallel
)

Full examples in the notebooks folder.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shap-hypetune-0.2.5.tar.gz (14.6 kB view details)

Uploaded Source

Built Distribution

shap_hypetune-0.2.5-py3-none-any.whl (16.8 kB view details)

Uploaded Python 3

File details

Details for the file shap-hypetune-0.2.5.tar.gz.

File metadata

  • Download URL: shap-hypetune-0.2.5.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7

File hashes

Hashes for shap-hypetune-0.2.5.tar.gz
Algorithm Hash digest
SHA256 4c12bc983198d7a26dd0b5e4ed4e125946a388e4cddca2fdeeab7b8f80a5d75d
MD5 e9577877bdfa3e8698f04c5b2edca293
BLAKE2b-256 09662c78688ba46117a0d959d47264f32c86110743227de401e4bcb0834aee93

See more details on using hashes here.

File details

Details for the file shap_hypetune-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: shap_hypetune-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7

File hashes

Hashes for shap_hypetune-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 6964ec5d43e19a3f604fa5bb8b3d8d5acf675255bacf850756507a6084e33d75
MD5 4d553f1be6545aff80b206f7b8b10699
BLAKE2b-256 fe3ca2d9375a61212f85e212989c78dce061afa7dd220972f4f94e7ecc4abb3c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page