
shap-hypetune

A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models.

Overview

Hyperparameter tuning and feature selection are two common steps in every machine learning pipeline. Most of the time they are performed separately and independently, which can lead to suboptimal performance and a more time-consuming process.

shap-hypetune aims to combine hyperparameter tuning and feature selection in a single pipeline, searching for the optimal number of features while tuning the parameter configuration. Hyperparameter tuning and feature selection can also be carried out as standalone operations.

shap-hypetune main features:

  • designed for gradient boosting models, such as LGBMModel or XGBModel;
  • effective in both classification and regression tasks;
  • customizable training process, supporting early stopping and all the other fitting options available in the standard algorithm APIs;
  • ranking feature selection algorithms: Recursive Feature Elimination (RFE) or Boruta;
  • classical boosting-based feature importances or SHAP feature importances (the latter can also be computed on the eval_set);
  • hyperparameter search via grid-search or random-search.

Installation

pip install shap-hypetune

lightgbm and xgboost are not declared as requirements: the module itself depends only on NumPy and shap, so install the boosting library you plan to use separately. Python 3.6 or above is supported.

Media

[Coming Soon]

Usage
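
All the snippets below share the same setup. A minimal sketch to make them runnable end to end, assuming an illustrative synthetic dataset built with scikit-learn (the data and split are not part of the package):

from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier

from shaphypetune import BoostSearch, BoostRFE, BoostBoruta

# illustrative binary classification problem; any tabular dataset works
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)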

Only Hyperparameters Tuning

  • GRID-SEARCH
param_grid = {'n_estimators': 150,
              'learning_rate': [0.2, 0.1],
              'num_leaves': [25, 30, 35],
              'max_depth': [10, 12]}

model = BoostSearch(LGBMClassifier(), param_grid=param_grid)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
  • RANDOM-SEARCH
param_dist = {'n_estimators': 150,
              'learning_rate': stats.uniform(0.09, 0.25),
              'num_leaves': stats.randint(20, 40),
              'max_depth': [10, 12]}

model = BoostSearch(LGBMClassifier(), param_grid=param_dist, n_iter=10, sampling_seed=0)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
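
In both cases the fitted meta-estimator stores the outcome of the search. A hedged sketch, assuming the scikit-learn-style attributes exposed by shap-hypetune (check your installed version for the exact names):

# best configuration found and its eval_set score
print(model.best_params_)
print(model.best_score_)

# predictions are forwarded to the refitted best estimator
y_pred = model.predict(X_valid)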

Only Features Selection

  • RFE
model = BoostRFE(LGBMClassifier(),
                 min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
  • Boruta
model = BoostBoruta(LGBMClassifier(),
                    max_iter=100, perc=100)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
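
The fitted selectors follow the usual scikit-learn feature-selector interface. A hedged sketch, assuming the support_/transform attributes are available (names may differ slightly across versions):

# boolean mask and number of selected features
print(model.support_)
print(model.n_features_)

# reduce any dataset to the selected features
X_valid_reduced = model.transform(X_valid)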

Only Features Selection with SHAP

  • RFE with SHAP
model = BoostRFE(LGBMClassifier(), 
                 min_features_to_select=1, step=1,
                 importance_type='shap_importances', train_importance=False)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
  • Boruta with SHAP
model = BoostBoruta(LGBMClassifier(),
                    max_iter=100, perc=100,
                    importance_type='shap_importances', train_importance=False)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)

Hyperparameters Tuning + Features Selection

  • RANDOM-SEARCH + RFE
param_dist = {'n_estimators': 150,
              'learning_rate': stats.uniform(0.09, 0.25),
              'num_leaves': stats.randint(20, 40),
              'max_depth': [10, 12]}

model = BoostRFE(LGBMClassifier(), param_grid=param_dist, n_iter=10, sampling_seed=0,
                 min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
  • RANDOM-SEARCH + Boruta
param_dist = {'n_estimators': 150,
              'learning_rate': stats.uniform(0.09, 0.25),
              'num_leaves': stats.randint(20, 40),
              'max_depth': [10, 12]}

model = BoostBoruta(LGBMClassifier(), param_grid=param_dist, n_iter=10, sampling_seed=0,
                    max_iter=100, perc=100)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
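
When tuning and selection run together, both kinds of results live on the same fitted object. Another hedged sketch, reusing the attributes assumed above:

print(model.best_params_)   # best hyperparameter configuration found
print(model.support_)       # features selected under that configuration
X_train_reduced = model.transform(X_train)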

Hyperparameters Tuning + Features Selection with SHAP

  • GRID-SEARCH + RFE with SHAP
param_grid = {'n_estimators': 150,
              'learning_rate': [0.2, 0.1],
              'num_leaves': [25, 30, 35],
              'max_depth': [10, 12]}

model = BoostRFE(LGBMClassifier(), param_grid=param_grid, 
                 min_features_to_select=1, step=1,
                 importance_type='shap_importances', train_importance=False)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
  • GRID-SEARCH + Boruta with SHAP
param_grid = {'n_estimators': 150,
              'learning_rate': [0.2, 0.1],
              'num_leaves': [25, 30, 35],
              'max_depth': [10, 12]}

model = BoostBoruta(LGBMClassifier(), param_grid=param_grid,
                    max_iter=100, perc=100,
                    importance_type='shap_importances', train_importance=False)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)

All the examples can be reproduced in regression contexts and with XGBModel, as in the sketch below.
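
For instance, a hedged sketch of the plain RFE example rewritten for regression with xgboost (XGBRegressor is the standard xgboost scikit-learn wrapper; the target is assumed continuous here):

from xgboost import XGBRegressor

# same interface as above, with a regressor and a continuous target
model = BoostRFE(XGBRegressor(),
                 min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)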

More examples in the notebooks folder.
