
A thin wrapper for step-wise parameter optimization of boosting algorithms.


autoboost

Automatic step-wise parameter optimization for xgboost, lightgbm and scikit-learn's GradientBoosting.

Implemented Strategy

The optimization strategy is taken from SylwiaOliwia's xgboost-AutoTune. We only incorporate slight changes to the implementation, e.g. we base all decisions on the cross-validation test folds rather than on the entire data set.

The following excerpt is also taken from the xgboost-AutoTune readme:

General note

A full GridSearch is time- and memory-demanding, so xgboost-AutoTune tunes parameters in the following steps (one by one, from the most robust to the least):

  1. n_estimators

  2. max_depth, min_child_weight

  3. Gamma

  4. n_estimators

  5. Subsample, colsample_bytree

  6. reg_alpha, reg_lambda

  7. n_estimators and learning_rate

Some of these parameters are relevant only to xgboost, LightGBM or GBM. The algorithm picks the parameters valid for the given model and skips the rest.

Model is updated by newly chosen parameters in each step.
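
To make the step-wise idea concrete, here is a minimal sketch of the general pattern using scikit-learn's GridSearchCV directly. It illustrates the strategy, not autoboost's actual internals; the grids shown are a subset of the defaults listed below:

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# Each step searches one small grid and carries the winners forward,
# instead of searching one huge grid over all parameters at once.
steps = [
    {"n_estimators": [30, 50, 70, 100, 150, 200, 300]},
    {"max_depth": [3, 5, 7, 9], "min_child_weight": [0.001, 0.1, 1, 5, 10, 20]},
    {"gamma": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
]

model = XGBRegressor()
for grid in steps:
    search = GridSearchCV(model, grid, scoring="neg_mean_squared_error", cv=5)
    search.fit(X, y)
    # Update the model with the newly chosen parameters of this step.
    model = model.set_params(**search.best_params_)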

Detailed notes

The algorithm runs GridSearchCV for each of the seven steps (see the General note section) and chooses the best value. It uses the following domain values:

{'n_estimators': [30, 50, 70, 100, 150, 200, 300]},
{'max_depth': [3, 5, 7, 9], 'min_child_weight': [0.001, 0.1, 1, 5, 10, 20], 'min_samples_split': [1, 2, 5, 10, 20, 30],
 'num_leaves': [15, 35, 50, 75, 100, 150]},
{'gamma': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], 'min_samples_leaf': [1, 2, 5, 10, 20, 30],
 'min_child_samples': [2, 7, 15, 25, 45], 'min_split_gain': [0, 0.001, 0.1, 1, 5, 20]},
{'n_estimators': [30, 50, 70, 100, 150, 200, 300], 'max_features': range(10, 25, 3)},
{'subsample': [i / 10 for i in range(4, 10)], 'colsample_bytree': [i / 10 for i in range(4, 10)],
 'feature_fraction': [i / 10 for i in range(4, 10)]},
{'reg_alpha': [1e-5, 1e-2, 0.1, 1, 25, 100], 'reg_lambda': [1e-5, 1e-2, 0.1, 1, 25, 100]}

These defaults are used unless the user provides their own dictionary of values in initial_params_dict.

In each iteration, if choosing the best value from the array improved the scoring by at least min_loss, the algorithm continues searching. It creates a new array from the best value and two values in its neighbourhood:

  • If the best value in the previous array had neighbours, the new neighbours are the averages of the best value and its previous neighbours. Example: if the best value from n_estimators: [30, 50, 70, 100, 150, 200, 300] is 70, then the new array to search is [60, 70, 85].

  • If the best value is the lowest in the array, its new lower neighbour is 2*best_value - following_value, as long as that stays above the minimal possible value (otherwise the minimal possible value is used).

  • If the best value is the biggest in the array, it is treated analogously to the lowest one.

If the new values are floats where an int is required, they are rounded.
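
As an illustration of the refinement rule above, here is a small sketch; the helper name and the handling of the minimum are ours, not autoboost's code:

def refine_candidates(values, best, as_int=False, minimum=None):
    # Build the next, narrower search array around the winning value.
    i = values.index(best)
    # Interior values get averaged neighbours; edge values get extrapolated ones.
    lower = (values[i - 1] + best) / 2 if i > 0 else 2 * best - values[i + 1]
    upper = (values[i + 1] + best) / 2 if i < len(values) - 1 else 2 * best - values[i - 1]
    if minimum is not None:
        lower = max(lower, minimum)
    new = [lower, best, upper]
    return [round(v) for v in new] if as_int else new

# Example from the text: best value 70 in the n_estimators grid.
print(refine_candidates([30, 50, 70, 100, 150, 200, 300], 70, as_int=True))
# -> [60, 70, 85]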

n_estimators and learning_rate are chosen pairwise. The algorithm takes their current values from the model and trains pairwise candidates of the form (n * n_estimators, learning_rate / n), as illustrated below.
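
For instance, assuming the model currently holds n_estimators=100 and learning_rate=0.1, the candidate pairs could look like this (the factors n are illustrative, not autoboost's exact choices):

n_estimators, learning_rate = 100, 0.1

# Each candidate trades more trees for a proportionally smaller step size,
# keeping the product n_estimators * learning_rate roughly constant.
pairs = [(round(n * n_estimators), learning_rate / n) for n in (0.5, 1, 2, 4)]
print(pairs)
# -> [(50, 0.2), (100, 0.1), (200, 0.05), (400, 0.025)]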

Installation

autoboost is available on PyPI and conda-forge. You can easily install the package via:

conda install -c conda-forge autoboost

or alternatively via pip:

pip install autoboost

Usage

The standard usage can be summarized as follows (the mse_scorer shown here is one reasonable choice, built with scikit-learn's make_scorer; x_train and y_train are your training data):

import xgboost
from sklearn.metrics import make_scorer, mean_squared_error

from autoboost import optimizer

# Lower MSE is better, hence greater_is_better=False.
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

bo = optimizer.BoostingOptimizer(initial_model=xgboost.XGBRegressor(), scorer=mse_scorer)
clf = bo.fit(x_train, y_train)

Please see the example file for full working examples for regression and classification.
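
For a self-contained run, here is a minimal end-to-end sketch on synthetic data. The dataset and split are illustrative, and we assume, as the snippet above suggests, that fit returns a fitted model:

import xgboost
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import train_test_split

from autoboost import optimizer

# Synthetic regression data; any numeric X/y works.
X, y = make_regression(n_samples=1000, n_features=20, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=42)

mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
bo = optimizer.BoostingOptimizer(initial_model=xgboost.XGBRegressor(), scorer=mse_scorer)
clf = bo.fit(x_train, y_train)

# Assumption: the returned object supports predict like any sklearn estimator.
print(mean_squared_error(y_test, clf.predict(x_test)))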

Sources

The sources for the optimization strategy are listed in the xgboost-AutoTune readme.
