A thin wrapper for step-wise parameter optimization of boosting algorithms.
Project description
autoboost
Automatic step-wise parameter optimization for xgboost, lightgbm and sklearn's GradientBoosting.
Implemented Strategy
The optimization strategy is taken from SylwiaOliwia's xgboost-AutoTune. We only incorporate slight changes to the implementation, e.g. we base all decisions on the cross-validation test folds and not on the entire data set.
The following excerpt is also taken from the readme:
General note
Full GridSearch is time- and memory-demanding, so xgboost-AutoTune tunes parameters in the following steps (one by one, from the most robust to the least robust):
- n_estimators
- max_depth, min_child_weight
- gamma
- n_estimators
- subsample, colsample_bytree
- reg_alpha, reg_lambda
- n_estimators and learning_rate
Some of these parameters apply only to xgboost, LightGBM or GBM. The algorithm picks the parameters valid for the given model and skips the rest. The model is updated with the newly chosen parameters in each step.
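To make the control flow concrete, here is a minimal sketch of such a step-wise loop, assuming a scikit-learn-compatible estimator and plain GridSearchCV; the parameter groups and the helper name are illustrative and not autoboost's internal API:

from sklearn.model_selection import GridSearchCV

# Illustrative parameter groups, tuned one after another (see the steps above).
param_steps = [
    {"n_estimators": [30, 50, 70, 100, 150, 200, 300]},
    {"max_depth": [3, 5, 7, 9], "min_child_weight": [0.001, 0.1, 1, 5, 10, 20]},
    {"gamma": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
]

def stepwise_tune(model, x_train, y_train, scorer):
    for grid in param_steps:
        # Keep only the parameters the current model actually supports.
        valid = {k: v for k, v in grid.items() if k in model.get_params()}
        if not valid:
            continue
        search = GridSearchCV(model, valid, scoring=scorer, cv=5)
        search.fit(x_train, y_train)
        # Update the model with the best parameters before the next step.
        model = model.set_params(**search.best_params_)
    return model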
Detailed notes
The algorithm runs GridSearchCV for each of the seven steps (see the General note section) and chooses the best value. By default it uses the following value domains:
{'n_estimators': [30, 50, 70, 100, 150, 200, 300]},
{'max_depth': [3, 5, 7, 9], 'min_child_weight': [0.001, 0.1, 1, 5, 10, 20], 'min_samples_split': [1, 2, 5, 10, 20, 30],
'num_leaves': [15, 35, 50, 75, 100, 150]},
{'gamma': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], 'min_samples_leaf': [1, 2, 5, 10, 20, 30],
'min_child_samples': [2, 7, 15, 25, 45], 'min_split_gain': [0, 0.001, 0.1, 1, 5, 20]},
{'n_estimators': [30, 50, 70, 100, 150, 200, 300], 'max_features': range(10, 25, 3)},
{'subsample': [i / 10 for i in range(4, 10)], 'colsample_bytree': [i / 10 for i in range(4, 10)],
'feature_fraction': [i / 10 for i in range(4, 10)]},
{'reg_alpha': [1e-5, 1e-2, 0.1, 1, 25, 100], 'reg_lambda': [1e-5, 1e-2, 0.1, 1, 25, 100]}
These defaults are used unless the user provides their own dictionary of values in initial_params_dict, as sketched below.
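As a hedged sketch of overriding the defaults, assuming BoostingOptimizer accepts the same initial_params_dict keyword as xgboost-AutoTune (check the autoboost API for the actual argument name):

import xgboost
from sklearn.metrics import make_scorer, mean_squared_error
from autoboost import optimizer

custom_domains = {
    "n_estimators": [50, 100, 200, 400],  # hypothetical override values
    "max_depth": [4, 6, 8],
}
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
bo = optimizer.BoostingOptimizer(
    initial_model=xgboost.XGBRegressor(),
    scorer=mse_scorer,
    initial_params_dict=custom_domains,  # assumed keyword, name taken from the readme excerpt
)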
In each iteration, if choosing the best value from the array has improved the score by at least min_loss, the algorithm continues searching. It creates a new array from the best value and two values in its neighbourhood:
- If the best value in the previous array had neighbours, the new neighbours are the averages between the best value and its previous neighbours. Example: if the best value from n_estimators: [30, 50, 70, 100, 150, 200, 300] is 70, then the new array to search is [60, 70, 85].
- If the best value is the lowest in the array, its new lower neighbour is 2 * best_value - following_value, unless that falls below the minimal possible value (in which case the minimal possible value is used).
- If the best value is the biggest in the array, it is treated analogously to the lowest one.
If the new values are floats but an integer is required, they are rounded.
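The refinement rules above can be illustrated with a small helper; this is a re-implementation sketch of the readme's description, not autoboost's internal code:

def refine_array(values, best_idx, minimal=1e-12, as_int=False):
    # Build the next search array around the best value (illustrative).
    best = values[best_idx]
    # Lower neighbour: average with the previous value, or extrapolate below it.
    if best_idx > 0:
        low = (best + values[best_idx - 1]) / 2
    else:
        low = max(2 * best - values[best_idx + 1], minimal)
    # Upper neighbour: average with the next value, or extrapolate above it.
    if best_idx < len(values) - 1:
        high = (best + values[best_idx + 1]) / 2
    else:
        high = 2 * best - values[best_idx - 1]
    new_values = [low, best, high]
    if as_int:
        new_values = [int(round(v)) for v in new_values]
    return new_values

# Readme example: best n_estimators of 70 in [30, 50, 70, 100, 150, 200, 300]
# yields [60, 70, 85].
print(refine_array([30, 50, 70, 100, 150, 200, 300], best_idx=2, as_int=True))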
n_estimators and learning_rate are tuned pairwise. The algorithm takes their values from the model and searches them jointly as (n * n_estimators, learning_rate / n).
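A minimal sketch of generating such pairwise candidates (illustrative scaling factors, not autoboost's internal code):

def pairwise_candidates(n_estimators, learning_rate, factors=(0.5, 1, 2, 3)):
    # Scale the number of trees up while scaling the learning rate down by the
    # same factor, so the overall amount of boosting stays roughly comparable.
    return [(int(round(n * n_estimators)), learning_rate / n) for n in factors]

# e.g. n_estimators=100, learning_rate=0.1 -> [(50, 0.2), (100, 0.1), (200, 0.05), (300, 0.0333...)]
print(pairwise_candidates(100, 0.1))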
Installation
autoboost is available on PyPI and conda. You can easily install the package via:
conda install -c conda-forge autoboost
or alternatively via pip:
pip install autoboost
Usage
The standard usage can be summarized as follows (the mean-squared-error scorer shown here is one possible choice):
import xgboost
from sklearn.metrics import make_scorer, mean_squared_error
from autoboost import optimizer
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
bo = optimizer.BoostingOptimizer(initial_model=xgboost.XGBRegressor(), scorer=mse_scorer)
clf = bo.fit(x_train, y_train)
Please see the example file for a full working example of regression and classification.
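For classification, the same pattern should apply with a classifier and a classification scorer; the scorer below is an assumption for illustration, not necessarily the one used in the packaged example:

import xgboost
from sklearn.metrics import accuracy_score, make_scorer
from autoboost import optimizer

acc_scorer = make_scorer(accuracy_score)  # assumed scorer choice
bo = optimizer.BoostingOptimizer(initial_model=xgboost.XGBClassifier(), scorer=acc_scorer)
clf = bo.fit(x_train, y_train)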
Sources
The following list of sources is taken from xgboost-AutoTune.
- autoboost is based on xgboost-AutoTune
- https://xgboost.readthedocs.io/en/stable/tutorials/param_tuning.html
- https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
- https://machinelearningmastery.com/tune-number-size-decision-trees-xgboost-python/
- https://www.kaggle.com/prasunmishra/parameter-tuning-for-xgboost-sklearn/notebook
- https://cambridgespark.com/content/tutorials/hyperparameter-tuning-in-xgboost/index.html
File details
Details for the file autoboost-20.8.15.tar.gz.
File metadata
- Download URL: autoboost-20.8.15.tar.gz
- Upload date:
- Size: 11.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4aa48633d4160341ce68748c592a68a753be4af3f9ecfb86634e1ee38e8d5736
MD5 | 6a023a60613c48ebcbaa5d01ecf53dc9
BLAKE2b-256 | b037b8fe9b692e1d560f3c9d5859bf02a72d5e3af2b634e544c5b8d6ded44e87
File details
Details for the file autoboost-20.8.15-py3-none-any.whl.
File metadata
- Download URL: autoboost-20.8.15-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e4ccc8d96771fbec44bf5cc6c14b1121550e96b01b59399990198192f2cabed4
MD5 | 2cc51ca02fa2687c07ef2ea8b14ce8ef
BLAKE2b-256 | 98814466c7a6a0f56834b1fa8f27cf2e8f0c1e3d16901ba5f1d0a1206b35b664