hgboost is a Python package for hyperparameter optimization of xgboost, catboost and LightGBM models for both classification and regression tasks.
hgboost - Hyperoptimized Gradient Boosting
hgboost is short for Hyperoptimized Gradient Boosting and is a Python package for hyperparameter optimization of xgboost, catboost and LightGBM models. It searches the parameter space using cross-validation and evaluates the results on an independent validation set.
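The workflow described above keeps part of the data apart for a final, independent check. A minimal stdlib-only sketch of such a three-way split follows, assuming the validation set is set aside first and the remainder is then split into train and test parts using the `val_size` and `test_size` fractions from the examples below. `split_indices` is a hypothetical helper, not part of the hgboost API; rounding the held-out part up with `math.ceil` follows scikit-learn's `train_test_split` convention.

```python
import math
import random


def split_indices(n_samples, val_size=0.2, test_size=0.2, seed=42):
    """Hypothetical helper: split sample indices into train, test and validation sets.

    Assumption: the validation set is held out first and the remainder is then
    split into train and test. Held-out sizes are rounded up with math.ceil.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle

    n_val = math.ceil(val_size * n_samples)
    val, rest = idx[:n_val], idx[n_val:]

    n_test = math.ceil(test_size * len(rest))
    test, train = rest[:n_test], rest[n_test:]
    return train, test, val


# 891 samples, the size of the Titanic example data used below
train, test, val = split_indices(891)
print(len(train), len(test), len(val))  # 569 143 179
```

Under these assumptions the validation part has 179 samples, which matches the "(179 samples, 20.00%)" reported in the example log.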
hgboost can be applied to classification and regression tasks.
hgboost is fun because it:
1. Hyperoptimizes the parameter space using a Bayesian approach.
2. Determines the best scoring model(s) using k-fold cross-validation.
3. Evaluates the best model on an independent evaluation set.
4. Retrains the best model on the entire input data with the optimal parameters.
5. Works for classification and regression.
6. Creates a super-hyperoptimized model from an ensemble of all individually optimized models.
7. Returns the model, the search space and the test/evaluation results.
8. Makes insightful plots.
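Steps 2 to 4 can be illustrated with a small, library-free sketch: the best candidates from a search are re-scored with k-fold cross-validation and the winner is chosen by its mean fold score. The names `kfold_indices`, `select_best` and `toy_evaluate` are hypothetical; in hgboost the candidates come from the hyperopt search and the evaluation uses the actual boosting models. With `top_cv_evals=10` and `k=5` the sketch performs 50 evaluations, the same count the example log reports as "Total nr. tests: 50".

```python
import random
import statistics


def kfold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs for k interleaved folds of shuffled indices."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def select_best(candidates, evaluate, n_samples, top_cv_evals=10, k=5):
    """Re-score the top candidates with k-fold CV; return (mean_score, params) of the best.

    candidates: list of (params, search_score) pairs, higher score is better.
    evaluate:   callable(params, train_idx, test_idx) -> score, higher is better.
    """
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_cv_evals]
    results = []
    for params, _ in top:
        fold_scores = [evaluate(params, tr, te) for tr, te in kfold_indices(n_samples, k)]
        results.append((statistics.mean(fold_scores), params))
    return max(results, key=lambda r: r[0])


# Toy usage: candidates produced by some search, each with its search score.
candidates = [({"max_depth": d}, 1.0 / (1 + abs(d - 6))) for d in range(1, 13)]
n_calls = 0


def toy_evaluate(params, train_idx, test_idx):
    """Stand-in for fitting a model on train_idx and scoring it on test_idx."""
    global n_calls
    n_calls += 1
    return -abs(params["max_depth"] - 4)  # toy score that peaks at max_depth=4


best_score, best_params = select_best(candidates, toy_evaluate, n_samples=100)
print(best_params, best_score, n_calls)  # {'max_depth': 4} 0 50
```

Note that the search score and the cross-validated score can disagree: the toy search favours `max_depth=6`, while cross-validation picks `max_depth=4`.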
Documentation
- API Documentation: https://erdogant.github.io/hgboost/
- Github: https://github.com/erdogant/hgboost/
Schematic overview of hgboost
Installation
- Install hgboost from PyPI (recommended). hgboost is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
- A new environment is recommended; create one as follows:
conda create -n env_hgboost python=3.6
conda activate env_hgboost
Install the newest version of hgboost from PyPI
pip install hgboost
Force installation of the latest version
pip install -U hgboost
Install from the GitHub source
pip install git+https://github.com/erdogant/hgboost
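After installation, a quick way to confirm the environment is to check the Python version and whether the packages can be found. A minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and the package list (hgboost plus the libraries named in the References section, imported as `xgboost`, `catboost`, `lightgbm` and `hyperopt`) is an assumption about what you want to verify.

```python
import importlib.util
import sys

# Assumed list of top-level module names to check.
PACKAGES = ("hgboost", "xgboost", "catboost", "lightgbm", "hyperopt")


def check_environment(packages=PACKAGES, min_python=(3, 6)):
    """Return (python_ok, {package: installed}) without importing the packages."""
    python_ok = sys.version_info >= min_python
    installed = {name: importlib.util.find_spec(name) is not None for name in packages}
    return python_ok, installed


python_ok, installed = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_ok}")
for name, present in installed.items():
    print(f"{name:<10} {'installed' if present else 'missing'}")
```

`importlib.util.find_spec` only locates a top-level module, so the check is fast and does not trigger the (sometimes slow) imports of the boosting libraries.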
Import the hgboost class from the package
from hgboost import hgboost
Classification example for xgboost, catboost and LightGBM:
# Load library
from hgboost import hgboost
# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)
# Import data
df = hgb.import_example()
y = df['Survived'].values
y = y.astype(str)
y[y=='1']='survived'
y[y=='0']='dead'
# Preprocessing by encoding variables
del df['Survived']
X = hgb.preprocessing(df)
# Fit catboost by hyperoptimization and cross-validation.
# Each call below returns new results and overwrites `results`.
results = hgb.catboost(X, y, pos_label='survived')
# Fit LightGBM by hyperoptimization and cross-validation
results = hgb.lightboost(X, y, pos_label='survived')
# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')
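# Example output of the xgboost run; the trial count, timings and scores depend on max_eval, random_state and hardware:
# [hgboost] >Start hgboost classification..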
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100% |----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evalute best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameters settings.
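The log reports the search progress as "best loss: -0.88..." next to "auc=0.881198". A plausible reading, given `greater_is_better: True`, is that the AUC is negated into a loss because hyperopt's `fmin` minimizes its objective. The sketch below shows that convention together with a pure-Python ROC AUC computed by pairwise comparison of positive and negative scores (equivalent to the Mann-Whitney U statistic). Both helpers are illustrative and not hgboost internals; for real work, scikit-learn's `roc_auc_score` is the usual choice.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def to_loss(score, greater_is_better=True):
    """Turn a metric into a loss for a minimizer: negate it when higher is better."""
    return -score if greater_is_better else score


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc, to_loss(auc))  # 0.75 -0.75
```

The pairwise loop is O(n_pos × n_neg) and is meant for reading, not for large datasets.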
# Plot searched parameter space
hgb.plot_params()
# Plot summary results
hgb.plot()
# Plot the best tree
hgb.treeplot()
# Plot the validation results
hgb.plot_validation()
# Plot the cross-validation results
hgb.plot_cv()
# Use the learned model to make predictions. X is reused here for brevity; in practice, pass new data preprocessed the same way.
y_pred, y_proba = hgb.predict(X)
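The `threshold=0.5` argument in the initialization acts as the probability cutoff for classification. The idea can be sketched as follows, assuming positive-class probabilities as input and that a probability equal to the threshold counts as positive; `apply_threshold` is a hypothetical helper, and neither hgboost's exact comparison nor the layout of its `y_proba` output is assumed here.

```python
def apply_threshold(proba_pos, threshold=0.5, pos_label="survived", neg_label="dead"):
    """Assign pos_label when the positive-class probability is >= threshold."""
    return [pos_label if p >= threshold else neg_label for p in proba_pos]


proba_survived = [0.2, 0.5, 0.9]
print(apply_threshold(proba_survived))                 # ['dead', 'survived', 'survived']
print(apply_threshold(proba_survived, threshold=0.7))  # ['dead', 'dead', 'survived']
```

Raising the threshold makes positive predictions rarer, which typically trades recall on the positive class for precision.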
Create an ensemble model for classification
from hgboost import hgboost
hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)
# Import data
df = hgb.import_example()
y = df['Survived'].values
del df['Survived']
X = hgb.preprocessing(df, verbose=0)
results = hgb.ensemble(X, y, pos_label=1)
# Use the ensemble model to make predictions
y_pred, y_proba = hgb.predict(X)
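`hgb.ensemble` combines the individually optimized models. One common way to combine classifiers is soft voting, which averages the predicted probabilities, as scikit-learn's `VotingClassifier(voting='soft')` does. The sketch below shows that averaging in plain Python with made-up probabilities; it illustrates the technique and is not a claim about hgboost's internals.

```python
def soft_vote(probas_per_model, weights=None):
    """Weighted average of per-sample positive-class probabilities across models."""
    if weights is None:
        weights = [1.0] * len(probas_per_model)
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample_probas)) / total
        for sample_probas in zip(*probas_per_model)
    ]


# Made-up positive-class probabilities for 3 samples from 3 models.
xgb_proba = [0.9, 0.2, 0.6]
ctb_proba = [0.8, 0.4, 0.3]
lgb_proba = [0.7, 0.3, 0.3]

avg = soft_vote([xgb_proba, ctb_proba, lgb_proba])
labels = [1 if p >= 0.5 else 0 for p in avg]
print([round(p, 3) for p in avg], labels)  # [0.8, 0.3, 0.4] [1, 0, 0]
```

In the third sample the xgboost probability alone (0.6) would give class 1, while the ensemble average (0.4) gives class 0.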
Create an ensemble model for regression
import numpy as np
from hgboost import hgboost
hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)
# Import data
df = hgb.import_example()
y = df['Age'].values
del df['Age']
I = ~np.isnan(y)
X = hgb.preprocessing(df, verbose=0)
X = X.loc[I,:]
y = y[I]
results = hgb.ensemble(X, y, methods=['xgb_reg','ctb_reg','lgb_reg'])
# Use the ensemble model to make predictions
y_pred, y_proba = hgb.predict(X)
# Plot the ensemble regression validation results
hgb.plot_validation()
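The regression example first drops samples without a target value (the `I = ~np.isnan(y)` mask), since missing ages cannot be used for training or scoring. For regressors, the voting idea reduces to averaging predictions, as scikit-learn's `VotingRegressor` does. A stdlib-only sketch of both steps with made-up numbers follows; the helpers are hypothetical and not hgboost functions.

```python
import math


def drop_missing_targets(rows, targets):
    """Keep only the samples whose target is not NaN (same idea as the mask I)."""
    kept = [(r, t) for r, t in zip(rows, targets) if not math.isnan(t)]
    return [r for r, _ in kept], [t for _, t in kept]


def average_predictions(preds_per_model):
    """Unweighted mean of the models' predictions, per sample."""
    return [sum(vals) / len(vals) for vals in zip(*preds_per_model)]


rows = ["passenger_a", "passenger_b", "passenger_c"]
ages = [22.0, float("nan"), 35.0]
rows_kept, ages_kept = drop_missing_targets(rows, ages)
print(rows_kept, ages_kept)  # ['passenger_a', 'passenger_c'] [22.0, 35.0]

# Made-up predictions from three regressors for the two remaining samples.
avg = average_predictions([[20.0, 30.0], [24.0, 34.0], [22.0, 38.0]])
print(avg)  # [22.0, 34.0]
```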
Citation
Please cite hgboost in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2020hgboost,
title={hgboost},
author={Erdogan Taskesen},
year={2020},
howpublished={\url{https://github.com/erdogant/hgboost}},
}
References
* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
License
- See LICENSE for details.
Coffee
- If you wish to buy me a coffee for this work, it is much appreciated :)
File details
Details for the file hgboost-1.0.0.tar.gz.
File metadata
- Download URL: hgboost-1.0.0.tar.gz
- Upload date:
- Size: 26.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | d4e67499b9836e38fe071b9b33e580cf30e0bdca650a30f1cb9aed4fa1e5fee0
MD5 | e16c7be5e7beadcda1c03f16b542356c
BLAKE2b-256 | 1abafa5be3978446425304411d919419e8854494605aeeec86eaa6df51c1aa35
File details
Details for the file hgboost-1.0.0-py3-none-any.whl.
File metadata
- Download URL: hgboost-1.0.0-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 38ee4aa62ad64420c415eb5d0f06f8ed8fc87c141dd1c836f47daff85f03ddcf
MD5 | f378c520e2bd526f731a134f8763dfa0
BLAKE2b-256 | ac358987980526bacdcd8a6d25714dbd9be7d9733a0a85ac916c2dd72c6686dd