hgboost is a Python package that minimizes the loss function of an xgboost, catboost or lightboost model over a hyperparameter space, for both classification and regression.
hgboost
hgboost is a Python package that minimizes the loss function of xgboost, catboost or lightboost over a hyperparameter space using cross-validation, and evaluates the results on an independent validation set. hgboost can be applied to classification and regression tasks.
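The workflow described above (search a hyperparameter space, score each candidate by cross-validation, then check the winner on data the search never saw) can be illustrated with a minimal sketch. This is not hgboost's implementation: it uses a toy 1-D k-nearest-neighbours classifier and plain random search, and every name in it (`make_data`, `knn_predict`, `cv_loss`, and so on) is hypothetical and exists only for this illustration.

```python
import random

def make_data(n, rng):
    """Toy 1-D binary data: class 1 is shifted to the right of class 0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return int(sum(label for _, label in nearest) * 2 > len(nearest))

def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    hits = sum(1 for x, y in test if knn_predict(train, x, k) == y)
    return hits / len(test)

def cv_loss(data, k, n_folds):
    """Negative mean accuracy over n_folds folds (lower is better)."""
    scores = []
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        scores.append(accuracy(train, test, k))
    return -sum(scores) / len(scores)

rng = random.Random(42)
data = make_data(200, rng)

# Hold out an independent validation set that the search never sees.
n_val = int(0.2 * len(data))
val, search = data[:n_val], data[n_val:]

# Random search over the hyperparameter space (here: odd k from 1 to 15).
space = list(range(1, 16, 2))
trials = [(cv_loss(search, k, n_folds=5), k) for k in rng.sample(space, 5)]
best_loss, best_k = min(trials)

# Score the winning setting once on the untouched validation set.
val_acc = accuracy(search, val, best_k)
print(f"best k={best_k}, cv accuracy={-best_loss:.3f}, validation accuracy={val_acc:.3f}")
```

The loss is the negated accuracy so that "minimize" and "greater is better" agree, which is also why the log output further down reports a negative best loss for an AUC score.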
Documentation
- API Documentation: https://erdogant.github.io/hgboost/
Installation Environment
- Install hgboost from PyPI (recommended). hgboost is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- A new environment is recommended and can be created as follows:
conda create -n env_hgboost python=3.6
conda activate env_hgboost
Install the newest version of hgboost from PyPI
pip install hgboost
Install from the GitHub source
pip install git+https://github.com/erdogant/hgboost#egg=master
Import hgboost package
import hgboost as hgboost
Classification example for xgboost, catboost and lightboost:
# Load library
from hgboost import hgboost
# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)
# Import data
df = hgb.import_example()
y = df['Survived'].values
y = y.astype(str)
y[y=='1']='survived'
y[y=='0']='dead'
# Preprocessing by encoding variables
del df['Survived']
X = hgb.preprocessing(df)
# Fit catboost by hyperoptimization and cross-validation
results = hgb.catboost(X, y, pos_label='survived')
# Fit lightboost by hyperoptimization and cross-validation
results = hgb.lightboost(X, y, pos_label='survived')
# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')
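# Example log output (apparently from a longer run: the trial count and pos_label differ from the settings above)
# [hgboost] >Start hgboost classification..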
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total datset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100% |----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best peforming [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evalute best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameters settings.
# Plot searched parameter space
hgb.plot_params()
# Plot summary results
hgb.plot()
# Plot the best tree
hgb.treeplot()
# Plot the validation results
hgb.plot_validation()
# Plot the cross-validation results
hgb.plot_cv()
# Use the learned model to make new predictions.
y_pred, y_proba = hgb.predict(X)
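The sample counts in the log output above follow from the settings passed to `hgboost(...)`. The short check below reproduces them. It assumes the validation split size is rounded up, which is how scikit-learn's `train_test_split` computes a fractional test size; whether hgboost splits the data exactly this way is an assumption.

```python
import math

n_samples = 891      # rows reported in the log: "Total datset: (891, 204)"
val_size = 0.2       # val_size passed to hgboost(...) in the example
cv = 5               # cv passed to hgboost(...)
top_cv_evals = 10    # top_cv_evals passed to hgboost(...)

# Assumption: the split size is rounded up, as train_test_split does.
n_val = math.ceil(n_samples * val_size)

# Each of the top-scoring models is re-evaluated once per fold.
n_cv_tests = top_cv_evals * cv

print(n_val)       # 179, matching "(179 samples, 20.00%)" in the log
print(n_cv_tests)  # 50, matching "Total nr. tests: 50" in the log
```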
Citation
Please cite hgboost in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2020hgboost,
title={hgboost},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/hgboost}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
Licence
See LICENSE for details.
Coffee
- This work is created and maintained in my free time. If you would like to buy me a coffee for this work, it is much appreciated.