hgboost is a Python package for hyperparameter optimization for xgboost, catboost and lightboost, for both classification and regression tasks.

Project description

hgboost - Hyperoptimized Gradient Boosting


Star it if you like it!

hgboost is short for Hyperoptimized Gradient Boosting and is a Python package for hyperparameter optimization of xgboost, catboost and lightboost using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to classification and regression tasks.

hgboost is fun because it:

1. Hyperoptimizes the parameter space using a Bayesian approach.
2. Determines the best scoring model(s) using k-fold cross-validation.
3. Evaluates the best model on an independent validation set.
4. Refits the model on the entire input data using the best parameters.
5. Returns the model, the searched space and the test/evaluation results.
6. Makes insightful plots.

Documentation

Schematic overview of hgboost

Installation Environment

  • Install hgboost from PyPI (recommended). hgboost is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
  • A new environment is recommended and can be created as follows:
conda create -n env_hgboost python=3.6
conda activate env_hgboost

Install the latest version of hgboost from PyPI

pip install hgboost

Install from the GitHub source

pip install git+https://github.com/erdogant/hgboost

Import the hgboost package

import hgboost
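
A quick way to verify the installation is to print the package version (this assumes the installed release exposes __version__):

# Sanity check of the installation (assumes hgboost exposes __version__)
print(hgboost.__version__)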

Classification example for xgboost, catboost and lightboost:

# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)
# Import data
df = hgb.import_example()
y = df['Survived'].values
y = y.astype(str)
y[y=='1']='survived'
y[y=='0']='dead'

# Preprocessing by encoding variables
del df['Survived']
X = hgb.preprocessing(df)
# Fit catboost by hyperoptimization and cross-validation
results = hgb.catboost(X, y, pos_label='survived')

# Fit lightboost by hyperoptimization and cross-validation
results = hgb.lightboost(X, y, pos_label='survived')

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total datset: (891, 204) 
# [hgboost] >Hyperparameter optimization..
#  100% |----| 500/500 [04:39<05:21,  1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best peforming [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00,  4.27s/it]
# [hgboost] >Evalute best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameters settings.
# Plot searched parameter space 
hgb.plot_params()

# Plot summary results
hgb.plot()

# Plot the best tree
hgb.treeplot()

# Plot the validation results
hgb.plot_validation()

# Plot the cross-validation results
hgb.plot_cv()

# Use the learned model to make new predictions.
y_pred, y_proba = hgb.predict(X)
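
Note that these predictions are made on the same data the final model was trained on, so the scores are optimistic. As a quick sanity check, the predictions can be scored with scikit-learn; this is not part of the hgboost API, and the shape of y_proba is an assumption that is handled defensively below:

# Score the in-sample predictions with scikit-learn (not hgboost API).
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

print('in-sample accuracy:', accuracy_score(y, y_pred))

# Assumption: y_proba is either a 1-D positive-class probability or the
# (n_samples, 2) output of predict_proba; take the positive-class column.
proba = np.asarray(y_proba)
proba_pos = proba[:, -1] if proba.ndim == 2 else proba
print('in-sample AUC:', roc_auc_score(y == 'survived', proba_pos))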

Create an ensemble model for classification

from hgboost import hgboost

hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)

# Import data
df = hgb.import_example()
y = df['Survived'].values
del df['Survived']
X = hgb.preprocessing(df, verbose=0)

results = hgb.ensemble(X, y, pos_label=1)

# Use the predictor
y_pred, y_proba = hgb.predict(X)
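
The same caveat applies: predict(X) here scores the training data. A hedged scoring sketch for the ensemble, using scikit-learn and the same assumption on the shape of y_proba as above:

# Score the ensemble in-sample (not hgboost API; y_proba shape is assumed).
import numpy as np
from sklearn.metrics import roc_auc_score

proba = np.asarray(y_proba)
proba_pos = proba[:, -1] if proba.ndim == 2 else proba
print('ensemble in-sample AUC:', roc_auc_score(y == 1, proba_pos))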

Create an ensemble model for regression

import numpy as np
from hgboost import hgboost

hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)

# Import data
df = hgb.import_example()
y = df['Age'].values
del df['Age']
# Keep only the samples with a known age
I = ~np.isnan(y)
X = hgb.preprocessing(df, verbose=0)
X = X.loc[I, :]
y = y[I]

results = hgb.ensemble(X, y, methods=['xgb_reg','ctb_reg','lgb_reg'])

# Use the predictor
y_pred, y_proba = hgb.predict(X)
# Plot the ensemble regression validation results
hgb.plot_validation()
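
For regression there is no class probability, so y_proba may not be meaningful here; the point predictions from hgb.predict can be checked with standard error metrics (scikit-learn, not part of the hgboost API):

# Quantify the in-sample regression error with scikit-learn (not hgboost API).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y, y_pred)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f'in-sample MAE: {mae:.2f} years, RMSE: {rmse:.2f} years')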

Citation

Please cite hgboost in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2020hgboost,
  title={hgboost},
  author={Erdogan Taskesen},
  year={2020},
  howpublished={\url{https://github.com/erdogant/hgboost}},
}

Maintainers

  • Erdogan Taskesen (erdogant)

Contribute

  • Contributions are welcome.

License

See LICENSE for details.

Coffee

  • This work is created and maintained in my free time. If you wish to buy me a coffee for this work, it is very much appreciated.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hgboost-0.1.1.tar.gz (46.6 kB)


Built Distribution

hgboost-0.1.1-py3-none-any.whl (43.5 kB)


File details

Details for the file hgboost-0.1.1.tar.gz.

File metadata

  • Download URL: hgboost-0.1.1.tar.gz
  • Upload date:
  • Size: 46.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Hashes for hgboost-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        84abfba130ea13de2ae8d74b084558c0f3f382f1e91b5789f9c2b65cf007b771
MD5           d27bc8dd241967f4766801e888ad7a33
BLAKE2b-256   afacd613a1833d8e864f696ae2aea9a3d12ff70926822118128eb8aecb4784d4

See more details on using hashes here.

File details

Details for the file hgboost-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: hgboost-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 43.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Hashes for hgboost-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        76d5686caf7375e6dd85836e833531caf0461293fb218e082b34523e56002310
MD5           0f28d279fd476907bacdd02aa4015313
BLAKE2b-256   499e3511aa58e5517076c1c0386c7a3c4c7677946709f2aff454276be606bf9c

See more details on using hashes here.
