A wrapper for conducting Nested Cross-Validation with Bayesian Hyper-Parameter Optimized Gradient Boosting

Project description

Nested Cross-Validation for Bayesian Optimized Gradient Boosting

Description

A Python implementation that unifies Nested K-Fold Cross-Validation, Bayesian Hyperparameter Optimization, and Gradient Boosting. Designed for rapid prototyping on small to mid-sized data sets (small enough to fit in memory), it quickly obtains high-quality prediction results by abstracting away tedious hyperparameter tuning and implementation details in favor of usability and development speed. Bayesian Hyperparameter Optimization uses Tree-structured Parzen Estimators (TPE) from the Hyperopt package. Gradient Boosting can be conducted with one of three libraries: XGBoost, LightGBM, or CatBoost. XGBoost is applied using traditional Gradient Tree Boosting (GTB); LightGBM is applied using its novel Gradient-based One-Side Sampling (GOSS); CatBoost is applied using its novel Ordered Boosting. NestedHyperBoost can be applied to regression, binary classification, and multi-class classification problems.
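
NestedHyperBoost handles this machinery for you, but for readers new to the procedure, the sketch below illustrates what the wrapper automates: an outer k-fold loop estimates generalization performance while an inner loop tunes hyperparameters with Hyperopt's TPE. This is a minimal illustration, not NestedHyperBoost's internals; the search space and parameter ranges are assumptions chosen for demonstration.

## illustration only: nested k-fold cv with tpe, not nestedhyperboost internals
## (search space and ranges below are assumptions for demonstration)
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBClassifier

x, y = load_iris(return_X_y = True)

max_depths = list(range(2, 9))
space = {
    'max_depth': hp.choice('max_depth', max_depths),
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.3)
}

outer_scores = []
outer_cv = KFold(n_splits = 5, shuffle = True, random_state = 0)

## outer loop: unbiased model evaluation
for train_idx, test_idx in outer_cv.split(x):
    x_train, y_train = x[train_idx], y[train_idx]

    ## inner loop: tpe proposes params, scored by inner k-fold cv
    def objective(params):
        model = XGBClassifier(**params)
        return -cross_val_score(model, x_train, y_train, cv = 5).mean()

    best = fmin(objective, space, algo = tpe.suggest,
                max_evals = 10, trials = Trials())

    ## refit on the outer training fold with the best inner params
    model = XGBClassifier(max_depth = max_depths[best['max_depth']],
                          learning_rate = best['learning_rate'])
    model.fit(x_train, y_train)
    outer_scores.append(model.score(x[test_idx], y[test_idx]))

print(np.mean(outer_scores))  ## mean accuracy across outer folds

Because each outer test fold is never seen during tuning, the averaged outer score is a less biased estimate of generalization performance than a single cross-validated tuning score.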

Features

  1. Consistent syntax across all Gradient Boosting methods.
  2. Supported Gradient Boosting methods: XGBoost, LightGBM, CatBoost.
  3. Returns custom object that includes common performance metrics and plots.
  4. Developed for readability, maintainability, and future improvement.

Requirements

  1. Python 3
  2. NumPy
  3. Pandas
  4. Matplotlib
  5. Scikit-learn
  6. Hyperopt
  7. XGBoost
  8. LightGBM
  9. CatBoost

Installation

## install pypi release
pip install nestedhyperboost

## install developer version
pip install git+https://github.com/nickkunz/nestedhyperboost.git

Usage

## load libraries
from nestedhyperboost import xgboost
from sklearn import datasets
import pandas

## load data
data_sklearn = datasets.load_iris()
data = pandas.DataFrame(data_sklearn.data, columns = data_sklearn.feature_names)
data['target'] = pandas.Series(data_sklearn.target)

## conduct nestedhyperboost
results = xgboost.xgb_ncv_classifier(
    data = data,      ## input dataframe
    y = 'target',     ## name of the response column
    k_inner = 5,      ## inner folds (hyperparameter tuning)
    k_outer = 5,      ## outer folds (model evaluation)
    n_evals = 10      ## tpe evaluations per inner search
)

## preview results
results.accu_mean()  ## mean accuracy across outer folds
results.conf_mtrx()  ## confusion matrix
results.prfs_mean()  ## mean precision, recall, f1-score, and support

## preview plots
results.feat_plot()  ## feature importance plot

## model and params
model = results.model
params = results.params
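
The syntax is consistent across the three Gradient Boosting methods, and k_inner, k_outer, and n_evals keep the same meaning throughout. As a hypothetical example of swapping in LightGBM (the lightgbm module and lgb_ncv_classifier names below are assumptions inferred from the XGBoost example above; verify the exact names in the repository):

## hypothetical sketch: swapping the backend, assuming the lightgbm module
## mirrors the xgboost example above (verify exact names in the repository)
from nestedhyperboost import lightgbm

results_lgb = lightgbm.lgb_ncv_classifier(
    data = data,
    y = 'target',
    k_inner = 5,
    k_outer = 5,
    n_evals = 10
)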

License

© Nick Kunz, 2019. Licensed under the GNU General Public License v3.0 (GPLv3).

Contributions

NestedHyperBoost is open to improvements and actively maintained. Contributions are welcome and help make the package better for everyone.

References

Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B. (2011). Algorithms for Hyper-Parameter Optimization. Advances in Neural Information Processing Systems 24. https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf.

Bergstra, J., Yamins, D., Cox, D. D. (2013). Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. Proceedings of the 30th International Conference on Machine Learning. 28:I115–I123. http://proceedings.mlr.press/v28/bergstra13.pdf.

Chen, T., Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794. https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf.

Ke, G., Meng, Q., Finley, T., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Proceedings of the 31st International Conference on Neural Information Processing Systems. 3146–3154. https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf.

Prokhorenkova, L., Gusev, G., Vorobev, A., et al. (2018). CatBoost: Unbiased Boosting with Categorical Features. Proceedings of the 32nd International Conference on Neural Information Processing Systems. 6639–6649. http://learningsys.org/nips17/assets/papers/paper_11.pdf.

Download files

Download the file for your platform.

Source Distribution

nestedhyperboost-0.0.3.tar.gz (25.3 kB, Source)

Built Distribution

nestedhyperboost-0.0.3-py3-none-any.whl (30.9 kB, Python 3)

File details

Details for the file nestedhyperboost-0.0.3.tar.gz.

File metadata

  • Download URL: nestedhyperboost-0.0.3.tar.gz
  • Upload date:
  • Size: 25.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.31.0 CPython/3.7.4

File hashes

Hashes for nestedhyperboost-0.0.3.tar.gz:

  • SHA256: 2820ee7469489bf5f2d80d4913437b2759b346a68439e77d005b71118092544b
  • MD5: 947eba1247cd392c7fb923e8e9d201fb
  • BLAKE2b-256: 4007a205a257ed3477699813db8d5f9be5d9a66b34c2fce02862da09de05ab24

File details

Details for the file nestedhyperboost-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: nestedhyperboost-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 30.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.31.0 CPython/3.7.4

File hashes

Hashes for nestedhyperboost-0.0.3-py3-none-any.whl:

  • SHA256: 8af9a9b8dd542d0744e6448ad725f59e8946b7f850a17a0b43288659d92e9601
  • MD5: f45fa43a8ac37b9bf28598ab51d51e27
  • BLAKE2b-256: 5e70f158a8649d218a2e781a0dc1915c4c0aa2b10f47738713e5692b9fb3bf76
