LightGBM/XGBoost interface that tunes n_estimators by splitting the data, then refits on the entire data set

gbm_autosplit

GBM scikit-learn interfaces that perform "early stopping" with a single data set during fit.

"Early stopping" is great practice to tune the number of estimators for gradient boosting models. However it is not difficult to use it in tuning module in scikit-learn such as RandomizedSearchCV / GridSearchCV because to use early stopping module requires two data sets but scikit learn does not have such interface.

To solve this, the interface performs the following steps within fit:

  1. Split the original input data into two parts at random
  2. Estimate n_estimators on the split data using early stopping
  3. Fit on the entire data set with the estimated n_estimators
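The three steps above can be sketched as follows. This is a hypothetical illustration, not the package's actual implementation: it uses scikit-learn's GradientBoostingClassifier with manual validation-based stopping via staged_predict in place of LightGBM/XGBoost's built-in early stopping.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)

# 1. Split the original input data into two parts at random.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Fit on the training split, then pick the boosting round with the best
#    validation accuracy (a simple manual form of early stopping).
probe = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
scores = [accuracy_score(y_val, pred) for pred in probe.staged_predict(X_val)]
best_n = int(np.argmax(scores)) + 1  # staged_predict yields one prediction per round

# 3. Refit on the entire data set with the estimated n_estimators.
final = GradientBoostingClassifier(n_estimators=best_n, random_state=0).fit(X, y)
```

The refit in step 3 is what makes the estimator usable with a single data set: the validation split is only a means to choose n_estimators, and no data is wasted in the final model.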

Install

pip install gbm_autosplit

Usage

import gbm_autosplit

estimator = gbm_autosplit.LGBMClassifier()
estimator.fit(x, y)

