
BetaBoosting: gradient boosting with a beta function.


BetaBoost

A small wrapper for beta boosting with XGBoost

pip install BetaBoost==0.0.5

Instantiate a BetaBoost object and fit an XGBoost model. fit returns an XGBoost train object (the result of xgb.train).

BetaBoosting assumes that there is some "spike" in the learning-rate schedule that gives near-optimal performance; for convenience, that spike is parameterized as a beta kernel with a floor.
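To make that concrete, here is a minimal sketch of what a beta-kernel schedule with a floor could look like. The parameter values (a, b, peak_eta, floor) and the helper name beta_kernel_eta are illustrative assumptions, not the package's defaults; a list like this can be handed to xgb.callback.LearningRateScheduler, as the example below does for other schedules.

import numpy as np
from scipy.stats import beta

def beta_kernel_eta(n_rounds, a=3.0, b=6.0, peak_eta=0.5, floor=0.01):
    # Illustrative only: a scaled beta pdf over the boosting rounds,
    # clipped at a small floor, so the learning rate "spikes" mid-training
    # while early and late rounds keep a small nonzero eta.
    x = np.linspace(0.0, 1.0, n_rounds)
    pdf = beta.pdf(x, a, b)
    etas = peak_eta * pdf / pdf.max()
    return np.maximum(etas, floor).tolist()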

A quick example with some toy data. I found this example a while ago for learning-rate decay:

import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt

# Toy-data dimensions, schedule settings, and base XGBoost params
OBS = 10 ** 4
FEATURES = 20
max_iter = 300
eta_base = 0.2
eta_min = 0.1
eta_decay = np.linspace(eta_base, eta_min, max_iter).tolist()
PARAMS = {
    'eta': eta_base,
    'booster': 'gbtree',
}

def generate_data():
    # Gamma-distributed target, normally distributed features
    y = np.random.gamma(2, 4, OBS)
    X = np.random.normal(5, 2, [OBS, FEATURES])
    return X, y


X_train, y_train = generate_data()
X_test, y_test = generate_data()
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

progress1 = dict()
model1 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=max_iter,
    early_stopping_rounds=max_iter,
    evals=[(dtrain, 'train'),(dtest, 'test')],
    evals_result=progress1,
    verbose_eval=False,
    callbacks=[xgb.callback.LearningRateScheduler(eta_decay)]
)

progress2 = dict()
model2 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=max_iter,
    early_stopping_rounds=max_iter,
    evals=[(dtrain, 'train'),(dtest, 'test')],
    evals_result=progress2,
    verbose_eval=False,
    callbacks=[xgb.callback.LearningRateScheduler(list(np.ones(max_iter)*0.01))]
)


progress3 = dict()
model3 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=max_iter,
    early_stopping_rounds=max_iter,
    evals=[(dtrain, 'train'),(dtest, 'test')],
    evals_result=progress3,
    verbose_eval=False,
    callbacks=[xgb.callback.LearningRateScheduler(list(np.ones(max_iter)*0.1))]
)

# Here we call BetaBoost; the wrapper/kernel parameters are passed in the class init
import BetaBoost as bb

bb_evals = dict()
betabooster = bb.BetaBoost(n_boosting_rounds=max_iter)
bb_model = betabooster.fit(dtrain=dtrain,
                           maximize=True,
                           params=PARAMS,
                           early_stopping_rounds=max_iter,
                           evals=[(dtrain, 'train'), (dtest, 'test')],
                           evals_result=bb_evals,
                           verbose_eval=False)
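Since fit returns the trained XGBoost booster (per the description above), the usual Booster API applies; for example:

# Predict on the held-out DMatrix with the returned booster
preds = bb_model.predict(dtest)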

plt.plot(progress1['test']['rmse'], linestyle='dashed', color='b', label='eta test decay')
plt.plot(progress2['test']['rmse'], linestyle='dashed', color='r', label='0.01 test')
plt.plot(progress3['test']['rmse'], linestyle='dashed', color='black', label='0.1 test')
plt.plot(bb_evals['test']['rmse'], linestyle='dashed', color='y', label='bb test')
plt.legend()
plt.show()
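As a quick numerical check alongside the plot, the final test RMSE of each run can be read straight out of the evals_result dictionaries collected above (this continues the same script):

# Print the last recorded test RMSE for each learning-rate schedule
for name, prog in [('eta decay', progress1), ('constant 0.01', progress2),
                   ('constant 0.1', progress3), ('BetaBoost', bb_evals)]:
    print(f"{name}: final test rmse = {prog['test']['rmse'][-1]:.4f}")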

[Figure: test-set RMSE by boosting round for each learning-rate schedule]

A clear nicety of beta boosting is that its test error is among the first to converge, yet it remains resistant to overfitting. Plotting the default beta kernel settings shows a function that starts quite small and peaks around 0.5 by iteration 30. This lets the model make its largest jumps not when the trees are very weak (as with standard eta decay) nor when they are too strong.

Quick convergence and robustness to overfitting can be achieved with slight tuning of the beta kernel parameters...it's only 6 more parameters guys :)
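If you want to experiment with the shape directly in plain xgboost, a hand-rolled beta-shaped schedule can be passed through xgb.callback.LearningRateScheduler, exactly like the constant and linear schedules above. A sketch, reusing the illustrative beta_kernel_eta helper from earlier (not part of the BetaBoost package):

bb_style_evals = dict()
bb_style_model = xgb.train(
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=max_iter,
    evals=[(dtrain, 'train'), (dtest, 'test')],
    evals_result=bb_style_evals,
    verbose_eval=False,
    # beta_kernel_eta is the illustrative helper sketched earlier, not BetaBoost's own kernel
    callbacks=[xgb.callback.LearningRateScheduler(beta_kernel_eta(max_iter))]
)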

FAQ:

Why a beta pdf?

TLDR: it worked pretty well!

Longer, incoherent look at it: https://github.com/tblume1992/portfolio/blob/master/GradientBoostedTrees/3_Dynamic_GBT.ipynb

