
Implements Wide Boosting functions for popular boosting packages


wideboost

Implements wide boosting using popular boosting frameworks as a backend. XGBoost currently supports the most wideboost features. Previous versions also supported LightGBM, but that support has since been deprecated.

Getting started

pip install wideboost

Sample scripts

The examples folder contains sample scripts for regression, binary classification, multivariate classification, and multioutput binary classification. XGBoost is currently the only supported backend.

Starter script

import xgboost as xgb
from wideboost.wrappers import wxgb
from pydataset import data
import numpy as np

########
## Get and format the data
DAT = np.asarray(data('Yogurt'))
X = DAT[:,0:9]                  ## columns 0-8 are used as features
Y = np.zeros([X.shape[0],1])    ## class 0 is every choice not relabeled below
Y[DAT[:,9] == 'dannon'] = 1     ## column 9 holds the chosen brand
Y[DAT[:,9] == 'hiland'] = 2
Y[DAT[:,9] == 'weight'] = 3
Y = wxgb.onehot(Y)              ## wideboost needs Y as a 2D matrix

## Random split: 40% of rows for training, the rest for testing
n = X.shape[0]
np.random.seed(123)
train_idx = np.random.choice(np.arange(n),round(n*0.4),replace=False)
test_idx = np.setdiff1d(np.arange(n),train_idx)

xtrain, ytrain = X[train_idx,:], Y[train_idx,]
xtest, ytest = X[test_idx,:],Y[test_idx,]
########

param = {
    'eta':0.1,
    'btype':'I',      ## wideboost param -- one of 'I', 'In', 'R', 'Rn'
    'extra_dims':1,   ## wideboost param -- integer >= -output_dim
    'beta_eta': 0.01, ## wideboost param -- learning rate for B. Can be unstable -- set to 0 to start.
    'output_dim': 4,  ## wideboost param -- Y must be in a 2D format (ie not a vector of categories)
    'objective':'manybinary:logistic',  ## treat response columns as separate binary problems
    'eval_metric':['many_logloss']      ## average binary logloss across columns
}

num_round = 100
watchlist = [((xtrain, ytrain),'train'),((xtest, ytest),'test')]
wxgb_results = dict()
bst = wxgb.fit(xtrain, ytrain, param, num_round, watchlist, evals_result=wxgb_results, verbose_eval=10)
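The starter script converts integer class codes into a 2D matrix before fitting. As a rough illustration of what that format looks like, here is a plain NumPy sketch of one-hot encoding. The helper name `onehot_sketch` is hypothetical and this is not the `wxgb.onehot` implementation, only an example of the kind of matrix `output_dim` refers to.

```python
import numpy as np

def onehot_sketch(labels, num_classes=None):
    """Return a 2D 0/1 matrix with one column per class (illustrative helper)."""
    labels = np.asarray(labels).astype(int).ravel()
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.shape[0], num_classes))
    # Set a single 1 in each row, in the column given by that row's class code.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Class codes stored as a float column, like Y in the starter script before encoding.
y = np.array([[0.0], [3.0], [1.0], [2.0]])
Y2d = onehot_sketch(y, num_classes=4)
print(Y2d.shape)
```

With four classes the result has four columns, which matches 'output_dim': 4 in the parameter dictionary.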

Parameter Explanations

  • 'btype': how to initialize the beta matrix. One of 'I', 'In', 'R', 'Rn'.
  • 'beta_eta': learning rate for the beta matrix. Nonzero values can be unstable, so start with 0.
  • 'output_dim': width of Y. Y must be a 2D matrix, and one-hot encoded when doing categorical prediction.
  • 'extra_dims': integer number of "wide" dimensions to use. With 'extra_dims' set to 0, 'btype' set to 'I', and 'beta_eta' set to 0, wide boosting is equivalent to standard gradient boosting.
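The equivalence in the last bullet can be sketched with NumPy, assuming (as the referenced paper describes) that predictions are the product of the wide tree output F and the beta matrix B. The function `wide_predict` and the random values below are hypothetical stand-ins, and how each 'btype' setting actually fills B is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, output_dim = 5, 4

def wide_predict(F, B):
    """Map tree outputs F (n x p) to the response width via B (p x output_dim)."""
    return F @ B

# Case 1: extra_dims = 0 and B is the identity, so B changes nothing.
F_narrow = rng.normal(size=(n_rows, output_dim))
B_identity = np.eye(output_dim)
same = np.allclose(wide_predict(F_narrow, B_identity), F_narrow)

# Case 2: extra_dims = 1, so F has one more column than Y.
extra_dims = 1
F_wide = rng.normal(size=(n_rows, output_dim + extra_dims))
# Hypothetical B values; only the shape matters for this illustration.
B_wide = rng.normal(size=(output_dim + extra_dims, output_dim))
wide_shape = wide_predict(F_wide, B_wide).shape

print(same, wide_shape)
```

In the first case the output is identical to F, which is why the setting reduces to standard gradient boosting when B is also never updated ('beta_eta' of 0). In the second case B maps the wider output back to the width of Y.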

New Objectives

  • 'multi:squarederror': regression with multidimensional output.
  • 'manybinary:logistic': treats each response column as a separate binary problem. The loss is the logloss averaged across response columns.
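To make the two objectives concrete, the following NumPy sketch computes a squared-error loss over a 2D response and a logloss averaged across response columns. The function names are hypothetical, and the exact scaling wideboost uses (for example mean versus sum) is an assumption here.

```python
import numpy as np

def multi_squared_error(preds, Y):
    """Mean squared error over every entry of a 2D response (scaling is assumed)."""
    return float(np.mean((preds - Y) ** 2))

def many_binary_logloss(raw_scores, Y, eps=1e-12):
    """Binary logloss per response column, then averaged across columns."""
    p = 1.0 / (1.0 + np.exp(-raw_scores))   # sigmoid turns raw scores into probabilities
    p = np.clip(p, eps, 1.0 - eps)          # keep log() away from zero
    per_column = -np.mean(Y * np.log(p) + (1.0 - Y) * np.log(1.0 - p), axis=0)
    return float(per_column.mean())

Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
raw = np.zeros_like(Y)                      # raw score 0 means probability 0.5 everywhere
loss_logistic = many_binary_logloss(raw, Y)
loss_squared = multi_squared_error(Y, Y)
print(loss_logistic, loss_squared)
```

A raw score of 0 in every cell gives a logloss of log(2) in each column, so the column average is also log(2).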

New Evals

  • 'many_logloss': logloss averaged across response columns.
  • 'many_auc': AUC averaged across response columns.
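As an illustration of what "averaged across response columns" means for AUC, here is a small NumPy sketch. The helper names are hypothetical and this is not wideboost's implementation. It uses a simple pairwise comparison, which is easy to read but slow on large data.

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Assumes the column has at least one positive and one negative row.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def many_auc_sketch(Y, scores):
    """Compute AUC separately for each response column, then average."""
    per_column = [binary_auc(Y[:, j], scores[:, j]) for j in range(Y.shape[1])]
    return float(np.mean(per_column))

Y = np.array([[1, 1], [1, 1], [0, 0], [0, 0]])
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
avg_auc = many_auc_sketch(Y, scores)
print(avg_auc)
```

Here the first column is ranked perfectly and the second is ranked exactly backwards, so the per-column values are 1 and 0 and the average is 0.5.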

Reference

https://arxiv.org/pdf/2007.09855.pdf

Analyses included in the paper are in the examples/paper_examples/ folder.
