Implements Wide Boosting functions for popular boosting packages

wideboost

Implements wide boosting using popular boosting frameworks as a backend. XGBoost currently supports the most wideboost features. Previous versions also supported LightGBM, but that support has since been deprecated.

Getting started

pip install wideboost

Sample scripts

The examples folder contains sample scripts for regression, binary classification, multivariate classification and multioutput binary classification. Currently, XGBoost is the only supported backend.

Starter script

import xgboost as xgb
from wideboost.wrappers import wxgb
from pydataset import data
import numpy as np

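########
## Get and format the data
DAT = np.asarray(data('Yogurt'))
X = DAT[:,0:9]                  ## first nine columns are used as features
Y = np.zeros([X.shape[0],1])    ## rows not matched below stay in class 0
Y[DAT[:,9] == 'dannon'] = 1     ## column 9 holds the response category
Y[DAT[:,9] == 'hiland'] = 2
Y[DAT[:,9] == 'weight'] = 3
Y = wxgb.onehot(Y)              ## one-hot encode so Y is 2D ('output_dim' is 4 below)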

n = X.shape[0]
np.random.seed(123)
train_idx = np.random.choice(np.arange(n),round(n*0.4),replace=False)
test_idx = np.setdiff1d(np.arange(n),train_idx)

xtrain, ytrain = X[train_idx,:], Y[train_idx,]
xtest, ytest = X[test_idx,:],Y[test_idx,]
########

param = {
    'eta':0.1,
    'btype':'I',      ## wideboost param -- one of 'I', 'In', 'R', 'Rn'
    'extra_dims':1,   ## wideboost param -- integer >= -output_dim
    'beta_eta': 0.01, ## wideboost param -- learning rate for B. Can be unstable -- set to 0 to start.
    'output_dim': 4,  ## wideboost param -- Y must be in a 2D format (ie not a vector of categories)
    'objective':'manybinary:logistic',  ## treat response columns as separate binary problems
    'eval_metric':['many_logloss']      ## average binary logloss across columns
}

num_round = 100
watchlist = [((xtrain, ytrain),'train'),((xtest, ytest),'test')]
wxgb_results = dict()
bst = wxgb.fit(xtrain, ytrain, param, num_round, watchlist, evals_result=wxgb_results, verbose_eval=10)
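
The script above passes Y as a 2D one-hot matrix whose width matches 'output_dim'. As a rough illustration of that layout, here is a minimal NumPy sketch. `onehot_sketch` is a hypothetical helper written for this example; it is not the implementation of `wxgb.onehot`, which may differ in details.

```python
import numpy as np

def onehot_sketch(labels):
    """Hypothetical helper: turn integer class labels into a 2D 0/1 matrix."""
    labels = np.asarray(labels).astype(int).ravel()
    n_classes = labels.max() + 1                      # assumes labels run 0..K-1
    out = np.zeros((labels.shape[0], n_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0     # one 1 per row, in the label's column
    return out

Y_demo = onehot_sketch([0, 2, 1, 3])
print(Y_demo.shape)
```

Each row contains a single 1, so the number of columns is the value to pass as 'output_dim'.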

Parameter Explanations

  • 'btype' sets how the beta matrix is initialized. Options are 'I', 'In', 'R' and 'Rn'.
  • 'beta_eta' is the learning rate for the beta matrix. Nonzero values can make training unstable, so start with 0.
  • 'output_dim' is the width of Y. Y must always be a 2D matrix, and must be one-hot encoded for categorical prediction.
  • 'extra_dims' is an integer (>= -output_dim) giving the number of extra "wide" dimensions. When 'extra_dims' is 0, 'btype' is 'I' and 'beta_eta' is 0, wide boosting is equivalent to standard gradient boosting.
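
To make the role of these parameters more concrete, the following is a minimal NumPy sketch of the idea as we understand it from the referenced paper: the booster produces an output that is 'output_dim' + 'extra_dims' columns wide, and a beta matrix maps it back to the width of Y before the loss is applied. The names `F`, `B` and `wide_predict`, and the rectangular identity used for the wide case, are illustrative assumptions and not wideboost's internal code.

```python
import numpy as np

def wide_predict(F, B):
    """Map the booster output F (n x p) to the response width via B (p x d)."""
    return F @ B

rng = np.random.default_rng(0)
n, output_dim = 5, 4

# extra_dims = 0 with an identity B: predictions equal the raw booster output,
# which is the standard gradient boosting case.
F_std = rng.normal(size=(n, output_dim))
B_identity = np.eye(output_dim)
same = np.allclose(wide_predict(F_std, B_identity), F_std)

# extra_dims = 1: the booster output is one column wider than Y.
extra_dims = 1
F_wide = rng.normal(size=(n, output_dim + extra_dims))
B_wide = np.eye(output_dim + extra_dims, output_dim)  # assumed stand-in for an identity-style init
pred = wide_predict(F_wide, B_wide)
print(same, pred.shape)
```

In this sketch a 'beta_eta' of 0 would correspond to leaving `B` fixed at its initial value during training.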

New Objectives

  • 'multi:squarederror' squared-error regression with a multidimensional output.
  • 'manybinary:logistic' treats each response column as a separate binary problem. The loss is the binary logloss averaged across response columns.
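
As a rough illustration of what treating response columns as separate binary problems involves, the sketch below computes the elementwise gradient and Hessian of the binary logloss with respect to raw scores, which is the form gradient boosting objectives typically supply. `manybinary_logistic_sketch` is a hypothetical name; this is not wideboost's implementation, and it ignores the beta matrix entirely.

```python
import numpy as np

def manybinary_logistic_sketch(raw_scores, Y):
    """Gradient and Hessian of binary logloss, computed independently per column."""
    P = 1.0 / (1.0 + np.exp(-raw_scores))  # sigmoid applied elementwise
    grad = P - Y                            # first derivative w.r.t. the raw score
    hess = P * (1.0 - P)                    # second derivative, always positive
    return grad, hess

raw = np.zeros((3, 2))                      # raw score 0 means probability 0.5
Y_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
grad, hess = manybinary_logistic_sketch(raw, Y_demo)
print(grad.shape, hess.shape)
```

Because each column is handled on its own, a row of Y may contain several 1s, unlike the one-hot case.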

New Evals

  • 'many_logloss' binary logloss averaged across response columns
  • 'many_auc' AUC averaged across response columns
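
For intuition, column-averaged metrics of this kind can be sketched in NumPy as below. The function names are hypothetical, predictions are assumed to be probabilities, and the AUC uses a simple pairwise comparison, so this is not necessarily how wideboost computes its metrics.

```python
import numpy as np

def many_logloss_sketch(P, Y, eps=1e-12):
    """Binary logloss per column, then averaged across columns."""
    P = np.clip(P, eps, 1.0 - eps)          # avoid log(0)
    per_col = -np.mean(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P), axis=0)
    return per_col.mean()

def auc_sketch(p, y):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def many_auc_sketch(P, Y):
    """AUC per column, then averaged across columns."""
    return np.mean([auc_sketch(P[:, j], Y[:, j]) for j in range(Y.shape[1])])

Y_demo = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
P_demo = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [0.3, 0.4]])
print(many_logloss_sketch(P_demo, Y_demo), many_auc_sketch(P_demo, Y_demo))
```

The pairwise AUC is quadratic in the number of rows, which is fine for a small check but not for large evaluation sets.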

Reference

https://arxiv.org/pdf/2007.09855.pdf

Analyses included in the paper are in the examples/paper_examples/ folder.
