Implements Wide Boosting functions for popular boosting packages
wideboost
Implements wide boosting using popular boosting frameworks as a backend. XGBoost currently supports the most wideboost features. Earlier versions also supported LightGBM, but that support has since been deprecated.
Getting started
pip install wideboost
Sample scripts
The examples folder contains sample scripts for regression, binary classification, multivariate classification and multioutput binary classification. Currently xgboost is the only supported backend.
Starter script
import xgboost as xgb
from wideboost.wrappers import wxgb
from pydataset import data
import numpy as np
########
## Get and format the data
DAT = np.asarray(data('Yogurt'))
X = DAT[:,0:9]                     ## first nine columns are the features
Y = np.zeros([X.shape[0],1])       ## category codes; rows not matched below keep code 0
Y[DAT[:,9] == 'dannon'] = 1
Y[DAT[:,9] == 'hiland'] = 2
Y[DAT[:,9] == 'weight'] = 3
Y = wxgb.onehot(Y)                 ## wideboost needs Y as a 2D matrix

## Random split: 40% of the rows for training, the rest for testing
n = X.shape[0]
np.random.seed(123)
train_idx = np.random.choice(np.arange(n),round(n*0.4),replace=False)
test_idx = np.setdiff1d(np.arange(n),train_idx)
xtrain, ytrain = X[train_idx,:], Y[train_idx,]
xtest, ytest = X[test_idx,:],Y[test_idx,]
########
param = {
    'eta':0.1,
    'btype':'I',          ## wideboost param -- one of 'I', 'In', 'R', 'Rn'
    'extra_dims':1,       ## wideboost param -- integer >= -output_dim
    'beta_eta': 0.01,     ## wideboost param -- learning rate for B. Can be unstable -- set to 0 to start.
    'output_dim': 4,      ## wideboost param -- Y must be in a 2D format (ie not a vector of categories)
    'objective':'manybinary:logistic',   ## treat response columns as separate binary problems
    'eval_metric':['many_logloss']       ## average binary logloss across columns
}
num_round = 100
## Each watchlist entry pairs an (X, Y) tuple with a display name
watchlist = [((xtrain, ytrain),'train'),((xtest, ytest),'test')]
wxgb_results = dict()              ## filled with the evaluation history during training
bst = wxgb.fit(xtrain, ytrain, param, num_round, watchlist, evals_result=wxgb_results, verbose_eval=10)
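The starter script passes the category codes through wxgb.onehot so that Y becomes a 2D matrix with one column per class. As a rough illustration of that target format, the sketch below builds such a matrix with plain NumPy. The helper name onehot_sketch is hypothetical and is not part of wideboost; the library's own function may differ in details such as dtype or how it handles missing classes.

```python
import numpy as np

def onehot_sketch(labels):
    """Hypothetical helper: map integer class codes to a 2D indicator matrix."""
    codes = np.asarray(labels).astype(int).ravel()   # accepts shape (n,) or (n, 1)
    n_classes = codes.max() + 1                      # assumes codes run from 0 to max
    out = np.zeros((codes.shape[0], n_classes))
    out[np.arange(codes.shape[0]), codes] = 1.0      # exactly one 1 per row
    return out

# Four observations with codes 0..3, stored as a column like Y in the starter script
Y_demo = onehot_sketch([[0], [2], [1], [3]])
print(Y_demo.shape)   # (4, 4)
```

With four classes the result has four columns, which is the value the starter script passes as 'output_dim'.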
Parameter Explanations
- 'btype': how to initialize the beta matrix. Settings are 'I', 'In', 'R', 'Rn'.
- 'beta_eta': learning rate for the beta matrix. Sometimes unstable. Start with 0.
- 'output_dim': width of Y. All Y need to be in 2D matrix format and one-hot encoded if doing categorical prediction.
- 'extra_dims': integer indicating how many "wide" dimensions are used. When 'extra_dims' is set to 0 (and 'btype' is set to 'I' and 'beta_eta' is 0), wide boosting is equivalent to standard gradient boosting.
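The equivalence with standard gradient boosting can be illustrated with a small NumPy sketch. It assumes that a wide boosting prediction takes the form F(x)B, where F(x) is the raw ensemble output of width output_dim + extra_dims and B is the beta matrix, and that 'I' means an identity initialization. How wideboost pads the identity when extra_dims is nonzero is an assumption here (zero rows), and wide_predict is a hypothetical name, not a wideboost function.

```python
import numpy as np

def wide_predict(F, B):
    """Combine the wide ensemble output F (n x width) with B (width x output_dim)."""
    return F @ B

rng = np.random.default_rng(0)
n, output_dim = 5, 4

# Case 1: extra_dims = 0, so B is the output_dim x output_dim identity.
# With beta_eta = 0, B is never updated, and the prediction is just F.
F0 = rng.normal(size=(n, output_dim))        # stand-in for the raw booster output
B0 = np.eye(output_dim)
pred0 = wide_predict(F0, B0)
print(np.allclose(pred0, F0))                # True

# Case 2: extra_dims = 1, so F has one extra column and B maps it back to output_dim.
extra_dims = 1
F1 = rng.normal(size=(n, output_dim + extra_dims))
B1 = np.eye(output_dim + extra_dims, output_dim)   # assumed zero-padded identity
pred1 = wide_predict(F1, B1)
print(pred1.shape)                           # (5, 4)
```

In the first case the matrix product changes nothing, which is why those three settings together reduce to ordinary boosting on the raw outputs.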
New Objectives
- 'multi:squarederror': multidimensional output regression.
- 'manybinary:logistic': loss is the independent logloss averaged across response columns.
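To make the 'manybinary:logistic' description concrete, here is a minimal NumPy sketch of a column-wise logistic loss and its first and second derivatives with respect to the raw scores. It assumes each response column is treated as its own binary problem with a sigmoid link. The function names are hypothetical, and the exact scaling and clipping used inside wideboost may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def manybinary_logloss_sketch(Y, logits, eps=1e-12):
    """Binary logloss per column, then averaged across columns."""
    P = sigmoid(logits)
    per_column = -np.mean(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps), axis=0)
    return per_column.mean()

def manybinary_grad_hess_sketch(Y, logits):
    """Elementwise derivatives of the summed logloss w.r.t. the raw scores."""
    P = sigmoid(logits)
    grad = P - Y            # first derivative
    hess = P * (1.0 - P)    # second derivative
    return grad, hess

Y_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
logits_demo = np.zeros_like(Y_demo)          # every predicted probability is 0.5
loss_demo = manybinary_logloss_sketch(Y_demo, logits_demo)
grad_demo, hess_demo = manybinary_grad_hess_sketch(Y_demo, logits_demo)
print(round(float(loss_demo), 4))            # 0.6931, i.e. log(2)
```

Because the columns are independent, the gradient for one column never depends on the labels or scores of another column.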
New Evals
- 'many_logloss': logloss averaged across response columns.
- 'many_auc': AUC averaged across response columns.
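As an illustration of what "averaged across response columns" means for 'many_auc', the sketch below computes a binary AUC per column by comparing every positive score with every negative score, then takes the mean over columns. The function names are hypothetical and this is not wideboost's implementation. A 'many_logloss' analogue would swap the per-column function for a binary logloss.

```python
import numpy as np

def binary_auc_sketch(y, score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = score[y == 1]
    neg = score[y == 0]
    diff = pos[:, None] - neg[None, :]       # all positive-vs-negative score differences
    return np.mean((diff > 0) + 0.5 * (diff == 0))

def many_auc_sketch(Y, scores):
    """Average the per-column AUC; each column needs at least one 0 and one 1."""
    per_column = [binary_auc_sketch(Y[:, j], scores[:, j]) for j in range(Y.shape[1])]
    return float(np.mean(per_column))

Y_demo = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
scores_demo = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.1]])
auc_demo = many_auc_sketch(Y_demo, scores_demo)
print(auc_demo)   # 0.75: column 0 scores 1.0, column 1 scores 0.5
```

The pairwise comparison is quadratic in the number of rows, so it suits small checks like this one rather than large evaluation sets.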
Reference
https://arxiv.org/pdf/2007.09855.pdf
Analyses included in the paper are in the examples/paper_examples/ folder.
File details
Details for the file wideboost-0.4.2.tar.gz.
File metadata
- Download URL: wideboost-0.4.2.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | faf78d5cdaa1a1d7457eb7581faff9e0d872a6ad7777f16a9e1442f92521e9a9
MD5 | b59b1fbad7af53ab854ce1896bd09c7e
BLAKE2b-256 | 88b3ab19ac02f35288f62f11827b04222166e5af7021b9d51d8c2a557b4a15be
File details
Details for the file wideboost-0.4.2-py3-none-any.whl.
File metadata
- Download URL: wideboost-0.4.2-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7b17d08a5b3e304e7e028c5e5232df1eaf79ef1252073728859f33d5a88dc880
MD5 | f09eef7778bcb5492484490376921cf8
BLAKE2b-256 | ba7a7f5071ae71a7661ca3952244fc122fa12eb0428cbfa633030b54a5c1f9b0