Unofficial implementation of RolexBoost: A RotationBased Boosting Algorithm With Adaptive Loss Functions
Project description
RolexBoost
Unofficial implementation of D. Yang, H. Lee and D. Lim, "RolexBoost: A RotationBased Boosting Algorithm With Adaptive Loss Functions," in IEEE Access, vol. 8, pp. 4103741044, 2020, doi: 10.1109/ACCESS.2020.2976822.
This is the course project of Fundamentals of Machine Learning, Tsinghua University, 2020 Autumn.
Installation
pip install rolexboost
API reference
We provided scikitlearnlike API for the RolexBoost algorithm proposed in the paper, together with the RotationForest and FlexBoost, which are the source of RolexBoost's idea.
Note that
 Only classifiers are provided. We did not implemented the regressors because they are not mentioned in the paper, while this project is intended as a reproduction.
 We only ensures that the
fit
andpredict
API works well. Some others, such asscore
, may be functional thanks to the scikitlearnBaseEstimator
andClassifierMixin
base classes, but still others, such asfit_predict
orpredict_proba
, are currently unavailable.
We might implement those two in the future if someone is interested in this project.
Basic Example
>>> import pandas as pd
>>> import numpy as np
>>> from rolexboost import RolexBoostClassifier, FlexBoostClassifier, RotationForestClassifier
>>> clf = RolexBoostClassifier() # Or the other two classifiers
>>> df = pd.DataFrame({"A": [2,6,5,7,1,8], "B":[8,5,2,3,4,6], "C": [3,9,5,4,6,1], "Label": [0,1,1,0,0,1]})
>>> df
A B C Label
0 2 8 3 0
1 6 5 9 1
2 5 2 5 1
3 7 3 4 0
4 1 4 6 0
5 8 6 1 1
>>> X = df[["A", "B", "C"]]
>>> y = df["Label"]
>>> clf.fit(X, y)
RolexBoostClassifier()
>>> clf.predict(X)
array([0, 1, 1, 0, 0, 1], dtype=int64)
>>> test_X = np.array([
... [3,1,2],
... [2,5,1],
... [5,1,7]
... ])
>>> clf.predict(test_X)
array([1, 0, 1], dtype=int64)
API References
Rotation Forest
RotationForestClassifier(
n_estimators=100,
n_features_per_subset=3,
bootstrap_rate=0.75,
criterion="gini",
splitter="best",
max_depth=None,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.0,
max_features=None,
random_state=None,
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
class_weight=None,
presort="deprecated",
ccp_alpha=0.0,
)
n_estimators
: number of base estimatorsn_features_per_subset
: number of features in each subsetbootstrap_rate
: ratio of samples bootstrapped in the original dataset
All other parameters are passed to the DecisionTreeClassifier
of scikitlearn. Please refer to their documentation for details.
Note:
In the algorithm description, a parameter controls the number of subsets, and the number of features is derived from it.
However, the validation part of the paper says that "the number of features in each subset was set to three".
In our framework, the parameter n_samples_per_subset
is thus formulated in this way to make the benchmark evaluation easier.
FlexBoost
FlexBoostClassifier(
n_estimators=100,
K=0.5,
criterion="gini",
splitter="best",
max_depth=1,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.0,
max_features=None,
random_state=None,
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
class_weight=None,
presort="deprecated",
ccp_alpha=0.0,
)
n_estimators
: number of base estimatorsK
: the parameter to control the "aggressiveness" and "conservativeness" in the adaptive loss function choosing process. It should be a number between 0 and 1.
All other parameters are passed to the DecisionTreeClassifier
of scikitlearn. Please refer to their documentation for details.
The default parameter for max_depth
is 1, because FlexBoost is a modification of AdaBoost, and they should converge to the same result when K=1
.
In scikitlearn implementation of AdaBoost, the default max_depth
for the DecisionTreeClassifier is 1.
RolexBoost
RolexBoostClassifier(
n_estimators=100,
n_features_per_subset=3,
bootstrap_rate=0.75,
K=0.5,
criterion="gini",
splitter="best",
max_depth=1,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.0,
max_features=None,
random_state=None,
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
class_weight=None,
presort="deprecated",
ccp_alpha=0.0
)
n_estimators
: number of base estimatorsn_features_per_subset
: number of features in each subsetbootstrap_rate
: ratio of samples bootstrapped in the original datasetK
: the parameter to control the "aggressiveness" and "conservativeness" in the adaptive loss function choosing process. It should be a number between 0 and 1.
All other parameters are passed to the DecisionTreeClassifier
of scikitlearn. Please refer to their documentation for details.
Note:
In the algorithm description, a parameter controls the number of subsets, and the number of features is derived from it.
However, the validation part of the paper says that "the number of features in each subset was set to three".
In our framework, the parameter n_samples_per_subset
is thus formulated in this way to make the benchmark evaluation easier.
The default parameter for max_depth
is 1, because RolexBoost integrates the idea of FlexBoost. Please refer to the last section about why FlexBoost has a default max_depth
of 1.
Performance Benchmarks
We have tested the three algorithms on 13 datasets mentioned in the paper.
Here is the result:
algorithm  accuracy  benchmark  ratio 

RotationForest  0.7898  0.7947  0.9938 
FlexBoost  0.7976  0.8095  0.9853 
RolexBoost  0.7775  0.8167  0.9520 
accuracy
refers to the average accuracy of our implementationbenchmark
refers to the average accuracy reported in the paperratio
is accuracy/benchmark
For the detail of each algorithm on each dataset, please run the tests/accuracytest.py. The test may take ~1 hour to finish.
Some datasets reported in the paper are not involved in the benchmark testing for the following two reasons:
 We cannot find the corresponding dataset in the UCI Machine Learning Repository
 The 3class problems are each divided into three 2class problems. We are not sure about how such division is done.
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for rolexboost0.0.1py3noneany.whl
Algorithm  Hash digest  

SHA256  5cd2d26e0d1bba56f7a47b3d9e2452ac040425dff15844ef9521076b50fa9bb6 

MD5  0beb36740ebdbaf7a7715ff81e06be9b 

BLAKE2b256  d021bef1ef5afce77962e6eed8128395b14988ae7972aabd8361eb79ab0fbbe9 