# WildWood

Scikit-learn compatible alternative Random Forest algorithms.

Documentation | Reproduce experiments
## Installation

The easiest way to install wildwood is using pip:

```shell
pip install wildwood
```

But you can also use the latest development version from GitHub directly with:

```shell
pip install git+https://github.com/pyensemble/wildwood.git
```
## Experiments

### Experiments with hyperparameter optimization

To run experiments with hyperparameter optimization, under the `experiments/` directory, use

```shell
python run_hyperopt_classfiers.py --clf_name WildWood --dataset_name adult
```

(with `WildWood` and on the `adult` dataset in this example).
Some options are:

- Setting `--n_estimators` or `-t` for the number of estimators (the maximal number of boosting iterations in the case of gradient boosting algorithms), default 100.
- Setting `--hyperopt_evals` or `-n` for the number of hyperopt steps, default 50.
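For illustration, the options above can be sketched with a small `argparse` parser. This is a hypothetical reconstruction based only on the flags and defaults listed in this README, not the actual argument handling in `run_hyperopt_classfiers.py`:

```python
import argparse

# Illustrative sketch of the CLI described above; the real script
# may differ in option names, types, and defaults beyond those stated.
parser = argparse.ArgumentParser(description="Hyperparameter optimization benchmark")
parser.add_argument("--clf_name", type=str, required=True,
                    help="classifier to benchmark, e.g. WildWood")
parser.add_argument("--dataset_name", type=str, required=True,
                    help="dataset to use, e.g. adult")
parser.add_argument("-t", "--n_estimators", type=int, default=100,
                    help="number of estimators (maximal number of boosting "
                         "iterations for gradient boosting algorithms)")
parser.add_argument("-n", "--hyperopt_evals", type=int, default=50,
                    help="number of hyperopt steps")

# Parse the example invocation from the text; unspecified options keep defaults.
args = parser.parse_args(["--clf_name", "WildWood", "--dataset_name", "adult"])
print(args.clf_name, args.dataset_name, args.n_estimators, args.hyperopt_evals)
# → WildWood adult 100 50
```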
### Experiments with default parameters

To run experiments with default parameters, under the `experiments/` directory, use

```shell
python run_benchmark_default_params_classifiers.py --clf_name WildWood --dataset_name adult
```

(with `WildWood` and on the `adult` dataset in this example).
### Datasets and classifiers

For both `run_hyperopt_classfiers.py` and `run_benchmark_default_params_classifiers.py`, the available options for `dataset_name` are:

- `adult`
- `bank`
- `breastcancer`
- `car`
- `cardio`
- `churn`
- `default-cb`
- `letter`
- `satimage`
- `sensorless`
- `spambase`
- `amazon`
- `covtype`
- `internet`
- `kick`
- `kddcup`
- `higgs`
while the available options for `clf_name` are:

- `LGBMClassifier`
- `XGBClassifier`
- `CatBoostClassifier`
- `RandomForestClassifier`
- `HistGradientBoostingClassifier`
- `WildWood`
### Experiments presented in the paper

All the scripts needed to reproduce the experiments from the paper are available in the `experiments/` folder:
- Figure 1 is produced using `fig_aggregation_effect.py`.
- Figure 2 is produced using `n_tree_experiment.py`.
- Tables 1 and 3 from the paper are produced using `run_hyperopt_classfiers.py`, with `n_estimators=5000` for gradient boosting algorithms and with `n_estimators=n` for RFn and WWn:
  - call
    ```shell
    python run_hyperopt_classfiers.py --clf_name <classifier> --dataset_name <dataset> --n_estimators <n_estimators>
    ```
    for each pair `(<classifier>, <dataset>)` to run hyperparameter optimization experiments;
  - use for example
    ```python
    import pickle as pkl

    filename = 'exp_hyperopt_xxx.pickle'
    with open(filename, "rb") as f:
        results = pkl.load(f)
    df = results["results"]
    ```
    to retrieve experiment information, such as AUC, log loss and their standard deviations.
- Tables 2 and 4 are produced using `benchmark_default_params.py`:
  - call
    ```shell
    python run_benchmark_default_params_classifiers.py --clf_name <classifier> --dataset_name <dataset>
    ```
    for each pair `(<classifier>, <dataset>)` to run experiments with default parameters;
  - use similar commands to retrieve experiment information.
- Figure 3 is produced by taking the experiment results (AUC and fit time) from `run_hyperopt_classfiers.py`, concatenating the resulting dataframes, and using `fig_auc_fit_time.py`.
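As a minimal, stdlib-only illustration of the retrieval-and-concatenation step above, the sketch below writes and reads back pickle files shaped like the `exp_hyperopt_xxx.pickle` results. The `"results"` key comes from the snippet in this README; the inner record layout (plain dicts with `dataset` and `auc` fields) is a hypothetical stand-in for the real pandas DataFrames:

```python
import glob
import os
import pickle as pkl
import tempfile

# Hypothetical payloads standing in for real experiment files; the real
# files store a pandas DataFrame under the "results" key.
runs = {
    "exp_hyperopt_RF.pickle": {"results": [{"dataset": "adult", "auc": 0.89}]},
    "exp_hyperopt_WildWood.pickle": {"results": [{"dataset": "adult", "auc": 0.90}]},
}

tmpdir = tempfile.mkdtemp()
for name, payload in runs.items():
    with open(os.path.join(tmpdir, name), "wb") as f:
        pkl.dump(payload, f)

# Load every experiment file and concatenate the per-run records,
# mirroring the "concatenate dataframes" step used for Figure 3.
records = []
for path in sorted(glob.glob(os.path.join(tmpdir, "exp_hyperopt_*.pickle"))):
    with open(path, "rb") as f:
        records.extend(pkl.load(f)["results"])

print(len(records))  # → 2
```

With real results, the same loop would collect DataFrames and `pandas.concat` would replace `records.extend`.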