irf
iterative Random Forest
The algorithm details are available at:
Sumanta Basu, Karl Kumbier, James B. Brown, Bin Yu, Iterative Random Forests to detect predictive and stable high-order interactions, PNAS https://www.pnas.org/content/115/8/1943
The implementation is a joint effort of several people at UC Berkeley; see Authors.md for the complete list. The weighted random forest implementation is based on the random forest source code and API design from scikit-learn; details can be found in "API design for machine learning software: experiences from the scikit-learn project", Buitinck et al., 2013. The setup file is based on the setup file from skgarden (scikit-garden).
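In the iRF paper, the weighted random forest behind each iteration samples the candidate split features at every node with probability proportional to feature weights. The first iteration uses equal weights (a standard random forest), and later iterations reuse the feature importances from the previous fit. The sketch below shows only that sampling step in NumPy. It is not the package's Cython implementation. The function name `sample_split_candidates`, the `mtry` argument, and the example weights are hypothetical. It assumes NumPy >= 1.17 for `default_rng`.

```python
import numpy as np


def sample_split_candidates(weights, mtry, rng):
    """Draw `mtry` distinct feature indices, each with probability proportional to its weight.

    Hypothetical helper: uniform weights reproduce ordinary random-forest
    feature sampling, importance-based weights favour features that mattered
    in the previous iteration.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    probs = weights / weights.sum()
    # Sampling without replacement needs at least `mtry` non-zero weights,
    # otherwise NumPy raises ValueError.
    return rng.choice(len(weights), size=mtry, replace=False, p=probs)


rng = np.random.default_rng(0)
# Iteration 1: equal weights, i.e. a standard random forest.
uniform = sample_split_candidates(np.ones(10), mtry=3, rng=rng)
# A later iteration with made-up weights that favour feature 1.
reweighted = sample_split_candidates([0.02, 0.8] + [0.0225] * 8, mtry=3, rng=rng)
print(uniform, reweighted)
```

Because each draw is without replacement, a feature can appear at most once among the candidates at a node.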
Installation
Dependencies
The irf package requires:
- Python (>= 3.3)
- Numpy (>= 1.8.2)
- Scipy (>= 0.13.3)
- Cython
- pydotplus
- matplotlib
- jupyter
- pyyaml
- scikit-learn (>= 0.22)
Before installing irf, make sure the packages above are installed correctly, for example via pip:
pip install cython numpy scipy scikit-learn pydotplus jupyter pyyaml matplotlib
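To confirm that the minimum versions listed above are met, you could run a small check like the one below. It is a sketch that relies on the standard-library `importlib.metadata`, which needs Python 3.8 or newer even though irf itself lists Python >= 3.3. The helper names `version_tuple` and `check_requirements` are hypothetical. The version comparison is deliberately simple and only looks at the leading numeric parts of each version string.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above (None = any version).
# Keys are pip distribution names, e.g. "PyYAML" provides the yaml module.
REQUIREMENTS = {
    "numpy": "1.8.2",
    "scipy": "0.13.3",
    "scikit-learn": "0.22",
    "Cython": None,
    "pydotplus": None,
    "matplotlib": None,
    "jupyter": None,
    "PyYAML": None,
}


def version_tuple(version):
    """Convert '1.26.4' or '1.0rc1' into a tuple of leading integers, e.g. (1, 26, 4)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at pre-release suffixes such as 'rc1'
            break
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of problems; an empty list means every requirement is met."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum is not None and version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all listed dependencies found"]:
        print(line)
```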
Basic setup and installation
Installing the irf package is simple: clone this repository and install it with pip.
git clone https://github.com/Yu-Group/iterative-Random-Forest
Then go to the iterative-Random-Forest folder and install the package with pip:
cd iterative-Random-Forest
pip install -e .
If irf is installed successfully, you should be able to see it using pip list:
pip list | grep irf
You should also be able to run all the tests (assuming the working directory is the iterative-Random-Forest folder):
python irf/tests/test_irf_utils.py
python irf/tests/test_irf_weighted.py
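The `pip list | grep irf` check relies on grep, which is not available on every system. A Python alternative is sketched below. It asks the import system whether a module named `irf` can be found and, if so, reads the installed distribution's version. The helper name `install_status` is hypothetical. It assumes Python 3.8+ for `importlib.metadata` and that the PyPI distribution is named `irf`, as the file names below suggest.

```python
import importlib.util
from importlib import metadata


def install_status(module="irf", dist="irf"):
    """Return None if `module` cannot be imported, else the installed version of `dist`."""
    if importlib.util.find_spec(module) is None:
        return None
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        # Importable (e.g. found on the current path) but without package metadata.
        return "unknown version"


print("irf:", install_status())
```

Running it from inside the cloned folder can find the local `irf` directory even without an install, so run it from another directory for a stricter check.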
A simple demo
To use irf, import it in Python:
import numpy as np
from irf import irf_utils
from irf.ensemble import RandomForestClassifierWithWeights
Generate a simple data set with 10 features: every feature is uniform noise with no power to predict the labels, except the 2nd feature (index 1), which is shifted by the label and therefore determines it perfectly:
n_samples = 1000
n_features = 10
X_train = np.random.uniform(low=0, high=1, size=(n_samples, n_features))
y_train = np.random.choice([0, 1], size=(n_samples,), p=[.5, .5])
X_test = np.random.uniform(low=0, high=1, size=(n_samples, n_features))
y_test = np.random.choice([0, 1], size=(n_samples,), p=[.5, .5])
# The second feature (which is indexed by 1) is very important
X_train[:, 1] = X_train[:, 1] + y_train
X_test[:, 1] = X_test[:, 1] + y_test
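Before running iRF, you can check that feature 1 really separates the classes. The sketch rebuilds the training data the same way as the demo, but adds a fixed seed (`default_rng(2018)`, NumPy >= 1.17) so the check is repeatable. It then applies the threshold rule `X[:, 1] >= 1`.

```python
import numpy as np

# Same construction as the demo, with a fixed seed added for repeatability.
rng = np.random.default_rng(2018)
n_samples, n_features = 1000, 10
X_train = rng.uniform(low=0, high=1, size=(n_samples, n_features))
y_train = rng.choice([0, 1], size=n_samples, p=[.5, .5])
X_train[:, 1] = X_train[:, 1] + y_train

# uniform() samples from [0, 1), so feature 1 lies in [0, 1) when y = 0
# and in [1, 2) when y = 1: the threshold rule recovers every label.
predicted = (X_train[:, 1] >= 1).astype(int)
print("threshold-rule accuracy on feature 1:", np.mean(predicted == y_train))
```

The other nine columns are independent of `y_train`, so no single threshold on them should do much better than chance. iRF should single out feature 1.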
Then run iRF:
all_rf_weights, all_K_iter_rf_data, \
    all_rf_bootstrap_output, all_rit_bootstrap_output, \
    stability_score = irf_utils.run_iRF(X_train=X_train,
                                        X_test=X_test,
                                        y_train=y_train,
                                        y_test=y_test,
                                        K=5,  # number of iterations
                                        rf=RandomForestClassifierWithWeights(n_estimators=20),
                                        B=30,  # number of bootstrap replicates
                                        random_state_classifier=2018,  # random seed
                                        propn_n_samples=.2,
                                        bin_class_type=1,
                                        M=20,  # number of random intersection trees (RITs)
                                        max_depth=5,  # maximum depth of each RIT
                                        noisy_split=False,
                                        num_splits=2,
                                        n_estimators_bootstrap=5)
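all_rf_weights stores the feature weights computed in each iteration; for example, the weights produced by the fifth (final, since K=5) iteration are stored under the key 'rf_weight5':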
print(all_rf_weights['rf_weight5'])
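To see which features the final iteration favours, you can sort a weight vector from largest to smallest. The sketch assumes each entry such as `all_rf_weights['rf_weight5']` is a 1-D array with one importance per feature. The helper `rank_features` and the three-feature example vector are hypothetical and are not output from the demo.

```python
import numpy as np


def rank_features(weights, top=None):
    """Return (feature index, weight) pairs from largest to smallest weight.

    Ties keep ascending feature-index order because the sort is stable.
    """
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(-weights, kind="stable")
    if top is not None:
        order = order[:top]
    return [(int(i), float(weights[i])) for i in order]


# Made-up weights standing in for all_rf_weights['rf_weight5'].
print(rank_features([0.1, 0.6, 0.3]))  # [(1, 0.6), (2, 0.3), (0, 0.1)]
```

With the demo data, feature 1 would be expected near the top of this ranking.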
The proposed feature combinations (interactions) and their stability scores:
print(stability_score)
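The iRF paper defines the stability score of an interaction as the fraction of the B bootstrap replicates in which random intersection trees recover it. The sketch below computes that fraction from per-replicate interaction lists. It also sorts any `{interaction: score}` dict, which is what `stability_score` is assumed to be here. The function names and the replicate data are hypothetical and made up for illustration.

```python
from collections import Counter


def stability_scores(bootstrap_interactions):
    """Fraction of bootstrap replicates in which each interaction was recovered.

    bootstrap_interactions: one iterable of interactions per replicate; each
    interaction is any hashable label, e.g. a tuple of feature indices.
    """
    B = len(bootstrap_interactions)
    counts = Counter()
    for interactions in bootstrap_interactions:
        counts.update(set(interactions))  # count each interaction at most once per replicate
    return {interaction: count / B for interaction, count in counts.items()}


def top_interactions(scores, k=5):
    """Sort an {interaction: score} dict by descending score and keep the first k."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up RIT output for B = 3 replicates (illustrative only, not from the demo).
replicates = [
    [(1,), (1, 3)],
    [(1,)],
    [(1,), (2, 5)],
]
scores = stability_scores(replicates)
print(top_interactions(scores, k=1))
```

A score of 1.0 means the interaction was found in every bootstrap replicate, which is why the most stable interactions are read from the top of the sorted list.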
Download files
Source Distribution: irf-0.2.5.tar.gz
Built Distribution: irf-0.2.5-cp37-cp37m-macosx_11_0_x86_64.whl
File details
Details for the file irf-0.2.5.tar.gz
File metadata
- Download URL: irf-0.2.5.tar.gz
- Upload date:
- Size: 5.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 275f2fcff0a1ffb4ae4296ff206c4df5c30553b6ff107c7ac7ce7c9240f7e949
MD5 | fb8fccd9341a461f002056f299a6dedc
BLAKE2b-256 | e22ae0ee927e1f6d2be4efb18f5700fa7ffd95a9359ac721e250781fea689e32
File details
Details for the file irf-0.2.5-cp37-cp37m-macosx_11_0_x86_64.whl
File metadata
- Download URL: irf-0.2.5-cp37-cp37m-macosx_11_0_x86_64.whl
- Upload date:
- Size: 243.9 kB
- Tags: CPython 3.7m, macOS 11.0+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 39294676ebc69b78d55eb70e6b2ad3126c0ba2b341fb66cab75a1411f4c5a9e6
MD5 | 559625d86c6c003b06111acb029d959a
BLAKE2b-256 | 925e1066aff72ee8853aba72e86642595f96f762f801af3b68d56048436ad65d