
irf

Project description


iterative Random Forest

Details of the algorithm are available in:

Sumanta Basu, Karl Kumbier, James B. Brown, and Bin Yu, "Iterative random forests to discover predictive and stable high-order interactions," PNAS 115(8):1943-1948, 2018. https://www.pnas.org/content/115/8/1943

The implementation is a joint effort of several people at UC Berkeley; see Authors.md for the complete list. The weighted random forest implementation is based on the random forest source code and API design from scikit-learn; details can be found in "API design for machine learning software: experiences from the scikit-learn project" (Buitinck et al., 2013). The setup file is based on the setup file from skgarden.

Installation

Dependencies

The irf package requires:

  • Python (>= 3.3)
  • NumPy (>= 1.8.2)
  • SciPy (>= 0.13.3)
  • Cython
  • pydotplus
  • matplotlib
  • jupyter
  • pyyaml
  • scikit-learn (>= 0.22)

Before installing, please make sure the above Python packages are installed correctly, for example via pip:

pip install cython numpy scikit-learn pydotplus jupyter pyyaml matplotlib

Basic setup and installation

Installing the irf package is simple: just clone this repo and use pip install.

git clone https://github.com/Yu-Group/iterative-Random-Forest

Then go to the iterative-Random-Forest folder and use pip install:

pip install -e .

If irf is installed successfully, you should be able to see it using pip list:

pip list | grep irf

and you should be able to run all the tests (assuming the working directory is the root of the iterative-Random-Forest package):

python irf/tests/test_irf_utils.py
python irf/tests/test_irf_weighted.py

A simple demo

To use irf, first import it in Python:

import numpy as np
from irf import irf_utils
from irf.ensemble import RandomForestClassifierWithWeights

Generate a simple data set with 10 features, of which only the second (index 1) is informative: the remaining features are noise with no power to predict the labels, while the second feature determines the label perfectly:

n_samples = 1000
n_features = 10
X_train = np.random.uniform(low=0, high=1, size=(n_samples, n_features))
y_train = np.random.choice([0, 1], size=(n_samples,), p=[.5, .5])
X_test = np.random.uniform(low=0, high=1, size=(n_samples, n_features))
y_test = np.random.choice([0, 1], size=(n_samples,), p=[.5, .5])
# Make the second feature (index 1) perfectly predictive by adding the label to it
X_train[:, 1] = X_train[:, 1] + y_train
X_test[:, 1] = X_test[:, 1] + y_test
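
As a quick sanity check (a hedged sketch using plain scikit-learn, not part of the original demo), an ordinary random forest fit on this data should put nearly all of its importance on feature 1 and classify the test set almost perfectly:

from sklearn.ensemble import RandomForestClassifier

# Fit an ordinary random forest on the toy data
rf_check = RandomForestClassifier(n_estimators=50, random_state=0)
rf_check.fit(X_train, y_train)

# Feature 1 should dominate the importances; test accuracy should be close to 1.0
print(rf_check.feature_importances_)
print(rf_check.score(X_test, y_test))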

Then run iRF:

all_rf_weights, all_K_iter_rf_data, \
    all_rf_bootstrap_output, all_rit_bootstrap_output, \
    stability_score = irf_utils.run_iRF(X_train=X_train,
                                        X_test=X_test,
                                        y_train=y_train,
                                        y_test=y_test,
                                        K=5,                          # number of iterations
                                        rf=RandomForestClassifierWithWeights(n_estimators=20),
                                        B=30,                         # number of bootstrap samples
                                        random_state_classifier=2018, # random seed
                                        propn_n_samples=.2,           # proportion of samples per bootstrap
                                        bin_class_type=1,             # class whose leaf nodes feed the RIT
                                        M=20,                         # number of Random Intersection Trees
                                        max_depth=5,                  # max depth of each RIT
                                        noisy_split=False,
                                        num_splits=2,                 # children per RIT split
                                        n_estimators_bootstrap=5)     # trees per bootstrap forest

all_rf_weights stores the feature weights from each iteration; for example, the weights from the final (5th) iteration:

print(all_rf_weights['rf_weight5'])
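
To see which features the final iteration favours, one can rank them by weight (a minimal sketch, assuming the stored weights form a 1-D array of length n_features); feature 1 should come out on top:

# Rank features by their final-iteration weight (assumes a 1-D weight array)
weights = np.asarray(all_rf_weights['rf_weight5'])
print(np.argsort(weights)[::-1][:5])  # indices of the five most heavily weighted features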

The recovered feature interactions and their stability scores:

print(stability_score)
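
To keep only interactions that were recovered consistently across the B bootstrap samples, one could filter by a stability threshold (a minimal sketch, assuming stability_score is a dict mapping interaction strings to scores in [0, 1]; the 0.5 cutoff here is arbitrary):

# Keep interactions recovered in at least half of the bootstrap samples
stable_interactions = {interaction: score
                       for interaction, score in stability_score.items()
                       if score >= 0.5}
print(stable_interactions)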
