Scikit-learn Wrapper for Regularized Greedy Forest

rgf_python

A Python wrapper for the Regularized Greedy Forest (RGF) machine learning algorithm.

Features

Scikit-learn interface and support for multiclass classification problems.

The original RGF implementation supports only regression and binary classification, but rgf_python also handles multiclass classification by using the “One-vs-Rest” method.
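To make the “One-vs-Rest” idea concrete, here is a minimal sketch in plain Python. It is not rgf_python's own code: CentroidScorer is a hypothetical stand-in for a binary model, and fit_one_vs_rest and predict_one_vs_rest are illustrative names. One binary scorer is trained per class (that class against all others), and the class whose scorer gives the highest score wins.

```python
import math


class CentroidScorer:
    """Toy binary scorer (assumption, stands in for a binary RGF model)."""

    def fit(self, X, y_binary):
        pos = [x for x, label in zip(X, y_binary) if label == 1]
        neg = [x for x, label in zip(X, y_binary) if label == 0]
        # Per-feature mean of the positive and negative samples.
        self.pos_centroid = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg_centroid = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def decision_function(self, x):
        # Higher score means x is closer to the positive class than to the rest.
        return math.dist(x, self.neg_centroid) - math.dist(x, self.pos_centroid)


def fit_one_vs_rest(X, y):
    """Train one binary scorer per class: that class (1) versus the rest (0)."""
    models = {}
    for cls in sorted(set(y)):
        y_binary = [1 if label == cls else 0 for label in y]
        models[cls] = CentroidScorer().fit(X, y_binary)
    return models


def predict_one_vs_rest(models, x):
    """Return the class whose binary scorer is most confident about x."""
    return max(models, key=lambda cls: models[cls].decision_function(x))


X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]]
y = [0, 0, 1, 1, 2, 2]
models = fit_one_vs_rest(X, y)
predictions = [predict_one_vs_rest(models, x) for x in X]
print(predictions)
```

With three classes, three binary problems are solved, so training cost grows roughly linearly with the number of classes.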

Example:

from sklearn import datasets
from sklearn.utils.validation import check_random_state
from sklearn.model_selection import StratifiedKFold, cross_val_score
from rgf.sklearn import RGFClassifier

iris = datasets.load_iris()
rng = check_random_state(0)
perm = rng.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

rgf = RGFClassifier(max_leaf=400,
                    algorithm="RGF_Sib",
                    test_interval=100,
                    verbose=True)

n_folds = 3

rgf_scores = cross_val_score(rgf,
                             iris.data,
                             iris.target,
                             cv=StratifiedKFold(n_folds))

rgf_score = sum(rgf_scores)/n_folds
print('RGF Classifier score: {0:.5f}'.format(rgf_score))

More examples can be found here.

Software Requirements

  • Python (2.7 or >= 3.4)

  • scikit-learn (>= 0.18)

  • RGF C++ (link)

If you can’t access the above URL, you can instead download RGF C++ from this page. See the README in the zip file for instructions on building the RGF executable.

Installation

From PyPI using pip:

pip install rgf_python

or from GitHub:

git clone https://github.com/fukatani/rgf_python.git
cd rgf_python
python setup.py install

Place the RGF executable in a directory that is listed in the PATH environment variable. Alternatively, you can specify the location of the RGF executable and the directory for temporary files with the corresponding flags in a .rgfrc configuration file, which you should create in your home directory. The default values are platform-dependent: on Windows, exe_location=$HOME/rgf.exe and temp_location=$HOME/temp/rgf; on other platforms, exe_location=$HOME/rgf and temp_location=/tmp/rgf. Here is an example .rgfrc file:

exe_location=C:/Program Files/RGF/bin/rgf.exe
temp_location=C:/Program Files/RGF/temp
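The key=value format above is simple enough to read by hand. The following sketch shows one way such a file could be parsed, with defaults filled in for missing keys. It is an assumption for illustration, not how rgf_python itself reads the file: read_rgfrc is a hypothetical helper, and the defaults shown are the non-Windows ones described above.

```python
import os
import tempfile


def read_rgfrc(path, defaults):
    """Parse key=value lines from path; keys not in the file keep their defaults.

    Hypothetical helper for illustration, not part of rgf_python.
    """
    settings = dict(defaults)
    if not os.path.isfile(path):
        return settings
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or "=" not in line:
                continue  # skip blank or malformed lines
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


# Non-Windows defaults as described in the text above.
defaults = {
    "exe_location": os.path.join(os.path.expanduser("~"), "rgf"),
    "temp_location": "/tmp/rgf",
}

# Demonstrate on a temporary file instead of touching the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    rc_path = os.path.join(tmp, ".rgfrc")
    with open(rc_path, "w") as f:
        f.write("exe_location=C:/Program Files/RGF/bin/rgf.exe\n")
    settings = read_rgfrc(rc_path, defaults)
    missing = read_rgfrc(os.path.join(tmp, "absent"), defaults)

print(settings)
```

Splitting on the first "=" only keeps values that contain spaces or further "=" characters intact, which matters for paths such as C:/Program Files.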

Tuning Hyper-parameters

You can tune hyper-parameters as follows.

  • max_leaf: Appropriate values are data-dependent and usually range from 1000 to 10000.

  • test_interval: For efficiency, it should be either a multiple or a divisor of 100 (the default value of the optimization interval).

  • algorithm: You can select “RGF”, “RGF_Opt” or “RGF_Sib”.

  • loss: You can select “LS”, “Log” or “Expo”.

  • reg_depth: Must be no smaller than 1. Meant to be used with algorithm = “RGF_Opt” or “RGF_Sib”.

  • l2: Either 1, 0.1, or 0.01 often produces good results, though with exponential loss (loss = “Expo”) and logistic loss (loss = “Log”) some data requires smaller values such as 1e-10 or 1e-20.

  • sl2: Default value is equal to l2. On some data, l2/100 works well.

  • normalize: If turned on, training targets are normalized so that the average becomes zero.

  • min_samples_leaf: Smaller values may slow down training. Values that are too large may degrade model accuracy.

  • n_iter: Number of iterations of coordinate descent to optimize weights.

  • n_tree_search: Number of trees to be searched for the nodes to split. The most recently grown trees are searched first.

  • opt_interval: Weight optimization interval in terms of the number of leaf nodes.

  • learning_rate: Step size of Newton updates used in coordinate descent to optimize weights.

Detailed instructions for tuning hyper-parameters are here.
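As a rough sketch of how the guidance above could be turned into a search grid, the snippet below enumerates candidate settings in plain Python. The helper is_efficient_test_interval and the specific grid values are assumptions chosen for illustration. Candidates whose test_interval is neither a multiple nor a divisor of the optimization interval are dropped, and sl2 is set with the l2/100 heuristic.

```python
from itertools import product

OPT_INTERVAL = 100  # default optimization interval mentioned above


def is_efficient_test_interval(test_interval, opt_interval=OPT_INTERVAL):
    """True if test_interval is a multiple or a divisor of opt_interval."""
    return test_interval % opt_interval == 0 or opt_interval % test_interval == 0


# Illustrative grid (assumed values, taken from the ranges suggested above).
grid = {
    "max_leaf": [1000, 5000, 10000],
    "l2": [1.0, 0.1, 0.01],
    "test_interval": [50, 100, 150, 200],
}

candidates = []
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    if not is_efficient_test_interval(params["test_interval"]):
        continue  # e.g. 150 is neither a multiple nor a divisor of 100
    params["sl2"] = params["l2"] / 100  # the l2/100 heuristic from the list above
    candidates.append(params)

print(len(candidates))
```

Each dictionary in candidates could then be tried as keyword arguments to the estimator, for example RGFClassifier(**params), and scored with cross_val_score as in the earlier example.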

Using in Kaggle Kernels

Kaggle Kernels now support rgf_python. Please see this page.

Other

Shamelessly, much of the implementation is based on the following code. Thanks!

Source Distribution

rgf_python-1.3.1.tar.gz (10.2 kB)