Random forest estimator

This is an experimental fork of Rforestry. For the package repo, see https://github.com/forestry-labs/Rforestry.

Rforestry: Random Forests, Linear Trees, and Gradient Boosting for Inference and Interpretability

Sören Künzel, Theo Saarinen, Simon Walter, Sam Antonyan, Edward Liu, Allen Tang, Jasjeet Sekhon

Introduction

Rforestry is a fast implementation of Honest Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability.

How to install - R Package

  1. The GFortran compiler has to be up to date. GFortran binaries can be found here.
  2. The devtools package has to be installed. You can install it using install.packages("devtools").
  3. The package contains compiled code, so you must have a development environment to install the development version. You can use devtools::has_devel() to check whether you do. If no development environment exists, Windows users should download and install Rtools, and macOS users should download and install Xcode.
  4. The latest development version can then be installed using devtools::install_github("forestry-labs/Rforestry"). Windows users need to skip 64-bit compilation by using devtools::install_github("forestry-labs/Rforestry", INSTALL_opts = c('--no-multiarch')), due to an outstanding gcc issue.

How to install - Python Package

The Python package must be compiled before it can be used. Compiling and linking the C++ version of forestry requires macOS or Linux and an installed C++ compiler. For example, one can run:

mkdir build
cd build
cmake ..
make
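
Before running the build steps above, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the package: the function name check_build_prerequisites is hypothetical, and the list of compiler names it looks for (c++, g++, clang++) is an assumption about common setups rather than something the project specifies.

```python
import platform
import shutil


def check_build_prerequisites():
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    # The build is described as requiring macOS (reported as "Darwin") or Linux.
    system = platform.system()
    if system not in ("Darwin", "Linux"):
        problems.append(f"unsupported operating system: {system}")

    # cmake and make are the two commands used in the build steps.
    for tool in ("cmake", "make"):
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")

    # Assumption: any one of these common C++ compiler names is sufficient.
    compilers = ("c++", "g++", "clang++")
    if not any(shutil.which(name) for name in compilers):
        problems.append(
            "no C++ compiler found on PATH (tried: " + ", ".join(compilers) + ")"
        )

    return problems


if __name__ == "__main__":
    issues = check_build_prerequisites()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Build prerequisites look OK")
```

An empty result only means the tools were found on PATH. It does not guarantee that the compiler version is recent enough for the build to succeed.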

Python Package Usage

Then the python code can be called:

import pandas as pd
from Rforestry import RandomForest
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris data into a DataFrame
data = load_iris()
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['target'] = data['target']

# Predict sepal length from the remaining columns
X = df.loc[:, df.columns != 'sepal length (cm)']
y = df['sepal length (cm)']

# Hold out a third of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

fr = RandomForest(ntree=500)

print("Fitting the forest")
fr.fit(X_train, y_train)

print("Predicting with the forest")
forest_preds = fr.predict(X_test)
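
Once predictions are available, one way to judge them is the root mean squared error against the held-out outcomes. The helper below is a small illustrative sketch and not part of the package: the name rmse is hypothetical, and it uses only the standard library so that it works on lists, pandas Series, and NumPy arrays alike.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Average the squared differences, then take the square root.
    squared_errors = [(float(t) - float(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

With the variables from the example above, this would be called as rmse(y_test, forest_preds), assuming predict returns one numeric prediction per test row.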

Plotting the forest

To visualize the trees, first install the dtreeviz Python library.

from dtreeviz.trees import *
from forestry_shadow import ShadowForestryTree


shadow_forestry = ShadowForestryTree(fr, X, y, X.columns.values, 'sepal length (cm)', tree_id=0)

viz = dtreeviz(shadow_forestry,
                scale=3.0,
                target_name='sepal length (cm)',
                feature_names=X.columns.values)

viz.view()

R Package Usage

library(Rforestry)

set.seed(292315)
test_idx <- sample(nrow(iris), 3)
x_train <- iris[-test_idx, -1]
y_train <- iris[-test_idx, 1]
x_test <- iris[test_idx, -1]

rf <- forestry(x = x_train, y = y_train, nthread = 2)

predict(rf, x_test)
