
Build decision trees and random forests for classification and regression.

Project description

Description

Build random forests for classification and regression problems. The same program is available on CRAN for R users.

Installation

For Python:

pip install brif

For R:

install.packages('brif')

To use on Google Colab:

!pip install brif
from brif import brif

Examples

from brif import brif
import pandas as pd

# Create a brif object with default parameters.
bf = brif.brif()  

# Display the current parameter values. 
bf.get_param()  

# To change certain parameter values, e.g.:
bf.set_param({'ntrees':100, 'nthreads':2})

# Or simply:
bf.ntrees = 200

# Load the input data. It must be a pandas DataFrame with column headers.
df = pd.read_csv("auto.csv")

# Train the model
bf.fit(df, 'origin')  # specify the target column name

# Or equivalently
bf.fit(df, 7)  # specify the 0-based index of the target column

# Make predictions
# The target column must be excluded, and all other columns must appear in the same order as in training.
# Here, predict the first 10 rows of df
pred_labels = bf.predict(df.iloc[0:10, 0:7], type='class')  # returns a list of predicted class labels
pred_scores = bf.predict(df.iloc[0:10, 0:7], type='score')  # returns a data frame of predicted probabilities by class

# Note: for a regression problem (i.e., when the target variable is numeric), predict always returns a list of predicted values.
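Because `predict` expects every training column except the target, in the training order, a mismatch is easy to catch before scoring. The helper below is hypothetical (brif does not ship `check_pred_columns`); it is a minimal sketch that works on plain lists of column names such as `list(df.columns)` and accepts the target either by name or by 0-based index, mirroring the two `fit` calls above.

```python
def check_pred_columns(train_columns, target, pred_columns):
    """Raise ValueError unless pred_columns equals train_columns minus the target, in order.

    Hypothetical helper, not part of brif. `target` may be a column name or a
    0-based column index, like the second argument of fit().
    """
    train_columns = list(train_columns)
    if isinstance(target, int):
        target = train_columns[target]  # translate the index into a column name
    expected = [c for c in train_columns if c != target]
    if list(pred_columns) != expected:
        raise ValueError(f"expected columns {expected}, got {list(pred_columns)}")
    return expected
```

For the example above, `check_pred_columns(list(df.columns), 'origin', list(df.iloc[0:10, 0:7].columns))` should pass, assuming `origin` is the last column of `auto.csv`.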

Parameters

tmp_preddata a character string specifying a filename to save the temporary scoring data. Default is "tmp_brif_preddata.txt".

n_numeric_cuts an integer value indicating the maximum number of split points to generate for each numeric variable.

n_integer_cuts an integer value indicating the maximum number of split points to generate for each integer variable.

max_integer_classes an integer value. If the target variable is integer and has more than max_integer_classes unique values in the training data, it is grouped into max_integer_classes bins. If the target variable is numeric, it is grouped into min(max_integer_classes, number of unique values) bins, and the regression problem is solved as a classification problem.

max_depth an integer specifying the maximum depth of each tree. Maximum is 40.

min_node_size an integer specifying the minimum number of training cases a leaf node must contain.

ntrees an integer specifying the number of trees in the forest.

ps an integer indicating the number of predictors to sample at each node split. Default is 0, meaning to use sqrt(p), where p is the number of predictors in the input.

max_factor_levels an integer. If any factor variable has more than max_factor_levels levels, the program stops and prompts the user to increase this value if that many levels is indeed intended.

bagging_method an integer indicating the bagging sampling method: 0 for sampling without replacement; 1 for sampling with replacement (bootstrapping).

bagging_proportion a numeric scalar between 0 and 1, indicating the proportion of training observations to be used in each tree.

split_search an integer indicating the choice of the split search method. 0: randomly pick a split point; 1: do a local search; 2: random pick subject to regulation; 3: local search subject to regulation; 4 or above: a mix of options 0 to 3.

search_radius a positive integer indicating the split point search radius. This parameter takes effect only in the self-regulating local search (split_search = 2 or above).

seed a positive integer used as the random number generator seed.

nthreads an integer specifying the number of threads used by the program. This parameter takes effect only on systems supporting OpenMP.

vote_method an integer (0 or 1) specifying the voting method in prediction. 0: each leaf contributes its raw class counts, and the average is taken over the counts summed across all leaves; 1: each leaf contributes its within-node class fractions, which are then averaged over all leaves with equal weight.

na_numeric a numeric value used to replace missing values (NaN) in numeric variables.

na_integer an integer value used to replace missing values (NaN) in integer variables.

na_factor a character string used to replace missing values in factor variables.

type a character string indicating the return content of the predict function. For a classification problem, "score" means the by-class probabilities and "class" means the class labels (i.e., the target variable levels). For regression, the predicted values are returned. This is a parameter for the predict function, not an attribute of the brif object.
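The `n_numeric_cuts` and `n_integer_cuts` limits cap how many candidate split points are considered per variable. How brif chooses those points is not described here, so the sketch below assumes one common approach: take midpoints between adjacent distinct values and, when there are too many, keep an evenly spaced subset. `candidate_cuts` is a hypothetical name.

```python
def candidate_cuts(values, max_cuts):
    """Return at most max_cuts candidate split points for one variable.

    Assumed rule (brif's may differ): midpoints between adjacent distinct
    values, thinned to max_cuts evenly spaced positions when there are more.
    """
    if max_cuts <= 0:
        return []
    distinct = sorted(set(values))
    midpoints = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    if len(midpoints) <= max_cuts:
        return midpoints
    step = len(midpoints) / max_cuts  # > 1 here, so the chosen indices are distinct
    return [midpoints[int(i * step)] for i in range(max_cuts)]
```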
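For `max_integer_classes`, the number of target bins is the smaller of `max_integer_classes` and the number of distinct target values. The sketch below shows that rule; the bin boundaries are an assumption (distinct values are split by rank into groups of near-equal size), and `bin_target` is a hypothetical helper, not a brif function.

```python
def bin_target(y, max_classes):
    """Map each target value to a bin label 0 .. n_bins - 1.

    n_bins = min(max_classes, number of distinct values), as documented.
    Assumption: distinct values are grouped by rank into near-equal-sized bins.
    """
    distinct = sorted(set(y))
    n_bins = min(max_classes, len(distinct))
    rank = {v: i for i, v in enumerate(distinct)}
    # floor(rank * n_bins / n_distinct) keeps labels ordered and uses every bin
    return [rank[v] * n_bins // len(distinct) for v in y]
```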
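`max_depth` and `min_node_size` act as stopping rules while a tree grows. A minimal sketch of how such rules are typically checked, assuming the root has depth 0 and that a split is rejected if either child would hold fewer than `min_node_size` training cases; `split_allowed` is hypothetical and not part of brif.

```python
MAX_DEPTH_LIMIT = 40  # documented upper bound for max_depth


def split_allowed(depth, left_size, right_size, max_depth, min_node_size):
    """Return True if a node at `depth` may be split into children of the given sizes.

    Assumptions: the root is at depth 0, and both children must keep at least
    min_node_size training cases.
    """
    max_depth = min(max_depth, MAX_DEPTH_LIMIT)
    if depth >= max_depth:
        return False
    return left_size >= min_node_size and right_size >= min_node_size
```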
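`ps` controls how many predictors are drawn as split candidates at each node, with the default 0 meaning sqrt(p). The sketch below assumes sqrt(p) is rounded down with a minimum of 1, and that a user-supplied `ps` larger than p is capped at p; `sample_predictors` is a hypothetical name.

```python
import math
import random


def sample_predictors(predictors, ps, rng):
    """Draw the predictors considered at one node split, without repeats."""
    p = len(predictors)
    # ps = 0 means "use sqrt(p)"; rounding down (minimum 1) is an assumption
    k = ps if ps > 0 else max(1, math.isqrt(p))
    return rng.sample(list(predictors), min(k, p))
```

Passing an explicit `random.Random(seed)` keeps the draw reproducible, in the same spirit as the `seed` parameter.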
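`bagging_method` and `bagging_proportion` together determine which training rows each tree sees. A minimal sketch of the two sampling modes, assuming the bag size is `bagging_proportion` times the number of rows, rounded to the nearest integer (minimum 1); `draw_bag` is hypothetical.

```python
import random


def draw_bag(n, bagging_method, bagging_proportion, rng):
    """Return the row indices used to grow one tree.

    bagging_method 0: sample without replacement (each row at most once);
    bagging_method 1: sample with replacement (bootstrap).
    Assumption: bag size = round(bagging_proportion * n), at least 1.
    """
    if not 0 < bagging_proportion <= 1:
        raise ValueError("bagging_proportion must be in (0, 1]")
    m = max(1, round(bagging_proportion * n))
    if bagging_method == 0:
        return rng.sample(range(n), m)
    if bagging_method == 1:
        return [rng.randrange(n) for _ in range(m)]
    raise ValueError("bagging_method must be 0 or 1")
```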
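The two `vote_method` options differ in how leaves of different sizes are weighted. The sketch below reads option 0 as summing raw class counts across leaves and normalizing once, so larger leaves carry more weight, and option 1 as averaging each leaf's within-node class fractions with equal weight. It takes one dict of class counts per tree (the leaf the observation falls into); `vote` is a hypothetical helper, not brif's code.

```python
from collections import Counter


def vote(leaf_counts, vote_method):
    """Combine per-tree leaf class counts into class probabilities.

    leaf_counts: one dict per tree mapping class label -> number of training
    cases of that class in the leaf the observation falls into.
    """
    totals = Counter()
    if vote_method == 0:
        # raw counts are pooled over all leaves, then normalized once
        for counts in leaf_counts:
            totals.update(counts)
        grand = sum(totals.values())
        return {c: v / grand for c, v in totals.items()}
    if vote_method == 1:
        # each leaf contributes its within-node fractions with equal weight
        for counts in leaf_counts:
            n = sum(counts.values())
            for c, v in counts.items():
                totals[c] += v / n
        return {c: v / len(leaf_counts) for c, v in totals.items()}
    raise ValueError("vote_method must be 0 or 1")
```

For leaves `{'a': 9, 'b': 1}` and `{'b': 2}`, option 0 gives a = 0.75, b = 0.25, while option 1 gives a = 0.45, b = 0.55: the small second leaf tips the outcome only under equal weighting.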
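The `na_numeric`, `na_integer`, and `na_factor` settings replace missing entries with a fixed value per variable type. A minimal sketch on a single row, assuming missing values arrive as `None` or float NaN and that each column's type is already known; `fill_missing` and the type names `"numeric"`, `"integer"`, `"factor"` are hypothetical.

```python
import math


def fill_missing(row, kinds, na_numeric, na_integer, na_factor):
    """Replace missing entries in one row according to each column's type."""
    fill = {"numeric": na_numeric, "integer": na_integer, "factor": na_factor}
    out = []
    for value, kind in zip(row, kinds):
        # treat None and float NaN as missing
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        out.append(fill[kind] if missing else value)
    return out
```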


Download files

Source Distribution

brif-1.4.7.tar.gz (27.0 kB)

Built Distributions

brif-1.4.7-cp314-cp314-macosx_15_0_x86_64.whl (31.4 kB): CPython 3.14, macOS 15.0+ x86-64

brif-1.4.7-cp313-cp313-win_amd64.whl (31.1 kB): CPython 3.13, Windows x86-64

brif-1.4.7-cp312-cp312-manylinux_2_34_x86_64.whl (170.2 kB): CPython 3.12, manylinux (glibc 2.34+) x86-64

File details

Details for the file brif-1.4.7.tar.gz.

File metadata

  • Download URL: brif-1.4.7.tar.gz
  • Size: 27.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for brif-1.4.7.tar.gz
Algorithm Hash digest
SHA256 aea38506de80571044fb70a6b289b64b1f837c867e6d21ea61b60be40f0271af
MD5 d7f09ee6ddc50de184f95d2f4eab1923
BLAKE2b-256 3b9ba2630820c3957dfd526400d254717e96ff125107ea64e0c0a94616920472

File details

Details for the file brif-1.4.7-cp314-cp314-macosx_15_0_x86_64.whl.

File hashes

Hashes for brif-1.4.7-cp314-cp314-macosx_15_0_x86_64.whl
Algorithm Hash digest
SHA256 919c6a3c8d7b17fff1a144c6b345059bc26637d2edeb4b884351e3302410bb4b
MD5 083c58e199a98ad5087712fec3f307b8
BLAKE2b-256 60f948de9586cbb8b3bf48ffade8d85ae5765b2cc0622a2dddbf48cd5ce5c7c9

File details

Details for the file brif-1.4.7-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: brif-1.4.7-cp313-cp313-win_amd64.whl
  • Size: 31.1 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for brif-1.4.7-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 aeb63bd389b17038fa867d7d44df258a09b35e985766390a07995722622307cb
MD5 dd9a26086e2d5684707745f8c7d5e0d9
BLAKE2b-256 8e6f33a35a6f2d0ad5563fe8c68499be981cff0bd81368bbf5296dae1cf0efb3

File details

Details for the file brif-1.4.7-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for brif-1.4.7-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 84f3e177314dd962f19275b4055cd57fa52d187003724e8610c082ed1f472549
MD5 47cd79a800205e70f01a6875f8ce093f
BLAKE2b-256 d6d9587d3d4521eec0f0f7bc4b9fea710e72183dc7979f75447b5907ef1f7794
