
Feature Selection by Sparse Forests

Project description

ControlBurn v0.1.3

This package implements ControlBurn in Python. ControlBurn is a feature selection algorithm that uses a weighted LASSO to prune unnecessary features from tree ensembles. It is efficient and requires only a single training iteration.
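The idea of pruning a tree ensemble with a weighted LASSO can be sketched with scikit-learn alone. This is a rough illustration, not the package's implementation: it assumes each tree's penalty is the number of distinct features the tree splits on, it swaps in scikit-learn's Lasso (with nonnegative weights) as the solver, and the synthetic data, forest settings, and alpha value are all made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

# Synthetic data (illustrative only): features 0 and 1 drive y, the rest are noise
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Step 1: grow an ensemble of shallow trees in a single training pass
forest = RandomForestRegressor(
    n_estimators=50, max_depth=2, max_features=2, random_state=0
).fit(X, y)

# Step 2: record which features each tree splits on (leaf nodes are marked with -2)
tree_feats = [
    {int(f) for f in t.tree_.feature[t.tree_.feature >= 0]}
    for t in forest.estimators_
]
# Assumed penalty per tree: number of distinct features used (at least 1 to avoid dividing by 0)
u = np.array([max(len(f), 1) for f in tree_feats], dtype=float)

# Column i of A holds the predictions of tree i on the training data
A = np.column_stack([t.predict(X) for t in forest.estimators_])

# Step 3: weighted nonnegative LASSO over tree weights w.
# Substituting v_i = u_i * w_i turns the weighted penalty sum(u_i * w_i)
# into a plain L1 penalty on v, with columns rescaled to A_i / u_i.
lasso = Lasso(alpha=0.05, positive=True).fit(A / u, y)
w = lasso.coef_ / u

# Step 4: a feature is kept if any tree with nonzero weight uses it
kept_trees = np.flatnonzero(w > 0)
selected = sorted(set().union(*[tree_feats[i] for i in kept_trees]))
print("trees kept:", len(kept_trees), "features selected:", selected)
```

Larger alpha values push more tree weights to zero in this sketch, which shrinks the set of surviving features.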

Tree ensembles distribute feature importance scores evenly among groups of correlated features. This suppresses the ranking of every feature in the group, which reduces interpretability and complicates feature selection. Like the linear LASSO, ControlBurn assigns all the feature importance of a correlated group to a single feature, so it can quickly select a subset of important, independent features for further analysis.
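The dilution effect described above is easy to reproduce with a plain random forest. The sketch below uses synthetic data and scikit-learn's RandomForestClassifier; the data-generating process and all parameter values are assumptions chosen for illustration, and ControlBurn itself is not involved.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)          # the one truly informative variable
noise = rng.normal(size=(n, 2))      # two pure-noise columns
y = (signal > 0).astype(int)

# Dataset A: three nearly identical copies of the signal, plus the noise columns
copies = np.column_stack([signal + 0.01 * rng.normal(size=n) for _ in range(3)])
X_corr = np.column_stack([copies, noise])

# Dataset B: the signal appears once, plus the same noise columns
X_single = np.column_stack([signal, noise])

rf_corr = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_corr, y)
rf_single = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_single, y)

imp_corr = rf_corr.feature_importances_
imp_single = rf_single.feature_importances_

print("importance of each correlated copy:", np.round(imp_corr[:3], 3))
print("importance of the signal when it appears once:", round(float(imp_single[0]), 3))
```

Each copy in the correlated dataset receives only a share of the importance that the lone signal column receives in the second dataset, even though the underlying information is the same.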

Installation

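The easiest way to install ControlBurn is through pip. The command below is written for a Jupyter notebook cell; drop the leading ! to run it in a regular shell.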

!pip install ControlBurn==0.1.3

Dependencies

ControlBurn works on Python 3.7 or above. The following packages are required.

  • numpy (1.20.1)
  • pandas (1.2.4)
  • scikit-learn (0.24.1), imported as sklearn
  • mosek (9.2.47)
  • cvxpy (1.1.13)
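To compare an environment against the versions listed above, a small check along these lines may help. It is a sketch that assumes the pip distribution names are numpy, pandas, scikit-learn, Mosek, and cvxpy; the helper name installed_versions is hypothetical. Note that importlib.metadata is part of the standard library from Python 3.8 onward, so Python 3.7 would need the importlib_metadata backport instead.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Dependencies section, keyed by assumed pip distribution name
LISTED = {
    "numpy": "1.20.1",
    "pandas": "1.2.4",
    "scikit-learn": "0.24.1",
    "Mosek": "9.2.47",
    "cvxpy": "1.1.13",
}


def installed_versions(packages=LISTED):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, have in installed_versions().items():
        print(f"{name}: installed={have}, listed={LISTED[name]}")
```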

Quick Start

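from sklearn.datasets import load_breast_cancer  # example data only; any labeled tabular dataset works
from ControlBurn.ControlBurnModel import ControlBurnClassifier

# Assumption: X is a pandas DataFrame of features and y holds the class labels
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

cb = ControlBurnClassifier(alpha=0.1)  # alpha sets the regularization strength
cb.fit(X, y)
print(cb.features_selected_)    # selected features
print(cb.feature_importances_)  # feature importances

pred = cb.predict(X)  # predictions of the polished model, which uses only the selected features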

Reference Paper

B. Liu, M. Xie, and M. Udell. ControlBurn: Feature Selection by Sparse Forests.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021
