Feature Selection by Sparse Forests
ControlBurn v0.0.3
This package implements ControlBurn in Python. ControlBurn is a feature selection algorithm that uses a weighted LASSO-based approach to prune unnecessary features from tree ensembles. The algorithm is efficient and requires only a single training iteration.
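To make the pruning idea concrete, here is a minimal pure-Python sketch, not the package's implementation. It assumes (this page does not say so) that each tree receives a nonnegative weight, that each tree's penalty is alpha times the number of features it uses, and that a feature is kept when any tree with nonzero weight uses it. The four "trees", their predictions, and the helper `fit_tree_weights` are hypothetical, made up for illustration.

```python
def fit_tree_weights(tree_preds, y, penalties, sweeps=50):
    """Cyclic coordinate descent for
    minimize (1/2n) * ||y - A w||^2 + sum_t penalties[t] * w[t], subject to w >= 0,
    where column t of A holds tree t's predictions on the training set."""
    n = len(y)
    w = [0.0] * len(tree_preds)
    for _ in range(sweeps):
        for t, pred in enumerate(tree_preds):
            # Residual with tree t's own contribution left out.
            partial = [
                y[i] - sum(w[s] * tree_preds[s][i]
                           for s in range(len(tree_preds)) if s != t)
                for i in range(n)
            ]
            rho = sum(pred[i] * partial[i] for i in range(n)) / n
            z = sum(v * v for v in pred) / n
            # Nonnegative soft-thresholding step.
            w[t] = max(0.0, rho - penalties[t]) / z
    return w


# Hypothetical ensemble: each tree is described only by the features it
# splits on and its (made-up) predictions on four training samples.
trees = [
    {"features": {"a"}, "pred": [1.0, -1.0, 1.0, -1.0]},
    {"features": {"b"}, "pred": [1.0, 1.0, -1.0, -1.0]},
    {"features": {"c"}, "pred": [1.0, -1.0, -1.0, 1.0]},
    {"features": {"a", "c"}, "pred": [2.0, -2.0, 0.0, 0.0]},
]
y = [3.0, -1.0, 1.0, -3.0]

alpha = 0.25
# Assumed weighting: a tree that uses more features pays a larger penalty.
penalties = [alpha * len(tree["features"]) for tree in trees]

weights = fit_tree_weights([tree["pred"] for tree in trees], y, penalties)

# Keep every feature used by at least one tree with a nonzero weight.
selected = sorted(
    set().union(*(tree["features"] for tree, wt in zip(trees, weights) if wt > 0))
)

print(weights)   # [1.75, 0.75, 0.0, 0.0]
print(selected)  # ['a', 'b']
```

In this toy setup the trees that touch feature "c" get zero weight, so "c" is dropped without retraining the ensemble.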
Tree ensembles distribute feature importance scores evenly among groups of correlated features. The average feature ranking of a correlated group is therefore suppressed, which reduces interpretability and complicates feature selection. Like the linear LASSO, ControlBurn instead assigns all of the feature importance of a correlated group to a single feature. This lets the algorithm quickly select a subset of important, independent features for further analysis.
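The LASSO behavior mentioned above can be seen on a toy problem. The sketch below is a plain coordinate-descent LASSO written for illustration only; it is not ControlBurn's solver, and the data and function names are hypothetical. Two columns are exact copies of each other, and the penalty leaves all of their shared weight on one of them. With exact duplicates, which copy wins depends on the update order.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, sweeps=50):
    """Minimize (1/2n) * ||y - X w||^2 + alpha * ||w||_1 by cyclic coordinate descent.
    `columns` is a list of feature columns, each a list of n floats."""
    n = len(y)
    w = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with feature j's own contribution left out.
            partial = [
                y[i] - sum(w[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            z = sum(v * v for v in col) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Toy data: x1_copy duplicates x1 exactly; x3 is orthogonal to both.
x1 = [1.0, -1.0, 1.0, -1.0]
x1_copy = list(x1)
x3 = [1.0, 1.0, -1.0, -1.0]
y = [2.0 * a + b for a, b in zip(x1, x3)]

weights = lasso_coordinate_descent([x1, x1_copy, x3], y, alpha=0.5)
print(weights)  # [1.5, 0.0, 0.5]
```

The duplicated pair ends up with weights 1.5 and 0.0 rather than an even split, which is the single-representative behavior the paragraph above describes.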
Installation
The easiest way to install ControlBurn is through pip. In a notebook cell, prefix the command with "!".
pip install ControlBurn==0.0.3
Dependencies
ControlBurn works on Python 3.7 or above. The following packages are required.
- numpy (1.20.1)
- pandas (1.2.4)
- scikit-learn (0.24.1)
- mosek (9.2.47)
- cvxpy (1.1.13)
Quick Start
from ControlBurn.ControlBurn import ControlBurnClassifier

# X (feature matrix) and y (class labels) are not defined here; supply your own data.
cb = ControlBurnClassifier(alpha=0.1, solver='SCS')
cb.fit(X, y)
print(cb.features_selected_)    # selected features
print(cb.feature_importances_)  # feature importances
pred = cb.predict(X)  # predictions of the polished model using the selected features
Reference Paper
B. Liu, M. Xie, and M. Udell. ControlBurn: Feature Selection by Sparse Forests.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021
Built Distribution
Hashes for ControlBurn-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 350fdf5fedb5a72ea7b9e8565892a821000fe8896179ef45fb8b373169dd3940
MD5 | 6444156c173b1b65548e76463011ed78
BLAKE2b-256 | 578ced084ff4d9f48a42c165d52e10b02ecc6c39b0db5891cef2dd7b535141a9