Feature Selection by Sparse Forests

ControlBurn v0.1.3

This package implements ControlBurn in Python. ControlBurn is a feature selection algorithm that uses a weighted LASSO to prune unnecessary features from tree ensembles. The algorithm is efficient and requires only a single training iteration.
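To make the idea concrete, here is a minimal sketch of LASSO-based pruning of a tree ensemble. It is not the package's implementation: it uses regression trees and scikit-learn's `Lasso` with `positive=True` in place of the package's cvxpy/mosek solver, and it assumes each tree's penalty weight is the number of distinct features that tree splits on. The function name `sparse_forest_select` and all of its parameters are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor


def sparse_forest_select(X, y, n_trees=50, max_depth=2, alpha=0.05, seed=0):
    """Illustrative only: keep the features used by trees with nonzero LASSO weight."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    used_features, preds, costs = [], [], []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)  # bootstrap sample
        tree = DecisionTreeRegressor(
            max_depth=max_depth,
            max_features="sqrt",
            random_state=int(rng.integers(1 << 31)),
        )
        tree.fit(X[rows], y[rows])
        # Internal nodes store the split feature index; leaves store a negative value.
        used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
        if used.size == 0:
            continue  # skip trees that never split
        used_features.append(used)
        preds.append(tree.predict(X))
        costs.append(used.size)  # assumed penalty weight: features used by the tree

    A = np.column_stack(preds)          # one column of predictions per tree
    u = np.asarray(costs, dtype=float)  # per-tree penalty weights
    # Dividing column i by u[i] turns the weighted penalty sum(u_i * w_i)
    # into a plain L1 penalty that a standard LASSO solver can handle.
    lasso = Lasso(alpha=alpha, positive=True, max_iter=10000)
    lasso.fit(A / u, y)
    w = lasso.coef_ / u                 # map back to the original tree weights
    selected = sorted(
        {int(f) for used, wi in zip(used_features, w) if wi > 0 for f in used}
    )
    return selected, w


# Synthetic data (an assumption for the demo): only features 0 and 1 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=300)

selected, weights = sparse_forest_select(X, y)
print("selected features:", selected)
print("trees kept:", int((weights > 0).sum()), "of", weights.size)
```

Raising `alpha` in this sketch drives more tree weights to zero, so fewer features survive; the trees are trained once and only the LASSO step depends on `alpha`.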

Tree ensembles distribute feature importance scores evenly among groups of correlated features. This suppresses the ranking of each feature in the group, which reduces interpretability and complicates feature selection. Like the linear LASSO, ControlBurn assigns all the feature importance of a correlated group to a single feature, so it can quickly select a subset of important, independent features for further analysis.
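The dilution effect can be observed with a plain random forest. The sketch below uses scikit-learn's `RandomForestRegressor` on synthetic data chosen for illustration: four near-identical copies of one signal plus one weaker independent feature. None of the names or data come from the package, and the exact scores depend on the random seed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)       # latent signal shared by the correlated group
independent = rng.normal(size=n)  # weaker, independent feature

# Columns 0-3: noisy copies of the same signal; column 4: the independent feature.
copies = np.column_stack([signal + 0.05 * rng.normal(size=n) for _ in range(4)])
X = np.column_stack([copies, independent])
y = signal + 0.6 * independent + 0.1 * rng.normal(size=n)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_
group_total = importances[:4].sum()

for i, score in enumerate(importances[:4]):
    print(f"correlated copy {i}: {score:.3f}")
print(f"correlated group total: {group_total:.3f}")
print(f"independent feature:    {importances[4]:.3f}")
```

Comparing each copy's score with the group total shows how much of the group's importance a per-feature ranking hides.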

Installation

The easiest way to install ControlBurn is through pip.

pip install ControlBurn==0.1.3

Dependencies

ControlBurn requires Python 3.7 or later and the following packages.

  • numpy (1.20.1)
  • pandas (1.2.4)
  • scikit-learn (0.24.1)
  • mosek (9.2.47)
  • cvxpy (1.1.13)
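To compare an environment against the versions listed above, a small check along these lines may help. It is a sketch using the standard library's `importlib.metadata` (available from Python 3.8; Python 3.7 would need the `importlib_metadata` backport). The `LISTED` mapping and the `report_versions` helper are hypothetical names, and the `sklearn` entry is looked up under its distribution name `scikit-learn`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list above.
LISTED = {
    "numpy": "1.20.1",
    "pandas": "1.2.4",
    "scikit-learn": "0.24.1",
    "mosek": "9.2.47",
    "cvxpy": "1.1.13",
}


def report_versions(listed=LISTED):
    """Return {package: (installed_version_or_None, listed_version)}."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


for name, (installed, wanted) in report_versions().items():
    print(f"{name}: installed={installed or 'missing'}, listed={wanted}")
```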

Quick Start

from ControlBurn.ControlBurnModel import ControlBurnClassifier

# X is the feature matrix and y the class labels; both must be defined beforehand
cb = ControlBurnClassifier(alpha=0.1)
cb.fit(X, y)
print(cb.features_selected_)    # selected features
print(cb.feature_importances_)  # feature importances

pred = cb.predict(X)  # predictions of the polished model, which uses only the selected features

Reference Paper

B. Liu, M. Xie, and M. Udell. ControlBurn: Feature Selection by Sparse Forests.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021.
