Relevance, Redundancy, and Complementarity Trade-off, a robust feature selection algorithm.

RRCT

Relevance, Redundancy, and Complementarity Trade-off, a robust feature selection algorithm (Python version).


RRCT is a computationally efficient, robust feature selection algorithm. It can be viewed as a natural extension of the popular mRMR (minimum redundancy maximum relevance) algorithm: like mRMR, RRCT explicitly accounts for relevance and redundancy, and it adds a third term for conditional relevance (also known as complementarity).
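
The interplay of the three terms can be illustrated with a small greedy loop. The sketch below is not the package's implementation and not the exact formula from the publication. It is a simplified illustration that assumes Pearson correlations, an information-style transform -0.5*ln(1 - r^2), and a partial correlation (computed from least-squares residuals) as the complementarity term. The names info, residual, and greedy_select are hypothetical.

```python
import numpy as np


def info(r):
    """Map a correlation coefficient to an information-like score (assumed transform)."""
    r = np.clip(np.abs(np.nan_to_num(r)), 0.0, 0.999999)  # avoid log(0)
    return -0.5 * np.log(1.0 - r ** 2)


def residual(v, Z):
    """Residual of v after least-squares regression on the columns of Z plus an intercept."""
    A = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef


def greedy_select(X, y, K):
    """Greedily pick K columns by relevance - redundancy + complementarity."""
    m = X.shape[1]
    K = min(K, m)
    relevance = np.array([info(np.corrcoef(X[:, j], y)[0, 1]) for j in range(m)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < K:
        Z = X[:, selected]
        y_res = residual(y, Z)  # part of y not explained by the selected features
        best_j, best_score = -1, -np.inf
        for j in range(m):
            if j in selected:
                continue
            # Redundancy: average association with the already selected features
            redundancy = np.mean(
                [info(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected]
            )
            # Complementarity: association with y after controlling for the selected set
            partial_r = np.corrcoef(residual(X[:, j], Z), y_res)[0, 1]
            score = relevance[j] - redundancy + info(partial_r)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Synthetic data: column 1 nearly duplicates column 0, column 2 adds new signal,
# columns 3 and 4 are pure noise.
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([
    x0,
    x0 + 0.01 * rng.normal(size=n),
    x2,
    rng.normal(size=n),
    rng.normal(size=n),
])
y = 2.0 * x0 + x2 + 0.1 * rng.normal(size=n)

selected = greedy_select(X, y, K=3)
print(selected)
```

In this synthetic example the second column nearly duplicates the first, so the redundancy term penalizes whichever of the two is not picked first. The third column carries information about y that the first pick does not, so the complementarity term favors it.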

The RRCT algorithm can run within a few seconds, even on large datasets with thousands of features. It can also serve as a useful 'off-the-shelf' feature selection algorithm because it generalizes well to both regression and classification problems and needs no further adjustment for mixed-type variables.

R. Ibraheem is the author and maintainer of this implementation. It is based on the earlier Python implementation by A. Tsanas (associated with the publication listed under Reference), available at https://github.com/ThanasisTsanas/RRCT.


Class description

RRCTFeatureSelection(K=None, scale_feature=False)

  1. Parameters:
    • K, a positive integer specifying the number of features to select. Default value is K=None, which means all features will be selected.
    • scale_feature, a boolean. Set to False if your features are already standardized (mean of 0 and standard deviation of 1); otherwise set to True and the features will be standardized before the algorithm is applied. Default value is False.
  2. Attributes:
    • selected_feature_indices_, an array containing the indices of the selected features
    • rrct_values_, a dictionary containing the relevance, redundancy, complementarity, and RRCT metrics of the selected features
  3. Methods:
    • apply(X=X, y=y, verbose=0), applies the RRCT algorithm to a given training set X, y, where X is an n by m numpy array of features and y is an n by 1 numpy array of target values. verbose is a non-negative integer that controls the verbosity of the output.
    • select(X=X), selects features from a given design matrix X based on the results of applying the RRCT algorithm.
    • apply_select(X=X, y=y, verbose=0), applies the RRCT algorithm to a given training set X, y and then selects features from X.
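
The scale_feature parameter assumes features with a mean of 0 and a standard deviation of 1. The sketch below shows one way to standardize a design matrix yourself with NumPy before passing scale_feature=False. The helper name standardize is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption; the package's internal scaling may differ.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scoring; constant columns map to zero instead of dividing by 0."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (ddof=0), an assumption
    std[std == 0] = 1.0   # guard against constant columns
    return (X - mean) / std


# Synthetic features with a non-zero mean and a non-unit spread
rng = np.random.default_rng(42)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_std = standardize(X_raw)

print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```

When the selector is later used on held-out data, it is usually safer to reuse the training mean and standard deviation than to recompute them on the new data.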

Installation

pip install rrct

Example

# import the RRCT feature selection object
from rrct import RRCTFeatureSelection

# RRCT with K=20
selector = RRCTFeatureSelection(K=20, scale_feature=False)

# Apply RRCT to a training set X, y
selector.apply(X=X, y=y)

# Select features from X
X_selected = selector.select(X=X)

# Alternatively, call apply_select, which applies RRCT and then selects features from X
X_selected = selector.apply_select(X=X, y=y)

# Get the selected feature indices
selector.selected_feature_indices_

# Get the summary of the RRCT metrics
selector.rrct_values_
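
Once the indices are available, they can be reused on any matrix with the same column layout, for instance a held-out test set, and mapped back to column names. The sketch below uses plain NumPy indexing and a hard-coded index array as a stand-in for selector.selected_feature_indices_. The feature names and data are hypothetical, and the idea that select(X=X) is equivalent to this column indexing is an assumption, not something the description above confirms.

```python
import numpy as np

feature_names = ["age", "height", "weight", "score"]  # hypothetical column names
selected_indices = np.array([2, 0])  # stand-in for selector.selected_feature_indices_

rng = np.random.default_rng(1)
X_train = rng.normal(size=(8, 4))
X_test = rng.normal(size=(3, 4))

# Keep the same columns, in the same order, for both matrices
X_train_sel = X_train[:, selected_indices]
X_test_sel = X_test[:, selected_indices]

# Map the indices back to readable names
selected_names = [feature_names[i] for i in selected_indices]
print(selected_names)
```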

Reference

A. Tsanas: "Relevance, redundancy and complementarity trade-off (RRCT): a generic, efficient, robust feature selection tool", Patterns, Vol. 3:100471, 2022 https://doi.org/10.1016/j.patter.2022.100471

R. Ibraheem is a PhD student at EPSRC's MAC-MIGS Centre for Doctoral Training, hosted by the University of Edinburgh. MAC-MIGS is supported by the UK's Engineering and Physical Sciences Research Council (grant number EP/S023291/1). R. Ibraheem is supervised by Dr G. dos Reis. Further contact points for R. Ibraheem: LinkedIn, ORCID, GitHub.
