
Feature relevance intervals


This repo contains the Python implementation of the all-relevant feature selection method described in the corresponding publications [1, 2].

Try out the online demo notebook here.

Figure: example output of the method for a biomedical dataset.

Installation

The library needs various dependencies, which should be installed automatically. We highly recommend the Anaconda Python distribution to provide them all. The library is written for Python 3; given the foreseeable end of Python 2 support, backwards compatibility is not planned.

If you just want to use the stable version from PyPI, run

$ pip install fri

To install the module for development, clone the repo and execute:

$ python setup.py install

Testing

To check that the library was installed correctly, you can use pytest to run all included tests. First install it:

$ pip install pytest

and then run, in the root directory:

$ pytest

Usage

Examples and API descriptions can be found here.

In general, the library follows the sklearn API format. The two important classes exposed to the user are FRIClassification and FRIRegression; use the one matching your data type.

Parameters

C : float, optional

Set a fixed regularization parameter. If None, the value will be determined automatically using grid search (GridSearchCV).

random_state : int seed, RandomState instance, or None (default=None)

The seed of the pseudo random number generator to use when shuffling the data.

shadow_features : boolean, default = True

Use shuffled contrast features for each real feature as a baseline correction. Each feature is shuffled independently, and feature relevance bounds are computed on these contrast features. This leads to sparser output, but there can still be problems with very sparse binary features; increasing n_resampling gives a better "random" baseline distribution when such problems occur. (A sketch illustrating contrast features follows this parameter list.)

parallel : boolean, default = False

When enabled, uses multiprocessing with all available cores to compute relevance bounds in parallel.

n_resampling : int, default = 3

Number of contrast features computed per feature. Results are averaged to reduce problems with some sparse input features.

Regression specific

epsilon : float, optional

Controls the size of the epsilon tube around the initial SVR model. By default, the value is set using hyperparameter optimization.
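
All of the parameters above are passed to the estimator constructor. A minimal sketch using only the options documented here (the values are arbitrary):

from fri import FRIClassification

# Fixed C instead of grid search, reproducible shuffling, parallel bound
# computation, and more contrast resamples per feature
fri_model = FRIClassification(C=1.0, random_state=0, shadow_features=True,
                              parallel=True, n_resampling=10)

As noted under shadow_features, a contrast feature is an independently permuted copy of a column. The following numpy sketch only illustrates that idea; it is not the library's internal implementation:

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 6))

# Permute each column independently: this destroys any relation to the
# target while preserving each feature's marginal distribution
X_shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])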

Attributes

n_features_ : int

The number of selected features.

allrel_prediction_ : array of shape [n_features]

The mask of selected features. Includes all relevant ones.

ranking_ : array of shape [n_features]

The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1 and tentative features are assigned rank 2.
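
After fitting, these attributes can be read directly from the estimator. A short sketch, assuming fri_model has been fitted as in the Examples below:

print(fri_model.n_features_)         # number of selected features
print(fri_model.allrel_prediction_)  # mask including all relevant features
print(fri_model.ranking_)            # rank 1 = selected, rank 2 = tentative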

Examples

# Classification data
from fri import genClassificationData
X, y = genClassificationData(n_samples=100, n_features=6, n_strel=2,
                             n_redundant=2, n_repeated=0, flip_y=0)

# We created a binary classification set with 6 features of which 2 are strongly relevant and 2 weakly relevant.

# Scale Data
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)

# New object for Classification Data
from fri import FRIClassification
fri_model = FRIClassification()

# Fit to data
fri_model.fit(X_scaled,y)

# Print out feature relevance intervals
print(fri_model.interval_)

# Plot results
from fri import plot
plot.plotIntervals(fri_model.interval_)

# Print internal parameters

print(fri_model.allrel_prediction_)

# Print out hyperparameter found by GridSearchCV
print(fri_model._hyper_C)
# Get weights for linear models used for each feature optimization

print(fri_model._omegas)
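
The regression estimator follows the same pattern. A minimal sketch, here using sklearn's make_regression as a stand-in dataset (not part of fri) and assuming the fitted intervals are exposed via interval_ as in the classification case:

# Regression data (sketch)
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from fri import FRIRegression

X, y = make_regression(n_samples=100, n_features=6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# epsilon is tuned automatically by default (see Parameters above)
reg_model = FRIRegression()
reg_model.fit(X_scaled, y)
print(reg_model.interval_)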

References

[1] Göpfert C, Pfannschmidt L, Hammer B. Feature Relevance Bounds for Linear Classification. In: Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). Accepted. https://pub.uni-bielefeld.de/publication/2908201

[2] Göpfert C, Pfannschmidt L, Göpfert JP, Hammer B. Interpretation of Linear Classifiers by Means of Feature Relevance Bounds. Neurocomputing. Accepted. https://pub.uni-bielefeld.de/publication/2915273
