Feature relevance interval method

Feature relevance intervals

This repository contains the Python implementation of the all-relevant feature selection method described in the corresponding publications [1,2].

Try out the online demo notebook here.

[Figure: example output of the method for a biomedical dataset]

Installation

The library depends on several packages, which should be installed automatically. We highly recommend the Anaconda Python distribution, which provides all of them. The library is written for Python 3; because Python 2 support is ending, backwards compatibility is not planned.

If you just want to use the stable version from PyPI, run:

$ pip install fri

To install the module for development, clone the repository and run:

$ python setup.py install

Testing

To test whether the library was installed correctly, you can run all included tests with the pytest command. Install pytest first:

$ pip install pytest

then run in the root directory:

$ pytest
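
Before running the full test suite, a quick import check can confirm that the package is visible to the current interpreter. The sketch below uses only the standard library; the helper name `is_installed` is illustrative and not part of fri.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "fri" is the import name used in the examples of this README.
    print("fri importable:", is_installed("fri"))
```

If the check reports False, the package was most likely installed into a different environment than the one running the script.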

Usage

Examples and API descriptions can be found here.

In general, the library follows the scikit-learn API conventions. The two main classes exposed to the user are

FRIClassification

and

FRIRegression

depending on your data type.

Parameters

C : float, optional

Fixed regularization parameter. If None, the value is determined automatically by grid search.

random_state : int seed, RandomState instance, or None (default=None)

The seed of the pseudo-random number generator to use when shuffling the data.

parallel : boolean, default = False

When enabled, computes the relevance bounds in parallel using multiprocessing on all available cores.

n_resampling : int, default = 3

Number of contrast features computed per feature. Results are averaged to reduce problems with sparse input features.
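
The averaging over several contrast features can be pictured with a small sketch. It assumes that a contrast feature is a randomly permuted copy of an input column and uses absolute correlation with the target as a stand-in relevance score. Both are illustrative assumptions rather than a description of fri's internals, and `abs_correlation` and `contrast_score` are hypothetical helpers.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def contrast_score(feature, target, n_resampling=3, seed=None):
    """Average the score of `n_resampling` permuted copies of `feature`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resampling):
        shuffled = list(feature)
        rng.shuffle(shuffled)  # permuting removes the pairing with the target
        scores.append(abs_correlation(shuffled, target))
    return statistics.fmean(scores)


# Assumed toy data: a sparse feature with only two non-zero entries.
sparse_feature = [0, 0, 0, 0, 1, 0, 0, 1]
target = [0.1, 0.3, 0.2, 0.4, 0.9, 0.2, 0.1, 0.8]
baseline = contrast_score(sparse_feature, target, n_resampling=3, seed=0)
```

With a sparse column, a single permutation can land on an unusually high or low score by chance, which is why averaging several of them gives a steadier baseline.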

Regression specific

epsilon : float, optional

Controls the size of the epsilon tube around the initial SVR model. By default, the value is set by hyperparameter optimization.
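
For orientation, the epsilon tube in support vector regression is usually defined through the epsilon-insensitive loss, under which residuals smaller than epsilon cost nothing. The sketch below illustrates that standard definition; it is not taken from fri's source, and the function name is hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# A residual of 0.05 lies inside a tube of width 0.1 and is not penalized.
inside = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)

# A residual of 0.5 lies outside the tube; only the excess is penalized.
outside = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)
```

A larger epsilon tolerates larger residuals without penalty, so the fitted model is allowed to be coarser.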

Attributes

n_features_ : int

The number of selected features.

allrel_prediction_ : array of shape [n_features]

The mask of selected features. Includes all relevant ones.
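
A mask like `allrel_prediction_` is typically used to keep only the selected columns of the input. The sketch below assumes the mask is boolean with one entry per feature, and uses plain Python lists with a made-up mask so that it runs without fri; `select_columns` is a hypothetical helper.

```python
def select_columns(rows, mask):
    """Keep the columns of `rows` whose mask entry is truthy."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


# Hypothetical data: 2 samples, 4 features, features 0 and 2 selected.
X = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
mask = [True, False, True, False]

X_selected = select_columns(X, mask)
n_selected = sum(mask)  # count of selected features, analogous to n_features_
```

With NumPy arrays, the same selection is usually written as `X[:, mask]`.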

Examples

# ## Classification data
from fri import genClassificationData
X, y = genClassificationData(n_samples=100, n_features=6, n_strel=2, n_redundant=2,
                             n_repeated=0, flip_y=0)

# We created a binary classification set with 6 features, of which 2 are strongly relevant and 2 are weakly relevant.

# Scale Data
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)

# New object for Classification Data
from fri import FRIClassification
fri_model = FRIClassification()

# Fit to data
fri_model.fit(X_scaled,y)

# Print out feature relevance intervals
print(fri_model.interval_)

# ### Plot results
from fri import plot
plot.plotIntervals(fri_model.interval_)

# ### Print internal Parameters

print(fri_model.allrel_prediction_)

# Print the hyperparameter found by GridSearchCV
print(fri_model._hyper_C)

# Get the weights of the linear models used for each feature optimization
print(fri_model._omegas)

References

[1] Göpfert C, Pfannschmidt L, Hammer B. Feature Relevance Bounds for Linear Classification. In: Proceedings of the ESANN. 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning; Accepted. https://pub.uni-bielefeld.de/publication/2908201

[2] Göpfert C, Pfannschmidt L, Göpfert JP, Hammer B. Interpretation of Linear Classifiers by Means of Feature Relevance Bounds. Neurocomputing. Accepted. https://pub.uni-bielefeld.de/publication/2915273

Download files

Filename, size: fri-3.3.1.tar.gz (36.6 kB)
File type: Source
Python version: None
Upload date: Oct 4, 2018