Feature relevance interval method
This repo contains the Python implementation of the all-relevant feature selection method described in the corresponding publications [1,2].
Try out the online demo notebook here.
Installation
The library has several dependencies, which should be installed automatically. We highly recommend the Anaconda Python distribution, which provides all of them. The library was written with Python 3 in mind; given the foreseeable end of Python 2 support, backwards compatibility is not planned.
If you just want to use the stable version from PyPI, use
$ pip install fri
To install the module for development, clone the repo and execute:
$ python setup.py install
Testing
To test whether the library was installed correctly, you can use pytest to run all included tests.
$ pip install pytest
Then run in the root directory:
$ pytest
Usage
Examples and API descriptions can be found here.
In general, the library follows the sklearn API format. The two important classes exposed to the user are FRIClassification and FRIRegression; which one to use depends on whether your target variable is categorical or continuous.
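The choice between the two follows directly from the target variable: discrete class labels call for FRIClassification, continuous values for FRIRegression. The helper below is a hypothetical illustration of that decision rule, not part of the fri library:

```python
def pick_fri_class(y):
    """Hypothetical helper: suggest which FRI estimator fits a target vector.

    Targets with few, integer-valued labels look like classification;
    continuous float targets look like regression. Illustrative only.
    """
    unique = set(y)
    if all(float(v).is_integer() for v in unique) and len(unique) <= 20:
        return "FRIClassification"
    return "FRIRegression"

print(pick_fri_class([0, 1, 1, 0]))      # discrete labels -> FRIClassification
print(pick_fri_class([0.3, 1.7, 2.25]))  # continuous targets -> FRIRegression
```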
Parameters
C : float, optional
Set a fixed regularization parameter. If None, the value is determined automatically using grid search.
random_state : int seed, RandomState instance, or None (default=None)
The seed of the pseudo random number generator to use when shuffling the data.
parallel : boolean, default = False
Uses multiprocessing with all available cores when enabled to compute relevance bounds in parallel.
n_resampling : int, default = 3
Number of contrast features computed per feature. Results are averaged to reduce problems with some sparse input features.
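Contrast features are, in many relevance-selection methods, built by permuting an original feature so that it keeps its marginal distribution but loses any relation to the target; averaging over several permutations stabilizes the result. The sketch below shows that common construction to make n_resampling concrete (an assumption about the internals, not fri's actual code):

```python
import random

def contrast_features(column, n_resampling=3, seed=0):
    """Build n_resampling permuted copies of a feature column.

    Each copy keeps the values (and thus the marginal distribution)
    of the original feature but destroys its relation to the target.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_resampling):
        shuffled = list(column)
        rng.shuffle(shuffled)
        copies.append(shuffled)
    return copies

feature = [1.0, 2.0, 3.0, 4.0, 5.0]
for c in contrast_features(feature, n_resampling=3):
    print(c)
```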
Regression specific
epsilon : float, optional
Controls the size of the epsilon tube around the initial SVR model. By default, the value is set using hyperparameter optimization.
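The epsilon tube refers to SVR's standard epsilon-insensitive loss: residuals smaller than epsilon cost nothing, larger ones are penalized linearly. A small worked illustration of that loss (standard SVR behaviour, not fri-specific code):

```python
def eps_insensitive_loss(residual, epsilon):
    """Epsilon-insensitive loss used by SVR: errors inside the
    epsilon tube are ignored, errors outside grow linearly."""
    return max(0.0, abs(residual) - epsilon)

# Residual 0.05 lies inside a tube of width 0.1 -> zero loss;
# 0.2 and -0.3 fall outside and are penalized by their distance to the tube.
for r in (0.05, 0.2, -0.3):
    print(r, "->", eps_insensitive_loss(r, epsilon=0.1))
```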
Attributes
n_features_ : int
The number of selected features.
allrel_prediction_ : array of shape [n_features]
The boolean mask of selected features, including all relevant (strongly and weakly relevant) ones.
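The two attributes are linked: n_features_ is simply the number of True entries in the allrel_prediction_ mask. A tiny sketch of that relationship, using a made-up mask:

```python
# Hypothetical mask as it might appear in allrel_prediction_
# (True = feature is relevant). The values here are made up.
allrel_prediction = [True, True, False, True, False, False]

# n_features_ counts the selected features, i.e. the True entries.
n_features = sum(allrel_prediction)
print(n_features)  # 3
```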
Examples
# Classification data
from fri import genClassificationData
X, y = genClassificationData(n_samples=100, n_features=6, n_strel=2,
                             n_redundant=2, n_repeated=0, flip_y=0)
# We created a binary classification set with 6 features,
# of which 2 are strongly relevant and 2 weakly relevant.

# Scale data
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)

# New object for classification data
from fri import FRIClassification
fri_model = FRIClassification()

# Fit to data
fri_model.fit(X_scaled, y)

# Print out feature relevance intervals
print(fri_model.interval_)

# Plot results
from fri import plot
plot.plotIntervals(fri_model.interval_)

# Print internal parameters
print(fri_model.allrel_prediction_)

# Print out hyperparameter found by GridSearchCV
print(fri_model._hyper_C)

# Get weights of the linear models used for each feature optimization
print(fri_model._omegas)
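The interval_ printed above holds a [lower, upper] relevance bound per feature. Per the publications, these bounds separate the relevance classes: a clearly positive lower bound marks a strongly relevant feature, a near-zero lower bound with a positive upper bound a weakly relevant one, and bounds both near zero an irrelevant one. A hedged sketch of that interpretation (the tolerance and the example interval values are made up):

```python
def relevance_class(lower, upper, tol=1e-4):
    """Classify a feature from its relevance interval [lower, upper].

    The tolerance used for "near zero" is an illustrative choice,
    not a value taken from the library.
    """
    if lower > tol:
        return "strongly relevant"
    if upper > tol:
        return "weakly relevant"
    return "irrelevant"

# Made-up intervals for four features
intervals = [(0.3, 0.5), (0.0, 0.4), (0.0, 0.0), (0.25, 0.25)]
for lo, hi in intervals:
    print(relevance_class(lo, hi))
```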
References
[1] Göpfert C, Pfannschmidt L, Hammer B. Feature Relevance Bounds for Linear Classification. In: Proceedings of the ESANN. 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning; Accepted. https://pub.uni-bielefeld.de/publication/2908201
[2] Göpfert C, Pfannschmidt L, Göpfert JP, Hammer B. Interpretation of Linear Classifiers by Means of Feature Relevance Bounds. Neurocomputing. Accepted. https://pub.uni-bielefeld.de/publication/2915273