

Project description

Parallelized Mutual Information based Feature Selection module.

Related blog post here

Dependencies

  • scipy(>=0.17.0)

  • numpy(>=1.10.4)

  • scikit-learn(>=0.17.1)

  • bottleneck(>=1.1.0)

How to use

Download, import and use it as you would any other scikit-learn transformer:

  • fit(X, y)

  • transform(X)

  • fit_transform(X, y)

Description

MIFS stands for Mutual Information based Feature Selection. This class contains routines for selecting features using both continuous and discrete y variables. Three selection algorithms are implemented: JMI, JMIM and MRMR.
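To make the three criteria concrete, here is a minimal greedy forward-selection sketch. It is not the package's implementation. It uses a plug-in (counting) MI estimate, so it only works on discrete features and labels, whereas the package relies on kNN estimators. The scoring follows the usual definitions: JMI sums I(f, s; y) over the already selected features s, JMIM takes the minimum of I(f, s; y), and MRMR subtracts the mean I(f; s) from I(f; y). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

Column = Sequence[Hashable]


def entropy(*columns: Column) -> float:
    """Plug-in (empirical) joint entropy, in nats, of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * math.log(c / n) for c in counts.values())


def mi(a: Column, b: Column) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(a, b)


def joint_mi(a: Column, b: Column, y: Column) -> float:
    """I(A,B;Y) = H(A,B) + H(Y) - H(A,B,Y)."""
    return entropy(a, b) + entropy(y) - entropy(a, b, y)


def greedy_select(X_cols: Sequence[Column], y: Column,
                  n_features: int, method: str = "JMI") -> List[int]:
    """Hypothetical greedy selector; returns column indices in the order picked."""
    if method not in ("JMI", "JMIM", "MRMR"):
        raise ValueError("method must be 'JMI', 'JMIM' or 'MRMR'")
    if n_features < 1 or not X_cols:
        return []
    relevance = [mi(col, y) for col in X_cols]
    remaining = list(range(len(X_cols)))
    # First pick: the feature with the largest marginal MI with y.
    selected = [max(remaining, key=lambda f: relevance[f])]
    remaining.remove(selected[0])

    def score(f: int) -> float:
        if method == "JMI":   # sum over selected s of I(f, s; y)
            return sum(joint_mi(X_cols[f], X_cols[s], y) for s in selected)
        if method == "JMIM":  # min over selected s of I(f, s; y)
            return min(joint_mi(X_cols[f], X_cols[s], y) for s in selected)
        # MRMR: relevance minus mean redundancy with the selected set
        redundancy = sum(mi(X_cols[f], X_cols[s]) for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < n_features:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 0, 1, 1]
    f0 = [0, 0, 1, 1, 0, 0, 1, 1]  # identical to y
    f1 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of y in this sample
    f2 = [0, 0, 1, 1, 0, 0, 1, 0]  # y with one value flipped
    for m in ("JMI", "JMIM", "MRMR"):
        # f0 (index 0) is picked first by every method
        print(m, greedy_select([f0, f1, f2], y, n_features=2, method=m))
```

The only difference between the three methods is the `score` function; the greedy loop around it is shared, which mirrors how the criteria are usually presented in the literature.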

This implementation tries to mimic the scikit-learn interface, so use fit, transform or fit_transform to run the feature selection.

See examples/example.py for examples and usage.

Docs

Parameters

method : string, default = ‘JMI’:

> Which mutual information based feature selection method to use:
> * 'JMI' : Joint Mutual Information [1]
> * 'JMIM' : Joint Mutual Information Maximisation [2]
> * 'MRMR' : Max-Relevance Min-Redundancy [3]

k : int, default = 5:

> Sets the number of nearest neighbours used by the kNN-based estimation of mutual information. Kraskov et al. recommend a small integer between 3 and 10.

n_features : int or string, default = ‘auto’:

> If int, it sets the number of features to select from X. If 'auto', this is determined automatically from the amount of mutual information the previously selected features share with y.

categorical : Boolean, default = True:

> If True, y is assumed to be a categorical class label. If False, y is treated as continuous. Consequently, this parameter determines how the MI between the predictors in X and y is estimated.

verbose : int, default = 0:

> Controls verbosity of output:
> * 0: no output
> * 1: displays selected features
> * 2: displays selected features and mutual information
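Both `k` and `categorical` concern how MI is estimated. The sketch below illustrates the two kinds of kNN estimators involved, assuming 1-D inputs: a KSG estimator (Kraskov et al., algorithm 1) for continuous X and continuous y, and the nearest-neighbour estimator of Ross (2014) for continuous X and a categorical y. It is an O(N²) pure-Python illustration, not the package's own code; the function names are hypothetical and `math.nextafter` requires Python 3.9+.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Hashable, Sequence

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma for a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def mi_continuous_continuous(x: Sequence[float], y: Sequence[float], k: int = 5) -> float:
    """KSG estimate of I(X;Y) in nats for 1-D continuous x and y."""
    n = len(x)
    if n != len(y) or n <= k:
        raise ValueError("x and y must have equal length greater than k")
    acc = 0.0
    for i in range(n):
        # Max-norm distances to all other points in the joint (x, y) space.
        joint = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i)
        eps = joint[k - 1]  # distance to the k-th nearest neighbour
        # Marginal neighbours strictly closer than eps (x_i itself excluded).
        n_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        acc += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


def mi_continuous_discrete(x: Sequence[float], labels: Sequence[Hashable], k: int = 5) -> float:
    """Ross-style kNN estimate of I(X;Y) in nats, continuous x and discrete labels."""
    counts = Counter(labels)
    # Samples whose label occurs once have no same-label neighbour, so drop them.
    pairs = [(xi, lab) for xi, lab in zip(x, labels) if counts[lab] > 1]
    n = len(pairs)
    if n == 0:
        return 0.0
    by_label = defaultdict(list)
    for xi, lab in pairs:
        by_label[lab].append(xi)
    all_x = [xi for xi, _ in pairs]
    acc = 0.0
    for xi, lab in pairs:
        same = by_label[lab]
        k_i = min(k, len(same) - 1)
        # Distance to the k_i-th same-label neighbour (index 0 is x_i itself).
        r = sorted(abs(xi - xj) for xj in same)[k_i]
        # m_i: samples of any label strictly closer than r, x_i included
        # (nextafter(r, 0) with <= mirrors scikit-learn's convention).
        r_below = math.nextafter(r, 0.0)
        m_i = sum(1 for xj in all_x if abs(xi - xj) <= r_below)
        acc += digamma_int(k_i) - digamma_int(len(same)) - digamma_int(m_i)
    # Clip at zero: true MI is non-negative, but the estimate can dip below it.
    return max(0.0, digamma_int(n) + acc / n)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0, 1) for _ in range(200)]
    print(mi_continuous_continuous(xs, [v + 0.1 * rng.gauss(0, 1) for v in xs]))  # clearly > 0
    print(mi_continuous_continuous(xs, [rng.gauss(0, 1) for _ in range(200)]))  # close to 0
    labels = [i % 2 for i in range(200)]
    xs_sep = [rng.gauss(6 * lab, 1) for lab in labels]
    print(mi_continuous_discrete(xs_sep, labels))  # close to ln 2 for well-separated classes
```

A larger `k` averages over more neighbours, which lowers the variance of the estimate at the cost of some bias; this is the trade-off behind the "small integer between 3 and 10" recommendation.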

Attributes

n_features : int:

> The number of selected features.

support : array of length [number of features in X]:

> The mask array of selected features.

ranking_ : array of shape [n_features]:

> The selected features in the order they were selected: the first is the feature with the largest marginal MI with y, followed by the others in order of decreasing MI.

mi : array of shape [n_features]:

> The JMIM of the selected features. Usually this is a monotonically decreasing array of numbers converging to 0. One can use it to estimate the number of features to select; in fact, this is what n_features='auto' tries to do heuristically.
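Because mi usually decreases toward 0, one simple way to read a feature count off it is a relative cut-off. The sketch below is a hypothetical rule, not the heuristic behind n_features='auto': it keeps features until a score falls below `rel_threshold` times the first score. Both `choose_n_features` and `rel_threshold` are illustrative names.

```python
from typing import Sequence


def choose_n_features(scores: Sequence[float], rel_threshold: float = 0.05) -> int:
    """Hypothetical cut-off: count features until a score drops below
    rel_threshold * scores[0]. Always keeps at least one feature."""
    if not scores:
        return 0
    cutoff = rel_threshold * scores[0]
    for i, s in enumerate(scores):
        if s < cutoff:
            return max(i, 1)
    return len(scores)


if __name__ == "__main__":
    # 0.04 is the first score below 5% of 1.0, so three features are kept
    print(choose_n_features([1.0, 0.5, 0.2, 0.04, 0.01]))
```

A relative threshold keeps the rule independent of the absolute MI scale, which varies with the estimator and with the entropy of y.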

Examples

The following example illustrates the use of the package:

```python
import pandas as pd
import mifs

# load X and y; ravel() turns the single-column y table into a 1-D array
X = pd.read_csv('my_X_table.csv', index_col=0).values
y = pd.read_csv('my_y_vector.csv', index_col=0).values.ravel()

# define MI_FS feature selection method (defaults: method='JMI', k=5, n_features='auto')
feat_selector = mifs.MutualInformationFeatureSelector()

# run the feature selection
feat_selector.fit(X, y)

# check selected features (boolean mask over the columns of X)
print(feat_selector._support_mask)

# check ranking of features
print(feat_selector.ranking_)

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)
```
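The support mask and the ranking carry the same set of features but in different orders. Assuming transform applies the boolean mask column-wise, as scikit-learn-style selectors do, the filtered columns keep their original order rather than the ranking order. The sketch below shows this with plain lists; all three helper names are hypothetical.

```python
from typing import List, Sequence


def support_mask_from_ranking(ranking: Sequence[int], n_total: int) -> List[bool]:
    """Boolean mask over all columns, True for every index in `ranking`."""
    mask = [False] * n_total
    for idx in ranking:
        mask[idx] = True
    return mask


def transform_rows(X_rows: Sequence[Sequence[float]], mask: Sequence[bool]) -> List[List[float]]:
    """Keep the masked columns of each row, in their original column order."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X_rows]


def reorder_by_ranking(X_rows: Sequence[Sequence[float]], ranking: Sequence[int]) -> List[List[float]]:
    """Alternative: return the selected columns in selection (ranking) order."""
    return [[row[i] for i in ranking] for row in X_rows]


if __name__ == "__main__":
    ranking = [3, 0]  # column 3 was selected first, then column 0
    X_rows = [[10, 11, 12, 13], [20, 21, 22, 23]]
    mask = support_mask_from_ranking(ranking, 4)
    print(transform_rows(X_rows, mask))         # [[10, 13], [20, 23]]
    print(reorder_by_ranking(X_rows, ranking))  # [[13, 10], [23, 20]]
```

If downstream code relies on the most informative feature coming first, reorder by the ranking explicitly instead of relying on the mask.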

References

