MAFESE: Metaheuristic Algorithm for Feature Selection - An Open Source Python Library


MAFESE (Metaheuristic Algorithms for FEature SElection) is an open-source Python library focused on feature selection using metaheuristic algorithms.

  • Free software: GNU General Public License (GPL) v3
  • Total Wrapper-based (Metaheuristic Algorithms): > 170 methods
  • Total Filter-based (Statistical-based): > 6 methods
  • Total classification datasets: > 20 datasets
  • Total estimator methods: > 3 methods
  • Total performance metrics (as fitness): > 10 metrics
  • Documentation: https://mafese.readthedocs.io/en/latest/
  • Python versions: 3.7.x, 3.8.x, 3.9.x, 3.10.x, 3.11.x
  • Dependencies: numpy, scipy, scikit-learn, pandas, matplotlib, mealpy, permetrics

Installation

Install with pip

Install the current PyPI release:

$ pip install mafese==0.1.0

Install directly from source code

$ git clone https://github.com/thieu1995/mafese.git
$ cd mafese
$ python setup.py install

Library structure

docs
examples
mafese
    wrapper
        recursive.py
        sequential.py
    filter.py
    utils
        correlation.py
        encoder.py
        estimator.py
        validator.py
    __init__.py
    selector.py
README.md
setup.py

Usage

After installation, you can import MAFESE like any other Python module:

$ python
>>> import mafese
>>> mafese.__version__

Let's go through some examples.

Examples

First, load your own dataset, or load one of the datasets bundled with MAFESE:

# Load available dataset from MAFESE
from mafese import get_dataset

# Try unknown data
get_dataset("unknown")
# Enter: 1

data = get_dataset("Arrhythmia")

# Load your own dataset
import pandas as pd
from mafese import Data

# load X and y
# NOTE mafese accepts numpy arrays only, hence the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)

Next, split the dataset into training and test sets:

data.split_train_test(test_size=0.2, inplace=True)
print(data.X_train[:2].shape)
print(data.y_train[:2].shape)
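
To make the effect of test_size=0.2 concrete, here is a minimal pure-Python sketch of a hold-out split. It is not MAFESE's implementation; the helper name holdout_split, the seed argument, and the toy data are assumptions made for illustration only.

```python
import random


def holdout_split(X, y, test_size=0.2, seed=42):
    """Shuffle the row indices and hold out a fraction of rows for testing.

    Hypothetical helper for illustration; not part of MAFESE.
    """
    n_samples = len(X)
    n_test = max(1, round(n_samples * test_size))
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: 10 samples with 2 features each
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = holdout_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2
```

With 10 samples and test_size=0.2, two rows are held out and eight remain for fitting the feature selector.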

Next, use the Recursive wrapper-based method:

from mafese.wrapper.recursive import Recursive

# define mafese feature selection method
feat_selector = Recursive(problem="classification", estimator="rf", n_features=5)

# find all relevant features - 5 features should be selected
feat_selector.fit(data.X_train, data.y_train)

# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)

# check the index of selected features
print(feat_selector.selected_feature_indexes)

# call transform() on X to filter it down to selected features
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
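
The selected_feature_masks, selected_feature_indexes, and transform() outputs describe the same selection in three forms. The sketch below shows how a boolean mask relates to an index list and to a column-filtered matrix. It uses plain Python lists, and the helper names mask_to_indexes and apply_mask are hypothetical, not MAFESE functions.

```python
def mask_to_indexes(mask):
    """Return the positions where the mask is True (or 1)."""
    return [i for i, keep in enumerate(mask) if keep]


def apply_mask(X, mask):
    """Keep only the columns of X whose mask entry is True (or 1)."""
    indexes = mask_to_indexes(mask)
    return [[row[i] for i in indexes] for row in X]


# Toy example: 4 features, of which the first and third are selected
mask = [True, False, True, False]
X = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]

print(mask_to_indexes(mask))  # [0, 2]
print(apply_mask(X, mask))    # [[1.0, 3.0], [5.0, 7.0]]
```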

Or, use the Sequential wrapper-based method:

from mafese.wrapper.sequential import Sequential

# define mafese feature selection method
feat_selector = Sequential(problem="classification", estimator="knn", n_features=3, direction="forward")

# find all relevant features - 3 features should be selected
feat_selector.fit(data.X_train, data.y_train)

# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)

# check the index of selected features
print(feat_selector.selected_feature_indexes)

# call transform() on X to filter it down to selected features
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
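
For intuition about direction="forward", the following is a minimal sketch of greedy forward selection: start with no features and repeatedly add the one that most improves a score. It is not MAFESE's implementation. The scoring function (leave-one-out accuracy of a 1-nearest-neighbour classifier), the helper names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature columns."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for j, other in enumerate(X):
            if i == j:
                continue  # leave the query sample out
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_select(X, y, n_features):
    """Greedily add the feature that gives the best score until n_features are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < n_features:
        best = max(remaining, key=lambda f: loo_1nn_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 separates the classes, feature 1 is misleading,
# feature 2 is partly informative.
X = [
    [0.0, 5.0, 1.0],
    [0.2, 1.0, 1.3],
    [0.1, 9.0, 5.5],
    [0.8, 5.1, 5.0],
    [1.0, 1.1, 5.2],
    [0.9, 9.1, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

print(forward_select(X, y, n_features=1))  # [0]
print(forward_select(X, y, n_features=2))  # [0, 2]
```

Backward selection works the other way round: it starts from all features and removes the one whose removal hurts the score least.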

Or, how to use Filter-based feature selection with different correlation methods:

from mafese.filter import Filter

# define mafese feature selection method
feat_selector = Filter(problem='classification', method='SPEARMAN', n_features=5)

# find all relevant features - 5 features should be selected
feat_selector.fit(data.X_train, data.y_train)

# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)

# check the index of selected features
print(feat_selector.selected_feature_indexes)

# call transform() on X to filter it down to selected features
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
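
As a rough illustration of what a Spearman-based filter does, the sketch below ranks each feature column and the target, computes the Pearson correlation of the ranks, and keeps the n features with the largest absolute correlation. It is a pure-Python sketch under those assumptions, not MAFESE's implementation; the function names and toy data are hypothetical.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the group while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def spearman_filter(X, y, n_features):
    """Return the indexes of the n_features columns with the largest |Spearman| score."""
    columns = list(zip(*X))
    scores = [abs(spearman(col, y)) for col in columns]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_features]
    return sorted(top), scores


# Toy data: feature 0 increases with the class label, features 1 and 2 are noisier
X = [
    [1, 1, 2],
    [2, 6, 5],
    [3, 2, 1],
    [10, 5, 4],
    [20, 3, 6],
    [30, 4, 3],
]
y = [0, 0, 0, 1, 1, 1]

selected, scores = spearman_filter(X, y, n_features=2)
print(selected)  # [0, 2]
print([round(s, 3) for s in scores])
```

Unlike the wrapper-based methods, a filter like this never trains an estimator: each feature is scored against the target independently.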

For more usage examples, see the examples folder.

Getting help (questions, problems)

Want instant assistance? Join our Telegram community at link. We share information, questions, and answers there, and it is the quickest way to get support.

References

1. https://neptune.ai/blog/feature-selection-methods
2. https://github.com/LBBSoft/FeatureSelect
3. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2754-0
4. https://github.com/scikit-learn-contrib/boruta_py
