MAFESE: Metaheuristic Algorithm for Feature Selection - An Open Source Python Library
MAFESE (Metaheuristic Algorithms for FEature SElection) is the largest Python library focused on feature selection using metaheuristic algorithms.
- Free software: GNU General Public License (GPL) v3
- Total Wrapper-based (Metaheuristic Algorithms): > 170 methods
- Total FilterSelector-based (Statistical-based): > 6 methods
- Total classification datasets: > 20 datasets
- Total estimator methods: > 3 methods
- Total performance metrics (as fitness): > 10 metrics
- Documentation: https://mafese.readthedocs.io/en/latest/
- Python versions: 3.7.x, 3.8.x, 3.9.x, 3.10.x, 3.11.x
- Dependencies: numpy, scipy, scikit-learn, pandas, matplotlib, mealpy, permetrics
Installation
Install with pip
Install the current PyPI release:
$ pip install mafese==0.1.1
Install directly from source code
$ git clone https://github.com/thieu1995/mafese.git
$ cd mafese
$ python setup.py install
Library structure
docs
examples
mafese
    wrapper
        recursive.py
        sequential.py
    filter.py
    utils
        correlation.py
        encoder.py
        estimator.py
        validator.py
    __init__.py
    selector.py
README.md
setup.py
Usage
After installation, you can import MAFESE like any other Python module:
$ python
>>> import mafese
>>> mafese.__version__
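If you only want to confirm which release is installed without importing the package, the standard library can read the installed metadata. The helper name `installed_version` below is hypothetical, and `importlib.metadata` requires Python 3.8 or newer, so treat this as an optional sketch rather than part of MAFESE.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "mafese" is the distribution name used in the pip command above.
print(installed_version("mafese"))
```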
Let's go through some examples.
Examples
First, load your own dataset, or load one of the datasets bundled with MAFESE:
# Load available dataset from MAFESE
from mafese import get_dataset
# Try unknown data
get_dataset("unknown")
# Enter: 1
data = get_dataset("Arrhythmia")
# Load your own dataset
import pandas as pd
from mafese import Data
# load X and y
# NOTE mafese accepts numpy arrays only, hence the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
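To see what the slicing above produces, here is a small self-contained sketch that reads a CSV from an in-memory string instead of `examples/dataset.csv`. The column names and values are made up for illustration; only the `read_csv(..., index_col=0).values` pattern and the two slices come from the example above.

```python
import io

import pandas as pd

# Hypothetical CSV content: an ID column, two feature columns, and a label column.
csv_text = """id,f1,f2,label
a,0.1,1.5,0
b,0.4,2.5,1
c,0.9,3.5,1
"""

# index_col=0 turns the first column into the row index, so .values omits it.
dataset = pd.read_csv(io.StringIO(csv_text), index_col=0).values

# Every column except the last is a feature; the last column is the target.
X, y = dataset[:, 0:-1], dataset[:, -1]

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```

Note that `.values` yields a single NumPy array with one dtype, so an integer label column stored next to float features comes back as floats.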
Next, split the dataset into training and test sets:
data.split_train_test(test_size=0.2, inplace=True)
print(data.X_train[:2].shape)
print(data.y_train[:2].shape)
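For readers unfamiliar with hold-out splitting, the idea can be sketched in plain NumPy. This is not MAFESE's implementation of `split_train_test`; the function `holdout_split` and its `seed` parameter are hypothetical and only illustrate what a `test_size=0.2` split means.

```python
import numpy as np


def holdout_split(X, y, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a `test_size` fraction of rows for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X = np.arange(40, dtype=float).reshape(10, 4)  # 10 samples, 4 features
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = holdout_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)  # (8, 4) (2, 4)
```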
Next, use the recursive wrapper-based method:
from mafese.wrapper.recursive import RecursiveSelector
# define mafese feature selection method
feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)
# find all relevant features - 5 features should be selected
feat_selector.fit(data.X_train, data.y_train)
# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)
# check the index of selected features
print(feat_selector.selected_feature_indexes)
# call transform() on X to filter it down to selected features
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
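The mask, the 0/1 solution, and the index list describe the same selection in three forms. The NumPy sketch below shows how they relate and what filtering a matrix down to the selected columns amounts to. It uses a made-up mask and does not call MAFESE; that the selector attributes behave exactly this way is an assumption based on the comments above.

```python
import numpy as np

# Hypothetical mask for a 6-feature dataset: True marks a selected feature.
mask = np.array([True, False, True, False, False, True])

# The 0/1 solution vector and the index list carry the same information as the mask.
solution = mask.astype(int)
indexes = np.flatnonzero(mask)

X = np.arange(18).reshape(3, 6)  # 3 samples, 6 features
X_selected = X[:, mask]          # keep only the selected columns

print(solution)          # [1 0 1 0 0 1]
print(indexes)           # [0 2 5]
print(X_selected.shape)  # (3, 3)
```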
Or use the sequential (forward or backward) wrapper-based method:
from mafese.wrapper.sequential import SequentialSelector
# define mafese feature selection method
feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=3, direction="forward")
# find all relevant features - 3 features should be selected
feat_selector.fit(data.X_train, data.y_train)
# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)
# check the index of selected features
print(feat_selector.selected_feature_indexes)
# call transform() on X to filter it down to selected features
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
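Forward sequential selection is a greedy search: start with no features and repeatedly add the one that improves a score the most. The sketch below illustrates that idea in NumPy with a leave-one-out 1-nearest-neighbour accuracy as the score. It is a generic illustration, not MAFESE's `SequentialSelector`; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample may not be its own neighbour
    nearest = dist.argmin(axis=1)
    return float((y[nearest] == y).mean())


def forward_select(X, y, n_features):
    """Greedily add the best-scoring feature until n_features are chosen."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = [loo_1nn_accuracy(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected  # in the order the features were added


# Synthetic data: 5 noise features, then feature 2 is overwritten with a
# value that separates the two classes cleanly.
rng = np.random.default_rng(0)
X = rng.random((40, 5))
y = np.array([0, 1] * 20)
X[:, 2] = 10 * y + rng.random(40)

order = forward_select(X, y, n_features=2)
print(order)  # feature 2 is expected to be picked first
```

Backward selection works the same way in reverse: start from all features and repeatedly drop the one whose removal hurts the score least.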
Or use filter-based feature selection with different correlation methods:
from mafese.filter import FilterSelector
# define mafese feature selection method
feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)
# find all relevant features - 5 features should be selected
feat_selector.fit(data.X_train, data.y_train)
# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)
# check the index of selected features
print(feat_selector.selected_feature_indexes)
# call transform() on X to filter it down to selected features
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
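A filter method scores each feature independently of any estimator and keeps the top-scoring ones. As a rough sketch of what a Spearman-based filter does, the code below ranks features by the absolute Spearman correlation with the target using `scipy.stats.spearmanr`. The function `spearman_filter` and the toy data are hypothetical, and MAFESE's `FilterSelector` may compute or rank scores differently.

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_filter(X, y, n_features):
    """Return a boolean mask of the n_features columns with the largest |rho|."""
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rho, _ = spearmanr(X[:, j], y)
        scores[j] = abs(rho)
    # argsort is ascending, so the last n_features entries are the best ones.
    top = np.argsort(scores)[-n_features:]
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[top] = True
    return mask, scores


# Toy data: column 0 rises with the label, column 2 falls with it,
# and column 1 is nearly unrelated to it.
X = np.array([
    [1, 1, 6],
    [2, 6, 5],
    [3, 3, 4],
    [4, 4, 3],
    [5, 2, 2],
    [6, 5, 1],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

mask, scores = spearman_filter(X, y, n_features=2)
print(np.flatnonzero(mask))  # [0 2]
```

Because the absolute value is used, a strongly negative correlation counts as much as a strongly positive one.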
For more usage examples, see the examples folder.
Shortcut
The main classes and functions can be imported directly from the top-level package:
from mafese import Data, get_dataset
from mafese import SequentialSelector, RecursiveSelector, FilterSelector
Getting help (questions, problems)
- Official source code repo: https://github.com/thieu1995/mafese
- Official documentation: https://mafese.readthedocs.io/
- Download releases: https://pypi.org/project/mafese/
- Issue tracker: https://github.com/thieu1995/mafese/issues
- Notable changes log: https://github.com/thieu1995/mafese/blob/master/ChangeLog.md
- Examples with different mealpy versions: https://github.com/thieu1995/mafese/blob/master/examples.md
- This project is also related to our other projects on "meta-heuristics", "neural-network", and "optimization".
Want instant assistance? Join our Telegram community. We share lots of information, questions, and answers there, so you will get more support and knowledge.
References
If you are using mafese in your project, we would appreciate citations:
@software{nguyen_van_thieu_2023_7969043,
author = {Nguyen Van Thieu},
title = {MAFESE: Metaheuristic Algorithm for Feature Selection - An Open Source Python Library},
month = may,
year = 2023,
publisher = {Zenodo},
doi = {10.5281/zenodo.7969042},
url = {https://github.com/thieu1995/mafese}
}
1. https://neptune.ai/blog/feature-selection-methods
2. https://www.blog.trainindata.com/feature-selection-machine-learning-with-python/
3. https://github.com/LBBSoft/FeatureSelect
4. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2754-0
5. https://github.com/scikit-learn-contrib/boruta_py