
Py_FS: A Python Package for Feature Selection

Py_FS is a toolbox developed with a complete focus on Feature Selection (FS), using Python as the underlying programming language. It comes with capabilities like nature-inspired evolutionary feature selection algorithms, filter methods and simple evaluation metrics to support easy application of, and comparison among, different feature selection algorithms over different datasets. It is still in the development phase. We wish to extend this package further to contain a more extensive set of feature selection procedures and corresponding utilities.

[UPDATE!] Py_FS now provides access to 30 popular pre-processed datasets used for feature selection. Please find the list of the datasets at the following link: Py_FS database

Please cite this paper if you are using Py_FS:

Guha, R., Chatterjee, B., Khalid Hassan, S. K., Ahmed, S., Bhattacharyya, T., & Sarkar, R. (2022). Py_FS: A Python Package for Feature Selection using Meta-heuristic Optimization Algorithms. In Computational Intelligence in Pattern Recognition (pp. 495-504). Springer, Singapore. DOI: https://doi.org/10.1007/978-981-16-2543-5_42


Installation

Please install the utilities required by the package by running the following command:

pip3 install -r requirements.txt

The package is publicly available on PyPI (the Python Package Index). Anybody willing to use the package can install it by simply running:

pip3 install Py-FS

If you are using an older version of the package and want to update it, use:

pip3 install -U Py-FS

Note

Py_FS uses numpy arrays of numbers to process the datasets. So, please convert the datasets to numpy arrays and exclude any kind of string (like table headings) before using Py_FS for feature selection. In the future, we may add a preprocessing stage before feeding the dataset to Py_FS modules, to allow data frames and other formats and make things easier for users. Currently, however, Py_FS provides no such support.
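
For instance, if a dataset is stored in a CSV file with column headings, a quick way to obtain the plain numeric numpy arrays Py_FS expects is sketched below. This uses pandas purely for illustration and assumes a hypothetical data.csv whose last column holds the class labels:

import pandas as pd
df = pd.read_csv("data.csv")                 # "data.csv" is a hypothetical file with a header row
X = df.iloc[:, :-1].to_numpy(dtype=float)    # features as a plain float numpy array
y = df.iloc[:, -1].to_numpy()                # class labels (last column, by assumption)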

Structure

The current structure of the package is shown below. Depending on the level of the function you intend to call, it should be imported using the period (.) hierarchy.

Py_FS
├── wrapper
│   └── nature_inspired (BBA, CS, EO, GA, GSA, GWO, HS, MA, PSO, RDA, SCA, WOA)
├── filter (PCC, SCC, Relief, MI)
└── evaluation

For example, to use GA, it should be imported using the following statement:

from Py_FS.wrapper.nature_inspired import GA

There are mainly three utilities in the current version of the package. These are discussed in detail in the sections below.

Quick User Guide

For a quick demonstration of the process of using Py_FS, please proceed to this Colab link: Py_FS: Demonstration.

References

This toolbox has been developed by a team of students from the Computer Science and Engineering Department, Jadavpur University, supervised by Prof. Ram Sarkar. The team has participated in many research activities related to engineering optimization, feature selection and image processing. We request the users of Py_FS to cite the relevant articles from our group. It will mean a lot to us. The articles produced by this team are listed below:

Wrappers

GA

  • Guha, R., Ghosh, M., Kapri, S., Shaw, S., Mutsuddi, S., Bhateja, V., & Sarkar, R. (2019). Deluge based Genetic Algorithm for feature selection. Evolutionary intelligence, 1-11.

  • Ghosh, M., Guha, R., Mondal, R., Singh, P. K., Sarkar, R., & Nasipuri, M. (2018). Feature selection using histogram-based multi-objective GA for handwritten Devanagari numeral recognition. In Intelligent engineering informatics (pp. 471-479). Springer, Singapore.

  • Guha, R., Ghosh, M., Singh, P. K., Sarkar, R., & Nasipuri, M. (2019). M-HMOGA: a new multi-objective feature selection algorithm for handwritten numeral classification. Journal of Intelligent Systems, 29(1), 1453-1467.

  • Ghosh, M., Guha, R., Singh, P. K., Bhateja, V., & Sarkar, R. (2019). A histogram based fuzzy ensemble technique for feature selection. Evolutionary Intelligence, 12(4), 713-724.

  • Ghosh, M., Guha, R., Alam, I., Lohariwal, P., Jalan, D., & Sarkar, R. (2019). Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection. Journal of Intelligent Systems, 29(1), 1598-1610.

  • Guha, R., Khan, A. H., Singh, P. K., et al. (2020). CGA: a new feature selection model for visual human action recognition. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05297-5

GSA

  • Guha, R., Ghosh, M., Chakrabarti, A., Sarkar, R., & Mirjalili, S. (2020). Introducing clustering based population in Binary Gravitational Search Algorithm for Feature Selection. Applied Soft Computing, 106341.

GWO

  • Dhargupta, S., Ghosh, M., Mirjalili, S., & Sarkar, R. (2020). Selective opposition based grey wolf optimization. Expert Systems with Applications, 113389.

MA

  • Bhattacharyya, T., Chatterjee, B., Singh, P. K., Yoon, J. H., Geem, Z. W., & Sarkar, R. (2020). Mayfly in Harmony: A New Hybrid Meta-heuristic Feature Selection Algorithm. IEEE Access. DOI: 10.1109/ACCESS.2020.3031718

PSO

  • Ghosh, M., Guha, R., Singh, P. K., Bhateja, V., & Sarkar, R. (2019). A histogram based fuzzy ensemble technique for feature selection. Evolutionary Intelligence, 12(4), 713-724.

  • Ghosh, M., Guha, R., Alam, I., Lohariwal, P., Jalan, D., & Sarkar, R. (2019). Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection. Journal of Intelligent Systems, 29(1), 1598-1610.

WOA

Filters

Overall

  • Ghosh, K. K., Begum, S., Sardar, A., Adhikary, S., Ghosh, M., Kumar, M., & Sarkar, R. (2021). Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data. Expert Systems with Applications, 169, 114485.

PCC

  • Guha, R., Ghosh, K. K., Bhowmik, S., & Sarkar, R. (2020, February). Mutually Informed Correlation Coefficient (MICC)-a New Filter Based Feature Selection Method. In 2020 IEEE Calcutta Conference (CALCON) (pp. 54-58). IEEE.

MI

  • Guha, R., Ghosh, K. K., Bhowmik, S., & Sarkar, R. (2020, February). Mutually Informed Correlation Coefficient (MICC)-a New Filter Based Feature Selection Method. In 2020 IEEE Calcutta Conference (CALCON) (pp. 54-58). IEEE.

1. Wrapper-based Nature-inspired Feature Selection

Wrapper-based nature-inspired methods are very popular feature selection approaches owing to their efficiency and simplicity. These methods start with a random set of candidate solutions (agents modelled on natural elements like particles, whales, bats, etc.) and gradually improve these solutions using the guidance of fitter agents. To calculate the fitness of the candidate solutions, wrappers require a learning algorithm (such as a classifier) to assess the worth of each solution at every iteration. This makes wrapper methods extremely reliable, but computationally expensive as well.

Py_FS currently supports the following 12 wrapper-based FS methods:

  • Binary Bat Algorithm (BBA)
  • Cuckoo Search Algorithm (CS)
  • Equilibrium Optimizer (EO)
  • Genetic Algorithm (GA)
  • Gravitational Search Algorithm (GSA)
  • Grey Wolf Optimizer (GWO)
  • Harmony Search (HS)
  • Mayfly Algorithm (MA)
  • Particle Swarm Optimization (PSO)
  • Red Deer Algorithm (RDA)
  • Sine Cosine Algorithm (SCA)
  • Whale Optimization Algorithm (WOA)

These wrapper approaches can be imported in your code using the following statements:

from Py_FS.wrapper.nature_inspired import BBA
from Py_FS.wrapper.nature_inspired import CS
from Py_FS.wrapper.nature_inspired import EO
from Py_FS.wrapper.nature_inspired import GA
from Py_FS.wrapper.nature_inspired import GSA
from Py_FS.wrapper.nature_inspired import GWO
from Py_FS.wrapper.nature_inspired import HS
from Py_FS.wrapper.nature_inspired import MA
from Py_FS.wrapper.nature_inspired import PSO
from Py_FS.wrapper.nature_inspired import RDA
from Py_FS.wrapper.nature_inspired import SCA
from Py_FS.wrapper.nature_inspired import WOA
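
As a rough illustration, a wrapper can be run on a numeric dataset along the following lines. This is only a sketch: the dataset comes from scikit-learn purely for demonstration, and the keyword arguments assumed for GA (num_agents, max_iter, train_data, train_label) may differ between Py_FS versions, so please consult the user manual for the exact interface.

# A hedged sketch, not the definitive API: the call signature of GA is an assumption.
import numpy as np
from sklearn.datasets import load_wine
from Py_FS.wrapper.nature_inspired import GA
data = load_wine()
X = np.asarray(data.data, dtype=float)   # numeric feature matrix
y = np.asarray(data.target)              # numeric class labels
solution = GA(num_agents=20, max_iter=30, train_data=X, train_label=y)   # assumed keyword names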

2. Filter-based Feature Selection

Filter methods do not use any intermediate learning algorithm to verify the strength of the generated solutions. Instead, they use statistical measures to estimate the importance of individual features in the given context, so that every feature ends up with a rank according to its relevance in the dataset. The top-ranked features can then be used for classification.

Py_FS currently supports the following 4 filter-based FS methods:

  • Pearson Correlation Coefficient (PCC)
  • Spearman Correlation Coefficient (SCC)
  • Relief
  • Mutual Information (MI)

These filter approaches can be imported in your code using the following statements:

from Py_FS.filter import PCC
from Py_FS.filter import SCC
from Py_FS.filter import Relief
from Py_FS.filter import MI
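
A filter method can be applied in a similar spirit. Again, this is only a sketch: the exact arguments accepted by PCC and the structure of the returned ranking are assumptions here, so please verify them against the user manual.

# A hedged sketch, not the definitive API: the interface of PCC is an assumption.
import numpy as np
from sklearn.datasets import load_wine
from Py_FS.filter import PCC
data = load_wine()
X = np.asarray(data.data, dtype=float)   # numeric feature matrix
y = np.asarray(data.target)              # numeric class labels
result = PCC(X, y)   # assumed to return per-feature scores/ranks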

3. Evaluation Metrics

The package comes with tools to evaluate a feature set before or after FS. This helps to easily compare and analyze the performance of different FS procedures.

Py_FS currently supports the following evaluation metrics:

  • classification accuracy
  • average recall
  • average precision
  • average f1 score
  • confusion matrix
  • confusion graph

The evaluation capabilities can be imported in your code using the following statement:

from Py_FS.evaluation import evaluate
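
A rough usage sketch is given below; the argument order assumed for evaluate (training features, test features, training labels, test labels) is an assumption, so please check the user manual before use.

# A hedged sketch, not the definitive API: the argument order of evaluate is an assumption.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from Py_FS.evaluation import evaluate
data = load_wine()
X = np.asarray(data.data, dtype=float)   # numeric feature matrix
y = np.asarray(data.target)              # numeric class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
result = evaluate(X_train, X_test, y_train, y_test)   # assumed argument order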

User Manual

For detailed user guidelines, please access this user manual: Py_FS User Manual
