A Python Package for Feature Selection
Py_FS: A Python Package for Feature Selection
Py_FS is a toolbox developed with a complete focus on Feature Selection (FS), using Python as the underlying programming language. It provides nature-inspired evolutionary feature selection algorithms, filter methods, and simple evaluation metrics that make it easy to apply and compare different feature selection algorithms over different datasets. The package is still under development, and we wish to extend it further with a more extensive set of feature selection procedures and corresponding utilities.
[UPDATE!] Py_FS now provides access to 30 popular pre-processed datasets used for feature selection. The list of datasets is available at the following link: Py_FS database
Please cite this paper if you are using Py_FS:
Guha, R., Chatterjee, B., Khalid Hassan, S. K., Ahmed, S., Bhattacharyya, T., & Sarkar, R. (2022). Py_FS: A Python Package for Feature Selection using Meta-heuristic Optimization Algorithms. In Computational Intelligence in Pattern Recognition (pp. 495-504). Springer, Singapore. DOI: 10.1007/978-981-16-2543-5_42 (https://link.springer.com/chapter/10.1007/978-981-16-2543-5_42)
Installation
To install the package's dependencies, run the following command:
pip3 install -r requirements.txt
The package is publicly available on PyPI (the Python Package Index). To install it, run:
pip3 install Py-FS
If you are using an older version of the package and want to update it, use:
pip3 install -U Py-FS
Note
Py_FS processes datasets as numeric NumPy arrays. Convert your datasets to NumPy arrays and remove any strings (such as table headings) before using Py_FS for feature selection. In the future, we may add a preprocessing stage so that Py_FS modules can accept data frames and other formats directly, but Py_FS currently provides no such support.
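Since Py_FS expects purely numeric arrays, one way to get there from a CSV file with a header row is NumPy's `genfromtxt` with `skip_header=1`. The sketch below is only an illustration: it assumes the class label is stored in the last column, and the column names and values are made up.

```python
import io
import numpy as np

# Hypothetical CSV contents: a header row of strings, three numeric features, label last.
csv_text = """f1,f2,f3,label
0.5,1.2,3.3,0
0.1,0.4,2.2,1
0.9,1.8,0.7,0
0.3,0.2,1.1,1
"""

# skip_header=1 drops the string header so that only numbers are parsed.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

X = data[:, :-1]               # feature matrix, shape (n_samples, n_features)
y = data[:, -1].astype(int)    # class labels as integers

print(X.shape, y.tolist())     # → (4, 3) [0, 1, 0, 1]
```

For a real file, pass its path (for example a hypothetical `"my_dataset.csv"`) to `np.genfromtxt` instead of the `StringIO` object.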
Structure
The package is organized hierarchically. Import each function using the dot (.) path that matches its level in the hierarchy.
[Figure: Py_FS package structure]
For example, to use GA, import it with the following statement:
from Py_FS.wrapper.nature_inspired import GA
The current version of the package offers three main utilities, each discussed in detail below: wrapper-based nature-inspired feature selection, filter-based feature selection, and evaluation metrics.
Quick User Guide
For a quick demonstration of Py_FS, see this Colab notebook: Py_FS: Demonstration.
References
This toolbox has been developed by a team of students from the Department of Computer Science and Engineering, Jadavpur University, supervised by Prof. Ram Sarkar. The team has been involved in many research activities related to engineering optimization, feature selection, and image processing. We request that users of Py_FS cite the relevant articles from our group; it will mean a lot to us. The team's articles are listed below:
Wrappers
GA
- Guha, R., Ghosh, M., Kapri, S., Shaw, S., Mutsuddi, S., Bhateja, V., & Sarkar, R. (2019). Deluge based Genetic Algorithm for feature selection. Evolutionary Intelligence, 1-11.
- Ghosh, M., Guha, R., Mondal, R., Singh, P. K., Sarkar, R., & Nasipuri, M. (2018). Feature selection using histogram-based multi-objective GA for handwritten Devanagari numeral recognition. In Intelligent Engineering Informatics (pp. 471-479). Springer, Singapore.
- Guha, R., Ghosh, M., Singh, P. K., Sarkar, R., & Nasipuri, M. (2019). M-HMOGA: a new multi-objective feature selection algorithm for handwritten numeral classification. Journal of Intelligent Systems, 29(1), 1453-1467.
- Ghosh, M., Guha, R., Singh, P. K., Bhateja, V., & Sarkar, R. (2019). A histogram based fuzzy ensemble technique for feature selection. Evolutionary Intelligence, 12(4), 713-724.
- Ghosh, M., Guha, R., Alam, I., Lohariwal, P., Jalan, D., & Sarkar, R. (2019). Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection. Journal of Intelligent Systems, 29(1), 1598-1610.
- Guha, R., Khan, A. H., Singh, P. K., et al. (2020). CGA: a new feature selection model for visual human action recognition. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05297-5
GSA
- Guha, R., Ghosh, M., Chakrabarti, A., Sarkar, R., & Mirjalili, S. (2020). Introducing clustering based population in Binary Gravitational Search Algorithm for Feature Selection. Applied Soft Computing, 106341.
GWO
- Dhargupta, S., Ghosh, M., Mirjalili, S., & Sarkar, R. (2020). Selective opposition based grey wolf optimization. Expert Systems with Applications, 113389.
MA
- Bhattacharyya, T., Chatterjee, B., Singh, P. K., Yoon, J. H., Geem, Z. W., & Sarkar, R. (2020). Mayfly in Harmony: A New Hybrid Meta-heuristic Feature Selection Algorithm. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3031718
PSO
- Ghosh, M., Guha, R., Singh, P. K., Bhateja, V., & Sarkar, R. (2019). A histogram based fuzzy ensemble technique for feature selection. Evolutionary Intelligence, 12(4), 713-724.
- Ghosh, M., Guha, R., Alam, I., Lohariwal, P., Jalan, D., & Sarkar, R. (2019). Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection. Journal of Intelligent Systems, 29(1), 1598-1610.
WOA
- Guha, R., Ghosh, M., Mutsuddi, S., et al. (2020). Embedded chaotic whale survival algorithm for filter–wrapper feature selection. Soft Computing, 24, 12821–12843. https://doi.org/10.1007/s00500-020-05183-1
Filters
Overall
- Ghosh, K. K., Begum, S., Sardar, A., Adhikary, S., Ghosh, M., Kumar, M., & Sarkar, R. (2021). Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data. Expert Systems with Applications, 169, 114485.
PCC
- Guha, R., Ghosh, K. K., Bhowmik, S., & Sarkar, R. (2020, February). Mutually Informed Correlation Coefficient (MICC)-a New Filter Based Feature Selection Method. In 2020 IEEE Calcutta Conference (CALCON) (pp. 54-58). IEEE.
MI
- Guha, R., Ghosh, K. K., Bhowmik, S., & Sarkar, R. (2020, February). Mutually Informed Correlation Coefficient (MICC)-a New Filter Based Feature Selection Method. In 2020 IEEE Calcutta Conference (CALCON) (pp. 54-58). IEEE.
1. Wrapper-based Nature-inspired Feature Selection
Wrapper-based nature-inspired methods are popular feature selection approaches due to their effectiveness and simplicity. These methods start from a random set of candidate solutions (agents modeled on natural entities such as particles, whales, or bats) and gradually improve them, guided by the fitter agents. To evaluate the fitness of a candidate solution, wrappers train a learning algorithm (such as a classifier) on the selected features at every iteration. This makes wrapper methods reliable, but also computationally expensive.
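The fitness evaluation described above can be sketched as follows. A common choice in the FS literature combines classification accuracy with a penalty on the number of selected features. This sketch assumes a 1-nearest-neighbour classifier on a fixed train/test split and a hypothetical trade-off weight `weight_acc=0.9`; the function names are illustrative and this is not Py_FS's own fitness function.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier (Euclidean distance)."""
    # Squared distances between every test sample and every training sample.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d.argmin(axis=1)]
    return float((pred == y_test).mean())

def wrapper_fitness(mask, X_train, y_train, X_test, y_test, weight_acc=0.9):
    """Score a binary feature mask: reward accuracy, penalize the number of selected features."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                       # an empty subset cannot be classified
        return 0.0
    acc = knn_accuracy(X_train[:, mask], y_train, X_test[:, mask], y_test)
    frac_dropped = 1.0 - mask.sum() / mask.size
    return float(weight_acc * acc + (1.0 - weight_acc) * frac_dropped)

# Hypothetical toy split: feature 0 separates the classes, feature 1 misleads the classifier.
X_tr = np.array([[0.0, 0.0], [10.0, 50.0]])
y_tr = np.array([0, 1])
X_te = np.array([[1.0, 50.0], [9.0, 0.0]])
y_te = np.array([0, 1])

print(round(wrapper_fitness([1, 0], X_tr, y_tr, X_te, y_te), 3))  # → 0.95
print(round(wrapper_fitness([1, 1], X_tr, y_tr, X_te, y_te), 3))  # → 0.0
```

Because a classifier is retrained for every candidate mask, the cost of one fitness call grows with the dataset size, which is why wrappers are expensive compared with filters.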
Py_FS currently supports the following 12 wrapper-based FS methods:
- Binary Bat Algorithm (BBA)
- Cuckoo Search Algorithm (CS)
- Equilibrium Optimizer (EO)
- Genetic Algorithm (GA)
- Gravitational Search Algorithm (GSA)
- Grey Wolf Optimizer (GWO)
- Harmony Search (HS)
- Mayfly Algorithm (MA)
- Particle Swarm Optimization (PSO)
- Red Deer Algorithm (RDA)
- Sine Cosine Algorithm (SCA)
- Whale Optimization Algorithm (WOA)
These wrapper approaches can be imported in your code using the following statements:
from Py_FS.wrapper.nature_inspired import BBA
from Py_FS.wrapper.nature_inspired import CS
from Py_FS.wrapper.nature_inspired import EO
from Py_FS.wrapper.nature_inspired import GA
from Py_FS.wrapper.nature_inspired import GSA
from Py_FS.wrapper.nature_inspired import GWO
from Py_FS.wrapper.nature_inspired import HS
from Py_FS.wrapper.nature_inspired import MA
from Py_FS.wrapper.nature_inspired import PSO
from Py_FS.wrapper.nature_inspired import RDA
from Py_FS.wrapper.nature_inspired import SCA
from Py_FS.wrapper.nature_inspired import WOA
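To illustrate how one of these population-based searches explores binary feature masks, here is a generic textbook binary GA (tournament selection, one-point crossover, bit-flip mutation, elitism). It is a sketch, not Py_FS's GA implementation: `binary_ga` and its parameters are hypothetical, and a toy objective that rewards matching a hidden target mask stands in for a real wrapper fitness so the example stays self-contained.

```python
import numpy as np

def binary_ga(fitness, n_features, pop_size=20, max_iter=50,
              p_cross=0.6, p_mut=0.05, seed=0):
    """Generic binary GA that maximizes `fitness` over 0/1 feature masks."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    fit = np.array([fitness(ind) for ind in pop])
    history = []
    for _ in range(max_iter):
        elite = pop[fit.argmax()].copy()            # best agent so far (elitism)
        # Binary tournament: each parent slot gets the fitter of two random agents.
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])
        # One-point crossover applied to consecutive pairs of parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_features)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation: each gene flips independently with probability p_mut.
        flips = rng.random(children.shape) < p_mut
        children = np.where(flips, 1 - children, children)
        children[0] = elite                          # carry the elite over unchanged
        pop = children
        fit = np.array([fitness(ind) for ind in pop])
        history.append(fit.max())
    best = fit.argmax()
    return pop[best], fit[best], history

target = np.array([1, 0, 1, 1, 0, 0, 1, 0])

def toy_fitness(mask):
    """Fraction of bits that agree with a hidden target mask (stand-in for a wrapper fitness)."""
    return float(np.mean(mask == target))

best_mask, best_fit, history = binary_ga(toy_fitness, n_features=8)
print(best_mask, best_fit)
```

Keeping the elite unchanged guarantees that the best fitness never decreases from one generation to the next; the other operators supply the exploration.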
2. Filter-based Feature Selection
Filter methods do not use a learning algorithm to evaluate the generated solutions. Instead, they use statistical measures to score the importance of each feature, so that every feature receives a rank according to its relevance to the class labels. The top-ranked features can then be used for classification.
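As an illustration of filter-based ranking, the sketch below scores each feature by the absolute Pearson correlation with the labels, and by Spearman correlation (Pearson correlation computed on average ranks, so tied values share a rank). It uses plain NumPy, the helper names are hypothetical, and it is not Py_FS's PCC or SCC implementation.

```python
import numpy as np

def pearson_scores(X, y):
    """Absolute Pearson correlation between each feature column of X and the labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)              # centre every feature
    yc = y - y.mean()                    # centre the labels
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features (zero variance) get a score of 0 instead of NaN.
    return np.abs(np.divide(num, den, out=np.zeros_like(num), where=den > 0))

def average_ranks(a):
    """Rank a 1-D array from 0, giving tied values the mean of their positions."""
    a = np.asarray(a)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="stable")] = np.arange(len(a))
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_scores(X, y):
    """Spearman correlation = Pearson correlation computed on average ranks."""
    X_ranks = np.column_stack([average_ranks(col) for col in np.asarray(X).T])
    return pearson_scores(X_ranks, average_ranks(y))

def rank_features(scores):
    """Feature indices ordered from most to least relevant."""
    return np.argsort(scores)[::-1]

# Toy data (hypothetical): column 0 tracks the label, column 1 is noise,
# column 2 is the exact negation of the label.
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([
    [0.1, 0.2, 0.3, 0.9, 1.1, 1.0],
    [3, 1, 4, 1, 5, 9],
    -y,
])
print(rank_features(pearson_scores(X, y)))   # → [2 0 1]
print(rank_features(spearman_scores(X, y)))  # → [2 0 1]
```

Taking the absolute value means a strongly negatively correlated feature (column 2) ranks as highly as a positively correlated one, since both carry information about the label.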
Py_FS currently supports the following 4 filter-based FS methods:
- Pearson Correlation Coefficient (PCC)
- Spearman Correlation Coefficient (SCC)
- Relief
- Mutual Information (MI)
These filter approaches can be imported in your code using the following statements:
from Py_FS.filter import PCC
from Py_FS.filter import SCC
from Py_FS.filter import Relief
from Py_FS.filter import MI
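Of the filters listed above, Relief is the least obvious to compute. The following is a minimal sketch of the basic two-class Relief idea: for each sample, find its nearest neighbour of the same class (hit) and of another class (miss), then raise the weight of features that differ on the miss and lower it for features that differ on the hit. It assumes features scaled to [0, 1], Manhattan distance, and at least two samples per class; the `relief` function is hypothetical and not Py_FS's implementation.

```python
import numpy as np

def relief(X, y, n_iter=None, seed=0):
    """Basic two-class Relief: returns one weight per feature (higher = more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                          # constant features: avoid division by zero
    Xn = (X - X.min(axis=0)) / span                # scale every feature to [0, 1]
    n, d = Xn.shape
    m = n if n_iter is None else n_iter
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=m > n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)      # Manhattan distance to every sample
        dist[i] = np.inf                           # never pick the sample itself
        same = y == y[i]
        hit = np.where(same, dist, np.inf).argmin()    # nearest sample of the same class
        miss = np.where(~same, dist, np.inf).argmin()  # nearest sample of another class
        # Features that differ on the miss gain weight; features that differ on the hit lose it.
        w += (np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])) / m
    return w

# Hypothetical data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
              [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(np.round(relief(X, y), 3))
```

With `n_iter=None` every sample is visited exactly once, so the result does not depend on the random seed; passing a smaller `n_iter` trades accuracy of the weights for speed on large datasets.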
3. Evaluation Metrics
The package provides tools to evaluate feature sets before or after FS, making it easy to compare and analyze the performance of different FS procedures.
Py_FS currently supports the following evaluation metrics:
- classification accuracy
- average recall
- average precision
- average f1 score
- confusion matrix
- confusion graph
The evaluation capabilities can be imported in your code using the following statement:
from Py_FS.evaluation import evaluate
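The signature of Py_FS's `evaluate` is not reproduced here. As a sketch of what these metrics compute, the plain NumPy code below derives accuracy, average recall, average precision, average F1 score, and the confusion matrix from true and predicted labels. It assumes "average" means the unweighted (macro) mean over classes, and the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def safe_div(num, den):
    """Element-wise num / den, returning 0 where den == 0."""
    num = np.asarray(num, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=np.asarray(den) > 0)

def summarize(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm)
    recall = safe_div(tp, cm.sum(axis=1))        # per-class recall
    precision = safe_div(tp, cm.sum(axis=0))     # per-class precision
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "avg_recall": recall.mean(),
        "avg_precision": precision.mean(),
        "avg_f1": f1.mean(),
        "confusion_matrix": cm,
    }

report = summarize([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(report["confusion_matrix"])
print(report["accuracy"], report["avg_f1"])
```

Macro averaging weights every class equally, so a poorly recognized minority class lowers the averages even when overall accuracy is high.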
User Manual
For detailed usage instructions, see the Py_FS User Manual.
File details
Details for the file Py_FS-0.2.1.tar.gz
File metadata
- Download URL: Py_FS-0.2.1.tar.gz
- Upload date:
- Size: 10.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92ed9cb3c077b5173ac12ccac597bcb8b1668940eb95be587327bb94dc11fc9b |
| MD5 | 0659dc6190448f66179a7d926606ce85 |
| BLAKE2b-256 | ea4963f0dd50ad53a55f00717f77fe13f16bd7aec3d07ac02d789a8b8fdc36f2 |
File details
Details for the file Py_FS-0.2.1-py3-none-any.whl
File metadata
- Download URL: Py_FS-0.2.1-py3-none-any.whl
- Upload date:
- Size: 9.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b19010b7acbf4cc310b994a42059c058c3b063dabcb53939e22d632796fc295e |
| MD5 | d2e417d092413ae13f4e1cfa4ef3cf73 |
| BLAKE2b-256 | dbf15ce49db7722e5febcd2c9f46667de342c9aa69e9004ca0b8b4d3eb19e330 |