
A feature selection and feature ranking package that can be used to select and rank the features of a dataset

Project description

Feature Selection and Feature Ranking Algorithms:

A Python package that provides several feature selection and feature ranking algorithms.

Call the function as follows:

fsfr(dataset, fs = 'string_value', fr = 'string_value', ftf = 'string_value')

Parameters:

dataset : pandas DataFrame containing the original dataset

         It must contain only numerical values (categorical and ordinal values are not supported), and
         the class variable (the decision attribute) must also be numerical.

fs : string value - 'gpso' or 'ga'

     fs is the feature selection method and can be either:
     gpso : Geometric Particle Swarm Optimisation
     ga : Genetic Algorithm
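As a rough illustration of how a wrapper-based selector such as 'ga' searches the space of feature subsets, here is a minimal textbook genetic algorithm over binary feature masks (tournament selection, one-point crossover, bit-flip mutation, elitism). It is a sketch under stated assumptions, not the package's implementation: `ga_select`, its parameters, and the toy accuracy function are hypothetical, and a real run would replace `toy_accuracy` with a classifier's measured accuracy. The GPSO variant is not shown.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.9,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximise `fitness` over binary feature masks with a simple GA (hypothetical helper)."""
    rng = random.Random(seed)
    mutation_rate = 1.0 / n_features  # flip about one bit per child on average

    def repair(mask: Mask) -> Mask:
        # Never allow an empty subset: switch on one random feature.
        if not any(mask):
            mask[rng.randrange(n_features)] = 1
        return mask

    def tournament(pop: List[Mask], scores: List[float], k: int = 3) -> Mask:
        # Pick k random individuals and keep the fittest one.
        picks = rng.sample(range(len(pop)), min(k, len(pop)))
        return pop[max(picks, key=lambda i: scores[i])]

    population = [repair([rng.randint(0, 1) for _ in range(n_features)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(m) for m in population]
        best = max(range(pop_size), key=lambda i: scores[i])
        next_pop = [population[best][:]]  # elitism: carry the current best over unchanged
        while len(next_pop) < pop_size:
            a = tournament(population, scores)
            b = tournament(population, scores)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(repair(child))
        population = next_pop

    scores = [fitness(m) for m in population]
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


# Hypothetical toy problem: features 0, 2 and 5 are informative, the rest are noise.
INFORMATIVE = {0, 2, 5}


def toy_accuracy(mask: Mask) -> float:
    # Stand-in for a classifier's accuracy in percent (not a real model).
    hits = sum(mask[i] for i in INFORMATIVE)
    noise = sum(mask) - hits
    return 40 + 20 * hits - 3 * noise


def toy_fitness(mask: Mask) -> float:
    # ftf_3-style score, maximised: accuracy * (1 - selected / total).
    return toy_accuracy(mask) * (1 - sum(mask) / len(mask))


best_mask, best_score = ga_select(8, toy_fitness, seed=1)
print(best_mask, best_score)
```

Elitism keeps the best fitness from decreasing between generations, which is why the returned mask is never worse than the best one in the initial random population.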

fr : string value - 'rsm_a', 'rsm_b', 'rsm_c', 'mifsnd' or 'mrmr'

     fr is the feature ranking method and can be one of:
     rsm_a : Rough Set Method 1
     rsm_b : Rough Set Method 2
     rsm_c : Rough Set Method 3
     mifsnd : Mutual Information Feature Selection-ND
     mrmr : Minimum Redundancy Maximum Relevance
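To make the 'mrmr' idea concrete, the sketch below ranks discrete features greedily: the first pick is the feature with the highest mutual information with the class, and each later pick maximises relevance minus its mean mutual information with the features already ranked (the common "difference" form of mRMR). This is an illustrative, hypothetical implementation assuming discrete-valued columns and mutual information in bits; the package's own 'mrmr' (and the rough-set and MIFS-ND rankers) may differ in estimator and criterion.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) in bits, estimated from the empirical joint distribution."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable]) -> List[str]:
    """Greedy mRMR ranking with the difference (relevance - redundancy) criterion."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    remaining = list(features)
    ranked: List[str] = []

    def score(name: str) -> float:
        if not ranked:
            return relevance[name]  # first pick: most relevant feature
        redundancy = sum(mutual_information(features[name], features[s]) for s in ranked)
        return relevance[name] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=score)  # ties go to the earliest feature
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy discrete data: "x" and "y" each carry half of the class information,
# while "x_copy" duplicates "x" and is therefore fully redundant with it.
target = [0, 0, 1, 1, 2, 2, 3, 3]
data = {
    "x": [0, 0, 0, 0, 1, 1, 1, 1],
    "x_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "y": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr_rank(data, target))  # → ['x', 'y', 'x_copy']
```

All three features are equally relevant on their own, but once "x" is ranked, "x_copy" adds nothing new, so the redundancy term pushes it below "y".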

ftf : string value - 'ftf_1', 'ftf_2' or 'ftf_3'

    ftf is the fitness function used by the feature selection method.
    If 'fs' is used, 'ftf' must also be specified.
    ftf_1 : fitness function = 0.75 * (100 / accuracy) + 0.25 * (no. of features)
    ftf_2 : fitness function = 0.75 * accuracy + 0.25 * (1 / no. of features)
    ftf_3 : fitness function = accuracy * (1 - no. of features / total no. of features)
    no. of features = the number of features selected by the algorithm at that point in the search
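The three fitness functions translate directly into Python. This sketch assumes accuracy is a percentage (0 to 100), as the 100 / accuracy term in ftf_1 suggests; the function names mirror the option strings but are hypothetical helpers, not the package's code. Reading the formulas, ftf_1 gets smaller as accuracy rises and the subset shrinks, so it is presumably minimised, whereas ftf_2 and ftf_3 grow with accuracy and are presumably maximised; the README does not state the optimisation direction.

```python
def ftf_1(accuracy: float, n_selected: int) -> float:
    """0.75 * (100 / accuracy) + 0.25 * n_selected (smaller appears better)."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return 0.75 * (100.0 / accuracy) + 0.25 * n_selected


def ftf_2(accuracy: float, n_selected: int) -> float:
    """0.75 * accuracy + 0.25 * (1 / n_selected) (larger appears better)."""
    if n_selected <= 0:
        raise ValueError("at least one feature must be selected")
    return 0.75 * accuracy + 0.25 * (1.0 / n_selected)


def ftf_3(accuracy: float, n_selected: int, n_total: int) -> float:
    """accuracy * (1 - n_selected / n_total) (larger appears better)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return accuracy * (1.0 - n_selected / n_total)


# Example: 80 % accuracy with 4 of 20 features selected.
print(ftf_1(80, 4))      # 0.75 * 1.25 + 0.25 * 4 = 1.9375
print(ftf_2(80, 4))      # 60 + 0.0625 = 60.0625
print(ftf_3(80, 4, 20))  # 80 * (1 - 4/20) = 64.0
```

Note that with accuracy in percent, the 0.25 * (1 / n_selected) term of ftf_2 is tiny compared with the accuracy term, so ftf_2 is driven almost entirely by accuracy.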

Returns: a list of features ranked in descending order, if both 'fs' and 'fr' are used or if only 'fr' is used.

Feature selection and feature ranking can be used independently of each other by passing fs='' or fr='', but they cannot both be ''. For larger datasets, it is preferable to use both together.
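The argument rules stated above (the valid option strings, fs and fr not both '', and ftf mandatory whenever fs is used) can be expressed as a small pre-flight check. This is a hypothetical helper written for illustration, not part of the package; the option names come from this README, while the function name and its '' defaults are assumptions.

```python
# Hypothetical helper: the option strings come from the README, the function does not.
FS_METHODS = {"", "gpso", "ga"}
FR_METHODS = {"", "rsm_a", "rsm_b", "rsm_c", "mifsnd", "mrmr"}
FITNESS_FUNCTIONS = {"ftf_1", "ftf_2", "ftf_3"}


def check_fsfr_args(fs: str = "", fr: str = "", ftf: str = "") -> None:
    """Raise ValueError if the fs/fr/ftf combination breaks the documented rules."""
    if fs not in FS_METHODS:
        raise ValueError(f"unknown fs={fs!r}; expected one of {sorted(FS_METHODS - {''})}")
    if fr not in FR_METHODS:
        raise ValueError(f"unknown fr={fr!r}; expected one of {sorted(FR_METHODS - {''})}")
    if fs == "" and fr == "":
        raise ValueError("fs and fr cannot both be ''")
    if fs and ftf not in FITNESS_FUNCTIONS:
        raise ValueError("ftf ('ftf_1', 'ftf_2' or 'ftf_3') is required when fs is used")


check_fsfr_args(fs="gpso", fr="mrmr", ftf="ftf_1")  # selection + ranking: accepted
check_fsfr_args(fr="rsm_a")                          # ranking only: accepted
```

In this sketch ftf is simply ignored when fs is '', since the README only requires it for feature selection.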

References for algorithms:

gpso with ftf_1 : https://www.researchgate.net/publication/4307926_Gene_selection_in_cancer_
rsm_a : http://library.isical.ac.in:8080/jspui/bitstream/10263/5158/1/Rough%20Sets%20for%20Selection%20of%20Molecular%20Descriptors%20to%20Predict%20Biological%20Activity%20of%20Molecules-IEEETOSMAC-%20Part%20C-AAR-40-6-2010-p%20639-648.pdf
rsm_b : https://ieeexplore.ieee.org/document/7104131
mifsnd : https://www.sciencedirect.com/science/article/pii/S0957417414002164

The remaining algorithms were developed by the author and do not contain any material from other sources.
