
The first feature selection method based on reinforcement learning - a Python library available on pip for fast deployment.

Project description

FSRLearning - Python Library


FSRLearning is a Python library for feature selection using reinforcement learning. It's designed to be easy to use and efficient, particularly for selecting the most relevant features from a very large set.

Installation

Install FSRLearning using pip:

pip install FSRLearning

Example usage

Data Pre-processing

The Dataset

In this example, we use the Australian credit approval dataset. It has 14 features that have been intentionally anonymized, and the goal is to predict a binary label (0 or 1). We use this dataset to demonstrate how to use the library, but the model can work with any dataset. You can find more details about the dataset here.

The process

The first step is pre-processing the data. The feature selection method takes as input two pandas DataFrames, X and y: X contains all the features to evaluate and y the label to predict. It is highly recommended to create a mapping between the features and integer indices, so that each feature is associated with a number (a sketch of such a mapping follows the code below). Here is an example of the data pre-processing step on a dataset with 14 features and 1 label.

import pandas as pd

# Get the pandas DataFrame
australian_data = pd.read_csv('australian_data.csv', header=None)

# Get the dataset with the features
X = australian_data.drop(14, axis=1)

# Get the dataset with the label values
y = australian_data[14]
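
As recommended above, it can help to keep a mapping from each feature's integer index to a readable name for interpreting the results later. This is a minimal bookkeeping sketch; the names are placeholders (the features in this dataset are anonymized) and the dictionary is not passed to the library.

# Optional: map each feature index to a placeholder name for later interpretation
feature_map = {i: f'feature_{i}' for i in X.columns}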

After this step we can simply run a feature selection and ranking process that maximises a metric.

from FSRLearning import FeatureSelectorRL

# Create the object of feature selection with RL
fsrl_obj = FeatureSelectorRL(14, nb_iter=200)

# Returns the results of the selection and the ranking
results = fsrl_obj.fit_predict(X, y)
results

The FeatureSelectorRL class has several parameters that can be tuned. Here they all are, with the values they can take (an instantiation sketch follows the list).

  • feature_number (integer) : number of features in the DataFrame X

  • feature_structure (dictionary, optional) : dictionary for the graph implementation

  • eps (float [0; 1], optional) : probability of choosing a random next state; 0 gives a purely greedy algorithm and 1 a purely random one

  • alpha (float [0; 1], optional): controls the rate of updates; 0 means the value estimates are barely updated and 1 means they are fully replaced by each new observation

  • gamma (float [0; 1], optional): discount factor moderating the contribution of the next state's value; 0 gives shortsighted behaviour and 1 farsighted behaviour

  • nb_iter (int, optional): number of sequences to go through the graph

  • starting_state ("empty" or "random", optional) : if "empty" the algorithm starts from the empty state and if "random" the algorithm starts from a random state in the graph
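
As a sketch, here is an instantiation with every main parameter set explicitly. The values are illustrative choices, not recommended defaults, and we assume the constructor accepts the parameter names listed above as keyword arguments (feature_structure is left at its default here).

from FSRLearning import FeatureSelectorRL

# Illustrative instantiation; the values are examples, not recommended defaults
fsrl_obj = FeatureSelectorRL(
    feature_number=14,       # 14 features in X
    eps=0.2,                 # 20% chance of choosing a random next state
    alpha=0.8,               # strong update of the value estimates
    gamma=0.99,              # nearly farsighted behaviour
    nb_iter=200,             # 200 sequences through the graph
    starting_state='empty'   # start each sequence from the empty feature set
)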

The output of the selection process is a 5-tuple object (a sketch of unpacking it follows the list below).

  • Index of the features that have been sorted

  • Number of times that each feature has been chosen

  • Mean reward brought by each feature

  • Ranking of the features from the least important to the most important

  • Number of states visited
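
Since results is a 5-tuple in the order above, it can be unpacked directly. The variable names below are our own choice, not part of the library's API.

# Unpack the 5-tuple returned by fit_predict (variable names are illustrative)
sorted_features, times_chosen, mean_rewards, ranking, nb_visited_states = results

print(ranking)            # features ordered from least to most important
print(nb_visited_states)  # how many states of the graph were visited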

Existing methods

  • Compare the performance of the FSRLearning library with RFE from Scikit-Learn :
fsrl_obj.compare_with_benchmark(X, y, results)

Returns some comparisons and plots a graph with the metric for each selected feature set. It is useful for parameter tuning.

  • Get the evolution of the number of states visited for the first time versus states that have already been visited :
fsrl_obj.get_plot_ratio_exploration()

Returns a plot. It is useful for getting an overview of how the graph is browsed and for tuning the epsilon parameter (exploration parameter).

  • Get an overview of the relative impact of each feature on the model :
fsrl_obj.get_feature_strengh(results)

Returns a bar plot.

  • Get an overview of the effect of the stopping conditions :
fsrl_obj.get_depth_of_visited_states()

Returns a plot. It is useful to see how deep the Markov Decision Process goes in the graph.

Your contribution is welcome!

  • Automate the data processing step and generalize the input data format and type
  • Distribute the computation of each reward to make the algorithm faster
  • Add more visualization and feedback methods

References

This library was implemented with the help of these two articles:

  • Sali Rasoul, Sodiq Adewole and Alphonse Akakpo, Feature Selection Using Reinforcement Learning (2021)
  • Seyed Mehdi Hazrati Fard, Ali Hamzeh and Sattar Hashemi, Using Reinforcement Learning to Find an Optimal Set of Features (2013)

