A toolbox for feature selection using evolutionary algorithms

Evolutionary feature selection toolbox

This package contains a set of tools for applying evolutionary feature selection techniques to your datasets.

Installation
Quick Start
Initializers
Fitness Functions
Binary Transformers
Optimizers
Credits

Installation

You can install the package using the following command.

pip install evotoolbox

Quick start

This package is based on four essential building blocks: initializer, optimizer, fitness function, and binary transformer. To find the best features using an evolutionary algorithm, choose one of the provided classes for each block, or define your own custom classes.

The following sample code shows how to use the default Random Initializer, GWO Optimizer, MultiObjective Fitness, and Threshold Transformer to do a simple feature selection task.

import pandas as pd
import evotoolbox
from evotoolbox.initializers import RandomInitializer
from evotoolbox.fitness import MultiobjectiveFitness
from evotoolbox.binary import ThresholdTransformer
from evotoolbox.optimizers import GWO

# Load your data
data = pd.read_csv('colon.csv').to_numpy()
X = data[:,:-1]
Y = data[:, -1]
n_features = X.shape[1]

# Define the algorithm options
initializer = RandomInitializer(n_features=n_features, n_agents=10)
optimizer = GWO(ThresholdTransformer(0.5), max_iter=100, lb=0, ub=1)
fitness = MultiobjectiveFitness(alpha=0.99)

# Fit the data using the provided options
result = evotoolbox.fit(X, Y, initializer, optimizer, fitness)

# The result is a dictionary with three keys:
# solution: the binary solution with shape (n_features,) where the selected features are 1 and others are 0
# c: the convergence curve of the fitness
# nf: number of selected features in the final solution
print(result['solution'])
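
Once the binary solution is available, a typical next step is to keep only the selected columns. The sketch below shows that step on plain Python lists; `X` and `solution` here are small made-up stand-ins, not output produced by the package. With NumPy arrays, boolean indexing on the columns gives the same selection.

```python
# Hypothetical stand-ins for the feature matrix and result['solution'].
X = [
    [5.1, 0.2, 3.3, 7.0],
    [4.8, 0.1, 2.9, 6.5],
]
solution = [1, 0, 0, 1]  # 1 = feature kept, 0 = feature dropped

# Indices of the selected features.
selected = [i for i, bit in enumerate(solution) if bit == 1]

# Keep only the selected columns in every row.
X_selected = [[row[i] for i in selected] for row in X]

print(selected)
print(X_selected)
```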

Initializers

Provided Initializers

This package provides three default initializer classes. These classes can be imported from evotoolbox.initializers. All of the initializers must receive the number of features and the number of agents as arguments.

RandomInitializer

The RandomInitializer creates a random initial population.

GreedyInitializer

The GreedyInitializer creates a random population and then tries to improve each solution with a greedy algorithm. This is a two-step process. In the first pass, the greedy algorithm sets the zeros in the solution to ones, one at a time, and checks whether the fitness increases. If the fitness improves, that feature is selected. In the second pass, the algorithm tries to drop features using the same logic. This initializer usually generates high-quality solutions with a small number of features. The downside is that the initialization time increases linearly with the number of features, and it can take hours to complete on very large datasets.
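
The two passes described above can be sketched in plain Python as follows. This is a minimal illustration and not the package's implementation: `greedy_refine` and `toy_fitness` are hypothetical names, and higher fitness is assumed to be better.

```python
import random

def greedy_refine(solution, fitness_func):
    """Two-pass greedy refinement of a binary solution (higher fitness is better)."""
    best = list(solution)
    best_fit = fitness_func(best)
    # Pass 1 tries switching each 0 to 1; pass 2 tries switching each 1 to 0.
    for from_bit, to_bit in ((0, 1), (1, 0)):
        for i in range(len(best)):
            if best[i] != from_bit:
                continue
            best[i] = to_bit
            new_fit = fitness_func(best)
            if new_fit > best_fit:
                best_fit = new_fit   # the flip helped, keep it
            else:
                best[i] = from_bit   # the flip did not help, revert it
    return best, best_fit

def toy_fitness(sol):
    # Hypothetical fitness: features 0 and 2 are useful,
    # and every selected feature carries a small cost.
    useful = {0, 2}
    gain = sum(1.0 for i, bit in enumerate(sol) if bit and i in useful)
    return gain - 0.1 * sum(sol)

rng = random.Random(0)
start = [rng.randint(0, 1) for _ in range(5)]
refined, fit = greedy_refine(start, toy_fitness)
print(start, refined, fit)
```

Each pass evaluates the fitness at most once per feature, which is where the linear growth in initialization time comes from.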

OblInitializer

The OblInitializer creates a random initial population and then generates the complement of each solution. It compares the two solutions in each pair, keeps the better one, and discards the other.
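
A minimal sketch of this pairwise selection, assuming binary solutions and a fitness where higher is better. The names `obl_population` and `toy_fitness` are hypothetical and only serve the illustration.

```python
import random

def obl_population(n_agents, n_features, fitness_func, rng):
    """Keep the better of each random binary solution and its complement."""
    population = []
    for _ in range(n_agents):
        candidate = [rng.randint(0, 1) for _ in range(n_features)]
        complement = [1 - bit for bit in candidate]
        # Higher fitness is assumed to be better.
        if fitness_func(complement) > fitness_func(candidate):
            population.append(complement)
        else:
            population.append(candidate)
    return population

def toy_fitness(sol):
    # Hypothetical fitness that simply rewards selecting more features.
    return sum(sol)

n_features = 6
population = obl_population(4, n_features, toy_fitness, random.Random(1))
for agent in population:
    print(agent)
```

With this toy fitness, every kept solution selects at least half of the features, because the member of each pair with more ones always wins.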

Defining your own initializer

You can easily define your own initializer by extending the evotoolbox.initializers.BaseInitializer class. You can access the number of features and agents in this class. You also receive a fitness function to evaluate your solutions in the initialization process.

import numpy as np
from evotoolbox.initializers import BaseInitializer

class MyCustomInitializer(BaseInitializer):
    def init(self, fitness_func):
        # create a numpy array of the initial population
        positions = np.zeros((self.n_agents, self.n_features))

        # do your magic here!

        # you can evaluate and compare the solutions using fitness_func
        sample_fitness = fitness_func(positions[0])
        # return the generated initial positions
        return positions

Fitness Functions

Fitness functions play an important role in meta-heuristic optimization. In a feature selection task, a multi-objective fitness function is generally used to achieve high accuracy while keeping the number of features to a minimum.

Provided fitness function

MultiobjectiveFitness

One of the most popular fitness functions used in feature selection is defined by this equation.

Fitness(selected_features) = alpha * KNN_ACCURACY(selected_features) + (1-alpha) * count(selected_features)

This package provides a MultiobjectiveFitness class which applies K-fold cross validation, calculates the accuracy of a KNN classifier, and then uses the above equation to find the fitness of the given features. To use this fitness function, simply instantiate this class with the alpha and k (number of neighbors) parameters.

fitness = MultiobjectiveFitness(alpha=0.99, k=5)
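
As a worked example of the weighting, the sketch below evaluates the equation exactly as written above for one made-up case. The accuracy value and feature count are invented for illustration, and `multiobjective_fitness` is a hypothetical helper, not a function of the package.

```python
def multiobjective_fitness(accuracy, n_selected, alpha):
    """Combine accuracy and feature count as in the equation above."""
    return alpha * accuracy + (1 - alpha) * n_selected

# Made-up inputs: 90% KNN accuracy with 5 selected features.
value = multiobjective_fitness(accuracy=0.90, n_selected=5, alpha=0.99)
print(round(value, 3))
```

With alpha=0.99, the accuracy term contributes 0.891 and the feature-count term contributes 0.05.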

Defining your own fitness function

Of course you can implement your own fitness function. To do so, you must extend the BaseFitness class and implement the evaluate function. This function receives the solution to evaluate as an argument. You can access the data with self.features and self.labels variables. Here's an example.

from evotoolbox.fitness import BaseFitness

class MyFitness(BaseFitness):
    def evaluate(self, solution):
        # use self.features and self.labels to evaluate the given solution
        fitness = 0.0  # placeholder: replace with your own computation
        # return a number corresponding to the fitness of the current solution
        return fitness

Binary Transformers

Most evolutionary algorithms work in continuous space. For a feature selection task, however, we must convert these continuous values to binary values so that we can use them to choose the best features.

Provided binary transformers

This package provides a variety of binary transformers. The transformer is passed to the optimizer, which applies it on every iteration to convert the continuous solutions into binary values. One threshold-based method and three transfer-function-based methods are available. A transfer function gives the probability of a feature being set to 1 and is used as follows, where Z(x) is the binary value of x and T(x) is the transfer function.

[Equation: transfer function rule relating Z(x) and T(x)]
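
A common way to apply such a rule is to draw a uniform random number and set the bit to 1 when it falls below T(x). The sketch below assumes this reading of "probability of a feature being set to 1"; it is not taken from the package's code, and `binarize` is a hypothetical name.

```python
import random

def binarize(position, transfer, rng):
    """Set each bit to 1 with probability transfer(x), otherwise 0."""
    return [1 if rng.random() < transfer(x) else 0 for x in position]

rng = random.Random(42)
position = [0.3, -1.2, 2.0]

# Degenerate transfer functions make the behavior easy to check.
always = binarize(position, lambda x: 1.0, rng)  # probability 1 for every bit
never = binarize(position, lambda x: 0.0, rng)   # probability 0 for every bit
print(always, never)
```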

ThresholdTransformer

The simplest is the ThresholdTransformer. This transformer uses a threshold to simply set anything above the threshold to one, and anything below the threshold to zero.

from evotoolbox.binary import ThresholdTransformer
transformer = ThresholdTransformer(0.5) # provide the threshold value here
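
The thresholding rule itself is small enough to sketch directly. This illustration is independent of the package; `threshold_binarize` is a hypothetical name, and mapping values exactly equal to the threshold to zero is an assumption, since the text does not say how ties are handled.

```python
def threshold_binarize(position, threshold):
    """Map values above the threshold to 1 and all others to 0."""
    return [1 if x > threshold else 0 for x in position]

bits = threshold_binarize([0.2, 0.5, 0.9], threshold=0.5)
print(bits)
```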

SigmoidTransformer

The sigmoid, or S-shaped, transfer function is a popular choice and is defined as below.

[Equation: S transformer]

from evotoolbox.binary import SigmoidTransformer
transformer = SigmoidTransformer(1) # set alpha value here
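
For orientation, one widely used S-shaped form is T(x) = 1 / (1 + exp(-alpha * x)). The sketch below implements that form under the assumption that alpha scales the input; the package's exact equation may differ, and `sigmoid_transfer` is a hypothetical name.

```python
import math

def sigmoid_transfer(x, alpha=1.0):
    """S-shaped transfer function 1 / (1 + exp(-alpha * x)), computed stably."""
    z = alpha * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form to avoid overflow in exp.
    e = math.exp(z)
    return e / (1.0 + e)

for x in (-4.0, 0.0, 4.0):
    print(x, sigmoid_transfer(x))
```

Larger inputs give probabilities closer to 1, so features with large continuous values are more likely to be selected.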

VTransformer

There are different types of V transformers. This package uses the following equation.

[Equation: V transformer]

from evotoolbox.binary import VTransformer
transformer = VTransformer(1) # set alpha value here
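
To show what "V-shaped" means in practice, the sketch below uses |tanh(alpha * x)|, which is one common member of the family. This choice is an assumption made for illustration and is not necessarily the equation the package uses; `v_transfer` is a hypothetical name.

```python
import math

def v_transfer(x, alpha=1.0):
    """One common V-shaped transfer function: |tanh(alpha * x)|."""
    return abs(math.tanh(alpha * x))

for x in (-2.0, 0.0, 2.0):
    print(x, v_transfer(x))
```

Unlike an S-shaped function, a V-shaped function is symmetric around zero, so it responds to the magnitude of the value and not to its sign.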

QTransformer

Quadratic transfer functions are another group of transfer functions and are defined with the following formula.

[Equation: Q transformer]

from evotoolbox.binary import QTransformer
transformer = QTransformer(6, 1) # set Xmax and p

Defining your own binary transformer

You might need to implement a custom binary transformer. Like other classes, you can easily extend the BaseTransformer class in evotoolbox.binary and implement the transform function to create your own transformer. Here's an example.

import numpy as np
from evotoolbox.binary import BaseTransformer

class MyTransformer(BaseTransformer):
    def __init__(self, custom_parameter):
        # define your custom parameters here
        self.custom_parameter = custom_parameter

    def transform(self, solution):
        binary_solution = np.zeros_like(solution, dtype='int')
        # put your logic here
        return binary_solution

Optimizers

This package comes with a variety of ready-to-use optimizers. You are also free to define your own optimizers.

Provided Optimizers

These optimizers are currently available in this package. Please note that this is an ongoing project and this list will be updated regularly with new algorithms.

Each optimizer can have its own parameters, but the first four arguments when instantiating an optimizer class are shared and required: binary_transformer, max_iter, lb, and ub. These arguments set the binary transformer used to binarize the continuous values, the maximum number of iterations, the lower bound, and the upper bound, respectively.

Grey Wolf Optimizer (GWO)

GWO is introduced by ... for more info refer to the relevant paper.

from evotoolbox.optimizers import GWO
optimizer = GWO(binary_transformer, max_iter, lb, ub)

Butterfly Optimization Algorithm (BOA)

BOA is introduced by ... for more info refer to the relevant paper.

from evotoolbox.optimizers import BOA
optimizer = BOA(binary_transformer, max_iter, lb, ub, p=0.8, a=0.1, c_min=0.01, c_max=0.25)

Genetic Algorithm (GA)

GA is introduced by ... for more info refer to the relevant paper.

from evotoolbox.optimizers import GA
optimizer = GA(binary_transformer, max_iter, lb, ub, MR=0.01, CR=0.8)

Harris Hawk Optimizer (HHO)

HHO is introduced by ... for more info refer to the relevant paper.

from evotoolbox.optimizers import HHO
optimizer = HHO(binary_transformer, max_iter, lb, ub, beta=1.5)

Salp Swarm Algorithm (SSA)

SSA is introduced by ... for more info refer to the relevant paper.

from evotoolbox.optimizers import SSA
optimizer = SSA(binary_transformer, max_iter, lb, ub)

Defining your own optimizer

You will probably want to implement your own optimizer to try out a new algorithm. Defining a new optimizer is simple: extend the BaseOptimizer class provided in evotoolbox.optimizers. You can define additional parameters required for your algorithm in the class constructor. All optimizers must implement the abstract optimize(self, fitness_func, initial_positions, n_features, n_agents) method defined in BaseOptimizer. Take a look at this example.

import numpy as np
from evotoolbox.optimizers import BaseOptimizer

class MyOptimizer(BaseOptimizer):
    def __init__(self, binary_transformer, max_iter, lb, ub, my_parameter):
        super().__init__(binary_transformer, max_iter, lb, ub)
        self.my_parameter = my_parameter

    def optimize(self, fitness_func, initial_positions, n_features, n_agents):
        # Optimize the problem using the given arguments
        # You may use your custom parameter with self.my_parameter
        # initial_positions will be the population initialized before
        # Your function must return a dictionary as defined below:
        # solution: the binary solution with shape (n_features,) where the selected features are 1 and others are 0
        # c: the convergence curve of the fitness
        # nf: number of selected features in the final solution
        return {
            'solution': None,
            'c': None,
            'nf': None,
        }
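
To make the expected return value concrete, here is a standalone sketch of what the body of an optimize method could do, using plain random search in place of a real evolutionary algorithm. It does not import the package: `random_search`, `toy_fitness`, the fixed 0.5 threshold, and the seed are all hypothetical, and higher fitness is assumed to be better.

```python
import random

def random_search(fitness_func, initial_positions, max_iter, lb, ub, seed=0):
    """Random-search stand-in for an optimizer (higher fitness is better)."""
    rng = random.Random(seed)
    n_features = len(initial_positions[0])

    def binarize(position):
        # Assumed threshold rule: values above 0.5 select the feature.
        return [1 if x > 0.5 else 0 for x in position]

    # Start from the best agent of the initial population.
    best = max((binarize(p) for p in initial_positions), key=fitness_func)
    best_fit = fitness_func(best)
    curve = []
    for _ in range(max_iter):
        # Sample a continuous position inside [lb, ub] and binarize it.
        candidate = binarize([rng.uniform(lb, ub) for _ in range(n_features)])
        candidate_fit = fitness_func(candidate)
        if candidate_fit > best_fit:
            best, best_fit = candidate, candidate_fit
        curve.append(best_fit)  # best fitness found so far
    return {'solution': best, 'c': curve, 'nf': sum(best)}

def toy_fitness(sol):
    # Hypothetical fitness: best when exactly two features are selected.
    return -abs(sum(sol) - 2)

initial = [[0.1, 0.9, 0.4, 0.7, 0.8], [0.2, 0.3, 0.1, 0.6, 0.4]]
result = random_search(toy_fitness, initial, max_iter=20, lb=0, ub=1)
print(result['solution'], result['nf'], result['c'][-1])
```

The convergence curve stores the best fitness seen up to each iteration, so it never decreases.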

Credits

Authors:

Shakiba Shahbandegan

Release history

0.0.2 (Apr 27, 2021)
0.0.1 (Apr 27, 2021)
