Skip to main content

Python implementation of the M6GP algorithm, a multi-objective genetic programming algorithm for feature engineering

Project description

This is an easy-to-use, scikit-learn inspired version of the M6GP algorithm.


By using this file, you are agreeing to this product's EULA
This product can be obtained at https://github.com/jespb/Python-M6GP
Copyright ©2023-2025 J. E. Batista


This file contains information about the command and flags used in the stand-alone version of this implementation and an explanation of how to import, use, and edit this implementation.




This implementation of M6GP can be used in a stand-alone fashion using the following command and flags:

$ python Main_M6GP_standalone.py

[-d datasets]
- This flag expects a set of csv dataset names separated by ";" (e.g., "a.csv;b.csv")
- By default, the heart.csv dataset is used

[-dsdir dir]
- States the dataset directory.
- By default "datasets/" is used
- Use "-dsdir ./" for the root directory

[-md max_depth]
- This flag expects an integer with the maximum initial depth for the trees;
- By default, this value is set to 6.

[-mg max_generation]
- This flag expects an integer with the maximum number of generations;
- By default, this value is set to 100.

[-odir dir]
- States the output directory.
- By default, "results/" is used
- Use "-odir ./" for the root directory

[-op operators]
- This flag excepts a set of operators and their number of arguments, separated by ";"
- Allowed operators: +,2 ; -,2 ; *,2 ; /,2
- By default, the used operators are the sum, subtraction, multiplication, and protected division: "+,2;-,2;*,2;/,2"

[-ps population_size]
- This flag expects an integer with the size of the population;
- By default, this value is set to 500.

[-runs number_of_runs]
- This flag expects an integer with the number of runs to be made;
- By default, this value is set to 30

[-tf train_fraction]
- This flag expects a float [0;1] with the fraction of the dataset to be used in training;
- By default, this value is set to 0.70

[-ts tournament_size]
- This flag expects an integer with the tournament size;
- By default, this value is set to 10.

[-t number_of_threads]
- This flag expects an integer with the number of threads to use while evaluating the population;
- If the value is set to 1, the multiprocessing library will not be used
- By default, this value is set to 1.

[-di minimum_number_of_dimension]
- This flag expects an integer with the minimum number of dimensions in each individual;
- This flag affects the number of dimensions in the initial individuals;
- By default, this value is set to 1

[-dm maximum_number_of_dimension]
- This flag expects an integer with the maximum number of dimensions in each individual;
- By default, this value is set to 9999

[-rs random state]
- This flag expects an integer with the seed to be used by the M6GP algorithm;
- By default, this value is set to 42




How to import this implementation to your project:
- Download this repository;
- Copy the "m6gp/" directory to your project directory;
- import the M6GP class using "from m6gp.M6GP import M6GP".

How to use this implementation:
$ from m6gp.M6GP import M6GP
$ model = M6GP()
$ model.fit( training_x, training_y, test_x (optional), test_y (optional) )

Arguments for M6GP():
operators -> Operators used by the individual (default: [("+",2),("-",2),("*",2),("/",2)] )
max_depth -> Max initial depths of the individuals (default: 6)
population_size -> Population size (default: 500)
max_generation -> Maximum number of generations (default: 100)
tournament_size -> Tournament size (default: 5)
limit_depth -> Maximum individual depth (default: 17)
threads -> Number of CPU threads to be used (default: 1)
random_state -> Random state (default: 42)
dim_min -> Minimum number of dimensions (default: 1)
dim_max -> Maximum number of dimensions (default: 9999) #The algorithm will not reach this value

Arguments for model.fit():
Tr_X -> Training samples
Tr_Y -> Training labels
Te_X -> Test samples, used in the standalone version (default: None)
Te_Y -> Test labels, used in the standalone version (default: None)

Useful methods:
$ model = M6GP() -> starts the model;
$ model.fit(X, Y) -> fits the model to the dataset;
$ model.predict(X) -> Returns a list with the prediction of the given dataset.




How to edit this implementation:
Fitness Function ( m6gp.Individual ):
- Change the getFitness() method to use your own fitness function;
- This implementation assumes that a higher fitness is always better. To change this, edit the __gt__ method in this class;
- Warning: Since M6GP is a slow method, a fitness function that escalates well with the number of features is recommended.

Classification/regression algorithm ( m6gp.Individual ):
- Change the createModel() method to use your own classifier;
- Assuming it is a scykit-learn implementation, you may only need to change one line in this method;
- Warning: Since M6GP is a slow method, a learning algorithm that escalates well with the number of features is recommended.




Reference:

@inproceedings{m6gp,
doi = {10.1109/CEC60901.2024.10612107},
url = {https://ieeexplore.ieee.org/abstract/document/10612107},
year = {2024},
month = jun,
publisher = {{IEEE}},
author = {Joao E. Batista and Nuno M. Rodrigues and Leonardo Vanneschi},
title = {{M6GP: Multiobjective Feature Engineering}},
booktitle = {2024 {IEEE} Congress on Evolutionary Computation ({CEC})}
}


You may also be interested in other works related to measuring the complexity of feature engineering models:
- https://ieeexplore.ieee.org/abstract/document/10611989
- https://www.sciencedirect.com/science/article/pii/S2210650224002992



Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

m6gp-1.1.0.tar.gz (13.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

m6gp-1.1.0-py3-none-any.whl (14.0 kB view details)

Uploaded Python 3

File details

Details for the file m6gp-1.1.0.tar.gz.

File metadata

  • Download URL: m6gp-1.1.0.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for m6gp-1.1.0.tar.gz
Algorithm Hash digest
SHA256 e474f1a1504726665bc822c8e432466768001c9649ab0893df087f37ad996737
MD5 bf35231777517c2978379f20e4494088
BLAKE2b-256 5c82a01e2af067d651674ceec276e801a510fc1b49791c5fa21c2a686f72a39e

See more details on using hashes here.

File details

Details for the file m6gp-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: m6gp-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for m6gp-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 358c80a156e19a04f2ae4efe384464ff5a1cdb092fc632b6492d3f4e70ee0fb6
MD5 929f3ed5f515c275fc8414cd0e0daaf1
BLAKE2b-256 e5b22a61febeb33d2356ef83a32463fe367eaff5a8f5d282be1ac97caeaae97e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page