Evolutionary Forest
An open-source Python library for automated feature engineering based on genetic programming
Free software: BSD license
Documentation: https://evolutionary-forest.readthedocs.io.
Introduction
Feature engineering is a long-standing problem that has troubled machine learning practitioners for many years. Deep learning techniques have significantly reduced the need for manual feature engineering in recent years. However, the features discovered by deep learning methods are difficult to interpret.
In the domain of interpretable machine learning, genetic programming has proven to be a promising method for automated feature construction, because it can improve the performance of traditional machine learning systems while maintaining similar interpretability. Nonetheless, practitioners rarely mention this method. We believe the main reason is the lack of a mature package that can automatically build features with genetic programming. We therefore propose this package, with the goal of providing a powerful feature construction tool for enhancing existing state-of-the-art machine learning algorithms, particularly decision-tree-based algorithms.
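To make the idea of an interpretable constructed feature concrete, here is a minimal sketch that uses only the standard library. It is not the package's implementation: the tuple-based tree encoding, the `PRIMITIVES` table, and the helper names `evaluate`, `to_string`, and `construct_features` are all assumptions made for illustration. Each constructed feature is a small expression tree over the original columns, so it can be printed as a readable formula and appended to the data before a decision tree is trained on it.

```python
import operator

# Hypothetical primitive set for this sketch: name -> binary function.
PRIMITIVES = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def evaluate(tree, row):
    """Evaluate an expression tree on one sample.

    A tree is either an int (the index of an original feature) or a
    tuple of the form (primitive_name, left_subtree, right_subtree).
    """
    if isinstance(tree, int):
        return row[tree]
    name, left, right = tree
    return PRIMITIVES[name](evaluate(left, row), evaluate(right, row))


def to_string(tree):
    """Render a tree as a human-readable formula."""
    if isinstance(tree, int):
        return f"x{tree}"
    name, left, right = tree
    return f"{name}({to_string(left)}, {to_string(right)})"


def construct_features(trees, rows):
    """Append one constructed column per tree to every row."""
    return [list(row) + [evaluate(t, row) for t in trees] for row in rows]


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
trees = [("mul", 0, 1), ("add", ("mul", 0, 1), 2)]

augmented = construct_features(trees, rows)
print([to_string(t) for t in trees])  # formulas of the constructed features
print(augmented)                      # original columns plus constructed ones
```

In a genetic programming loop, trees like these would be varied by crossover and mutation and selected according to how much they help the downstream learner. The sketch covers only the representation and evaluation step.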
Features
A powerful feature construction tool for generating interpretable machine learning features.
A reliable machine learning model that performs well on small datasets.
Installation
From PyPI:
pip install -U evolutionary_forest
From GitHub (Latest Code):
pip install git+https://github.com/hengzhe-zhang/EvolutionaryForest.git
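After either install command, a quick way to confirm which version ended up in the environment is to query the package metadata. The snippet below is a small sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is registered under the name used with pip.
found = installed_version("evolutionary_forest")
if found is None:
    print("evolutionary_forest is not installed")
else:
    print("evolutionary_forest version:", found)
```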
Supported Algorithms
Example
An example of usage:
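from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from evolutionary_forest.forest import EvolutionaryForestRegressor

# Load the diabetes regression dataset and hold out 20% for testing
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Configure the evolutionary search and the base learner
r = EvolutionaryForestRegressor(max_height=3, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True)
r.fit(x_train, y_train)
# Report the R^2 score on the held-out test set
print(r2_score(y_test, r.predict(x_test)))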
An example of improvements brought about by constructed features:
Tutorials
Here are some notebook examples of using Evolutionary Forest:
Documentation
Tutorial: English Version | Chinese Version
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Citation
Please cite our papers if you find this package helpful :)
@article{zhang2021evolutionary,
title={An Evolutionary Forest for Regression},
author={Zhang, Hengzhe and Zhou, Aimin and Zhang, Hu},
journal={IEEE Transactions on Evolutionary Computation},
volume={26},
number={4},
pages={735--749},
year={2021},
publisher={IEEE}
}
@article{zhang2023sr,
title={SR-Forest: A Genetic Programming based Heterogeneous Ensemble Learning Method},
author={Zhang, Hengzhe and Zhou, Aimin and Chen, Qi and Xue, Bing and Zhang, Mengjie},
journal={IEEE Transactions on Evolutionary Computation},
year={2023},
publisher={IEEE}
}
History
0.1.0 (2021-05-22)
First release on PyPI.