Python automated machine learning framework
🌳 NiaAML
📦 Installation • 💻 Graphical User Interface • 📮 API • ✨ Implemented Components • 💪 Optimization Process And Parameter Tuning • 📓 Examples • 🫂 Contributors • 🙏 Support • 🔑 License • 📄 Cite Us
NiaAML is a framework for Automated Machine Learning based on nature-inspired optimization algorithms. The framework is written entirely in Python. The name NiaAML comes from the Automated Machine Learning method of the same name [1]. Its goal is to efficiently compose the best possible classification pipeline for a given task from the components provided as input. The components are divided into three groups: feature selection algorithms, feature transformation algorithms and classifiers. The framework uses nature-inspired optimization algorithms to choose the best set of components for the classification pipeline and to optimize their hyperparameters. The optimization process relies on NiaPy, a popular Python collection of nature-inspired algorithms. The NiaAML framework is easy to use, customize and extend to suit your needs.
🆕📈 NiaAML now also supports regression tasks. The package still refers to regressors as "classifiers" to avoid introducing a breaking change to the API.
The NiaAML framework allows you not only to run full pipeline optimization, but also to use the implemented components, such as classifiers and feature selection algorithms, on their own. It supports numerical and categorical features as well as missing values in datasets.
- Free software: MIT license,
- Documentation: https://niaaml.readthedocs.io/en/latest/,
- Python versions: 3.9 | 3.10 | 3.11
- Dependencies: click,
- Tested OS: Windows, Ubuntu, Fedora, Linux Mint and CentOS. However, that does not mean it does not work on others.
📦 Installation
pip3
Install NiaAML with pip3:
pip3 install niaaml
In case you would like to try out the latest pre-release version of the framework, install it using:
pip3 install niaaml --pre
Fedora Linux
To install NiaAML on Fedora, use:
$ dnf install python-niaaml
Alpine Linux
To install NiaAML on Alpine Linux, enable the Community repository and use:
$ apk add py3-niaaml
💻 Graphical User Interface
There is a simple Graphical User Interface for the NiaAML package available here.
📮 API
There is a simple API for remote work with NiaAML package available here.
✨ Implemented Components
Click here for a list of currently implemented components divided into groups: classifiers, feature selection algorithms and feature transformation algorithms. At the end you can also see a list of currently implemented fitness functions for the optimization process, categorical features' encoders, and missing values' imputers. All of the components are passed into the optimization process using their class names. Let's say we want to choose between Adaptive Boosting, Bagging and Multi Layer Perceptron classifiers, Select K Best and Select Percentile feature selection algorithms and Normalizer as the feature transformation algorithm (may not be selected during the optimization process).
PipelineOptimizer(
data=...,
classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron'],
feature_selection_algorithms=['SelectKBest', 'SelectPercentile'],
feature_transform_algorithms=['Normalizer']
)
The PipelineOptimizer argument categorical_features_encoder is None by default. If your dataset contains any categorical features, you need to specify an encoder to use. The same goes for imputer and features that contain missing values.
PipelineOptimizer(
data=...,
classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron'],
feature_selection_algorithms=['SelectKBest', 'SelectPercentile'],
feature_transform_algorithms=['Normalizer'],
categorical_features_encoder='OneHotEncoder',
imputer='SimpleImputer'
)
For a full example see the 📓 Examples section.
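As a rough illustration of when these two arguments are needed, the sketch below inspects raw rows and builds the extra keyword arguments. The helper preprocessing_arguments is hypothetical and not part of NiaAML. It assumes that string values mark categorical features and that None or NaN marks a missing value.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing values (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocessing_arguments(rows):
    """Return the extra keyword arguments the rows appear to require.

    Hypothetical helper: strings are taken to mean categorical features.
    """
    kwargs = {}
    values = [value for row in rows for value in row]
    if any(isinstance(value, str) for value in values):
        kwargs["categorical_features_encoder"] = "OneHotEncoder"
    if any(is_missing(value) for value in values):
        kwargs["imputer"] = "SimpleImputer"
    return kwargs


rows = [
    [5.1, "red", 1.0],
    [4.9, "blue", None],  # one missing value
    [6.2, "red", 3.5],
]
extra = preprocessing_arguments(rows)
print(extra)
```

The resulting dictionary could then be unpacked into the constructor call shown above, for example PipelineOptimizer(data=..., classifiers=[...], **extra).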
💪 Optimization Process And Parameter Tuning
In the modified version of the NiaAML optimization process there are two types of optimization. The goal of the first type is to find an optimal set of components (feature selection algorithm, feature transformation algorithm and classifier). The next step is to find optimal parameters for the selected set of components, and that is the goal of the second type of optimization. Each component has an attribute _params, which is a dictionary of parameters and their possible values.
self._params = dict(
n_estimators = ParameterDefinition(MinMax(min=10, max=111), np.uint),
algorithm = ParameterDefinition(['SAMME', 'SAMME.R'])
)
An individual in the second type of optimization is represented as a real-valued vector whose size equals the total number of keys in all three dictionaries (the classifier's _params, the feature transformation algorithm's _params and the feature selection algorithm's _params), and the value of each dimension lies in the range [0.0, 1.0]. The second type of optimization maps real values from the individual's vector to the parameter definitions in the dictionaries. Each parameter's value can be defined as a range or as an array of values. In the first case, a value from the vector is mapped from one interval to another; in the second case, the value falls into one of the bins, each of which represents an index into the array of possible parameter values.
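The two mappings described above can be sketched in plain Python. This is a minimal illustration, not NiaAML's internal code: the function names map_to_range and map_to_choice are hypothetical, and equal-width bins are an assumption.

```python
from typing import Sequence


def map_to_range(value: float, low: float, high: float) -> float:
    """Linearly rescale a value from [0.0, 1.0] to [low, high]."""
    return low + value * (high - low)


def map_to_choice(value: float, choices: Sequence):
    """Return the choice whose bin contains the value.

    [0.0, 1.0] is split into len(choices) equal-width bins (an assumption).
    """
    # Clamp so that value == 1.0 lands in the last bin instead of overflowing.
    index = min(int(value * len(choices)), len(choices) - 1)
    return choices[index]


# Using the ranges from the _params example: n_estimators in [10, 111]
# and algorithm chosen from two options.
n_estimators = int(map_to_range(0.5, 10, 111))
algorithm = map_to_choice(0.75, ['SAMME', 'SAMME.R'])
print(n_estimators, algorithm)
```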
Let's say we have a classifier with 3 parameters, a feature selection algorithm with 2 parameters and feature transformation algorithm with 4 parameters. The size of an individual in the second type of optimization is 9. The size of an individual in the first type of optimization is always 3 (1 classifier, 1 feature selection algorithm and 1 feature transformation algorithm).
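The sizes in this example follow from simple counting, as the sketch below shows. The dictionary keys and the helper split_vector are hypothetical. The order in which parameters are laid out in the flat vector is also an assumption made only for illustration.

```python
# Parameter counts from the example in the text.
param_counts = {
    "classifier": 3,
    "feature_selection_algorithm": 2,
    "feature_transform_algorithm": 4,
}

# First type of optimization: one dimension per component group.
component_vector_size = len(param_counts)

# Second type of optimization: one dimension per tunable parameter.
parameter_vector_size = sum(param_counts.values())


def split_vector(vector, counts):
    """Slice a flat vector into one chunk per component, in dictionary order."""
    chunks, start = {}, 0
    for name, count in counts.items():
        chunks[name] = vector[start:start + count]
        start += count
    return chunks


individual = [0.1, 0.9, 0.4, 0.7, 0.2, 0.5, 0.3, 0.8, 0.6]
chunks = split_vector(individual, param_counts)
print(component_vector_size, parameter_vector_size)
```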
In some cases we may want to tune a parameter that needs additional information for setting its range of values, so we cannot set the range in the initialization method. In that case, we should set its value in the dictionary to None and define it later in the process. The parameter will be a part of the parameter tuning process as soon as we define its possible values. For example, see Select K Best Feature Selection and its parameter k.
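The deferred definition can be pictured with a small stand-in class. SelectKBestSketch and its methods are hypothetical and do not reproduce the NiaAML implementation. The range of 1 to the number of features for k is an assumption, and a plain tuple stands in for a parameter definition.

```python
class SelectKBestSketch:
    """Hypothetical stand-in component, not the NiaAML class."""

    def __init__(self):
        # The valid range of k depends on the number of features,
        # which is unknown at construction time, so it starts as None.
        self._params = {"k": None}

    def tunable_params(self):
        """Return only the parameters whose possible values are defined."""
        return {name: values for name, values in self._params.items()
                if values is not None}

    def set_feature_count(self, n_features: int):
        """Define the range of k once the data has been seen (assumed 1..n_features)."""
        self._params["k"] = (1, n_features)


component = SelectKBestSketch()
before = component.tunable_params()   # k is not tuned yet
component.set_feature_count(3)
after = component.tunable_params()    # k now takes part in tuning
print(before, after)
```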
The NiaAML framework also supports running optimization according to the original method proposed in [1], where the component selection and hyperparameter optimization steps are combined into one.
📓 Examples
Example of Usage
Load data and try to find the optimal pipeline for the given components. The example below uses the Particle Swarm Algorithm as the optimization algorithm. You can find a list of all available algorithms in the NiaPy's repository.
from niaaml import PipelineOptimizer, Pipeline
from niaaml.data import BasicDataReader
import numpy
import pandas
# dummy random data
data_reader = BasicDataReader(
x=numpy.random.uniform(low=0.0, high=15.0, size=(50, 3)),
y=numpy.random.choice(['Class 1', 'Class 2'], size=50)
)
pipeline_optimizer = PipelineOptimizer(
data=data_reader,
classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron', 'RandomForest', 'ExtremelyRandomizedTrees', 'LinearSVC'],
feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
feature_transform_algorithms=['Normalizer', 'StandardScaler']
)
# run the modified version of optimization
pipeline1 = pipeline_optimizer.run('Accuracy', 15, 15, 300, 300, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
# run the original version
pipeline2 = pipeline_optimizer.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
You can save a result of the optimization process as an object to a file for later use.
pipeline1.export('pipeline.ppln')
And also load it from a file and use the pipeline.
loaded_pipeline = Pipeline.load('pipeline.ppln')
# some features (can be loaded using DataReader object instances)
x = pandas.DataFrame([[0.35, 0.46, 5.32], [0.16, 0.55, 12.5]])
y = loaded_pipeline.run(x)
You can also save a user-friendly representation of a pipeline to a text file.
pipeline1.export_text('pipeline.txt')
This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework.
📈 Example of a Regression Task
The API for regression tasks is the same as for classification. You only have to choose components that support regression.
Currently, the following components support regression tasks:
➡️ Feature Transform Algorithms:
- "Normalizer"
- "StandardScaler"
- "MaxAbsScaler"
- "QuantileTransformer"
- "RobustScaler"
🔎 Feature Selection Algorithms:
- "SelectKBest"
- "SelectPercentile"
- "SelectUnivariateRegression"
🔮 Models (Classifiers):
- "LinearRegression"
- "RidgeRegression"
- "LassoRegression"
- "DecisionTreeRegression"
- "GaussianProcessRegression"
pipeline_optimizer = PipelineOptimizer(
data=data_reader,
feature_selection_algorithms=["SelectKBest", "SelectPercentile", "SelectUnivariateRegression"],
feature_transform_algorithms=["Normalizer", "StandardScaler"],
classifiers=["LinearRegression", "RidgeRegression", "LassoRegression", "DecisionTreeRegression", "GaussianProcessRegression"],
)
# run the modified version of optimization
pipeline1 = pipeline_optimizer.run("MSE", 10, 10, 20, 20, "ParticleSwarmAlgorithm")
Example of a Pipeline Component's Implementation
The NiaAML framework is easily expandable, as you can implement components by overriding the base classes' methods. To implement a classifier you should inherit from the Classifier class, and you can do the same with FeatureSelectionAlgorithm and FeatureTransformAlgorithm classes. All of the mentioned classes inherit from the PipelineComponent class.
Take a look at the Classifier class and the implementation of the AdaBoost classifier that inherits from it.
Example of a Fitness Function's Implementation
The NiaAML framework also allows you to implement your own fitness function. All you need to do is implement the FitnessFunction class.
Take a look at the Accuracy implementation.
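As a rough idea of what such an implementation might look like, the sketch below defines a balanced-accuracy fitness function. The local FitnessFunction class is only a stand-in so that the snippet runs without NiaAML installed. The Name attribute and the get_fitness(predicted, expected) method are assumptions about the base class interface, so check them against the linked Accuracy implementation. The sketch also assumes that a higher fitness is better.

```python
from collections import defaultdict


class FitnessFunction:
    """Local stand-in for the framework's base class (assumed interface)."""
    Name = None

    def get_fitness(self, predicted, expected):
        raise NotImplementedError


class BalancedAccuracySketch(FitnessFunction):
    """Mean of per-class recall; hypothetical example, not shipped with NiaAML."""
    Name = "Balanced Accuracy (sketch)"

    def get_fitness(self, predicted, expected):
        hits = defaultdict(int)    # correct predictions per true class
        totals = defaultdict(int)  # number of samples per true class
        for p, e in zip(predicted, expected):
            totals[e] += 1
            if p == e:
                hits[e] += 1
        recalls = [hits[label] / totals[label] for label in totals]
        return sum(recalls) / len(recalls)


fitness = BalancedAccuracySketch().get_fitness(
    predicted=["a", "a", "b", "b"],
    expected=["a", "a", "a", "b"],
)
print(round(fitness, 4))
```

In a real implementation the class would inherit from the framework's own FitnessFunction instead of the stand-in defined here.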
Example of a Feature Encoder's Implementation
The NiaAML framework also allows you to implement your own feature encoder. All you need to do is implement the FeatureEncoder class.
Take a look at the OneHotEncoder implementation.
Example of an Imputer's Implementation
The NiaAML framework also allows you to implement your own imputer. All you need to do is implement the Imputer class.
Take a look at the SimpleImputer implementation.
More
You can find more examples here.
🫂 Contributors
Thanks go to these wonderful people (emoji key):
- Luka Pečnik 💻 📖 👀 🐛 💡 ⚠️ 🚇
- firefly-cpp 💻 🐛 🧑‍🏫 🔬 🤔
- sisco0 🤔
- zStupan 💻
- Ben Beasley 💻 🚇
- Laurenz Farthofer 💻 📖 🚇
This project follows the all-contributors specification. Contributions of any kind are welcome!
🙇 Contributing
We encourage you to contribute to NiaAML! Please check out the Contributing to NiaAML guide for guidelines about how to proceed.
Everyone interacting in NiaAML's codebases, issue trackers, chat rooms and mailing lists is expected to follow the NiaAML code of conduct.
🙏 Support
❓ Usage Questions
If you have questions about how to use NiaAML, or have an issue that isn’t related to a bug, you can place a question on StackOverflow.
You can also seek support via email.
NiaAML is a community-supported package; nobody is paid to develop the package or to handle NiaAML support.
Everyone answering your questions is doing so on their own time, so please be kind and provide as much information as possible.
❗ Issues
Before creating a bug report, please check the list of existing issues, as you might find that you don't need to create one. When you create a bug report, please include as many details as possible in the issue template.
🔑 License
This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.
Disclaimer
This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!
📝 References
[1] Iztok Fister Jr., Milan Zorman, Dušan Fister, Iztok Fister. Continuous optimizers for automatic design and evaluation of classification pipelines. In: Frontier applications of nature inspired computation. Springer tracts in nature-inspired computing, pp.281-301, 2020.
📄 Cite us
@article{Pečnik2021,
doi = {10.21105/joss.02949},
url = {https://doi.org/10.21105/joss.02949},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {61},
pages = {2949},
author = {Luka Pečnik and Iztok Fister},
title = {NiaAML: AutoML framework based on stochastic population-based nature-inspired algorithms},
journal = {Journal of Open Source Software}
}
L. Pečnik, I. Fister Jr. "NiaAML: AutoML framework based on stochastic population-based nature-inspired algorithms." Journal of Open Source Software 6.61 (2021): 2949.
@inproceedings{pecnik_niaaml2_2021,
address = {Cham},
title = {{NiaAML2}: {An} {Improved} {AutoML} {Using} {Nature}-{Inspired} {Algorithms}},
isbn = {978-3-030-78811-7},
abstract = {Using machine learning methods in the real-world is far from being easy, especially because of the number of methods on the one hand, and setting the optimal values of their parameters on the other. Therefore, a lot of so-called AutoML methods have emerged nowadays that also enable automatic construction of classification pipelines to users, who are not experts in this domain. In this study, the NiaAML2 method is proposed that is capable of constructing the classification pipelines using nature-inspired algorithms in two phases: pipeline construction, and hyper-parameter optimization. This method improves the original NiaAML capable of this construction in one phase. The algorithm was applied to four UCI ML datasets, while the obtained results encouraged us to continue with the research.},
booktitle = {Advances in {Swarm} {Intelligence}},
publisher = {Springer International Publishing},
author = {Pečnik, Luka and Fister, Iztok and Fister, Iztok},
editor = {Tan, Ying and Shi, Yuhui},
year = {2021},
pages = {243--252},
}
L. Pečnik, Fister, I., Fister, I. Jr. NiaAML2: An Improved AutoML Using Nature-Inspired Algorithms. In International Conference on Swarm Intelligence (pp. 243-252). Springer, Cham, 2021.