Hyperparameter tuning for scikit-learn models using evolutionary algorithms
Project description
Sklearn-genetic-opt
Hyperparameter tuning for scikit-learn models using evolutionary algorithms.
This is meant to be an alternative to popular methods inside scikit-learn such as Grid Search and Randomized Search.
Sklearn-genetic-opt uses evolutionary algorithms from the deap package to choose the set of hyperparameters that optimizes (maximizes or minimizes) the cross-validation scores. It can be used for both regression and classification problems.
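To make the idea concrete, here is a minimal, standard-library sketch of an evolutionary search over hyperparameters. It is not the package's implementation and does not use deap: the `evolve` helper, the mutation scale, and the `mock_cv_score` function (a stand-in for a mean cross-validation score) are all assumptions made for illustration, and every hyperparameter is treated as continuous for simplicity.

```python
import random


def evolve(fitness, bounds, population_size=10, generations=20, seed=0):
    """Toy evolutionary search: keep the best half, refill with mutated copies."""
    rng = random.Random(seed)
    # Random initial population: one value per hyperparameter, within bounds.
    population = [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(population_size)
    ]
    history = []
    for _ in range(generations):
        # Rank candidates by fitness (higher is better).
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            parent = rng.choice(survivors)
            # Gaussian mutation, clipped back into the allowed range.
            child = {
                name: min(hi, max(lo, parent[name] + rng.gauss(0, 0.1 * (hi - lo))))
                for name, (lo, hi) in bounds.items()
            }
            children.append(child)
        population = survivors + children
    return max(population, key=fitness), history


def mock_cv_score(params):
    # Hypothetical stand-in for a mean cross-validation score; it peaks at
    # max_depth=12 and min_weight_fraction_leaf=0.05.
    return (
        -((params["max_depth"] - 12) ** 2) / 100
        - (params["min_weight_fraction_leaf"] - 0.05) ** 2
    )


bounds = {"max_depth": (2, 30), "min_weight_fraction_leaf": (0.01, 0.5)}
best_params, history = evolve(mock_cv_score, bounds)
print(best_params, history[-1])
```

Because the best half of each generation survives unchanged, the best score recorded per generation never decreases. In a real search the fitness function would be a cross-validation run, which is why the number of candidates per generation matters for runtime.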
Documentation is available here
Sampled distribution of hyperparameters:
Optimization progress in a regression problem:
Main Features:
GASearchCV: Main class of the package; holds the evolutionary cross-validation optimization routine
Algorithms: Set of different evolutionary algorithms to use as the optimization procedure
Callbacks: Custom evaluation strategies to generate early stopping rules
Plots: Generate pre-defined plots to understand the optimization process
MLflow: Built-in integration with MLflow to log all the hyperparameters and their CV scores
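As an illustration of the kind of rule an early stopping callback can encode, the sketch below stops a search once the best score has stalled. It is a plain-Python sketch, not the package's callback API: the `should_stop` function and its `patience` and `min_delta` parameters are hypothetical names chosen for this example.

```python
def should_stop(best_scores, patience=5, min_delta=1e-3):
    """Return True when the best score has improved by at most `min_delta`
    over the last `patience` generations."""
    if len(best_scores) <= patience:
        # Not enough generations yet to judge progress.
        return False
    reference = best_scores[-patience - 1]
    recent_best = max(best_scores[-patience:])
    return recent_best - reference <= min_delta


# Hypothetical best score per generation: fast gains, then a plateau.
scores = [0.80, 0.85, 0.90, 0.9001, 0.9002, 0.9002, 0.9003, 0.9003]

# First generation (1-based) at which the rule fires, or None if it never does.
stopped_at = next(
    (gen for gen in range(1, len(scores) + 1) if should_stop(scores[:gen])),
    None,
)
print(stopped_at)
```

A rule like this trades a small chance of missing a late improvement for a shorter search, so `patience` is usually set with the cost of one generation in mind.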
Usage:
Install sklearn-genetic-opt
It's advised to install sklearn-genetic-opt in a virtual environment. Inside the environment, run:
pip install sklearn-genetic-opt
Example
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf = RandomForestClassifier()
param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
              'bootstrap': Categorical([True, False]),
              'max_depth': Integer(2, 30),
              'max_leaf_nodes': Integer(2, 35),
              'n_estimators': Integer(100, 300)}
cv = StratifiedKFold(n_splits=3, shuffle=True)
evolved_estimator = GASearchCV(estimator=clf,
                               cv=cv,
                               scoring='accuracy',
                               population_size=10,
                               generations=35,
                               param_grid=param_grid,
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=4)
# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)
# Best parameters found
print(evolved_estimator.best_params_)
# Use the model fitted with the best parameters
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))
# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
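The search space in the example mixes a log-uniform continuous range, a categorical choice, and integer ranges. The standard-library sketch below shows one way such a space could be sampled. It is an illustration only, not the package's sampling code: the helper names are hypothetical, and treating the integer bounds as inclusive is an assumption.

```python
import math
import random

rng = random.Random(42)


def sample_log_uniform(low, high):
    # Draw uniformly in log space, so every order of magnitude
    # between `low` and `high` is equally likely.
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_candidate():
    # One random point from a space shaped like the example's param_grid.
    # Integer bounds are assumed inclusive here.
    return {
        "min_weight_fraction_leaf": sample_log_uniform(0.01, 0.5),
        "bootstrap": rng.choice([True, False]),
        "max_depth": rng.randint(2, 30),
        "max_leaf_nodes": rng.randint(2, 35),
        "n_estimators": rng.randint(100, 300),
    }


candidates = [sample_candidate() for _ in range(1000)]
small = sum(c["min_weight_fraction_leaf"] < 0.1 for c in candidates)
print(candidates[0])
print(f"share of samples below 0.1: {small / len(candidates):.2f}")
```

With a log-uniform draw on [0.01, 0.5], a value falls below 0.1 with probability ln(10)/ln(50), roughly 0.59, whereas a plain uniform draw would put only about 18% of samples there. This is why a log-uniform range suits parameters whose useful values cluster near the lower bound.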
Results
Log controlled by verbosity
Changelog
See the changelog for notes on changes to Sklearn-genetic-opt.
Important links
Official source code repo: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/
Download releases: https://pypi.org/project/sklearn-genetic-opt/
Issue tracker: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/issues
Source code
You can check the latest development version with the command:
git clone https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git
Contributing
Contributions are more than welcome! There are lots of opportunities in this ongoing project, so please get in touch if you would like to help out. Also check the Contribution guide.
Testing
After installation, you can launch the test suite from outside the source directory:
pytest sklearn_genetic
Download files
File details
Details for the file sklearn-genetic-opt-0.5.0.tar.gz.
File metadata
- Download URL: sklearn-genetic-opt-0.5.0.tar.gz
- Upload date:
- Size: 20.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 994e04ba8755c201e490b67c0d1666dca653a963faf8d25140490f8288e23bed
MD5 | 8c4927f42742692c17938989dfe38419
BLAKE2b-256 | 8cdaa0f271b84c8d41e08aefe2f46dbfda7cf2793b88c1dd080dfb4b65fa88e4
File details
Details for the file sklearn_genetic_opt-0.5.0-py3-none-any.whl.
File metadata
- Download URL: sklearn_genetic_opt-0.5.0-py3-none-any.whl
- Upload date:
- Size: 27.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 20b1365b1a31a77710cd61a41e8dd96100198b8fb59389bdb3e523a00d30d20b
MD5 | fa05a7351e3a2c80eb10ddd47a891e3b
BLAKE2b-256 | 2ce565ab3d590af252b1f40f147178b6dcb41aaafdf99a1e720104b181033d65