
Scikit-learn model hyperparameter tuning using evolutionary algorithms


https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/logo.png?raw=true

Sklearn-genetic-opt

Hyperparameter tuning for scikit-learn models, using evolutionary algorithms.

This is meant to be an alternative to popular methods inside scikit-learn such as Grid Search and Randomized Grid Search.

Sklearn-genetic-opt uses evolutionary algorithms from the DEAP package to choose the set of hyperparameters that optimizes (maximizes or minimizes) the cross-validation scores. It can be used for both regression and classification problems.
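To make the idea of an evolutionary search over hyperparameters concrete, here is a minimal pure-Python sketch. It is not the package's implementation and does not use DEAP: the function `evolve`, the stand-in `toy_score`, and the selection, crossover and mutation rules are all assumptions chosen for illustration. In a real run the score would come from cross-validation.

```python
import random


def evolve(score, bounds, population_size=10, generations=20, seed=0):
    """Toy evolutionary search: maximize score(params) over integer ranges."""
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        # Draw every hyperparameter uniformly from its inclusive range
        return {name: rng.randint(*bounds[name]) for name in names}

    def crossover(a, b):
        # Each hyperparameter is inherited from one of the two parents
        return {name: rng.choice((a[name], b[name])) for name in names}

    def mutate(individual):
        # Resample a single, randomly chosen hyperparameter
        child = dict(individual)
        name = rng.choice(names)
        child[name] = rng.randint(*bounds[name])
        return child

    population = [random_individual() for _ in range(population_size)]
    for _ in range(generations):
        # Keep the best half as parents (elitism), refill with their offspring
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(2, population_size // 2)]
        children = [
            mutate(crossover(*rng.sample(parents, 2)))
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


def toy_score(params):
    # Stand-in for a cross-validation score (hypothetical, peaks at 12 and 200)
    return -((params["max_depth"] - 12) ** 2) - ((params["n_estimators"] - 200) ** 2) / 100


toy_bounds = {"max_depth": (2, 30), "n_estimators": (100, 300)}
best = evolve(toy_score, toy_bounds)
print(best, toy_score(best))
```

Because the best individuals always survive to the next generation, the best score in this sketch never decreases from one generation to the next.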

Documentation is available here

Main Features:

  • GASearchCV: Principal class of the package, holds the evolutionary cross-validation optimization routine.

  • Algorithms: Set of different evolutionary algorithms to use as an optimization procedure.

  • Callbacks: Custom evaluation strategies to generate early stopping rules, logging (into TensorBoard, .pkl files, etc.) or your custom logic.

  • Plots: Generate pre-defined plots to understand the optimization process.

  • MLflow: Built-in integration with MLflow to log all the hyperparameters, CV scores and the fitted models.
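As an illustration of what an early stopping rule of the kind mentioned under Callbacks might do, the sketch below stops once the fitness has not improved for a given number of generations. The class `PatienceStopper` and the fitness values are hypothetical and are not part of the package's callback API; see the documentation for the callbacks the package actually provides.

```python
class PatienceStopper:
    """Hypothetical rule: stop after `patience` generations without improvement."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = None
        self.stale = 0

    def __call__(self, fitness):
        """Record one generation's fitness; return True when the search should stop."""
        if self.best is None or fitness > self.best:
            self.best = fitness
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


stopper = PatienceStopper(patience=2)
fitness_per_generation = [0.80, 0.85, 0.85, 0.84, 0.90]  # made-up values
for generation, fitness in enumerate(fitness_per_generation):
    if stopper(fitness):
        print(f"stopping at generation {generation}")
        break
```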

Demos on Features:

Visualize the progress of your training:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/images/progress_bar.gif?raw=true

Real-time metrics visualization and comparison across runs:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/images/tensorboard_log.png?raw=true

Sampled distribution of hyperparameters:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/images/density.png?raw=true

Artifacts logging:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/images/mlflow_artifacts_4.png?raw=true

Usage:

Install sklearn-genetic-opt

It’s advised to install sklearn-genetic-opt in a virtual environment. Inside the environment, run:

pip install sklearn-genetic-opt

If you want to get all the features, including plotting and mlflow logging capabilities, install all the extra packages:

pip install sklearn-genetic-opt[all]

The only optional dependency that the last command does not install is TensorFlow; it is usually advised to check which distribution works best for your setup.
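If you are unsure which optional dependencies ended up in your environment, a quick check along these lines can help. The helper `missing_modules` is hypothetical, and the import names `mlflow` and `tensorflow` are assumptions about how those packages are imported; adjust the list to the features you need.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names for the optional features
optional = ["mlflow", "tensorflow"]
print("Missing optional modules:", missing_modules(optional))
```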

Example

from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = RandomForestClassifier()

param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
              'bootstrap': Categorical([True, False]),
              'max_depth': Integer(2, 30),
              'max_leaf_nodes': Integer(2, 35),
              'n_estimators': Integer(100, 300)}

cv = StratifiedKFold(n_splits=3, shuffle=True)

evolved_estimator = GASearchCV(estimator=clf,
                               cv=cv,
                               scoring='accuracy',
                               population_size=10,
                               generations=35,
                               param_grid=param_grid,
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=4)

# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)
# Best parameters found
print(evolved_estimator.best_params_)
# Use the model fitted with the best parameters
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))

# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
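The saved history can then be inspected with ordinary Python. The sketch below assumes that `history` behaves like a dictionary of equal-length lists with a `gen` key and a `fitness` key; that shape is an assumption, so check it against the documentation for your installed version. The helper `best_generation` and the example values are made up for illustration.

```python
def best_generation(history, key="fitness"):
    """Return (generation, value) for the highest value stored under `key`."""
    return max(zip(history["gen"], history[key]), key=lambda pair: pair[1])


# Made-up values shaped like the assumed history dictionary
example_history = {"gen": [0, 1, 2, 3], "fitness": [0.81, 0.86, 0.90, 0.89]}
print(best_generation(example_history))
```

With a fitted estimator, and if the assumption about the shape holds, the call would be `best_generation(evolved_estimator.history)`.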

Changelog

See the changelog for notes on changes to Sklearn-genetic-opt.

Source code

You can check the latest development version with the command:

git clone https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git

Install the development dependencies:

pip install -r dev-requirements.txt

Check the latest in-development documentation: https://sklearn-genetic-opt.readthedocs.io/en/latest/

Contributing

Contributions are more than welcome! There are several opportunities in the ongoing project, so please get in touch if you would like to help out. Make sure to check the current issues and the contribution guide.

Big thanks to the people who are helping with this project!

Contributors

Testing

After installation, you can launch the test suite from outside the source directory:

pytest sklearn_genetic

