Hyperparameter tuning for scikit-learn models using evolutionary algorithms

Project description

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/logo.png?raw=true

Sklearn-genetic-opt

Hyperparameter tuning for scikit-learn models using evolutionary algorithms.

This is meant to be an alternative to popular methods inside scikit-learn, such as Grid Search and Randomized Search.

Sklearn-genetic-opt uses evolutionary algorithms from the deap package to find the “best” set of hyperparameters, that is, the set that optimizes (maximizes or minimizes) the cross-validation score. It can be used for both regression and classification problems.
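
To make the idea of an evolutionary search concrete, here is a minimal sketch in plain Python. It is not the package's implementation and does not use deap; the function names (`evolve`, `toy_score`) and the specific operators (uniform crossover, resampling mutation, one-individual elitism) are assumptions chosen for illustration. `toy_score` stands in for a cross-validation score.

```python
import random


def evolve(fitness, bounds, population_size=10, generations=20, seed=0):
    """Maximize `fitness` over real-valued vectors that stay inside `bounds`."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one of the two parents.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(individual, rate=0.2):
        # Resample each gene with probability `rate`, staying inside its bounds.
        return [rng.uniform(lo, hi) if rng.random() < rate else gene
                for gene, (lo, hi) in zip(individual, bounds)]

    population = [random_individual() for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        parents = ranked[: max(2, population_size // 2)]
        # Elitism: the best individual survives unchanged.
        children = [ranked[0]]
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    return max(population, key=fitness), best_per_generation


def toy_score(params):
    # Stand-in for a cross-validation score; it peaks at (0.3, 0.7).
    x, y = params
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)


best, history = evolve(toy_score, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best, history[-1])
```

Because the best individual is always carried over, the best score per generation never decreases in this sketch. In the real package, each fitness evaluation is a cross-validation run, so it is far more expensive than `toy_score`.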

Documentation is available here

Sampled distribution of hyperparameters:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/density.png?raw=true

Optimization progress in a regression problem:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/fitness.png?raw=true

Main Features:

  • GASearchCV: The main class of the package; it holds the evolutionary cross-validation optimization routine

  • Algorithms: A set of evolutionary algorithms that can be used as the optimization procedure

  • Callbacks: Custom evaluation strategies for defining early stopping rules

  • Plots: Predefined plots that help you understand the optimization process
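
As an illustration of what an early stopping rule can look like, the sketch below stops when the best score has stopped improving. It is a hypothetical stand-alone function, not the package's callback API; the name `should_stop` and the parameters `patience` and `min_delta` are assumptions made for this example.

```python
def should_stop(history, patience=5, min_delta=1e-4):
    """Return True when the best score of the last `patience` generations
    is no more than `min_delta` better than the best score seen before them."""
    if len(history) <= patience:
        return False  # not enough generations to judge yet
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent - best_before <= min_delta


# Best score per generation from an imaginary run that plateaus at 0.90.
scores = [0.80, 0.85, 0.90, 0.90, 0.90, 0.90, 0.90, 0.90]

# First generation count at which the rule fires, or None if it never does.
stop_at = next(
    (n for n in range(1, len(scores) + 1) if should_stop(scores[:n])),
    None,
)
print(stop_at)
```

A rule like this trades a small risk of stopping too early for fewer cross-validation runs; a larger `patience` makes it more conservative.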

Usage:

Install sklearn-genetic-opt

It’s advised to install sklearn-genetic-opt inside a virtual environment. With the environment activated, run:

pip install sklearn-genetic-opt
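
After installing, one way to confirm that the distribution is visible to the active environment is to ask the standard library for its version. This is a small sketch that assumes Python 3.8 or newer; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints a version string inside an environment where the package is installed,
# and None otherwise.
print(installed_version("sklearn-genetic-opt"))
```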

Example

from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = RandomForestClassifier()

param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
              'bootstrap': Categorical([True, False]),
              'max_depth': Integer(2, 30),
              'max_leaf_nodes': Integer(2, 35),
              'n_estimators': Integer(100, 300)}

cv = StratifiedKFold(n_splits=3, shuffle=True)

evolved_estimator = GASearchCV(estimator=clf,
                               cv=cv,
                               scoring='accuracy',
                               population_size=10,
                               generations=35,
                               param_grid=param_grid,
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=4)

# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)
# Best parameters found
print(evolved_estimator.best_params_)
# Use the model fitted with the best parameters
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))

# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
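
The search space in the example mixes a log-uniform continuous range, integer ranges and a categorical choice. The sketch below shows one plausible way to draw a random candidate from such a space using only the standard library. It is not how `Continuous`, `Integer` or `Categorical` are implemented; the helper names, the fixed seed and the choice of inclusive integer bounds are assumptions of this sketch.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def sample_log_uniform(lower, upper):
    # Draw uniformly in log space, map back, and clamp against rounding error.
    value = math.exp(rng.uniform(math.log(lower), math.log(upper)))
    return min(max(value, lower), upper)


def sample_integer(lower, upper):
    # Assumption of this sketch: both bounds are inclusive.
    return rng.randint(lower, upper)


def sample_categorical(choices):
    return rng.choice(choices)


def sample_candidate():
    """Draw one random hyperparameter set mirroring the example's param_grid."""
    return {
        "min_weight_fraction_leaf": sample_log_uniform(0.01, 0.5),
        "bootstrap": sample_categorical([True, False]),
        "max_depth": sample_integer(2, 30),
        "max_leaf_nodes": sample_integer(2, 35),
        "n_estimators": sample_integer(100, 300),
    }


print(sample_candidate())
```

Sampling in log space spreads candidates evenly across orders of magnitude, so values near 0.01 are explored about as often as values near 0.1, which a plain uniform draw over [0.01, 0.5] would not do.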

Results

Log controlled by verbosity

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/log.JPG?raw=true

Contributing

Contributions are more than welcome! There are lots of opportunities in this ongoing project, so please get in touch if you would like to help out. Also check the Contribution guide.

Download files

Source Distribution

sklearn-genetic-opt-0.4.0.tar.gz (19.2 kB)

Uploaded Source

Built Distribution

sklearn_genetic_opt-0.4.0-py3-none-any.whl (23.0 kB)

Uploaded Python 3

File details

Details for the file sklearn-genetic-opt-0.4.0.tar.gz.

File metadata

  • Download URL: sklearn-genetic-opt-0.4.0.tar.gz
  • Upload date:
  • Size: 19.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8

File hashes

Hashes for sklearn-genetic-opt-0.4.0.tar.gz
Algorithm Hash digest
SHA256 31a592a73e149705bb6542095a8d2da4a9ae3233759314cddd3df8ba16c823f1
MD5 891614e9b8b800e17d13ca0e68c920df
BLAKE2b-256 70cd4a5d1b852fee745678ac276f2f5bc58f158181c98f2b2124081f3bc50aab
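
A downloaded file can be checked against the SHA256 digest listed above with Python's `hashlib`. This is a minimal sketch; the helper names and the assumption that the archive sits in the current directory under its original file name are illustrative.

```python
import hashlib

# SHA256 digest listed above for sklearn-genetic-opt-0.4.0.tar.gz.
EXPECTED_SHA256 = "31a592a73e149705bb6542095a8d2da4a9ae3233759314cddd3df8ba16c823f1"


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("sklearn-genetic-opt-0.4.0.tar.gz")` should return `True` for an intact download. The same function works for the wheel if you pass the SHA256 digest listed for that file as `expected`.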

File details

Details for the file sklearn_genetic_opt-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: sklearn_genetic_opt-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 23.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8

File hashes

Hashes for sklearn_genetic_opt-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d06b1d0094007e90b0b16944a983c11ae73adf2d8e083eb1dc77c1e0ec081bc9
MD5 5ecc0098885d8c70705bad6a6c067c26
BLAKE2b-256 610653e95d46324f3d5f7a957515971ebbba40856e33fd8123886d5b0a087677
