
Scikit-learn model hyperparameter tuning using evolutionary algorithms

Project description



Sklearn-genetic-opt

Scikit-learn model hyperparameter tuning using evolutionary algorithms.

It is meant to be an alternative to popular scikit-learn methods such as Grid Search and Randomized Search.

Sklearn-genetic-opt uses evolutionary algorithms from the DEAP package to choose the set of hyperparameters that optimizes (maximizes or minimizes) the cross-validation score. It can be used for both regression and classification problems.
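To make the idea concrete, here is a minimal, self-contained sketch of a genetic algorithm searching over integer hyperparameters. It is not the package's implementation (which builds on DEAP); the function names evolve and toy_fitness are hypothetical, and toy_fitness merely stands in for a cross-validation score. Each generation picks parents by tournament selection, combines them with uniform crossover, randomly mutates genes, and keeps track of the best individual seen so far.

```python
import random


def evolve(fitness, space, population_size=10, generations=20, mutation_rate=0.2, seed=0):
    """Toy genetic algorithm that maximizes `fitness` over integer hyperparameters.

    `space` maps each hyperparameter name to an inclusive (low, high) range.
    In sklearn-genetic-opt the fitness would be a cross-validation score;
    here it is any callable that takes a dict of hyperparameters.
    """
    rng = random.Random(seed)
    names = list(space)

    def random_individual():
        return {n: rng.randint(*space[n]) for n in names}

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one of the two parents.
        return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in names}

    def mutate(ind):
        # With probability `mutation_rate`, resample a gene from its range.
        return {n: (rng.randint(*space[n]) if rng.random() < mutation_rate else v)
                for n, v in ind.items()}

    def tournament(pop, k=3):
        # Tournament selection: the fittest of k randomly chosen individuals.
        return max(rng.sample(pop, k), key=fitness)

    population = [random_individual() for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        population = [mutate(crossover(tournament(population), tournament(population)))
                      for _ in range(population_size)]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


# Hypothetical stand-in for a CV score, peaking at max_depth=12, n_estimators=200.
def toy_fitness(params):
    return -abs(params["max_depth"] - 12) - abs(params["n_estimators"] - 200) / 10


space = {"max_depth": (2, 30), "n_estimators": (100, 300)}
best = evolve(toy_fitness, space, generations=30)
print(best)
```

Because all randomness flows through a single seeded random.Random, repeated runs with the same seed return the same result, which is convenient when comparing settings.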

Documentation is available here

Sampled distribution of hyperparameters:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/density.png?raw=true

Optimization progress in a regression problem:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/fitness.png?raw=true

Main Features:

  • GASearchCV: Main class of the package; holds the evolutionary cross-validation optimization routine

  • Algorithms: A set of evolutionary algorithms to use as the optimization procedure

  • Callbacks: Custom evaluation strategies for defining early-stopping rules

  • Plots: Predefined plots to help understand the optimization process
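As an illustration of the kind of early-stopping rule a callback can express, the sketch below stops the search once the best fitness has not improved for a given number of consecutive generations. The class ConsecutiveNoImprovement and its parameters are hypothetical; the package's own callback classes and their interface may differ.

```python
class ConsecutiveNoImprovement:
    """Hypothetical early-stopping rule: stop when the best fitness has not
    improved by more than `min_delta` for `patience` consecutive generations.

    This only mimics the idea of a callback; it is not sklearn-genetic-opt's API.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale = 0  # generations since the last improvement

    def __call__(self, generation_fitness):
        """Record one generation's best fitness; return True to stop the search."""
        if generation_fitness > self.best + self.min_delta:
            self.best = generation_fitness
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


stop = ConsecutiveNoImprovement(patience=3)
history = [0.80, 0.85, 0.86, 0.86, 0.855, 0.86, 0.90]
for generation, fitness in enumerate(history):
    if stop(fitness):
        print(f"Stopping at generation {generation}")
        break
```

Here the best fitness stalls at 0.86 for three generations, so the loop stops before reaching the final 0.90; choosing patience is a trade-off between compute saved and late improvements missed.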

Usage:

Install sklearn-genetic-opt

It's advised to install sklearn-genetic-opt in a virtual environment. Inside the environment, run:

pip install sklearn-genetic-opt

Example

from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

# Load the digits dataset and flatten each 8x8 image into a 64-feature vector
data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = RandomForestClassifier()

# Search space explored by the genetic algorithm
param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
              'bootstrap': Categorical([True, False]),
              'max_depth': Integer(2, 30),
              'max_leaf_nodes': Integer(2, 35),
              'n_estimators': Integer(100, 300)}

# Stratified 3-fold cross-validation used to score each candidate
cv = StratifiedKFold(n_splits=3, shuffle=True)

evolved_estimator = GASearchCV(estimator=clf,
                               cv=cv,
                               scoring='accuracy',
                               population_size=10,
                               generations=35,
                               param_grid=param_grid,
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=4)

# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)
# Best parameters found
print(evolved_estimator.best_params_)
# Use the model fitted with the best parameters
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))

# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
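The entry Continuous(0.01, 0.5, distribution='log-uniform') in the example asks for values whose logarithm is spread uniformly, which favors small values of min_weight_fraction_leaf. The sketch below shows what that means in the usual sense of the term, assuming a standard log-uniform distribution; sample_log_uniform is a hypothetical helper, not the package's sampler.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_log_uniform(0.01, 0.5, rng) for _ in range(10_000)]

# Under a log-uniform distribution on [0.01, 0.5], the interval [0.01, 0.1)
# receives a share of about log(10) / log(50), roughly 59% of the draws,
# whereas a plain uniform distribution would give it only about 18%.
share_below_0_1 = sum(s < 0.1 for s in samples) / len(samples)
print(round(share_below_0_1, 2))
```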

Results

The log output is controlled by the verbose parameter:

https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/log.JPG?raw=true

Changelog

See the changelog for notes on changes in Sklearn-genetic-opt.

Source code

You can get the latest development version with:

git clone https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git

Contributing

Contributions are more than welcome! There are many opportunities in this ongoing project, so please get in touch if you would like to help out. Also check the Contribution guide.

Testing

After installation, you can launch the test suite from outside the source directory:

pytest sklearn_genetic

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sklearn-genetic-opt-0.4.1.tar.gz (19.8 kB)

Built Distribution

sklearn_genetic_opt-0.4.1-py3-none-any.whl (23.3 kB)

File details

Details for the file sklearn-genetic-opt-0.4.1.tar.gz.

File metadata

  • Download URL: sklearn-genetic-opt-0.4.1.tar.gz
  • Upload date:
  • Size: 19.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8

File hashes

Hashes for sklearn-genetic-opt-0.4.1.tar.gz
Algorithm Hash digest
SHA256 8066f8a2d51a19df8b2360ca85a4400398eb3ed4f92ca751ae10abfb52f5c251
MD5 90085b8e97c3a5e16b4561f0b69ee2a3
BLAKE2b-256 6a3df92836e0373abdf8ea65b78d1dad8a3a0888f2df27ba83fe6c6cba8cfe96

See more details on using hashes here.
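A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a small sketch; the local file path is an assumption about where the archive was saved. Alternatively, pip can enforce hashes itself in hash-checking mode, for example with a requirements line such as sklearn-genetic-opt==0.4.1 --hash=sha256:<digest>.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the 0.4.1 source distribution.
EXPECTED_SHA256 = "8066f8a2d51a19df8b2360ca85a4400398eb3ed4f92ca751ae10abfb52f5c251"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest matches `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()


archive = Path("sklearn-genetic-opt-0.4.1.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if verify(archive) else "Hash mismatch!")
```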

File details

Details for the file sklearn_genetic_opt-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: sklearn_genetic_opt-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 23.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8

File hashes

Hashes for sklearn_genetic_opt-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b36baf81a3cdc32b520fb736d4c3719defa736a169deb455cf2072dfc3eeb24f
MD5 57f929f4283addba27061b85b8b60fbd
BLAKE2b-256 23bba0208c150668c53958dee1423b93231c1291ecac8d098e18e1cf47eddca0

See more details on using hashes here.
