A hyperparameter optimization toolbox for convenient and fast prototyping
A hyperparameter optimization and meta-learning toolbox for convenient and fast prototyping of machine-/deep-learning models.
Overview | Installation | How to | Examples | Hyperactive API | License
Overview
- Optimize machine- or deep-learning models
- Very simple (scikit-learn inspired) API
- Choose from a variety of different optimization techniques
- High performance: Optimizer time is negligible for most models
- Utilize advanced features to improve the optimization
Optimization Techniques | Supported Packages | Advanced Features |
---|---|---|
Local Search | Machine Learning | Position Initialization |
Random Methods | Deep Learning | |
Markov Chain Monte Carlo | Distribution | |
Population Methods | | |
Sequential Methods | | |
Installation
Hyperactive is developed and tested in Python 3.
Hyperactive (stable) is available on PyPI:
pip install hyperactive
Hyperactive (development version) can be installed from GitHub:
git clone https://github.com/SimonBlanke/Hyperactive.git
pip install Hyperactive/
How to use Hyperactive
Choose an optimizer
Your decision to use a specific optimizer should be based on the time it takes to evaluate a model and on whether you already have a good start point. Try to stick to the following guidelines when choosing an optimizer:
- only use local or MCMC optimizers if you have a good start point
- random optimizers are a good way to start exploring the search space
- the majority of the iteration time should be spent on the evaluation of the model
You can choose an optimizer class from the list provided in the API. All optimization techniques are explained in more detail here. A comparison between the iteration and evaluation times for different models can be seen here.
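For example, a purely exploratory run and a refinement run from a known start point could look like this (a minimal sketch, assuming the search_config dictionary created in the next section; the warm_start dictionary format and its values are illustrative assumptions, not part of the original examples):
from hyperactive import RandomSearchOptimizer, HillClimbingOptimizer

# no good start point yet: explore the search space randomly
opt = RandomSearchOptimizer(search_config, n_iter=100)

# a good start point is already known (hypothetical configuration),
# so a local optimizer can refine it via the warm_start argument
start_point = {
    "sklearn.neighbors.KNeighborsClassifier": {
        "n_neighbors": [30],
        "weights": ["distance"],
        "p": [2],
    }
}
opt = HillClimbingOptimizer(search_config, n_iter=100, warm_start=start_point)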
Create the search space
The search space of a machine learning model is created with a dictionary that contains the model type as key and, for each hyperparameter, a list (or range) of values.
search_config = {
'sklearn.neighbors.KNeighborsClassifier': {
'n_neighbors': range(1, 100),
'weights': ["uniform", "distance"],
'p': [1, 2]
}
}
The search space of a deep learning model is created with a dictionary containing the layers (numbered by their position) and lists of values. In this dictionary 'compile' and 'fit' are treated as 'layer' zero; the first input layer starts at 1.
search_config = {
"keras.compile.0": {"loss": ["binary_crossentropy"], "optimizer": ["adam"]},
"keras.fit.0": {"epochs": [5], "batch_size": [200], "verbose": [1]},
"keras.layers.Dense.1": {
"units": range(5, 100),
"activation": ["relu"],
"kernel_initializer": ["uniform"],
},
"keras.layers.Dense.2": {"units": [1], "activation": ["softmax"]},
}
How many iterations?
The number of iterations should be low for your first optimization, to get a feeling for the iteration time (a short example run follows this list). For the iteration time you should take the following effects into account:
- A k-fold cross-validation increases the evaluation time roughly like training k-1 times on the training data
- If you set cv below 1, the evaluation uses a training/validation split, where cv is the fraction of the data used for training. A lower cv therefore means a faster evaluation.
- Some optimizers will do (and need) multiple evaluations per iteration:
- Particle-swarm-optimization
- Evolution strategy
- Parallel Tempering
- The complexity of the machine-/deep-learning model heavily influences the evaluation time and therefore the iteration time.
- The number of epochs should probably be kept low, since you only want to compare different types of models. Retrain the best model afterwards with more epochs.
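A short first run to gauge the iteration time might look like this (a sketch; the small n_iter and the cv value are illustrative, and training data X, y plus a search_config as defined above are assumed):
from hyperactive import RandomSearchOptimizer

# few iterations and a cheap training/validation split (cv < 1)
# to get a feeling for the iteration time before a longer search
opt = RandomSearchOptimizer(search_config, n_iter=3, cv=0.8)
opt.fit(X, y)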
Evaluation (optional)
You can optionally change the evaluation of the model with the 'cv' and 'metric' keywords in the optimizer class.
The 'cv' keyword argument works like in scikit-learn, but with the added possibility of a value lower than 1. In that case the evaluation is done with a training/validation split of the training data: a cv of 0.75 uses 75% of the data for training and 25% for validation. As a general guideline: set the cv value high (greater than 3) if your dataset is small, to avoid misleading evaluations. On large deep-learning datasets you can set cv to 0.5 - 0.8 without risking a noisy evaluation; for very large datasets you can even select a cv value close to 0.9.
The 'metric' keyword argument accepts one of the metrics provided in the API as a string. To see which of these metrics work with which kind of dataset, you can take a look at this notebook, in which every metric is tried out on popular datasets.
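A minimal sketch of both keywords (the concrete values are illustrative; a search_config as defined above is assumed):
from hyperactive import RandomSearchOptimizer

# cv > 1: 5-fold cross-validation, recommended for small datasets
opt = RandomSearchOptimizer(search_config, n_iter=100, cv=5)

# cv < 1: 75%/25% training/validation split, scored with f1_score instead of accuracy
opt = RandomSearchOptimizer(search_config, n_iter=100, cv=0.75, metric="f1_score")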
Distribution (optional)
You can start multiple optimizations in parallel by increasing the number of jobs. This makes sense if you want to increase the chance of finding the optimal solution or to optimize different models at the same time.
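For example (a sketch; the number of jobs is illustrative, and training data X, y plus a search_config as defined above are assumed):
from hyperactive import RandomSearchOptimizer

# run 4 optimizations in parallel, each exploring the search space independently
opt = RandomSearchOptimizer(search_config, n_iter=100, n_jobs=4)
opt.fit(X, y)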
Advanced features (optional)
The advanced features can be very useful for improving the performance of the optimizers in some situations. The 'memory' feature is enabled by default, because it can save a lot of computation time.
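A sketch of how these keywords are passed (the scatter_init value is illustrative; whether it helps depends on your model and dataset):
from hyperactive import RandomSearchOptimizer

# memory=True (default) caches already evaluated positions to avoid recomputing them;
# scatter_init=10 evaluates 10 random positions on a reduced dataset
# to choose a better initial position
opt = RandomSearchOptimizer(search_config, n_iter=100, memory=True, scatter_init=10)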
Examples
Scikit-learn:
from sklearn.datasets import load_iris
from hyperactive import RandomSearchOptimizer
iris_data = load_iris()
X = iris_data.data
y = iris_data.target
# this defines the model and hyperparameter search space
search_config = {
'sklearn.neighbors.KNeighborsClassifier': {
'n_neighbors': range(1, 100),
'weights': ["uniform", "distance"],
'p': [1, 2]
}
}
opt = RandomSearchOptimizer(search_config, n_iter=1000, n_jobs=2, cv=3)
# search best hyperparameter for given data
opt.fit(X, y)
XGBoost:
import numpy as np
from sklearn.datasets import load_breast_cancer
from hyperactive import RandomAnnealingOptimizer
breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data
y = breast_cancer_data.target
# this defines the model and hyperparameter search space
search_config = {
"xgboost.XGBClassifier": {
"n_estimators": range(3, 50, 1),
"max_depth": range(1, 21),
"learning_rate": [1e-3, 1e-2, 1e-1, 0.5, 1.0],
"subsample": np.arange(0.1, 1.01, 0.1),
"min_child_weight": range(1, 21),
"nthread": [1],
}
}
opt = RandomAnnealingOptimizer(search_config, n_iter=100, n_jobs=4, cv=3)
# search best hyperparameter for given data
opt.fit(X, y)
LightGBM:
import numpy as np
from sklearn.datasets import load_breast_cancer
from hyperactive import RandomSearchOptimizer
breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data
y = breast_cancer_data.target
# this defines the model and hyperparameter search space
search_config = {
"lightgbm.LGBMClassifier": {
"boosting_type": ["gbdt"],
"num_leaves": range(2, 20),
"learning_rate": np.arange(0.01, 0.1, 0.01),
"feature_fraction": np.arange(0.1, 0.95, 0.1),
"bagging_fraction": np.arange(0.1, 0.95, 0.1),
"bagging_freq": range(2, 10, 1),
}
}
opt = RandomSearchOptimizer(search_config, n_iter=10, n_jobs=4, cv=3)
# search best hyperparameter for given data
opt.fit(X, y)
CatBoost:
import numpy as np
from sklearn.datasets import load_breast_cancer
from hyperactive import RandomSearchOptimizer
breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data
y = breast_cancer_data.target
# this defines the model and hyperparameter search space
search_config = {
"catboost.CatBoostClassifier": {
"iterations": [3],
"learning_rate": np.arange(0.01, 0.1, 0.01),
"depth": range(2, 20),
"verbose": [0],
"thread_count": [1],
}
}
opt = RandomSearchOptimizer(search_config, n_iter=10, n_jobs=4, cv=3)
# search best hyperparameter for given data
opt.fit(X, y)
Keras:
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical
from hyperactive import RandomSearchOptimizer
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# this defines the structure of the model and the search space in each layer
search_config = {
"keras.compile.0": {"loss": ["categorical_crossentropy"], "optimizer": ["adam"]},
"keras.fit.0": {"epochs": [10], "batch_size": [500], "verbose": [2]},
"keras.layers.Conv2D.1": {
"filters": [32, 64, 128],
"kernel_size": range(3, 4),
"activation": ["relu"],
"input_shape": [(28, 28, 1)],
},
"keras.layers.MaxPooling2D.2": {"pool_size": [(2, 2)]},
"keras.layers.Conv2D.3": {
"filters": [16, 32, 64],
"kernel_size": [3],
"activation": ["relu"],
},
"keras.layers.MaxPooling2D.4": {"pool_size": [(2, 2)]},
"keras.layers.Flatten.5": {},
"keras.layers.Dense.6": {"units": range(30, 200, 10), "activation": ["softmax"]},
"keras.layers.Dropout.7": {"rate": list(np.arange(0.4, 0.8, 0.1))},
"keras.layers.Dense.8": {"units": [10], "activation": ["softmax"]},
}
opt = RandomSearchOptimizer(search_config, n_iter=10)
# search best hyperparameter for given data
opt.fit(X_train, y_train)
Hyperactive API
Classes:
HillClimbingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, r=1e-6)
StochasticHillClimbingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False)
TabuOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, tabu_memory=[3, 6, 9])
RandomSearchOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False)
RandomRestartHillClimbingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, n_restarts=10)
RandomAnnealingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=100, t_rate=0.98)
SimulatedAnnealingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, t_rate=0.98)
StochasticTunnelingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, t_rate=0.98, n_neighbours=1, gamma=1)
ParallelTemperingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, t_rate=0.98, n_neighbours=1, system_temps=[0.1, 0.2, 0.01], n_swaps=10)
ParticleSwarmOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, n_part=4, w=0.5, c_k=0.5, c_s=0.9)
EvolutionStrategyOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, individuals=10, mutation_rate=0.7, crossover_rate=0.3)
BayesianOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=3, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False)
General positional arguments:
Argument | Type | Description |
---|---|---|
search_config | dict | hyperparameter search space to explore by the optimizer |
n_iter | int | number of iterations to perform |
General keyword arguments:
Argument | Type | Default | Description |
---|---|---|---|
metric | str | "accuracy" | metric for model evaluation |
n_jobs | int | 1 | number of jobs to run in parallel (-1 for maximum) |
cv | int | 3 | if cv > 1: cross-validation / if cv < 1: train/validation split, where cv-float marks the relative size of the train data |
verbosity | int | 1 | Shows model and metric information |
random_state | int | None | The seed for random number generator |
warm_start | dict | None | Hyperparameter configuration to start from |
memory | bool | True | Stores explored evaluations in a dictionary to save computing time |
scatter_init | int | False | Chooses a better initial position by evaluating multiple random positions on a reduced training dataset (split into the given number of subsets) |
Specific keyword arguments:
Hill Climbing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
Stochastic Hill Climbing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
r | float | 1e-6 | acceptance factor |
Tabu Search
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
tabu_memory | list | [3, 6, 9] | length of short/mid/long-term memory |
Random Restart Hill Climbing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
n_restarts | int | 10 | number of restarts |
Random Annealing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 100 | epsilon |
t_rate | float | 0.98 | cooling rate |
Simulated Annealing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
t_rate | float | 0.98 | cooling rate |
Stochastic Tunneling
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
t_rate | float | 0.98 | cooling rate |
gamma | float | 1 | tunneling factor |
Parallel Tempering
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
t_rate | float | 0.98 | cooling rate |
system_temps | list | [0.1, 0.2, 0.01] | initial temperatures (number of elements defines number of systems) |
n_swaps | int | 10 | number of swaps |
Particle Swarm Optimization
Argument | Type | Default | Description |
---|---|---|---|
n_part | int | 1 | number of particles |
w | float | 0.5 | inertia factor |
c_k | float | 0.8 | cognitive factor |
c_s | float | 0.9 | social factor |
Evolution Strategy
Argument | Type | Default | Description |
---|---|---|---|
individuals | int | 10 | number of individuals |
mutation_rate | float | 0.7 | mutation rate |
crossover_rate | float | 0.3 | crossover rate |
Bayesian Optimization
Argument | Type | Default | Description |
---|---|---|---|
kernel | class | Matern | Kernel used for the Gaussian process |
General methods:
fit(self, X_train, y_train)
Argument | Type | Description |
---|---|---|
X_train | array-like | training input features |
y_train | array-like | training target |
predict(self, X_test)
Argument | Type | Description |
---|---|---|
X_test | array-like | testing input features |
score(self, X_test, y_test)
Argument | Type | Description |
---|---|---|
X_test | array-like | testing input features |
y_test | array-like | true values |
export(self, filename)
Argument | Type | Description |
---|---|---|
filename | str | file name and path for model export |
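Taken together, a typical workflow with these methods might look like this (a sketch, assuming the scikit-learn search_config from the examples above; the train/test split and the file name are illustrative):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from hyperactive import RandomSearchOptimizer

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

opt = RandomSearchOptimizer(search_config, n_iter=100, cv=3)
opt.fit(X_train, y_train)               # search the best hyperparameters
predictions = opt.predict(X_test)       # predict with the best found model
test_score = opt.score(X_test, y_test)  # score on unseen data
opt.export("best_model")                # hypothetical file name for the exported model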
Available Metrics:
Machine Learning
Scores | Losses |
---|---|
accuracy_score | brier_score_loss |
balanced_accuracy_score | log_loss |
average_precision_score | max_error |
f1_score | mean_absolute_error |
recall_score | mean_squared_error |
jaccard_score | mean_squared_log_error |
roc_auc_score | median_absolute_error |
explained_variance_score | |
Deep Learning
Scores | Losses |
---|---|
accuracy | mean_squared_error |
binary_accuracy | mean_absolute_error |
categorical_accuracy | mean_absolute_percentage_error |
sparse_categorical_accuracy | mean_squared_logarithmic_error |
top_k_categorical_accuracy | squared_hinge |
sparse_top_k_categorical_accuracy | hinge |
 | categorical_hinge |
 | logcosh |
 | categorical_crossentropy |
 | sparse_categorical_crossentropy |
 | binary_crossentropy |
 | kullback_leibler_divergence |
 | poisson |
 | cosine_proximity |
References
Proxy Datasets for Training Convolutional Neural Networks
License