Hyperactive
A hyperparameter optimization toolbox for convenient and fast prototyping
Overview | Performance | Installation | Examples | Advanced Features | Hyperactive API | License
Overview
- Optimize hyperparameters of machine- or deep-learning models, using a simple API.
- Choose from a variety of different optimization techniques to improve your model.
- Never lose progress of previous optimizations: Just pass one or more models as start points and continue optimizing.
- Use transfer learning during the optimization process to build a more accurate model, while saving training and optimization time.
- Utilize multiprocessing for machine learning models or your GPU for deep learning models.
Optimization Techniques | Supported Packages |
---|---|
Local Search | Machine Learning |
Performance
The bar chart below shows that the optimization process itself accounts for only a small fraction (<0.6%) of the computation time. The 'No Opt' bar shows the training time of a default Gradient Boosting Classifier, normalized to 1; the other bars show the computation time relative to 'No Opt'. Each optimizer performed 30 runs of 300 iterations to obtain reliable statistics.
Installation
Hyperactive is developed and tested in Python 3.
Hyperactive is available on PyPI:
pip install hyperactive
Examples
Basic sklearn example:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from hyperactive import SimulatedAnnealingOptimizer
iris_data = load_iris()
X = iris_data.data
y = iris_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# this defines the model and hyperparameter search space
search_config = {
"sklearn.ensemble.RandomForestClassifier": {
"n_estimators": range(10, 100, 10),
"max_depth": [3, 4, 5, 6],
"criterion": ["gini", "entropy"],
"min_samples_split": range(2, 21),
"min_samples_leaf": range(2, 21),
}
}
Optimizer = SimulatedAnnealingOptimizer(search_config, n_iter=100, n_jobs=4)
# search best hyperparameter for given data
Optimizer.fit(X_train, y_train)
# predict from test data
prediction = Optimizer.predict(X_test)
# calculate accuracy score
score = Optimizer.score(X_test, y_test)
Example with a multi-layer-perceptron in keras:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from hyperactive import ParticleSwarmOptimizer
breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data
y = breast_cancer_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# this defines the structure of the model and the search space in each layer
search_config = {
"keras.compile.0": {"loss": ["binary_crossentropy"], "optimizer": ["adam"]},
"keras.fit.0": {"epochs": [3], "batch_size": [100], "verbose": [0]},
"keras.layers.Dense.1": {
"units": range(5, 15),
"activation": ["relu"],
"kernel_initializer": ["uniform"],
},
"keras.layers.Dense.2": {
"units": range(5, 15),
"activation": ["relu"],
"kernel_initializer": ["uniform"],
},
"keras.layers.Dense.3": {"units": [1], "activation": ["sigmoid"]},
}
Optimizer = ParticleSwarmOptimizer(
search_config, n_iter=3, metric=["mean_absolute_error"], verbosity=0
)
# search best hyperparameter for given data
Optimizer.fit(X_train, y_train)
Example with a convolutional neural network in keras:
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical
from hyperactive import RandomSearchOptimizer
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# this defines the structure of the model and the search space in each layer
search_config = {
"keras.compile.0": {"loss": ["categorical_crossentropy"], "optimizer": ["adam"]},
"keras.fit.0": {"epochs": [20], "batch_size": [500], "verbose": [2]},
"keras.layers.Conv2D.1": {
"filters": [32, 64, 128],
"kernel_size": range(3, 4),
"activation": ["relu"],
"input_shape": [(28, 28, 1)],
},
"keras.layers.MaxPooling2D.2": {"pool_size": [(2, 2)]},
"keras.layers.Conv2D.3": {
"filters": [16, 32, 64],
"kernel_size": [3],
"activation": ["relu"],
},
"keras.layers.MaxPooling2D.4": {"pool_size": [(2, 2)]},
"keras.layers.Flatten.5": {},
"keras.layers.Dense.6": {"units": range(30, 200, 10), "activation": ["softmax"]},
"keras.layers.Dropout.7": {"rate": list(np.arange(0.4, 0.8, 0.1))},
"keras.layers.Dense.8": {"units": [10], "activation": ["softmax"]},
}
Optimizer = RandomSearchOptimizer(search_config, n_iter=20)
# search best hyperparameter for given data
Optimizer.fit(X_train, y_train)
# predict from test data
prediction = Optimizer.predict(X_test)
# calculate accuracy score
score = Optimizer.score(X_test, y_test)
Advanced Features
The features listed below can be activated during the instantiation of the optimizer (see the API section) and work with every optimizer in the hyperactive package.
Memory
After the evaluation of a model, the position (in the hyperparameter search space) and the cross-validation score are written to a dictionary. If the optimizer tries to evaluate this position again, it can quickly look up whether a score for this position is already present and use it instead of going through the expensive training and prediction process.
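A minimal sketch of the idea (illustrative only, not Hyperactive's internal code), where a hashable representation of the position serves as the dictionary key:
# illustrative sketch of the memory feature, not Hyperactive's actual internals
memory = {}

def evaluate_with_memory(position, train_and_score):
    # position: dict of hyperparameter values; train_and_score: the expensive evaluation
    key = tuple(sorted(position.items()))  # hashable representation of the position
    if key in memory:                      # position already evaluated: reuse the stored score
        return memory[key]
    score = train_and_score(position)      # expensive training + cross-validation
    memory[key] = score
    return score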
Scatter-Initialization
This technique was inspired by Hyperband optimization and aims to find a good initial position for the search. It does so by evaluating n random positions on a training subset of 1/n the size of the original dataset. The position that achieves the best score is used as the starting position for the optimization.
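The feature is activated through the scatter_init keyword argument (see the Hyperactive API below). A minimal sketch, assuming the sklearn search_config and data from the basic example above:
from hyperactive import RandomSearchOptimizer

# scatter_init=10: evaluate 10 random positions on 1/10 of the training data
# and start the search from the best of them
Optimizer = RandomSearchOptimizer(search_config, n_iter=100, scatter_init=10)
Optimizer.fit(X_train, y_train)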
Multiprocessing
Multiprocessing in hyperactive works by creating additional searches that run in parallel without any shared memory. This makes it possible to tune the hyperparameters of different models at the same time. If a single model should be tuned as fast as possible, n_jobs in the optimizer should be set to 1, while n_jobs (of the model) in the search_config should be set to -1.
Two searches with eight cpu-cores:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from hyperactive import RandomSearchOptimizer
iris_data = load_iris()
X = iris_data.data
y = iris_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# this defines the model and hyperparameter search space
search_config = {
"sklearn.ensemble.RandomForestClassifier": {
"n_estimators": range(10, 100, 10),
"max_depth": [3, 4, 5, 6],
"criterion": ["gini", "entropy"],
"n_jobs": [4],
},
"sklearn.ensemble.GradientBoostingClassifier": {
"n_estimators": range(10, 100, 10),
"max_depth": range(1, 11),
"min_samples_split": range(2, 21),
"n_jobs": [4],
},
}
Optimizer = RandomSearchOptimizer(search_config, n_iter=300, n_jobs=2, verbosity=0)
# search best hyperparameter for given data
Optimizer.fit(X_train, y_train)
One search with all cpu-cores:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from hyperactive import RandomSearchOptimizer
iris_data = load_iris()
X = iris_data.data
y = iris_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# this defines the model and hyperparameter search space
search_config = {
"sklearn.ensemble.RandomForestClassifier": {
"n_estimators": range(10, 100, 10),
"max_depth": [3, 4, 5, 6],
"criterion": ["gini", "entropy"],
"n_jobs": [-1],
},
}
Optimizer = RandomSearchOptimizer(search_config, n_iter=300, n_jobs=1, verbosity=0)
# search best hyperparameter for given data
Optimizer.fit(X_train, y_train)
Multiple searches with all cpu-cores:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from hyperactive import RandomSearchOptimizer
iris_data = load_iris()
X = iris_data.data
y = iris_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# this defines the model and hyperparameter search space
search_config = {
"sklearn.ensemble.RandomForestClassifier": {
"n_estimators": range(10, 100, 10),
"max_depth": [3, 4, 5, 6],
"criterion": ["gini", "entropy"],
"min_samples_split": range(2, 21),
"min_samples_leaf": range(2, 21),
},
"sklearn.neighbors.KNeighborsClassifier": {
"n_neighbors": range(1, 10),
"weights": ["uniform", "distance"],
"p": [1, 2],
},
"sklearn.ensemble.GradientBoostingClassifier": {
"n_estimators": range(10, 100, 10),
"learning_rate": [1e-3, 1e-2, 1e-1, 0.5, 1.0],
"max_depth": range(1, 11),
"min_samples_split": range(2, 21),
"min_samples_leaf": range(1, 21),
"subsample": np.arange(0.05, 1.01, 0.05),
"max_features": np.arange(0.05, 1.01, 0.05),
},
"sklearn.tree.DecisionTreeClassifier": {
"criterion": ["gini", "entropy"],
"max_depth": range(1, 11),
"min_samples_split": range(2, 21),
"min_samples_leaf": range(1, 21),
},
}
Optimizer = RandomSearchOptimizer(search_config, n_iter=300, n_jobs=-1, verbosity=0)
Transfer-Learning
In the current implementation, transfer learning works by using a predefined model (with optional pretrained weights) provided by the keras package. The import path can be inserted as a layer (with its parameters in a sub-dictionary), like in a regular search dictionary. The following snippet provides an example:
Transfer-learning example:
from keras.datasets import cifar10
from keras.utils import to_categorical
from hyperactive import SimulatedAnnealingOptimizer
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# this defines the structure of the model and the search space in each layer
search_config = {
"keras.compile.0": {"loss": ["binary_crossentropy"], "optimizer": ["adam"]},
"keras.fit.0": {"epochs": [1], "batch_size": [300], "verbose": [0]},
# just add the pretrained model as a layer like this:
"keras.applications.MobileNet.1": {
"weights": ["imagenet"],
"input_shape": [(32, 32, 3)],
"include_top": [False],
},
"keras.layers.Flatten.2": {},
"keras.layers.Dense.3": {
"units": range(5, 15),
"activation": ["relu"],
"kernel_initializer": ["uniform"],
},
"keras.layers.Dense.4": {"units": [10], "activation": ["sigmoid"]},
}
Optimizer = SimulatedAnnealingOptimizer(
search_config, n_iter=3, warm_start=False, verbosity=0
)
Warm-Start
When a search is finished, the warm-start dictionary for the best position in the hyperparameter search space (and its metric) is printed to the command line (at verbosity=1). If multiple searches ran in parallel, the warm-start dictionaries are sorted by the best metric in decreasing order. If the start position in the warm-start dictionary is not within the search space defined in the search_config, an error will occur. A usage sketch follows the two examples below.
Warm-start example for sklearn model:
start_point = {
"sklearn.ensemble.RandomForestClassifier.0": {
"n_estimators": [30],
"max_depth": [6],
"criterion": ["entropy"],
"min_samples_split": [12],
"min_samples_leaf": [16],
},
"sklearn.ensemble.RandomForestClassifier.1": {
"n_estimators": [50],
"max_depth": [3],
"criterion": ["entropy"],
},
}
Warm-start example for keras model (cnn):
start_point = {
"keras.compile.0": {"loss": ["categorical_crossentropy"], "optimizer": ["adam"]},
"keras.fit.0": {"epochs": [3], "batch_size": [500], "verbose": [0]},
"keras.layers.Conv2D.1": {
"filters": [64],
"kernel_size": [3],
"activation": ["relu"],
"input_shape": [(28, 28, 1)],
},
"keras.layers.MaxPooling2D.2": {"pool_size": [(2, 2)]},
"keras.layers.Conv2D.3": {
"filters": [32],
"kernel_size": [3],
"activation": ["relu"],
"input_shape": [(28, 28, 1)],
},
"keras.layers.MaxPooling2D.4": {"pool_size": [(2, 2)]},
"keras.layers.Conv2D.5": {
"filters": [32],
"kernel_size": [3],
"activation": ["relu"],
"input_shape": [(28, 28, 1)],
},
"keras.layers.MaxPooling2D.6": {"pool_size": [(2, 2)]},
"keras.layers.Flatten.7": {},
"keras.layers.Dense.8": {"units": [50], "activation": ["softmax"]},
"keras.layers.Dropout.9": {"rate": [0.4]},
"keras.layers.Dense.10": {"units": [10], "activation": ["softmax"]},
}
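The start point is passed to the optimizer via the warm_start keyword argument (see the Hyperactive API below). A minimal usage sketch, assuming a search_config that contains the corresponding model and data as in the examples above:
from hyperactive import RandomSearchOptimizer

# warm_start takes the start-point dictionary printed by a previous search
Optimizer = RandomSearchOptimizer(search_config, n_iter=100, warm_start=start_point)
Optimizer.fit(X_train, y_train)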
Hyperactive API
Classes:
HillClimbingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, r=1e-6)
StochasticHillClimbingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False)
TabuOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, tabu_memory=[3, 6, 9])
RandomSearchOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False)
RandomRestartHillClimbingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, n_restarts=10)
RandomAnnealingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=100, t_rate=0.98)
SimulatedAnnealingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, t_rate=0.98)
StochasticTunnelingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, t_rate=0.98, n_neighbours=1, gamma=1)
ParallelTemperingOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, eps=1, t_rate=0.98, n_neighbours=1, system_temps=[0.1, 0.2, 0.01], n_swaps=10)
ParticleSwarmOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, n_part=4, w=0.5, c_k=0.5, c_s=0.9)
EvolutionStrategyOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False, individuals=10, mutation_rate=0.7, crossover_rate=0.3)
BayesianOptimizer(search_config, n_iter, metric="accuracy", n_jobs=1, cv=5, verbosity=1, random_state=None, warm_start=False, memory=True, scatter_init=False)
General positional arguments:
Argument | Type | Description |
---|---|---|
search_config | dict | hyperparameter search space to explore by the optimizer |
n_iter | int | number of iterations to perform |
General keyword arguments:
Argument | Type | Default | Description |
---|---|---|---|
metric | str | "accuracy" | metric for model evaluation |
n_jobs | int | 1 | number of jobs to run in parallel (-1 for maximum) |
cv | int | 5 | number of cross-validation folds |
verbosity | int | 1 | Shows model and metric information |
random_state | int | None | Seed for the random number generator |
warm_start | dict | False | Hyperparameter configuration to start from |
memory | bool | True | Stores explored positions and their scores in a dictionary to save computing time |
scatter_init | int | False | Evaluates int random initial positions on a training subset of 1/int the size of the dataset and starts the search from the best one |
Specific keyword arguments:
Hill Climbing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
Stochastic Hill Climbing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
r | float | 1e-6 | acceptance factor |
Tabu Search
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
tabu_memory | list | [3, 6, 9] | length of short/mid/long-term memory |
Random Restart Hill Climbing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
n_restarts | int | 10 | number of restarts |
Random Annealing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 100 | epsilon |
t_rate | float | 0.98 | cooling rate |
Simulated Annealing
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
t_rate | float | 0.98 | cooling rate |
Stochastic Tunneling
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
t_rate | float | 0.98 | cooling rate |
gamma | float | 1 | tunneling factor |
Parallel Tempering
Argument | Type | Default | Description |
---|---|---|---|
eps | int | 1 | epsilon |
t_rate | float | 0.98 | cooling rate |
system_temps | list | [0.1, 0.2, 0.01] | initial temperatures (number of elements defines number of systems) |
n_swaps | int | 10 | number of swaps |
Particle Swarm Optimization
Argument | Type | Default | Description |
---|---|---|---|
n_part | int | 4 | number of particles |
w | float | 0.5 | inertia factor |
c_k | float | 0.5 | cognitive factor |
c_s | float | 0.9 | social factor |
Evolution Strategy
Argument | Type | Default | Description |
---|---|---|---|
individuals | int | 10 | number of individuals |
mutation_rate | float | 0.7 | mutation rate |
crossover_rate | float | 0.3 | crossover rate |
Bayesian Optimization
Argument | Type | Default | Description |
---|---|---|---|
kernel | class | Matern | Kernel used for the Gaussian process |
General methods:
fit(self, X_train, y_train)
Argument | Type | Description |
---|---|---|
X_train | array-like | training input features |
y_train | array-like | training target |
predict(self, X_test)
Argument | Type | Description |
---|---|---|
X_test | array-like | testing input features |
score(self, X_test, y_test)
Argument | Type | Description |
---|---|---|
X_test | array-like | testing input features |
y_test | array-like | true values |
export(self, filename)
Argument | Type | Description |
---|---|---|
filename | str | file name and path for model export |
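fit, predict and score are shown in the examples above; export is not, so here is a one-line sketch (the filename is illustrative):
# export the best model found during the search; "my_best_model" is an illustrative filename
Optimizer.export("my_best_model")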
Available Metrics:
Scikit-learn
Machine Learning Scores | Machine Learning Losses |
---|---|
accuracy_score | brier_score_loss |
balanced_accuracy_score | log_loss |
average_precision_score | max_error |
f1_score | mean_absolute_error |
recall_score | mean_squared_error |
jaccard_score | mean_squared_log_error |
roc_auc_score | median_absolute_error |
| explained_variance_score |
Keras
Deep Learning Scores | Deep Learning Losses |
---|---|
accuracy | mean_squared_error |
binary_accuracy | mean_absolute_error |
categorical_accuracy | mean_absolute_percentage_error |
sparse_categorical_accuracy | mean_squared_logarithmic_error |
top_k_categorical_accuracy | squared_hinge |
sparse_top_k_categorical_accuracy | hinge |
| categorical_hinge |
| logcosh |
| categorical_crossentropy |
| sparse_categorical_crossentropy |
| binary_crossentropy |
| kullback_leibler_divergence |
| poisson |
| cosine_proximity |
License